Reading Sheeri's MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation, I was baffled at this change of floating point number notation.
However, I was also concerned about the final action taken: using "--ignore-columns" to avoid comparing the FLOAT/DOUBLE types.
The --float-precision option for pt-table-checksum currently only uses ROUND(), so as to disregard minor rounding issues. But it can very easily be extended to handle the difference in floating point notation. Consider the problem again:
mysql> create table tf(f float);
Query OK, 0 rows affected (0.11 sec)

mysql> insert into tf values(0.0000958084);
Query OK, 1 row affected (0.04 sec)

mysql-5.1> select * from tf;
+-------------+
| f           |
+-------------+
| 9.58084e-05 |
+-------------+

mysql-5.5> select * from tf;
+--------------+
| f            |
+--------------+
| 0.0000958084 |
+--------------+
How can we normalize the notation?
Easily: CAST it as DECIMAL. Consider:
mysql-5.1> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
+-------------+--------------+
| f           | fn           |
+-------------+--------------+
| 9.58084e-05 | 0.0000958084 |
+-------------+--------------+

mysql-5.5> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
+--------------+--------------+
| f            | fn           |
+--------------+--------------+
| 0.0000958084 | 0.0000958084 |
+--------------+--------------+
The normalization works well in both cases. Note that it takes care to normalize only values in the range [-1, 1]; anything outside that range is passed through unchanged.
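To see the same idea outside MySQL, here is a minimal Python sketch. It is only an illustration, not part of pt-table-checksum, and it assumes the two servers hand back the value as the strings shown above. The helper name normalize is made up for this example. Values in [-1, 1] are converted to a fixed-point decimal with 30 fractional digits and then rounded to the requested precision, mirroring the CAST and ROUND expression.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

def normalize(text, precision=10):
    """Hypothetical helper: mimic ROUND(CAST(f AS DECIMAL(65,30)), precision)
    for values in [-1, 1]; return anything else exactly as it was printed."""
    value = Decimal(text)
    if not (Decimal(-1) <= value <= Decimal(1)):
        return text
    with localcontext() as ctx:
        ctx.prec = 65  # room for 65 digits, like DECIMAL(65,30)
        fixed = value.quantize(Decimal(1).scaleb(-30), rounding=ROUND_HALF_UP)
        rounded = fixed.quantize(Decimal(1).scaleb(-precision),
                                 rounding=ROUND_HALF_UP)
    # The "f" format forces plain notation, never scientific notation.
    return format(rounded, "f")

print(normalize("9.58084e-05"))    # as printed by 5.1
print(normalize("0.0000958084"))   # as printed by 5.5
```

Both calls print the same fixed-point string, which is what a checksum over the text representation needs.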
The change in pt-table-checksum? One line of code:
elsif ( $float_precision && $type =~ m/float|double/ ) {
   $result = "ROUND($result, $float_precision)";
}
turns into
elsif ( $float_precision && $type =~ m/float|double/ ) {
   $result = "ROUND(IF($result BETWEEN -1 AND 1, CAST($result AS DECIMAL(65,30)), $result), $float_precision)";
}
I've just submitted a blueprint, but I would think changing one line of code shouldn't wait until the next release of Percona Toolkit: you can edit your /usr/bin/pt-table-checksum in place and proceed to validate your database integrity.

I think it might be better to use FORMAT() instead. But that is just from memory; I haven't looked at the documentation to remind myself of how it works.
Link | January 24th, 2013 at 4:29 pm
Now if you have extremely small numbers (atomic physics) or extremely large numbers (astronomy), a CAST to DECIMAL may truncate data, I think? And in that case, after truncation, different values may appear identical.
Right/wrong?
Link | January 24th, 2013 at 5:59 pm
Peter,
Well, I don't touch large numbers as you can see. With small numbers I'm casting with up to 30 digits after the period.
Moreover, consider that I've put this into the float-precision option, whose purpose in the first place is to round values... So yes, there's rounding, and it is intentional.
Link | January 24th, 2013 at 6:18 pm
Baron,
FORMAT() seems to work well for small numbers (for larger numbers, which are of no interest in this problem, it adds commas to denote orders of magnitude).
It apparently has no stated limit on the number of digits after the decimal point (FLOAT and DOUBLE max out at 30 digits when cast to DECIMAL), but the limit is just silent: when FORMATting numbers with over 30 digits, the result is rounded to 30 digits no matter what.
So in conclusion I don't see the difference between the two.
Link | January 24th, 2013 at 7:21 pm
Interesting - I'd tried FORMAT() and CAST (and CONVERT) by itself, not with rounding, and nothing I'd tried there worked. I appreciate that workaround!
Link | January 24th, 2013 at 7:23 pm
FLOAT goes down to about 1e-38, so DECIMAL(x, 30) can lose precision, even lose _all_ precision. DOUBLE goes much further still.
What is needed is to FORMAT to some number of _significant_ digits, not some number of _decimal places_.
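A small Python sketch of the difference, under the assumption that Python doubles stand in for the column values. The helper names to_decimal_30 and significant are invented for this illustration, and neither is a MySQL function. Two distinct values near 1e-38 collapse to zero at 30 decimal places, but stay distinct when formatted to 6 significant digits.

```python
from decimal import Decimal, localcontext

def to_decimal_30(x):
    """Mimic CAST(x AS DECIMAL(65,30)): keep only 30 fractional digits."""
    with localcontext() as ctx:
        ctx.prec = 65
        return Decimal(x).quantize(Decimal(1).scaleb(-30))

def significant(x, digits=6):
    """Format x with a fixed number of significant digits."""
    return format(x, ".{}e".format(digits - 1))

a, b = 1.5e-38, 2.5e-38  # both within the range of a FLOAT

# Fixed decimal places: both values become zero and look identical.
print(to_decimal_30(a) == to_decimal_30(b))

# Fixed significant digits: the two values remain distinguishable.
print(significant(a), significant(b))
```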
If the numbers are different on the two machines, they should be flagged as different. Converting to strings, then comparing the strings is the source of the problem.
Unfortunately, there is not a CAST AS BINARY to produce the internal representation. (A trick like that works to bypass character set issues.) Probably 99.99% of MySQL installations use hardware that uses IEEE 754, so I suggest that it is not reasonable to argue about differing representations.
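For illustration only, this is roughly what comparing the internal representation looks like when done on the client side in Python with the standard struct module. It is not something MySQL or pt-table-checksum offers, and the function names are made up for this sketch. The two textual spellings of the same number map to the same IEEE 754 bytes.

```python
import struct

def float32_bits(x):
    """Hex of the 4-byte IEEE 754 single-precision encoding (big-endian)."""
    return struct.pack(">f", x).hex()

def float64_bits(x):
    """Hex of the 8-byte IEEE 754 double-precision encoding (big-endian)."""
    return struct.pack(">d", x).hex()

old_style = float("9.58084e-05")    # notation printed by 5.1
new_style = float("0.0000958084")   # notation printed by 5.5

# Same number, same bits, regardless of how it was printed.
print(float32_bits(old_style) == float32_bits(new_style))
print(float64_bits(old_style) == float64_bits(new_style))
```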
Link | January 24th, 2013 at 10:18 pm
Data Industry news round up, Log Buffer #304 wrote:
[...] Sheeri’s MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation, Shlomi Noach was baffled at this change of floating point number [...]
Link | January 25th, 2013 at 4:08 pm
Note that depending on float precision, this still might not work. I tried to use your workaround and it still showed differences, and I think it's because we're not setting the float precision (and by default there is 'none').
I found that with CAST, when using a float precision of 30 or less on a double field, it works, but not 31 or more:
mysql> select @@version; SELECT
ROUND(IF(double_field BETWEEN -1 AND 1, CAST(double_field AS DECIMAL(65,30)), double_field), 30) as float_p30,
ROUND(IF(double_field BETWEEN -1 AND 1, CAST(double_field AS DECIMAL(65,30)), double_field), 31) as float_p31 from my_table where id IN (val1,val2,val3);
+------------+
| @@version  |
+------------+
| 5.1.61-log |
+------------+
1 row in set (0.00 sec)
+----------------------------------+-------------+
| float_p30                        | float_p31   |
+----------------------------------+-------------+
| 0.000075488799999999981024841056 | 7.54888e-05 |
| 0.000090369599999999989390145338 | 9.03696e-05 |
| 0.000059410600000000003402360338 | 5.94106e-05 |
+----------------------------------+-------------+
3 rows in set (0.00 sec)
This kind of truncation also happens on MariaDB, but in a different way - it truncates to 10 decimal places.
MariaDB [addons_allizom_org]> SELECT
ROUND(IF(movers BETWEEN -1 AND 1, CAST(movers AS DECIMAL(65,30)), movers), 30) as float_p30,
ROUND(IF(movers BETWEEN -1 AND 1, CAST(movers AS DECIMAL(65,30)), movers), 31) as float_p31 from addons_allizom_org.personas where id in (4313,33619,66706);
+----------------------------------+--------------+
| float_p30                        | float_p31    |
+----------------------------------+--------------+
| 0.000075488800000000000000000000 | 0.0000754888 |
| 0.000090369599999999990000000000 | 0.0000903696 |
| 0.000059410600000000000000000000 | 0.0000594106 |
+----------------------------------+--------------+
3 rows in set (0.00 sec)
MariaDB truncates without any warnings, which is odd.
Link | February 11th, 2013 at 7:22 pm