Re: MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation

January 24, 2013

Reading Sheeri's MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation, I was baffled at this change of floating point number notation.

However, I was also concerned about the final action taken: using "--ignore-columns" to avoid comparing the FLOAT/DOUBLE types.

The --float-precision option for pt-table-checksum currently only uses ROUND() so as to disregard minor rounding issues. But it can very easily extend to handle the difference in floating point notation. Consider again the problem:

mysql> create table tf(f float);
Query OK, 0 rows affected (0.11 sec)

mysql> insert into tf values(0.0000958084);
Query OK, 1 row affected (0.04 sec)

mysql-5.1> select * from tf;
| f           |
| 9.58084e-05 |

mysql-5.5> select * from tf;
| f            |
| 0.0000958084 |

How can we normalize the notation?

Easily: CAST it as DECIMAL. Consider:

mysql-5.1> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
| f           | fn           |
| 9.58084e-05 | 0.0000958084 |

mysql-5.5> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
| f            | fn           |
| 0.0000958084 | 0.0000958084 |

The normalization works well in both cases; also, taking care to only normalize values in the range [-1, 1].

The change in pt-table-checksum? One line of code:

         elsif ( $float_precision && $type =~ m/float|double/ ) {
            $result = "ROUND($result, $float_precision)";

Turns to

         elsif ( $float_precision && $type =~ m/float|double/ ) {
            $result = "ROUND(IF($result BETWEEN -1 AND 1, CAST($result AS DECIMAL(65,30)), $result), $float_precision)";

I've just submitted a blueprint, but I would think changing one line of code shouldn't wait till next release of Percona Toolkit: one can edit in-place their /usr/bin/pt-table-checksum and proceed to validate their database integrity.

  • Baron

    I think it might be better to use FORMAT() instead. But that is just from my memory, I haven't looked at the documentation to remind myself of how it works.

  • Now if you have extremely small (atomar physics) or extremely large numbers (astronomics) a CAST to DECIMAL may truncate data I think? And in that case after truncation different values may appear as identical values.


  • Peter,
    Well, I don't touch large numbers as you can see. With small numbers I'm casting with up to 30 digits after the period.

    Moreover, consider I've put this into the float-precision option, whose target in the first place is to do round ups... So yes, there's round up, and it is intentional.

  • Baron,

    FORMAT() seems to work well for small numbers (for larger numbers, which are of no interest in this problem, it adds commas to denote orders of magnitude).
    It apparently doesn't have a limit on the number of digits after the floating point (FLOAT and DOUBLE max at 30 digits when casted to DECIMAL) -- but this is just silent: when FORMATting numbers with over 30 digits, result is rounded to 30 digits no matter what.
    So in conclusion I don't see the difference between the two.

  • Interesting - I'd tried FORMAT() and CAST (and CONVERT) by itself, not with rounding, and nothing I'd tried there worked. I appreciate that workaround!

  • FLOAT goes down to about 1e-38, so DECIMAL(x, 30) can lose precision, even lose _all_ precision. DOUBLE is much further.

    What is needed is to FORMAT to some number of _significant_ digits, not some number of _decimal places_.

    If the numbers are different on the two machines, they should be flagged as different. Converting to strings, then comparing the strings is the source of the problem.

    Unfortunately, there is not a CAST AS BINARY to produce the internal representation. (A trick like that works to bypass character set issues.) Probably 99.99% of MySQL installations use hardware that uses IEEE 754, so I suggest that it is not reasonable to argue about differing representations.

  • Pingback: Data Industry news round up, Log Buffer #304()

  • Note that depending on float precision, this still might not work. I tried to use your workaround and it still showed differences, and I think it's because we're not setting the float precision (and by default there is 'none').

    I found that with CAST, when using a float precision of 30 or less on a double field, it works, but not 31 or more:

    mysql> select @@version; SELECT
    ROUND(IF(double_field BETWEEN -1 AND 1, CAST(double_field AS DECIMAL(65,30)), double_field), 30) as float_p30,
    ROUND(IF(double_field BETWEEN -1 AND 1, CAST(double_field AS DECIMAL(65,30)), double_field), 31) as float_p31 from my_table where id IN (val1,val2,val3);
    | @@version |
    | 5.1.61-log |
    1 row in set (0.00 sec)

    | float_p30 | float_p31 |
    | 0.000075488799999999981024841056 | 7.54888e-05 |
    | 0.000090369599999999989390145338 | 9.03696e-05 |
    | 0.000059410600000000003402360338 | 5.94106e-05 |
    3 rows in set (0.00 sec)

    This kind of truncation also happens on MariaDB, but in a different way - it truncates to 10 decimal places.

    MariaDB [addons_allizom_org]> SELECT ROUND(IF(movers BETWEEN -1 AND 1, CAST(movers AS DECIMAL(65,30)), movers), 30) as float_p30,ROUND(IF(movers BETWEEN -1 AND 1, CAST(movers AS DECIMAL(65,30)), movers), 31) as float_p31 from addons_allizom_org.personas where id in (4313,33619,66706);
    | float_p30 | float_p31 |
    | 0.000075488800000000000000000000 | 0.0000754888 |
    | 0.000090369599999999990000000000 | 0.0000903696 |
    | 0.000059410600000000000000000000 | 0.0000594106 |
    3 rows in set (0.00 sec)

    MariaDB truncates without any warnings, which is odd.

  • Currently testing a migration to 5.6. Ran a checksum and was smacked in the face by a wall of changes.

    After doing dumps+dumps of random tables, it looks like it's this issue.

    @shlomi: thanks for a suitable work around.

  • Anonymous

    This bug still seems to exist 1 year later ...

Powered by Wordpress and MySQL. Theme by