Re: MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation

Reading Sheeri’s MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation, I was baffled at this change of floating point number notation.

However, I was also concerned about the final action taken: using “–ignore-columns” to avoid comparing the FLOAT/DOUBLE types.

The –float-precision option for pt-table-checksum currently only uses ROUND() so as to disregard minor rounding issues. But it can very easily extend to handle the difference in floating point notation. Consider again the problem:

mysql> create table tf(f float);
Query OK, 0 rows affected (0.11 sec)

mysql> insert into tf values(0.0000958084);
Query OK, 1 row affected (0.04 sec)

mysql-5.1> select * from tf;
+-------------+
| f           |
+-------------+
| 9.58084e-05 |
+-------------+

mysql-5.5> select * from tf;
+--------------+
| f            |
+--------------+
| 0.0000958084 |
+--------------+

How can we normalize the notation?

Easily: CAST it as DECIMAL. Consider:

mysql-5.1> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
+-------------+--------------+
| f           | fn           |
+-------------+--------------+
| 9.58084e-05 | 0.0000958084 |
+-------------+--------------+

mysql-5.5> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
+--------------+--------------+
| f            | fn           |
+--------------+--------------+
| 0.0000958084 | 0.0000958084 |
+--------------+--------------+

The normalization works well in both cases; also, taking care to only normalize values in the range [-1, 1].

The change in pt-table-checksum? One line of code:

         elsif ( $float_precision && $type =~ m/float|double/ ) {
            $result = "ROUND($result, $float_precision)";
         }

Turns to

         elsif ( $float_precision && $type =~ m/float|double/ ) {
            $result = "ROUND(IF($result BETWEEN -1 AND 1, CAST($result AS DECIMAL(65,30)), $result), $float_precision)";
         }

I’ve just submitted a blueprint, but I would think changing one line of code shouldn’t wait till next release of Percona Toolkit: one can edit in-place their /usr/bin/pt-table-checksum and proceed to validate their database integrity.

10 thoughts on “Re: MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation

  1. FLOAT goes down to about 1e-38, so DECIMAL(x, 30) can lose precision, even lose _all_ precision. DOUBLE is much further.

    What is needed is to FORMAT to some number of _significant_ digits, not some number of _decimal places_.

    If the numbers are different on the two machines, they should be flagged as different. Converting to strings, then comparing the strings is the source of the problem.

    Unfortunately, there is not a CAST AS BINARY to produce the internal representation. (A trick like that works to bypass character set issues.) Probably 99.99% of MySQL installations use hardware that uses IEEE 754, so I suggest that it is not reasonable to argue about differing representations.

  2. Note that depending on float precision, this still might not work. I tried to use your workaround and it still showed differences, and I think it’s because we’re not setting the float precision (and by default there is ‘none’).

    I found that with CAST, when using a float precision of 30 or less on a double field, it works, but not 31 or more:

    mysql> select @@version; SELECT
    ROUND(IF(double_field BETWEEN -1 AND 1, CAST(double_field AS DECIMAL(65,30)), double_field), 30) as float_p30,
    ROUND(IF(double_field BETWEEN -1 AND 1, CAST(double_field AS DECIMAL(65,30)), double_field), 31) as float_p31 from my_table where id IN (val1,val2,val3);
    +————+
    | @@version |
    +————+
    | 5.1.61-log |
    +————+
    1 row in set (0.00 sec)

    +———————————-+————-+
    | float_p30 | float_p31 |
    +———————————-+————-+
    | 0.000075488799999999981024841056 | 7.54888e-05 |
    | 0.000090369599999999989390145338 | 9.03696e-05 |
    | 0.000059410600000000003402360338 | 5.94106e-05 |
    +———————————-+————-+
    3 rows in set (0.00 sec)

    This kind of truncation also happens on MariaDB, but in a different way – it truncates to 10 decimal places.

    MariaDB [addons_allizom_org]> SELECT ROUND(IF(movers BETWEEN -1 AND 1, CAST(movers AS DECIMAL(65,30)), movers), 30) as float_p30,ROUND(IF(movers BETWEEN -1 AND 1, CAST(movers AS DECIMAL(65,30)), movers), 31) as float_p31 from addons_allizom_org.personas where id in (4313,33619,66706);
    +———————————-+————–+
    | float_p30 | float_p31 |
    +———————————-+————–+
    | 0.000075488800000000000000000000 | 0.0000754888 |
    | 0.000090369599999999990000000000 | 0.0000903696 |
    | 0.000059410600000000000000000000 | 0.0000594106 |
    +———————————-+————–+
    3 rows in set (0.00 sec)

    MariaDB truncates without any warnings, which is odd.

  3. Currently testing a migration to 5.6. Ran a checksum and was smacked in the face by a wall of changes.

    After doing dumps+dumps of random tables, it looks like it’s this issue.

    @shlomi: thanks for a suitable work around.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.