Re: MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation

Reading Sheeri’s MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation, I was baffled at this change of floating point number notation.

However, I was also concerned about the final action taken: using “–ignore-columns” to avoid comparing the FLOAT/DOUBLE types.

The –float-precision option for pt-table-checksum currently only uses ROUND() so as to disregard minor rounding issues. But it can very easily extend to handle the difference in floating point notation. Consider again the problem:

mysql> create table tf(f float);
Query OK, 0 rows affected (0.11 sec)

mysql> insert into tf values(0.0000958084);
Query OK, 1 row affected (0.04 sec)

mysql-5.1> select * from tf;
+-------------+
| f           |
+-------------+
| 9.58084e-05 |
+-------------+

mysql-5.5> select * from tf;
+--------------+
| f            |
+--------------+
| 0.0000958084 |
+--------------+

How can we normalize the notation?

Easily: CAST it as DECIMAL. Consider:

mysql-5.1> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
+-------------+--------------+
| f           | fn           |
+-------------+--------------+
| 9.58084e-05 | 0.0000958084 |
+-------------+--------------+

mysql-5.5> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
+--------------+--------------+
| f            | fn           |
+--------------+--------------+
| 0.0000958084 | 0.0000958084 |
+--------------+--------------+

The normalization works well in both cases; also, taking care to only normalize values in the range [-1, 1].

The change in pt-table-checksum? One line of code:

         elsif ( $float_precision && $type =~ m/float|double/ ) {
            $result = "ROUND($result, $float_precision)";
         }

Turns to

         elsif ( $float_precision && $type =~ m/float|double/ ) {
            $result = "ROUND(IF($result BETWEEN -1 AND 1, CAST($result AS DECIMAL(65,30)), $result), $float_precision)";
         }

I’ve just submitted a blueprint, but I would think changing one line of code shouldn’t wait till next release of Percona Toolkit: one can edit in-place their /usr/bin/pt-table-checksum and proceed to validate their database integrity.

10 thoughts on “Re: MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation

  1. I think it might be better to use FORMAT() instead. But that is just from my memory, I haven’t looked at the documentation to remind myself of how it works.

  2. Now if you have extremely small (atomar physics) or extremely large numbers (astronomics) a CAST to DECIMAL may truncate data I think? And in that case after truncation different values may appear as identical values.

    Right/wrong?

  3. Peter,
    Well, I don’t touch large numbers as you can see. With small numbers I’m casting with up to 30 digits after the period.

    Moreover, consider I’ve put this into the float-precision option, whose target in the first place is to do round ups… So yes, there’s round up, and it is intentional.

  4. Baron,

    FORMAT() seems to work well for small numbers (for larger numbers, which are of no interest in this problem, it adds commas to denote orders of magnitude).
    It apparently doesn’t have a limit on the number of digits after the floating point (FLOAT and DOUBLE max at 30 digits when casted to DECIMAL) — but this is just silent: when FORMATting numbers with over 30 digits, result is rounded to 30 digits no matter what.
    So in conclusion I don’t see the difference between the two.

  5. Interesting – I’d tried FORMAT() and CAST (and CONVERT) by itself, not with rounding, and nothing I’d tried there worked. I appreciate that workaround!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.