Re: MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation

Reading Sheeri’s “MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation”, I was baffled by this change in floating-point notation.

However, I was also concerned about the final action taken: using “--ignore-columns” to avoid comparing the FLOAT/DOUBLE types.

The --float-precision option for pt-table-checksum currently only applies ROUND(), so as to disregard minor rounding issues. But it could very easily be extended to handle the difference in floating-point notation. Consider the problem again:

mysql> create table tf(f float);
Query OK, 0 rows affected (0.11 sec)

mysql> insert into tf values(0.0000958084);
Query OK, 1 row affected (0.04 sec)

mysql-5.1> select * from tf;
+-------------+
| f           |
+-------------+
| 9.58084e-05 |
+-------------+

mysql-5.5> select * from tf;
+--------------+
| f            |
+--------------+
| 0.0000958084 |
+--------------+

How can we normalize the notation?

Easily: CAST it as DECIMAL. Consider:

mysql-5.1> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
+-------------+--------------+
| f           | fn           |
+-------------+--------------+
| 9.58084e-05 | 0.0000958084 |
+-------------+--------------+

mysql-5.5> SELECT f,ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10) as fn from tf;
+--------------+--------------+
| f            | fn           |
+--------------+--------------+
| 0.0000958084 | 0.0000958084 |
+--------------+--------------+

The normalization produces the same result on both versions. Note that only values in the range [-1, 1] are cast to DECIMAL; values outside that range are passed through unchanged.
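The idea behind the normalization can also be sketched outside the server. The snippet below is Perl, the language pt-table-checksum is written in. The helper name normalize_float and the use of sprintf are assumptions made for illustration; this is not what MySQL or the tool does internally. It only shows that two renderings of the same value stop differing once both are forced into fixed-point notation with a set number of decimal places.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Percona Toolkit): render a number in
# fixed-point notation with $places decimal places, whatever notation
# it arrived in.
sub normalize_float {
    my ($text, $places) = @_;
    return sprintf("%.${places}f", $text);
}

my $from_51 = "9.58084e-05";    # as displayed by MySQL 5.1 in the post
my $from_55 = "0.0000958084";   # as displayed by MySQL 5.5 in the post

# The raw strings differ, so anything computed over the text differs too.
print "raw strings equal: ", ($from_51 eq $from_55 ? "yes" : "no"), "\n";

# After normalization both render as 0.0000958084.
print normalize_float($from_51, 10), "\n";
print normalize_float($from_55, 10), "\n";
```

A fixed number of decimal places has a limit: a value much smaller than the chosen precision would render as all zeros.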

The change in pt-table-checksum? One line of code:

         elsif ( $float_precision && $type =~ m/float|double/ ) {
            $result = "ROUND($result, $float_precision)";
         }

becomes:

         elsif ( $float_precision && $type =~ m/float|double/ ) {
            $result = "ROUND(IF($result BETWEEN -1 AND 1, CAST($result AS DECIMAL(65,30)), $result), $float_precision)";
         }
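To see what SQL the modified branch would generate, here is a minimal stand-alone sketch. The wrapper sub float_expr and its argument list are hypothetical; in the real tool $result, $type and $float_precision come from the surrounding code, which is not shown in this post.

```perl
use strict;
use warnings;

# Hypothetical stand-alone wrapper around the patched branch.
# $result is the column expression, $type the column type, and
# $float_precision the value given to --float-precision.
sub float_expr {
    my ($result, $type, $float_precision) = @_;
    if ( $float_precision && $type =~ m/float|double/ ) {
        $result = "ROUND(IF($result BETWEEN -1 AND 1, "
                . "CAST($result AS DECIMAL(65,30)), $result), $float_precision)";
    }
    return $result;
}

# FLOAT column: wrapped in the normalizing expression.
# prints: ROUND(IF(f BETWEEN -1 AND 1, CAST(f AS DECIMAL(65,30)), f), 10)
print float_expr('f', 'float', 10), "\n";

# Non-float column: left untouched.
# prints: f
print float_expr('f', 'int', 10), "\n";
```

With a column named f and a precision of 10, the generated expression is the same one used in the SELECT statements above. Because the branch is guarded by $float_precision, nothing changes unless --float-precision is actually given.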

I’ve just submitted a blueprint, but I would think a one-line change shouldn’t have to wait until the next release of Percona Toolkit: one can edit /usr/bin/pt-table-checksum in place and proceed to validate one’s database integrity.

Baron
Guest

I think it might be better to use FORMAT() instead. But that is just from memory; I haven’t looked at the documentation to remind myself of how it works.

Peter Laursen
Guest

Now if you have extremely small numbers (atomic physics) or extremely large numbers (astronomy), a CAST to DECIMAL may truncate data, I think? And in that case, after truncation, different values may appear identical.

Right/wrong?

Sheeri
Guest

Interesting – I’d tried FORMAT() and CAST (and CONVERT) on their own, not with rounding, and nothing I’d tried there worked. I appreciate that workaround!

Rick James
Guest

FLOAT goes down to about 1e-38, so DECIMAL(x, 30) can lose precision, even lose _all_ precision. DOUBLE is much further. What is needed is to FORMAT to some number of _significant_ digits, not some number of _decimal places_. If the numbers are different on the two machines, they should be flagged as different. Converting to strings, then comparing the strings is the source of the problem. Unfortunately, there is not a CAST AS BINARY to produce the internal representation. (A trick like that works to bypass character set issues.) Probably 99.99% of MySQL installations use hardware that uses IEEE 754,…

trackback

[…] Sheeri’s MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation, Shlomi Noach was baffled at this change of floating point number […]

Sheeri
Guest

Note that depending on float precision, this still might not work. I tried to use your workaround and it still showed differences, and I think it’s because we’re not setting the float precision (and by default there is ‘none’). I found that with CAST, when using a float precision of 30 or less on a double field, it works, but not 31 or more:

mysql> select @@version; SELECT ROUND(IF(double_field BETWEEN -1 AND 1, CAST(double_field AS DECIMAL(65,30)), double_field), 30) as float_p30, ROUND(IF(double_field BETWEEN -1 AND 1, CAST(double_field AS DECIMAL(65,30)), double_field), 31) as float_p31 from my_table where id IN (val1,val2,val3);
+————+ |…

Alfie John
Guest

Currently testing a migration to 5.6. Ran a checksum and was smacked in the face by a wall of changes.

After doing dumps+dumps of random tables, it looks like it’s this issue.

@shlomi: thanks for a suitable workaround.

Anonymous
Guest

This bug still seems to exist 1 year later …