Comments on: DELETE, don’t INSERT https://shlomi-noach.github.io/blog/mysql/delete-dont-insert Blog by Shlomi Noach Mon, 23 Jul 2012 13:50:15 +0000 hourly 1 https://wordpress.org/?v=5.3.3 By: Eelko de Vos https://shlomi-noach.github.io/blog/mysql/delete-dont-insert/comment-page-1#comment-109154 Mon, 23 Jul 2012 13:50:15 +0000 https://shlomi-noach.github.io/blog/?p=5008#comment-109154 Due to the fact that large tables are usually continually used by a live system, I found that lock-delays of a few seconds or more (minutes, hours) are unacceptable. Therefore I took a different approach to this problem.

I’ve created a script that:
– creates a new table MY_TABLE_NEW
– creates a merge-table MY_TABLE_MERGE combining MY_TABLE_NEW and MY_TABLE whereby inserts are done in MY_TABLE_NEW
– renames MY_TABLE into MY_TABLE_OLD and at the same time MY_TABLE_MERGE into MY_TABLE

This effectively creates a merge table with all the data, still accessible through the name MY_TABLE where all new records are moved into MY_TABLE_NEW – which is optimized of course as it is empty.

Then the script slowly starts moving records from MY_TABLE to MY_TABLE. It takes a record (or fifty), deletes them and inserts them. That effectively moves them from MY_TABLE_OLD to MY_TABLE_NEW. That slowly builds a MY_TABLE_NEW which is optimized, while the system continues to work.

The chances of the system needing a specific record while it’s being deleted and inserted are very, very small with the tables I’ve used. Of course that depends highly on the way your system processes data. But for me, it worked flawlessly.

I’ve used this script dozens of times on production systems. It’s coping with interruptions, can be run in parallel on several tables at the same time and has virtually no downsides.

Of course you cannot merge merge-tables, so when you already have merge-tables, this solution may not be for you.

If you’d like to take a look at the script, let me know in a comment below.

Cheers,
Eelko de Vos

]]>
By: Aaron Brown https://shlomi-noach.github.io/blog/mysql/delete-dont-insert/comment-page-1#comment-103710 Thu, 28 Jun 2012 01:29:36 +0000 https://shlomi-noach.github.io/blog/?p=5008#comment-103710 @Rick I agree, if you know ahead of time that you are going to purging large amounts of data regularly, using partitioned tables is an ideal solution.

]]>
By: Rick James https://shlomi-noach.github.io/blog/mysql/delete-dont-insert/comment-page-1#comment-103685 Wed, 27 Jun 2012 22:14:33 +0000 https://shlomi-noach.github.io/blog/?p=5008#comment-103685 My discussion on the problem:

http://mysql.rjweb.org/doc.php/deletebig

For time-based purging, PARTITION by “range” is an excellent solution to large DELETEs.

]]>
By: shlomi https://shlomi-noach.github.io/blog/mysql/delete-dont-insert/comment-page-1#comment-103564 Wed, 27 Jun 2012 15:56:03 +0000 https://shlomi-noach.github.io/blog/?p=5008#comment-103564 @Art + Aaron,
Nor did I mean to suggest Aaron’s solution was universally inferior.
There are always cases where the standard solution to a problem would perform worse than an unconventional ad-hoc one.
For that matter, I think Aaron’s trick is cool!

]]>
By: Aaron Brown https://shlomi-noach.github.io/blog/mysql/delete-dont-insert/comment-page-1#comment-103545 Wed, 27 Jun 2012 15:13:11 +0000 https://shlomi-noach.github.io/blog/?p=5008#comment-103545 Hi Shlomi,

Thanks for responding to my post. I agree with all the points that you have made above. I didn’t intend to imply that the INSERT technique was universally better, only that it was faster in some cases, which was an important factor to me. It certainly has a number of limitations, particularly if you have foreign key relationships or cannot stop writes to the table. My specific use was to get rid of about 50MM rows in over a dozen tables with only soft foreign key relationships (Rails application) and also have an easy rollback in the event that something broke. By doing this, you end up with a complete copy of your table which you can drop at your leisure or move back into place if something goes wrong.

@Art – I brought up the stall issue in my post along with problems caused by ext filesystems. I can’t see this technique being very effective on a live server unless you can guarantee no writes to the table.

]]>
By: Art van Scheppingen https://shlomi-noach.github.io/blog/mysql/delete-dont-insert/comment-page-1#comment-103511 Wed, 27 Jun 2012 12:59:51 +0000 https://shlomi-noach.github.io/blog/?p=5008#comment-103511 I largely agree with your statements, but we do have a corner case where the lockdown isn’t an issue: the application only writes to the tables at a set interval, so we were able to utilize Aaron’s suggested solution a couple of times. In that particular case we needed to remove 95% of all records and deleting them in a loop would have taken us days instead of minutes.

At the same I do want to add that Aaron’s suggestion also has a problem on high concurrent MySQL systems where the create/drop will certainly cause stalls in the LRU.

]]>