REPLACE INTO: think twice

The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.

Sounds very attractive, and has a nice syntax as well: the same syntax as a normal INSERT INTO’s. It certainly has a nicer syntax than INSERT INTO … ON DUPLICATE KEY UPDATE, and it’s certainly shorter than using a SELECT to see if a row exists, then doing either INSERT or UPDATE.

But weak hearted people as myself should be aware of the following: it is a heavyweight solution. It may be just what you were looking for in terms of ease of use, but the fact is that on duplicate keys, a DELETE and INSERT are performed, and this calls for a closer look.

Whenever a row is deleted, all indexes need to be updated, and most importantly the PRIMARY KEY. When a new row is inserted, the same happens. Especially on InnoDB tables (because of their clustered nature), this means much overhead. The restructuring of an index is an expensive operation. Index nodes may need to be merged upon DELETE. Nodes may need to be split due to INSERT. After many REPLACE INTO executions, it is most probable that your index is more fragmented than it would have been, had you used SELECT/UPDATE or INSERT INTO … ON DUPLICATE KEY

Also, there’s the notion of “well, if the row isn’t there, we create it. If it’s there, it simply get’s updated”. This is false. The row doesn’t just get updated, it is completely removed. The problem is, if there’s a PRIMARY KEY on that table, and the REPLACE INTO does not specify a value for the PRIMARY KEY (for example, it’s an AUTO_INCREMENT column), the new row gets a different value, and this may not be what you were looking for in terms of behavior.

Many uses of REPLACE INTO have no intention of changing PRIMARY KEY (or other UNIQUE KEY) values. In that case, it’s better left alone. On a production system I’ve seen, changing REPLACE INTO to INSERT INTO … ON DPLICATE KEY resulted in a ten fold more throughput (measured in queries per second) and a drastic decrease in IO operations and in load average.

26 thoughts on “REPLACE INTO: think twice

  1. It is worth noting, however, that performing an INSERT/ON DUPLICATE KEY UPDATE, because of its dual nature, will not give predictable results when requesting the “last insert ID” from the DB engine. The last insert ID is only updated when the engine performs an update.

  2. The two statements have different impact for INNODb table.
    The REPLACE INTO acts as DELETE/INSERT for duplicates.
    The INSERT ON DUPLIACTE UPDATE is true update.

    If you have a child table defined with ā€œon delete CASCADEā€, the REPLACE INTO will delete the child record too.

    I will use INSERT ON DUPLICATE rather than the REPLACE INTO for the above reason.

  3. i think you’re missing the point of the replace query. It is not meant to substitute insert on duplicate key update, but it is the only solution if you want to do bulk updates coming from a subquery. insert on duplicate key update works well if you want to update a few rows at a time, not sure how u can use it to update thousands of rows w/ a single subquery.

  4. @ricardo,

    Agreed on bulk updates.

    Nevertheless it is common to see people using REPLACE INTO when they could have used INSERT INTO … ON DUPLICATE KEY UPDATE for single rows.

  5. I find the syntax for ON DUPLICATE KEY UPDATE to be quite obnoxious. I should be able to write a single SQL statement that either inserts or updates.

    Lets say I have an object with some properties. I want to store those properties in a table. In my application I have to write two queries: One query to INSERT the row to the table, and another query to UPDATE that same row. To make matters worse, the syntax for ‘UPDATE’ is very arcane.

    The INSERT INTO … ON DUPLICATE KEY UPDATE really does not help here much because I still have to *write* both the INSERT statement and the UPDATE statement even though they are in the same query now, for example:

    INSERT INTO objects (val1, val2, val3, val4) VALUES (1, 2, 3, 4) ON DUPLICATE KEY UPDATE val1=1, val2=2, val3=3, val4=4;

    I should not *need* to specify val1=1, val2=2, val3=3, val4=4 at the end of the query. I want to update ALL of the columns, why isn’t there an option for that?

    REPLACE INTO also does not help me? Why? People seem to overlook the fact that a “relational database” is *supposed* to have foreign keys and the restraints that go with them! If you use REPLACE INTO, it first DELETES the existing record (and all child records), and then re-inserts a new copy of the record. How useless is that? At least give me an option to disable the foreign key constraints when using REPLACE, or give me a usable ON DUPLICATE KEY UPDATE syntax.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.