REPLACE INTO: think twice

The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.

Sounds very attractive, and has a nice syntax as well: the same syntax as a normal INSERT INTO’s. It certainly has a nicer syntax than INSERT INTO … ON DUPLICATE KEY UPDATE, and it’s certainly shorter than using a SELECT to see if a row exists, then doing either INSERT or UPDATE.

But weak hearted people as myself should be aware of the following: it is a heavyweight solution. It may be just what you were looking for in terms of ease of use, but the fact is that on duplicate keys, a DELETE and INSERT are performed, and this calls for a closer look.

Whenever a row is deleted, all indexes need to be updated, and most importantly the PRIMARY KEY. When a new row is inserted, the same happens. Especially on InnoDB tables (because of their clustered nature), this means much overhead. The restructuring of an index is an expensive operation. Index nodes may need to be merged upon DELETE. Nodes may need to be split due to INSERT. After many REPLACE INTO executions, it is most probable that your index is more fragmented than it would have been, had you used SELECT/UPDATE or INSERT INTO … ON DUPLICATE KEY

Also, there’s the notion of “well, if the row isn’t there, we create it. If it’s there, it simply get’s updated”. This is false. The row doesn’t just get updated, it is completely removed. The problem is, if there’s a PRIMARY KEY on that table, and the REPLACE INTO does not specify a value for the PRIMARY KEY (for example, it’s an AUTO_INCREMENT column), the new row gets a different value, and this may not be what you were looking for in terms of behavior.

Many uses of REPLACE INTO have no intention of changing PRIMARY KEY (or other UNIQUE KEY) values. In that case, it’s better left alone. On a production system I’ve seen, changing REPLACE INTO to INSERT INTO … ON DPLICATE KEY resulted in a ten fold more throughput (measured in queries per second) and a drastic decrease in IO operations and in load average.

26 thoughts on “REPLACE INTO: think twice

  1. Two things…

    REPLACE is completely different from the INSERT … ON DUPLICATE KEY UPDATE … because replace removes and replaces rows complete whereas the no dup key update construct allows you to update values in the existing row.

    If I remember correctly, MySQL will actually re-use the index slot, if possible, so it’s not as inefficient as you might think. In any case, the alternative equivalent to REPLACE is DELETE+INSERT, which is definitely no faster and quite likely slower.

    By the way, REPLACE can actually delete more than one row, this can happen if you also have other UNIQUE constraints. Again, this can be a very useful feature, when used appropriately.

  2. Arjen, thanks,

    With regard to re-using the index slot, consider the following:

    DROP TABLE IF EXISTS `person`;
    CREATE TABLE `person` (
    `person_id` int(11) NOT NULL auto_increment,
    `name` varchar(60) character set utf8 NOT NULL,
    `social_security_number` bigint(20) NOT NULL,
    PRIMARY KEY (`person_id`),
    UNIQUE KEY `social_security_number` (`social_security_number`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

    TRUNCATE TABLE person;
    INSERT INTO person (name, social_security_number) VALUES (‘Alice’, 123456);
    INSERT INTO person (name, social_security_number) VALUES (‘Bob’, 787878);
    REPLACE INTO person (name, social_security_number) VALUES (‘Bob’, 787878);

    Both on InnoDB and on MyISAM, the results are:

    +-----------+-------+------------------------+
    | person_id | name | social_security_number |
    +-----------+-------+------------------------+
    | 1 | Alice | 123456 |
    | 3 | Bob | 787878 |
    +-----------+-------+------------------------+

    If the primary key value changes, I don’t think the slot can be reused. What do you think?

    With regard to having more than one row removed – nice! I’ve never encountered such a situation. But still it seems to me more a hack. I agree that it must be used appropriately.

  3. This has bitten me a couple times, and usually what I really want is the ON DUPLICATE KEY, not the REPLACE INTO. Not for space reusage but for the row with the unique key I wanted to update if it existed.

  4. I would add that the ON DUPLICATE KEY performs the operation much faster then a REPLACE INTO. It’s probably obvious that this true, but it can cut the processing time more then half if you don’t have that many updates to run.

    On just a 256 row database, my page execution times went from 1.2 seconds to .43 seconds. This was doing a REPLACE into on all the rows (running the same sql twice). By updating it to do the ON DUPLICATE, the same script is much faster.

    The database is running some triggers and stored procedures to update a random row scheme. By not having the DELETE/INSERT these triggers are never called. YMMV on the actual speed increase.

    It takes a bit more planning with PHP scripts to account for the unique keys and not include them in the ON DUPLICATE portion, but it’s worth it.

  5. Mike – exactly my point. I have the (unconfirmed) feeling most people are just using REPLACE INTO to do an “insert if not exists or update if exists”, in which case it is not the optimal solution.

    Steve – thanks for sharing

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.