REPLACE INTO: think twice

The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.

Sounds very attractive, and has a nice syntax as well: the same syntax as a normal INSERT INTO’s. It certainly has a nicer syntax than INSERT INTO … ON DUPLICATE KEY UPDATE, and it’s certainly shorter than using a SELECT to see if a row exists, then doing either INSERT or UPDATE.

But weak hearted people as myself should be aware of the following: it is a heavyweight solution. It may be just what you were looking for in terms of ease of use, but the fact is that on duplicate keys, a DELETE and INSERT are performed, and this calls for a closer look.

Whenever a row is deleted, all indexes need to be updated, and most importantly the PRIMARY KEY. When a new row is inserted, the same happens. Especially on InnoDB tables (because of their clustered nature), this means much overhead. The restructuring of an index is an expensive operation. Index nodes may need to be merged upon DELETE. Nodes may need to be split due to INSERT. After many REPLACE INTO executions, it is most probable that your index is more fragmented than it would have been, had you used SELECT/UPDATE or INSERT INTO … ON DUPLICATE KEY

Also, there’s the notion of “well, if the row isn’t there, we create it. If it’s there, it simply get’s updated”. This is false. The row doesn’t just get updated, it is completely removed. The problem is, if there’s a PRIMARY KEY on that table, and the REPLACE INTO does not specify a value for the PRIMARY KEY (for example, it’s an AUTO_INCREMENT column), the new row gets a different value, and this may not be what you were looking for in terms of behavior.

Many uses of REPLACE INTO have no intention of changing PRIMARY KEY (or other UNIQUE KEY) values. In that case, it’s better left alone. On a production system I’ve seen, changing REPLACE INTO to INSERT INTO … ON DPLICATE KEY resulted in a ten fold more throughput (measured in queries per second) and a drastic decrease in IO operations and in load average.

26
Leave a Reply

avatar
25 Comment threads
1 Thread replies
0 Followers
 
Most reacted comment
Hottest comment thread
18 Comment authors
Shlomi NoachSacTiwThat Oz Guyparminder SinghSunset Recent comment authors

This site uses Akismet to reduce spam. Learn how your comment data is processed.

  Subscribe  
Notify of
Arjen Lentz
Guest

Two things… REPLACE is completely different from the INSERT … ON DUPLICATE KEY UPDATE … because replace removes and replaces rows complete whereas the no dup key update construct allows you to update values in the existing row. If I remember correctly, MySQL will actually re-use the index slot, if possible, so it’s not as inefficient as you might think. In any case, the alternative equivalent to REPLACE is DELETE+INSERT, which is definitely no faster and quite likely slower. By the way, REPLACE can actually delete more than one row, this can happen if you also have other UNIQUE constraints.… Read more »

mike
Guest

This has bitten me a couple times, and usually what I really want is the ON DUPLICATE KEY, not the REPLACE INTO. Not for space reusage but for the row with the unique key I wanted to update if it existed.

Steve W
Guest
Steve W

I would add that the ON DUPLICATE KEY performs the operation much faster then a REPLACE INTO. It’s probably obvious that this true, but it can cut the processing time more then half if you don’t have that many updates to run. On just a 256 row database, my page execution times went from 1.2 seconds to .43 seconds. This was doing a REPLACE into on all the rows (running the same sql twice). By updating it to do the ON DUPLICATE, the same script is much faster. The database is running some triggers and stored procedures to update a… Read more »

Brian
Guest
Brian

It is worth noting, however, that performing an INSERT/ON DUPLICATE KEY UPDATE, because of its dual nature, will not give predictable results when requesting the “last insert ID” from the DB engine. The last insert ID is only updated when the engine performs an update.

saviola
Guest

The two statements have different impact for INNODb table.
The REPLACE INTO acts as DELETE/INSERT for duplicates.
The INSERT ON DUPLIACTE UPDATE is true update.

If you have a child table defined with “on delete CASCADE”, the REPLACE INTO will delete the child record too.

I will use INSERT ON DUPLICATE rather than the REPLACE INTO for the above reason.

ricardo
Guest
ricardo

i think you’re missing the point of the replace query. It is not meant to substitute insert on duplicate key update, but it is the only solution if you want to do bulk updates coming from a subquery. insert on duplicate key update works well if you want to update a few rows at a time, not sure how u can use it to update thousands of rows w/ a single subquery.

Jeremy
Guest
Jeremy

I find the syntax for ON DUPLICATE KEY UPDATE to be quite obnoxious. I should be able to write a single SQL statement that either inserts or updates. Lets say I have an object with some properties. I want to store those properties in a table. In my application I have to write two queries: One query to INSERT the row to the table, and another query to UPDATE that same row. To make matters worse, the syntax for ‘UPDATE’ is very arcane. The INSERT INTO … ON DUPLICATE KEY UPDATE really does not help here much because I still… Read more »

Z
Guest
Z

Jeremy: You can easy reuse values in the UPDATE part from the INSERT.

INSERT INTO objects (val1, val2, val3, val4) VALUES (1, 2, 3, 4) ON DUPLICATE KEY UPDATE val1=VALUES(val1), val2=VALUES(val2), val3=VALUES(val3), val4=VALUES(val4);

http://dev.mysql.com/doc/refman/5.1/en/miscellaneous-functions.html#function_values

Some Dude
Guest
Some Dude

@Z

How is that any better? It’s still a lot of extra typing.

Jeremy
Guest
Jeremy

@Z,

I know about the val1=VALUES(val1), val2=VALUES(val2), val3=VALUES(val3), val4=VALUES(val4); option, however this increases the size of the SQL dramatically. It’s not uncommon for there to be tables with 100+ columns, updating all of those columns would require my application to generate SQL statements that are many times longer, which slows down the query generation, and increases the time it takes to parse the query on the other end. It’s not necessarily extra ‘typing’, but it’s just inefficient all the way around, especially when you need to update many rows at a time.

tandu
Guest

Nice article. REPLACE INTO is dangerous, as it can actually cause you to lose data unpredictably. This has happened to me before. Didn’t know about INSERT INTO .. ON DUPLICATE KEY UPDATE before this article. Seems okay, but I don’t know. I like to update and then do an insert if no rows were affected. Silly, but not dangerous.

Renowned Media
Guest

I vote for MySQL implementing this UPSERT command… There are a lot of things in MySQL which don’t follow the ANSI SQL standard!

Tony
Guest
Tony

Newly onboard as the lead developer for my company, I have already run into this issue BIGTIME. The previous developer(s) were quite fond of the “REPLACE INTO” syntax; the company’s web app was littered with it. Many problems plagued the system, the most common was the “random” loss of data. As it turned out, the database has been in use for many years; some tables holding more than 1M records, as well as keyed with composites. Inevitably, someone would complain to the mother company’s I.T. dept about the app being slow. Mother company’s I.T. would just simply restart the MySQL… Read more »

rsandwick3
Guest
rsandwick3

One of the most useful features of ON DUPLICATE KEY UPDATE is that not all columns need to be specified in the UPDATE clause. This allows adding new rows which either are fully qualified or use some or all column defaults while only modifying the desired subset of columns when the key is already present. REPLACE INTO fails in this respect — once the row is deleted, only the values in the new definition will be populated, so you’d better have somehow managed to define everything you don’t want unset/defaulted for existing rows, in which case you’re basically reinserting the… Read more »

Sunset
Guest

Thanks guys, yesterday I spent 3 hours trying to get the best method of INSERT INTO/REPLACE INTO working for my site tweet “approver” where we scan for tweets and manually approve the good ones that can be displayed on our site. I eventually settled for a bit of a mix of on duplicate col=VALUES(col) and col2=col2 as before we save the tweets we do some preg_replace stuff on it so no need to update it with the raw tweet again: $sql = “INSERT INTO tweets (id, tweet, date, who, approved) VALUES (\”$twitter_tweet_id\”, \”{$this->CI->db->escape($tweet_val->title)}\”, \”{$this->CI->db->escape($tweet_val->pubDate)}\”, \”{$this->CI->db->escape($tweet_val->author)}\”, ‘no’) ON DUPLICATE KEY UPDATE… Read more »

trackback

[…] had been using mysql’s replace into syntax until I read this article https://shlomi-noach.github.io/blog/mysql/replace-into-think-twice which makes a lot of […]

parminder Singh
Guest
parminder Singh

REPLACE INTO is very good i often use this But the problem is that if you have used foreign key constraints On delete cascade than this will be big headache for you , actually i have already faced this problem , That’s why i sharing with you guys, coz may be it will be helpful for some body

That Oz Guy
Guest
That Oz Guy

I’ve just hit this too – I was getting over 100ms to execute the REPLACE INTO and using a stored procedure with the obvious “SELECT IFNULL(pk) INTO… IF found THEN UPDATE … ELSE INSERT …” was more than 10x faster. If you do use it and speed matters, measure it.

SacTiw
Guest
SacTiw

Quick question:
REPLACE INTO will be an atomic operation right? That is, for a row “delete and insert” will executed as a part of a transaction, right?

Shlomi Noach
Guest
Shlomi Noach

@SacTiw correct.