Had a case recently where I had to merge data from two identically structured tables containing nearly identical data.
“Nearly identical” meaning most table data is identical in both; sometimes a row is missing from one of the tables; sometimes same row (according to PK) appears in both, but some columns are NULL is one tables (while others can be NULL in the second).
Otherwise no contradicting data: it was not possible for some data to be “3” in one table and “4” in the other.
How do you create a merge of the tables, such that all missing rows are completed, and NULLs replaced by actual values when possible?
pt-table-sync comes to mind: one can do a bidirectional syncing of two tables, and actually stating how to resolve ambiguities (like “greater value wins”). Very powerful! An example would be:
pt-table-sync --bidirectional --conflict-column=a --conflict-comparison=greatest --tables ...
However I didn’t actually have any problem with the tables themselves. The two tables were just fine as they were; missing or NULL data does not indicate an error on their part. I wanted to get their merge. pt-table-sync is still up for the job: we can duplicate them, merge on the copy… But I prefer a query over an external script when possible.
INSERT…ON DUPLICATE KEY UPDATE
This MySQL-specific syntax is actually quite powerful. It basically says “if the insert fails due to unique constraint, you get a chance to update the row causing the failure”. But it also allows for smart setting of the column via the VALUES() clause. Let’s present some sample data and then see the solution. Continue reading » “Merging tables with INSERT…ON DUPLICATE KEY UPDATE”