{"id":7663,"date":"2017-01-30T11:14:33","date_gmt":"2017-01-30T09:14:33","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=7663"},"modified":"2017-01-30T11:16:18","modified_gmt":"2017-01-30T09:16:18","slug":"some-observations-on-mysql-to-sqlite-migration-compatibility","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/some-observations-on-mysql-to-sqlite-migration-compatibility","title":{"rendered":"Some observations on MySQL to sqlite migration &#038; compatibility"},"content":{"rendered":"<p>I&#8217;m experimenting with <a href=\"https:\/\/www.sqlite.org\/\">sqlite<\/a> as backend database for <a href=\"https:\/\/github.com\/github\/orchestrator\">orchestrator<\/a>. While <code>orchestrator<\/code>\u00a0manages MySQL replication topologies, it also uses MySQL as backend. For some deployments, and I&#8217;m looking into such one, having MySQL as backend is a considerable overhead.<\/p>\n<p>This sent me to the route of looking into a self contained <code>orchestrator<\/code>\u00a0binary + backend DB. I would have <code>orchestrator<\/code>\u00a0spawn up its own\u00a0backend database instead of connecting to an external one.<\/p>\n<h3>Why even relational?<\/h3>\n<p>Can&#8217;t <code>orchestrator<\/code>\u00a0just use a key-value backend?<\/p>\n<p>Maybe it could. But frankly I enjoy the power of relational databases, and the versatility they offer has proven itself multiple times with <code>orchestrator<\/code>, being able to answer interesting, new, complex questions about one&#8217;s topology by crafting SQL queries.<\/p>\n<p>Moreover, <code>orchestrator<\/code>\u00a0is already heavily invested in the relational model. At this time, replacing all SQL queries with key-value reads seems to me as a significant investment in time and risk. So I was looking for a relational, SQL accessible embeddable database for <code>orchestrator<\/code>.<\/p>\n<h3>Why sqlite?<\/h3>\n<p>I\u00a0am in particular looking at two options: sqlite (via the <a href=\"https:\/\/github.com\/mattn\/go-sqlite3\">go-sqlite3<\/a> binding) and <a href=\"https:\/\/github.com\/pingcap\/tidb\">TiDB<\/a>. sqlite does not need much introduction, and I&#8217;ll just say it&#8217;s embeddable within the golang-built binary.<!--more--><\/p>\n<p>TiDB is a pure-Go, MySQL dialect compatible server which provides relational model on top of key-value store. The store could be local or distributed, and the TiDB project cleverly separates involved layers.<\/p>\n<p>Of the two, sqlite is mature, alas uses a different SQL dialect and has different behavior. TiDB&#8217;s compatibility with MySQL is an impressive feat, but still ongoing.<\/p>\n<p>Both require adaptations of SQL code. Here are some observations on adaptations required when moving an existing app from MySQL backend to sqlite backend.<\/p>\n<h3>Differences<\/h3>\n<p>Just to answer an obvious question:\u00a0can&#8217;t everything be abstracted away by an ORM that speaks both dialects?<\/p>\n<p>I don&#8217;t think so. I always exploit SQL beyond the standard <code>insert<\/code>\/<code>delete<\/code>\/<code>update<\/code>\/<code>select<\/code>, exploits that ORMs just don&#8217;t support.<\/p>\n<p>Here&#8217;s an incomplete list of differences I found. Some purely syntactical, some semantical, some behavioral, and some operational.<\/p>\n<ul>\n<li>Data types: no <code>CHARACTER SET<\/code>\u00a0clause<\/li>\n<li>Data types: you can&#8217;t associate <code>UNSIGNED<\/code>\u00a0to any int type<\/li>\n<li>Data types: no <code>enum<\/code>. However there&#8217;s an alternative in the form of:<br \/>\n<code>race text check (race in ('human', 'dog', 'alien'))<\/code><\/li>\n<li><code>auto_increment<\/code>\u00a0is called <code>integer<\/code><\/li>\n<li>Data types: timestamps are not a thing. There&#8217;s no timezone info.<\/li>\n<li><code>TIMESTAMP<\/code>\u00a0has no <code>ON UPDATE<\/code>\u00a0clause.<\/li>\n<li>No <code>after<\/code>\u00a0clause for adding columns<\/li>\n<li>Indexes are not part of table creation. Only <code>PRIMARY KEY<\/code>\u00a0is. The rest of indexes are created via <code>CREATE INDEX<\/code>\u00a0statement<\/li>\n<li>Indexes have unique names across the schema. This is unfortunate, since it forces me to use longer names for indexes so as to differentiate them. For example,\u00a0in MySQL I can have an index named <code>name_idx<\/code>\u00a0in two different tables; in <code>sqlite<\/code>\u00a0I append table name for &#8220;namespacing&#8221;<\/li>\n<li>Temporal values and functions: poor support for time arithmetic.\n<ul>\n<li>Getting the diff between two datetimes is non-trivial\u00a0(what&#8217;s the diff in months for a leap year?)<br \/>\n<code><\/code><\/li>\n<li><code>INTERVAL<\/code>\u00a0keyword not respected. Appending\/subtracting dates can be done via:<br \/>\n<code>datetime('now', '-3 hour')<\/code><br \/>\nCan you see the problem in the above? What is the number <code>3<\/code>\u00a0is a positional argument? In MySQL I would use <code>NOW() - INTERVAL ? HOUR<\/code>. To make positional arguments work in <code>sqlite<\/code>, the above gets to be <code>datetime('now', printf('-%d hour', ?))<\/code>. How would you even translate <code>NOW() - INTERVAL (? * 2) SECOND<\/code>?<\/li>\n<li><code>UNIX_TIMESTAMP<\/code>\u00a0not respected. Instead using <code>strftime('%s', 'now')<\/code>, I dislike the use of string functions to generate times.<\/li>\n<\/ul>\n<\/li>\n<li><code>insert ignore<\/code>\u00a0turns to <code>insert or ignore<\/code><\/li>\n<li><code>replace into<\/code>\u00a0remains <code>replace into<\/code>. Yay!<\/li>\n<li><code>insert on duplicate key update<\/code>\u00a0has no equivalent. It&#8217;s worthwhile noting a <code>replace into<\/code>\u00a0does not <em>replace<\/em> (pun intended) an <code>insert ... on duplicate key<\/code>\u00a0as the latter can choose to only update a subset of column in the event of a constraint violation. It&#8217;s really very powerful.<\/li>\n<li><code>IS TRUE<\/code>\u00a0and <code>IS FALSE<\/code>\u00a0are not respected.<\/li>\n<li><code>ALTER TABLE<\/code>:\n<ul>\n<li>When adding a <code>not null<\/code> column one must specify the default value (e.g. <code>0<\/code>\u00a0for an <code>int<\/code>)<\/li>\n<li>You\u00a0cannot add a timestamp column that defaults to <code>CURRENT_TIMESTAMP<\/code>. You can have such column in your <code>CREATE TABLE<\/code>\u00a0definition, but you cannot add such a column. The reason being that <code>CURRENT_TIMESTAMP<\/code>\u00a0is not a constant value. When adding a column to a table, <code>sqlite<\/code>\u00a0does not actually apply the change to all existing rows, but only to newly created ones. It therefore does not know what value to provide to those older rows.<\/li>\n<li>You cannot <code>DROP COLUMN<\/code>. I&#8217;ll say it again. You cannot <code>DROP COLUMN<\/code><\/li>\n<li>You cannot modify a <code>PRIMARY KEY<\/code><\/li>\n<li>You cannot rename a column or change its datatype.<\/li>\n<li>In fact, the only supported <code>ALTER TABLE<\/code>\u00a0statements are\u00a0<code>ADD COLUMN\u00a0<\/code>and <code>RENAME<\/code>\u00a0(renaming the table itself)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Regarding the <code>ALTER TABLE<\/code>\u00a0limitations, the advice for dropping\/changing\/renaming columns or changing the primary key is to &#8220;create a new table with the new format, copy the rows, drop the old table, rename&#8221;. This is certainly feasible, but requires a substantial overhead from the user. And it&#8217;s non atomic. It requires a change in the state of mind but also a change in state of operations, the latter being non-trivial when moving from one DBMS to another, or when wishing to support both.<\/p>\n<p>I&#8217;m still looking into this, and trying to work my way around differences with cheap regular expressions for as much as possible. I&#8217;m\u00a0mainly interested right now in finding all semantic\/logic differences that would require application changes. So far the <code>TIMESTAMP<\/code>\u00a0behavior is such, and so is the <code>INSERT ... ON DUPLICATE KEY<\/code>\u00a0statement.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m experimenting with sqlite as backend database for orchestrator. While orchestrator\u00a0manages MySQL replication topologies, it also uses MySQL as backend. For some deployments, and I&#8217;m looking into such one, having MySQL as backend is a considerable overhead. This sent me to the route of looking into a self contained orchestrator\u00a0binary + backend DB. I would [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[21,126],"class_list":["post-7663","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-sql","tag-sqlite"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1ZB","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7663","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=7663"}],"version-history":[{"count":6,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7663\/revisions"}],"predecessor-version":[{"id":7669,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/7663\/revisions\/7669"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=7663"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=7663"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=7663"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}