Some observations on MySQL to sqlite migration & compatibility

January 30, 2017

I'm experimenting with sqlite as backend database for orchestrator. While orchestrator manages MySQL replication topologies, it also uses MySQL as backend. For some deployments, and I'm looking into such one, having MySQL as backend is a considerable overhead.

This sent me to the route of looking into a self contained orchestrator binary + backend DB. I would have orchestrator spawn up its own backend database instead of connecting to an external one.

Why even relational?

Can't orchestrator just use a key-value backend?

Maybe it could. But frankly I enjoy the power of relational databases, and the versatility they offer has proven itself multiple times with orchestrator, being able to answer interesting, new, complex questions about one's topology by crafting SQL queries.

Moreover, orchestrator is already heavily invested in the relational model. At this time, replacing all SQL queries with key-value reads seems to me as a significant investment in time and risk. So I was looking for a relational, SQL accessible embeddable database for orchestrator.

Why sqlite?

I am in particular looking at two options: sqlite (via the go-sqlite3 binding) and TiDB. sqlite does not need much introduction, and I'll just say it's embeddable within the golang-built binary.

TiDB is a pure-Go, MySQL dialect compatible server which provides relational model on top of key-value store. The store could be local or distributed, and the TiDB project cleverly separates involved layers.

Of the two, sqlite is mature, alas uses a different SQL dialect and has different behavior. TiDB's compatibility with MySQL is an impressive feat, but still ongoing.

Both require adaptations of SQL code. Here are some observations on adaptations required when moving an existing app from MySQL backend to sqlite backend.

Differences

Just to answer an obvious question: can't everything be abstracted away by an ORM that speaks both dialects?

I don't think so. I always exploit SQL beyond the standard insert/delete/update/select, exploits that ORMs just don't support.

Here's an incomplete list of differences I found. Some purely syntactical, some semantical, some behavioral, and some operational.

  • Data types: no CHARACTER SET clause
  • Data types: you can't associate UNSIGNED to any int type
  • Data types: no enum. However there's an alternative in the form of:
    race text check (race in ('human', 'dog', 'alien'))
  • auto_increment is called integer
  • Data types: timestamps are not a thing. There's no timezone info.
  • TIMESTAMP has no ON UPDATE clause.
  • No after clause for adding columns
  • Indexes are not part of table creation. Only PRIMARY KEY is. The rest of indexes are created via CREATE INDEX statement
  • Indexes have unique names across the schema. This is unfortunate, since it forces me to use longer names for indexes so as to differentiate them. For example, in MySQL I can have an index named name_idx in two different tables; in sqlite I append table name for "namespacing"
  • Temporal values and functions: poor support for time arithmetic.
    • Getting the diff between two datetimes is non-trivial (what's the diff in months for a leap year?)
    • INTERVAL keyword not respected. Appending/subtracting dates can be done via:
      datetime('now', '-3 hour')
      Can you see the problem in the above? What is the number 3 is a positional argument? In MySQL I would use NOW() - INTERVAL ? HOUR. To make positional arguments work in sqlite, the above gets to be datetime('now', printf('-%d hour', ?)). How would you even translate NOW() - INTERVAL (? * 2) SECOND?
    • UNIX_TIMESTAMP not respected. Instead using strftime('%s', 'now'), I dislike the use of string functions to generate times.
  • insert ignore turns to insert or ignore
  • replace into remains replace into. Yay!
  • insert on duplicate key update has no equivalent. It's worthwhile noting a replace into does not replace (pun intended) an insert ... on duplicate key as the latter can choose to only update a subset of column in the event of a constraint violation. It's really very powerful.
  • IS TRUE and IS FALSE are not respected.
  • ALTER TABLE:
    • When adding a not null column one must specify the default value (e.g. 0 for an int)
    • You cannot add a timestamp column that defaults to CURRENT_TIMESTAMP. You can have such column in your CREATE TABLE definition, but you cannot add such a column. The reason being that CURRENT_TIMESTAMP is not a constant value. When adding a column to a table, sqlite does not actually apply the change to all existing rows, but only to newly created ones. It therefore does not know what value to provide to those older rows.
    • You cannot DROP COLUMN. I'll say it again. You cannot DROP COLUMN
    • You cannot modify a PRIMARY KEY
    • You cannot rename a column or change its datatype.
    • In fact, the only supported ALTER TABLE statements are ADD COLUMN and RENAME (renaming the table itself)

Regarding the ALTER TABLE limitations, the advice for dropping/changing/renaming columns or changing the primary key is to "create a new table with the new format, copy the rows, drop the old table, rename". This is certainly feasible, but requires a substantial overhead from the user. And it's non atomic. It requires a change in the state of mind but also a change in state of operations, the latter being non-trivial when moving from one DBMS to another, or when wishing to support both.

I'm still looking into this, and trying to work my way around differences with cheap regular expressions for as much as possible. I'm mainly interested right now in finding all semantic/logic differences that would require application changes. So far the TIMESTAMP behavior is such, and so is the INSERT ... ON DUPLICATE KEY statement.

  • Jouni Järvinen

    SQLite is a pretty good option to be compatible in the direction of MySQL, but if the SQLite db is heavily used the performance just won't cut it. Firefox with tons of data in its DBs is a good example on slow db. LibreOffice changes to Firebird 3.0.0.

  • Shlomi Noach

    Yes, definitely, my use case will not be heavyweight and will not expect high performance. If it grows such that sqlite can't hold, that is perfect justification to switch back to MySQL.

  • Simon Mudd

    Not heavyweight depends. For people in small environments then having a single app+db is great. There's no disagreement there, but if you want orchestrator to be HA because you don't want to have your db servers going down then that's not so good. So perhaps it's misleading to lead people to a "this app fixes your problem mentality" as they may think: just install orchestrator and my MySQL failover problems are solved. I don't think it's quite as simple as that. I am pending writing a blog on setting up orchestrator in a HA environment and obviously that's a lot more complex and maybe only makes sense if orchestrator is monitoring a number of servers.

    If I were to do what you want I guess I would keep state in memory with a very basic setup on disk of the boxes to monitor (not their state) and if I needed HA I would exchange "config" and/or state with neighbours etc. Basically like networking daemons exchange routing information. The idea sounds simple, but having seen and knowing the current code that's a major change in the way things work and quite a risky undertaking though maybe it would allow "instant" recognition of failure scenarios and therefore be "better" than now. Most people don't really appreciate the complexity of what orchestrator does, the situations it handles and in practice a new in-memory based orchestrator server would also be pretty complex. It might be an interesting path to take but few people probably have time for such an adventure.

  • Shlomi Noach

    I'll be looking forward to read your blog post, Simon!
    These are early stages, and I'm still looking into the diffs between mysql/sqlite, and clearly expect to be looking more into the app behavioral changes.

    While I'm looking at this for an actual use case, I should note that orchestrator will continue to support mysql as backend (and I/we will continue to use it like that...).

    You're right about the architectural changes that would be required of orchestrator. We discussed some of those in the past: sharing work between orchestrator nodes; taking leadership; getting quorum agreement for failover; operation locks.

    I do believe with proper layering most of these changes would not have to complicate existing code, which would mostly be ignorant of the change. But -- ongoing effort, so we'll see.

 
Powered by Wordpress and MySQL. Theme by openark.org