common_schema – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Tue, 27 Dec 2016 16:45:41 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 Get per-object grants via common_schema https://shlomi-noach.github.io/blog/mysql/get-per-object-grants-via-common_schema https://shlomi-noach.github.io/blog/mysql/get-per-object-grants-via-common_schema#respond Mon, 29 Sep 2014 11:30:09 +0000 https://shlomi-noach.github.io/blog/?p=7007 Did you know common_schema supports a complete breakdown of all accounts on your database server? It can provide you with the GRANT statements required to set up an account, the REVOKE statements to undo the former, and this can be broken down on a per-object & per-object-type basis. Consider the sql_grants view:

Find who has privileges on a table called t:

select * from common_schema.sql_grants where priv_level_name='table' and object_name='t'\G

           GRANTEE: 'power1'@'localhost'
              user: power1
              host: localhost
        priv_level: `mytst`.`t`
   priv_level_name: table
     object_schema: mytst
       object_name: t
current_privileges: SELECT
      IS_GRANTABLE: NO
         sql_grant: GRANT SELECT ON `mytst`.`t` TO 'power1'@'localhost'
        sql_revoke: REVOKE SELECT ON `mytst`.`t` FROM 'power1'@'localhost'
     sql_drop_user: DROP USER 'power1'@'localhost'

or, who has privileges on the test schema:

select * from common_schema.sql_grants where priv_level_name='schema' and object_name='test' \G

           GRANTEE: 'power1'@'localhost'
              user: power1
              host: localhost
        priv_level: `test`.*
   priv_level_name: schema
     object_schema: NULL
       object_name: test
current_privileges: ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, REFERENCES, SELECT, SHOW VIEW, TRIGGER, UPDATE
      IS_GRANTABLE: NO
         sql_grant: GRANT ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, REFERENCES, SELECT, SHOW VIEW, TRIGGER, UPDATE ON `test`.* TO 'power1'@'localhost'
        sql_revoke: REVOKE ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, REFERENCES, SELECT, SHOW VIEW, TRIGGER, UPDATE ON `test`.* FROM 'power1'@'localhost'
     sql_drop_user: DROP USER 'power1'@'localhost'

           GRANTEE: 'test'@'localhost'
              user: test
              host: localhost
        priv_level: `test`.*
   priv_level_name: schema
     object_schema: NULL
       object_name: test
current_privileges: ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, REFERENCES, SELECT, SHOW VIEW, TRIGGER, UPDATE
      IS_GRANTABLE: NO
         sql_grant: GRANT ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, REFERENCES, SELECT, SHOW VIEW, TRIGGER, UPDATE ON `test`.* TO 'test'@'localhost'
        sql_revoke: REVOKE ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, REFERENCES, SELECT, SHOW VIEW, TRIGGER, UPDATE ON `test`.* FROM 'test'@'localhost'
     sql_drop_user: DROP USER 'test'@'localhost'

In the same manner, you can easily SELECT for all grants that are defined per-table, per-schema, … priv_level_name: is any one of ‘user’, ‘schema’, ‘table’, ‘column’, ‘routine’.

The above is a view, which aggregates data from all relevant INFORMATION_SCHEMA tables, normalizing and de-normalizing as necessary.

You might also want to look at sql_show_grants which provides with a no-breakdown GRANT for all accounts. It’s like a SHOW ALL GRANTS which doesn’t exist, plus it’s SELECTable.

Also, thanks to eval(), the above make for executable code. Consider:

Revoke any and all specific grants on private_schema.private_table:

call common_schema.eval("select sql_revoke from sql_grants where object_schema='private_schema' and object_name='private_table'")

 

It’s been around for quite a while now. We’re using it in production extensively. Try it out!

]]>
https://shlomi-noach.github.io/blog/mysql/get-per-object-grants-via-common_schema/feed 0 7007
Speaking at Percona Live: common_schema, MySQL DevOps https://shlomi-noach.github.io/blog/mysql/speaking-at-perconalive-common_schema-mysql-devops https://shlomi-noach.github.io/blog/mysql/speaking-at-perconalive-common_schema-mysql-devops#respond Mon, 10 Mar 2014 08:27:42 +0000 https://shlomi-noach.github.io/blog/?p=6769 In less than a month I’ll be giving these two talks at Percona Live:

If you are still unfamiliar with common_schema, this will make for a good introduction. I’ll give you multiple reasons why you would want to use it, and how it would come to immediate use at your company. I do mean immediate, as in previous common_schema presentations I happened to get feedback emails from attendees within the same or next day letting me know how common_schema solved an insistent problem of theirs or how it exposed an unknown status.

I’ll review some useful views & routines, and discuss the ease and power of QueryScript. common_schema is a Swiss-knife of solutions, and all from within your MySQL server.

I am using common_schema in production on a regular basis, and it happened to be hero of the day in multiple occasions. I’ll present a couple such cases.

This is a technical talk touching at some cultural issues.

At Outbrain, where I work, we have two blessings: a large group of engineers and a large dataset. We at the infrastructure team, together with the ops team, are responsible for the availability of the data. What we really like is technology which lets the owners of a problem be able to recognize it and take care of it. We want ops guys to do ops, and engineers to do engineering. And we want them to be able to talk to each other and understand each other.

What tools can you use to increase visibility? To allow sharing of data between the teams? I’ll share some tools and techniques that allow us to automate deployments, detect a malfunctioning/abusing service, deploy schema changes across dozens of hosts, control data retention, monitor connections, and more.

We like open source. The tools discussed are mostly open source, or open sourced by Outbrain.

I’ll explain why these tools matter, and how they serve the purpose of removing friction between teams, allowing for quick analysis of problems and overall visibility on all things that happen.

Do come by!

]]>
https://shlomi-noach.github.io/blog/mysql/speaking-at-perconalive-common_schema-mysql-devops/feed 0 6769
Why delegating code to MySQL Stored Routines is poor engineering practice https://shlomi-noach.github.io/blog/mysql/why-delegating-code-to-mysql-stored-routines-is-poor-engineering-practice https://shlomi-noach.github.io/blog/mysql/why-delegating-code-to-mysql-stored-routines-is-poor-engineering-practice#comments Thu, 06 Feb 2014 08:32:17 +0000 https://shlomi-noach.github.io/blog/?p=6713 I happen to use stored routines with MySQL. In fact, my open source project common_schema heavily utilizes them. DBA-wise, I think they provide with a lot of power (alas, the ANSI:SQL 2003 syntax feels more like COBOL than a sane programming language, which is why I use QueryScript instead).

However I wish to discuss the use of stored routines as integral part of your application code, which I discourage.

The common discussion on whether to use or not use stored routines typically revolves around data transfer (with stored routines you transfer less data since it’s being processed on server side), security (with stored routines you can obfuscate/hide internal datasets, and provide with limited and expected API) and performance (with MySQL this is not what you would expect, as routines are interpreted & their queries re-evaluated, as opposed to other RDBMS you may be used to).

But I wish to discuss the use of stored routines from an engineering standpoint. The first couple of points I raise are cultural/behavioural.

2nd grade citizens

Your stored routines are not likely to integrate well with your IDE. While your Java/Scala/PHP/Ruby/whatnot code comfortably lies within your home directory, the stored routines live in their own space: a database container. They’re not as visible to you as your standard code. Your IDE is unaware of their existence and is unlikely to have the necessary plugin/state of mind to be able to view these.

This leads to difficulty in maintaining the code. People typically resort to using some SQL-oriented GUI tool such as MySQL Workbench, SequelPro or other, commercial tools. But these tools, while make it easy to edit your routine code, do not integrate (well?) with your source control. I can’t say I’ve used all GUI tools; but how many of them will have Git/SVN/Mercurial connectors? How many of them will keep local history changes once you edit a routine? I’m happy to get introduced to such a tool.

Even with such integration, you’re split between two IDEs. And if you’re the command line enthusiast, well, you can’t just svn ci -m “fixed my stored procedure bug”. Your code is simply not in your trunk directory.

It can be done. You could maintain the entire routine code from within your source tree, and hats off to all those who do it. Most will not. See later on about deployments for more on this.

Testing

While engineers are keen on writing unit tests for every class and method they create, they are less keen on doing the same for stored routines. This is an observation, having seen many instalments. And I can tell you why: your stored routine testing will not integrate well with your JUnit/PHPUnit/…

There are testing frameworks for databases, and indeed I hacked my own mini unit testing code with common_schema. But it’s a different testing framework. You might also have realized by now that testing databases is somewhat different. It can be done, and hats off again to those that implement it as common practice. Many don’t. Database are often more heavyweight to test. Not all operations done by routines are easily rolled back, which leads to having to rebuild the entire dataset before tests. This in itself leads to longer test periods and a need for multiple test databases so as to allow for concurrent builds.

How many companies practice both version control and unit testing over their routine code? I believe not many (and am happy to hear about those who do). To be more direct, of all the companies I ever consulted to: I have never seen one that does both.

Debugging

MySQL stored routines do not have built in debugging capabilities. To debug your routines, you will have to use one of two methods:

  • Simulate your routine code (ie mimic their execution on top of some interpreter). There are tools to do that. For me this is a complete NO GO and utterly untrustworthy. You can mimic what you think is how the routine should behave, but never they full behaviour. While developing common_schema I came upon plenty weird behaviour, some of it bugs, that you just can’t build into your emulation.
  • Inject debugging code into your routine code. I do that with RDebug. You can do breakpoints, step into, step out, most of the interesting stuff. Other tools do that as well. It is not the right way to go: you’re essentially modifying your code, placing more locks, communicating, and losing some functionality. It is a necessary evil solution for a necessary evil programming method… How good can that be?

The right way to go would be to have debugging API built into the MySQL server.

But, wait, that would still be next to worthless, since our discussion is over programming with stored routines: letting your application call upon stored routines in your database. Until the day where I could use my IntelliJ debugger to step from my java method which calls upon a stored procedure, and into the stored procedure itself, debugging your code is completely detached from your stored routine debugging.

Refactoring & deploying

Say you wanted to add a column to your table: you would go ahead and add it, and perhaps populate it. You would then modify your application code to support this new column, and deploy. Say you wanted to drop a table column. You would first deploy changes to your application code that ignore said column, and once the code is in place you would go and actually make the DROP.

How do you do the same with a stored routine? Support your routine accepts two parameters, and you wish to add a third?

There is no support for optional parameters. Your routine either accepts two parameters or three. Your application code will have to provide the exact number of parameters. You will have to deploy both your SQL changes and your application changes at the same time. This is by definition impossible, unless you are OK with a stop the world approach, which is unlikely in production.

Code constraints

One solution to the above is to create a new routines. Somehow “overload” it. But you can’t overload a stored routine; you’ll have to create a routine by a new name. This will allow you to slowly and smoothly migrate between the two.

Ahem, smoothly? How easy is it to find all invocations of a certain routines from your code? It will be typically lie in some String, or within some XML config file. There is no safe “find references to this procedure” IDE mechanism. There is no constraint in your IDE that will tell you “there is no such procedure” if you misspell the name.

Trash bin

Suppose you overcame the above. You now have two routines. You need to remember to DROP the old one, right? Will you?

When presenting common_schema, a common question I ask the audience is as follows:

Suppose I accessed your database and listed the entire set of stored functions and procedures. How many of them are you not even sure are in use anymore? How many of them you think you can DROP, but are too afraid to, and keep them in just in case?

I wouldn’t commonly ask that question had it not always provides a common nodding and smiling in the audience. People forget to drop their routines, and then forget about them, and are never sure whether they are used (your IDE doesn’t easily tell you that, remember? Sure, you can grep around; that’s not what most engineers would do). And those routines pile up to become trash.

Data or code?

Last but not least: a stored routine is a piece of code, right? Well, as far as the database is concerned, it’s really a piece of data. It’s located within a schema. It’s stored. It is an integral part of your data set: when you back up your data, you’re most likely to backup the code as well. When you restore, you’re likely to restore both. There are obvious advantages to that, DB-wise. Or should I say, DBA-wise. Engineering-wise? Does a database-restore operation count as code deployment? We can argue over beer.

Final notes

Having said all that: yes, I’m using an occasional stored routine. I see these occasions as a necessary evil, and sometimes it’s just the correct solution.

I’m happy to know what methods have been developed out there to overcome the above, please share; and please feel free to contradict the above.

]]>
https://shlomi-noach.github.io/blog/mysql/why-delegating-code-to-mysql-stored-routines-is-poor-engineering-practice/feed 11 6713
common_schema: speaking at Percona Live London, Nov. 2013 https://shlomi-noach.github.io/blog/mysql/common_schema-speaking-at-percona-live-london-nov-2013 https://shlomi-noach.github.io/blog/mysql/common_schema-speaking-at-percona-live-london-nov-2013#comments Mon, 04 Nov 2013 15:30:50 +0000 https://shlomi-noach.github.io/blog/?p=6645 In one week’s time I’ll be presenting common_schema: DBA’s framework for MySQL at Percona Live, London.

This talk introduces the rich toolset known as common_schema. It is free, open source, extremely useful in solving DBA & developer tasks, and is the next best thing ever invented next to SQL pie charts.

I’ll introduce:

  • Views, with which you can detect and terminate idle transactions, blocking transactions; roll your range partitions; find duplicate keys; block/unblock accounts; get fine grained privileges per account; find AUTO_ICNREMENT free space; …
  • Routines: do meta executions such as eval(); get query checksums; duplicating accounts; killing suspicious connections; security auditing; parsing JSON data; …
  • QueryScript: if you’re not using it, you’re missing on a powerful scripting language tightly integrated with SQL/MySQL. We’ll see the basic constructs, variables, loops; the more sophisticated MySQL/locks/overhead/danger aware constructs such as foreach & split; throttling, exceptions, it’s all in there. I’ll present real scripts that saved the day and challenge you to implement them in another scripting language.
  • Briefly introducing rdebug: stored routine debugger and debugging API
  • Roadmap (some cool things coming along)

What this talk isn’t

A tedious “read the manual aloud”. Nor is it a comprehensive listing of all functionality. These would be the surest way of sending you to sleep.

What this talk is

A view into the concepts behind common_schema; the reasons this project has justification to exist; the various problems it solves; and yes, interesting examples. Every single common_schema feature was developed out of real world need.

I am confident you’ll find common_schema to have something you need that will improve your work as a developer or a DBA.

The talk is similar to the one I gave at Santa Clara this April. It is updated with new content following recent developments.

 

]]>
https://shlomi-noach.github.io/blog/mysql/common_schema-speaking-at-percona-live-london-nov-2013/feed 5 6645
Converting an OLAP database to TokuDB, part 3: operational stuff https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-3-operational-stuff https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-3-operational-stuff#comments Mon, 14 Oct 2013 10:03:43 +0000 https://shlomi-noach.github.io/blog/?p=6603 This is the third post in a series of posts describing our experience in migrating a large DWH server to TokuDB (see 1st and 2nd parts). This post discusses operations; namely ALTER TABLE operations in TokuDB. We ran into quite a few use cases by this time that we can shed light on.

Quick recap: we’ve altered one of out DWH slaves to TokuDB, with the goal of migrating most of out servers, including the master, to TokuDB.

Adding an index

Shortly after migrating our server to TokuDB we noticed an unreasonably disproportionate slave lag on our TokuDB slave (red line in chart below) as compared to other slaves.

tokudb-slave-lag

Quick investigation led to the fact that, coincidentally, a manual heavy-duty operation was just taking place, which updated some year’s worth of data retroactively. OK, but why so slow on TokuDB? Another quick investigation led to an apples vs. oranges problem: as depicted in part 1, our original setup included MONTHly partitioning on our larger tables, whereas we could not do the same in TokuDB, where we settled for YEARly partitioning.

The heavy-duty operation included a query that was relying on the MONTHly partitioning to do reasonable pruning: a WHERE condition on a date column did the right partition pruning; but where on InnoDB that would filter 1 month’s worth of data, on TokuDB it would filter 1 year.

Wasn’t it suggested that TokuDB has online table operations? I decided to give it a shot, and add a proper index on our date column (I actually created a compound index, but irrelevant).

It took 13 minutes to add an index on a 1GB TokuDB table (approx. 20GB InnoDB uncompressed equivalent):

  • The ALTER was non blocking: table was unlocked at that duration
  • The client issuing the ALTER was blocked (I thought it would happen completely in the background) — but who cares?
  • I would say 13 minutes is fast

Not surprisingly adding the index eliminated the problem altogether.

Modifying a PRIMARY KEY

It was suggested by our DBA that there was a long time standing need to modify our PRIMARY KEY. It was impossible to achieve with our InnoDB setup (not enough disk space for the operation, would take weeks to complete if we did have the disk space). Would it be possible to modify our TokuDB tables? On some of our medium-sized tables we issued an ALTER of the form:

ALTER TABLE my_table DROP PRIMARY KEY, ADD PRIMARY KEY (c1, c2, c3, ...);

Time-wise the operation completed in good time. We did note, however, that the disk space consumed by the new table was doubled. Was it due to the fact we added two columns to our PK? Did that account for the bloated space? I did not believe so, and decided to rebuild the table:

OPTIMIZE TABLE my_table

Nope. Space not reduced. However we were unconvinced and asked. As usual, we got quick response from the Tokutek team; this was a bug: while our original table used the TOKUDB_SMALL row format (high compression), the table rebuild reset it to TOKUDB_FAST (normal compression), which makes for roughly twice the file size. The bug was filed as: alter table operations that rebuild the table lose the original tokudb compression.

Now, we were altering the PRIMARY KEY. We were not expecting an online operation anyhow, and didn’t mind blocking the table; hence the solution was simple: make sure to spceify the row format:

ALTER TABLE my_table DROP PRIMARY KEY, ADD PRIMARY KEY (c1, c2, c3, ...) ENGINE=TokuDB ROW_FORMAT=TOKUDB_SMALL;

This worked in terms of disk space — but we only later realized it would still make us trouble.

Modifying a PRIMARY KEY on our largest table

We moved on to our largest table: originally 1TB InnoDB COMPRESSED, worth of 2TB uncompressed. With TokuDB it went down to 100GB. Converting this table to TokuDB took about 40 hours, which is just fast. We issued an ALTAR TABLE modifying the PRIMARY KEY as above and waited.

The operation did not complete after 40 hours. Nor after 3 days. By day 4 we thought we might look into this. Fortunately, TokuDB is friendly on SHOW PROCESSLIST and provides you with useful information, such as “Fetched about 1234567890 rows, loading data still remains“. Yikes! We extrapolated the values to realize it would take 2 weeks to complete! Weekend went by and we decided to find a better way. Again, posting on the tokudb-user group, we got a definitive answer: a table rebuild does not utilize the bulk loader (you really want to be friends with the bulk loader, it’s the process that loads your data quickly).

And so we chose to KILL the ALTER process and go another way; again, KILLs are very easy with TokuDB ALTER operations: took 3 minutes to abort this week old operation. The alternative operation was:

CREATE TABLE my_table_New LIKE my_table;
ALTER TABLE my_table_New DROP PRIMARY KEY, ADD PRIMARY KEY (c1, c2, c3, ...) ENGINE=TokuDB ROW_FORMAT=TOKUDB_SMALL;
INSERT INTO my_table_New SELECT * FROM my_table;
RENAME TABLE my_table TO my_table_Old, my_table_New TO my_table;
DROP TABLE my_table_Old;

The INSERT INTO … SELECT operation does use the bulk loader when you do it on an empty table. It completed within merely 30 hours. Hurrah!

DROPping a TABLE

It was an immediate operation to drop our “Old” table — subsecond. Nothing like your InnoDB DROP.

]]>
https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-3-operational-stuff/feed 7 6603
Converting an OLAP database to TokuDB, part 2: the process of migration https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-2-the-process-of-migration https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-2-the-process-of-migration#comments Mon, 09 Sep 2013 03:29:30 +0000 https://shlomi-noach.github.io/blog/?p=6530 This is a second in a series of posts describing our experience in migrating a large DWH server to TokuDB. This post discusses the process of migration itself.

As a quick recap (read part 1 here), we have a 2TB compressed InnoDB (4TB uncompressed) based DWH server. Space is running low, and we’re looking at TokuDB for answers. Early experiments show that TokuDB’s compression could make a good impact on disk space usage. I’m still not discussing performance — keeping this till later post.

Those with weak hearts can skip right to the end, where we finally have a complete conversion. You can also peek at the very end to find out how much 4TB uncompressed InnoDB data is worth in TokuDB. But you might want to read through. The process was not smooth, and not as expected (it’s a war story thing). Throughout the migration we got a lot of insight on TokuDB’s behaviour, limitations, conveniences, inconveniences and more.

Disclosure: I have no personal interests and no company interests; throughout the process we were in touch with Tokutek engineers, getting free, friendly & professional advice and providing with input of our own. Most of this content has already been presented to Tokutek throughout the process. TokuDB is open source and free to use, though commercial license is also available.

How do you convert 4TB worth of data to TokuDB?

Obviously one table at a time. But we had another restriction: you may recall I took a live slave for the migration process. And we wanted to end the process with a live slave. So the restriction was: keep it replicating!

How easy would that be? Based on our initial tests, I extrapolated over 20 days of conversion from InnoDB to TokuDB. Even with one table at a time, our largest table was expected to convert in some 12-14 days. Can we retain 14 days of binary logs on a server already running low on disk space? If only I knew then what I know today 🙂

I recently claimed (as I recall it was in one of the @DBHangops meetings) I was done with ALTER TABLE statements. I would not touch them again: with oak-online-alter-table and pt-online-schema-change I can get better control of my server (and my sleep). Can I use an online migration tool?

Fortunately we are using Statement Based Replication on this MySQL topology. This makes for good news, because triggers are activated on slave as it is replicating its master’s statements. You can run an online migration tool on the slave, while it keeps replicating. This is so cool. I don’t need to worry too much about binary logs and relay logs. I can take my time with conversion…

I chose to use pt-online-schema-change. Why choose Percona’s tool over my own? Percona’s tool supports the –check-slave-lag option, which throttles the operation should the server’s slaves start lagging. Wait, I’m running the tool on a slave, so what’s the point? Well, you can cheat and provide –check-slave-lag=h=127.0.0.1 so that the tool assumes the localhost is the slave (while it is actually the server being altered); which means it will check on its own slave lag to do the throttling. This works well and is fun to watch.

Starting the migration

Some of our tables had the KEY_BLOCK_SIZE explicitly declared. As I mentioned in previous post, for TokuDB <= 7.0.4 this causes problems by bloating the indexes instead of compressing them (and Tim Callaghan of Tokutek notes this is fixed in next version). common_schema to the rescue: the ALTER statement has to include a DROP and recreate of all indexes.

This is the place to mention our tables are heavily partitioned. This will play a crucial role in the next events. We use RANGE partitions over dates. We have a monthly partitioning scheme on all partitioned tables. And we have partitions to spare: starting a few years back from today (depending on the table) and made until Dec. 2025 — making for some 170 – 200 partitions per table.

Starting from smallest table (a few rows) and increasing in size, we migrated tables one by one to TokuDB.

Party crashers

A few party crashers were made obvious right from the start (they are all being addressed by Tokutek as far as I know):

  1. @@datadir: all TokuDB files reside in @@datadir. You get a pile of files in the same directory where you would find your ib_logfile*, master.info, etc. files.
  2. File names: you do not get file names after table names. Instead, you get cryptic names like ./_myschema_sql_7fda_8e73_*.You would suspect that there is some uniqueness to the 7fda_8e73 thing; that it relates to a single table — it doesn’t. Same table get different file names, different tables get similar names — there’s not one regular expression to differentiate tables — and I do know my regexes.
  3. INFORMATION_SCHEMA.Tokudb_file_map doesn’t make it much easier, either. It is meant to tell you about tables to file names mappings. But the tables are not laid out in easy TABLE_SCHEMA, TABLE_NAME columns, but are denormalized themselves, and can be vague and almost ambiguous, to some extent. common_schema to the rescue, its rewritten tokudb_file_map maps your tables to aggregated list of file names, along with shell commands you would typically want to issue.

But let me emphasize these issues, before you think I’m just spoiled. TokuDB creates multiple files per table: other than the standard .frm file, you get one “main” file for each table, and then another file for each index. So it’s quite possible your table will consist of some 7 files.

Well, as you may know, MySQL’s implementation of partitioning is that each partition is made of its own standalone table, hidden from the user (but easily viewable on the filesystem). Now this means a single table with 170 partitions and a few indexes can make for over 1,000 files. That’s right – for a single table. We have a few dozens like this.

Now consider:

  • You may need to accommodate thousands or tens of thousands of files in your @@datadir
  • None of which it is easy for you to know who to relate to.

So common_schema‘s tokudb_file_map gives you this crazy list of 1,000 files which make up your single table. This isn’t too friendly, either.

I think Tokutek are missing here on one of the greatest advantages they bring to the table. The one thing a DBA want to know when looking at her MySQL’s filesystem is: how much disk space is used by a schema/table. And this information becomes hard to get. Again, common_schema‘s view will provide you with the script to do it (du -c … | tail -1) — but you would have to go into MySQL, out to shell… Bother.

Crashes and failures

The above did not stop at inconveniences. Soon enough, and while still altering my smaller tables, I would get failures from pt-online-schema-change. Checking manually to eliminate the possibility of a bug in Percona’s tool, I got:

mysql> alter table my_schema.my_table engine=tokudb row_format=tokudb_small;
ERROR 1016 (HY000): Can't open file: './my_schema/#sql-80d_2.frm' (errno: 24)

This would happen again and again and again. What’s the deal here?

Let me cut short on this one: we got over 20,000 files in @@datadir. And MySQL was unable to open any more files. Mind you, we had:

  • open_files_limit=30000
  • open_table_cache=16000
  • table_definition_cache=3000

Quite the generous numbers (also backed up by ulimit, to be on the safe side; and also note we’re using XFS file system). And yet, open files were an issue. To prove my point, it was possible to ALTER a table with a fewer number of partitions. It was then possible to ALTER another smaller table without partitions. It was then impossible to alter any additional partitioned table. Once I dropped partitioning for some very small table some room was made and I was again able to ALTER a partitioned table… And this would happen for completely empty tables — no data involved. We were on some file limit here.

Another such ALTER and the server crashed. This was quite unceremonious. The error log produced nothing; no stack trace. Zit.

I was fervently querying the Tokudb_file_map to get a picture of what’s going on. I would need to do a self join on the table (as common_schema‘s view does) to get a per-table listing of files. This would occasionally crash the server. I guess I had 3 or 4 such crashes.

Recovery

On this I feel I’m quite the veteran here :D. TokuDB recovery works well. As with InnoDB, TokuDB recognize there has been a crash, and before allowing MySQL to open connections it restores to a stable state.

ALTER TABLE in TokuDB

Here I found two comforting features (the third and fourth yet to be discovered). As indicated above, I did turn to issue a manual ALTER TABLE. What I found was:

  1. A TokuDB ALTER TABLE statement lets you know its progress. This is no little thing! Your SHOW PROCESSLIST output shows messages like “Fetched about 312724000 rows, loading data still remains”, or “Loading of data about 66.1% done”, or “Queried about 33113001 rows, Inserted about 33113000 rows”. Cool!
  2. Even better, the crash I had during the ALTER TABLE? I thought that would be the end of it. If you ever had a crash while ALTERing an InnoDB table, you know how it goes: InnoDB will forever complain about some table existing but not listed (or the other way around). And don’t get me started with DISCARD TABLESPACE; when InnoDB decides it is upset about something – you cannot convince it otherwise.
    Thankfully, TokuDB completely reverted the ALTER operation. It removed what temporary files were created (further notes following) and forgot all about it. No complaints, no ghosts. Great! Back to consistency!

What do we do now?

Throwing my hands up in the air, having worked on this for many days, I thought to myself: OK, I still have this server all to myself. If TokuDB is not going to work out, I have some time to come up with a sharding/commercial solution. Let’s use up this time and learn something about TokuDB. And I decided to re-create all tables without partitions. My colleague argued that she was not ready to give up on partitioning altogether and we decided to try again with YEARly partitioing scheme. This would reduce number of files by factor of 12. Also, 2025 is so far away, we agreed to settle for 2015. So reducing number of files by factor of 25-30.

And this made all the difference in the world. Having reduced number of files made the impact we were hoping for. Suddenly all went well. No crashes, no weird complaints, little proliferation of files in @@datadir.

ALTER TABLE

And I did notice that a manual ALTER TABLE went considerably faster than I would expect. And by far faster than the pt-online-schema-change pace. I tried a couple more — sure thing. ALTERing a table from InnoDB to TokuDB is fast.

How fast?

  • I converted a 47GB InnoDB COMPRESSED table to TokuDB in 73 minutes. By the way, resulting table size measured 3.4GB.
  • A 330GB InnoDB COMPRESSED table converted to TokuDB took little over 9 hours. I dare you alter 600GB worth of uncompressed data into InnoDB (COMPRESSED) in less than a few days. It went down to 31GB.
  • And our largest, 1TB COMPRESSED  table (2TB worth of uncompressed data)? There’s yet another story here.

Altering 1 (2 uncomressed) TB of data

Here’s a tip that will save you some exhaustion: SET tokudb_load_save_space := 1.

While ALTERing our largest table, I was concerned to find our disk space was running low. Plenty temporary TokuDB files were created. I assumed these would consume only so much disk space, but to my surprise they accumulated and accumulated… It turns out for ALTERing a table TokuDB creates the equivalent of the table in temporary files, and only then generates the new table. This means you need to have enough room for your own original table, the equivalent in temporary files, and your new table altogether.

With great compression that would be nothing. However you’ll be surprised to learn that by default those temporary files are not compressed. Thus, the ALTER operation consumed more than 1.3TB of disk space in temporary files, until I had no choice and (36 hours into the operation) had to KILL it before it consumed the entire 3TB of disk space.

Setting the variable as specified and the next attempt was far more successful: the temporary files were created with same compression algorithm as target table, which left with a lot of free space to work with.

ALTER time took about 40 hours.

Well, what’s the resulting size?

And we were finally done! It took the better part of three weeks to work through all the pitfalls, the pt-online-schems-change attempts, the crashes, the tests, the no-partitions, the YEARly partitions… Finally we are with a TokuDB version of our data warehouse.

Suspension is over. We got from 2TB of InnoDB COMPRESSED (KEY_BLOCK_SIZE=8) down to 200GB of TokuDB_SMALL (aka agressive, aka lzma) tables.

I mean, this is beyond expectations. It is ridiculously small. From 80% disk space utilization down to 8% disk space utilization. Absolutely ridiculous!

Conclusions

  • TokuDB does not play well with many partitions.
  • Crashes encountered. Recovery is fine.
  • Good ALTER TABLE experience
  • SET tokudb_load_save_space := 1
  • Great compression (x20 from uncompressed InnoDB; x10 from KEY_BLOCK_SIZE=8)

Next

In the following post I’ll share some observations on how well our newly converted TokuDB slave performs as compared to our equivalent InnoDB slaves; some configuration you might care about; and some things you can do with TokuDB that would be so very painful with InnoDB. Stay tuned!

]]>
https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-2-the-process-of-migration/feed 7 6530
Converting an OLAP database to TokuDB, part 1 https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-1 https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-1#comments Tue, 03 Sep 2013 07:04:12 +0000 https://shlomi-noach.github.io/blog/?p=6473 This is the first in a series of posts describing my impressions of converting a large OLAP server to TokuDB. There’s a lot to tell, and the experiment is not yet complete, so this is an ongoing blogging. In this post I will describe the case at hand and out initial reasons for looking at TokuDB.

Disclosure: I have no personal interests and no company interests; we did get friendly, useful and free advice from Tokutek engineers. TokuDB is open source and free to use, though commercial license is also available.

The case at hand

We have a large and fast growing DWH MySQL setup. This data warehouse is but one component in a larger data setup, which includes Hadoop, Cassandra and more. For online dashboards and most reports, MySQL is our service. We populate this warehouse mainly via Hive/Hadoop. Thus, we have an hourly load of data from Hive, as well as a larger daily load.

There are some updates on the data, but the majority of writes are just mysqlimports of Hive queries.

Usage of this database is OLAP: no concurrency issues here; we have some should-be-fast-running queries issued by our dashboards, as well as ok-to-run-longer queries issued for reports.

Our initial and most burning trouble is with size. Today we use COMPRESSED InnoDB tables (KEY_BLOCK_SIZE is default, i.e. 8). Our data volume sums right now at about 2TB. I happen to know this translates as 4TB of uncompressed data.

However growth of data is accelerating. A year ago we would capture a dozen GB per month. Today it is a 100GB per month, and by the end of this year it may climb to 150GB per month or more.

Our data is not sharded. We have a simple replication topology of some 6 servers. Machines are quite generous as detailed following. And yet, we will be running out of resources shortly: disk space (total 2.7TB) is now running low and is expected to run out in about six months. One of my first tasks in Outbrain is to find a solution to our DWH growth problem. The solution could be sharding; it could be a commercial DWH product; anything that works.

The approach we experiment with

It was at my initial interview that I suggested TokuDB might be a good solution, with the primary reason of being so good with compression. And we decided to experiment with this simple (setup-wise) solution of compression. If we could compress the data even by 50%, that would buy us considerable time. And it’s the simplest approach as we would need to change nothing at the application side, nor add additional frameworks.

Of course, we were already using InnoDB COMPRESSED tables. How about just improving the compression? And here I thought to myself: we can try KEY_BLOCK_SIZE=4, which I know would generally compress by 50% as compared to KEY_BLOCK_SIZE=8 (not always, but in many use cases). We’re already using InnoDB so this isn’t a new beast; it will be “more of the same”. It would work.

I got myself a dedicated machine: a slave in our production topology I am free to play with. I installed TokuDB 7.0.1, later upgraded to 7.0.3, based on MySQL 5.5.30.

The machine is a Dell Inc. PowerEdge R510 machine, with 16 CPUs @ 2.1 GHz and 126 GiB RAM, 16 GiB Swap. OS is CentOS 5.7,  kernel 2.6.18. We have RAID 10 over local 10k RPM SAS disks (10x600GB disks)

How to compare InnoDB & TokuDB?

2TB of compressed data (for absolute measurement I consider it to be a 4TB worth of data) is quite a large setup. How do I do the comparison? I don’t even have too much disk space here…

We have tables of various size. Our largest is in itself 1TB (2TB uncompressed) – half of the entire volume. The rest ranging 330GB, 140GB, 120GB, 90GB, 50GB and below. We have MONTHly partitioning schemes on most tables and obviously on our larger tables.

For our smaller tables, we could just CREATE TABLE test_table LIKE small_table, populating it and comparing compression. However, the really interesting question (and perhaps the only interesting question compression-wise) is how well would our larger (and specifically largest) tables would compress.

Indeed, for our smaller tables we saw between 20% to 70% reduction in size when using stronger InnoDB compression: KEY_BLOCK_SIZE=4/2/1. How well would that work on our larger tables? How much slower would it be?

We know MySQL partitions are implemented by actual independent tables. Our testing approach was: let’s build a test_table from a one month worth of data (== one single partition) of our largest table. We tested:

  • The time it takes to load the entire partition (about 120M rows, 100GB COMPRESSED data as seen on .idb file)
  • The time it would take to load a single day’s worth of data from Hive/Hadoop (loading real data, as does our nightly import)
  • The time it would take for various important SELECT query to execute on this data.

InnoDB vs. TokuDB comparison

In this post I will only describe our impressions of compression size. I have a lot to say about TokuDB vs InnoDB partitioning and queries; this will wait till later post.

So here goes:

Engine Compression Time to Insert 1 month Table size (optimized) Time to import 1 day
InnoDB 8k 10.5h 58GB 32m
InnoDB 4k 48h 33GB unknown (too long)
TokuDB quicklz 14h 17GB 40m
TokuDB lzma (small/aggresive) 15h 7.5GB 42m

Some comments and insights:

  • Each test was performed 3-4 times. There were no significant differences on the various cycles.
  • The 1 month insert was done courtesy QueryScript split5,000 rows at a time, no throttling.
  • The 1 day import via mysqlimport. There were multiple files imported. Each file is sorted by PRIMARY KEY ASC.
  • Isn’t it nice to know that your 100GB InnoDB table actually fits within 58GB when rebuilt?
  • For InnoDB flush_logs_at_trx_commit=2, flush_method=O_DIRECT.
  • I used default configuration to TokuDB — touched nothing. More on this in later post.
  • InnoDB 4k was prohibitively slow to load data. It was so slow so as to be unacceptable. For the 1 day load it took 1 hour for a mere 20% of data to load. 1 hour was already marginal for our requirements; waiting for 5 hours was out of the question. I tested several times, never got to wait for completion. Did I say it would just be “more of the same”? 4k turned to be “not an option”.
  • I saw almost no difference in load time between the two TokuDB compression formats. Both somewhat (30%) longer than InnoDB to load, but comparable.
  • TokuDB compression: nothing short of amazing.

With InnoDB 4k being “not an option”, and with both TokuDB compressions being similar in load time yet so different in compression size, we are left with the following conclusion: if we want to compress more than our existing 8k (and we have to) – TokuDB’s agressive compression (aka small, aka lzma) is our only option.

Shameless plug

common_schema turned to be quite the “save the day” tool here. Not only did we use it to extract 100GB of data from a large dataset and load it onto our tables, it also helped out in the ALTER process for TokuDB: at this time (<= 7.0.4) TokuDB still has a bug with KEY_BLOCK_SIZE: when this option is found in table definition, it impacts TokuDB’s indexes by bloating them. This is how sql_alter_table_tokudb was born. Hopefully it will be redundant shortly.

More to come

Was our test fair? Should we have configure TokuDB differently? Is loading via small 5,000 row chunks the right way?

In the next post I will describe the process of migrating our 4TB worth of data to TokuDB, pitfalls, issues, party crushers, sport spoilers, configuration, recovery, cool behaviour and general advice you should probably want to embrace. At later stage I’ll describe how our DWH looks after migration. Finally I’ll share some (ongoing) insights on performance.

You’ll probably want to know “How much is (non compressed) 4TB of data worth in TokuDB?” Let’s keep the suspense 🙂

]]>
https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-1/feed 8 6473
common_schema 2.2: better QueryScript isolation & cleanup; TokuDB; table_rotate, split params https://shlomi-noach.github.io/blog/mysql/common_schema-2-2-better-queryscript-isolation-tokudb-table_rotate-split-params https://shlomi-noach.github.io/blog/mysql/common_schema-2-2-better-queryscript-isolation-tokudb-table_rotate-split-params#comments Tue, 13 Aug 2013 03:39:12 +0000 https://shlomi-noach.github.io/blog/?p=6468 common_schema 2.2 is released. This is shortly after the 2.1 release; it was only meant as bug fixes release but some interesting things came up, leading to new functionality.

Highlights of the 2.2 release:

  • Better QueryScript isolation & cleanup: isolation improved across replication topology, cleanup done even on error
  • Added TokuDB related views
  • split with “index” hint (Ike, this is for you)
  • table_rotate(): a logrotate-like mechanism for tables
  • better throw()

Drill down:

Better QueryScript isolation & cleanup

common_schema 2.1 introduced persistent tables for QueryScript. This also introduced the problem of isolating concurrent scripts, all reading from and writing to shared tables. In 2.1 isolation was based on session id. However although unique per machine, collisions were possible across replication topology: a script could be issued on master, another on slave (I have such use cases) and both use same (local) session id.

With 2.2 isolation is based on server_id & session id combination; this is unique across a replication topology.

Until 2.1, QueryScript used temporary tables. This meant any error would just break the script, and the tables were left (isolated as they were, and auto-destroyed in time). With persistent tables a script throwing an error meant legacy code piling up. With common_schema 2.2 and on MySQL >= 5.5 all exceptions are caught, cleanup is made, leaving exceptions to be RESIGNALled.

TokuDB views

A couple TokuDB related views help out in converting to TokuDB and in figuring out tables status on disk:

  • sql_alter_table_tokudb will help you out to generate the complex ALTER statement to TokuDB engine if you happen to used COMPRESSED InnoDB tables with KEY_BLOCK_SIZE specified. The view generates a complex DROP KEYs & ADD KEYs statementl this is due to bug …
  • tokudb_file_map simplifies the INFORMATION_SCHEMA.Tokudb_file_map table: the original view is not normalized and is difficult to interpret and follow when your table had many indexes or is partitioned (I will write more on this shortly). with common_schema‘s tokudb_file_map you get, per table, the list of files representing that table, along with a couple Shell commands to tell you the thing you want to know most: “what is the size of my TokuDB table on disk?”

split

QueryScript’s split device now supports the index” parameter (or hint), which instructs the split() operation to use an explicitly named index. If used, the index must exist and must be UNIQUE.

table_rotate()

Rotate your tables a-la logrotate with table_rotate(): generate a new, identical, empty table, version your table, pushing older versions along the line; optionally drop older versions. You get the picture. Got some nice use case behind this on cleaning up a test database.

throw()

On MySQL >= 5.5 throw() uses SIGNAL. No more weird “table `Unknown column ‘$t’ in ‘field list’` does not exist” messages. Just plain old:

ERROR 1054 (42S22): Unknown column '$t' in 'field list'

Get it

common_schema is free and open source. It is licensed under GPL v2. This is where you can find and download latest common_schema release.

Your input is welcome! Please submit your bugs, or otherwise share your experience with common_schema.

]]>
https://shlomi-noach.github.io/blog/mysql/common_schema-2-2-better-queryscript-isolation-tokudb-table_rotate-split-params/feed 1 6468
common_schema roadmap thoughts https://shlomi-noach.github.io/blog/mysql/common_schema-roadmap-thoughts https://shlomi-noach.github.io/blog/mysql/common_schema-roadmap-thoughts#comments Mon, 22 Jul 2013 12:36:08 +0000 https://shlomi-noach.github.io/blog/?p=6435 I’m happy with common_schema; it is in fact a tool I use myself on an almost daily basis. I’m also happy to see that it gains traction; which is why I’m exposing a little bit of my thoughts on general future development. I’d love to get feedback.

Supported versions

At this moment, common_schema supports MySQL >= 5.1, all variants. This includes 5.5, 5.6, MySQL, Percona Server & MariaDB.

5.1 is today past end of line, and I’m really missing the SIGNAL/RESIGNAL syntax that I would like to use; I can do in the meanwhile with version-specific code such as /*!50500 … */. Nevertheless, I’m wondering whether I will eventually have to:

  • Support different branches of common_schema (one that supports 5.1, one that supports >= 5.5)
  • Stop support for 5.1

Of course community-wise, the former is preferred; but I have limited resources, so I would like to make a quick poll here:

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

I’ll use the poll’s results as a vague idea of what people use and want. Or please use comments below to sound your voice!

rdebug

This was a crazy jump at providing a stored routine debugger and debugging API. From some talk I made I don’t see this getting traction. For the time being, I don’t see that I will concentrate my efforts on this. Actually it is almost complete. You can step-into, step-out, step-over, set breakpoints, read variables, modify variables — it’s pretty cool.

But someone will eventually have to write a GUI front-end for this (eclipse/IntelliJ/whatever); I know not many will use a command line approach for a debugger. I also know I’m not going to write the GUI front-end. So the API is there, let’s see how it rolls.

QueryScript

I will keep on improving QueryScript, and in particular split, error handling, and otherwise simplification of common tasks. I have no doubt QueryScript goes the right way: I just see how easy it is to solve complex problems with a QueryScript one-liner. Other bullets on my TODO for QueryScript:

  • Script tracking (a semi-debugging mechanism, which allows one to recognize status of script)
  • Message passing to scripts (again, a semi-debugger approach)
  • Error recovery; ability to resume script from point of failure or point of suspension. I have plenty use cases for that.

performance_schema

I will most probably include parts of Mark Leith’s ps_helper, which is released under a permissive license, and otherwise draw ideas from his work. I’m happy to learn parts of ps_helper were influenced by common_schema itself.

Hosting

common_schema will most probably move out of Google Code; by Jan 2014 there will no longer be a “Downloads” section, and I really, really, want there to be a “Downloads” section.

I could go LaunchPad, GitHub, BitBucket (they don’t have “Downloads” sections, either, do they?), other; any advice?

World domination

Yep. This is still common_schema‘s goal. More seriously, I would want to see it installed on every single MySQL server instance. Then I would control your fate. bwahahaha!

Even more seriously, if you are a happy user, please do pass the word. I can only blog so much and present so much; there are no financing resources for this project, and I need all the help I can get in promoting common_schema. Thank you!

]]>
https://shlomi-noach.github.io/blog/mysql/common_schema-roadmap-thoughts/feed 4 6435
common_schema 2.1 released: advanced & improved split(), persistent script tables, more schema analysis, and (ahem) charts! https://shlomi-noach.github.io/blog/mysql/common_schema-2-1-released-advanced-improved-split-persistent-script-tables-more-schema-analysis-and-ahem-charts https://shlomi-noach.github.io/blog/mysql/common_schema-2-1-released-advanced-improved-split-persistent-script-tables-more-schema-analysis-and-ahem-charts#comments Wed, 17 Jul 2013 18:57:06 +0000 https://shlomi-noach.github.io/blog/?p=6398 common_schema 2.1 is released! common_schema is your free & open source companion schema within your MySQL server, providing with a function library, scripting capabilities, powerful routines and ready-to-apply information and recommendations.

New and noteworthy in version 2.1:

  • Better QueryScript’s split() functionality
  • Persistent tables for QueryScript: no long held temporary tables
  • Index creation analysis, further range partition analysis
  • grant_access(): allow everyone to use common_schema
  • Ascii charts, google charts
  • debugged_routines: show routines with debug code

Other minor enhancements and bugfixes not listed.

Here’s a breakdown of the above:

split() enhancements

split is one of those parts of common_schema that (should) appeal to every DBA. Break a huge transaction automagically into smaller chunks, and don’t worry about how it’s done. If you like, throttle execution, or print progress, or…

split enhancements include:

  • A much better auto-detection-and-selection of the chunking index. split now consults all columns covered by the index, and uses realistic heuristics to decide which UNIQUE KEY on your table is best for the chunking process. A couple bugs are solved on the way; split is much smarter now.
  • Better support for multi-column chunking keys. You may now utilize the start/stop parameters even on multi column keys, passing a comma delimited of values for the split operation to start/end with, respectively. Also fixed issue for nonexistent start/stop values, which are now valid: split will just keep to the given range.
  • split no longer requires a temporary table open through the duration of its operation. See next section.

Persistent script tables

QueryScript used to use several temporary tables for its operation. Thus, a script could hold open two or three temporary tables for the entire execution duration. For long split operations, for example, this could mean hours and days.

Temporary tables are nice and quick to respond (well, MyISAM tables are, until MySQL 5.7 is out), but make for an inherent problem: stopped slaves must not shut down nor restart when replication has an open temporary table. Why? Well, because the slave forgets about the temporary tables. When it resumes operation, it will not recognize DML issued against those tables it has forgotten. That’s why oak-prepare-shutdown is so good for slaves.

When temporary tables are short-lived, this is typically not an issue. But if you are not allowed to restart your slave throughout a 24 hour operation, that’s a limitation.

As of 2.1, QueryScript does not require long held temporary tables. In fact, typical scripts do not create any temporary tables. A split operation creates and immediately drops a series of temporary tables. These are dropped even before actual split operation begins. All tables operated on are persistent InnoDB tables.

Result: safer script replication. There’s another nice side effect I may take advantage of in a later release: ability to monitor and control flow of concurrent scripts.

Schema analysis

Two noteworthy additions to schema analysis views:

  • sql_alter_table now includes the sql_drop_keys & sql_add_keys columns. For each table, you get the SQL statements to create and drop the existing indexes. I developed this when I hit this problem with TokuDB.
  • sql_range_partitions now includes the count_past_partitions & count_future_partitions; when your table is partitioned by some type of time range, these views tell you how many partitions are in the past, and how many are to be written to in the future. This turns useful when you want to rotate or otherwise set a retention policy for your range partitions.

grant_access()

The grant_access() routine GRANTs all accounts on your server with SELECT & EXECUTE privileges on common_schema. This is a quick complementary to the installation process (though you have to invoke it yourself; it’s up to you).

Ascii/google line charts

Laugh all you want! And find how cool it is to get (poor man’s) instant charting like this:

mysql> call line_chart('select ts, com_insert_psec, com_update_psec from mycheckpoint.sv_hour limit 100', 'insert per second, update per second') ;
+---------+------------------------------------------------------------------------------------------------------+
| y_scale | chart                                                                                                |
+---------+------------------------------------------------------------------------------------------------------+
| 162     | -#-------------------------------------------------------------------------------------------------- |
| 152     | ---------------------------------------------------------------------------------------------------- |
| 143     | ---------------------------------------------------------------------------------------------------- |
| 134     | ---------------------------------------------------------------------------------------------------- |
| 124     | ---------------------------------------------------------------------------------------------------- |
| 115     | ------------------------------------------------------------#--------------------------------------- |
| 106     | ---------------------------------------------------------------------------------------------------- |
| 96      | -*-------------------------------------------------------------------------------------------------- |
| 87      | ---------------------------------#-------------------------#---------------------------------------- |
| 77      | ---------------------------------------------------------------------------------#------------------ |
| 68      | ---------------------------------------------------------------------------#------------------------ |
| 59      | #-------------------------------#------------------------------------------------------------------- |
| 49      | ---##------#-#-##-#-#--#--###----------------##---------------------------------#-----#---###------- |
| 40      | --#------#--#-#--#-#-##-##----##--###########--######--------#############-*#-##--####-###---####### |
| 31      | *-**--#-#-*-**-**------**--**#-****-**-*****-*******-#---#-*------------**---#--*------------------- |
| 21      | ----*#*#*--*--*--******--**--**----*--*-----*-------**#*#**-************--#-****-******************* |
| 12      | -----*-*-*--------------------------------------------*-*-----------------*------------------------- |
|         | v::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::v |
|         | 2010-10-06 20:00:00                                                              2010-10-10 23:00:00 |
|         |     # insert per second                                                                              |
|         |     * update per second                                                                              |
+---------+------------------------------------------------------------------------------------------------------+

You can get the same in Google Image Charts format. Yes, it’s deprecated (and has been for a year — it’s still working)

mysql> call google_line_chart('select ts, com_insert_psec, com_update_psec from mycheckpoint.sv_hour limit 100', 'insert per second, update per second') \G

google_chart_url: http://chart.apis.google.com/chart?cht=lc&chs=800x350&chtt=SQL+chart+by+common_schema&chxt=x,y&chxr=1,11.9,161.7&chd=s:S9NOOGKFGKHQMONPONONNKNONNOOQINMRgLLNMMNNNNNNOONMNNNMHEFFJFfsLLMMMLLMNMMNNDVNIMKPaKLLMOMNNNONNNMMMMM,IiGGFCDBCBGFGGGGGFGGFEFGGGGGHDGGJGGGGGFGGGGGGGGGGGGGFCBCCCEHGGGFFFFFGGGFGGAKFDFFIFFFFFFFFFFFFFFFFFFF&chxs=0,505050,10,0,lt&chxl=0:|2010-10-06%2020:00:00||||||||||||||||||||||||2010-10-07%2020:00:00|||||||||||||||||||||||||2010-10-08%2021:00:00|||||||||||||||||||||||||2010-10-09%2022:00:00|||||||||||||||||||||||||2010-10-10%2023:00:00&chg=1.010101010,25,1,2,0,0&chco=ff8c00,4682b4&chdl=insert%20per%20second|update%20per%20second&chdlp=b

The above translates into the following image:

Throw you own query in. Make 1st column your ordering column, 2nd [, 3rd…] value columns. Provide your own legend. Watch it instantly. And laugh all you want.

Read more about common_schema’s charting routines.

debugged_routines

The new debugged_routines view shows you which routines are currently “compiled with debug mode“.

I will write more on the state of rdebug in a future post.

Try it, get it

  • common_schema 2.1 comes with over 500 tests and fast growing.
  • It supports MySQL 5.1, 5.5, 5.6, Percona Server and MariaDB.
  • It has superb documentation (may I say so?) with a lot of examples & drill down into edge cases.

You are free to download and use it.

Your feedback is welcome! Indeed, many of this version’s improvements originated with community feedback.

]]>
https://shlomi-noach.github.io/blog/mysql/common_schema-2-1-released-advanced-improved-split-persistent-script-tables-more-schema-analysis-and-ahem-charts/feed 4 6398