Performance – code.openark.org, Blog by Shlomi Noach

Introducing gh-ost: triggerless online schema migrations
https://shlomi-noach.github.io/blog/mysql/introducing-gh-ost-triggerless-online-schema-migrations
Mon, 01 Aug 2016

I’m thoroughly happy to introduce gh-ost: a triggerless, controllable, auditable, testable and trusted online schema change tool, released today by GitHub.

gh-ost now powers our production schema migrations. We hit serious limitations using pt-online-schema-change on our large-volume, high-traffic tables: it would drive our database to a near grinding halt, and at times even caused outages. With gh-ost, we are now able to migrate our busiest tables at any time, peak hours and heavy workloads included, without impacting our service.

gh-ost supports testing in production. It goes a long way to build trust, both in integrity and in control. Are your databases just too busy to run existing online-schema-change tools? Have you suffered outages due to migrations? Are you tired of babysitting migrations that run up to 3:00am? Tired of being the only one tailing logs? Please, take a look at gh-ost. I believe it changes the online migration paradigm.

For a more thorough overview, please read the announcement on the GitHub Engineering Blog, and proceed to the documentation.

gh-ost is open sourced under the MIT license.

The mystery of MySQL 5.6 excessive buffer pool flushing
https://shlomi-noach.github.io/blog/mysql/the-mystery-of-mysql-5-6-excessive-buffer-pool-flushing
Sun, 20 Apr 2014

I’m experimenting with upgrading to MySQL 5.6 and am experiencing an unexplained increase in disk I/O utilization. After discussing this with several people, I’m publishing in the hope that someone can shed light on it.

We have a few dozen servers in a normal replication topology. On this particular replication topology we’ve already evaluated that STATEMENT based replication is faster than ROW based replication, and so we use SBR. We have two different workloads on our slaves, applied by two different HAProxy groups, in three different data centres. Hardware-wise, servers in the two groups use either Virident SSD cards or regular SAS spindle disks.

Our servers are I/O bound. A common query used by both workloads looks up data that does not necessarily have a hotspot, and is very large in volume. DML is low, and we only have a few hundred statements per second executed on master (and propagated through replication).

We have upgraded 6 servers from all datacenters to 5.6, both on SSD and spindle disks, and are experiencing the following phenomena:

  • A substantial increase in disk I/O utilization. See a 10-day breakdown (the upgrade is visible on 04/14); it goes on like this many days later:

    [Figure: 5.5-to-5.6 disk utilization, 10 days]

  • A substantial increase in InnoDB buffer pool pages flushed: Mr. Blue is our newly upgraded server; it joins Mr. Green, upgraded a couple of weeks ago. Mr. Red is still 5.5. This is the only MySQL graph that I could directly relate to the increase in I/O:

    [Figure: 5.5-to-5.6 rise in InnoDB buffer pool pages flushed]

  • No change in workload (it takes some 60 minutes for caches to warm up, so we measure after that time). Same equal share of serving as dictated by HAProxy. Same amount of queries. Same amount of everything.
  • Faster replication speed, on a single thread – that’s the good part! We see 30% and more improvement in replication speed. Tested by stopping SLAVE SQL_THREAD for a predefined number of minutes, then measuring the time it took for the slave to catch up to within 10 seconds of lag. The results vary depending on the time of day and serving workload on the slaves, but it is consistently far faster with 5.6.
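The catch-up measurement sketched in the last bullet looks roughly like this on a replica (a sketch; the wait interval is illustrative):

```sql
-- Pause replication for a predefined interval:
STOP SLAVE SQL_THREAD;
-- ... wait a predefined number of minutes ...
START SLAVE SQL_THREAD;
-- Poll SHOW SLAVE STATUS until Seconds_Behind_Master drops to 10
-- or less; the elapsed wall-clock time is the catch-up measurement.
SHOW SLAVE STATUS;
```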

The faster replication speed motivates us to continue with the experiment, and is a significant factor in our decision. However, we are concerned about the I/O utilization and excessive flushing.

The above graphs depict the 5.6 status without any configuration changes as compared to 5.5. I took some days to reconfigure the following variables, with no change to the rate of flushed pages (though some changes are visible in double-write buffer writes):

  • innodb_log_file_size = 128M / 2G
  • innodb_adaptive_flushing = 0 / 1
  • innodb_adaptive_flushing_lwm = 0 / 70
  • innodb_max_dirty_pages_pct = 75 / 90
  • innodb_flush_neighbors = 0 / 1
  • innodb_max_dirty_pages_pct_lwm = 75 / 90
  • innodb_old_blocks_time = 0 / 1000
  • innodb_io_capacity = 50 / 100 / 200
  • innodb_io_capacity_max = 50 / 100 / 1000
  • relay_log_info_repository = 'table' / 'file'
  • master_info_repository = 'table' / 'file'
  • default_tmp_storage_engine = 'myisam' / 'innodb'
  • eq_range_index_dive_limit = 0 / 10

And more… I patiently tested these one by one, and in combinations where that made sense. As you can see, I began with the usual suspects and moved on to more esoteric stuff. I concentrated on new variables introduced in 5.6, ones where the defaults have changed, and ones where we have explicitly changed the defaults ourselves.

The above is consistent on all upgraded servers. On SSD the disk utilization is lower, but still concerning.

Our use case is very different from the one presented by Yoshinori Matsunobu, and apparently not too many have experienced upgrading to 5.6 yet. I’m hoping someone might shed some light.

TokuDB configuration variables of interest
https://shlomi-noach.github.io/blog/mysql/tokudb-configuration-variables-of-interest
Wed, 23 Oct 2013

During our experiments I came upon a few TokuDB variables of interest; if you are using TokuDB you might want to look into these:

  • tokudb_analyze_time

This is a bound on the number of seconds an ANALYZE TABLE will operate on each index of each partition of a TokuDB table.

That is, if tokudb_analyze_time = 5, and your table has 4 indexes (including PRIMARY) and 7 partitions, then the total runtime is limited to 5*4*7 = 140 seconds.

Default in 7.1.0: 5 seconds
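The arithmetic in this bound can be sanity-checked with a quick computation (numbers taken from the example above):

```python
# Upper bound on ANALYZE TABLE runtime: the per-index, per-partition
# time limit multiplied by the number of indexes and partitions.
tokudb_analyze_time = 5  # seconds per index per partition
indexes = 4              # including PRIMARY
partitions = 7

max_runtime_seconds = tokudb_analyze_time * indexes * partitions
print(max_runtime_seconds)  # → 140
```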

  • tokudb_cache_size

Similar to innodb_buffer_pool_size, this variable sets the amount of memory allocated by TokuDB for caching pages. As with InnoDB, the table is clustered within the index, so the cache includes pages for both indexes and data.

Default: 50% of total memory

  • tokudb_directio

Boolean, values are 0/1. Setting tokudb_directio = 1 is like specifying innodb_flush_method = O_DIRECT: it tells the OS not to cache pages requested by TokuDB. Default: 0.

Now here’s the interesting part: we are used to telling InnoDB to take the most memory we can provide (because we want it to cache as much as it can) and to avoid OS caching (because that would mean a page would appear both in the buffer pool and in OS memory, which is a waste). So the following setup is common:

innodb_buffer_pool_size = [as much as you can allocate while leaving room for connection memory]G
innodb_flush_method = O_DIRECT

And my first instinct was to do the same for TokuDB. But after speaking with Gerry Narvaja of Tokutek, I realized it was not that simple. The reason TokuDB’s default memory allocation is 50% and not, say, 90%, is that the OS cache holds data in compressed form, while the TokuDB cache holds data in uncompressed form. This means that if you limit the TokuDB cache, you allow for more OS cache, which caches compressed data, which means more data (duplicates notwithstanding) in memory.
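The tradeoff can be sketched with a toy model (all numbers here are my assumptions for illustration, not measurements from this system; pages duplicated in both caches are ignored):

```python
# Toy model: logical (uncompressed) data represented in memory for a
# given split of RAM between the TokuDB cache and the OS cache.
ram_gb = 64.0
compression_ratio = 3.0  # uncompressed bytes per compressed byte (assumed)

def logical_data_cached(tokudb_cache_gb):
    os_cache_gb = ram_gb - tokudb_cache_gb
    # TokuDB's cache holds uncompressed pages; the OS cache holds
    # compressed pages, each representing compression_ratio as much data.
    return tokudb_cache_gb + os_cache_gb * compression_ratio

# A 50% split represents more logical data in memory than a 90% split:
print(logical_data_cached(ram_gb * 0.5))  # 128.0
print(logical_data_cached(ram_gb * 0.9))  # ~76.8
```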

I did try both options and did not see an obvious difference, but did not test this thoroughly. My current setup is:

# No setup; just keep the defaults for both:
#tokudb_cache_size
#tokudb_directio
  • tokudb_commit_sync

  • tokudb_fsync_log_period

These two variables are similar in essence to innodb_flush_log_at_trx_commit, but allow for finer tuning. With innodb_flush_log_at_trx_commit you choose between syncing the transaction log to disk upon each commit and once per second. With tokudb_commit_sync = 1 (the default) you get a transaction log sync to disk per commit. With tokudb_commit_sync = 0, tokudb_fsync_log_period dictates the interval between flushes; a value of tokudb_fsync_log_period = 1000 (milliseconds) means once per second.

Since our original InnoDB installation used innodb_flush_log_at_trx_commit = 2, our TokuDB setup is:

tokudb_commit_sync = 0
tokudb_fsync_log_period = 1000
  • tokudb_load_save_space

Turned on (value 1) by default as of TokuDB 7.1.0, this parameter decides whether temporary files created during bulk load operations (e.g. ALTER TABLE) are compressed or uncompressed. Do yourself a big favour (why? read here) and keep it on. Our setup is:

tokudb_load_save_space = 1

TokuDB’s general recommendation is: don’t change the variables; the engine should work well right out of the box. I like the approach (by MySQL 5.5 I had already lost count of InnoDB variables that can have a noticeable impact; with 5.6 I’m all but lost). The complete list of configuration variables is found in TokuDB’s Users Guide.

Converting an OLAP database to TokuDB, part 1
https://shlomi-noach.github.io/blog/mysql/converting-an-olap-database-to-tokudb-part-1
Tue, 03 Sep 2013

This is the first in a series of posts describing my impressions of converting a large OLAP server to TokuDB. There’s a lot to tell, and the experiment is not yet complete, so the blogging is ongoing. In this post I will describe the case at hand and our initial reasons for looking at TokuDB.

Disclosure: I have no personal interests and no company interests; we did get friendly, useful and free advice from Tokutek engineers. TokuDB is open source and free to use, though a commercial license is also available.

The case at hand

We have a large and fast-growing DWH MySQL setup. This data warehouse is but one component in a larger data setup, which includes Hadoop, Cassandra and more. For online dashboards and most reports, MySQL is the serving database. We populate this warehouse mainly via Hive/Hadoop: we have an hourly load of data from Hive, as well as a larger daily load.

There are some updates on the data, but the majority of writes are just mysqlimports of Hive queries.

Usage of this database is OLAP: no concurrency issues here; we have some should-be-fast-running queries issued by our dashboards, as well as ok-to-run-longer queries issued for reports.

Our initial and most burning trouble is with size. Today we use COMPRESSED InnoDB tables (KEY_BLOCK_SIZE is default, i.e. 8). Our data volume currently sums to about 2TB. I happen to know this translates to 4TB of uncompressed data.

However, the growth of data is accelerating. A year ago we would capture a dozen GB per month. Today it is 100GB per month, and by the end of this year it may climb to 150GB per month or more.

Our data is not sharded. We have a simple replication topology of some 6 servers. Machines are quite generous, as detailed below. And yet, we will be running out of resources shortly: disk space (2.7TB total) is now running low and is expected to run out in about six months. One of my first tasks at Outbrain is to find a solution to our DWH growth problem. The solution could be sharding; it could be a commercial DWH product; anything that works.
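A back-of-envelope check of that runway, using the numbers above (the accelerating growth would shorten it further):

```python
# Disk runway estimate: free space divided by monthly growth.
capacity_tb = 2.7
used_tb = 2.0
growth_tb_per_month = 0.1  # ~100GB/month at the current rate

months_left = (capacity_tb - used_tb) / growth_tb_per_month
print(round(months_left))  # about 7 months at today's rate; fewer as growth accelerates
```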

The approach we experiment with

It was at my initial interview that I suggested TokuDB might be a good solution, primarily because it is so good at compression. And we decided to experiment with this simple (setup-wise) solution of compression. If we could compress the data even by 50%, that would buy us considerable time. And it’s the simplest approach, as we would need to change nothing on the application side, nor add additional frameworks.

Of course, we were already using InnoDB COMPRESSED tables. How about just improving the compression? And here I thought to myself: we can try KEY_BLOCK_SIZE=4, which I know would generally compress by 50% as compared to KEY_BLOCK_SIZE=8 (not always, but in many use cases). We’re already using InnoDB so this isn’t a new beast; it will be “more of the same”. It would work.
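For reference, that experiment amounts to no more than a table rebuild with a smaller compressed page size (a sketch; the table name is hypothetical):

```sql
ALTER TABLE fact_table
  ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
```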

I got myself a dedicated machine: a slave in our production topology I am free to play with. I installed TokuDB 7.0.1, later upgraded to 7.0.3, based on MySQL 5.5.30.

The machine is a Dell Inc. PowerEdge R510, with 16 CPUs @ 2.1GHz, 126GiB RAM and 16GiB swap. The OS is CentOS 5.7, kernel 2.6.18. We have RAID 10 over local 10K RPM SAS disks (10x600GB).

How to compare InnoDB & TokuDB?

2TB of compressed data (for absolute measurement I consider it to be 4TB worth of data) is quite a large setup. How do I do the comparison? I don’t even have too much disk space here…

We have tables of various sizes. Our largest is in itself 1TB (2TB uncompressed), half of the entire volume. The rest range from 330GB through 140GB, 120GB, 90GB, 50GB and below. We have MONTHly partitioning schemes on most tables, and obviously on our larger tables.

For our smaller tables, we could just CREATE TABLE test_table LIKE small_table, populate it and compare compression. However, the really interesting question (and perhaps the only interesting question, compression-wise) is how well our larger (and specifically our largest) tables would compress.

Indeed, for our smaller tables we saw between 20% to 70% reduction in size when using stronger InnoDB compression: KEY_BLOCK_SIZE=4/2/1. How well would that work on our larger tables? How much slower would it be?

We know MySQL partitions are implemented as actual independent tables. Our testing approach was: let’s build a test_table from one month’s worth of data (== one single partition) of our largest table. We tested:

  • The time it takes to load the entire partition (about 120M rows, 100GB of COMPRESSED data as seen in the .ibd file)
  • The time it would take to load a single day’s worth of data from Hive/Hadoop (loading real data, as does our nightly import)
  • The time it would take for various important SELECT queries to execute on this data.

InnoDB vs. TokuDB comparison

In this post I will only describe our impressions of compression size. I have a lot to say about TokuDB vs InnoDB partitioning and queries; this will wait till later post.

So here goes:

Engine | Compression             | Time to insert 1 month | Table size (optimized) | Time to import 1 day
InnoDB | 8k                      | 10.5h                  | 58GB                   | 32m
InnoDB | 4k                      | 48h                    | 33GB                   | unknown (too long)
TokuDB | quicklz                 | 14h                    | 17GB                   | 40m
TokuDB | lzma (small/aggressive) | 15h                    | 7.5GB                  | 42m

Some comments and insights:

  • Each test was performed 3-4 times. There were no significant differences on the various cycles.
  • The 1 month insert was done courtesy of QueryScript’s split, 5,000 rows at a time, no throttling.
  • The 1 day import was done via mysqlimport. There were multiple files imported; each file is sorted by PRIMARY KEY ASC.
  • Isn’t it nice to know that your 100GB InnoDB table actually fits within 58GB when rebuilt?
  • For InnoDB: innodb_flush_log_at_trx_commit=2, innodb_flush_method=O_DIRECT.
  • I used the default configuration for TokuDB — touched nothing. More on this in a later post.
  • InnoDB 4k was prohibitively slow to load data, so slow as to be unacceptable. For the 1 day load it took 1 hour for a mere 20% of the data to load. 1 hour was already marginal for our requirements; waiting for 5 hours was out of the question. I tested several times and never got to wait for completion. Did I say it would just be “more of the same”? 4k turned out to be “not an option”.
  • I saw almost no difference in load time between the two TokuDB compression formats. Both take somewhat (30%) longer than InnoDB to load, but are comparable.
  • TokuDB compression: nothing short of amazing.

With InnoDB 4k being “not an option”, and with both TokuDB compressions being similar in load time yet so different in compression size, we are left with the following conclusion: if we want to compress more than our existing 8k (and we have to) – TokuDB’s aggressive compression (aka small, aka lzma) is our only option.

Shameless plug

common_schema turned out to be quite the “save the day” tool here. Not only did we use it to extract 100GB of data from a large dataset and load it onto our tables, it also helped out in the ALTER process for TokuDB: at this time (<= 7.0.4) TokuDB still has a bug with KEY_BLOCK_SIZE: when this option is found in a table definition, it bloats TokuDB’s indexes. This is how sql_alter_table_tokudb was born. Hopefully it will be redundant shortly.

More to come

Was our test fair? Should we have configured TokuDB differently? Is loading via small 5,000-row chunks the right way?

In the next post I will describe the process of migrating our 4TB worth of data to TokuDB: pitfalls, issues, party crashers, sport spoilers, configuration, recovery, cool behaviour and general advice you will probably want to embrace. At a later stage I’ll describe how our DWH looks after the migration. Finally, I’ll share some (ongoing) insights on performance.

You’ll probably want to know “How much is (non compressed) 4TB of data worth in TokuDB?” Let’s keep the suspense 🙂

DELETE, don’t INSERT
https://shlomi-noach.github.io/blog/mysql/delete-dont-insert
Wed, 27 Jun 2012

I have just read INSERT, Don’t DELETE by Aaron Brown, and have a lengthy response, which is why I write this post instead of commenting on said post.

I wish to offer my counter-thought and suggest that DELETEs are probably the better choice.

Aaron suggests that, when one wishes to purge rows from some table, a trick can be used: instead of DELETEing unwanted rows, one can INSERT “good” rows into a new table, then switch over with RENAME (but please read referenced post for complete details).

I respectfully disagree on several points discussed.

Lockdown

The fact that one needs to block writes during the creation of the new table is problematic: you essentially need to turn off parts of your application. The post suggests one could use a slave – but this solution is far from trivial as well. To switch over, you yet again need to turn off access to the DB, even if for a short while.

A switch over to a slave is quite a big deal, in my opinion, for the mere purpose of deletion of rows.

DELETEs are easy

DELETEs are so much easier. The first thing to note is this: you don’t actually have to delete all the rows *at once*.

You just need to drop some rows, right? Why waste one huge transaction that takes minutes, when you can drop the rows in chunks, one at a time?
For that, you can use either pt-archiver from Percona Toolkit, oak-chunk-update from openark-kit, or write a simple QueryScript snippet with common_schema:

while (DELETE FROM title WHERE title <= 'g' LIMIT 1000)
{
  throttle 1;
}

So, drop 1,000 rows or so at a time, then sleep some time, etc. The total runtime is longer, but who cares? The impact can be reduced to be unnoticeable.

Space reclaim

You can use online table operations to rebuild your table and reclaim the disk space. Either see oak-online-alter-table or pt-online-schema-change. Again, both work in small chunks, so no long stalls.

But more on this: my usual purge scenario shows that it is repetitive. You purge, data fills again, you purge again, and so on.

Which is why it doesn’t make much sense to rebuild the table and reclaim the disk space: it just grows again to roughly the same dimensions.
For a one-time operation (e.g. after neglecting cleanup for a long time) — yes, absolutely, do a rebuild and reclaim. For repetitive cleanup, I don’t bother.

Conclusion

Aaron does make note at the end of his post that DELETE operations can be done online, while the INSERT trick requires downtime, and this is a fair assessment.

But just to make a point: none of the DELETE timings are interesting. Since we are not concerned with deleting the rows within a given time frame (no “press the red button” scenario), we can spread them over time and make the impact negligible. So not only is everything done online, it also goes unnoticed by the user. And this, I believe, is the major thing to consider.

Self throttling MySQL queries
https://shlomi-noach.github.io/blog/mysql/self-throttling-mysql-queries
Tue, 01 Nov 2011

Recap on the problem:

  • A query takes a long time to complete.
  • During this time it makes for a lot of I/O.
  • The query’s I/O overloads the DB, making other queries run slow.

I introduce the notion of self-throttling queries: queries that go to sleep, by themselves, throughout their runtime. During the sleep periods the query performs no I/O, which means other queries get their chance to execute.

I present two approaches:

  • The naive approach: for every 1,000 rows, the query sleeps for 1 second.
  • The factor approach: for every 1,000 rows, the query sleeps for the amount of time it took to iterate those 1,000 rows (effectively doubling the total runtime of the query).
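Outside SQL, the factor approach reduces to a small loop; here is a client-side sketch (an assumed chunked job, not the in-query trick this post develops):

```python
import time

def process_in_chunks(chunks, work):
    """After each chunk of work, sleep for as long as the chunk took:
    total runtime roughly doubles, average I/O pressure roughly halves."""
    for chunk in chunks:
        started = time.monotonic()
        work(chunk)                              # e.g. scan the next 1,000 rows
        time.sleep(time.monotonic() - started)   # self-throttle

# Toy usage: three chunks of ~10ms work each take ~60ms in total.
process_in_chunks(range(3), lambda chunk: time.sleep(0.01))
```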

Sample query

We use a simple, single-table scan. No aggregates (which complicate the solution considerably).

SELECT
  rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental
;

The naive solution

We need to know every 1,000 rows. So we need to count the rows. We do that by using a counter, as follows:

SELECT
  rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days,
  @row_counter := @row_counter + 1
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;

One thing that bothers me is that I wasn’t asking for an additional column; I would like the result set to remain as it was, with the same structure. We also want to sleep for 1 second for every 1,000 rows. So we merge the two together into one of the existing columns, like this:

SELECT
  rental_id +
    IF(
      (@row_counter := @row_counter + 1) % 1000 = 0,
      SLEEP(1), 0
    ) AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;

To remain faithful to my slides, I rewrite as follows, and this is the naive solution:

SELECT
  rental_id +
    CASE
      WHEN (@row_counter := @row_counter + 1) % 1000 = 0 THEN SLEEP(1)
      ELSE 0
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter
;

The WHEN clause always returns 0, so it does not affect the value of rental_id.

The factor approach

In the factor approach we wish to keep a record of query execution time, every 1,000 rows. I introduce a nested CASE expression which updates the time records. I rely on SYSDATE() to return the true current time, and on NOW() to return the query execution start time.

SELECT
  rental_id +
    CASE
      WHEN (@row_counter := @row_counter + 1) IS NULL THEN NULL
      WHEN @row_counter % 1000 = 0 THEN
        CASE
          WHEN (@time_now := SYSDATE()) IS NULL THEN NULL
          WHEN (@time_diff := (TIMESTAMPDIFF(SECOND, @chunk_start_time, @time_now))) IS NULL THEN NULL
          WHEN SLEEP(@time_diff) IS NULL THEN NULL
          WHEN (@chunk_start_time := SYSDATE()) IS NULL THEN NULL
          ELSE 0
        END
      ELSE 0
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter,
  (SELECT @chunk_start_time := NOW()) sel_chunk_start_time
;

Proof

How can we prove that the queries do indeed work?

We can see if the total runtime sums up to the number of sleep calls, in seconds; but how do we know that sleeps do occur at the correct times?

A solution I offer is to use a stored routine which logs the exact time (using SYSDATE()) and a value per row to a MyISAM table (a non-transactional table). The following constructs are introduced:

CREATE TABLE test.proof(
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  dt DATETIME NOT NULL,
  msg VARCHAR(255)
) ENGINE=MyISAM;

DELIMITER $$
CREATE FUNCTION test.prove_it(message VARCHAR(255)) RETURNS TINYINT
DETERMINISTIC
MODIFIES SQL DATA
BEGIN
  INSERT INTO test.proof (dt, msg) VALUES (SYSDATE(), message); RETURN 0;
END $$
DELIMITER ;

The prove_it() function records the immediate time and a message into the MyISAM table, which immediately accepts the write, being non-transactional. It returns 0, so we can embed it within the query. Of course, the function itself incurs some overhead, but it will nevertheless convince you that the SLEEP()s do occur at the right time!

SELECT
  rental_id +
    CASE
      WHEN (@row_counter := @row_counter + 1) IS NULL THEN NULL
      WHEN @row_counter % 1000 = 0 THEN
        CASE
          WHEN (@time_now := SYSDATE()) IS NULL THEN NULL
          WHEN (@time_diff := (TIMESTAMPDIFF(SECOND, @chunk_start_time, @time_now))) IS NULL THEN NULL
          WHEN SLEEP(@time_diff) + test.prove_it(CONCAT('will sleep for ', @time_diff, ' seconds')) IS NULL THEN NULL
          WHEN (@chunk_start_time := SYSDATE()) IS NULL THEN NULL
          ELSE 0
        END
      ELSE 0
    END AS rental_id,
  TIMESTAMPDIFF(DAY, rental_date, return_date) AS rental_days
FROM
  sakila.rental,
  (SELECT @row_counter := 0) sel_row_counter,
  (SELECT @chunk_start_time := NOW()) sel_chunk_start_time
;

mysql> SELECT * FROM test.proof;
+----+---------------------+--------------------------+
| id | dt                  | msg                      |
+----+---------------------+--------------------------+
|  1 | 2011-11-01 09:22:36 | will sleep for 1 seconds |
|  2 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  3 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  4 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  5 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  6 | 2011-11-01 09:22:36 | will sleep for 0 seconds |
|  7 | 2011-11-01 09:22:38 | will sleep for 1 seconds |
|  8 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
|  9 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 10 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 11 | 2011-11-01 09:22:38 | will sleep for 0 seconds |
| 12 | 2011-11-01 09:22:40 | will sleep for 1 seconds |
| 13 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
| 14 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
| 15 | 2011-11-01 09:22:40 | will sleep for 0 seconds |
+----+---------------------+--------------------------+

The above query is actually very fast. Try adding BENCHMARK(1000, ENCODE('hello', 'goodbye')) to rental_id so as to make it slower, or just use it on a really large table, and see what happens (this is what I actually used to make the query run for several seconds in the example above).

Observant readers will note that the “will sleep…” message actually gets written after the SLEEP() call. I leave this as it is.

Another nice trait of the code is that you don’t need sub-second resolution for it to work. If you look at the above, we don’t actually go to sleep every 1,000 rows (1,000 is just too quick in this query — perhaps I should have used 10,000 rows). But we do sleep once a second has elapsed, which means it works correctly on average. Of course, the entire discussion is only of interest when a query executes for a substantial number of seconds, so this is just an anecdote.

And the winner is…

Wow, this contest was anything but popular. Marc Alff is the obvious winner: he is the only one to suggest a solution 🙂

But Marc uses a very nice trick: he reads the PERFORMANCE_SCHEMA. Now, I’m not sure how the PERFORMANCE_SCHEMA gets updated. I know that the INFORMATION_SCHEMA.GLOBAL_STATUS table does not get updated by a query until the query completes (so you cannot expect a change in innodb_rows_read throughout the execution of the query). I just didn’t test it (homework, anyone?). If it does get updated, then we can throttle the query based on InnoDB page reads using a simple query. Otherwise, an access to /proc/diskstats is possible, assuming no apparmor or SELinux are blocking us.

Marc also uses a stored function, which is the clean way of doing it; however I distrust the overhead incurred by a stored routine and prefer my solution (which is, admittedly, not a pretty SQL sight!).

Happy throttling!

On generating unique IDs using LAST_INSERT_ID() and other tools
https://shlomi-noach.github.io/blog/mysql/on-generating-unique-ids-using-last_insert_id-and-other-tools
Wed, 02 Feb 2011

There’s a trick for using LAST_INSERT_ID() to generate sequences in MySQL. Quoting from the Manual:

  1. Create a table to hold the sequence counter and initialize it:
    mysql> CREATE TABLE sequence (id INT NOT NULL);
    mysql> INSERT INTO sequence VALUES (0);
    
  2. Use the table to generate sequence numbers like this:
    mysql> UPDATE sequence SET id=LAST_INSERT_ID(id+1);
    mysql> SELECT LAST_INSERT_ID();
    

This trick calls for trouble.

Contention

A customer was using this trick to generate unique session IDs for his JBoss sessions. These IDs would eventually be written back to the database in the form of log events. Business goes well, and one day the customer adds three new JBoss servers (doubling the number of webapps). All of a sudden, nothing works quite as it used to. All kinds of queries take long seconds to complete; load average becomes very high.

A short investigation reveals that a very slight load is enough to make for an accumulation of sequence-UPDATE queries. Dozens of them are active at any given time, waiting for long seconds.

InnoDB and MyISAM both make for poor response times. No wonder! Everyone’s contending for one lock.

Not just one

Other queries seem to hang as well. Why?

It is easy to forget, or let go unnoticed, that there are quite a few global locks involved with each query. If the query cache is activated, then any query must pass through that cache, holding the query cache mutex. There’s a global mutex on MyISAM’s key cache. There’s one on InnoDB’s buffer pool (see multiple buffer pools in InnoDB 5.5), albeit less of an overhead. And there’s the table cache.

When the table cache is enabled, any completed query attempts to return file handles to the cache, and any new query attempts to retrieve handles from the cache. While writing to the cache (extracting, adding), the cache is locked. When everyone’s busy doing the sequence-UPDATE, the table cache lock is being abused, and other queries are unable to acquire the lock and get on with their business.

What can be done?

One could try to increase the table_open_cache value. That may help to some extent, and for a limited time; the more requests are made, the quicker the problem surfaces again. In fact, reducing table_open_cache to its minimum (the minimum value is 1) can make a great impact: if there’s nothing to fight over, everyone just gets by on their own.
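The mitigation itself is a one-liner (a sketch of the idea; weigh it carefully, since it effectively disables table caching globally):

```sql
-- Minimum allowed value; queries will open their own file handles
-- instead of contending for the table cache lock:
SET GLOBAL table_open_cache = 1;
```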

I know the following is not a scientific explanation, but it hits me as a good comparison: when my daughter brings a friend over, and there’s a couple of toys, both are happy. A third friend makes for a fight: “I saw it first! She took it from me! I was holding it!”. Any parent knows the ultimate solution to this kind of fight: take away the toys, and have them find something else to enjoy doing. OK, sorry for this unscientific display, I had to share my daily stress.

When no table cache is available, a query will go on opening the table by itself, and will not attempt to return the file handle to the cache; the file handle will simply be destroyed. Usually this is not desired: caching is good. But in our customer’s case, the cost of not using a table cache was dwarfed by the cost of having everyone fight over the sequence table. Reducing the table cache made for an immediate relaxation of the database, with observably poorer responsiveness at peak times, yet way better than with a large table cache.

Other tools?

I don’t consider the above to be a good solution. It’s just a temporary hack.

I actually don’t like the LAST_INSERT_ID() trick. Moreover, I don’t see that it’s the database’s job to provide unique IDs. Let it do relational stuff. If generating IDs is too intensive, let someone else do it.

NoSQL solutions provide such a service. Memcached, redis, MongoDB (and probably more) all provide increment functions. Check them out.

Application level solutions

I actually use an application-level solution to generate unique IDs. I mean, there’s always GUID(), but its result is just too long. Take a look at the following Java code:

public class Utils {
  private static long lastUniqueNumber = 0;

  // Returns the current clock millis, or the last returned value + 1
  // when invoked more than once within the same millisecond; hence
  // values are unique and ascending within a single JVM.
  public static synchronized long uniqueNumber() {
    long unique = System.currentTimeMillis();
    if (unique <= lastUniqueNumber)
      unique = lastUniqueNumber + 1;
    lastUniqueNumber = unique;
    return unique;
  }
}

Within a Java application, the above method returns unique IDs, sustaining up to 1,000 new millisecond values per second on average (though it can be called far more than 1,000 times per second, borrowing from future milliseconds).

On consecutive executions of applications on the same machine one would still expect unique values, due to the time-based nature of the values. However, computer clocks change; it’s possible that System.currentTimeMillis() would return a value already used in the past.

And what about two processes running on the same machine at the same time? Or on different machines?

Which is why I use the following combination to generate my unique IDs:

  • Server ID (much like MySQL’s server_id parameter). This could be the last byte of the server’s IP address, or just 4 or 5 bits if not too many players are expected.
  • Process ID (plain old pid), which I pass to the Java runtime in the form of a system property. Any two processes running on the same machine at the same time are assured to have different IDs, and two consecutively spawned processes will have different IDs as well. The time it takes for process IDs to cycle is far longer than would make for a “time glitch” problem as described above.
  • Current time in milliseconds.

If you have to fit everything within 64 bits (BIGINT), then you’ll have to do bit manipulation, dropping some of the most significant bits of the milliseconds value to make room for the server and process IDs.

If you are willing to have your IDs unique only within the bounds of a given time frame (so that, for example, a month from now you wouldn’t mind reusing old IDs), then the problem is significantly easier. You may just use “day of month” and “millis since day start” and save those precious bits.
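The bit manipulation above can be sketched as follows. This is an illustration, not the author’s code, and the field widths (5 bits for the server ID, 16 for the pid, 42 for the millis) are assumptions chosen so the result fits in 63 bits, keeping the sign bit of a BIGINT clear:

```python
import os
import time

SERVER_ID_BITS = 5    # up to 32 servers (assumed width)
PID_BITS = 16         # default Linux pid range fits in 16 bits
MILLIS_BITS = 42      # 5 + 16 + 42 = 63 bits: sign bit stays clear

def unique_id(server_id: int) -> int:
    """Pack server id, process id and clock millis into a signed 64-bit value."""
    # Drop the most significant bits of the millis to make room, as described.
    millis = int(time.time() * 1000) & ((1 << MILLIS_BITS) - 1)
    pid = os.getpid() & ((1 << PID_BITS) - 1)
    return (server_id << (PID_BITS + MILLIS_BITS)) | (pid << MILLIS_BITS) | millis
```

Note that within one process this still needs a uniqueNumber()-style counter, as in the Java method earlier, to disambiguate two calls landing in the same millisecond.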

Still other?

Please share your solutions below!

]]>
https://shlomi-noach.github.io/blog/mysql/on-generating-unique-ids-using-last_insert_id-and-other-tools/feed 7 3283
How often should you use OPTIMIZE TABLE? – followup https://shlomi-noach.github.io/blog/mysql/how-often-should-you-use-optimize-table-followup https://shlomi-noach.github.io/blog/mysql/how-often-should-you-use-optimize-table-followup#comments Mon, 04 Oct 2010 08:07:45 +0000 https://shlomi-noach.github.io/blog/?p=2882 This post follows up on Baron’s How often should you use OPTIMIZE TABLE?. I had the opportunity of doing some massive purging of data from large tables, and was interested to see the impact of the OPTIMIZE operation on table’s indexes. I worked on some production data I was authorized to provide as example.

The use case

I’ll present a single use case here. The table at hand is a compressed InnoDB table used for logs. I’ve rewritten some column names for privacy:

mysql> show create table logs \G

Create Table: CREATE TABLE `logs` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 `origin` varchar(64) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
 `message` text NOT NULL,
 `level` tinyint(11) NOT NULL DEFAULT '0',
 `s` char(16) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '',
 PRIMARY KEY (`id`),
 KEY `s` (`s`),
 KEY `name` (`name`,`ts`),
 KEY `origin` (`origin`,`ts`)
) ENGINE=InnoDB AUTO_INCREMENT=186878729 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8

The table had log records starting 2010-08-23 and up till 2010-09-02 noon. Table status:

mysql> show table status like 'logs'\G
*************************** 1. row ***************************
           Name: logs
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 22433048
 Avg_row_length: 206
    Data_length: 4625285120
Max_data_length: 0
   Index_length: 1437073408
      Data_free: 4194304
 Auto_increment: 186878920
    Create_time: 2010-08-24 18:10:49
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=8
        Comment:

(I was a bit puzzled by the Create_time: the table was taken from an LVM snapshot of another server, so it had existed for a very long time before that date. I’m not sure why the Create_time field reads as it does here; I assume the MySQL upgrade marked it so, and I did not have the time, nor the need, to look into it.)

I was using Percona-Server-5.1.47-11.2, and so was able to look at the index statistics for that table:

mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+----------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys   | index_size | leaf_pages |
+--------------+------------+--------------+--------+----------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1              |     282305 |     246856 |
| newsminer    | logs       | s            |      2 | 17, 1          |      38944 |      33923 |
| newsminer    | logs       | name         |      3 | 2492739, 10, 2 |      22432 |      19551 |
| newsminer    | logs       | origin       |      3 | 1303, 4, 1     |      26336 |      22931 |
+--------------+------------+--------------+--------+----------------+------------+------------+

Status after massive purge

My first requirement was to purge all records up to 2010-09-01 00:00:00. I did so in small chunks, using openark kit‘s oak-chunk-update (the same can be achieved with maatkit‘s mk-archiver). The process purged 1,000 rows at a time, with some sleep in between, and ran for about two hours. It may be interesting to note that since ts holds monotonically ascending values, purging old rows also means purging the lower PKs, which means we’re trimming the PK tree from the left.
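The chunked-purge idea can be sketched as follows. This is not oak-chunk-update itself, just a minimal illustration of deleting in small batches with a commit between batches; an in-memory SQLite table (with made-up columns) stands in for MySQL so the example is self-contained:

```python
import sqlite3
import time

def purge_in_chunks(conn, cutoff, chunk=1000, sleep=0.0):
    """Delete rows with ts < cutoff, `chunk` rows at a time, committing
    between chunks so locks are held only briefly."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM logs WHERE id IN "
            "(SELECT id FROM logs WHERE ts < ? ORDER BY id LIMIT ?)",
            (cutoff, chunk))
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        time.sleep(sleep)  # give concurrent traffic room to breathe

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, ts INTEGER)")
conn.executemany("INSERT INTO logs (ts) VALUES (?)",
                 [(i,) for i in range(5000)])
conn.commit()
purged = purge_in_chunks(conn, cutoff=4250)  # removes rows with ts 0..4249
```

The point of the pattern is the short transactions: each DELETE touches at most `chunk` rows, so other sessions get a chance to run in between.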

Even while purging took place, I could see the index_size/leaf_pages values dropping, until, finally:

mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+--------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys | index_size | leaf_pages |
+--------------+------------+--------------+--------+--------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1            |      40961 |      35262 |
| newsminer    | logs       | s            |      2 | 26, 1        |      34440 |       3798 |
| newsminer    | logs       | name         |      3 | 341011, 4, 1 |       4738 |       2774 |
| newsminer    | logs       | origin       |      3 | 341011, 4, 2 |      10178 |       3281 |
+--------------+------------+--------------+--------+--------------+------------+------------+

The number of deleted rows was roughly 85% of the total, leaving the table with about 15% of its rows.

Status after OPTIMIZE TABLE

Time to see whether OPTIMIZE really optimizes! Will it reduce number of leaf pages in PK? In secondary keys?

mysql> OPTIMIZE TABLE logs;
...
mysql> SELECT * FROM information_schema.INNODB_INDEX_STATS WHERE table_name='logs';
+--------------+------------+--------------+--------+--------------+------------+------------+
| table_schema | table_name | index_name   | fields | row_per_keys | index_size | leaf_pages |
+--------------+------------+--------------+--------+--------------+------------+------------+
| newsminer    | logs       | PRIMARY      |      1 | 1            |      40436 |      35323 |
| newsminer    | logs       | s            |      2 | 16, 1        |       5489 |       4784 |
| newsminer    | logs       | name         |      3 | 335813, 7, 1 |       3178 |       2749 |
| newsminer    | logs       | origin       |      3 | 335813, 5, 2 |       3951 |       3446 |
+--------------+------------+--------------+--------+--------------+------------+------------+
4 rows in set (0.00 sec)

The above shows no significant change in any of the indexes: not in index_size, not in leaf_pages, not in statistics (row_per_keys). The OPTIMIZE did not reduce the index size, nor the number of index pages (leaf_pages being the major factor here). Some leaf_pages values have even increased, but by a margin small enough to consider them equal.

Index-wise, the above example does not show an advantage to using OPTIMIZE. I confess, I was surprised, and for the better: this indicates InnoDB does a good job of merging index pages after a massive purge.

So, no use for OPTIMIZE?

Think again: file system-wise, things look different.

Before purging of data:

bash:~# ls -l logs.* -h
-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:01 logs.ibd

After purging of data:

bash:~# ls -l logs.* -h
-rw-r----- 1 mysql mysql 8.6K 2010-08-15 17:40 logs.frm
-rw-r----- 1 mysql mysql 2.9G 2010-09-02 14:21 logs.ibd

Recall that InnoDB never releases table space back to the file system!

After OPTIMIZE on table:

bash:~# ls -l logs.* -h
-rw-rw---- 1 mysql mysql 8.6K 2010-09-02 14:26 logs.frm
-rw-rw---- 1 mysql mysql 428M 2010-09-02 14:43 logs.ibd

With innodb_file_per_table, an OPTIMIZE creates a new table space, and the old one gets destroyed; space goes back to the file system. I don’t know about you; I like to have my file system with as much free space as possible.

Need to verify

I’ve tested on Percona Server, since this is where I can find INNODB_INDEX_STATS. But this raises the following questions:

  • Perhaps the results only apply to Percona Server? (I’m guessing not.)
  • Or only to the InnoDB plugin? Does the same hold for “builtin” InnoDB? (I don’t know.)
  • Only on >= 5.1? (Maybe; 5.0 is becoming rare now anyway.)
  • Only on InnoDB? (Well, of course this test is storage-engine dependent!)

Conclusion

The use case above is but one particular example. Other use cases may include tables where deletions often occur in the middle of the table (remember we were trimming the tree from the left side only). Yet others may need to handle UPDATEs to indexed columns. I have some more operations to do here, with larger tables (e.g. 40GB compressed). If anything changes, I’ll drop a note.

]]>
https://shlomi-noach.github.io/blog/mysql/how-often-should-you-use-optimize-table-followup/feed 7 2882
Views: better performance with condition pushdown https://shlomi-noach.github.io/blog/mysql/views-better-performance-with-condition-pushdown https://shlomi-noach.github.io/blog/mysql/views-better-performance-with-condition-pushdown#comments Thu, 20 May 2010 05:17:05 +0000 https://shlomi-noach.github.io/blog/?p=1328 Justin’s A workaround for the performance problems of TEMPTABLE views post on mysqlperformanceblog.com reminded me of a solution I once saw on a customer’s site.

The customer was using a nested view structure, up to a depth of some 8-9 views. There were a lot of aggregations along the way, and even the simplest query resulted in a LOT of subqueries, temporary tables, and vast amounts of data, even if only to return a couple of rows.

While we worked to solve this, a developer showed me his own trick. His trick is now impossible to implement, but there’s a hack around this.

Let’s use the world database to illustrate. Look at the following view definition:

CREATE
  ALGORITHM=TEMPTABLE
VIEW country_languages AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON (Country.CODE = CountryLanguage.CountryCode)
  GROUP BY
    Country.CODE;

The view presents a list of spoken languages per country. The execution plan for querying this view looks like this:

mysql> EXPLAIN SELECT * FROM country_languages;
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | <derived2>      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 |                                              |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+

And even if we only want a single country, we still get the same plan:

mysql> EXPLAIN SELECT * FROM country_languages WHERE Code='USA';
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref                               | rows | Extra                                        |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+
|  1 | PRIMARY     | <derived2>      | ALL    | NULL          | NULL    | NULL    | NULL                              |  233 | Using where                                  |
|  2 | DERIVED     | CountryLanguage | index  | PRIMARY       | PRIMARY | 33      | NULL                              |  984 | Using index; Using temporary; Using filesort |
|  2 | DERIVED     | Country         | eq_ref | PRIMARY       | PRIMARY | 3       | world.CountryLanguage.CountryCode |    1 |                                              |
+----+-------------+-----------------+--------+---------------+---------+---------+-----------------------------------+------+----------------------------------------------+

So, we need to scan the entire CountryLanguage and Country tables in order to return results for just one row.

A non-working solution

The solution offered by the developer was this:

CREATE
  ALGORITHM=MERGE
  VIEW country_languages_non_working AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = @country_code
  GROUP BY Country.CODE;

And follow with:

mysql> SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM country_languages_non_working;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+

So, push down a WHERE condition into the view’s definition. The session variable @country_code is used to filter rows. In the above simplified code the value is assumed to be set; tweak it as you see fit (using IFNULL, for example, or OR clauses) to allow for a full scan in case the variable is undefined.

This doesn’t work. It used to work a couple of years back, but today you cannot create a view which uses session variables or parameters; it is a restriction imposed on views.

A workaround

Justin showed a workaround using an additional table. There is another workaround which does not involve tables, but rather a stored routine. Now, this is a hack, and an ugly one. It may not work in future versions of MySQL for all I know. But here it goes:

DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `get_session_country`() RETURNS CHAR(3)
    NO SQL
    DETERMINISTIC
BEGIN
  RETURN @country_code;
END $$
DELIMITER ;

CREATE
  ALGORITHM=MERGE
  VIEW country_languages_2 AS
  SELECT
    Country.CODE, Country.Name AS country,
    GROUP_CONCAT(CountryLanguage.Language) AS languages
  FROM
    world.Country
    JOIN world.CountryLanguage ON
      (Country.CODE = CountryLanguage.CountryCode)
  WHERE
    Country.CODE = get_session_country()
  GROUP BY Country.CODE;

And now:

mysql> SET @country_code='USA';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM country_languages_2;
+------+---------------+----------------------------------------------------------------------------------------------------+
| CODE | country       | languages                                                                                          |
+------+---------------+----------------------------------------------------------------------------------------------------+
| USA  | United States | Chinese,English,French,German,Italian,Japanese,Korean,Polish,Portuguese,Spanish,Tagalog,Vietnamese |
+------+---------------+----------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT * FROM country_languages_2;
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table           | type   | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY     | <derived2>      | system | NULL          | NULL    | NULL    | NULL |    1 |                          |
|  2 | DERIVED     | Country         | const  | PRIMARY       | PRIMARY | 3       |      |    1 |                          |
|  2 | DERIVED     | CountryLanguage | ref    | PRIMARY       | PRIMARY | 3       |      |    8 | Using where; Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+------+------+--------------------------+

Since views are allowed to call stored routines (Justin used this to call CONNECTION_ID()), and since stored routines can use session variables, we can take advantage of this and force the view to filter out irrelevant rows before they accumulate into temporary tables and big joins.

Back in the customer’s office, with their real data and multiple views, we witnessed a reduction of query times from ~30 minutes to a few seconds.

Another kind of use

Eventually we worked to make better view definitions and query splitting, resulting in clearer code and fast queries, but this solution plays nicely into another kind of problem:

Can we force different customers to see different parts of a given table? E.g., only those rows that relate to each customer?

There can be many solutions: different tables; multiple views (one per customer); stored procedures; what have you. The above provides one solution, and I’ve seen it in use.

]]>
https://shlomi-noach.github.io/blog/mysql/views-better-performance-with-condition-pushdown/feed 1 1328
Reducing locks by narrowing primary key https://shlomi-noach.github.io/blog/mysql/reducing-locks-by-narrowing-primary-key https://shlomi-noach.github.io/blog/mysql/reducing-locks-by-narrowing-primary-key#comments Tue, 04 May 2010 06:46:01 +0000 https://shlomi-noach.github.io/blog/?p=1269 In a period of two weeks, I had two cases with the exact same symptoms.

Database users were experiencing low responsiveness. DBAs were seeing locks occurring on seemingly normal tables. In particular, looking at Innotop, it seemed that INSERTs were causing the locks.

In both cases, tables were InnoDB. In both cases, there was a PRIMARY KEY on the combination of all 5 columns. And in both cases, there was no clear explanation as for why the PRIMARY KEY was chosen as such.

Choosing a proper PRIMARY KEY

Especially with InnoDB, which uses a clustered index structure, the PRIMARY KEY is of particular importance. Besides the fact that a bloated PRIMARY KEY bloats the entire clustered index and all secondary keys (see: The depth of an index: primer), it is also a source of locks. It’s true that any UNIQUE KEY can serve as a PRIMARY KEY, but not all such keys are good candidates.

Reducing the locks

In both described cases, the solution was to add an AUTO_INCREMENT column to serve as the PRIMARY KEY, and to keep the 5-column combination under a secondary UNIQUE KEY. The impact was immediate: no further locks were detected on those tables, and query responsiveness became very good.
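The resulting schema pattern can be illustrated as follows. Table and column names here are made up; in MySQL the actual change would be an ALTER TABLE dropping the wide PRIMARY KEY, adding an AUTO_INCREMENT id as the new one, and adding a UNIQUE KEY over the old columns. A self-contained SQLite session showing the shape of the result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Narrow surrogate PRIMARY KEY; the former 5-column key is demoted to a
# secondary UNIQUE constraint, so the uniqueness guarantee is preserved.
conn.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        a TEXT, b TEXT, c TEXT, d TEXT, e TEXT,
        UNIQUE (a, b, c, d, e)
    )""")
conn.execute("INSERT INTO events (a, b, c, d, e) VALUES ('1','2','3','4','5')")
new_pk = conn.execute("SELECT id FROM events").fetchone()[0]
```

New rows now cluster on the short, monotonically increasing id, rather than on the wide composite key.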

]]>
https://shlomi-noach.github.io/blog/mysql/reducing-locks-by-narrowing-primary-key/feed 7 1269