Misimproving performance problems with INSERT DELAYED

INSERT DELAYED may come in handy when using MyISAM tables. It can be particularly useful for log tables, where, on the one hand, one must issue frequent INSERTs, but on the other hand one does not usually want or need to wait for the DB's response.
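
For reference, the syntax is simply the DELAYED keyword added to an ordinary INSERT; a minimal sketch, with a hypothetical log table:

-- The client regains control immediately; the row is queued
-- and written later by a dedicated handler thread (hypothetical table):
INSERT DELAYED INTO access_log (accessed_at, page)
VALUES (NOW(), '/index.html');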

It may even offer some performance boost, by aggregating such frequent INSERTs in a single thread.

But it is NOT a performance solution.

Here's a case I've seen: database performance was poor. INSERTs were taking a very long time, and lots of locks were involved. The solution offered was to change all slow INSERTs to INSERT DELAYED. Voilà! All INSERT queries now completed in no time.

But the database performance remained poor. Just as poor as before, with an additional headache: nobody knew what was causing the low performance.

Using INSERT DELAYED to improve overall INSERT performance is like sweeping the dust under the carpet: it's still there, you just can't see it. When your queries are slow to return, you know which queries, or which parts of your application, are the immediate suspects. When everything happens in the background, you lose that immediate visibility.

The slow query log, fortunately, still provides the necessary information, and all the other metrics are just as before. Good. But it now takes a deeper level of analysis to find a problem that was previously in plain sight.

So: use INSERT DELAYED carefully, don’t just throw it at your slow queries like a magic potion.

On restoring a single table from mysqldump

Following Restore one table from an ALL database dump and Restore a Single Table From mysqldump, I would like to add my own thoughts and comments on the subject.

I also wish to note performance issues with the two suggested solutions, and offer improvements.

Problem relevance

While the problem is interesting, I just want to note that it is only relevant for very specific database sizes. Too small, and it doesn't matter how you solve it (e.g. just open vi/emacs and copy+paste). Too big, and it would not be worthwhile to restore from mysqldump anyway. I would suggest the problem is interesting for databases in the range of a few dozen GB of data.

Problem recap

Given a dump file (generated by mysqldump), how do you restore a single table, without making any changes to other tables?

Let’s review the two referenced solutions. I’ll be using the employees db on mysql-sandbox for testing. I’ll choose a very small table to restore: departments (only a few rows in this table).

Security-based solution

Chris suggests creating a special-purpose account, which only has write privileges (CREATE, INSERT, etc.) on the particular table to be restored. A cool hack! But, I'm afraid, not too efficient, for two reasons:
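
For illustration, such an account might be set up like the following sketch (the account name and password are hypothetical; depending on the dump options, LOCK TABLES or ALTER privileges may also be needed):

-- Restricted account with write privileges on a single table only:
GRANT CREATE, DROP, INSERT ON employees.departments
  TO 'restorer'@'localhost' IDENTIFIED BY 'restore_password';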

Performance analysis with mycheckpoint

mycheckpoint (see announcement) allows for both graph presentation and quick SQL access to monitored & analyzed data. I'd like to show the power of combining the two.

InnoDB performance

Let's take a look at one of the most important InnoDB metrics, the read hit ratio (we could get the same graph by looking at the HTML report):

SELECT innodb_read_hit_percent FROM sv_report_chart_sample \G
*************************** 1. row ***************************
innodb_read_hit_percent: http://chart.apis.google.com/chart?cht=lc&chs=400x200&chts=303030,12&chtt=Nov+10,+11:40++-++Nov+11,+08:55+(0+days,+21+hours)&chdl=innodb_read_hit_percent&chdlp=b&chco=ff8c00&chd=s:400664366P6674y7176677677u467773y64ux166666764366646y616666666666644444434444s6u4S331444404433341334433646777666666074736777r1777767764776666F667777617777777777777777yaRi776776mlf667676xgx776766rou67767777u37797777x76676776u6A737464y67467761777666643u66446&chxt=x,y&chxr=1,99.60,100.00&chxl=0:||Nov+10,+15:55|Nov+10,+20:10|Nov+11,+00:25|Nov+11,+04:40|&chxs=0,505050,10

We see that the read hit ratio is usually high, but occasionally drops low, down to 99.7% or even 99.6%. It seems like most of the time we are above a 99.95% read hit ratio, but it's hard to tell whether we're above 99.98%.

Can we know for sure?

We can strain our eyes, yet be certain of little. It's best if we just query for the metrics! mycheckpoint provides all the data, accessible by simple SQL queries:
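
For example (a sketch; I'm assuming a per-sample report view named sv_report_sample, exposing the same innodb_read_hit_percent column as the chart view queried above):

-- Get hard numbers instead of eyeballing the graph:
SELECT
  MIN(innodb_read_hit_percent) AS worst_read_hit_percent,
  AVG(innodb_read_hit_percent) AS avg_read_hit_percent
FROM sv_report_sample;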

How to calculate a good InnoDB log file size – recap

Following Baron Schwartz's post How to calculate a good InnoDB log file size, which shows how to make an estimate for the InnoDB log file size, and building on SQL: querying for status difference over time, I've written a query for MySQL 5.1 which samples 60 seconds of status and estimates the InnoDB transaction log bytes expected to be written in one hour.

Recap: this information can be useful if you're looking for a good innodb_log_file_size value: one that will not impose too much I/O (smaller values make for more frequent flushes), nor make for too long a recovery time (larger values mean more transactions to recover upon crash).

It is assumed that the 60-second period represents average system load, not some activity spike. Edit the sleep time and factors as you will to sample longer or shorter periods.
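
A minimal sketch of the idea on MySQL 5.1 (not necessarily the author's exact query), using the INFORMATION_SCHEMA.GLOBAL_STATUS table:

-- Sample Innodb_os_log_written, wait 60 seconds, then extrapolate to one hour:
SELECT @log_written_start := VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Innodb_os_log_written';

SELECT SLEEP(60);

SELECT (VARIABLE_VALUE - @log_written_start) * 60
  AS estimated_log_bytes_written_per_hour
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Innodb_os_log_written';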

High Performance MySQL – a book to re-read

I first read High Performance MySQL, 2nd edition, about a year ago, when it first came out. I have since re-read a few pages on occasion.

In my previous posts I’ve suggested ways to improve upon the common ranking solution. Very innovative stuff! Or… so I thought.

I happened to browse through the book today, and a section on User Variables caught my eye. "Let's see if I can get some insight," I thought to myself. Imagine my surprise when I realized almost everything I've suggested is discussed in this modest section, black on white, having sat on my bookshelf for over a year!

I read it a year back, forgot all about it, and re-invented stuff that had already been solved and discussed… Oh, for more brain capacity…

To be honest, this has happened to me more than once in the past few months. I've gotten into the habit of browsing the web when looking for answers to my problems, and I forget that this book contains the answers to so many common, practical MySQL problems, and does so in a very direct and helpful manner.

So, yet again, thumbs up to High Performance MySQL. Really a must-read. Get it if you haven't already!

InnoDB is dead. Long live InnoDB!

I find myself converting more and more customers' databases to the InnoDB plugin. In one case, it was a last resort: disk space was running out, and the plugin's compression released 75% of the space; in another, a slow disk made for I/O bottlenecks, and the plugin's improvements and compression alleviated the problem; in yet another, I used the above to fight replication lag on a stubborn slave.

In all those cases, I needed to justify the move to “new technology”. The questions “Is it GA? Is it stable?” get asked a lot. Well, just a few days ago the MySQL 5.1 distribution started shipping with InnoDB Plugin 1.0.4. That lends some weight to the stability argument when facing a doubtful customer.

But I realized that wasn’t the point.


SQL: finding a user’s country/region based on IP

I’ve encountered the same problem twice for different customers, so I guess it’s worth a discussion.

A common task for web applications is to find out the country/region of a user, based on her IP address, as detected in the HTTP request. Depending on the country of origin, the website can present dates in the user's local time zone, adjust locale settings, and, perhaps most commonly, show advertisements in her native language.

To start with, there’s a table which lists the IP ranges per country/region. Let’s assume we’re only dealing with IPv4:

CREATE TABLE regions_ip_range (
  regions_ip_range_id INT UNSIGNED AUTO_INCREMENT,
  country VARCHAR(64) CHARSET utf8,
  region VARCHAR(64) CHARSET utf8,
  start_ip INT UNSIGNED,
  end_ip INT UNSIGNED,
  ...
  PRIMARY KEY(regions_ip_range_id),
  ...
);
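
The start_ip and end_ip columns hold IPv4 addresses as 32-bit unsigned integers; MySQL's built-in INET_ATON() and INET_NTOA() functions convert between dotted notation and that representation:

SELECT INET_ATON('192.0.2.1');  -- returns 3221225985
SELECT INET_NTOA(3221225985);   -- returns '192.0.2.1'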

The table is fixed, and is populated. Now the question arises: how do we query this table, and which indexes should be created?

The wrong way

The form I’ve encountered is as follows: an index is declared on regions_ip_range:

KEY ip_range_idx (start_ip, end_ip)

And the query goes like this:

SELECT * FROM regions_ip_range
WHERE my_ip BETWEEN start_ip AND end_ip

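A common remedy (a sketch of the general technique; not necessarily the exact solution the full post arrives at) exploits the fact that IP ranges do not overlap, so an index on start_ip alone suffices:

-- Find the single candidate range starting at or below my_ip,
-- then verify the IP does not fall past its end:
SELECT * FROM regions_ip_range
WHERE start_ip <= my_ip
ORDER BY start_ip DESC
LIMIT 1;
-- (verify end_ip >= my_ip in the application, or in an outer query)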

Reasons to use innodb_file_per_table

When working with InnoDB, you have two ways to manage the tablespace storage:

  1. Throw everything in one big file (optionally split).
  2. Have one file per table.

I will discuss the advantages and disadvantages of the two options, and will strive to convince that innodb_file_per_table is preferable.
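
For reference, the option is typically enabled in my.cnf with innodb_file_per_table = 1; with the InnoDB plugin it can also be changed at runtime (a sketch; note it only affects tables created, or rebuilt, after it is set):

-- Dynamic with the InnoDB plugin; existing tables keep their current
-- location until rebuilt, e.g. with ALTER TABLE t ENGINE=InnoDB:
SET GLOBAL innodb_file_per_table = 1;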

A single tablespace

Having everything in one big file means all tables and indexes, from all schemas, are ‘mixed’ together in that file.

This allows for the following nice property: free space can be shared between different tables and different schemas. Thus, if I purge many rows from my log table, the now unused space can be occupied by new rows of any other table.

This same nice property also translates to a not so nice one: data can be greatly fragmented across the tablespace.

An annoying property of InnoDB’s tablespaces is that they never shrink. So after purging those rows from the log table, the tablespace file (usually ibdata1) still keeps the same storage. It does not release storage to the file system.

I’ve seen more than once how certain tables are left unwatched, growing until disk space reaches 90% and SMS notifications start beeping all around.