Limiting table disk quota in MySQL

Question asked by a student: is there a way to limit a table’s quota on disk? Say, limit a table to 2GB, after which it will refuse to grow? Note that the requirement is that rows are never DELETEd. The table must simply refuse to be updated once it reaches a certain size.

There is no built-in way to limit a table’s quota on disk. The first thing to observe is that MySQL itself has nothing to do with this: it is entirely up to the storage engine to provide such functionality, since the storage engine is the one handling data storage, i.e. how table data and keys are stored on disk. Just consider the differences between MyISAM’s .MYD & .MYI files, InnoDB’s shared tablespace ibdata1, and InnoDB’s file-per-table .ibd files.

The only engine I know of that has a quota is the MEMORY engine: it respects max_heap_table_size, which limits the size of a single table in memory. Hrmmm… In memory…
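
To see that quota in action (a quick sketch of my own; the table name and size are arbitrary):

SET SESSION max_heap_table_size = 16 * 1024 * 1024;  -- 16MB cap
CREATE TABLE mem_quota_test (id INT, padding VARCHAR(255)) ENGINE=MEMORY;
-- Repeated INSERTs eventually fail with:
-- ERROR 1114 (HY000): The table 'mem_quota_test' is full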

Why limit?

I’m not yet aware of the specific requirements in this case, but this is not the first time I’ve heard the question.

The fact is: when MySQL runs out of disk space, it goes down with a BOOM. It crashes ungracefully, with binary logs and replication left out of sync. To date, in the cases I’ve seen, InnoDB merely crashes and manages to recover once disk space is salvaged, but I am not certain this is guaranteed to be the case. Anyone?

And, with MyISAM…, who knows?

Rule #1 of MySQL disk usage: don’t run out of disk space.

Workarounds

I can think of two workarounds, neither of which is pretty. The first involves triggers (a few variations on this one, actually); the second involves privileges. A taste of the trigger flavor follows.
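
Here is a minimal sketch of the trigger-based direction (my own illustration, not necessarily the post’s exact variation; the schema and table names are hypothetical, and SIGNAL assumes MySQL 5.5+):

DELIMITER $$
CREATE TRIGGER limited_table_quota BEFORE INSERT ON limited_table
FOR EACH ROW
BEGIN
  DECLARE table_bytes BIGINT;
  -- information_schema statistics are approximate for InnoDB
  SELECT data_length + index_length INTO table_bytes
    FROM information_schema.TABLES
    WHERE table_schema = 'my_schema' AND table_name = 'limited_table';
  IF table_bytes >= 2*1024*1024*1024 THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'Quota exceeded on limited_table; refusing INSERT';
  END IF;
END $$
DELIMITER ;

Continue reading » “Limiting table disk quota in MySQL”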

Generating Google line charts with SQL, part II

This post continues Generating Google line charts with SQL, part I, in pursuit of generating time-series-based image charts.

We ended last post with the following chart:


http://chart.apis.google.com/chart?cht=lc&chs=400x200&chtt=SQL%20chart&chxt=x,y&chxr=1,-4716.6,5340.0&chd=s:dddddddddeeeeeefffffffffeeeedddcccbbaaZZZYYYXXXXXXXXXYYYZZabbcdeefghhijkkllmmmmmmmmllkkjihgfedcbZYXWVUTSRRQQPPPPQQQRSTUVWXZacdfgijlmnpqrssttuuuttssrqonmkigfdbZXVTSQONMLKJIIIIIIJKLMOPRTVXZbegilnprtvwyz01111110zyxvtrpnkifcaXUSPNLJHFECBBAAABBCEFHJLNQTWZcfilortwy1346789999876420yvspmjfcYVSOL

which has a nice curve and a proper y-legend, but an incorrect x-legend, and neither ticks nor grids.

To date, Google Image Charts do not support time-series charts. We can’t just throw timestamp values at them and expect the chart to position the points properly. We need to work these out by hand.

This is not easily done; if our input consists of evenly spread timestamp values, we are in a reasonable position. If not, what do we do?

There are several solutions to this:

  • We can present whatever points we have on the chart, making sure to position them properly. This makes for an uneven distribution of ticks on the x-axis, and is not pleasant to look at.
  • We can extrapolate values for round hours (or other round timestamp resolutions), and so show evenly spread timestamps. I don’t like this solution one bit, since we’re essentially inventing values here. Extrapolation is fine when you know you have smooth curves, but not when you’re doing database monitoring, for example: you must have the precise values.
  • We can oversample, then aggregate the measurements that fall within round timestamp resolutions. For example, we can take a measurement every 2 minutes, yet present only 6 measurements per hour, each averaging the samples within 10 round minutes. This is the approach I take with mycheckpoint; see the sketch just below.
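
A minimal illustration of such grouping (my own sketch; the samples(ts, value) table is hypothetical):

-- Round each sample's timestamp down to its 10-minute boundary,
-- then average the values within each bucket:
SELECT
  FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(ts) / 600) * 600) AS ts_rounded,
  AVG(value) AS avg_value
FROM samples
GROUP BY ts_rounded
ORDER BY ts_rounded;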

The last approach goes even further: what if we missed 30 minutes of sampling? Say the server was down. We then need to “invent” the missing timestamps. Note that we invent the timestamps only; we do not invent values. We present the chart with missing values at our invented timestamps.

I may show how to do this in a future post. Meanwhile, let’s simplify and assume our values are evenly spread. Continue reading » “Generating Google line charts with SQL, part II”

Upgrading passwords from old_passwords to “new passwords”

You have old_passwords=1 in your my.cnf. I’m guessing this is because you used one of the my-small.cnf, my-large.cnf etc. templates provided with your MySQL distribution.

These files could easily win the “most outdated sample configuration file” contest.

Usually it’s no big deal: if some parameter isn’t right, you just go and change it. Some variables, though, have a long-lasting effect, and are not easily reversed.

What’s the deal with old_passwords?

No one should be using this anymore. The variable makes the password hashing algorithm compatible with that of MySQL 4.0. I’m pretty sure 4.0 was released some 9 years ago, and I don’t know of anyone still using it (or 4.0 client libraries).

The deal is this: with old_passwords you get a 16-hexadecimal-digit (64-bit) hash of your passwords. With so-called “new passwords” you get 40 hexadecimal digits (plus an extra leading “*”). So this is about stronger hashing of your passwords. Read more in the manual.
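
You can see the difference in shape for yourself (a quick sketch; use a throwaway password):

SELECT OLD_PASSWORD('some_password');  -- 16 hex digits
SELECT PASSWORD('some_password');      -- '*' followed by 40 hex digits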

How do I upgrade to new password format?

You can’t just comment out the “old_passwords=1” entry in the configuration file. If you do, the next client to connect will attempt to match a 41-character hashed password against your existing 16-character entry in the mysql.user table. You need to make a simultaneous change: both remove the old_passwords entry and set a new password. You must know all accounts’ passwords before you begin.
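
A hedged sketch of upgrading a single account (user name, host and password are placeholders):

SET SESSION old_passwords = 0;
SET PASSWORD FOR 'app_user'@'%' = PASSWORD('the_known_password');
-- New-format hashes are 41 characters, beginning with '*':
SELECT user, host, LENGTH(password) FROM mysql.user WHERE user = 'app_user';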

Continue reading » “Upgrading passwords from old_passwords to “new passwords””

Upgrading to Barracuda & getting rid of huge ibdata1 file

Some of this is old stuff, but more people are now converting to the InnoDB plugin, so as to enjoy table compression and performance boosts. The same holds for people converting to Percona’s XtraDB. The InnoDB plugin’s Barracuda file format (required for compression) calls for innodb_file_per_table: no more shared tablespace file.

So your ibdata1 file is some 150GB, and it won’t shrink. Really, it won’t shrink. You set innodb_file_per_table=1, run ALTER TABLE t ENGINE=InnoDB (optionally with ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8), and you get all your tables in file-per-table .ibd files.
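
For a single table, the conversion amounts to something like this sketch (assuming the InnoDB plugin, where both variables are dynamic; older setups need them in my.cnf and a restart):

SET GLOBAL innodb_file_per_table = 1;
SET GLOBAL innodb_file_format = 'Barracuda';
-- Rebuilds the table into its own .ibd file, compressed:
ALTER TABLE t ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;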

But the original ibdata1 file is still there. It has to be there, don’t delete it! It contains more than your old data.

InnoDB tablespace files never shrink; it’s a long-standing annoyance. The only way around it, if you need the space, is to completely drop them and start afresh. That’s one of the nice things about file-per-table: an ALTER TABLE actually creates a new tablespace file and drops the original one.

The procedure

The procedure is somewhat painful:

On generating unique IDs using LAST_INSERT_ID() and other tools

There’s a trick for using LAST_INSERT_ID() to generate sequences in MySQL. Quoting from the Manual:

  1. Create a table to hold the sequence counter and initialize it:
    mysql> CREATE TABLE sequence (id INT NOT NULL);
    mysql> INSERT INTO sequence VALUES (0);
    
  2. Use the table to generate sequence numbers like this:
    mysql> UPDATE sequence SET id=LAST_INSERT_ID(id+1);
    mysql> SELECT LAST_INSERT_ID();
    

This trick is asking for trouble.
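
To see why, walk through two concurrent sessions (a sketch; assumes the sequence table is InnoDB and transactions are used):

-- Session A:
BEGIN;
UPDATE sequence SET id=LAST_INSERT_ID(id+1);  -- locks the single row
-- Session B, meanwhile:
UPDATE sequence SET id=LAST_INSERT_ID(id+1);  -- blocks on A's row lock
-- Session A:
COMMIT;  -- only now does session B proceed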

Contention

A customer was using this trick to generate unique session IDs for his JBoss sessions. These IDs would eventually be written back to the database in the form of log events. Business goes well, and one day the customer adds three new JBoss servers (doubling the number of webapps). All of a sudden, nothing works quite as it used to: all kinds of queries take long seconds to complete; load average becomes very high. Continue reading » “On generating unique IDs using LAST_INSERT_ID() and other tools”

Generating Google line charts with SQL, part I

In this series of posts I wish to show how Google charts can be generated via SQL. We discuss the Google Charts limitations that must be worked around, and build up towards a simple chart.

I’m going to present the algorithm I use in mycheckpoint, a MySQL monitoring utility, which generates Google charts from raw data using views. An example of such a chart follows:


http://chart.apis.google.com/chart?cht=lc&chs=370x180&chts=303030,12&chtt=Latest+24+hours:+Nov+9,+05:50++-++Nov+10,+05:50&chf=c,s,ffffff&chdl=Rentals+rate:+custom_1_psec&chdlp=b&chco=ff8c00&chd=s:GDGKGFLFGMJHRLMPPNULJRPLTOPRUMYPPVRNbQUSUSbSNWUOfSWTObVSUVWSVYVPbTPjfTbRTdXReUWhcTQRQZbTWYVYPaVZXdYYWPTabYUTbW99QLgLNIOIRNNMIKRJEHGFHGJGGFIFDFGDK&chxt=x,y&chxr=1,0,8.720000&chxl=0:|+||08:00||+||12:00||+||16:00||+||20:00||+||00:00||+||04:00||&chxs=0,505050,10,0,lt&chg=4.17,25,1,2,0.69,0&chxp=0,0.69,4.86,9.03,13.20,17.37,21.54,25.71,29.88,34.05,38.22,42.39,46.56,50.73,54.90,59.07,63.24,67.41,71.58,75.75,79.92,84.09,88.26,92.43,96.60&tsstart=2010-11-09+05:50:00&tsstep=600

mycheckpoint does not actually call on Google to do the chart rendering, but invokes its own JavaScript code to render the URL locally.

Here are some downsides to using Google charts:

  • The URL cannot be as long as you like: 2048 characters is an upper bound you’ll want to stay below. [Google charts POST method calls are available, allowing the equivalent of 16K of URL length; this is still not too helpful, due to the nature of POST calls]
  • Features are inconsistent. To specify label or tick positions, one must give exact positions. To specify grid positions, one must supply a step, an offset, etc. There are more such inconsistencies.
  • Google charts are not too friendly. Taking the ticks and grids example above: there really is no reason why grids should not be generated automatically according to the tick definitions, yet we are required to specify positions for the ticks as well as for the grids.
  • There is no support for time series. One must translate time into x-axis values.
  • Perhaps most intimidating to many people: to generate a Google chart, one must send data to Google. This is the main reason I use local JavaScript rendering.
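
As a taste of the SQL involved, here is a sketch (my own) of Google’s “simple encoding”, which maps a value onto the 62-character alphabet used in chd=s: data strings:

-- Encode a single value in [0,100] as one of 62 levels (A-Za-z0-9):
SELECT SUBSTRING(
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
  1 + ROUND(61 * value / 100),
  1) AS encoded_value
FROM (SELECT 42 AS value) AS sample;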

Anyway, let’s build a very simple chart. Since I will not cover everything in this post, we will allow ourselves some relaxed conditions. Continue reading » “Generating Google line charts with SQL, part I”

Multi condition UPDATE query

A simple question I’ve been asked:

Is it possible to merge two UPDATE queries, each on different WHERE conditions, into a single query?

For example, is it possible to merge the following two UPDATE statements into one?

mysql> UPDATE film SET rental_duration=rental_duration+1 WHERE rating = 'G';
Query OK, 178 rows affected (0.01 sec)

mysql> UPDATE film SET rental_rate=rental_rate-0.5 WHERE length < 90;
Query OK, 320 rows affected (0.01 sec)

To verify our tests, we take a checksum:

mysql> pager md5sum
PAGER set to 'md5sum'
mysql> SELECT film_id, title, rental_duration, rental_rate FROM film ORDER BY film_id;
c2d253c3919efaa6d11487b1fd5061f3  -
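
For reference, here is one shape the merged statement can take (a sketch of my own; the full post weighs the alternatives): OR the two conditions together, and guard each assignment with IF():

UPDATE film
SET
  rental_duration = IF(rating = 'G', rental_duration + 1, rental_duration),
  rental_rate     = IF(length < 90, rental_rate - 0.5, rental_rate)
WHERE rating = 'G' OR length < 90;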

Obviously, the following query is incorrect: Continue reading » “Multi condition UPDATE query”

Another use for “top N records per group” query

A few days ago I published SQL: selecting top N records per group. It just dawned on me that the very same query solved another type of problem I was having a couple years ago.

BTW, for reference, Baron Schwartz posted about this 4 years ago. There are very interesting approaches in the text and in the comments. Good to see the ancients already knew of such problems; I should study my history better.

(Kidding, kidding! This is self criticism of course)

In the case I present now, I have a table with recurring data which, to some extent, represents revisions of data. For the sake of simplicity, let’s describe it as a simple simulation of a revision system:

  • I have text files, whose content I store within a row
  • Each text file uses at least one row in the table
  • Text files can be edited, in which case the edited file is written to a new row (rows are never UPDATEd).

Hold your horses: I’m not really implementing a revision system, I just can’t get into the actual details here.

The table becomes large quickly, and it’s desired to purge rows from it. The rule is: we must retain the two most recent versions of each file (where there is indeed more than one); all others can be purged.
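
For modern readers, the purge can be sketched with window functions (my own sketch, assuming MySQL 8.0+ and a hypothetical file_revisions(id, file_id, created_at) table; the post itself pursues a technique needing neither):

-- Rank each file's revisions newest-first, then purge everything
-- beyond the two most recent. The derived-table wrapper lets MySQL
-- delete from the same table the inner query reads:
DELETE FROM file_revisions
WHERE id IN (
  SELECT id FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY file_id
                              ORDER BY created_at DESC) AS rn
    FROM file_revisions
  ) AS ranked
  WHERE rn > 2
);

So the question is: Continue reading » “Another use for “top N records per group” query”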

SQL: selecting top N records per group

A while back I presented an SQL trick to display a non-aggregated column on a GROUP BY query, without use of subqueries or derived tables.

Based on a similar concept, combined with string walking, I now present a query which selects the top N records for each group, ordered by some condition. It requires no subqueries, and executes faster than its more conventional alternatives.

[UPDATE: this is MySQL only. Others can use Window Functions where available]

Using the simple world database, we answer the following question:

What are the top 5 largest (by area) countries for each continent? What are their names, surface area and population?

Similar questions would be:

What were the latest 5 films rented by each customer?

What were the most presented advertisements for each user?

etc.
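
To give away the flavor (a sketch of my own; the post builds the real thing step by step): a per-group ordered GROUP_CONCAT, then SUBSTRING_INDEX to cut out the top N. Using the world database:

-- Top 5 countries by surface area per continent, one comma-separated
-- string per group (mind group_concat_max_len, default 1024 bytes):
SELECT
  Continent,
  SUBSTRING_INDEX(
    GROUP_CONCAT(Name ORDER BY SurfaceArea DESC SEPARATOR ','),
    ',', 5) AS top_5_by_area
FROM Country
GROUP BY Continent;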

Step 1: getting the top

We already know how to get a single column’s value for the top country, as presented in the aforementioned post: Continue reading » “SQL: selecting top N records per group”

Speaking at the O’Reilly MySQL Conference 2011

I’m very pleased and humbled to announce that my submission to the upcoming O’Reilly MySQL Conference, April 2011, has been accepted.

I will present a 45 minute session titled openark-kit: MySQL utilities for everyday use.

In this session, I will present some of the tools in the openark kit. We’ll discuss some limitations of the MySQL server, and how openark kit tools overcome those limitations and provide solutions to common maintenance and audit problems.

This will be a technical session, touching on various aspects of the MySQL server: security, execution plans, replication, triggers and more. I do not intend to discuss all the tools, nor to cover their various options. Instead, I’ll present the “behind the scenes”: show why the tools work, and present common problems and typical use cases.

This will be the first time I present at the MySQL Conference (or at any conference outside Israel, for that matter). I hope to deliver a good session. As an extra measure of safety, I’ll bring along a couple of basketballs; if the sun shines, we can all go outside and have a good time!

The idea to submit this talk (credit: Roland Bouman) gave me the inspiration to put effort into making a new release with new and updated tools. So this talk is already a success as far as I’m concerned.

Hope to see you there!

[PS shameless plug: openark kit.]