Triggers – code.openark.org
Blog by Shlomi Noach

Limiting table disk quota in MySQL
Mon, 07 Mar 2011

Question asked by a student: is there a way to limit a table’s quota on disk? Say, limit a table to 2GB, after which it will refuse to grow? Note that the requirement is that rows are never DELETEd. The table must simply refuse to be updated once it reaches a certain size.

There is no built-in way to limit a table’s quota on disk. The first thing to observe is that MySQL itself has nothing to do with this. It is entirely up to the storage engine to provide such functionality, since the storage engine is the one that handles data storage: how table data and keys are stored on disk. Just consider the difference between MyISAM’s .MYD & .MYI files, InnoDB’s shared tablespace ibdata1, and InnoDB’s file-per-table .ibd files.

The only engine I know of that has a quota is the MEMORY engine: it accepts the max_heap_table_size, which limits the size of a single table in memory. Hrmmm… In memory…

Why limit?

I’m not as yet aware of the specific requirements of said company, but this is not the first time I’ve heard this question.

The fact is: when MySQL runs out of disk space, it goes with a BOOM. It crashes ungracefully, with binary logs out of sync and replication out of sync. To date, in the cases I’ve seen, InnoDB merely crashes and manages to recover once disk space is salvaged, but I am not certain this is guaranteed to be the case. Anyone?

And, with MyISAM…, who knows?

Rule #1 of MySQL disk usage: don’t run out of disk space.

Workarounds

I can think of two workarounds, neither of which is pretty. The first involves triggers (actually, a few variations of this one); the second involves privileges.

Triggers

The following code (first presented in Triggers Use Case Compilation, Part II) assumes the DATA_LENGTH and INDEX_LENGTH values in INFORMATION_SCHEMA to be good indicators:

DROP TABLE IF EXISTS `world`.`logs`;
CREATE TABLE  `world`.`logs` (
  `logs_id` int(11) NOT NULL auto_increment,
  `ts` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `message` varchar(255) character set utf8 NOT NULL,
  PRIMARY KEY  (`logs_id`)
) ENGINE=MyISAM;

DELIMITER $$

DROP TRIGGER IF EXISTS logs_bi $$
CREATE TRIGGER logs_bi BEFORE INSERT ON logs
FOR EACH ROW
BEGIN
  SELECT DATA_LENGTH+INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA='world' AND TABLE_NAME='logs'
    INTO @estimated_table_size;
  IF (@estimated_table_size > 25*1024) THEN
    SELECT 0 FROM `logs table is full` INTO @error;
  END IF;
END $$

DELIMITER ;

Or, you could write your own UDF, e.g. get_table_file_size(fully_qualified_table_name), and be more accurate:

DELIMITER $$

DROP TRIGGER IF EXISTS logs_bi $$
CREATE TRIGGER logs_bi BEFORE INSERT ON logs
FOR EACH ROW
BEGIN
  SELECT get_table_file_size('world.logs') INTO @table_size;
  IF (@table_size > 25*1024) THEN
    SELECT 0 FROM `logs table is full` INTO @error;
  END IF;
END $$

DELIMITER ;

(The same should be done for UPDATE operations.)

In both workarounds above, triggers are pre-defined. But triggers are performance-killers.

How about preventing writes to the table only when it’s truly on the edge? A simple shell script, spawned by a cronjob, could do this well: get the file size of the specific table, and test whether it’s larger than n bytes. If not, the script exits. If the file is indeed too large, the script invokes the following on mysql:

DELIMITER $$

DROP TRIGGER IF EXISTS logs_bi $$
CREATE TRIGGER logs_bi BEFORE INSERT ON logs
FOR EACH ROW
BEGIN
  SELECT 0 FROM `logs table is full` INTO @error;
END $$

DELIMITER ;
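
The cron-spawned check itself might look like the following sketch. The data file path, table and 2GB limit are hypothetical, the stat -c%s invocation assumes GNU coreutils (on BSD/macOS it would be stat -f%z), and the SQL file is assumed to contain the error-raising trigger shown above.

```shell
#!/bin/sh
# Sketch of the cron-spawned quota check. Paths, table and limit are
# hypothetical; assumes a MyISAM table stored in a single .MYD data file.

# Print "yes" if the file's size exceeds the byte limit, "no" otherwise.
quota_exceeded() {
  size=$(stat -c%s "$1" 2>/dev/null || echo 0)   # GNU stat; BSD: stat -f%z
  if [ "$size" -gt "$2" ]; then echo yes; else echo no; fi
}

TABLE_FILE="/var/lib/mysql/world/logs.MYD"
MAX_BYTES=$((2 * 1024 * 1024 * 1024))   # 2GB quota

if [ "$(quota_exceeded "$TABLE_FILE" "$MAX_BYTES")" = "yes" ]; then
  # Install the error-raising trigger; the SQL file is assumed to hold
  # the CREATE TRIGGER statement shown above.
  mysql world < /path/to/create_logs_full_trigger.sql
fi
```

A matching branch could DROP the trigger again once disk space is reclaimed.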

So, most of the time, there is no trigger. Only when the external script detects that the table is too large does it create a trigger. The trigger has no logic: it simply raises an error. (PS: in MySQL 5.5 you can raise a proper error with SIGNAL.)
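
For reference, here is what that error-raising trigger could look like with MySQL 5.5’s SIGNAL statement, a sketch assuming the same logs table:

```sql
DELIMITER $$

DROP TRIGGER IF EXISTS logs_bi $$
CREATE TRIGGER logs_bi BEFORE INSERT ON logs
FOR EACH ROW
BEGIN
  -- SIGNAL raises a proper error; no need for the missing-table hack.
  SIGNAL SQLSTATE '45000'
    SET MESSAGE_TEXT = 'logs table is full';
END $$

DELIMITER ;
```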

Privileges

Another way to work around the problem is to use security features. Instead of creating a trigger on the table, REVOKE the INSERT & UPDATE privileges from the appropriate user on that table.

This may turn out to be a difficult task, since MySQL has no notion of subtractive, fine-grained privileges: you cannot revoke at the table level a privilege that was granted at the database level. That is, suppose we have:

GRANT INSERT, UPDATE, DELETE, SELECT ON mydb.* TO 'webuser'@'%.webdomain'

If we just do:

REVOKE INSERT ON mydb.logs FROM 'webuser'@'%.webdomain'

We get:

There is no such grant defined for user 'webuser' on host '%.webdomain' on table 'logs'.

So this requires setting up privileges at the table level in the first place. Plus, note that as long as the database-level grants allow INSERTs, you cannot override that at the table level.
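
A sketch of how the grants would have to be structured from the outset (the user, host, and mydb.other_table are hypothetical):

```sql
-- Grant per table from the outset, rather than on mydb.*:
GRANT INSERT, UPDATE, DELETE, SELECT ON mydb.logs TO 'webuser'@'%.webdomain';
GRANT INSERT, UPDATE, DELETE, SELECT ON mydb.other_table TO 'webuser'@'%.webdomain';

-- When the quota check fires, freeze writes to the one table:
REVOKE INSERT, UPDATE ON mydb.logs FROM 'webuser'@'%.webdomain';

-- And once space is reclaimed, restore them:
GRANT INSERT, UPDATE ON mydb.logs TO 'webuser'@'%.webdomain';
```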

Other ideas?

I never actually implemented table disk quota. I’m not sure this is a viable solution; but I haven’t heard all the arguments in favor as yet, so I don’t want to rule this out.

Please share below if you are using other means of table size control, other than the trivial cleanup of old records.

Thoughts and ideas for Online Schema Change
Thu, 07 Oct 2010

Here’s a few thoughts on the current status and further possibilities for Facebook’s Online Schema Change (OSC) tool. I’ve had these thoughts for months now, pondering over improving oak-online-alter-table, but haven’t gotten around to implementing them, nor even to writing them down. Better late than never.

The tool has some limitations. Some cannot be lifted, some could. Quoting from the announcement and looking at the code, I add a few comments. I conclude with a general opinion on the tool’s abilities.

“The original table must have PK. Otherwise an error is returned.”

This restriction could be lifted: it’s enough that the table has a UNIQUE KEY. My original oak-online-alter-table handled that particular case. As far as I see from their code, the Facebook code would work just as well with any unique key.

However, this restriction is of no real interest. As we’re mostly interested in InnoDB tables, and since any InnoDB table should have a PRIMARY KEY, we shouldn’t care too much.

“No foreign keys should exist. Otherwise an error is returned.”

Tricky stuff. With oak-online-alter-table, changes to the original table were immediately reflected in the ghost table. With InnoDB tables, that meant same transaction. And although I never got to update the text and code, there shouldn’t be a reason for not using child-side foreign keys (the child-side is the table on which the FK constraint is defined).

The Facebook patch works differently: it captures changes and writes them to a delta table, to be later (asynchronously) analyzed and replayed on the ghost table.

So in the Facebook code, some cases will lead to undesired behavior. Consider two tables, country and city, with city holding a RESTRICT/NO ACTION foreign key on country‘s id. Now consider the scenario:

  1. Rows from city are DELETEd, where the country ID is Spain’s.
    • city‘s ghost table is still unaffected; Spain’s cities are still there.
    • A change is written to the delta table to mark these rows for deletion.
  2. A DELETE is issued on country‘s Spain record.
    • The DELETE should work, from the user’s perspective.
    • But it will fail: city’s ghost table has not received the changes yet. There are still matching rows, and the NO ACTION constraint will fail the DELETE statement.

Now, this does not lead to corruption, just to seemingly unreasonable behavior on the database part. This behavior is probably undesired. NO ACTION constraint won’t do.

However, with CASCADE or SET NULL options, there is less of an issue: operations on the parent table (e.g. country) cannot fail. We must make sure operations on the ghost table make it consistent with the original table (e.g. city).

Consider the following scenario:

  1. A new country is created, called “Sleepyland”. An INSERT is made to country.
    • Both city and city‘s ghost are immediately aware of it.
  2. A new town is created and INSERTed to city. The town is called “Naphaven”.
    • The change takes time to propagate to city‘s ghost table.
  3. Meanwhile, we realized we made a mistake. We’ve been had. There’s no such city nor country.
    1. We DELETE “Naphaven” from city.
    2. We DELETE “Sleepyland” from country.
    • Note that city‘s ghost table still hasn’t caught up with the changes.
  4. Eventually, the INSERT statement for “Naphaven” reaches city‘s ghost table.
    • What should happen now? The INSERT cannot succeed.
    • Will this fail the entire process?

Looking at the PHP code, I see that changes written on the delta table are blindly replayed on the ghost table.

Since the process is asynchronous, this should not be the case. We can solve the above by using INSERT IGNORE instead of INSERT. The statement will fail silently, without failing anything else. The row will simply not exist on the ghost table, just as the original row no longer exists.

Unlike a replication corruption, this does not lead to accumulated mistakes. The replay is static, somewhat like the binary log format: changes are just written, regardless of existing data.

I have given this considerable thought, and I can’t say I’ve covered all the possible scenarios. However, I believe that with proper use of INSERT IGNORE and REPLACE INTO (two statements I heavily relied on in oak-online-alter-table), correctness can be achieved.
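
To illustrate the idea (a toy sketch, not the OSC code itself), here is a replay in Python using SQLite, whose INSERT OR IGNORE and REPLACE are analogous to MySQL’s INSERT IGNORE and REPLACE INTO. A replayed INSERT for a row the ghost table already holds fails silently instead of aborting the replay, and REPLACE covers the overwrite case:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ghost (id INTEGER PRIMARY KEY, name TEXT)")

# The bulk copy already brought this row into the ghost table:
conn.execute("INSERT INTO ghost (id, name) VALUES (1, 'Amsterdam')")

# A captured delta row is now replayed. A plain INSERT would raise an
# IntegrityError (duplicate key); INSERT OR IGNORE fails silently:
conn.execute("INSERT OR IGNORE INTO ghost (id, name) VALUES (1, 'Amsterdam')")

# REPLACE handles the UPDATE-like case: overwrite whatever is there.
conn.execute("REPLACE INTO ghost (id, name) VALUES (1, 'Amsterdam-renamed')")

rows = conn.execute("SELECT id, name FROM ghost").fetchall()
print(rows)  # [(1, 'Amsterdam-renamed')]
```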

There’s the small pain of re-generating the foreign key definition on the “ghost” table (CREATE TABLE LIKE … does not copy FK definitions). And since foreign key names are unique, a new name must be picked. Not pretty, but perfectly doable.

“No AFTER_{INSERT/UPDATE/DELETE} triggers must exist.”

It would be nicer if MySQL had an ALTER TRIGGER statement. There is no such statement. If there were such an atomic statement, then we would be able to rewrite the trigger so as to add our own code to the end of the trigger’s code. Yuck. It would be even nicer if we were allowed to have multiple triggers for the same event.

So, we are left with DROP and CREATE triggers. Alas, this makes for a short period during which the trigger does not exist. Bad. The easy solution would be to lock the table with LOCK TABLES … WRITE, but apparently you can’t DROP the trigger (*) while the table is locked. Sigh.

(*) Happened to me, and apparently to Facebook too. With the latest 5.1 version (5.1.51) this actually works; with 5.0 it didn’t use to. This needs more checking.

Use of INFORMATION_SCHEMA

As with oak-online-alter-table, OSC checks for triggers, indexes and columns by searching the INFORMATION_SCHEMA tables. This makes for nice SQL for getting the exact listing and types of PRIMARY KEY columns, whether or not AFTER triggers exist, and so on.

I’ve always considered this to be the weak part of openark-kit, that it relies on INFORMATION_SCHEMA so much. It’s easier, it’s cleaner, it’s arguably more correct to work that way, but it just takes too many locks. I think Baron Schwartz (and now Daniel Nichter) did amazing work in Maatkit, analyzing table schemata by parsing the output of SHOW CREATE TABLE and other SHOW commands with regular expressions. It’s crazy work! Had I written openark-kit in Perl, I would have just imported their code. But I’m too busy to do the conversion from Perl to Python and rewrite that code, what with all the debugging.
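
As a flavor of the SHOW-parsing approach (a toy sketch of my own, not Maatkit’s actual code), one can extract the PRIMARY KEY columns from SHOW CREATE TABLE output with a regular expression:

```python
import re

# Sample SHOW CREATE TABLE output (the second column of the result set):
show_create = """CREATE TABLE `City` (
  `ID` int(11) NOT NULL auto_increment,
  `Name` char(35) NOT NULL default '',
  `CountryCode` char(3) NOT NULL default '',
  PRIMARY KEY  (`ID`),
  KEY `CountryCode` (`CountryCode`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1"""

def primary_key_columns(create_statement):
    """Return the list of PRIMARY KEY column names, or [] if there is none."""
    match = re.search(r"PRIMARY KEY\s*\(([^)]+)\)", create_statement)
    if not match:
        return []
    return [col.strip(" `") for col in match.group(1).split(",")]

print(primary_key_columns(show_create))  # ['ID']
```

No INFORMATION_SCHEMA query, hence no metadata locks taken on the server side.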

OSC is written in PHP. Again, much conversion work. I think performance-wise this is an important step to make.

A word for the critics

Finally, a word for the critics. I’ve read some Facebook/MySQL bashing comments and wish to relate.

In his interview with The Register, Mark Callaghan gave the example that “Open Schema Change lets the company update indexes without user downtime, according to Callaghan”.

PostgreSQL was mentioned for being able to add index with only read locks taken, or being able to do the work with no locks using CREATE INDEX CONCURRENTLY. I wish MySQL had that feature! Yes, MySQL has a lot to improve upon, and the latest PostgreSQL 9.0 brings valuable new features. (Did I make it clear I have no intention of bashing PostgreSQL? If not, please re-read this paragraph until convinced).

Some of the bashing related to the notion that MySQL is so poor that Facebook had to use an even poorer mechanism to work around ALTER TABLE.

Well, allow me to add a few words: CREATE INDEX is far from the only thing you can achieve with OSC (although it may be Facebook’s major concern). You should be able to:

  • Add columns
  • Drop columns
  • Convert character sets
  • Modify column types
  • Add partitioning
  • Reorganize partitioning
  • Compress the table
  • Otherwise change the table format
  • Heck, you could even change the storage engine! (To another transactional engine)

These are giant steps. How easy would it be to build these into the database server itself? Here, it only takes a few weeks’ time to work out a working solution with reasonable limitations, just using the resources the MySQL server provides you with. The MySQL@Facebook team should be given credit for that.

Triggers Use Case Compilation, Part III
Mon, 02 Feb 2009

The previous two parts looked at some solutions offered by triggers. Let’s now look at some wishful trigger solutions, which are currently unavailable because of trigger limitations.

Triggers Use Case Compilation, Part I

Triggers Use Case Compilation, Part II

Limitations and wishful features

Triggers are slow

The overhead of adding triggers is often a deal breaker. But I would like to believe speed will improve in time!

Triggers cannot act on the same table which activated them.

A thing I would like to have is a rotating table. A log table is a perfect example: I only want to store logs up to 7 days back, or up to 1M rows. ON INSERT (or once every 1000 inserts or so), I wish to remove the oldest rows. This is not possible today, since I can’t DELETE rows from the same table which caused the ON INSERT trigger to run. Nor can it be hacked by calling on another table and then doing a circular trigger trick: MySQL will raise an error at run time, complaining about a loop.

Triggers cannot act on system tables.

Now why would I want to do that? Well, one of the first things I look at when reviewing a database is the user grants. I always find a list of users which is just too permissive, with far more users than required. I once came upon a database with 273 users, where only 5 of them were actually in use. “When were these added?”, I asked, but nobody knew.

I would love to have an ON INSERT and an ON UPDATE trigger on the mysql.user table, which logs the time of user creation and the invoking user (who would usually be ‘root’) and host, so it’s easier to track down who did what.

You cannot execute prepared statements from within a trigger.

Not much to add here. The possibilities are too many.

You can’t spawn an ANALYZE TABLE from a trigger

What I want to do is run an ANALYZE TABLE once every 10K inserts or deletes, so the table takes care of itself. I’ve tried hacking this with prepared statements (you can’t use them), with cursors (you can only run a cursor on SELECT queries), and otherwise with SQL hacks (none worked). If anyone finds a hack around it, please let me know!

You can’t have more than one trigger on the same event per table

This is more of a design issue. If I want two things to happen BEFORE INSERT on City, I need to code both in the same trigger. This means adding functionality involves editing existing, tested, working code. It would be much nicer if two such triggers could play along.

A dirty workaround to problematic issues

There is a dirty workaround to some issues.

Take, for example, the rotating tables problem. Instead of the trigger executing the following query:

DELETE FROM logs WHERE time < DATE_ADD(NOW(), INTERVAL -7 DAY)

(as we’ve already noted is impossible), the trigger can write the query down as TEXT into some queries_to_run table. A cronjob can periodically check this table and execute whatever is in it, removing executed rows.

MySQL 5.1’s event scheduler can also be used for statements which are directly invocable (like said DELETE).
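
A sketch of both variants (the queries_to_run layout and the event name are my own, hypothetical):

```sql
-- A queue table the trigger CAN write to (writing to another table is allowed):
CREATE TABLE queries_to_run (
  query_id INT AUTO_INCREMENT PRIMARY KEY,
  query_text TEXT NOT NULL
);

-- Inside the logs trigger, instead of the forbidden self-DELETE:
--   INSERT INTO queries_to_run (query_text)
--     VALUES ('DELETE FROM logs WHERE time < DATE_ADD(NOW(), INTERVAL -7 DAY)');

-- With MySQL 5.1's event scheduler, the cleanup can simply be scheduled,
-- bypassing the trigger limitation altogether:
SET GLOBAL event_scheduler = ON;

CREATE EVENT logs_rotate
  ON SCHEDULE EVERY 1 HOUR
  DO
    DELETE FROM logs WHERE time < DATE_ADD(NOW(), INTERVAL -7 DAY);
```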

Triggers Use Case Compilation, Part II
Thu, 15 Jan 2009

In Triggers Use Case Compilation, Part I, I demonstrated some trigger use scenarios.

We continue our examples of triggers usage.

Counters and aggregations bookkeeping

Consider the City table: each city belongs to a certain country. Some questions we may be interested in are:

  • How many cities are there per country?
  • What’s the sum of cities population per country?
  • What’s the population of the largest city per country?

Answering any of these questions is an easy SQL exercise. But aggregation is required, and a full table scan (or full index scan, if we’re lucky) is essentially part of any execution plan. What if we can’t pay the price for these queries? What if we need an immediate, or near-immediate, response?

One solution is to use counter tables, or summary tables. For example, to answer the first question, we create the following table:

CREATE TABLE  CityCount (
  `CountryCode` char(3) NOT NULL,
  `NumCities` int(11) NOT NULL,
  PRIMARY KEY  (`CountryCode`)
);

By following all INSERTs and DELETEs on the City table, we can manage the CityCount table’s data.

DELIMITER $$

DROP TRIGGER IF EXISTS City_ai $$
CREATE TRIGGER City_ai AFTER INSERT ON City
FOR EACH ROW
BEGIN
  INSERT INTO CityCount (CountryCode, NumCities)
    VALUES (NEW.CountryCode, 1)
    ON DUPLICATE KEY
    UPDATE NumCities = NumCities+1;
END $$

DROP TRIGGER IF EXISTS City_au $$
CREATE TRIGGER City_au AFTER UPDATE ON City
FOR EACH ROW
BEGIN
  IF (OLD.CountryCode != NEW.CountryCode) THEN
    UPDATE CityCount SET NumCities = NumCities-1
      WHERE CountryCode = OLD.CountryCode;
    INSERT INTO CityCount (CountryCode, NumCities)
      VALUES (NEW.CountryCode, 1)
      ON DUPLICATE KEY
      UPDATE NumCities = NumCities+1;
  END IF;
END $$

DROP TRIGGER IF EXISTS City_ad $$
CREATE TRIGGER City_ad AFTER DELETE ON City
FOR EACH ROW
BEGIN
  UPDATE CityCount SET NumCities = NumCities-1
    WHERE CountryCode = OLD.CountryCode;
END $$

DELIMITER ;

To illustrate the impact of triggers, let’s do a ‘massive’ data load here:

mysql> CREATE TABLE City_2 LIKE City;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO City_2 SELECT * FROM City;
Query OK, 3998 rows affected (0.23 sec)
Records: 3998  Duplicates: 0  Warnings: 0

mysql> TRUNCATE TABLE City;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO City SELECT * FROM City_2;
Query OK, 3998 rows affected (3.58 sec)
Records: 3998  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM CityCount LIMIT 10;
+-------------+-----------+
| CountryCode | NumCities |
+-------------+-----------+
| AFG         |         4 |
| NLD         |        28 |
| ANT         |         1 |
| ALB         |         1 |
| DZA         |        18 |
| ASM         |         2 |
| AND         |         1 |
| AGO         |         5 |
| AIA         |         2 |
| ATG         |         1 |
+-------------+-----------+
10 rows in set (0.01 sec)

The results seem satisfactory. We can now query CityCount directly; no need for complex queries on City. But look at the times: INSERTing data into City_2 took 0.23 seconds. INSERTing the same data into City took 3.58 seconds. That’s the trigger overhead. There is an advantage to using triggers here (and in general) if you’re doing many SELECTs, but few INSERTs/UPDATEs/DELETEs.

Enhance security

In Using triggers to block malicious code: an example, I have shown how a trigger may block changes to sensitive data. A trigger is aware of the invoker, and can implement a row-based privileges system.

As another example, let’s see how we can do “privileges table partitioning”. We look at the world database’s City table. What if we’re working on some world-nations wiki, and we want to assign users to countries, in such a way that a user can only modify data for a country she is assigned to?

We create a privileges table which maps users to countries:

DROP TABLE IF EXISTS `CountryUser`;
CREATE TABLE `CountryUser` (
  `CountryCode` char(3) NOT NULL,
  `mysql_User` char(16) collate utf8_bin NOT NULL default '',
  PRIMARY KEY  (`CountryCode`, `mysql_User`)
);

And then add the security triggers on City:

DELIMITER $$

DROP TRIGGER IF EXISTS City_bi $$
CREATE TRIGGER City_bi BEFORE INSERT ON City
FOR EACH ROW
BEGIN
  SELECT SUBSTRING_INDEX(USER(),'@',1) INTO @current_mysql_user;
  IF (@current_mysql_user NOT IN (SELECT mysql_User FROM CountryUser WHERE CountryCode = NEW.CountryCode)) THEN
    SELECT 0 FROM `Unauthorized access` INTO @error;
  END IF;
END $$

DROP TRIGGER IF EXISTS City_bu $$
CREATE TRIGGER City_bu BEFORE UPDATE ON City
FOR EACH ROW
BEGIN
  SELECT SUBSTRING_INDEX(USER(),'@',1) INTO @current_mysql_user;
  IF (@current_mysql_user NOT IN (SELECT mysql_User FROM CountryUser WHERE CountryCode = OLD.CountryCode)) THEN
    SELECT 0 FROM `Unauthorized access` INTO @error;
  END IF;
END $$

DROP TRIGGER IF EXISTS City_bd $$
CREATE TRIGGER City_bd BEFORE DELETE ON City
FOR EACH ROW
BEGIN
  SELECT SUBSTRING_INDEX(USER(),'@',1) INTO @current_mysql_user;
  IF (@current_mysql_user NOT IN (SELECT mysql_User FROM CountryUser WHERE CountryCode = OLD.CountryCode)) THEN
    SELECT 0 FROM `Unauthorized access` INTO @error;
  END IF;
END $$

DELIMITER ;

Testing (as user root):

mysql> INSERT INTO CountryUser (CountryCode, mysql_User) VALUES ('NLD', 'root');
Query OK, 1 row affected (0.00 sec)

mysql> DELETE FROM City WHERE Name = 'Milano';
ERROR 1146 (42S02): Table 'world.Unauthorized access' doesn't exist

mysql> DELETE FROM City WHERE Name = 'Amsterdam';
Query OK, 1 row affected (0.05 sec)

Managing cache (e.g. invalidating memcached)

In Using memcached functions for MySQL; an automated alternative to Query Cache, I’ve shown how triggers can be used to invalidate memcached values. But cache management can apply to local tables as well.

It is common practice to have summary tables (we used such one in our counters example). Summary tables are just normal tables which are filled with aggregated data, and save the need to re-aggregate that data. Much like the memcached example, triggers can be used to invalidate or reload the summary table data when relevant changes occur in underlying tables.
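
A sketch of local invalidation (the table, flag column, and trigger names here are my own, hypothetical; also recall that MySQL allows only one trigger per event per table, so in practice this logic would be merged into any existing AFTER UPDATE trigger on City):

```sql
-- Hypothetical summary table: per-country aggregates plus a validity flag.
CREATE TABLE CountrySummary (
  CountryCode char(3) NOT NULL,
  SumPopulation bigint NOT NULL default 0,
  is_valid tinyint NOT NULL default 0,
  PRIMARY KEY (CountryCode)
);

DELIMITER $$

DROP TRIGGER IF EXISTS City_au_invalidate $$
CREATE TRIGGER City_au_invalidate AFTER UPDATE ON City
FOR EACH ROW
BEGIN
  -- Mark affected countries' summary rows as stale; a periodic job
  -- (or the next reader) recomputes rows where is_valid = 0.
  UPDATE CountrySummary SET is_valid = 0
    WHERE CountryCode IN (OLD.CountryCode, NEW.CountryCode);
END $$

DELIMITER ;
```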

Limiting table size

Our last use case shows how it is possible to limit table size using triggers.

By “limiting table size” we can mean row-count limitation or storage limitation. In the following example, the logs table is limited to a certain byte size.

DROP TABLE IF EXISTS `world`.`logs`;
CREATE TABLE  `world`.`logs` (
  `logs_id` int(11) NOT NULL auto_increment,
  `ts` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `message` varchar(255) character set utf8 NOT NULL,
  PRIMARY KEY  (`logs_id`)
) ENGINE=MyISAM;

DELIMITER $$

DROP TRIGGER IF EXISTS logs_bi $$
CREATE TRIGGER logs_bi BEFORE INSERT ON logs
FOR EACH ROW
BEGIN
  SELECT DATA_LENGTH+INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA='world' AND TABLE_NAME='logs'
    INTO @estimated_table_size;
  IF (@estimated_table_size > 25*1024) THEN
    SELECT 0 FROM `logs table is full` INTO @error;
  END IF;
END $$

DELIMITER ;

No more than 25KB of storage is allowed for this table. Let’s put it to the test:

mysql> INSERT INTO logs (message) VALUES ('this line is 31 characters long');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO logs SELECT * FROM logs;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> INSERT INTO logs SELECT * FROM logs;
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

...

mysql> INSERT INTO logs SELECT * FROM logs;
Query OK, 256 rows affected (1.84 sec)
Records: 256  Duplicates: 0  Warnings: 0

mysql> INSERT INTO logs SELECT * FROM logs;
ERROR 1146 (42S02): Table 'world.logs table is full' doesn't exist

mysql> INSERT INTO logs (message) VALUES ('this line is 31 characters long');
ERROR 1146 (42S02): Table 'world.logs table is full' doesn't exist

A few important notes:

  • INFORMATION_SCHEMA only presents estimated sizes.
  • For InnoDB, small tables may claim to occupy much more storage than they really do (see this post by the MySQL Performance Blog).
  • InnoDB does not release storage (unless you use TRUNCATE or ALTER TABLE), which means that even if you delete rows from the table, it still occupies the same storage.
  • MEMORY tables do not release memory unless you use TRUNCATE or ALTER TABLE.
  • You may wish to limit table size by row count. There, again, INFORMATION_SCHEMA only provides an estimated value, and querying InnoDB for COUNT(*) is a lengthy operation.

A more interesting implementation of table size limitation is the notion of rotating tables. More on that in the next post.

More to come

In the next and final part we will look at some problems which cannot be solved with triggers due to current trigger limitations.

Triggers Use Case Compilation, Part I
Mon, 05 Jan 2009

I’ve run into quite a few triggers lately on production systems. In previous posts, I’ve written about problems solved with triggers. So here’s a compilation of some solutions based on triggers, and some problems which are not (yet?) solvable due to current trigger limitations.

Triggers can be used to:

  • Maintain integrity
  • Enhance security
  • Enhance logging
  • Assist with archiving
  • Restrict table size
  • Manage caching
  • Manage counters

Triggers are not fast. In fact, they can add quite an overhead if misused. Some of the triggers presented here are known to work well on real-life production systems, though. But make sure you benchmark before embarking on extensive application changes.

I’ll be using MySQL’s world database in some of the examples.

Maintaining value integrity

MySQL can enforce (if you’re using the right sql_mode) some of the values you set for a column. For example, you should not be allowed to set a TINYINT column value to 500. You may not be allowed to set NULL, or you may have to provide default values.

However, within the allowed range, SQL (or MySQL in general) won’t help you. Assume you have a “percent” column, which holds integer values 0..100. It would be a TINYINT, of course. But setting the value to 103 is perfectly valid from MySQL’s point of view, though not from yours.

This is where triggers come in handy. With a trigger, you may truncate illegal values or completely abort the operation if something doesn’t seem right. For example, we may wish to enforce a city’s district to be non-empty. We may also wish to ensure that the city’s population does not exceed its country’s population:

DELIMITER $$
DROP TRIGGER IF EXISTS City_bu $$
CREATE TRIGGER City_bu BEFORE UPDATE ON City
FOR EACH ROW
BEGIN
  DECLARE country_population INT;

  IF (CHAR_LENGTH(NEW.District) = 0) THEN
    SELECT 0 FROM `District must not be empty` INTO @error;
  END IF;

  SELECT MAX(Population) FROM Country
    WHERE Code = NEW.CountryCode INTO country_population;
  IF (NEW.Population > country_population) THEN
    SELECT 0 FROM `City population cannot exceed that of country!` INTO @error;
  END IF;
END $$
DELIMITER ;

For example:

mysql> UPDATE City SET Population=100000000 WHERE Name='London';
ERROR 1146 (42S02): Table 'world.City population cannot exceed that of country!' doesn't exist

We force the trigger to fail under certain circumstances. Since this is a BEFORE UPDATE trigger, failure of the trigger aborts the UPDATE itself.

Forcing referential integrity

If you’re using MyISAM, MEMORY, or even Maria or Falcon, you don’t get to use foreign keys. MySQL’s plan is to add foreign keys for all storage engines; the plan has been in print for quite a few years now. Till then, you may use triggers to simulate foreign keys, including cascading deletes and updates.

Let’s consider the tables City and Country. If we could, we would add the constraint that City.CountryCode references Country.Code. How can this be achieved with triggers? Here’s a partial solution, showing a DELETE CASCADE:

DELIMITER $$

DROP TRIGGER IF EXISTS City_bi $$
CREATE TRIGGER City_bi BEFORE INSERT ON City
FOR EACH ROW
BEGIN
  IF (NOT EXISTS (SELECT NULL FROM Country WHERE Code=NEW.CountryCode)) THEN
    SELECT 0 FROM `CountryCode does not exist in Country table` INTO @error;
  END IF;
END $$

DROP TRIGGER IF EXISTS Country_ad $$
CREATE TRIGGER Country_ad AFTER DELETE ON Country
FOR EACH ROW
BEGIN
  DELETE FROM City WHERE CountryCode = OLD.Code;
END $$

DELIMITER ;

Trying out some queries:

mysql> INSERT INTO City (Name, CountryCode) VALUES ('zzimbwawa', 'ZWZ');
ERROR 1146 (42S02): Table 'world.CountryCode does not exist in Country table' doesn't exist

mysql> INSERT INTO City (Name, CountryCode) VALUES ('zzimbwawa', 'GBR');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT COUNT(*) FROM City WHERE CountryCode = 'GBR';
+----------+
| COUNT(*) |
+----------+
|       82 |
+----------+
1 row in set (0.01 sec)

mysql> DELETE FROM Country WHERE Code='GBR';
Query OK, 1 row affected (0.04 sec)

mysql> SELECT COUNT(*) FROM City WHERE CountryCode = 'GBR';
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

The above example is partial. It does not handle UPDATEs on either table. You may also modify it to simulate ON DELETE SET NULL instead of ON DELETE CASCADE.
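
For instance, the ON DELETE SET NULL variant would replace the cascading DELETE with an UPDATE. This is a sketch only: it assumes City.CountryCode is nullable, which is not the case in the stock world database.

```sql
DELIMITER $$

DROP TRIGGER IF EXISTS Country_ad $$
CREATE TRIGGER Country_ad AFTER DELETE ON Country
FOR EACH ROW
BEGIN
  -- Simulate ON DELETE SET NULL: orphan the cities instead of deleting them.
  UPDATE City SET CountryCode = NULL WHERE CountryCode = OLD.Code;
END $$

DELIMITER ;
```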

Maintaining denormalized data integrity

Denormalized tables can hold data duplicated in several places. When such data changes in one place, triggers can help out by updating the other occurrences of that data. A post was recently written which discusses this issue.

Archiving

Assume the following table:

DROP TABLE IF EXISTS `logs`;
CREATE TABLE  `logs` (
  `id` int(11) NOT NULL auto_increment,
  `subject` varchar(64) NOT NULL,
  `message` varchar(255) NOT NULL,
  `severity` tinyint(4) NOT NULL default '0',
  PRIMARY KEY  (`id`)
);

Logs are something that you want to clean up regularly, on one hand, but keep in a safe place on the other. Let’s create a logs_archive table:

CREATE TABLE logs_archive LIKE logs;

We can automatically move records from the logs table to the logs_archive table:

DELIMITER $$
DROP TRIGGER IF EXISTS logs_bd $$
CREATE TRIGGER logs_bd BEFORE DELETE ON logs
FOR EACH ROW
BEGIN
  INSERT INTO logs_archive SELECT * FROM logs WHERE id=OLD.id;
END $$
DELIMITER ;

Example:

mysql> INSERT INTO logs (subject, message) VALUES ('info', 'new user created');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO logs (subject, message) VALUES ('info', 'cleanup completed');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM logs;
+----+---------+-------------------+----------+
| id | subject | message           | severity |
+----+---------+-------------------+----------+
|  1 | info    | new user created  |        0 |
|  2 | info    | cleanup completed |        0 |
+----+---------+-------------------+----------+
2 rows in set (0.00 sec)

mysql> DELETE FROM logs WHERE id = 1;
Query OK, 1 row affected (0.01 sec)

mysql> SELECT * FROM logs;
+----+---------+-------------------+----------+
| id | subject | message           | severity |
+----+---------+-------------------+----------+
|  2 | info    | cleanup completed |        0 |
+----+---------+-------------------+----------+
1 row in set (0.01 sec)

mysql> SELECT * FROM logs_archive;
+----+---------+------------------+----------+
| id | subject | message          | severity |
+----+---------+------------------+----------+
|  1 | info    | new user created |        0 |
+----+---------+------------------+----------+
1 row in set (0.00 sec)

We can see that the logs_archive table has been filled with the rows deleted from the logs table.
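The whole archive-on-delete flow can be replayed end to end with SQLite triggers via Python's standard library. This is an illustrative analogue rather than MySQL itself, but the trigger logic is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  subject TEXT NOT NULL,
  message TEXT NOT NULL,
  severity INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE logs_archive (id INTEGER, subject TEXT, message TEXT, severity INTEGER);

-- Copy each row into the archive just before it is deleted.
CREATE TRIGGER logs_bd BEFORE DELETE ON logs
BEGIN
  INSERT INTO logs_archive SELECT * FROM logs WHERE id = OLD.id;
END;
""")
conn.execute("INSERT INTO logs (subject, message) VALUES ('info', 'new user created')")
conn.execute("INSERT INTO logs (subject, message) VALUES ('info', 'cleanup completed')")
conn.execute("DELETE FROM logs WHERE id = 1")

print(conn.execute("SELECT message FROM logs").fetchone()[0])          # cleanup completed
print(conn.execute("SELECT message FROM logs_archive").fetchone()[0])  # new user created
```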

Logging

Triggers can be used to automatically log significant events. As an example, let’s say I have a social network application, in which an ‘online_user’ table lists those users who have logged in and have not yet logged out (hence they are assumed to be online):

DROP TABLE IF EXISTS `online_user`;
CREATE TABLE `online_user` (
  `online_user_id` int(11) NOT NULL auto_increment,
  `login` VARCHAR(64) CHARSET ascii NOT NULL,
  `ipv4` INT UNSIGNED NOT NULL,
  `ts` TIMESTAMP,
  PRIMARY KEY  (`online_user_id`)
);

Our application knows how to handle this table. I can enhance my database with logging by adding a logs table, and additional triggers:

DROP TABLE IF EXISTS `logs`;
CREATE TABLE `logs` (
  `logs_id` int(11) NOT NULL auto_increment,
  `ts` TIMESTAMP,
  `message` VARCHAR(255) CHARSET utf8 NOT NULL,
  PRIMARY KEY  (`logs_id`)
);

DELIMITER $$

DROP TRIGGER IF EXISTS online_user_ai $$
CREATE TRIGGER online_user_ai AFTER INSERT ON online_user
FOR EACH ROW
BEGIN
  INSERT INTO logs (message) VALUES (CONCAT('User ',NEW.login, ' has logged in from ', INET_NTOA(NEW.ipv4)));
END $$

DROP TRIGGER IF EXISTS online_user_ad $$
CREATE TRIGGER online_user_ad AFTER DELETE ON online_user
FOR EACH ROW
BEGIN
  INSERT INTO logs (message) VALUES (CONCAT('User ',OLD.login, ' has logged out'));
END $$

DELIMITER ;

Let’s see the effect of managing online users:

INSERT INTO online_user (login, ipv4) VALUES ('john', 123456);
INSERT INTO online_user (login, ipv4) VALUES ('mark', 654321);
SELECT SLEEP(12);
DELETE FROM online_user WHERE login = 'john';

Checking up on the logs table, we get:

mysql> SELECT * FROM `logs`;
+---------+---------------------+------------------------------------------+
| logs_id | ts                  | message                                  |
+---------+---------------------+------------------------------------------+
|       1 | 2008-12-22 11:16:31 | User john has logged in from 0.1.226.64  |
|       2 | 2008-12-22 11:16:31 | User mark has logged in from 0.9.251.241 |
|       3 | 2008-12-22 11:16:43 | User john has logged out                 |
+---------+---------------------+------------------------------------------+
3 rows in set (0.00 sec)
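The dotted-quad addresses in the messages above are produced by INET_NTOA(), which unpacks a 32-bit integer into four octets. A minimal Python equivalent (a hypothetical helper, for illustration) shows why 123456 renders as 0.1.226.64:

```python
def inet_ntoa(n):
    """Format a 32-bit integer as a dotted-quad IPv4 string, like MySQL's INET_NTOA()."""
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(inet_ntoa(123456))  # 0.1.226.64
print(inet_ntoa(654321))  # 0.9.251.241
```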

The logs table can be used for logging any change in any table. The application need not be aware of what exactly is being logged.

If the logs table uses the MyISAM storage engine, the triggers may want to replace the INSERT with an INSERT DELAYED, so that they return immediately without waiting for locks on the logs table. Assuming no crash occurs right after, a separate thread will collect all inserts on the logs table, and handle them in its own free time.

More to come

More trigger use cases, as well as limitations and workarounds, will be presented in following posts.

Triggers Use Case Compilation, Part II

Using triggers to block malicious code: an example
https://shlomi-noach.github.io/blog/mysql/using-triggers-to-block-malicious-code-an-example
Thu, 01 Jan 2009 21:05:54 +0000

Web applications face constant exploitation attempts. Those with a user base must keep their users’ private data, well… private.

While the MySQL security model allows restricting users access to databases, tables and even columns, it has no built in feature for restricting the rows access within the given table.

One cannot allow a user to only update rows 0 through 99, but restrict that user from updating rows 100 to 199. Such restrictions are usually managed at the application level, by appending the necessary “… AND filtering_column = some_value …” to queries.

Many web applications have the notion of an ‘admin’ account, or several such accounts, which provide greater control over the application. The ‘admin’ account is one account at which many attacks are targeted. One such attack is an attempt to modify the admin’s password, such that the attacker can later log in with it and access restricted data.

Assume the following table:

CREATE TABLE my_users (
  ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(32) CHARSET ascii NOT NULL,
  password VARCHAR(32) CHARSET ascii NOT NULL COLLATE ascii_bin,
  UNIQUE KEY(username)
);

Let us also assume we are somewhat careful, so that the passwords are not stored in plaintext, but rather hashed with MD5.

INSERT INTO my_users (username, password) VALUES
  ('admin', MD5('qwerty')) ; -- Safe password as can be found!
INSERT INTO my_users (username, password) VALUES
  ('alice', MD5('123456')) ; -- Safer yet!

SELECT * FROM my_users;
+----+----------+----------------------------------+
| ID | username | password                         |
+----+----------+----------------------------------+
|  1 | admin    | d8578edf8458ce06fbc5bb76a58c5ca4 |
|  2 | alice    | e10adc3949ba59abbe56e057f20f883e |
+----+----------+----------------------------------+
2 rows in set (0.00 sec)
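Those stored values are easy to verify outside the database, since MySQL's MD5() returns the same lowercase hex digest as any standard MD5 implementation; a quick check in Python:

```python
import hashlib

# MySQL's MD5('qwerty') and MD5('123456') match hashlib's hex digests.
print(hashlib.md5(b"qwerty").hexdigest())  # d8578edf8458ce06fbc5bb76a58c5ca4
print(hashlib.md5(b"123456").hexdigest())  # e10adc3949ba59abbe56e057f20f883e
```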

An attacker will try to set the password for the admin account using security holes in the web application. The web application may execute the following query:

UPDATE my_users SET password=MD5('att@cker!') WHERE username='admin';

The issued query is valid, and should generally be allowed. However, we may decide to block changes to the specific ‘admin’ row, in the following manner:

DELIMITER $$
DROP TRIGGER IF EXISTS my_users_bu $$
CREATE TRIGGER my_users_bu BEFORE UPDATE ON my_users
FOR EACH ROW
BEGIN
  IF (NEW.username='admin') THEN
    SELECT 0 INTO @admin_error FROM `Cannot modify admin data!`;
  END IF;
END $$
DELIMITER ;

Let’s try running again the query:

UPDATE my_users SET password=MD5('att@cker!') WHERE username='admin';

ERROR 1146 (42S02): Table 'world.Cannot modify admin data!' doesn't exist

The query fails, since the BEFORE UPDATE trigger fails.
We can tweak the trigger to only allow specific users to modify the row:

DELIMITER $$
DROP TRIGGER IF EXISTS my_users_bu $$
CREATE TRIGGER my_users_bu BEFORE UPDATE ON my_users
FOR EACH ROW
BEGIN
  IF (NEW.username='admin' AND USER() != 'root@localhost') THEN
    SELECT 0 INTO @admin_error FROM `Cannot modify admin data!`;
  END IF;
END $$
DELIMITER ;

This way it is possible for the root user to modify the password at will. We can further tweak the trigger to INSERT INTO some log table. The information we may wish to register is USER(), the CURRENT_TIMESTAMP(), the old and new passwords, and perhaps the CONNECTION_ID(). More data gives us more means of locating the security breach, and monitoring the log table allows for an immediate response to such an attempt.
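As an aside, some engines expose this idea as a first-class feature: SQLite triggers can abort the offending statement with RAISE(), with no need for the nonexistent-table trick. An illustrative sketch via Python's standard library:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_users (
  ID INTEGER PRIMARY KEY,
  username TEXT NOT NULL UNIQUE,
  password TEXT NOT NULL
);
INSERT INTO my_users (username, password) VALUES
  ('admin', 'd8578edf8458ce06fbc5bb76a58c5ca4'),
  ('alice', 'e10adc3949ba59abbe56e057f20f883e');

-- Abort any UPDATE that touches the admin row.
CREATE TRIGGER my_users_bu BEFORE UPDATE ON my_users
WHEN NEW.username = 'admin'
BEGIN
  SELECT RAISE(ABORT, 'Cannot modify admin data!');
END;
""")
try:
    conn.execute("UPDATE my_users SET password='x' WHERE username='admin'")
except sqlite3.IntegrityError as err:
    print(err)  # the RAISE() message
conn.execute("UPDATE my_users SET password='x' WHERE username='alice'")  # allowed
```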

Using memcached functions for MySQL; an automated alternative to Query Cache
https://shlomi-noach.github.io/blog/mysql/using-memcached-functions-for-mysql-an-automated-alternative-to-query-cache
Mon, 15 Dec 2008 05:56:14 +0000

There’s a lot of buzz around memcached. memcached is widely used, and has clients for many programming languages and platforms. TangentOrg have developed a memcached client in the form of MySQL UDFs (User Defined Functions).

I wish to discuss the memcached functions for MySQL: if and how they should be used.

Disclaimer: I do not work with memcached functions for MySQL on a production system; all that is written here reflects my opinion on how things should be done.

With memcached functions for MySQL, we can do the following:

SELECT memc_set('mykey', 'The answer is 42');
SELECT memc_get('mykey');

(See my previous post on how to install memcached functions for MySQL).

In what scenario should we use these functions?

I believe memcached is the right tool for the application level. I am less enthusiastic about using it from MySQL. Sure, pushing it down to MySQL centralizes everything. Instead of having all my application code (PHP, Java etc.) access memcached separately, they can all access one single MySQL node, which gets to access memcached. I see two problems with this approach:

  • Doing this adds load on the database. I think the greatest advantage of memcached is that it allows us to alleviate load from the database. By pushing everything into MySQL we counter that benefit. We pay here both for loading the MySQL network and for the CPU consumed by MySQL to do the job. In a distributed application which uses memcached, every server gets to take some of the load.
  • It seems to me a flawed design. The database should be at an end point, and should not rely on anything except the operating system, file system and network. Sure, there could be applications talking to the database, but the database should be able to work all by itself. By putting memcached behind the database, we make the database dependent upon an external application.

How about memcached increments?

memcached provides an increment mechanism, which can be used by MySQL to create distinct PRIMARY KEYs, like sequences in other databases. While this seems attractive, this feature fits the second point above best: it makes MySQL completely dependent on memcached. So if memcached is down, MySQL is unable to generate keys.

memcached invalidation

I believe a very good use would be to let MySQL invalidate cached data. Not set or get anything, just invalidate. To explain, let’s compare with MySQL’s query cache. I’ll be using MySQL’s world database.

It is a known issue with the query cache that if you change (INSERT/UPDATE/DELETE) data within a certain table, all queries involving that table are invalidated. Take a look at the following:

SELECT * FROM City WHERE CountryCode='BLZ';
UPDATE City SET Population=Population+1 WHERE CountryCode='CHE';
SELECT * FROM City WHERE CountryCode='BLZ';

The UPDATE does not affect the results for the SELECT query. Nevertheless, the second SELECT does not return from the query cache, since it’s invalidated by the UPDATE.

memcached can be used to solve this problem in a programmatic way. Let’s look at a short Python program: memcached_test.py. What it does is connect to memcached, connect to MySQL, and try to get the results of the following queries from memcached:

SELECT * FROM City WHERE CountryCode='BLZ';
SELECT * FROM City WHERE CountryCode='CHE';

If these results are in memcached, they are returned immediately. If not, they are retrieved from MySQL, then inserted into memcached. The results for ‘CHE’ are under the ‘City:CHE’ key, and ‘BLZ’ is under ‘City:BLZ’.

import MySQLdb
import memcache

def select_cities_by_country(country_code):
	key = "City:"+country_code
	cities = memcache_client.get(key)
	if cities:
		found_in_memcached = True
	else:
		cursor = conn.cursor()
		cursor.execute("""
			SELECT Name, CountryCode,
			Population FROM City
			WHERE CountryCode=%s""",
				country_code)
		cities = cursor.fetchall()
		memcache_client.set(key, cities, 100)
		cursor.close()
		found_in_memcached = False
	for row in cities:
		print "%s, %s: %d" % (row[0], row[1], row[2])
	print "%s found in memcached? %s\n" % (
		country_code, found_in_memcached)

conn = None
try:
	try:
		conn = MySQLdb.connect(
			host="localhost",
			user="myuser",
			passwd="mypassword",
			unix_socket="/tmp/mysql.sock",
			db="world")
		memcache_client = memcache.Client(["127.0.0.1:11211"])

		select_cities_by_country("BLZ");
		select_cities_by_country("CHE");
	except Exception, err:
		print err
finally:
	if conn:
		conn.close()

Let’s run this program. This is a first time run, so obviously nothing is in memcached:

$ python memcached_test.py
Belize City, BLZ: 55810
Belmopan, BLZ: 7105
BLZ found in memcached? False

Zurich, CHE: 336800
Geneve, CHE: 173500
Basel, CHE: 166700
Bern, CHE: 122700
Lausanne, CHE: 114500
CHE found in memcached? False

Immediately executed again, we get results from memcached:

$ python memcached_test.py
Belize City, BLZ: 55810
Belmopan, BLZ: 7105
BLZ found in memcached? True

Zurich, CHE: 336800
Geneve, CHE: 173500
Basel, CHE: 166700
Bern, CHE: 122700
Lausanne, CHE: 114500
CHE found in memcached? True

We are going to execute the following query:

UPDATE City SET Population=Population+1 WHERE CountryCode='CHE';

But nothing as yet will invalidate our memcached values. Let’s set up TRIGGERs on the City table:

DELIMITER $$

DROP TRIGGER IF EXISTS City_AI $$
CREATE TRIGGER City_AI AFTER INSERT ON City
FOR EACH ROW
BEGIN
  SELECT memc_delete(CONCAT('City:',NEW.CountryCode)) INTO @discard;
END;
$$

DROP TRIGGER IF EXISTS City_AU $$
CREATE TRIGGER City_AU AFTER UPDATE ON City
FOR EACH ROW
BEGIN
  SELECT memc_delete(CONCAT('City:',OLD.CountryCode)) INTO @discard;
  SELECT memc_delete(CONCAT('City:',NEW.CountryCode)) INTO @discard;
END;
$$

DROP TRIGGER IF EXISTS City_AD $$
CREATE TRIGGER City_AD AFTER DELETE ON City
FOR EACH ROW
BEGIN
  SELECT memc_delete(CONCAT('City:',OLD.CountryCode)) INTO @discard;
END;
$$

DELIMITER ;

These triggers cause any change to a city to invalidate all cities in the same country. Naive? Far less so than MySQL’s query cache. Let’s put this to the test:

mysql> UPDATE City SET Population=Population+1 WHERE CountryCode='CHE';
Query OK, 5 rows affected (0.01 sec)
Rows matched: 5  Changed: 5  Warnings: 0

And run our Python program one last time:

$ python memcached_test.py
Belize City, BLZ: 55810
Belmopan, BLZ: 7105
BLZ found in memcached? True

Zurich, CHE: 336801
Geneve, CHE: 173501
Basel, CHE: 166701
Bern, CHE: 122701
Lausanne, CHE: 114501
CHE found in memcached? False

Right! The ‘CHE’ values were invalidated, and could not be found in memcached. ‘BLZ’, however, wasn’t disturbed.

We can further improve our invalidation mechanism to check only for changes to specific columns. This will require some more code in our triggers.
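That idea can be sketched in Python with a plain dict standing in for memcached (the function and row shapes here are hypothetical, for illustration): only a change to a watched column evicts the cached entry.

```python
# Hypothetical in-process stand-in for memcached.
cache = {}

def on_city_update(old_row, new_row, watched=("Population",)):
    """Mimic the AFTER UPDATE trigger, but only invalidate when a watched column changed."""
    if any(old_row[c] != new_row[c] for c in watched):
        cache.pop("City:" + old_row["CountryCode"], None)
        cache.pop("City:" + new_row["CountryCode"], None)

cache["City:CHE"] = [("Zurich", "CHE", 336800)]

# A change to an unwatched column leaves the cache entry alone.
on_city_update({"CountryCode": "CHE", "Name": "Zurich", "Population": 336800},
               {"CountryCode": "CHE", "Name": "Zuerich", "Population": 336800})
print("City:CHE" in cache)  # True

# A change to Population evicts it.
on_city_update({"CountryCode": "CHE", "Name": "Zuerich", "Population": 336800},
               {"CountryCode": "CHE", "Name": "Zuerich", "Population": 336801})
print("City:CHE" in cache)  # False
```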

Notes

The triggers themselves impose a performance penalty on our code. It is assumed that SELECTs are more important here, or else we would not use caching at all. In any case, the example provided here has not been benchmarked, and its value can only be estimated in your own real-life situation.

Conclusion

I believe invalidation is the most interesting part of memcached functions for MySQL. It makes the most sense:

  • No data passes between MySQL and memcached.
  • The application isn’t even aware that MySQL is talking to memcached. MySQL does everything internally using triggers.
  • MySQL does not depend on memcached. If memcached goes away, the triggers will simply have no effect. It is still possible that due to temporary network failure, an invalidation is skipped. But memcached supports us by adding a timeout for cached values, so we have some kind of “backup plan”.

Please share below your insights and real life experience with memcached functions for MySQL.
