MySQL/QueryScript use case: DELETE all but top N records per group

Some administrative tasks can be simplified by using common_schema/QueryScript. I’m collecting a bunch of these for documentation. Here’s one for example:

The DBA/developer has the task of retaining only top 3 most populated countries per continent. That is, she has to DELETE 4th, 5th, 6th, … most populated counties in each continent.

Is it possible to work out with a single query? Yes. But the query is not pretty. In fact, it is quite complicated, and either involves unintuitive subqueries, or unintuitive hacks. A normal DBA would not want to write, neither maintain this kind of query, unless top-notch-geek, which is fine.

Since this is a one time job, we just need to get it done. And common_schema/QueryScript provide with the intuitive solution: if we read our demand aloud, we realize we want to delete 4th, 5th, 6th, … populated countries for each continent.

I present a solution made available by QueryScript, and discuss the ways in which the code overcomes limitations, or simplifies complexity:

var $num_countries_to_delete;
foreach($continent, $num_countries: SELECT continent, COUNT(*) FROM world.Country GROUP BY continent)
{
  if ($num_countries > 3)
  {
    set $num_countries_to_delete := $num_countries - 3;
    DELETE FROM world.Country WHERE Continent = $continent ORDER BY Population ASC LIMIT :$num_countries_to_delete;
  }
}

Discussion

The first thing that should be apparent from the above is that this is a programmatic solution. Queries are declarative, which is why complex ones sometimes look incomprehensible. The above is more straightforward. Continue reading » “MySQL/QueryScript use case: DELETE all but top N records per group”

common_schema rev. 218: QueryScript, throttling, processes, documentation

common_schema, revision 218 is released, with major new features, top one being server side scripting. Here are the highlights:

  • QueryScript: server side scripting is now supported by common_schema, which acts as an interpreter for QueryScript code.
  • Throttling for queries is now made available via the throttle() function.
  • Enhancements to processlist-related views, including the new slave_hosts view.
  • Inline documentation/help is available via the help() routine.
  • more…

QueryScript

common_schema makes for a QueryScript implementation for MySQL. You can run server side scripts, interpreted by common_schema, which allow for easy syntax and greater power than was otherwise previously available on the MySQL server. For example:

foreach($table, $schema, $engine: table like '%')
  if ($engine = 'ndbcluster')
    ALTER ONLINE TABLE :$schema.:$table REORGANIZE PARTITION;

QueryScript includes flow control, conditional branching, variables & variable expansion, script throttling and more.

Read more on common_schema’s QueryScript implementation.

Continue reading » “common_schema rev. 218: QueryScript, throttling, processes, documentation”

QueryScript: SQL scripting language

Introducing QueryScript: a programming language aimed for SQL scripting, seamlessly combining scripting power such as flow control & variables with standard SQL statements or RDBMS-specific commands.

QueryScript is available fro MySQL via common_schema, which adds MySQL-specific usage.

What does QueryScript look like? Here are a few code samples:

Turn a bulk DELETE operation into smaller tasks. Throttle in between.

while (DELETE FROM archive.events WHERE ts < CURDATE() LIMIT 1000)
{
  throttle 2;
}

Convert all InnoDB tables in the ‘sakila’ database to compressed format:

foreach ($table, $schema, $engine: table in sakila)
{
  if ($engine = 'InnoDB')
    ALTER TABLE :$schema.:$table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
}

Shard your data across multiple schemata:

foreach($shard: {USA, GBR, JAP, FRA})
{
  CREATE DATABASE db_:$shard;
  CREATE TABLE db_:$shard.city LIKE world.City;
  INSERT INTO db_:$shard.city SELECT * FROM world.City WHERE CountryCode = $shard;
}

Continue reading » “QueryScript: SQL scripting language”

MySQL command line vs. visual editors – reflections

My previous post drew some attention, and in particular two comments I wish to relate to. I also wish to make a more fine-grained observation on visual editors.

One comment is by Peter Laursen, who rejected the generalization in my post, and another by wlad, who was harsher (but to the point), suggesting my post translated to “I don’t know it, therefore it sucks”.

I must have delivered the wrong message, since apparently people read my post as “don’t use visual editors, they are bad for you”, which is not what I intended to say, nor is it what I actually think. I took a very specific aspect of visual editors and commented on that. My comment should not extrapolate to “anything about visual editors is bad”.

I don’t think people should only be using the command line, and am not in the least interested in vi/emacs/eclipse/netbeans kind of wars. Of course everyone should be using whatever works best for them in terms of productivity and ease of use. And of course visual editors have great advantages over command line.

Continue reading » “MySQL command line vs. visual editors – reflections”

MySQL command line vs. visual editors

Students in my training classes usually prefer to use some kind of visual editor for MySQL. Typically this would be the software they’re using at work. Sometimes they just bring over their laptops with the software installed. Or they would use MySQL Workbench, which is what I usually have pre-installed on their desktops.

I see MySQL Workbench, SQLyog, Toad for MySQL, or several more.

I always humbly suggest they close down their software and open up a command line.

It isn’t fancy. It may not even be convenient (especially on Windows, in my opinion). And repeating your last command with a minor modification requires a lot of key stroking. Or you would copy+paste from some text editor. Most students will give it a shot, then go back to their favorite editor.

Well, again and again I reach the same conclusion:

Visual editors are not as trustworthy as the command line.

Time and again students show me something on their editor. Behavior seems strange to me. Opening up a console shows a completely different picture.

Things like: Continue reading » “MySQL command line vs. visual editors”

Syntax of the day: IS TRUE and IS FALSE

What makes for a true statement?

We usually test statements using a WHERE clause:

SELECT * FROM world.City WHERE Population > 1000000

The “Population > 1000000” statement makes for a boolean expression. Using WHERE is just one way of evaluating it. One can also test with IF():

SET @val := 7;
SELECT IF(@val > 2, 'Yes', 'No')

TRUE and FALSE

The two are keywords. They also map for the numerals 1 and 0, as follows:

mysql> SELECT TRUE, FALSE;
+------+-------+
| TRUE | FALSE |
+------+-------+
|    1 |     0 |
+------+-------+

Like in the C programming language, a nonzero value evaluates to a true value. A zero evaluates to false. A NULL evaluates to… well, NULL. But aside from 3-valued logic, what’s important in our case is that it is not true. Continue reading » “Syntax of the day: IS TRUE and IS FALSE”

Documentation in SQL: CALL for help()

Documentation is an important part of any project. On the projects I maintain I put a lot of effort on documentation, and, frankly, the majority of time spent on my projects is on documentation.

The matter of keeping the documentation faithful is a topic of interest. I’d like to outline a few documentation bundling possibilities, and the present the coming new documentation method for common_schema. I’ll talk about any bundling that is NOT man pages.

High level: web docs

This is the initial method of documentation I used for openark kit and mycheckpoint. It’s still valid for mycheckpoint. Documentation is web-based. You need Internet access to read it. It’s in HTML format.

Well, not exactly HTML format: I wrote it in WordPress. Yes, it’s HTML, but there’s a lot of noise around (theme, menus, etc.) which is not strictly part of the documentation.

While this is perhaps the easiest way to go, here’s a few drawbacks: Continue reading » “Documentation in SQL: CALL for help()”

Oracle “Technologist of the Year: Developer” Award

I am honored to receive Oracle’s Technologist of the Year: Developer award, formerly Oracle Magazine Editors’ Choice Awards.

Technologist of the Year Award is given for individuals for their technical achievements with regard to Oracle products.

As opposed to community based awards, to win this award one must be nominated by himself or his company. There are several award categories: Developer, DBA, IT Manager etc., and many nominations per category. I have been nominated by my company and am happy to have won the award in the Developer category.

Allow me to take the pleasant opportunity to make some acknowledgements. Continue reading » “Oracle “Technologist of the Year: Developer” Award”

Monitor your MySQL servers as done before, for free!

With genuine respect to other products, the people making those products and the efforts involved, I would like to make an issue:

Some things have been done before, done pretty well (if I may say so), and for free.

mycheckpoint is a free monitoring tool for MySQL. It uses an unorthodox approach of monitoring your MySQL servers by storing the monitored data in SQL format (actually, within a MySQL server). This allows you to either use the fancy JavaScript charting (by running mycheckpoint as HTTP server), or Do-It-Yourself diagnostics.

It allows for a great many more things, among which are custom queries and charts: you may add your own queries, to be included in the monitoring scheme. Query your AUTO_INCREMENT values to detect growth rate, query your INFORMATION_SCHEMA tables (if you dare) to learn about your data dimensions, query for query response time to detect performance issues with your website/database, or query your data itself (e.g. sales per day) to get insight about your product. Here are sample charts to illustrate. Continue reading » “Monitor your MySQL servers as done before, for free!”

INFORMATION_SCHEMA Optimizations: still crashing my servers

[Update: need to take more breaks: now NOT crashing my servers! See clarifications below]

INFORMATION_SCHEMA Optimizations are meant to make your INFORMATION_SCHEMA queries lighter and safer.

For example, if you’re going to query the COLUMNS table for just the columns of a single table, then the following:

SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='sakila' AND TABLE_NAME='rental'

makes for an optimization: specifying a literal on TABLE_SCHEMA avoid scanning the directories of other schemata. Specifying a literal on TABLE_NAME avoids checking up on other tables. So it’s a one-schema-one-table read operation, as opposed to “first read every single column from all and any single schema and table, then return only those I’m interested in”.

Here’s the execution plan for the above query:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: COLUMNS
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA,TABLE_NAME
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using where; Open_frm_only; Scanned 0 databases

What I tried to do is to read the entire COLUMNS table, one schema at a time, one table at a time. I’m good with this taking longer time.

I have a production system on which reads from COLUMNS consistently crash the servers. Well, one read at a time can’t do harm, right? Continue reading » “INFORMATION_SCHEMA Optimizations: still crashing my servers”