Hierarchical data in INFORMATION_SCHEMA and derivatives

Just how often do you encounter hierarchical data? Consider a table with some parent-child relation, like the this classic employee table:

CREATE TABLE employee (
  employee_id INT UNSIGNED PRIMARY KEY,
  employee_name VARCHAR(100),
  manager_id INT UNSIGNED,
  CONSTRAINT `employee_manager_fk` FOREIGN KEY (manager_id) REFERENCES employee (employee_id)
) engine=innodb
;
+-------------+---------------+------------+
| employee_id | employee_name | manager_id |
+-------------+---------------+------------+
|           1 | Rachel        |       NULL |
|           2 | John          |          1 |
|           3 | Stan          |          1 |
|           4 | Naomi         |          2 |
+-------------+---------------+------------+

Questions like “What is the (nested) list of employees managed by Rachel?” or “Get Naomi’s managers chain of command” are classical questions. There are sometimes dependencies: if John leaves, does that mean Naomi loses her job, or does she get promoted, or something else? If John and Stan are on sick leave, does Rachel have any reason to come to work?

Hierarchical data is not limited to a single-table structure. Sometimes it takes a combination of a few tables (it’s just a JOIN) to make out the parent-child relation.

Hierarchical data is difficult to manage with SQL. This is especially true for MySQL, which does not support the WITH recursive query syntax.

Where can you find hierarchical data?

Even if you do not provide it by yourself, MySQL’s INFORMATION_SCHEMA has some for you. Partly obvious, partly implicit, here are some examples:

Foreign Keys

This is probably the most obvious hierarchical dataset in INFORMATION_SCHEMA. The KEY_COLUMN_USAGE table lists the table dependencies based on foreign key constraints. It’s a bit confusing, as it also lists UNIQUE constraints, and the complete list of FKs are in REFERENTIAL_CONSTRAINTS — but the latter table does not list the dependencies. You typically want to join both tables to get the complete information.

So, taking sakila‘s DVD rental shop sample,  film_actor table depends on film as well as on actor via foreign keys. film can depend on the category table, etc. The hierarchies can turn to be very complex, with multiple roots an very deep branches.

Dependency questions are very clear when speaking of foreign keys: what happens when we delete some film record? Does it hold that we also delete all references to that film on film_actor? Or do we deny deletion of said film in such case?

Redundant Keys

The KEY (col1, col2) makes the KEY (col1) redundant. The latter is not strictly required (though you may wish to keep it for covering index performance reason). The list of dominant-redundant keys makes for hierarchical data. It is typically very shallow (keys can only be redundant within the scope of a single table — how deep can you get with such small dataset?)

There is no immediate reference in INFORMATION_SCHEMA as for redundant keys, but it can be inferred by self joining the STATISTICS table onto itself. It is not immediate, since you need to do some aggregation and check for particular cases. For example, UNIQUE KEY (col1, col2) is actually made redundant by UNIQUE KEY (col1), which is just the opposite from our previous example.

common_schema provides with this implicit information now turned explicit, in the for of the redundant_keys view: the redundant_index_name and dominant_index_name columns make for parent-child relationship.

Dependency questions are a bit redundant here: the general objective is to not have dependencies. So get rid of redundant keys – and do so wisely. There’s a good discussion of index redundancies on redundant_keys‘s documentation.

Locked transactions

A transaction is locked. It is locked because another transaction holds some locks needed by locked transaction. But this transaction in itself can obtain locks needed by yet other transactions. And so we can get a hierarchy of locked transaction. The “parent” is the one blocking the “child”.

Combining InnoDB’s INNODB_TRXINNODB_LOCK_WAITSINNODB_TRX tables, we get this information. common_schema provides with this inferred data in the form of innodb_locked_transactions.

Hierarchies in locked-transactions could be deep, and they can most commonly be wide. A single transaction can lock dozens of others.

Dependency questions are “would killing this transaction release all other waiting transactions?”; “What is the one transaction I need to kill in order to release the bottleneck?”; “Why are these transactions related in the first place? Can I remove this dependency?”. etc.

Locks

You can look at the same dataset as above from a different angle. Instead of looking at transaction-lock-transaction, you can look at lock-transaction-lock. Which locks are causing other locks to be held? This is counter-intuitive to our understanding of how things work, but is valid nonetheless.

Views

A VIEW can query a table or another view. It can join multiple views. This hierarchy of view-reading-from-view can turn out to be complex; if not for the human mind then for the optimizer.

Surprisingly, there is no data in INFORMATION_SCHEMA, other than the CREATE VIEW clause, to help us out in resolving these dependencies. Even more surprising is MySQL’s own inability to get clear conclusions itself. The view definition clause is parsed, re-parsed and re-evaluated whenever information on “parent” views is required. For example, try to DESCRIBE a view. How does MySQL deduce the data types of columns? It dives in head first into the hierarchy, crawls and parses view definitions, till it resolves the answer. The mycheckpoint projects uses view hierarchies intensively. It draws powers from the hierarchy and produces some surprising data (charts from raw data, for example). But it also suffers from MySQL indirect inference of views. Checking up a deep-nested mycehckpoint view in INFORMATION_SCHEMA makes for a heavyweight dive for MySQL into locks and table handles.

Dependency questions are “what is the type of this column?”, “are there any TEMPTABLE views along the chain of execution? Or are all MERGE views?” and more.

2 thoughts on “Hierarchical data in INFORMATION_SCHEMA and derivatives

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.