{"id":5311,"date":"2013-01-08T13:19:56","date_gmt":"2013-01-08T11:19:56","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=5311"},"modified":"2013-01-08T13:19:56","modified_gmt":"2013-01-08T11:19:56","slug":"hierarchical-data-in-information_schema-and-derivatives","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/hierarchical-data-in-information_schema-and-derivatives","title":{"rendered":"Hierarchical data in INFORMATION_SCHEMA and derivatives"},"content":{"rendered":"<p>Just how often do you encounter hierarchical data? Consider a table with some parent-child relation, like the this classic <strong>employee<\/strong> table:<\/p>\n<blockquote>\n<pre>CREATE TABLE employee (\r\n\u00a0 employee_id INT UNSIGNED PRIMARY KEY,\r\n\u00a0 employee_name VARCHAR(100),\r\n\u00a0 manager_id INT UNSIGNED,\r\n\u00a0 CONSTRAINT `employee_manager_fk` FOREIGN KEY (manager_id) REFERENCES employee (employee_id)\r\n) engine=innodb\r\n;\r\n+-------------+---------------+------------+\r\n| employee_id | employee_name | manager_id |\r\n+-------------+---------------+------------+\r\n|\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 | Rachel\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 NULL |\r\n|\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2 | John\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\r\n|\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3 | Stan\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1 |\r\n|\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 4 | Naomi\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2 |\r\n+-------------+---------------+------------+<\/pre>\n<\/blockquote>\n<p>Questions like <em>&#8220;What is the (nested) list of employees managed by Rachel?&#8221;<\/em> or <em>&#8220;Get Naomi&#8217;s managers chain of command&#8221;<\/em> are classical questions. There are sometimes dependencies: if John leaves, does that mean Naomi loses her job, or does she get promoted, or something else? If John and Stan are on sick leave, does Rachel have any reason to come to work?<\/p>\n<p>Hierarchical data is not limited to a single-table structure. Sometimes it takes a combination of a few tables (it&#8217;s just a <strong>JOIN<\/strong>) to make out the parent-child relation.<\/p>\n<p>Hierarchical data is difficult to manage with SQL. This is especially true for MySQL, which does not support the <strong>WITH<\/strong> recursive query syntax.<\/p>\n<h4>Where can you find hierarchical data?<\/h4>\n<p>Even if you do not provide it by yourself, MySQL&#8217;s <strong>INFORMATION_SCHEMA<\/strong> has some for you. Partly obvious, partly implicit, here are some examples:<!--more--><\/p>\n<h4>Foreign Keys<\/h4>\n<p>This is probably the most obvious hierarchical dataset in <strong>INFORMATION_SCHEMA<\/strong>. The <strong>KEY_COLUMN_USAGE<\/strong> table lists the table dependencies based on foreign key constraints. It&#8217;s a bit confusing, as it also lists <strong>UNIQUE<\/strong> constraints, and the complete list of FKs are in <strong>REFERENTIAL_CONSTRAINTS<\/strong> &#8212; but the latter table does not list the dependencies. You typically want to join both tables to get the complete information.<\/p>\n<p>So, taking <strong>sakila<\/strong>&#8216;s DVD rental shop sample,\u00a0 <strong>film_actor<\/strong> table depends on <strong>film<\/strong> as well as on <strong>actor<\/strong> via foreign keys. <strong>film<\/strong> can depend on the <strong>category<\/strong> table, etc. The hierarchies can turn to be very complex, with multiple roots an very deep branches.<\/p>\n<p>Dependency questions are very clear when speaking of foreign keys: what happens when we delete some <strong>film<\/strong> record? Does it hold that we also delete all references to that film on <strong>film_actor<\/strong>? Or do we deny deletion of said film in such case?<\/p>\n<h4>Redundant Keys<\/h4>\n<p>The <strong>KEY (col1, col2)<\/strong> makes the <strong>KEY (col1)<\/strong> redundant. The latter is not strictly required (though you may wish to keep it for covering index performance reason). The list of <em>dominant-redundant<\/em> keys makes for hierarchical data. It is typically very shallow (keys can only be redundant within the scope of a single table &#8212; how deep can you get with such small dataset?)<\/p>\n<p>There is no immediate reference in <strong>INFORMATION_SCHEMA<\/strong> as for redundant keys, but it can be inferred by self joining the\u00a0<strong>STATISTICS<\/strong> table onto itself. It is not immediate, since you need to do some aggregation and check for particular cases. For example, <strong>UNIQUE KEY (col1, col2)<\/strong> is actually made redundant by <strong>UNIQUE KEY (col1)<\/strong>, which is just the opposite from our previous example.<\/p>\n<p><a href=\"http:\/\/code.google.com\/p\/common-schema\">common_schema<\/a> provides with this implicit information now turned explicit, in the for of the <a href=\"http:\/\/common-schema.googlecode.com\/svn\/trunk\/common_schema\/doc\/html\/redundant_keys.html\">redundant_keys<\/a> view: the <strong>redundant_index_name<\/strong> and <strong>dominant_index_name<\/strong> columns make for parent-child relationship.<\/p>\n<p>Dependency questions are a bit redundant here: the general objective is to <em>not have<\/em> dependencies. So get rid of redundant keys &#8211; and do so wisely. There&#8217;s a good discussion of index redundancies on <strong>redundant_keys<\/strong>&#8216;s <a href=\"http:\/\/common-schema.googlecode.com\/svn\/trunk\/common_schema\/doc\/html\/redundant_keys.html\">documentation<\/a>.<\/p>\n<h4>Locked transactions<\/h4>\n<p>A transaction is locked. It is locked because another transaction holds some locks needed by locked transaction. But this transaction in itself can obtain locks needed by yet other transactions. And so we can get a hierarchy of locked transaction. The &#8220;parent&#8221; is the one blocking the &#8220;child&#8221;.<\/p>\n<p>Combining InnoDB&#8217;s <strong>INNODB_TRX<\/strong> &#8211; <strong>INNODB_LOCK_WAITS<\/strong> &#8211; <strong>INNODB_TRX<\/strong> tables, we get this information. <em>common_schema<\/em> provides with this inferred data in the form of <a href=\"http:\/\/common-schema.googlecode.com\/svn\/trunk\/common_schema\/doc\/html\/innodb_locked_transactions.html\">innodb_locked_transactions<\/a>.<\/p>\n<p>Hierarchies in locked-transactions could be deep, and they can most commonly be <em>wide<\/em>. A single transaction can lock dozens of others.<\/p>\n<p>Dependency questions are <em>&#8220;would killing this transaction release all other waiting transactions?&#8221;<\/em>; <em>&#8220;What is <\/em>the one<em> transaction I need to kill in order to release the bottleneck?&#8221;<\/em>; <em>&#8220;Why are these transactions related in the first place? Can I remove this dependency?&#8221;<\/em>. etc.<\/p>\n<h4>Locks<\/h4>\n<p>You can look at the same dataset as above from a different angle. Instead of looking at transaction-lock-transaction, you can look at lock-transaction-lock. Which locks are causing other locks to be held? This is counter-intuitive to our understanding of how things work, but is valid nonetheless.<\/p>\n<h4>Views<\/h4>\n<p>A <strong>VIEW<\/strong> can query a table or another view. It can join multiple views. This hierarchy of view-reading-from-view can turn out to be complex; if not for the human mind then for the optimizer.<\/p>\n<p>Surprisingly, there is no data in <strong>INFORMATION_SCHEMA<\/strong>, other than the <strong>CREATE VIEW<\/strong> clause, to help us out in resolving these dependencies. Even more surprising is MySQL&#8217;s own inability to get clear conclusions itself. The view definition clause is parsed, re-parsed and re-evaluated whenever information on &#8220;parent&#8221; views is required. For example, try to <strong>DESCRIBE<\/strong> a view. How does MySQL deduce the data types of columns? It dives in head first into the hierarchy, crawls and parses view definitions, till it resolves the answer. The <a href=\"http:\/\/code.openark.org\/forge\/mycheckpoint\">mycheckpoint<\/a> projects uses view hierarchies intensively. It draws powers from the hierarchy and produces some surprising data (<a href=\"http:\/\/code.openark.org\/forge\/mycheckpoint\/documentation\/generating-google-charts\">charts<\/a> from raw data, for example). But it also suffers from MySQL indirect inference of views. Checking up a deep-nested <em>mycehckpoint<\/em> view in <strong>INFORMATION_SCHEMA<\/strong> makes for a heavyweight dive for MySQL into locks and table handles.<\/p>\n<p>Dependency questions are <em>&#8220;what is the type of this column?&#8221;<\/em>, <em>&#8220;are there any <strong>TEMPTABLE<\/strong> views along the chain of execution? Or are all <strong>MERGE<\/strong> views?&#8221;<\/em> and more.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Just how often do you encounter hierarchical data? Consider a table with some parent-child relation, like the this classic employee table: CREATE TABLE employee ( \u00a0 employee_id INT UNSIGNED PRIMARY KEY, \u00a0 employee_name VARCHAR(100), \u00a0 manager_id INT UNSIGNED, \u00a0 CONSTRAINT `employee_manager_fk` FOREIGN KEY (manager_id) REFERENCES employee (employee_id) ) engine=innodb ; +&#8212;&#8212;&#8212;&#8212;-+&#8212;&#8212;&#8212;&#8212;&#8212;+&#8212;&#8212;&#8212;&#8212;+ | employee_id | employee_name [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[24,21],"class_list":["post-5311","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-information_schema","tag-sql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-1nF","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5311","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=5311"}],"version-history":[{"count":19,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5311\/revisions"}],"predecessor-version":[{"id":5981,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/5311\/revisions\/5981"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=5311"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=5311"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=5311"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}