August 23, 2013
Just read Ronald Bradford's post on an unnecessary 3am (emergency) call. I sympathize! Running out of disk space makes for some weird MySQL behaviour, and in fact whenever I encounter weird behaviour I verify disk space.
But here's a trick I've been using for years to avoid such cases and to be able to recover quickly. It has helped me in events such as running out of disk space during ALTER TABLEs, or avoiding the purging of binary logs when a slave is known to be under maintenance.
Ronald suggested it -- just put a dummy file in your @@datadir! I like putting a 1GB dummy file: I typically copy+paste a 1GB binary log file and call it "placeholder.tmp". Then I forget all about it. My disk space should not run out -- if it does it's a cause for emergency. I have monitoring, but sometimes I'm hoping to make an operation on 97%-99% utilization.
If I do run out of disk space: well, MySQL won't let me connect; won't complete an important statement; won't sync transactions to disk -- bad situation. Not a problem in our case: we can magically reclaim 1GB worth of space from the @@datadir by deleting the placeholder, buying us enough time (maybe just minutes) to gracefully complete the necessary operations: connect, KILL, shutdown, abort etc.
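As a rough sketch of the trick above (not taken from any existing tool): the helper names, the example path and the size are assumptions for illustration only. Writing actual bytes matters, since a sparse file would not reserve any blocks. On filesystems with transparent compression a file full of zeros may not reserve much space either, which is one reason copying a real binary log, as described above, can be the safer choice.

```shell
#!/bin/sh
# Sketch only: function names, path and size are illustrative assumptions.

# Create a placeholder of the given size (in megabytes, default 1024),
# filled with real zero bytes so that disk blocks are actually allocated.
create_placeholder() {
    file=$1
    size_mb=${2:-1024}
    dd if=/dev/zero of="$file" bs=1048576 count="$size_mb" 2>/dev/null
}

# Delete the placeholder to hand its space back to the filesystem.
release_placeholder() {
    rm -f -- "$1"
}

# Example (run as a user allowed to write to the datadir):
#   create_placeholder /var/lib/mysql/placeholder.tmp 1024
#   ... disk fills up, MySQL stalls ...
#   release_placeholder /var/lib/mysql/placeholder.tmp
```

Once the emergency is over and space has been freed properly, recreate the placeholder so it is there for the next time.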
August 13, 2013
common_schema 2.2 is released. This comes shortly after the 2.1 release; it was only meant as a bug fix release, but some interesting things came up, leading to new functionality.
Highlights of the 2.2 release:
- Better QueryScript isolation & cleanup: isolation improved across replication topology, cleanup done even on error
- Added TokuDB related views
- split with "index" hint (Ike, this is for you)
- table_rotate(): a logrotate-like mechanism for tables
- better throw()
Better QueryScript isolation & cleanup
common_schema 2.1 introduced persistent tables for QueryScript. This also introduced the problem of isolating concurrent scripts, all reading from and writing to shared tables. In 2.1 isolation was based on session id. However, although a session id is unique per machine, collisions were possible across a replication topology: a script could be issued on the master, another on a slave (I have such use cases), and both would use the same (local) session id.
With 2.2 isolation is based on the combination of server_id & session id; this is unique across a replication topology.
Until 2.1, QueryScript used temporary tables. This meant any error would just break the script, and the tables were left (isolated as they were, and auto-destroyed in time). With persistent tables a script throwing an error meant legacy code piling up. With common_schema 2.2 and on MySQL >= 5.5 all exceptions are caught, cleanup is made, leaving exceptions to be RESIGNALled.
A couple TokuDB related views help out in converting to TokuDB and in figuring out tables status on disk: Continue Reading »
August 8, 2013
If you work with command line and know your SQL, q is a great tool to use:
q allows you to query your text files or standard input with SQL. You can:
SELECT c1, COUNT(*) FROM /home/shlomi/tmp/my_file.csv GROUP BY c1
And you can:
SELECT all.c2 FROM /tmp/all_engines.txt AS all LEFT JOIN /tmp/innodb_engines.txt AS inno USING (c1, c2) WHERE inno.c3 IS NULL
And you can also combine with your favourite shell commands and tools:
grep "my_term" /tmp/my_file.txt | q "SELECT c4 FROM - JOIN /home/shlomi/static.txt USING (c1)" | xargs touch
Some of q's functionality (and indeed, SQL functionality) can be found in command line tools. You can use grep for pseudo WHERE filtering, or cut for projecting, but you can only get so far with cat my_file.csv | sort | uniq -c | sort -n. SQL is way more powerful for working with tabulated data, and so q makes for a great addition into one's toolbox.
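To make the comparison concrete, here is roughly what the first GROUP BY query above looks like when spelled out with standard tools. This is a sketch under assumptions: the function name is made up, the file is assumed to be comma-delimited, and values are assumed to contain neither spaces nor embedded commas.

```shell
#!/bin/sh
# Sketch only: a shell approximation of
#   SELECT c1, COUNT(*) FROM my_file.csv GROUP BY c1
# The function name and the comma delimiter are assumptions.

count_by_first_column() {
    # cut projects the first column, sort + uniq -c groups and counts,
    # sort -n orders by count, awk strips the padding uniq -c adds.
    cut -d, -f1 "$1" | sort | uniq -c | sort -n | awk '{print $1, $2}'
}

# Example:
#   count_by_first_column /home/shlomi/tmp/my_file.csv
```

Each output line holds a count followed by the value. A simple aggregate like this is about where pipelines stay comfortable; something like the LEFT JOIN shown earlier is far more awkward to express this way.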
The tool is authored by my colleague Harel Ben-Attia, and is in daily use over at our company (it is in fact installed on all production servers).
It is of course free and open source (get it on GitHub, where you can also find documentation), and very easy to set up. Enjoy!
July 22, 2013
I'm happy with common_schema; it is in fact a tool I use myself on an almost daily basis. I'm also happy to see that it gains traction; which is why I'm exposing a little bit of my thoughts on general future development. I'd love to get feedback.
At this moment, common_schema supports MySQL >= 5.1, all variants. This includes 5.5, 5.6, MySQL, Percona Server & MariaDB.
5.1 is today past end of life, and I'm really missing the SIGNAL/RESIGNAL syntax that I would like to use; in the meantime I can make do with version-specific code such as /*!50500 ... */. Nevertheless, I'm wondering whether I will eventually have to:
- Support different branches of common_schema (one that supports 5.1, one that supports >= 5.5)
- Stop support for 5.1
Of course community-wise, the former is preferred; but I have limited resources, so I would like to make a quick poll here:
I'll use the poll's results as a vague idea of what people use and want. Or please use comments below to sound your voice!
This was a crazy jump at providing a stored routine debugger and debugging API. From some talks I gave, I don't see this getting traction. For the time being, I don't see that I will concentrate my efforts on this. Actually it is almost complete. You can step-into, step-out, step-over, set breakpoints, read variables, modify variables -- it's pretty cool. Continue Reading »
July 17, 2013
common_schema 2.1 is released! common_schema is your free & open source companion schema within your MySQL server, providing a function library, scripting capabilities, powerful routines and ready-to-apply information and recommendations.
New and noteworthy in version 2.1:
- Better QueryScript's split() functionality
- Persistent tables for QueryScript: no long held temporary tables
- Index creation analysis, further range partition analysis
- grant_access(): allow everyone to use common_schema
- Ascii charts, google charts
- debugged_routines: show routines with debug code
Other minor enhancements and bugfixes not listed.
Here's a breakdown of the above:
split is one of those parts of common_schema that (should) appeal to every DBA. Break a huge transaction automagically into smaller chunks, and don't worry about how it's done. If you like, throttle execution, or print progress, or...
split enhancements include:
- A much better auto-detection-and-selection of the chunking index. split now consults all columns covered by the index, and uses realistic heuristics to decide which UNIQUE KEY on your table is best for the chunking process. A couple bugs are solved on the way; split is much smarter now.
- Better support for multi-column chunking keys. You may now utilize the start/stop parameters even on multi-column keys, passing a comma delimited list of values for the split operation to start/end with, respectively. Also fixed an issue with nonexistent start/stop values, which are now valid: split will just keep to the given range.
- split no longer requires a temporary table open through the duration of its operation. See next section. Continue Reading »
June 26, 2013
I had the pleasure of joining @DBHangOps today and speaking about common_schema and openark-kit. What was meant to be a 15 minute session turned out to be 50 -- sorry, people, I don't talk as much at home, but when it comes to my pet projects...
I also realized I was missing out on a great event: DBHangOps is a hangout where you can chat and discuss MySQL & related technologies with friends and colleagues, with whom you typically only meet at conferences. I will certainly want to attend future events.
Thanks to John Cesario and Geoffrey Anderson who invited me to talk, and to the friends and familiar faces who attended; I was happy to talk about my work, and very interested in hearing about how it's being put to use. We also had time to discuss ps_helper with none other than Mark Leith!
The video is available on Twitter/YouTube.
openark-kit has also been featured on the OurSQL podcast by Sheeri & Gerry, who did great coverage of some tools. I will disclose that more is to come; I'm happy this is in capable hands and look forward to hearing the next episode!
June 21, 2013
I'm starting my new position as Senior Software Engineer at Outbrain. While I'm still to fully grasp the scope of my work, I will be handling data on all things MySQL, Hadoop, Cassandra, Kafka, more, more and more, at very large volumes.
I find Outbrain a great supporter of community and open source:
- Open source solutions are highly preferred over commercial solutions
- Outbrain is an early adopter for many open source technologies
- Employees are encouraged to contribute back as much as possible to the open source community
- It is happy to pay for support from companies developing open source
- Outbrain is a major participant in the ILTechTalks initiative: free, volunteer, professional exchange of knowledge
I can already testify I will be working with very smart and knowledgeable people; I expect to learn a lot and of course contribute from my own knowledge and skills.
I've been in process with some other companies; I'd like to kindly thank those companies I've been in touch with for their good will.
June 8, 2013
The two conservative ways of getting the number of rows in an InnoDB table are:
- SELECT COUNT(*) FROM my_table:
provides an accurate number, but makes for a long running transaction which takes ages on large tables. Long transactions make for locks.
- SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='my_schema' AND TABLE_NAME='my_table', or get same info via SHOW TABLE STATUS.
Gives an immediate response, but the value can be way off; it can be two times as large as the real value, or half of it. For query execution plans this may be a "good enough" estimation, but typically you just can't trust it for your own purposes.
Get a good estimate using chunks
You can get a good estimate by calculating the total number of rows in steps. Walk the table 1,000 rows at a time, and keep a counter. Each chunk is its own transaction, so, if the table is modified while counting, the final value does not make for an accurate account at any point in time. Typically this should be a far better estimate than TABLE_ROWS.
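The walk described above can be sketched outside QueryScript as a plain shell loop around the mysql command line client. This is not a split()-based script, only an illustration under assumptions: the function name is hypothetical, the table is assumed to have a positive integer unique key, and the client is assumed to pick up its credentials from option files. Each invocation of mysql runs a single autocommitted statement, so every chunk is its own short transaction.

```shell
#!/bin/sh
# Sketch only: count_in_chunks is a hypothetical helper, not part of
# common_schema. Assumes a positive integer unique key on the table.

count_in_chunks() {
    table=$1            # e.g. my_schema.my_table
    key=$2              # name of the integer unique key column
    chunk=${3:-1000}    # rows per chunk
    total=0
    last=0              # highest key value counted so far
    while :; do
        # Count the next chunk of rows above $last and fetch its top key.
        result=$(mysql -N -B -e "SELECT COUNT(*), MAX($key) FROM (SELECT $key FROM $table WHERE $key > $last ORDER BY $key LIMIT $chunk) AS c")
        n=$(printf '%s\n' "$result" | cut -f1)
        max=$(printf '%s\n' "$result" | cut -f2)
        # Stop on an empty chunk (or if the client returned nothing).
        if [ -z "$n" ] || [ "$n" -le 0 ]; then
            break
        fi
        total=$((total + n))
        last=$max
    done
    echo "$total"
}

# Example:
#   count_in_chunks my_schema.my_table id 1000
```

Rows inserted or deleted behind the current position while the loop runs are not reflected, which is exactly why the result is an estimate and not a point-in-time count.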
QueryScript's split() construct provides you with the means to work this out. Consider this script: Continue Reading »
June 5, 2013
Or: how to make it work in TokuDB version 7.0.1. This is a follow up on a discussion on the tokudb-user group.
I wanted to test TokuDB's compression. I took a staging machine of mine, with production data, and migrated it from Percona Server 5.5 to MariaDB 5.5+TokuDB 7.0.1. Migration went well, no problems.
To my surprise, when I converted tables from InnoDB to TokuDB, I saw an increase in table file size on disk. As explained by Tim Callaghan, this was due to TokuDB interpreting my compressed table's "KEY_BLOCK_SIZE=4" as an instruction for TokuDB's page size. TokuDB should be using 4MB block size, but thinks it's being instructed to use 4KB. Problem is, you can't get rid of table options. When one converts a table to InnoDB in ROW_FORMAT=COMPACT, or even to MyISAM, the KEY_BLOCK_SIZE option keeps lurking in the dark.
So until this is hopefully resolved in TokuDB's next version, here's a way to go around the problem. Continue Reading »
May 23, 2013
A new release for mycheckpoint: lightweight, SQL oriented MySQL monitoring solution.
If you're unfamiliar with mycheckpoint, well, the one minute sales pitch is: it's a free and open source monitoring tool for MySQL, which is extremely easy to install and execute, and which includes custom queries, alerts (via emails), and out of the box HTTP server and charting.
This is mostly a maintenance release, with some long-time requested features, and of course solved bugs. Here are a few highlights:
- Supports MariaDB and MySQL 5.6 (issues with new variables, space padded variables, text-valued variables)
- Supports alerts via function invocation on monitored host (so not only checking alerts via aggregated data like 'Seconds_behind_master' but also by SELECT my_sanity_check_function() on monitored instance). See alerts.
- Supports single-running-instance via "--single" command line argument
- Supports strict sql_mode, including ONLY_FULL_GROUP_BY, overcoming bug #69310.
- Supports sending of pending email HTML report
- Better re-deployment process
- Better recognizing of SIGNED/UNSIGNED values
- Some other improvements in charting, etc.
mycheckpoint is released under the BSD license.
Downloads are available from the project's page.