Tool of the day: q

If you work with command line and know your SQL, q is a great tool to use:

q allows you to query your text files or standard input with SQL. You can:

SELECT c1, COUNT(*) FROM /home/shlomi/tmp/my_file.csv GROUP BY c1

And you can:

SELECT all.c2 FROM /tmp/all_engines.txt AS all LEFT JOIN /tmp/innodb_engines.txt AS inno USING (c1, c2) WHERE inno.c3 IS NULL

And you can also combine with your favourite shell commands and tools:

grep "my_term" /tmp/my_file.txt | q "SELECT c4 FROM - JOIN /home/shlomi/static.txt USING (c1)" | xargs touch

Some of q‘s functionality (and indeed, SQL functionality) can be found in command line tools. You can use grep for pseudo WHERE filtering, or cut for projecting, but you can only get so far with cat my_file.csv | sort | uniq -c | sort -n. SQL is way more powerful for working with tabulated data, and so q makes for a great addition into one’s toolbox.

The tool is authored by my colleague Harel Ben-Attia, and is in daily use over at our company (it is in fact installed on all production servers).

It is of course free and open source (get it on GitHub, where you can also find documentation), and very easy to setup. Enjoy!

4 thoughts on “Tool of the day: q

  1. This is indeed a nice idea, but I’d appreciate if it will have some additional file formats, such as CSV, XML and JSON ready for importing and exporting directly from the tool. Also, since it does create temporary sqlite table, I afraid it’d be inefficient for processing large files as well as repeated tasks.

  2. @Tomer,

    I’ll pass along 🙂
    While I don’t speak for Harel, my personal opinion is that for importing XML and JSON you really need an altogether different tool, based on XPath or similar expression language; not SQL.

    SQL over unstructured data would be limited, to say the least.

    Moreover, I don’t see that SQL, having the diversities it offers, can ever be a stream editor. a GROUP BY over random data is example enough to present the need of “get all the data upfront”.

    For the really large data, well, that’s what RDBMS is for. I agree q should be used with the smaller datasets.

    Again, all the above my own personal understanding.

    Cheers

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.