If you work with the command line and know your SQL, q is a great tool to use:
q allows you to query your text files or standard input with SQL. You can:
SELECT c1, COUNT(*) FROM /home/shlomi/tmp/my_file.csv GROUP BY c1
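Since q works by loading delimited text into a temporary SQLite table (as a commenter notes below), the GROUP BY above is roughly equivalent to the following Python sketch. The sample data and column names are invented for illustration; this is the general idea, not q's actual implementation:

```python
import csv
import io
import sqlite3

# Stand-in for a CSV file or standard input.
data = """c1,c2
apple,1
apple,2
banana,3
"""

# Parse the delimited text; first row is the header.
rows = list(csv.reader(io.StringIO(data)))
header, body = rows[0], rows[1:]

# Load the rows into a temporary in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 TEXT, c2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", body)

# Now any SQL works, e.g. the GROUP BY from the post:
result = conn.execute(
    "SELECT c1, COUNT(*) FROM t GROUP BY c1 ORDER BY c1"
).fetchall()
print(result)  # [('apple', 2), ('banana', 1)]
```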
And you can:
SELECT all.c2 FROM /tmp/all_engines.txt AS all LEFT JOIN /tmp/innodb_engines.txt AS inno USING (c1, c2) WHERE inno.c3 IS NULL
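The LEFT JOIN … WHERE … IS NULL construct above is the classic anti-join: it returns rows from the first file that have no match in the second. Here is the same pattern in plain SQLite, with invented table contents standing in for the two files:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE all_engines (c1 TEXT, c2 TEXT, c3 TEXT);
CREATE TABLE innodb_engines (c1 TEXT, c2 TEXT, c3 TEXT);
INSERT INTO all_engines VALUES
    ('db1', 't1', 'x'), ('db1', 't2', 'x'), ('db2', 't3', 'x');
INSERT INTO innodb_engines VALUES
    ('db1', 't1', 'InnoDB');
""")

# Rows of all_engines with no matching (c1, c2) in innodb_engines:
rows = conn.execute("""
    SELECT a.c2 FROM all_engines a
    LEFT JOIN innodb_engines i USING (c1, c2)
    WHERE i.c3 IS NULL
    ORDER BY a.c2
""").fetchall()
print(rows)  # [('t2',), ('t3',)]
```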
And you can also combine with your favourite shell commands and tools:
grep "my_term" /tmp/my_file.txt | q "SELECT c4 FROM - JOIN /home/shlomi/static.txt USING (c1)" | xargs touch
Some of q’s functionality (and indeed, SQL functionality) can be found in command line tools. You can use grep for pseudo WHERE filtering, or cut for projecting, but you can only get so far with cat my_file.csv | sort | uniq -c | sort -n. SQL is way more powerful for working with tabulated data, and so q makes a great addition to one’s toolbox.
The tool is authored by my colleague Harel Ben-Attia, and is in daily use over at our company (it is in fact installed on all production servers).
It is of course free and open source (get it on GitHub, where you can also find documentation), and very easy to set up. Enjoy!
This is indeed a nice idea, but I’d appreciate it if it supported some additional file formats, such as CSV, XML and JSON, for importing and exporting directly from the tool. Also, since it creates a temporary SQLite table, I’m afraid it would be inefficient for processing large files as well as for repeated tasks.
@Tomer,
I’ll pass along 🙂
While I don’t speak for Harel, my personal opinion is that for importing XML and JSON you really need an altogether different tool, based on XPath or a similar expression language, not SQL.
SQL over unstructured data would be limited, to say the least.
Moreover, I don’t see SQL, with all the diversity it offers, ever becoming a stream editor. A GROUP BY over unsorted data is example enough of the need to “get all the data upfront”.
For the really large data, well, that’s what an RDBMS is for. I agree q is best used with smaller datasets.
Again, all the above is my own personal understanding.
Cheers
It can be a nice addition to my script/tool collection;
for most of my needs I use bash scripting.
And if the data is messy, I use OpenRefine (http://openrefine.org/), formerly known as Google Refine.
Oh, happy day. A dream come true. Thanks for sharing!