Documentation is an important part of any project. On the projects I maintain I put a lot of effort into documentation and, frankly, most of the time I spend on them goes to documentation.
Keeping the documentation faithful to the code is a topic of interest. I’d like to outline a few documentation bundling possibilities, and then present the upcoming documentation method for common_schema. I’ll talk about any bundling that is NOT man pages.
High level: web docs
This is the initial method of documentation I used for openark kit and mycheckpoint. It’s still valid for mycheckpoint. Documentation is web-based. You need Internet access to read it. It’s in HTML format.
Well, not exactly HTML format: I wrote it in WordPress. Yes, it’s HTML, but there’s a lot of noise around (theme, menus, etc.) which is not strictly part of the documentation.
While this is perhaps the easiest way to go, here are a few drawbacks:
- You’re bound to some framework (WordPress in this case)
- Docs are split between the MySQL database (my underlying WordPress storage) & WordPress files (themes, style, header, footer etc.)
- Documentation is separate from your code – they’re just not in the same place
- There is no version control over the documentation.
The result is a single source of documentation, which applies to whatever version is latest. It’s impossible to maintain docs for multiple versions. You must manually synchronize your WordPress updates with code commits (or rather – code release!).
Mid level: version controlled HTML docs
I first saw this approach in Baron’s “Aspersa gets a user manual” post. I loved it: the documentation is HTML, but stored as part of your project’s code, in the same version control.
This means one can browse the documentation (openark kit in this example) exactly as it appears in the baseline. Depending on your project hosting, one may be able to do so per version.
The approach has the great benefit of having the docs tightly coupled with the code in terms of development. Before committing code, one updates documentation for that code, then commits/releases both together.
You’re also not bound to any development framework. You may edit with vim, emacs, gedit, bluefish, eclipse, … any tool of your choice. It’s all down to plain old text files.
Mid level #2: documentation bundling
One thing I started doing with common_schema is to release a doc bundle with the code, so one can download a compressed bundle of all HTML files. That way one is absolutely certain which documentation is the right one for, say, revision 178. It takes no extra effort: the docs are already tightly coupled with code versions. Just compress and distribute.
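The "just compress and distribute" step can be sketched in a few lines of Python. This is only an illustration, not the actual common_schema release script: the `bundle_docs` function, the `doc` directory layout and the `<project>-doc-r<revision>.zip` naming are all assumptions made for the example.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_docs(docs_dir: Path, out_dir: Path, project: str, revision: int) -> Path:
    """Zip every HTML file under docs_dir into one per-revision bundle.

    The bundle name (<project>-doc-r<revision>.zip) is a hypothetical convention.
    """
    bundle_path = out_dir / f"{project}-doc-r{revision}.zip"
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for html_file in sorted(docs_dir.rglob("*.html")):
            # Store paths relative to docs_dir so the archive unpacks cleanly.
            bundle.write(html_file, html_file.relative_to(docs_dir).as_posix())
    return bundle_path


if __name__ == "__main__":
    # Small self-contained demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "doc"
        docs.mkdir()
        (docs / "help.html").write_text("<h3>NAME</h3><p>help()</p>")
        bundle = bundle_docs(docs, Path(tmp), "common_schema", 178)
        with zipfile.ZipFile(bundle) as archive:
            print(bundle.name, archive.namelist())
```

Because the revision number is part of the file name, a reader who downloads the bundle knows which code revision the docs belong to without opening it.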
Low level: documentation coupled with your code
Perl scripts can be written as Perl modules, in which case they are eligible for using the perldoc convention. You code your documentation within your script itself, as comments. Perldoc can extract the documentation and present it in a man-like format. The same happens with Python’s pydoc. Baron’s “When documentation is code” illustrates that approach. Maatkit (now Percona Toolkit) has been using it for years.
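As a minimal sketch of the pydoc side of this, here is a function that carries its own documentation as a docstring. The `match_account` function is made up for the example and has nothing to do with Maatkit or common_schema; `inspect.getdoc`, `pydoc.render_doc` and `pydoc.plain` are standard library calls.

```python
import inspect
import pydoc


def match_account(user: str, host: str) -> str:
    """Return the account string for a user and host.

    Hypothetical example function: it only formats its arguments
    the way MySQL prints accounts, e.g. 'apps'@'%'.
    """
    return f"'{user}'@'{host}'"


if __name__ == "__main__":
    # The raw docstring, with indentation cleaned up.
    print(inspect.getdoc(match_account))
    # The man-like page pydoc builds from it; plain() strips the
    # overstrike characters pydoc uses for bold text.
    print(pydoc.plain(pydoc.render_doc(match_account)))
```

The documentation lives in the same file as the code it describes, so it cannot be released separately from it.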
This method has the advantage of having the documentation ready right within your shell. You don’t need a browser, nor firewall access. The docs are just there for you in the same environment where you’re executing the code.
SQL Low level: CALL for help()
common_schema is a different type of project. It is merely a schema. There’s no Perl nor Python. One imports the schema into one’s MySQL server.
What’s the low-level approach for this type of code?
For common_schema I use three levels of documentation: the mid-level, where one can browse through the versioned docs, the 2nd mid-level, where one can download bundled documentation, and then a low-level approach: documentation embedded within the code.
MySQL’s documentation is also built into the server: see the help_* tables within the mysql schema. The mysql command line client allows one to access help by supporting the help command, e.g.
mysql> help create table;
The client intercepts this command (it is not a server-side command) and searches through the mysql.help_* tables.
With common_schema, I don’t have control over the client; it’s all on the server side. But the code being a schema, what with stored routines and tables, it’s easy enough to set up documentation.
As of the next version of common_schema, and following MySQL’s method, common_schema provides a help table:
DESC help;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| topic        | varchar(32) | NO   | PRI | NULL    |       |
| help_message | text        | NO   |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+
And a help() procedure, so that you can call for help(). The procedure will look for the best matching document based on your search expression:
root@mysql-5.1.51> CALL help('match');
+-------------------------------------------------------------------------------+
| help                                                                          |
+-------------------------------------------------------------------------------+
|                                                                               |
| NAME                                                                          |
|                                                                               |
| match_grantee(): Match an existing account based on user+host.                |
|                                                                               |
| TYPE                                                                          |
|                                                                               |
| Function                                                                      |
|                                                                               |
| DESCRIPTION                                                                   |
|                                                                               |
| MySQL does not provide with identification of logged in accounts. It only     |
| provides with user + host:port combination within processlist. Alas, these do |
| not directly map to accounts, as MySQL lists the host:port from which the     |
| connection is made, but not the (possibly wildcard) user or host.             |
| This function matches a user+host combination against the known accounts,     |
| using the same matching method as the MySQL server, to detect the account     |
| which MySQL identifies as the one matching. It is similar in essence to       |
| CURRENT_USER(), only it works for all sessions, not just for the current      |
| session.                                                                      |
|                                                                               |
| SYNOPSIS                                                                      |
|                                                                               |
|                                                                               |
|                                                                               |
| match_grantee(connection_user char(16) CHARSET utf8,                          |
| connection_host char(70) CHARSET utf8)                                        |
| RETURNS VARCHAR(100) CHARSET utf8                                             |
|                                                                               |
|                                                                               |
| Input:                                                                        |
|                                                                               |
| * connection_user: user login (e.g. as specified by PROCESSLIST)              |
| * connection_host: login host. May optionally specify port number (e.g.       |
| webhost:12345), which is discarded by the function. This is to support        |
| immediate input from as specified by PROCESSLIST.                             |
|                                                                               |
|                                                                               |
| EXAMPLES                                                                      |
|                                                                               |
| Find an account matching the given use+host combination:                      |
|                                                                               |
|                                                                               |
| mysql> SELECT match_grantee('apps', '192.128.0.1:12345') AS                   |
| grantee;                                                                      |
| +------------+                                                                |
| | grantee    |                                                                |
| +------------+                                                                |
| | 'apps'@'%' |                                                                |
| +------------+                                                                |
|                                                                               |
|                                                                               |
| ENVIRONMENT                                                                   |
|                                                                               |
| MySQL 5.1 or newer                                                            |
|                                                                               |
| SEE ALSO                                                                      |
|                                                                               |
| processlist_grantees                                                          |
|                                                                               |
| AUTHOR                                                                        |
|                                                                               |
| Shlomi Noach                                                                  |
|                                                                               |
+-------------------------------------------------------------------------------+
I like HTML for documentation. I think it’s a good format, provided you don’t start doing funny things. Perhaps TROFF is more suitable; certainly more popular on Unix machines. But I already have everything in HTML. So, what do I do?
My decision was to keep documentation in HTML, and use the handy html2text tool to do the job. And it does it pretty well! The sample you see above is an automated translation of HTML to plain text.
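To give a feel for what such a conversion involves, here is a rough stand-in written with Python’s standard `html.parser`. It is not html2text and does far less (no line wrapping, no handling of indentation inside pre blocks); the `TextExtractor` class and `html_to_text` function are names invented for this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, starting a new line around block-level tags."""

    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "pre", "li", "br", "tr"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags and collapse runs of blank lines into a single blank line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    kept = []
    for line in lines:
        # Keep a blank line only if the previous kept line was not blank.
        if line or (kept and kept[-1]):
            kept.append(line)
    return "\n".join(kept).strip()


if __name__ == "__main__":
    page = "<h3>NAME</h3><p>match_grantee(): Match an <b>existing</b> account.</p>"
    print(html_to_text(page))
```

The point of the sketch is only that the HTML stays the single source: the plain text is generated from it, never edited by hand.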
I add a few touches of my own: SELECTing long texts is ugly, whether you terminate the statement with “;” or “\G”. The help() routine breaks the text on ‘\n’, returning a multi-row result set. The above sample makes for some 60+ rows, nicely formatted, broken out of the original single text stored in the help table.
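The behaviour described above can be modelled outside MySQL in a few lines of Python. This is a sketch of the idea only, not the stored procedure itself: the help table is a plain dict, the matching rule (exact topic first, then the first topic containing the search expression) is an assumption, and the "not found" message is made up.

```python
from typing import Dict, List, Optional


def find_topic(help_table: Dict[str, str], expression: str) -> Optional[str]:
    """Pick a topic for the search expression (assumed matching rule)."""
    needle = expression.lower()
    topics = sorted(help_table)
    # Prefer an exact topic name.
    for topic in topics:
        if topic.lower() == needle:
            return topic
    # Otherwise fall back to the first topic containing the expression.
    for topic in topics:
        if needle in topic.lower():
            return topic
    return None


def help_rows(help_table: Dict[str, str], expression: str) -> List[str]:
    """Return the help text as one row per line, like a multi-row result set."""
    topic = find_topic(help_table, expression)
    if topic is None:
        # Hypothetical message; the real routine may respond differently.
        return [f"No help topics found for '{expression}'."]
    return help_table[topic].split("\n")


if __name__ == "__main__":
    table = {
        "match_grantee": "NAME\n\nmatch_grantee(): Match an existing account.",
        "processlist_grantees": "NAME\n\nprocesslist_grantees",
    }
    for row in help_rows(table, "match"):
        print(row)
```

Returning one row per line is what makes the output readable in the client: each line gets its own table row instead of one huge cell.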
So now you have an internal help method for common_schema, right where the code is. You don’t have to leave the command line client in order to get help.
Giuseppe offered me the idea for this, even while my own thinking about it was in early stages.
The next version of common_schema will be available in a few weeks. The code is pretty much ready. I just need to work on, ahem…, the documentation.
Awesome! Also note that if the docs are part of the code, they have the same license as the code. 😀
Sheeri, interesting observation about license. Didn’t think about documentation licensing in the first place 🙂
Though I don’t see a problem with BSD licensed documentation. Any insights are appreciated.