Rotating SQL graphs horizontally

We all love graphs. We all love SQL hacks. We all know the SQL hack which displays a character-based graph (example follows for those unfamiliar).

But we all love horizontal graphs, not vertical ones. We are used to the X axis being horizontal, Y being vertical. Not vice versa.

In this post I’ll present a SQL hack which rotates a vertical graph to horizontal. In fact, the technique shown will rotate any ‘textual image’; but graphs are a nice example.

A vertical graph example

What’s prettier than a sine curve? I have prepared a simple table that will serve nicely; it can be found in graph_tables.sql.
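
For readers unfamiliar with character-based graphs, here is a minimal sketch of a vertical one. It does not use the table from graph_tables.sql; the inline list of x values and the scaling constants are assumptions made purely for illustration.

```sql
-- Hypothetical sample points x = 0..12, built inline so the query is self-contained.
-- Each row is one point: x runs downwards, the sine value is drawn as a
-- horizontal offset, which is exactly the "vertical" layout discussed here.
SELECT
  x,
  CONCAT(REPEAT(' ', ROUND(10 + 10 * SIN(x / 2))), '*') AS graph
FROM (
  SELECT 0 AS x UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
  UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
  UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11
  UNION ALL SELECT 12
) AS sample_points
ORDER BY x;
```

The offset is kept between 0 and 20 characters so that REPEAT never receives a negative count.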

The DB problem inherent to dynamic web pages

When building web sites, a common requirement is a maximum page load time.

For example, many would require major pages to load in under 0.5 seconds (or even less). Of course, many factors affect page load time: network, caching, web servers, scripting language/code, database access and more.

Naturally, I want to discuss database access when creating web pages. I’ll be referring to dynamic web pages, such as those generated with common languages like PHP, Java/J2EE, Ruby, ASP(.NET) etc.

A very common programming style is – what’s called in the Java jargon – using “scriptlets” as in the following JSP page:

<html>
<body>
    Time now is <%= new java.util.Date() %>
</body>
</html>

The above replaces the “<%= new java.util.Date() %>” part with a text representation of the current time.

If I were to produce a dynamic content site, say, a WordPress blog like the one you’re reading, I would need to generate several pieces of dynamic content: the latest posts, the popular tags, the comments for this post, etc. These are generated by calling upon the database and running some queries. I suppose there’s nothing new in what I’ve explained so far.

The problem

When generating a “heavyweight” page, like some online newspaper or bookstore, there may be many queries involved. Are you logged in? Do we have recommendations for you? What are the latest topics? What have you been interested in before? Do you have friends online? What content have you produced on the website?

Character sets: latin1 vs. ascii

Unless specified otherwise, latin1 is the default character set in MySQL.

What I usually find in schemas are columns which are either utf8 or latin1: the utf8 columns are those which need to contain multilingual characters (user names, addresses, articles etc.), and the latin1 columns are all the rest (passwords, digests, email addresses, hard-coded values etc.).

I find latin1 to be improper for such purposes and suggest that ascii be used instead. The reason is that latin1 implies European text (with Swedish collation). It is unclear to an outsider, when finding a latin1 column, whether it should actually contain West European characters, or whether it is just being used for ASCII text, utilizing the fact that a character in latin1 only requires 1 byte of storage.
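
As a rough sketch of the suggestion, a table might declare its character sets per column as follows. The table and column names are hypothetical and only serve to show the intent: utf8 where multilingual text is expected, ascii where it is not.

```sql
-- Hypothetical table: names and sizes are illustrative assumptions.
CREATE TABLE user_account (
  user_account_id INT UNSIGNED AUTO_INCREMENT,
  display_name VARCHAR(64) CHARSET utf8,     -- may contain multilingual text
  email VARCHAR(128) CHARSET ascii,          -- plain 7-bit text only
  password_digest CHAR(40) CHARSET ascii,    -- e.g. a hex-encoded digest
  PRIMARY KEY (user_account_id)
);
```

Declaring ascii keeps the 1-byte-per-character storage while making it explicit that no West European characters are expected.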

New and Noteworthy in openark kit

A new release of openark kit is out. Some interesting additions/changes are:

openark kit is a set of utilities for MySQL that help ease everyday work. Let’s look more closely at the changes.

oak-online-alter-table

The utility allows for non-blocking ALTER TABLE operations, under certain limitations. One limitation which has been removed in the current release is the single-column UNIQUE KEY limitation. As of now, a requirement for running oak-online-alter-table is that the altered table has some UNIQUE KEY. It could be numerical, textual, single column, multi-column (compound), anything.
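
One way to check whether a table meets the UNIQUE KEY requirement is to look it up in INFORMATION_SCHEMA. This is only a sketch of such a check, not part of the utility itself; 'my_schema' and 'my_table' are placeholder names.

```sql
-- Lists every unique index (including the PRIMARY KEY) on a given table,
-- with its columns in index order. An empty result means no UNIQUE KEY.
SELECT
  index_name,
  GROUP_CONCAT(column_name ORDER BY seq_in_index) AS key_columns
FROM information_schema.statistics
WHERE table_schema = 'my_schema'   -- placeholder schema name
  AND table_name = 'my_table'      -- placeholder table name
  AND non_unique = 0
GROUP BY index_name;
```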

Unwalking a string with GROUP_CONCAT

“Walking a string” is an SQL technique to convert a single value into a multi-row result set. For example, walking the string ‘hello’ results in 5 rows, each of which contains a single character from the text.

I’ll present a brief example of walking a string, and then show how to “unwalk” the string: do the reverse operation.

To walk a string, an integers table is required (or this could be a good use for SeqEngine):
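
A minimal sketch of both directions follows. The integers table name, its column n, and the ten sample rows are assumptions for illustration; any table of consecutive positive integers would do.

```sql
-- Hypothetical integers table holding 1..10.
CREATE TABLE integers (n INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO integers (n) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

-- Walking: one row per character of 'hello'.
SELECT n, SUBSTRING('hello', n, 1) AS c
FROM integers
WHERE n <= CHAR_LENGTH('hello')
ORDER BY n;

-- Unwalking: glue the characters back together, in position order.
SELECT GROUP_CONCAT(c ORDER BY n SEPARATOR '') AS unwalked
FROM (
  SELECT n, SUBSTRING('hello', n, 1) AS c
  FROM integers
  WHERE n <= CHAR_LENGTH('hello')
) AS walked;
```

The ORDER BY inside GROUP_CONCAT matters, since row order is otherwise not guaranteed. Long strings may also require raising group_concat_max_len.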

Even more on MySQL password security

This post follows Ronald Bradford’s More Basic MySQL Security, and Lenz Grimmer’s Basic MySQL Security: Providing passwords on the command line and More on MySQL password security.

In Ronald’s post I argued that passwords provided on the command line are visible in plaintext in “ps aux”. Lenz argued that this is incorrect, providing the source code to support that, and Giuseppe commented that this has been fixed since 2002. Later on, Lenz showed that passwords are visible in plaintext on OpenSolaris, Solaris and variants of BSD and SysV.

Mental note: old habits die hard; I must remember to revisit issues from time to time.

Centralizing

Back to the question: why use a file to store your password, and not provide it on the command line?

SQL: finding a user’s country/region based on IP

I’ve encountered the same problem twice for different customers, so I guess it’s worth a discussion.

A common task for web applications is to find out the country/region of a user, based on her IP address, as can be detected in the HTTP request. Depending on the country of origin, the website can translate dates for different time zones, can change locale settings, and, perhaps most commonly, show advertisements in her native language.

To start with, there’s a table which lists the IP ranges per country/region. Let’s assume we’re only dealing with IPv4:

CREATE TABLE regions_ip_range (
  regions_ip_range_id INT UNSIGNED AUTO_INCREMENT,
  country VARCHAR(64) CHARSET utf8,
  region VARCHAR(64) CHARSET utf8,
  start_ip INT UNSIGNED,
  end_ip INT UNSIGNED,
  …
  PRIMARY KEY(regions_ip_range_id),
  ...
);
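
Since start_ip and end_ip are INT UNSIGNED, an IPv4 address has to be expressed as a number before it can be compared against them. The sketch below assumes the ranges are stored in the numeric form produced by MySQL’s INET_ATON(); the sample address is arbitrary.

```sql
-- Dotted-quad text to its 32-bit numeric form:
-- 192*2^24 + 168*2^16 + 1*2^8 + 10 = 3232235786
SELECT INET_ATON('192.168.1.10') AS ip_as_int;

-- And back again, e.g. for displaying a stored range boundary.
SELECT INET_NTOA(3232235786) AS ip_as_text;
```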

The table is fixed, and is populated. Now the question arises: how do we query this table, and which indexes should be created?

The wrong way

The form I’ve encountered is as follows: an index is declared on regions_ip_range:

KEY ip_range_idx (start_ip, end_ip)

And the query goes like this:

SELECT * FROM regions_ip_range
WHERE my_ip BETWEEN start_ip AND end_ip

Reasons to use innodb_file_per_table

When working with InnoDB, you have two ways for managing the tablespace storage:

  1. Throw everything in one big file (optionally split).
  2. Have one file per table.

I will discuss the advantages and disadvantages of the two options, and will try to convince you that innodb_file_per_table is preferable.
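
To see which of the two modes a server is using, and to move an existing table into its own file, something along these lines can be used. It is a sketch: my_schema.my_log is a placeholder name, and on versions where the variable is not dynamic it has to be set in the configuration file instead of with SET GLOBAL.

```sql
-- Which mode is the server in?
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Enable it at runtime (only on versions where the variable is dynamic).
SET GLOBAL innodb_file_per_table = 1;

-- Tables created earlier stay in the shared tablespace until rebuilt.
-- Rebuilding a (placeholder) table moves it into its own .ibd file:
ALTER TABLE my_schema.my_log ENGINE=InnoDB;
```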

A single tablespace

Having everything in one big file means all tables and indexes, from all schemas, are ‘mixed’ together in that file.

This allows for the following nice property: free space can be shared between different tables and different schemas. Thus, if I purge many rows from my log table, the now unused space can be occupied by new rows of any other table.

This same nice property also translates to a not so nice one: data can be greatly fragmented across the tablespace.

An annoying property of InnoDB’s tablespaces is that they never shrink. So after purging those rows from the log table, the tablespace file (usually ibdata1) still keeps the same storage. It does not release storage to the file system.

I’ve seen more than once how certain tables are left unwatched, growing until disk usage reaches 90% and SMS notifications start beeping all around.
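
One simple way to keep an eye on such tables is to list the largest ones from INFORMATION_SCHEMA. This is a sketch only; the sizes reported for InnoDB tables are estimates, and the LIMIT of 10 is arbitrary.

```sql
-- The ten largest InnoDB tables by approximate data + index size.
SELECT
  table_schema,
  table_name,
  ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE engine = 'InnoDB'
ORDER BY data_length + index_length DESC
LIMIT 10;
```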

The importance of report_host & report_port

The two server variables, report_host and report_port, are not commonly used. One of the reasons, I suspect, is that they do not appear in any of the standard .cnf files provided with a MySQL installation.

For a replication environment, I find these two variables to be very useful.

Where are these variables used?

Here’s a slightly modified excerpt from a MySQL master node I have, and which has two slaves:
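
The original listing is not reproduced here. As a hedged pointer, these are the statements where the two variables normally surface; both variables are set in each slave’s configuration file and cannot be changed at runtime.

```sql
-- On the master: lists the slaves that have registered with it.
-- The Host and Port columns come from each slave's report_host / report_port.
SHOW SLAVE HOSTS;

-- On a slave: check what it will report to its master.
SHOW VARIABLES LIKE 'report_%';
```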