Java – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Thu, 11 Mar 2010 12:13:51 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 Proper SQL table alias use conventions https://shlomi-noach.github.io/blog/mysql/proper-sql-table-alias-use-conventions https://shlomi-noach.github.io/blog/mysql/proper-sql-table-alias-use-conventions#comments Thu, 11 Mar 2010 07:10:09 +0000 https://shlomi-noach.github.io/blog/?p=2156 After seeing quite some SQL statements over the years, something is bugging me: there is no consistent convention as for how to write an SQL query.

I’m going to leave formatting, upper/lower-case issues aside, and discuss a small part of the SQL syntax: table aliases. Looking at three different queries, I will describe what I find to be problematic table alias use.

Using the sakila database, take a look at the following queries:

Query #1

SELECT
 R.rental_date, C.customer_id, C.first_name, C.last_name
FROM
 rental R
 JOIN customer C USING (customer_id)
WHERE
 R.rental_date >= DATE('2005-10-01')
 AND C.store_id=1;

The above looks for film rentals done in a specific store (store #1), as of Oct. 1st, 2005.

Query #2

SELECT
 F.title, C.name
FROM
 film AS F
 JOIN film_category AS S ON (F.film_id = S.film_id)
 JOIN category AS C ON (S.category_id = C.category_id)
WHERE F.length > 180;

The above lists the title and category for all films longer than three hours.

Query #3

SELECT c.customer_id, c.last_name
FROM
  customer c
  INNER JOIN address a ON (c.address_id = a.address_id)
  INNER JOIN (
    SELECT
      c.city_id
    FROM
      city AS c
      JOIN country s ON (c.country_id = s.country_id)
    WHERE
      s.country LIKE 'F%'
  ) s1 USING (city_id)
WHERE
  create_date >= DATE('2005-10-01');

The above lists customers created as of Oct. 1st, 2005, and who live in countries starting with an ‘F’. The query could be solved without a subquery, but there’s a good reason why I made it so.

The problems

I used very different conventions on any one of the queries, and sometimes within each query. And it’s common that I see the same on a customer’s site, what with having many programmers do the SQL coding. Again, I will only discuss the table aliases conventions. I’ll leaver the rest to the reader.

Here’s where I see problems:

  • Query #1: In itself, it looks fine. Rental turns to R, Customer turns to C. I will comment on this slightly later on when I provide my full opinion.
  • Query #2: So film turns to F, category turns to C. What should film_category turn into? Out of letters? Let’s just go for S, shall we? But S has nothing do with film_category. Yet it’s so commonly seen.
  • Query #2: We’re using the AS keyword now. We didn’t use it before.
  • Queries #1, #2: Hold on. Wasn’t C taken for customer in Query #1? Now, in Query #2 it stands for category? I’m beginning to get confused.
  • Query #3: Now aliases are lower case; I was just getting used to them being upper case.
  • Query #3: But, hey, c is back to customer!
  • Query #3: Or, is it? Take a look at the subquery. Theres another c in there! This time it’s city! And it’s perfectly valid syntax. We actually have two identical aliases in the same query.
  • Query #3: If I could, I would name country with c as well. But I can’t. So why not throw in s again?
  • Query #3: and now I don’t even bother using the alias when accessing the create_date. Well, there’s no such column in any of the other tables!

Proper conventions

What I find so disturbing is that whenever I read a complex query, I need to go back and forth, back and forth between table aliases (found everywhere in the query) and their declaration point. Such irregularities make the queries difficult to read.

Any of the above issues could be justified. But I wish to make some suggestions:

  • Decide whether you’re going for upper or lower case.
  • Do not use the same alias twice in your query, even if it’s valid.
  • Aliases do not have to be single character. film_category may just as well be FC.
  • Do not alias something that is hard to interpret. s does not stand for country.
  • Think ahead: use same aliases throughout all your queries, as far as you can. If uniqueness is a problem, make for longer aliases. Use cust instead of c.

The above should make for more organized and readable SQL code. Remember: what one programmer finds as a very intuitive alias, is unintuitive to another!

My own convention

Simple: I only use aliases when using self joins. I am aware that queries are much longer what with long table names. I go farther than that: I prefer fully qualifying questionable columns throughout the query. Yes, it makes the query even longer.

I know this does not appeal to many. But there’s no confusion. And it’s easily searchable. And it’s consistent. And if properly formatted, as in the above queries, is well readable.

Now please join me in asking Oracle if they can add multi-line Strings for java, as there are for python.

]]>
https://shlomi-noach.github.io/blog/mysql/proper-sql-table-alias-use-conventions/feed 9 2156
The DB problem inherent to dynamic web pages https://shlomi-noach.github.io/blog/mysql/the-db-problem-inherent-to-dynamic-web-pages https://shlomi-noach.github.io/blog/mysql/the-db-problem-inherent-to-dynamic-web-pages#comments Mon, 20 Jul 2009 11:48:56 +0000 https://shlomi-noach.github.io/blog/?p=955 When building web sites, a popular demand is a maximum page load time.

For example, many would require < 0.5 seconds (or even less) for major pages loading time. Of course, there are numerous factors for page load time: network, caching, web servers, scripting language/code, database access and more.

Naturally I want to discuss the use of database access when creating web pages. I’ll be referring to dynamic web pages, such that are created by common languages as PHP, Java/J2EE, Ruby, ASP(.NET) etc.

A very common programming style is – what’s called in the Java jargon – using “scriptlets” as in the following JSP page:

<html>
<body>
    Time now is <%= new java.util.Date() %>
</body>
</html>

The above replaces the “<%= new java.util.Date() %>” part with a text representation of the current time.

If I were to produce a dynamic content site, say, a WordPress blog, like the one you’re reading, I would need to generate several dynamic contents: the latest posts, the popular tags, the comments for this post, etc. These are generated by calling upon the database and running some queries. I suppose there’s nothing new in what I’ve explained so far.

The problem

When generating a “heavyweight” page, like some online newspaper or bookstore, there may be many queries involved. Are you logged in? Do we have recommendations for you? What are the latest topics? What have you been interested in before? Do you have friends online? What content have you produced on the website?

I’ve recently reviewed a site which generated > 500 queries per single page. I personally thought that was a very high number, but that was a necessity. The problem was: the page took 2 seconds to load.

Some tuning, rewriting and indexing later, time dropped to 0.6 seconds to load; but that was not fast enough. It was then that we got to a major conclusion:

All database calls are serialized. They need to be parallelized.

Remember that MySQL can only utilize a single thread for the computation of a single query (though more threads can handle IO in the meantime). This leads to only one CPU being used on your standard Linux distribution, for a given web page.

Really, that sounds just too obvious! But not so easy to achieve when doing “scriptlets”. The templating engine parses the scriptlets one by one, executing them in order. In fact, you assume it does so, so that you can rely on the outcome of the previous scriptlet in the next one. In Java, for example, it goes beyond that: a JSP page is rewritten as a normal Java Servlet class, where the “scriptlets” become the main code, and the HTML becomes just printing to standard output. So you get linear executing code.

Even with more sophisticated frameworks, the “normal” way of doing things is linear. For example, using the Spring framework, you have Java objects — controllers — which are responsible for web pages. You can avoid doing scpriting within your dynamic web pages, and only ask for data provided by those controllers. So, for example, using Spring + Velocity, a web page could look like this:

<html>
<body>
    Login time as recorded in DB is: ${user.loginTime}
</body>
</html>

This (usually) translates to calling the getLoginTime() method on a pre-built user object. But just how does this method work?

  • Does it do lazy initialization, so that it calls upon the DB to get the answer?
  • Did the controller set up the value during some init() method?
  • Did the controller set up the value in response to the web page’s request parameter, parsing them one by one?

All the above options lead to linear, or serial execution.

How to parallelize?

Parallelization with web pages is not so simple, and requires understanding of multi threading programming. The programmer needs to be aware of race conditions, deadlocks, starvation issues, etc. (though, to be honest, in dynamic web pages context these do not usually become a real issue). Some programming languages provide good support for multi threaded programming. Java is one such language.

Let’s assume, then, that we need to spawn some 10 queries in response to a page request. With Jjava, we can write something like:

CountDownLatch doneSignal = new CountDownLatch(10);

Runnable task1 = new Runnable() {
    public void run()
    {
        user.setLoginTime(this.jdbcTemplate.queryForInt("SELECT ... FROM ..."));
        doneSignal.countDown();
    }
} ;

Runnable task2 = new Runnable() {
    public void run()
    {
        headlines = getSimpleJdbcTemplate().query("SELECT * FROM headline WHERE...",
            new ParameterizedRowMapper<Headline>() {
                public Headline mapRow(ResultSet rs, int rowNum)
                {
                    Headline headline = new Headline();
                    headline.setTitle(rs.getString("title");
                    headline.setUrl(rs.getString("url");
                    ...
                }
            }
        doneSignal.countDown();
    }
} ;

...

Runnable task10 = new Runnable() {
    ...
    doneSignal.countDown();
}
Executor executor = Executors.newFixedThreadPool(numberOfAvailableProcessors);
executor.execute(task1);
...
executor.execute(task10);

doneSignal.await();

// Now fill in the Model

The above code is simplified and presented in a way which is more readable. What it says is:

  • Let’s create the 10 tasks, but not execute them: just lay out the commands.
  • Each task, upon completion, lets the CountDownLatch know it has completed (but remeber we have not executed it yet).
  • We create or use a thread pool, using some n threads; n may relate to the number of processors we have.
  • We ask the pool to execute all threads. At the discretion of the pool, it will either run them all concurrently, or some sequentially – depending on how many threads are available.
  • We ask the CountDownLatch — a one-time barrier — to block, until all 10 tasks have notified they’re done.
  • We can now go on and do our stuff.

Spring has a built in TaskExecutor mechanism to provide solution similar to the thread pool above.

I’m mostly a C/C++/Java programmer; I have no knowledge on how this can be achieved in PHP, Ruby, ASP.NET or other languages. The above code is certainly not the most straightforward to use. I would like to see frameworks provide wrappers for this kind of solution, so as to support the common web developer with parallelization.

]]>
https://shlomi-noach.github.io/blog/mysql/the-db-problem-inherent-to-dynamic-web-pages/feed 6 955