Web – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Mon, 08 Nov 2010 10:45:45 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 mycheckpoint (rev 208): aggregation tables, enhanced charting, RPM distribution https://shlomi-noach.github.io/blog/mysql/mycheckpoint-rev-208-aggregation-tables-enhanced-charting-rpm-distribution https://shlomi-noach.github.io/blog/mysql/mycheckpoint-rev-208-aggregation-tables-enhanced-charting-rpm-distribution#comments Mon, 08 Nov 2010 10:45:45 +0000 https://shlomi-noach.github.io/blog/?p=3066 Revision 208 of mycheckpoint, a MySQL monitoring solution, has been released. New and updated in this revision:

  • Aggregation tables: aggregated data makes for fast reports on previously slow queries.
  • Enhanced charting: interactive charts now present time stamps dynamically (see demo); “Zoom in” charts are available (see demo) on mycheckpoint‘s HTTP server.
  • RPM distribution: a “noarch” RPM mycheckpoint build is now available.
  • Initial work on formalizing test environment

mycheckpoint celebrates one year of existence!

Aggregation tables

I really wanted to avoid using these: everything was so more beautiful with one single dataset and dozens of supporting views (OK, the views themselves are hardly “beautiful”).

However it was impossible (for my level of expertise) to optimize query performance what with all those views on per-hour and per-day aggregation. The GROUP BYs and the JOINs did not make it possible for condition pushdown (i.e. using MERGE algorithm) where desired.

As result, mycheckpoint now manages aggregation tables: per-hour and per-day. The impact on sample taking is neglect able (making for two additional fast queries), but the impact on reading aggregated data is overwhelming. Generating a HTML full report could take a few minutes to complete. It now returns in no time. This makes charting more attractive, and allows for enhanced charting, such as zooming in on charts, as described following.

Aggregation tables will automatically be created and retroactively populated upon using revision 208. There’s nothing special to do; be advised that for one single execution of mycheckpoint, many INSERT queries are going to be executed. Shouldn’t take more than a couple minutes on commodity hardware and a few months of history.

It is possible to disable aggregation tables, or make for a complete rebuild of tables; by default, though, aggregation is ON.

Enhanced charting

Two enhancements here:

  1. The interactive line charts already know how to update legend data as mouse hovers over them. Now they also present accurate date & time. This provides with fully informative charts.
  2. As with other monitoring tools, it is possible to “zoom in” on a chart: zooming in will present any chart in “last 24 hours”, “last 10 days” and “complete history” views, magnified on screen. See demo here.

RPM distribution

No excuse for this being so late, I know. But RPM distribution is now available. Yeepee!

This is a noarch distribution, courtesy of Python’s distutils; you should be able to install the package on any RPM supporting platform. I have only tested in on CentOS; feedback is welcome.

Future plans

Work is going on. These are the non-scheduled future tasks I see:

  • Monitoring InnoDB Plugin & XtraDB status.
  • A proper man page.
  • Anything else that interests me & the users.

Try it out

Try out mycheckpoint. It’s a different kind of monitoring solution. Simple monitoring (charting) is immediate. For more interesting results you will need basic SQL skills, and in return you’ll get a lot of power under your hands.

mycheckpoint is released under the New BSD License.

Umm, I’ll repeat this last one: mycheckpoint is released under the New BSD License. Still, and will continue to be. Thanks for the good advice by Lenz, Domas and others.

]]>
https://shlomi-noach.github.io/blog/mysql/mycheckpoint-rev-208-aggregation-tables-enhanced-charting-rpm-distribution/feed 5 3066
mycheckpoint (rev. 190): HTTP server; interactive charts https://shlomi-noach.github.io/blog/mysql/mycheckpoint-rev-190-http-server-interactive-charts https://shlomi-noach.github.io/blog/mysql/mycheckpoint-rev-190-http-server-interactive-charts#comments Tue, 07 Sep 2010 05:53:01 +0000 https://shlomi-noach.github.io/blog/?p=2866 Revision 190 of mycheckpoint, a MySQL monitoring solution, has been released. New and updated in this revision:

  • HTTP server: mycheckpoint can now act as a web server. Point your browser and start browsing through HTML reports. See mock up demo.
  • Interactive charts: HTML line charts are now interactive, presenting with accurate data as you move over them. See sample.
  • Enhanced auto-deploy: now auto-recognizing failed upgrades.
  • Reduced footprint: much code taken out of the views, leading to faster loading times.
  • Better configuration file use: now supporting all command line options in config file.
  • Remote host monitoring accessibility: now supporting complete configurable accessibility details.
  • Bug fixes: thanks to the bug reporters!

mycheckpoint is free, simple, easy to use (now easier with HTTP server) and useful. I encourage you to try it out: even compared with other existing and emerging monitoring tools, I believe you will find it a breeze; it’s low impact and lightness appealing; it’s alerts mechanism assuring; its geeky SQL-based nature with ability to drill down to fine details — geeky-kind-of-attractive.

</encouragement>

HTTP server

You can now run mycheckpoint in http mode:

bash$ mycheckpoint http

mycheckpoint will listen on port 12306, and will present you with easy browsing through the reports of your mycheckpoint databases.

The http server automatically detects those schemata used by mycheckpoint, and utilizes the existing HTML views, integrating them into the greater web framework.

While in http mode, mycheckpoint does nothing besides serving web pages. It does not actively exercise monitoring: you must still use the usual cron jobs or other scheduled tasks by which you invoke mycheckpoint for monitoring.

The http server is directed at a single MySQL server, as with the following example:

bash$ mycheckpoint --host=slave1.localdomain --port=3306 --http-port=12306 http

It is assumed that this server has the monitoring schemata.

See mock up demo. The demo uses presents with real output from a mycheckpoint HTTP server; I haven’t got the means to put up a live demo.

Interactive charts

The openark line charts, used in the HTML reports, are now interactive. As you scroll over, the legend presents you with series values.

No more “I have this huge spike once every 4 hours, which reduces all other values to something that looks like zero but is actually NOT”. Hover, and see the real values.

See sample.

Enhanced auto-deploy

The idea with mycheckpoint is that it should know how to self upgrade the schema on version upgrade (much like automatic WordPress upgrades). mycheckpoint does bookkeeping of installed versions within the database, and upgrades by simple comparison.

It now, following a couple of reported bugs, also recognizes failure of partial, failed upgrades. This adds to the automation of mycheckpoint‘s installation.

Reduced footprint

Some of mycheckpoint‘s views are complicated, and lead to a large amount of code in view declaration. This leads to increased table definition size (large .frm files). There has been some work to reduce this size where possible. Work is still ongoing, but some 30% has been taken off already. This leads to faster table (view) load time.

Better configuration file use

Any argument supported on the command line is now also supported in the config style. Much like is handled with MySQL. For example, one can issue:

mycheckpoint --monitored-host=sql02.mydb.com  --monitored-user=monitor --monitored-password=123456

But now also:

mycheckpoint

With the following in /etc/mycheckpoint.cnf:

[mycheckpoint]
monitored_host     = sql02.mydb.com
monitored_user     = monitor
monitored_password = 123456

Rules are:

  • If an option is specified on command line, it takes precedence over anything else.
  • Otherwise, if it’s specified in the configuration file, value is read from file.
  • Otherwise use default value is used.
  • On command line, option format is xxx-yyy-zzz: words split with dash/minus character.
  • On configuration file, option format is xxx_yyy_zzz: words split with underscore. Unlike MySQL configuration format, dashes cannot be used.
  • If an option is specified multiple times on configuration file — well — I have the answer, but I won’t tell. Just don’t do it. It’s bad for your health.

Future plans

Work is going on. These are the non-scheduled future tasks I see:

  • Monitoring InnoDB Plugin & XtraDB status.
  • A proper man page.
  • Anything else that interests me & the users.

Try it out

Try out mycheckpoint. It’s a different kind of monitoring solution. Simple monitoring (charting) is immediate. For more interesting results you will need basic SQL skills, and in return you’ll get a lot of power under your hands.

mycheckpoint is released under the New BSD License.

]]>
https://shlomi-noach.github.io/blog/mysql/mycheckpoint-rev-190-http-server-interactive-charts/feed 1 2866
Static charts vs. interactive charts https://shlomi-noach.github.io/blog/mysql/static-charts-vs-interactive-charts https://shlomi-noach.github.io/blog/mysql/static-charts-vs-interactive-charts#comments Tue, 02 Mar 2010 13:28:08 +0000 https://shlomi-noach.github.io/blog/?p=2027 I’m having my usual fun with charts. Working on mycheckpoint, I’ve generated monitoring charts using the Google Chars API. But I’ve also had chance to experiment and deploy interactive charts, JavaScript based. In particular, I used and tweaked dygraphs.

I’d like to note some differences in using charts of both kinds. And I think it makes a very big difference.

Static charts

I’ll call any image-based chart by “static chart”. It’s just a static image. Example of such charts are those generated by Google Image Charts (they now also have new, interactive charts), or RRDtool. Show below is an example of a static chart; in this example, generated by Google:

Pros and cons of static charts

Pros

  • Images can be viewed on any graphical platform. Browsers, email clients, cell phones, whatever.
  • Self contained: chart image, legend, scales: all in one image.
  • As such, easy to move around.
  • Are safe to use.

Cons

  • Images are fuzzy. Is the com_replace_psec really 0? Maybe it’s 0.1? A larger value can make lower values hard to tell.
  • Images are inaccurate: the colors can lie. The red and green lines showing are hard to tell apart. The red is painted above the green. Data gets “lost”.
  • They do not zoom (one needs to regenerate larger image)
  • Unless encoded with base64, HTML pages which include images need to link outside.
  • In the particular case of Google Charts, one is limited to 2K length URL. Trust me, it’s a big limitation! (PS, Google now support POST method to allow for up to 16K. But… it’s a POST method…)
  • In the particular case of Google Charts, one must have an internet connection.
  • In the particular case of Google Charts, one must submit data to Google.

Interactive charts

Interactive charts are those which react to your commands. These are either JavaScript or Flash based, mostly. They allow for really nice features. Take the following chart as an example: try and move over with your mouse; or select sections to zoom in.


DML

[graphDiv]
[labelsDiv]

The above chart is generated with dygraphs. Since it is embedded within my WordPress page, the layout is affected by that of my theme. Take a look at this example page to see similar charts outside this blog site (Internet Explorer users: Maxmimize/minimize button will not work well for now. And, may I suggest Mozilla Firefox?)

Pros and cons of interactive charts

Pros

  • Can present you with exact values. No more doubt about the com_replace_psec values.
  • Can allow for zoom in, zoom out.

Cons

  • Need supporting platform. The above cannot be viewed by non-JavaScript browsers (cell phones, etc.)
  • Browser support is also an issue with JavaScript.
  • Emailing such report will result in mail blocking in many companies: mail filters will not allow for JavaScript code to pass.
  • Charts are not necessarily self-contained, in terms of the chart entity With Flash charts (e.g. Fusion Charts) this works. But in the above, the legend and scales are outside the image. As such, they cannot be just moved around.
  • HTML pages which include such charts can be self contained. The HTML page can include all the JavaScript dependencies, in addition to the chart generating code. Flash based charts cannot be self contained.

Summary

Interactive charts are cool!

I’m now integrating dygraphs into mycheckpoint (How nice it is to work with BSD & MIT licenses!). Though I may later switch to flot, interactive charts will be the next standard charting way in mycheckpoint. I will continue supporting static Google Charts, as follows from the above pros and cons list.

]]>
https://shlomi-noach.github.io/blog/mysql/static-charts-vs-interactive-charts/feed 7 2027
The DB problem inherent to dynamic web pages https://shlomi-noach.github.io/blog/mysql/the-db-problem-inherent-to-dynamic-web-pages https://shlomi-noach.github.io/blog/mysql/the-db-problem-inherent-to-dynamic-web-pages#comments Mon, 20 Jul 2009 11:48:56 +0000 https://shlomi-noach.github.io/blog/?p=955 When building web sites, a popular demand is a maximum page load time.

For example, many would require < 0.5 seconds (or even less) for major pages loading time. Of course, there are numerous factors for page load time: network, caching, web servers, scripting language/code, database access and more.

Naturally I want to discuss the use of database access when creating web pages. I’ll be referring to dynamic web pages, such that are created by common languages as PHP, Java/J2EE, Ruby, ASP(.NET) etc.

A very common programming style is – what’s called in the Java jargon – using “scriptlets” as in the following JSP page:

<html>
<body>
    Time now is <%= new java.util.Date() %>
</body>
</html>

The above replaces the “<%= new java.util.Date() %>” part with a text representation of the current time.

If I were to produce a dynamic content site, say, a WordPress blog, like the one you’re reading, I would need to generate several dynamic contents: the latest posts, the popular tags, the comments for this post, etc. These are generated by calling upon the database and running some queries. I suppose there’s nothing new in what I’ve explained so far.

The problem

When generating a “heavyweight” page, like some online newspaper or bookstore, there may be many queries involved. Are you logged in? Do we have recommendations for you? What are the latest topics? What have you been interested in before? Do you have friends online? What content have you produced on the website?

I’ve recently reviewed a site which generated > 500 queries per single page. I personally thought that was a very high number, but that was a necessity. The problem was: the page took 2 seconds to load.

Some tuning, rewriting and indexing later, time dropped to 0.6 seconds to load; but that was not fast enough. It was then that we got to a major conclusion:

All database calls are serialized. They need to be parallelized.

Remember that MySQL can only utilize a single thread for the computation of a single query (though more threads can handle IO in the meantime). This leads to only one CPU being used on your standard Linux distribution, for a given web page.

Really, that sounds just too obvious! But not so easy to achieve when doing “scriptlets”. The templating engine parses the scriptlets one by one, executing them in order. In fact, you assume it does so, so that you can rely on the outcome of the previous scriptlet in the next one. In Java, for example, it goes beyond that: a JSP page is rewritten as a normal Java Servlet class, where the “scriptlets” become the main code, and the HTML becomes just printing to standard output. So you get linear executing code.

Even with more sophisticated frameworks, the “normal” way of doing things is linear. For example, using the Spring framework, you have Java objects — controllers — which are responsible for web pages. You can avoid doing scpriting within your dynamic web pages, and only ask for data provided by those controllers. So, for example, using Spring + Velocity, a web page could look like this:

<html>
<body>
    Login time as recorded in DB is: ${user.loginTime}
</body>
</html>

This (usually) translates to calling the getLoginTime() method on a pre-built user object. But just how does this method work?

  • Does it do lazy initialization, so that it calls upon the DB to get the answer?
  • Did the controller set up the value during some init() method?
  • Did the controller set up the value in response to the web page’s request parameter, parsing them one by one?

All the above options lead to linear, or serial execution.

How to parallelize?

Parallelization with web pages is not so simple, and requires understanding of multi threading programming. The programmer needs to be aware of race conditions, deadlocks, starvation issues, etc. (though, to be honest, in dynamic web pages context these do not usually become a real issue). Some programming languages provide good support for multi threaded programming. Java is one such language.

Let’s assume, then, that we need to spawn some 10 queries in response to a page request. With Jjava, we can write something like:

CountDownLatch doneSignal = new CountDownLatch(10);

Runnable task1 = new Runnable() {
    public void run()
    {
        user.setLoginTime(this.jdbcTemplate.queryForInt("SELECT ... FROM ..."));
        doneSignal.countDown();
    }
} ;

Runnable task2 = new Runnable() {
    public void run()
    {
        headlines = getSimpleJdbcTemplate().query("SELECT * FROM headline WHERE...",
            new ParameterizedRowMapper<Headline>() {
                public Headline mapRow(ResultSet rs, int rowNum)
                {
                    Headline headline = new Headline();
                    headline.setTitle(rs.getString("title");
                    headline.setUrl(rs.getString("url");
                    ...
                }
            }
        doneSignal.countDown();
    }
} ;

...

Runnable task10 = new Runnable() {
    ...
    doneSignal.countDown();
}
Executor executor = Executors.newFixedThreadPool(numberOfAvailableProcessors);
executor.execute(task1);
...
executor.execute(task10);

doneSignal.await();

// Now fill in the Model

The above code is simplified and presented in a way which is more readable. What it says is:

  • Let’s create the 10 tasks, but not execute them: just lay out the commands.
  • Each task, upon completion, lets the CountDownLatch know it has completed (but remeber we have not executed it yet).
  • We create or use a thread pool, using some n threads; n may relate to the number of processors we have.
  • We ask the pool to execute all threads. At the discretion of the pool, it will either run them all concurrently, or some sequentially – depending on how many threads are available.
  • We ask the CountDownLatch — a one-time barrier — to block, until all 10 tasks have notified they’re done.
  • We can now go on and do our stuff.

Spring has a built in TaskExecutor mechanism to provide solution similar to the thread pool above.

I’m mostly a C/C++/Java programmer; I have no knowledge on how this can be achieved in PHP, Ruby, ASP.NET or other languages. The above code is certainly not the most straightforward to use. I would like to see frameworks provide wrappers for this kind of solution, so as to support the common web developer with parallelization.

]]>
https://shlomi-noach.github.io/blog/mysql/the-db-problem-inherent-to-dynamic-web-pages/feed 6 955