Sphinx & MySQL: facts and misconceptions

Sphinx is a full-text search engine, commonly used with MySQL.

There are some misconceptions about Sphinx and its usage. What follows is a list of some of Sphinx’s properties, in the hope of answering some common questions.

  • Sphinx is not part of MySQL/Oracle.
  • It is a standalone server; an external application to MySQL.
  • Actually, it is not MySQL specific. It can work with other RDBMS: PostgreSQL, MS SQL Server.
  • And, although described as “free open-source SQL full-text search engine”, it is not SQL-specific: Sphinx can read documents from XML.
  • It is often described as “full text search for InnoDB”. This description is misleading. Sphinx indexes text, be it from any storage engine or external source. It solves, in a way, the issue of “FULLTEXT is only supported by MyISAM”. Essentially, it provides full-text indexing for InnoDB tables, but in a very different way from how MyISAM’s FULLTEXT index works.

Sphinx works by reading documents, usually from databases. In the case of MySQL, Sphinx issues an SQL query which retrieves the relevant data (mostly the text you want to index, though other attributes are allowed as well).
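
For illustration, such a source query (typically set as sql_query in sphinx.conf) could look like the following sketch, with hypothetical table and column names:

-- A hypothetical source query: the first column serves as the document ID,
-- the remaining columns are indexed text fields and attributes
SELECT post_id, title, content, UNIX_TIMESTAMP(post_date) AS post_date
FROM posts;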

mylvmbackup HOWTO: minimal privileges & filesystem copy

This HOWTO discusses two (unrelated) issues with mylvmbackup:

  • The minimal privileges required to take MySQL backups with mylvmbackup.
  • Making a (non-compressed) file system copy of one’s data files.

Minimal privileges

Some just give mylvmbackup the root account, which is far too permissive. We now consider what the minimal requirements of mylvmbackup are.

The queries mylvmbackup issues are:

  • FLUSH TABLES
  • FLUSH TABLES WITH READ LOCK
  • SHOW MASTER STATUS
  • SHOW SLAVE STATUS
  • UNLOCK TABLES

Both SHOW MASTER STATUS & SHOW SLAVE STATUS require either the SUPER or the REPLICATION CLIENT privilege. Since SUPER is far more powerful than we need, we choose REPLICATION CLIENT.

Both FLUSH TABLES statements, as well as UNLOCK TABLES, require the RELOAD privilege.

However, we are not done yet. mylvmbackup connects to the mysql database, which means we must have some privilege on it as well. We choose the SELECT privilege.
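
Putting it all together, a minimal setup might look like the following sketch (account name, host and password are hypothetical):

CREATE USER 'mylvmbackup'@'localhost' IDENTIFIED BY 'some_password';
-- RELOAD covers FLUSH TABLES [WITH READ LOCK] and UNLOCK TABLES;
-- REPLICATION CLIENT covers SHOW MASTER STATUS and SHOW SLAVE STATUS
GRANT RELOAD, REPLICATION CLIENT ON *.* TO 'mylvmbackup'@'localhost';
-- allow connecting to, and reading from, the mysql schema
GRANT SELECT ON mysql.* TO 'mylvmbackup'@'localhost';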


MMM for MySQL single reader role

The standard documentation and tutorials on MMM for MySQL, for a master-master replication setup, suggest one Virtual IP for the writer role and two Virtual IPs for the reader role. One may wish to have just a single Virtual IP for the reader role, as explained below.

The two IPs for the reader role

The most commonly quoted, simplified excerpt from the mmm_common.conf sample configuration file, as found on the project’s site, defines two Virtual IPs for the reader role.

Table refactoring & application version upgrades, Part II

Continuing Table refactoring & application version upgrades, Part I, we now discuss code & database upgrades which require DROP operations. As before, we break apart the upgrade process into sequential steps, each involving either the application or the database, but not both.

As I’ll show, DROP operations are significantly simpler than creation operations. Interestingly, it’s the same as in life.

DROP COLUMN

A column turns out to be redundant and unused. Before it is dropped from the database, we must ensure no one uses it anymore. The steps are:

  1. App: V1 -> V2. Remove all references to the column; make sure no queries use said column.
  2. DB: V1 -> V2 (possibly failover from M1 to M2), change is DROP COLUMN (see the sketch below).
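
The database-side change in step 2 is a plain ALTER; a minimal sketch, with hypothetical table and column names:

ALTER TABLE my_table DROP COLUMN legacy_column;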

DROP INDEX

This is possibly a simpler case. Why would you drop an index? Is it because you found out you no longer use it? Then all you have to do is just drop it.

Or perhaps you don’t need the functionality the index supports anymore? Then first drop the functionality:

  1. (optional) App: V1 -> V2. Stop using the functionality which relies on the index.
  2. DB: V1 -> V2 (possibly failover from M1 to M2), change is DROP INDEX. The InnoDB Plugin is worth checking out here, as it can drop a secondary index quickly, without rebuilding the table.

Another Python MySQL template

Following up on Matt Reid’s simple python, mysql connection and iteration, I would like to share one of my own, which is the base for mycheckpoint & openark kit scripts.

It is oriented towards providing clean access to the data: the user is not expected to handle cursors and connections. Result sets are returned as Python lists and dictionaries. It is also config-file aware and comes with built-in command line options.

I hope it proves useful: my.py

Table refactoring & application version upgrades, Part I

A developer’s major concern is: how do I perform application & database upgrades with minimal downtime? How do I synchronize a database version upgrade with an application version upgrade?

I will break the discussion down into types of database refactoring operations, and will limit it to single-table refactoring. The discussion will examine the need for refactoring and lay out the steps towards a successful upgrade.

Reader prerequisites

I will assume MySQL to be the underlying database. To take a major component out of the equation: we may need to deal with very large tables, for which an ALTER command may take long hours. I will assume familiarity with Master-Master (Active-Passive) replication, with possible use of MMM for MySQL. When I describe “failover from M1 to M2”, I mean: make the ALTER changes on M2 (passive), then switch your application from M1 to M2 (change of IPs, VIP, etc.), promoting M2 to the active position, then apply the same changes on M1 (now passive) or completely rebuild it.

Phew, a one sentence description of M-M usage…

I also assume the reader’s understanding that a table’s schema can be different on master & slave, which is the basis for the “use replication for refactoring” trick. But it cannot be too different, or, to be precise, the two schemata must both support the ongoing queries for the table.
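
As an illustration, one common way to apply a change on a single server only, without having it written to the binary log and replicated onward, is the following sketch (not necessarily the exact procedure implied above; it requires the SUPER privilege, and table and column names are hypothetical):

SET SESSION sql_log_bin = 0;  -- do not write the following statement to the binary log
ALTER TABLE my_table ADD COLUMN new_col INT NOT NULL DEFAULT 0;
SET SESSION sql_log_bin = 1;  -- restore normal binary logging for this session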

A full discussion of the above is beyond the scope of this post.

Types of refactoring needs

As I limit this discussion to single-table refactoring, we can look at major refactoring operations and their impact on application & upgrades. We will discuss ADD/DROP COLUMN, ADD/DROP INDEX, ADD/DROP UNIQUE INDEX, ADD/DROP FOREIGN KEY, ADD/DROP TABLE.

We will assume the database and application are both in Version #1 (V1), and need to be upgraded to V2 or greater.

Tips for taking MySQL backups using LVM

LVM uses copy-on-write to implement snapshots. Whenever you write data to some page, LVM copies the original page (the way it looked when the snapshot was taken) to the snapshot volume. The snapshot volume must be large enough to accommodate all pages written to during the snapshot’s lifetime. In other words, you must be able to copy the data somewhere outside (tape, NFS, rsync, etc.) in less time than it would take for the snapshot to fill up.

While LVM allows for hot backups of MySQL, it still has an impact on the disks. An LVM snapshot backup may not go unnoticed by MySQL users.

Some general guidelines for making life easier with LVM backups follow.

Lighter, longer snapshots

If you’re confident you have enough space on your snapshot volume, you may take the opportunity to allow for a longer backup time. Why? Because you can then reduce the stress on the file system. Use ionice when copying your data from the snapshot volume:

ionice -c 2 cp -R /mnt/mysql_snapshot /mnt/backup/daily/20100719/

[Update: this only works with the cfq I/O scheduler; thanks, Vojtech]

Are you running out of space?

Monitor the snapshot’s allocated size; if there’s just one snapshot, this is easy to do.

SQL trick: overcoming GROUP_CONCAT limitation in special cases

In Verifying GROUP_CONCAT limit without using variables, I presented a test to verify whether group_concat_max_len is sufficient for known limitations. Here I follow the path where I assume I cannot control group_concat_max_len, not even in session scope, and show an SQL solution, dirty as it is, to overcome the GROUP_CONCAT limitation under certain conditions.

Sheeri rightfully asks why I wouldn’t just set group_concat_max_len in session scope. The particular case I have is that I’m providing a VIEW definition. I’d like users to “install” that view, i.e. to CREATE it on their database. The VIEW does some logic, and uses GROUP_CONCAT to implement that logic.

Now, I have no control over the DBA or developer who creates the view. The creation of the view has nothing to do with the group_concat_max_len setting on her database instance.
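
Had the setting been under my control, the session-scope approach Sheeri refers to would indeed be the simple answer (a sketch; the value is arbitrary):

SET SESSION group_concat_max_len = 1048576;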

An example

OK, apologies aside. Using the sakila database, I execute:

mysql> SELECT GROUP_CONCAT(last_name) FROM actor \G
*************************** 1. row ***************************
GROUP_CONCAT(last_name): AKROYD,AKROYD,AKROYD,ALLEN,ALLEN,ALLEN,ASTAIRE,BACALL,BAILEY,BAILEY,BALE,BALL,BARRYMORE,BASINGER,BENING,BENING,BERGEN,BERGMAN,BERRY,BERRY,BERRY,BIRCH,BLOOM,BOLGER,BOLGER,BRIDGES,BRODY,BRODY,BULLOCK,CAGE,CAGE,CARREY,CHAPLIN,CHASE,CHASE,CLOSE,COSTNER,CRAWFORD,CRAWFORD,CRONYN,CRONYN,CROWE,CRUISE,CRUZ,DAMON,DAVIS,DAVIS,DAVIS,DAY-LEWIS,DEAN,DEAN,DEE,DEE,DEGENERES,DEGENERES,DEGENERES,DENCH,DENCH,DEPP,DEPP,DERN,DREYFUSS,DUKAKIS,DUKAKIS,DUNST,FAWCETT,FAWCETT,GABLE,GARLAND,GARLAND,GARLAND,GIBSON,GOLDBERG,GOODING,GOODING,GRANT,GUINESS,GUINESS,GUINESS,HACKMAN,HACKMAN,HARRIS,HARRIS,HARRIS,HAWKE,HESTON,HOFFMAN,HOFFMAN,HOFFMAN,HOPE,HOPKINS,HOPKINS,HOPKINS,HOPPER,HOPPER,HUDSON,HUNT,HURT,JACKMAN,JACKMAN,JOHANSSON,JOHANSSON,JOHANSSON,JOLIE,JOVOVICH,KEITEL,KEITEL,KEITEL,KILMER,KILMER,KILMER,KILMER,KILMER,LEIGH,LOLLOBRIGIDA,MALDEN,MANSFIELD,MARX,MCCONAUGHEY,MCCONAUGHEY,MCDORMAND,MCKELLEN,MCKELLEN,MCQUEEN,MCQUEEN,MIRANDA,MONROE,MONROE,MOSTEL,MOSTEL,NEESON,NEESON,NICHOLSON,NOLTE,NOLTE,NOLTE,NOLTE,OLIVIER,OLIVIER,PALTROW,PALTROW,P
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS;
+---------+------+--------------------------------------+
| Level   | Code | Message                              |
+---------+------+--------------------------------------+
| Warning | 1260 | 1 line(s) were cut by GROUP_CONCAT() |
+---------+------+--------------------------------------+
1 row in set (0.00 sec)


mycheckpoint (rev. 170): improved custom queries; local charting; page/swap I/O monitoring; improved HTML reports

Revision 170 of mycheckpoint, a MySQL monitoring solution, has been released. New and updated in this revision:

  • Improved custom queries: lifting of limitations from previous, introductory revision; better HTML presentation
  • Local, inline charting: no rendering via Google Charts unless explicitly requested. All charts are now rendered locally using JavaScript; no data is sent over the network anymore.
  • Page/Swap I/O monitoring: now monitoring for page ins and outs, swap ins and outs (Linux only).
  • Improved HTML reports: several improvements on presentation (see sample, more follow).

Improved custom queries

Some limitations, introduced in revision 132, are now lifted, and new features are added.

  • There is now no limit to the number of custom queries (well, an INT limit).
  • In fact, the data tables adjust themselves to the existing custom queries in the form of auto-deploy: once a new custom query is added or an old one removed, mycheckpoint will add or remove the relevant columns from the data tables.
  • The chart_order column is now utilized: HTML reports which include custom query charts now order those charts according to chart_order values. This makes for nicer reports.
  • The standard HTML brief report (SELECT html FROM sv_report_html_brief) now automatically includes all custom charts. The HTML brief report is the report one usually wants to look at: it provides the latest 24 hours’ metrics for selected values. It has now become a centralized place for all that is interesting in the past 24 hours.
  • Custom queries are now allowed to return NULL, treated as a missing value. This is a bugfix from previous revisions.

Local charting

The motivation for local charting is clear: no one likes having their data sent over the network. And no one wants Google to know about their DML values.

I’ve been playing around with quite a few charting solutions, and have gone into depth with two of them, adding and rewriting quite a lot of code. Eventually, I settled on my very own rendering.

Implicit casting you don’t want to see around

In Beware of implicit casting, I outlined the dangers of implicit casting. Here are a few more real-world examples I have tackled:

Number-String comparisons

Much like in programming languages, an implicit cast to numbers is made when at least one of the arguments is a number. Thus:

mysql> SELECT 3 = '3.0';
+-----------+
| 3 = '3.0' |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql> SELECT '3' = '3.0';
+-------------+
| '3' = '3.0' |
+-------------+
|           0 |
+-------------+

The second query is a pure string comparison; MySQL has no way to determine that a numeric comparison should be made.
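
If a numeric comparison is what you’re after, an explicit cast removes the ambiguity (an illustration of my own, with an arbitrary precision):

SELECT CAST('3' AS DECIMAL(10,1)) = '3.0';  -- yields 1: the cast forces a numeric comparison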

Direct DATE arithmetic

Arithmetic applied directly to DATE values may seem to work at first glance, yet be completely incorrect, and can quickly turn into a total mess.
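
To illustrate the kind of trap involved (an example of my own, not necessarily the queries from the full post):

SELECT DATE('2010-07-26') + 1;  -- yields 20100727: looks like a date, but is merely a number
SELECT DATE('2010-07-31') + 1;  -- yields 20100732: not a valid date at all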