Sphinx – code.openark.org http://shlomi-noach.github.io/blog/ Blog by Shlomi Noach Tue, 03 Apr 2012 11:05:42 +0000 en-US hourly 1 https://wordpress.org/?v=5.3.3 32412571 Thoughts on using MySQL plugins https://shlomi-noach.github.io/blog/mysql/thoughts-on-using-mysql-plugins https://shlomi-noach.github.io/blog/mysql/thoughts-on-using-mysql-plugins#comments Tue, 03 Apr 2012 11:05:42 +0000 https://shlomi-noach.github.io/blog/?p=4769 I’m giving thoughts on the viability of MySQL plugins. This is due to a particular experience I’ve had, which is thankfully solved. However, it left some bitter taste in my mouth.

MySQL plugins are a tricky business. To create a plugin, you must compile it against the MySQL version you wish the users to use it with. Theoretically, you should compile it against any existing MySQL version, minors as well (I’m not sure whether it may sometimes or most times work across minor versions).

But, most important, you must adapt your plugin to major versions.

Another option for plugin makers, is to actually not recompile it, but rather provide with the source code, and let the end user compile it with her own MySQL version. But here, too, the code must be compatible with whatever changes the new MySQL version may have.

And if it doesn’t compile with the new MySQL version?

That’s what happened to me. The particular case at hand was SphinxSE, a plugin which serves as a bridge between MySQL and a Sphinx Search server. I’ve been using it for years and was happy about it. But, as it happened, it took well over a year for sphinx to compile with MySQL 5,5. This meant I was unable to upgrade my 5.1 installation to 5.5, a thing I was aiming to do for quite a while.

Even when fixed, not all features were included, and thankfully I was able to come up with my own patch.

Not complaining about this particular project — I think Sphinx is awesome, with latest versions providing great features I’m putting into immediate use.

However, how much am I willing to rely on 3rd party projects when planning my MySQL upgrades? I’m now thinking I shouldn’t.

A central repository?

MariaDB does a great thing: it provides with several additional features over standard MySQL, including a set of plugins. They turn into a debian-like repository, in which they maintain the plugins for their own distribution.

[Darn! I just realized I should have looked at what they did with Sphinx in their 5.5 distribution! Need to do my monthly mental examination.]

Anyway, this is something I would like to see outside MariaDB as well: a central repository where plugins are maintained and kept up to the latest releases.

Thoughts, anyone?

]]>
https://shlomi-noach.github.io/blog/mysql/thoughts-on-using-mysql-plugins/feed 5 4769
sphinx, sphinx_snippets() & MySQL 5.5 https://shlomi-noach.github.io/blog/mysql/sphinx-sphinx_snippets-mysql-5-5 https://shlomi-noach.github.io/blog/mysql/sphinx-sphinx_snippets-mysql-5-5#comments Wed, 21 Mar 2012 13:57:59 +0000 https://shlomi-noach.github.io/blog/?p=4775 I’ve written a patch which completes Sphinx’s integration with MySQL 5.5.

Up until a couple months ago, Sphinx would not compile with MySQL 5.5 at all. This is, thankfully, resolved as of Sphinx 2.0.3.

However, to my disdain, I’ve found out that it only partially work: the sphinx_snippets() user defined function is not included within the plugin library. After some quick poking I discovered that it was not added to the build, and when added, would not compile.

I rely on sphinx_snippets() quite a lot, and like it. Eventually I wrote the fix to the snippets_udf.cc which allows it to run in a MySQL 5.5 server.

Here are the changes for the 2.0.4 version of Sphinx:

Replace your 2.0.4 files with these two and get on compiling your MySQL server.

Compilation guide

For completeness, here’s how to compile Percona Server 5.5 with Sphinx 2.0.4 including the above patches:

Get Percona Server source code and Sphinx Search source code.

I’ll be using Percona Server 5.5.21-25.0. I use /data/tmp/mysql as compilation path, and install MySQL on /usr/local/mysql55.

mkdir -p /data/tmp/mysql
cd /data/tmp/mysql
tar xzfv Percona-Server-5.5.21-rel25.0.tar.gz
tar xzfv sphinx-2.0.4-release.tar.gz
cd Percona-Server-5.5.21-rel25.0/
cp -R /data/tmp/mysql/sphinx-2.0.4-release/mysqlse storage/sphinx

Overwrite with patched files included in this post:

cp /tmp/CMakeLists.txt storage/sphinx/CMakeLists.txt
cp /tmp/snippets_udf.cc storage/sphinx/snippets_udf.cc

Build MySQL:

sh BUILD/autorun.sh
./configure --with-plugin-sphinx --prefix=/usr/local/mysql55
make
sudo make install

Install the mysql55 service:

cd /usr/local/mysql55
sudo cp support-files/mysql.server /etc/init.d/mysql55

In /etc/bash.bashrc, add:

export PATH=/usr/local/mysql55/bin:${PATH}

Start MySQL:

sudo service mysql55 start

Login to MySQL as an administrato (typically root) and install Sphinx:

mysql> INSTALL PLUGIN sphinx SONAME 'ha_sphinx.so';
mysql> CREATE FUNCTION sphinx_snippets RETURNS STRING SONAME 'ha_sphinx.so';

Notes

See also http://sphinxsearch.com/bugs/view.php?id=1090 and http://sphinxsearch.com/forum/view.html?id=8982

]]>
https://shlomi-noach.github.io/blog/mysql/sphinx-sphinx_snippets-mysql-5-5/feed 8 4775
Sphinx & MySQL: facts and misconceptions https://shlomi-noach.github.io/blog/mysql/sphinx-mysql-facts-and-misconceptions https://shlomi-noach.github.io/blog/mysql/sphinx-mysql-facts-and-misconceptions#comments Thu, 02 Sep 2010 08:56:15 +0000 https://shlomi-noach.github.io/blog/?p=2754 Sphinx search is a full text search engine, commonly used with MySQL.

There are some misconceptions about Sphinx and its usage. Following is a list of some of Sphinx’ properties, hoping to answer some common questions.

  • Sphinx is not part of MySQL/Oracle.
  • It is a standalone server; an external application to MySQL.
  • Actually, it is not MySQL specific. It can work with other RDBMS: PostgreSQL, MS SQL Server.
  • And, although described as “free open-source SQL full-text search engine”, it is not SQL-specific: Sphinx can read documents from XML.
  • It is often described as “full text search for InnoDB”. This description is misleading. Sphinx indexes text; be it from any storage engine or external source. It solves, in a way, the issue of “FULLTEXT is only supported by MyISAM”. Essentially, it provided full-text indexing for InnoDB tables, but in a very different way than the way MyISAM’s FULLTEXT index works.

Sphinx works by reading documents, usually from databases. Considering the case of MySQL, Sphinx issues a SQL query which retrieves relevant data (mostly the text you want to index, but other properties allowed).

  • Being an external module, it does not update its indexes on the fly. So if 10 new rows are INSERTed, it has no knowledge of this. It must be called externally to re-read the data (or just read the new data), and re-index.
    • This is perhaps the greatest difference, functionality-wise, between Sphinx and MyISAM’s FULLTEXT. The latter is always updated, for every row INSERTed, DELETEd or UPDATEd. The latter also suffers by this property, as this makes for serious overhead with large volumes.
    • There’s more than one way to make that less of an issue. I’ll write some more in future posts.
  • Sphinx does not keep the text to itself; just the index. Sphinx cannot be asked “Give me the blog post content for those posts containing ‘open source'”.
    • Sphinx will only tell you the ID (i.e. Primary Key) for the row that matches your search.
    • It is up to you to then get the content from the table.
    • With SphinxSE (Sphinx Storage Engine for MySQL) this becomes easier, all-in-one query.
  • It can keep other numeric data. Such data can be used to filter results.
  • It provides with GROUP BY-like, as well as ORDER BY-like mechanism.
  • It allows for ordering results by relevance.
  • It allows for exact match search, boolean search, and more.
  • It has an API & implementation for popular programming languages: PHP, Python, Perl, Ruby, Java.

The above describes Sphinx as a general fulltext search engine for databases. It does, however, have special treatment for MySQL:

  • First and foremost, it knows how to query MySQL for data (duh!)
  • If you don’t mind compiling from source, you can rebuild MySQL with SphinxSE: a storage engine implementation. This storage engine does not actually hold any data, but rather provides an SQL-like interface to the search daemon.
    • Thus, you can query for search results using SELECT statements, JOINing to document tables, retrieving results, all in one step.
    • If you do mind compiling MySQL, be aware that MariaDB comes with SphinxSE built in in newer versions.
  • It implements the MySQL protocol. You can connect to the sphinx server using a MySQL client, and actually issue SQL statements to retrieve data. Not all SQL is supported. The valid subset is called SphinxQL.
  • Snippets (excerpts) are supported via MySQL UDF.
]]>
https://shlomi-noach.github.io/blog/mysql/sphinx-mysql-facts-and-misconceptions/feed 12 2754
SphinxSE 0.9.9-RC2 bug workaround https://shlomi-noach.github.io/blog/mysql/sphinxse-0-9-9-rc2-bug-workaround https://shlomi-noach.github.io/blog/mysql/sphinxse-0-9-9-rc2-bug-workaround#respond Mon, 07 Sep 2009 08:23:21 +0000 https://shlomi-noach.github.io/blog/?p=1245 There is a serious bug with the sphinx storage engine, introduced in 0.9.9-RC2 (and which has not been fixed in latest revisions, as yet – last checked with rev 2006).

I would usually just revert to an older version (0.9.9-RC1 does not contain this bug), but for the reason that RC2 introduces an important feature: the sphinx_snippets() function, which allows for creation of snippets from within MySQL, and which makes the sphinx integration with MySQL complete, as far as the application is concerned.

The bug

The bug is described here and here (and see further discussions). Though it’s claimed to have been fixed, it’s been re-reported, and I’ve tried quite a few revisions and verified it has not been fixed (tested on Debian/Ubuntu x64). Essentially, the bug does not allow you to set filters on a query issued from within the SphinxSE. For example, the following queries fail:

SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;range=myUnixTimestamp,1249506000,1252184400;'
SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;filter=my_field,1;'

While the following query succeeds:

SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;'

The error message is this:

ERROR 1430 (HY000): There was a problem processing the query on the foreign data source. Data source error: searchd error: invalid or truncated request

I see this as a serious bug in the SphinxSE: it renders it useless; searching without the ability to filter is not something I can live with.

The motivation

Sphinx does not store the actual text content. To get search results with snippets, you need to:

  • Ask sphinx for the documents ids
  • Get the content for those documents
  • Ask sphinx for snippets based on the provided content and search phrase.

With the introduction of the sphinx_snippets() function, this can all be done with a single query, like this:

SELECT my_docs.my_docs_id,  my_docs.publish_time,  CONVERT(sphinx_snippets(my_docs.id.content, 'my_docs_index', 'python') USING utf8) AS snippet  FROM tets.my_docs INNER JOIN test.my_docs_sphinx USING(my_docs_id) WHERE query='python;mode=any;sort=relevance;limit=200;range=publish_time_unix,1249506000,1252184400;';

This is really a life saver; without this function, you need to get the resutls back to your application, then send the data again to MySQL, in which case you might altogether discard the SphinxSE and talk to sphinx directly. But with a single query you get to ask the results just as if you were asking for any result set from your database (with extra syntax).

The workaround

My setup is Percona’s mysql-5.1.34-xtradb5 source, on Ubuntu server 8.04 amd64. The trick is to first compile MySQL with sphinx 0.9.9-RC2, in order to produce the sphinx.so file (where the sphinx_snippets() function is found), backup the sphinx.so file, then recompile everything with sphinx 0.9.9-RC1. The steps being:

Compile MySQL with sphinx 0.9.9-Rc2 (I choose to install MySQL on /usr/local/mysql51):

tar xzfv mysql-5.1.34-xtradb5.tar.gz
cd mysql-5.1.34-xtradb5
cp -R /tmp/resources/sphinx-0.9.9-rc2/mysqlse storage/sphinx
sh BUILD/autorun.sh
./configure --with-plugins=innobase,sphinx --prefix=/usr/local/mysql51
make

This produces the sphinx.so, sphinx.so.0, sphinx.so.0.0.0 files. Back them up!

Next, recompile with sphinx 0.9.9-RC1. I’ve found that simple copying and recompiling doesn’t work well. So just cleanup everything and start afresh:

cd ..
rm -rf mysql-5.1.34-xtradb5
tar xzfv mysql-5.1.34-xtradb5.tar.gz
cd mysql-5.1.34-xtradb5
cp -R /tmp/resources/sphinx-0.9.9-rc1/mysqlse storage/sphinx
sh BUILD/autorun.sh
./configure --with-plugins=innobase,sphinx --prefix=/usr/local/mysql51
make
sudo make install

Copy the sphinx.so files into the MySQL plugin directory (/usr/local/mysql51/lib/mysql/plugin in our case).

Then build sphinx (you must have MySQL includes for sphinx to compile, so this must be the second step):

cd /tmp/resources/sphinx-0.9.9-rc1/
./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql51
make
sudo make install

Essentially, we’re working now with 0.9.9-RC1, but the sphinx_snippets() function is from the 0.9.9-RC2 version, and happily no one bothers about this mix.

I hope this helps.

]]>
https://shlomi-noach.github.io/blog/mysql/sphinxse-0-9-9-rc2-bug-workaround/feed 0 1245