<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>code.openark.org &#187; Sphinx</title>
	<atom:link href="http://code.openark.org/blog/tag/sphinx/feed" rel="self" type="application/rss+xml" />
	<link>http://code.openark.org/blog</link>
	<description>Blog by Shlomi Noach</description>
	<lastBuildDate>Wed, 01 Feb 2012 08:19:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Sphinx &amp; MySQL: facts and misconceptions</title>
		<link>http://code.openark.org/blog/mysql/sphinx-mysql-facts-and-misconceptions</link>
		<comments>http://code.openark.org/blog/mysql/sphinx-mysql-facts-and-misconceptions#comments</comments>
		<pubDate>Thu, 02 Sep 2010 08:56:15 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Sphinx]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=2754</guid>
		<description><![CDATA[Sphinx search is a full text search engine, commonly used with MySQL. There are some misconceptions about Sphinx and its usage. Following is a list of some of Sphinx' properties, hoping to answer some common questions. Sphinx is not part of MySQL/Oracle. It is a standalone server; an external application to MySQL. Actually, it is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.sphinxsearch.com/">Sphinx search</a> is a full text search engine, commonly used with MySQL.</p>
<p>There are some misconceptions about Sphinx and its usage. The following list of Sphinx's properties aims to answer some common questions.</p>
<ul>
<li>Sphinx is not part of MySQL/Oracle.</li>
<li>It is a standalone server; an external application to MySQL.</li>
<li>Actually, it is not MySQL specific. It can work with other RDBMS: PostgreSQL, MS SQL Server.</li>
<li>And, although described as "free open-source SQL full-text search engine", it is not SQL-specific: Sphinx can read documents from XML.</li>
<li>It is often described as "full text search for InnoDB". This description is misleading. Sphinx indexes text, be it from any storage engine or from an external source. It solves, in a way, the issue of "FULLTEXT is only supported by MyISAM". Essentially, it provides full-text indexing for InnoDB tables, but in a <em>very</em> different way from how MyISAM's <strong>FULLTEXT</strong> index works.</li>
</ul>
<p>Sphinx works by reading documents, usually from databases. In the case of MySQL, Sphinx issues a SQL query which retrieves the relevant data (mostly the text you want to index, but other properties are allowed as well).<span id="more-2754"></span></p>
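<p>As a rough sketch of what such an indexing query might look like (the <strong>my_docs</strong> table and its columns are hypothetical names, used for illustration only):</p>
<blockquote>
<pre>-- Hypothetical source table. Sphinx expects the first column to be
-- a unique, positive integer document ID.
SELECT
  my_docs_id,                                        -- document ID
  title,                                             -- text to be indexed
  content,                                           -- text to be indexed
  UNIX_TIMESTAMP(publish_time) AS publish_time_unix  -- numeric property, usable for filtering
FROM my_docs;</pre>
</blockquote>
<p>The query itself lives in the Sphinx configuration file (the <strong>sql_query</strong> setting of a source). Non-text columns, such as the timestamp above, need to be declared there as attributes.</p>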
<ul>
<li>Being an external module, it does not update its indexes on the fly. So  if <strong>10</strong> new rows are <strong>INSERT</strong>ed, it has no knowledge of this. It must be  called externally to re-read the data (or just read the new data), and re-index.
<ul>
<li>This is perhaps the greatest difference, functionality-wise, between Sphinx and MyISAM's <strong>FULLTEXT</strong>. The latter is always up to date: it is updated for every row <strong>INSERT</strong>ed, <strong>DELETE</strong>d or <strong>UPDATE</strong>d. It also suffers from this property, as it makes for serious overhead with large volumes.</li>
<li>There's more than one way to make that less of an issue. I'll write some more in future posts.</li>
</ul>
</li>
<li>Sphinx does not keep the text to itself; just the index. Sphinx cannot be asked "Give me the blog post content for those posts containing 'open source'".
<ul>
<li>Sphinx will only tell you the ID (i.e. Primary Key) for the row that matches your search.</li>
<li>It is up to you to then get the content from the table.</li>
<li>With SphinxSE (Sphinx Storage Engine for MySQL) this becomes easier: it can all be done in one query.</li>
</ul>
</li>
<li>It can keep other numeric data. Such data can be used to filter results.</li>
<li>It provides <strong>GROUP BY</strong>-like, as well as <strong>ORDER BY</strong>-like mechanisms.</li>
<li>It allows for ordering results by relevance.</li>
<li>It allows for exact match search, boolean search, and more.</li>
<li>It has an API &amp; implementation for popular programming languages: PHP, Python, Perl, Ruby, Java.</li>
</ul>
<p>The above describes Sphinx as a general fulltext search engine for databases. It does, however, have special treatment for MySQL:</p>
<ul>
<li>First and foremost, it knows how to query MySQL for data (duh!)</li>
<li>If you don't mind compiling from source, you can rebuild MySQL with <a href="http://www.sphinxsearch.com/docs/current.html#sphinxse">SphinxSE</a>: a storage engine implementation. This storage engine does not actually hold any data, but rather provides an SQL-like interface to the search daemon.
<ul>
<li>Thus, you can query for search results using <strong>SELECT</strong> statements, <strong>JOIN</strong>ing to document tables, retrieving results, all in one step.</li>
<li>If you do mind compiling MySQL, be aware that newer versions of MariaDB <a href="http://askmonty.org/wiki/MariaDB_versus_MySQL">come with SphinxSE</a> built in.</li>
</ul>
</li>
<li>It implements the MySQL protocol. You can connect to the sphinx server using a MySQL client, and actually issue SQL statements to retrieve data. Not all SQL is supported. The valid subset is called <a href="http://www.sphinxsearch.com/docs/current.html#sphinxql">SphinxQL</a>.</li>
<li>Snippets (excerpts) are <a href="http://www.sphinxsearch.com/docs/current.html#sphinxse-snippets">supported</a> via MySQL UDF.</li>
</ul>
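<p>To illustrate the "IDs only" point together with SphinxQL, here is a minimal sketch of the two-step lookup. It assumes an index named <strong>my_docs_index</strong> built from a hypothetical <strong>my_docs</strong> table, and a search daemon configured with a MySQL-protocol listener (the port, <strong>9306</strong> here, depends on your configuration):</p>
<blockquote>
<pre>-- Step 1: connected to the Sphinx daemon with a MySQL client
-- (e.g. mysql -h 127.0.0.1 -P 9306). Returns document IDs, weights
-- and stored attributes, but not the text itself.
SELECT * FROM my_docs_index WHERE MATCH('open source') LIMIT 20;

-- Step 2: connected to MySQL, fetch the content for the IDs
-- returned by step 1 (the values below are placeholders).
SELECT my_docs_id, title, content
FROM my_docs
WHERE my_docs_id IN (17, 42, 108);</pre>
</blockquote>
<p>With SphinxSE the two steps collapse into a single <strong>JOIN</strong>, at the cost of rebuilding MySQL.</p>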
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sphinx-mysql-facts-and-misconceptions/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>SphinxSE 0.9.9-RC2 bug workaround</title>
		<link>http://code.openark.org/blog/mysql/sphinxse-0-9-9-rc2-bug-workaround</link>
		<comments>http://code.openark.org/blog/mysql/sphinxse-0-9-9-rc2-bug-workaround#comments</comments>
		<pubDate>Mon, 07 Sep 2009 08:23:21 +0000</pubDate>
		<dc:creator>shlomi</dc:creator>
				<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Installation]]></category>
		<category><![CDATA[Sphinx]]></category>

		<guid isPermaLink="false">http://code.openark.org/blog/?p=1245</guid>
		<description><![CDATA[There is a serious bug with the sphinx storage engine, introduced in 0.9.9-RC2 (and which has not been fixed in latest revisions, as yet - last checked with rev 2006). I would usually just revert to an older version (0.9.9-RC1 does not contain this bug), but for the reason that RC2 introduces an important feature: [...]]]></description>
			<content:encoded><![CDATA[<p>There is a serious bug with the sphinx storage engine, introduced in 0.9.9-RC2 (and which has not been fixed in latest revisions, as yet - last checked with rev 2006).</p>
<p>I would usually just revert to an older version (0.9.9-RC1 does not contain this bug), except that RC2 introduces an important feature: the <strong>sphinx_snippets()</strong> function, which allows for creation of snippets from within MySQL, and which makes the sphinx integration with MySQL complete, as far as the application is concerned.</p>
<h4>The bug</h4>
<p>The bug is described <a href="http://sphinxsearch.com/forum/view.html?id=3589">here</a> and <a href="http://sphinxsearch.com/forum/view.html?id=4081">here</a> (and see further discussions). Though it's claimed to have been fixed, it's been re-reported, and I've tried quite a few revisions and verified it has not been fixed (tested on Debian/Ubuntu x64).  <span>Essentially, the bug does not allow you to set filters on a query issued from within the SphinxSE. For example, the following queries fail:</span></p>
<blockquote>
<pre>SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;range=myUnixTimestamp,1249506000,1252184400;'
SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;filter=my_field,1;'</pre>
</blockquote>
<p>While the following query succeeds:</p>
<blockquote>
<pre>SELECT ... FROM ... WHERE query='python;mode=any;sort=relevance;limit=200;'</pre>
</blockquote>
<p>The error message is this:</p>
<blockquote>
<pre><span>ERROR 1430 (HY000): There was a problem processing the query on the foreign data source. Data source error: searchd error: invalid or truncated request</span></pre>
</blockquote>
<p><span>I see this as a serious bug in the SphinxSE: it renders it useless; searching without the ability to filter is not something I can live with.<span id="more-1245"></span></span></p>
<h4><span>The motivation</span></h4>
<p><span>Sphinx does not store the actual text content. To get search results with snippets, you need to:</span></p>
<ul>
<li><span>Ask sphinx for the document IDs</span></li>
<li><span>Get the content for those documents</span></li>
<li><span>Ask sphinx for snippets based on the provided content and search phrase.</span></li>
</ul>
<p>With the introduction of the <strong>sphinx_snippets()</strong> function, this can all be done with a single query, like this:</p>
<blockquote>
<pre>SELECT
  my_docs.my_docs_id,
  my_docs.publish_time,
  CONVERT(sphinx_snippets(my_docs.content, 'my_docs_index', 'python') USING utf8) AS snippet
FROM test.my_docs
INNER JOIN test.my_docs_sphinx USING(my_docs_id)
WHERE query='python;mode=any;sort=relevance;limit=200;range=publish_time_unix,1249506000,1252184400;';</pre>
</blockquote>
<p><span>This is really a life saver; without this function, you need to get the results back to your application, then send the data again to MySQL, in which case you might as well discard the SphinxSE altogether and talk to sphinx directly. But with a single query you get to ask for the results just as if you were asking for any result set from your database (with extra syntax).</span></p>
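<p>For reference, the <strong>test.my_docs_sphinx</strong> table joined in the query is a SphinxSE table. A minimal sketch of how such a table might be declared follows; the daemon address (<strong>localhost:9312</strong>) and the index name (<strong>my_docs_index</strong>) are assumptions, so adjust them to your setup:</p>
<blockquote>
<pre>-- The first three columns (document ID, weight, query) are mandatory
-- and must appear in this order. As far as I know their names are not
-- significant, which is what allows naming the first one my_docs_id.
-- Further columns are matched to Sphinx attributes by name.
CREATE TABLE test.my_docs_sphinx (
  my_docs_id        INTEGER UNSIGNED NOT NULL,
  weight            INTEGER NOT NULL,
  query             VARCHAR(3072) NOT NULL,
  publish_time_unix INTEGER,
  INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/my_docs_index";</pre>
</blockquote>
<p>The table holds no data of its own; the <strong>query</strong> column only serves to pass the search text and options to the search daemon.</p>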
<h4><span>The workaround</span></h4>
<p><span>My setup is Percona's </span><strong>mysql-5.1.34-xtradb5</strong> source, on Ubuntu server <strong>8.04 amd64</strong>. The trick is to first compile MySQL with sphinx <strong>0.9.9-RC2</strong>, in order to produce the <strong>sphinx.so</strong> file (where the <strong>sphinx_snippets()</strong> function is found), back up the <strong>sphinx.so</strong> file, then recompile everything with <strong>sphinx 0.9.9-RC1</strong>. The steps are as follows.</p>
<p>Compile MySQL with sphinx <strong>0.9.9-RC2</strong> (I chose to install MySQL in <strong>/usr/local/mysql51</strong>):</p>
<blockquote>
<pre>tar xzfv mysql-5.1.34-xtradb5.tar.gz
cd mysql-5.1.34-xtradb5
cp -R /tmp/resources/sphinx-0.9.9-rc2/mysqlse storage/sphinx
sh BUILD/autorun.sh
./configure --with-plugins=innobase,sphinx --prefix=/usr/local/mysql51
make</pre>
</blockquote>
<p>This produces the <strong>sphinx.so</strong>, <strong>sphinx.so.0</strong>, <strong>sphinx.so.0.0.0</strong> files. Back them up!</p>
<p>Next, recompile with sphinx <strong>0.9.9-RC1</strong>. I've found that simply copying and recompiling doesn't work well. So just clean up everything and start afresh:</p>
<blockquote>
<pre>cd ..
rm -rf mysql-5.1.34-xtradb5
tar xzfv mysql-5.1.34-xtradb5.tar.gz
cd mysql-5.1.34-xtradb5
cp -R /tmp/resources/sphinx-0.9.9-rc1/mysqlse storage/sphinx
sh BUILD/autorun.sh
./configure --with-plugins=innobase,sphinx --prefix=/usr/local/mysql51
make
sudo make install</pre>
</blockquote>
<p>Copy the backed-up <strong>sphinx.so</strong> files (the ones built with RC2) into the MySQL plugin directory (<strong>/usr/local/mysql51/lib/mysql/plugin</strong> in our case).</p>
<p>Then build sphinx (you must have MySQL includes for sphinx to compile, so this must be the second step):</p>
<blockquote>
<pre>cd /tmp/resources/sphinx-0.9.9-rc1/
./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql51
make
sudo make install</pre>
</blockquote>
<p>Essentially, we're now working with <strong>0.9.9-RC1</strong>, but the <strong>sphinx_snippets()</strong> function comes from the <strong>0.9.9-RC2</strong> version, and happily nothing seems to mind this mix.</p>
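<p>One more step may be needed before the function can be called: a UDF has to be registered with MySQL. Assuming the backed-up <strong>sphinx.so</strong> sits in the plugin directory, this would look something like:</p>
<blockquote>
<pre>-- Register the snippets UDF from the RC2-built library
CREATE FUNCTION sphinx_snippets RETURNS STRING SONAME 'sphinx.so';

-- Check that the SPHINX storage engine is listed
SHOW ENGINES;</pre>
</blockquote>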
<p>I hope this helps.</p>
]]></content:encoded>
			<wfw:commentRss>http://code.openark.org/blog/mysql/sphinxse-0-9-9-rc2-bug-workaround/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

