Sphinx & MySQL: facts and misconceptions

September 2, 2010

Sphinx search is a full text search engine, commonly used with MySQL.

There are some misconceptions about Sphinx and its usage. Following is a list of some of Sphinx' properties, hoping to answer some common questions.

  • Sphinx is not part of MySQL/Oracle.
  • It is a standalone server; an external application to MySQL.
  • Actually, it is not MySQL specific. It can work with other RDBMS: PostgreSQL, MS SQL Server.
  • And, although described as "free open-source SQL full-text search engine", it is not SQL-specific: Sphinx can read documents from XML.
  • It is often described as "full text search for InnoDB". This description is misleading. Sphinx indexes text; be it from any storage engine or external source. It solves, in a way, the issue of "FULLTEXT is only supported by MyISAM". Essentially, it provided full-text indexing for InnoDB tables, but in a very different way than the way MyISAM's FULLTEXT index works.

Sphinx works by reading documents, usually from databases. Considering the case of MySQL, Sphinx issues a SQL query which retrieves relevant data (mostly the text you want to index, but other properties allowed).

  • Being an external module, it does not update its indexes on the fly. So if 10 new rows are INSERTed, it has no knowledge of this. It must be called externally to re-read the data (or just read the new data), and re-index.
    • This is perhaps the greatest difference, functionality-wise, between Sphinx and MyISAM's FULLTEXT. The latter is always updated, for every row INSERTed, DELETEd or UPDATEd. The latter also suffers by this property, as this makes for serious overhead with large volumes.
    • There's more than one way to make that less of an issue. I'll write some more in future posts.
  • Sphinx does not keep the text to itself; just the index. Sphinx cannot be asked "Give me the blog post content for those posts containing 'open source'".
    • Sphinx will only tell you the ID (i.e. Primary Key) for the row that matches your search.
    • It is up to you to then get the content from the table.
    • With SphinxSE (Sphinx Storage Engine for MySQL) this becomes easier, all-in-one query.
  • It can keep other numeric data. Such data can be used to filter results.
  • It provides with GROUP BY-like, as well as ORDER BY-like mechanism.
  • It allows for ordering results by relevance.
  • It allows for exact match search, boolean search, and more.
  • It has an API & implementation for popular programming languages: PHP, Python, Perl, Ruby, Java.

The above describes Sphinx as a general fulltext search engine for databases. It does, however, have special treatment for MySQL:

  • First and foremost, it knows how to query MySQL for data (duh!)
  • If you don't mind compiling from source, you can rebuild MySQL with SphinxSE: a storage engine implementation. This storage engine does not actually hold any data, but rather provides an SQL-like interface to the search daemon.
    • Thus, you can query for search results using SELECT statements, JOINing to document tables, retrieving results, all in one step.
    • If you do mind compiling MySQL, be aware that MariaDB comes with SphinxSE built in in newer versions.
  • It implements the MySQL protocol. You can connect to the sphinx server using a MySQL client, and actually issue SQL statements to retrieve data. Not all SQL is supported. The valid subset is called SphinxQL.
  • Snippets (excerpts) are supported via MySQL UDF.

tags: ,
posted in MySQL by shlomi

« | »

Follow comments via the RSS Feed | Leave a comment | Trackback URL

12 Comments to "Sphinx & MySQL: facts and misconceptions"

  1. barry hunter wrote:

    > Sphinx does not keep the text [] itself; just the index.

    One, slight update to that: Currently in the 'stable' version true, but the beta version does have string attributes, so can store and get text content from sphinx. So this is changing.

    but otherwise a great overview of sphinx!

  2. shlomi wrote:

    @barry,

    thanks for the update!

  3. erkules wrote:

    The beta also supports "indexes on the fly"
    http://sphinxsearch.com/docs/current.html#rt-indexes

  4. shlomi wrote:

    To be sure, this post is not a complete feature list of sphinx (@erkules: thanks).

  5. Nams wrote:

    So how can RT indexes be of complete use if it does not allow string atributes.How can I get my complete document data from mysql table?Any suggestions anyone?

  6. shlomi wrote:

    @Nams,
    same with other indexes: you need to keep track of the PK where you've got the string in the first place. Meaning you'll need a "real" table to hold the content.

  7. Kamelot Blog wrote:

    Sphinx : j'en dit un peu plus sur ce moteur de recherche...

    Présentation de sphinx en français...

  8. DmitrySh wrote:

    Update: It is not available on a shared hosting account.

  9. sachin pethani wrote:

    HI Shlomi,

    Its a grate overview about Sphinx.

    Currently I'm using Apahe Lucene's Solr as "search engine" in my ajax based application.

    Is there any difference between these two?

    Thanks
    http://twitter.com/sachinpethani

  10. shlomi wrote:

    @Sachin,
    I have no experience with Lucene. From what I gather around, I see many users to both, and I hear they both compete closely in terms of performance and feature set. I can find quite a few links on Google which discuss this; sorry, but I can't offer any additional information here, other than the fact that Sphinx has a good integration with MySQL, which Lucene does not. This in itself is probably not a sufficient reason to choose one over the other.

  11. Jamie wrote:

    * Being an external module, it does not update its indexes on the fly. So if 10 new rows are INSERTed, it has no knowledge of this. It must be called externally to re-read the data (or just read the new data), and re-index.

    ** There's more than one way to make that less of an issue. I'll write some more in future posts.

    ---

    Did you end up writing about how to make that less of an issue?

  12. shlomi wrote:

    I did not... And it's been a while since I last used Sphinx, so I can only assume a lot has gone by that I'm unaware of.

Leave Your Comment

 

 
Powered by Wordpress and MySQL. Theme by openark.org