{"id":2754,"date":"2010-09-02T10:56:15","date_gmt":"2010-09-02T08:56:15","guid":{"rendered":"http:\/\/code.openark.org\/blog\/?p=2754"},"modified":"2010-09-02T10:56:15","modified_gmt":"2010-09-02T08:56:15","slug":"sphinx-mysql-facts-and-misconceptions","status":"publish","type":"post","link":"https:\/\/code.openark.org\/blog\/mysql\/sphinx-mysql-facts-and-misconceptions","title":{"rendered":"Sphinx &#038; MySQL: facts and misconceptions"},"content":{"rendered":"<p><a href=\"http:\/\/www.sphinxsearch.com\/\">Sphinx search<\/a> is a full text search engine, commonly used with MySQL.<\/p>\n<p>There are some misconceptions about Sphinx and its usage. Following is a list of some of Sphinx&#8217; properties, hoping to answer some common questions.<\/p>\n<ul>\n<li>Sphinx is not part of MySQL\/Oracle.<\/li>\n<li>It is a standalone server; an external application to MySQL.<\/li>\n<li>Actually, it is not MySQL specific. It can work with other RDBMS: PostgreSQL, MS SQL Server.<\/li>\n<li>And, although described as &#8220;free open-source SQL full-text search engine&#8221;, it is not SQL-specific: Sphinx can read documents from XML.<\/li>\n<li>It is often described as &#8220;full text search for InnoDB&#8221;. This description  is misleading. Sphinx indexes text; be it from any storage engine or  external source. It solves, in a way, the issue of &#8220;FULLTEXT is only  supported by MyISAM&#8221;. Essentially, it provided full-text indexing for InnoDB tables, but in a <em>very<\/em> different way than the way MyISAM&#8217;s <strong>FULLTEXT<\/strong> index works.<\/li>\n<\/ul>\n<p>Sphinx works by reading documents, usually from databases. Considering the case of MySQL, Sphinx issues a SQL query which retrieves relevant data (mostly the text you want to index, but other properties allowed).<!--more--><\/p>\n<ul>\n<li>Being an external module, it does not update its indexes on the fly. So  if <strong>10<\/strong> new rows are <strong>INSERT<\/strong>ed, it has no knowledge of this. It must be  called externally to re-read the data (or just read the new data), and re-index.\n<ul>\n<li>This is perhaps the greatest difference, functionality-wise, between Sphinx and MyISAM&#8217;s <strong>FULLTEXT<\/strong>. The latter is always updated, for every row <strong>INSERT<\/strong>ed, <strong>DELETE<\/strong>d or <strong>UPDATE<\/strong>d. The latter also suffers by this property, as this makes for serious overhead with large volumes.<\/li>\n<li>There&#8217;s more than one way to make that less of an issue. I&#8217;ll write some more in future posts.<\/li>\n<\/ul>\n<\/li>\n<li>Sphinx does not keep the text to itself; just the index. Sphinx cannot be asked &#8220;Give me the blog post content for those posts containing &#8216;open source'&#8221;.\n<ul>\n<li>Sphinx will only tell you the ID (i.e. Primary Key) for the row that matches your search.<\/li>\n<li>It is up to you to then get the content from the table.<\/li>\n<li>With SphinxSE (Sphinx Storage Engine for MySQL) this becomes easier, all-in-one query.<\/li>\n<\/ul>\n<\/li>\n<li>It can keep other numeric data. Such data can be used to filter results.<\/li>\n<li>It provides with <strong>GROUP BY<\/strong>-like, as well as <strong>ORDER BY<\/strong>-like mechanism.<\/li>\n<li>It allows for ordering results by relevance.<\/li>\n<li>It allows for exact match search, boolean search, and more.<\/li>\n<li>It has an API &amp; implementation for popular programming languages: PHP, Python, Perl, Ruby, Java.<\/li>\n<\/ul>\n<p>The above describes Sphinx as a general fulltext search engine for databases. It does, however, have special treatment for MySQL:<\/p>\n<ul>\n<li>First and foremost, it knows how to query MySQL for data (duh!)<\/li>\n<li>If you don&#8217;t mind compiling from source, you can rebuild MySQL with <a href=\"http:\/\/www.sphinxsearch.com\/docs\/current.html#sphinxse\">SphinxSE<\/a>: a storage engine implementation. This storage engine does not actually hold any data, but rather provides an SQL-like interface to the search daemon.\n<ul>\n<li>Thus, you can query for search results using <strong>SELECT<\/strong> statements, <strong>JOIN<\/strong>ing to document tables, retrieving results, all in one step.<\/li>\n<li>If you do mind compiling MySQL, be aware that MariaDB <a href=\"http:\/\/askmonty.org\/wiki\/MariaDB_versus_MySQL\">comes with SphinxSE<\/a> built in in newer versions.<\/li>\n<\/ul>\n<\/li>\n<li>It implements the MySQL protocol. You can connect to the sphinx server using a MySQL client, and actually issue SQL statements to retrieve data. Not all SQL is supported. The valid subset is called <a href=\"http:\/\/www.sphinxsearch.com\/docs\/current.html#sphinxql\">SphinxQL<\/a>.<\/li>\n<li>Snippets (excerpts) are <a href=\"http:\/\/www.sphinxsearch.com\/docs\/current.html#sphinxse-snippets\">supported<\/a> via MySQL UDF.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Sphinx search is a full text search engine, commonly used with MySQL. There are some misconceptions about Sphinx and its usage. Following is a list of some of Sphinx&#8217; properties, hoping to answer some common questions. Sphinx is not part of MySQL\/Oracle. It is a standalone server; an external application to MySQL. Actually, it is [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[5],"tags":[57,44],"class_list":["post-2754","post","type-post","status-publish","format-standard","hentry","category-mysql","tag-open-source","tag-sphinx"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p2bZZp-Iq","_links":{"self":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/2754","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/comments?post=2754"}],"version-history":[{"count":23,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/2754\/revisions"}],"predecessor-version":[{"id":2904,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/posts\/2754\/revisions\/2904"}],"wp:attachment":[{"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/media?parent=2754"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/categories?post=2754"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code.openark.org\/blog\/wp-json\/wp\/v2\/tags?post=2754"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}