Tuesday, August 26, 2008

A better search box

The TopSeller II inventory search box works well up to a few thousand items. But I've got a new client who has 50,000+ items. This has bogged down the simple Rails find() type search. It is taking 90 seconds or so to do a simple search. This won't do, so I'm investigating optimized search solutions.

The four solutions for a Ruby on Rails system seem to be:
  1. Stick with a MySQL query, but do something smarter (perhaps with an index, or using the full text search capabilities of MySQL)
  2. Ferret
  3. Ultra-Search
  4. Solr
MySQL Solutions

A bit of Googling turned up a useful looking plugin: acts_as_fulltexttable

(Here are some more search options that I didn't have time to investigate: http://agilewebdevelopment.com/plugins/search?search=search

This little plugin does exactly what I was thinking... build a search index using mysql's full text search capability. If it is fast enough, then it will probably be good enough for my purposes.

Here are the full installation instructions.

== Step 1

Install the plugin:
script/plugin install http://wonsys.googlecode.com/svn/plugins/acts_as_fulltextable/

I had no problems with this... just note the last line that the installation process prints out.

== Step 2

Add the following code to the model that should be included in searches:
acts_as_fulltextable :fields, :to, :include, :in, :index

I wondered if this line should be inserted literally, or whether ':fields', for instance, should be replaced by the fields I actually wanted to index. I decided to follow the instructions literally, and added this line to three models I wanted to search.

Needless to say, this didn't work at all. I really hate overly clever documentation, and this is a classic case. I scratched my head and read the code for about an hour before stumbling upon the answer... read all the arguments to acts_as_fulltextable together, like a sentence:

'fields to include in index'

Duh. A simple example would have saved me, and probably a bunch of other people, a lot of time.

So the right way to do this is to add a line like:

acts_as_fulltextable :name, :description, :author, :etc

== Step 3

Create the migration:
script/generate fulltext_rows model1 model2 model3 ...

'modelx' needs to be the lowercase, singular version of the model name. So if your model is 'Item', use 'item'.

Then execute it:
rake db:migrate

No problems. The migration worked fine, and the index was built, all in one step. Nice. Now, does it work?

== Run searches

You can either run a search on a single model:
Model.find_fulltext('query to run', :limit => 10, :offset => 0)

Or you can run it on more models at once:
FulltextRow.search('query to run', :only => [:only, :this, :models], :limit => 10, :offset => 0)

I imported a database with 50000 records and tried some searches. Immediate gratification. Not only did the acts_as_fulltextable plugin work, but searches were blindingly fast. Things were looking up.

However, I quickly realized the query syntax was a bit quirky. It's documented in the MySQL documentation.

The plugin uses 'in boolean mode'. Here's the documentation.

Will Paginate

The next big step will be to get will_paginate working.



So, it wasn't hard to integrate the plugin with the rest of the website, and will_paginate seemed to work just fine.

However, I did run into a weird problem... Even though I had 50K items in the database, the standard search seemed to be limited to 30. Here's the SQL from the log:

FulltextRow Load (0.382258) SELECT fulltext_rows.*, match(`value`) against('1*' in boolean mode) AS relevancy FROM `fulltext_rows` WHERE (match(value) against('1*' in boolean mode) AND fulltextable_type IN ('Item')) ORDER BY relevancy DESC, value ASC LIMIT 0, 30

Note the 'Limit 0, 30' at the end.

This is limit is not in the plugin code, so I am trying to figure out where it is coming from. Using the limit field during the search does not help... find_fulltext() still returns 30 items, even when the search should return thousands of items... very puzzling...

@collection = Item.find_fulltext(search_terms, :limit => 100)

It seems to have something to do with will_paginate. If I remove the "require 'will_paginate'" from environment.rb, then when I run the above command in script/console, I get 100 items back.

In fulltext_row.rb, the following lines seem to merge paginate options with search options in a way that doesn't work for me:

if defined?(WillPaginate)
self.paginate(:all, search_options.merge(:page => page))
else
self.find(:all, search_options.merge(:limit => limit, :offset => offset))
end


It may be a bug, it may just be my requirements, but somehow that 30 from will_paginate is being used to limit the number of items returned in search. I need a way to override this behavior...

The way I 'fixed' this is to make a small change to the above code:

if defined?(WillPaginate)
self.paginate(:all, search_options.merge(:page => page, :per_page => 500))
else
self.find(:all, search_options.merge(:limit => limit, :offset => offset))
end


By adding that :per_page number, I am able to get 500 items, rather than 30. 500 is probably enough for me.

This seems a bit of a kludge, so I'll need to dig into this more, but for now, that seems to work.