Removing “out of sync” error in acts_as_solr

Posted by Bhushan G Ahire | Posted in JRuby, Rails, ruby | Posted on 05-10-2010

"Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat." - Apache Solr

Solr can run in different containers and be used through different wrappers. Our application runs on Ruby on Rails, so we used acts_as_solr. Although Solr is a powerful, stable and flexible third-party solution that we could rely on, we never used it to its full capacity; our search modules used only its most basic features.

So far we’ve used a couple of acts_as_solr enhancements and add-ons, some of which we learned about from various online resources. We were able to use db_free_solr and explored Solr’s highlighting and faceting capabilities. It’s been pretty helpful, but of course nothing is ever completely seamless. We ran into a few problems syncing records from the database to Solr. If you’ve been using Solr, you’ve surely come across this error before:

Out of sync! Found N items in index, but only n were found in database!

It took down every page where the record counts didn’t match, which gave the impression that our site was frequently unstable. Removing a particular element from the Solr index is as easy as:

ActsAsSolr::Post.execute(Solr::Request::Delete.new(:query => %{type_s:Model AND id:"Model:110809"}))
ActsAsSolr::Post.execute(Solr::Request::Commit.new)
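The query string in the first call follows the pattern type_s:<Model> AND id:"<Model>:<id>". Here is a minimal sketch of a helper that builds it for any record. Note that solr_delete_query is a hypothetical name of my own, not something acts_as_solr provides:

```ruby
# Hypothetical helper (not part of acts_as_solr): builds the delete query
# shown above for one record of a given model.
def solr_delete_query(model_name, record_id)
  %{type_s:#{model_name} AND id:"#{model_name}:#{record_id}"}
end

puts solr_delete_query("Model", 110809)
# prints: type_s:Model AND id:"Model:110809"
```

The resulting string is what you would pass as the :query option of Solr::Request::Delete.new.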

Removing the one offending item from the Solr index would have been straightforward, and then everything would be well. But it’s a lot harder when you’re comparing over a thousand indexed elements against their ‘existing’ counterparts in the database! Finding the exact data to remove was really the hardest part. I didn’t know this until I took the liberty of helping our kind Infra Team resolve the problem. I decided to tweak the Solr parser method that raises the “out of sync” error. I thought it would be brilliant to simply display the offending elements’ ids so that they could be deleted from the index. And so, I had something like this (in acts_as_solr/lib/parser_methods.rb):

raise "Out of sync! Found #{ids.size} items in index, but only #{things.size} were found in database! Remove #{(ids - (things.collect{|x| x.id})).to_sentence}." unless things.size == ids.size
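To see what the extra “Remove ...” part reports, here is a small standalone sketch with made-up data. The Thing struct is only a stand-in for an ActiveRecord model, and I use join instead of to_sentence so it runs without Rails:

```ruby
# Stand-in for an ActiveRecord model; only `id` matters here.
Thing = Struct.new(:id)

ids    = [3, 7, 9, 12]                                 # ids returned by the Solr index (sample data)
things = [Thing.new(12), Thing.new(3), Thing.new(9)]   # rows actually found in the database

# Array difference keeps every index id that has no matching database row.
stale_ids = ids - things.collect { |x| x.id }

puts "Remove #{stale_ids.join(', ')}."
# prints: Remove 7.
```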

And yes, voilà! I could now see the faulty ids that were causing the “out of sync” problem. I presented this not-so-brilliant solution to our Infra Team, and they came up with a better idea. My colleague thought it would be nicer to do away with the “out of sync” error altogether. Since I could already pinpoint the cause of the trouble, why not remove it for good? I came up with half the solution. It was the quicker one to implement and didn’t require much from their end either.

Once I could tell which ids came from Solr but had no match in the db, it was simple to drop those ids from the check. It was only half the solution because (hint, hint.. I may do this when I have time) I could actually delete the stale element from the Solr index instead of just removing it from the ids Solr returned. That “full” solution could bring other complications, since you’d have to deal with which models are involved, which fields Solr needs to look at, etc.

And so, the half solution was to clean up the ids that Solr has on hand. This snippet is found in acts_as_solr/lib/parser_methods.rb.

def reorder(things, ids)
  ordered_things = Array.new(things.size)

  # Drop every id that Solr returned but that has no row in the database.
  unless things.size == ids.size
    (ids - (things.collect{|x| x.id})).collect{|missing| ids[ids.index(missing)] = nil}
    ids = ids.compact
  end

  raise "Out of sync! Found #{ids.size} items in index, but only #{things.size} were found in database! Remove #{(ids - (things.collect{|x| x.id})).to_sentence}." unless things.size == ids.size

  # Put each record at the position its id has in Solr's result order.
  things.each do |thing|
    position = ids.index(thing.id)
    ordered_things[position] = thing
  end

  ordered_things
end

The unless block above the “out of sync” message is the critical part. It removes the ids of the missing objects from the list that Solr returned. If the counts still differ after that, the “out of sync” error is raised as before, with whatever unmatched ids remain listed in the message.
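Here is a standalone sketch of the same logic that you can run outside Rails to watch the cleanup happen. It is an approximation of the method above, not the plugin’s code: Row is a stand-in for a model, join replaces to_sentence, and I copy ids first so the caller’s array is left alone.

```ruby
# Stand-in for an ActiveRecord model; only `id` matters here.
Row = Struct.new(:id)

def reorder(things, ids)
  ids = ids.dup                       # work on a copy of Solr's id list
  ordered_things = Array.new(things.size)

  # Strip ids that have no matching database row.
  unless things.size == ids.size
    (ids - things.collect { |x| x.id }).each { |missing| ids[ids.index(missing)] = nil }
    ids = ids.compact
  end

  unless things.size == ids.size
    leftover = ids - things.collect { |x| x.id }
    raise "Out of sync! Found #{ids.size} items in index, but only #{things.size} " \
          "were found in database! Remove #{leftover.join(', ')}."
  end

  # Place each record at the position its id has in Solr's order.
  things.each { |thing| ordered_things[ids.index(thing.id)] = thing }
  ordered_things
end

solr_ids = [9, 7, 3]                  # relevance order from Solr; 7 is gone from the db
db_rows  = [Row.new(3), Row.new(9)]   # what the database returned

p reorder(db_rows, solr_ids).collect { |t| t.id }
# prints: [9, 3]
```

The stale id 7 is skipped and the remaining records keep Solr’s order.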

It’s quick, but not dirty. It works, but it won’t guarantee that your problem goes away permanently. I suggest you do a complete reindex of your data. Better yet, whatever was causing it, make sure no data is deleted directly from the database, so that Solr always stays in sync with your database.

There is also an alternative if you don’t like the above: why not let MySQL decide the ordering?

You need to change a small snippet in acts_as_solr/lib/parser_methods.rb:

def find_objects(ids, options, configuration)
  result = if configuration[:lazy] && configuration[:format] != :ids
    ids.collect {|id| ActsAsSolr::LazyDocument.new(id, self)}
  elsif configuration[:format] == :objects
    conditions = [ "#{self.table_name}.#{primary_key} in (?)", ids ]
    find_options = {:conditions => conditions}
    find_options[:include] = options[:include] if options[:include]
    if self.connection.adapter_name =~ /mysql/i && !ids.empty?
      # Let MySQL return the rows in Solr's order; ids with no row simply drop out.
      # FIELD() needs at least one value, hence the check for empty ids.
      find_options[:order] = "FIELD(#{self.table_name}.#{primary_key}, #{ids.join(',')})"
      self.find(:all, find_options)
    else
      reorder(self.find(:all, find_options), ids)
    end
  else
    ids
  end

  result
end

With this change, if the adapter is MySQL the method lets the database order the rows with FIELD() and skips reorder. Ids that no longer have a row in the database are simply left out of the result, so the “Out of sync” error is not raised.
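MySQL’s FIELD(col, v1, v2, ...) returns the 1-based position of col in the value list, so ordering by it yields the rows in the same order as the ids from Solr. Below is a rough sketch of building that clause on its own. field_order_clause is a hypothetical helper, “posts” is a sample table name, and I am assuming integer primary keys, which is why each id goes through Integer():

```ruby
# Hypothetical helper: builds the ORDER BY expression used in the MySQL branch.
# Integer() raises on anything that is not a whole number, which keeps stray
# strings out of the SQL (assumes integer primary keys).
def field_order_clause(table_name, primary_key, ids)
  return nil if ids.empty?   # FIELD(col, ) with no values is invalid SQL
  "FIELD(#{table_name}.#{primary_key}, #{ids.collect { |id| Integer(id) }.join(',')})"
end

puts field_order_clause("posts", "id", [9, 7, 3])
# prints: FIELD(posts.id, 9,7,3)
```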

Hope this helps.

Deploying two rails application with Apache + mongrel on windows

Posted by Bhushan Ahire | Posted in Rails | Posted on 28-01-2008

Install Ruby and RubyGems, then install Ruby on Rails:

gem install rails --include-dependencies

Next, download and install Apache 2.2. The fastest way is the MSI package.

Now enable the needed modules (URL rewriting, proxy, proxy_balancer and proxy_http) by editing the httpd.conf file (under c:\Apache_Software_Foundation\Apache2.2\conf, if you installed Apache in its standard path). You just need to uncomment the following lines (remove the #):



LoadModule rewrite_module modules/mod_rewrite.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_http_module modules/mod_proxy_http.so

Install the mongrel and mongrel_service gems (pick the latest win32 version of each):

gem install mongrel
gem install mongrel_service

Now we will create a mongrel cluster of 2 Windows services responding at http://127.0.0.1 on ports 3010 and 3011, serving a Rails application at the path c:\www\ror\myapp, that will be started by the Windows system user. The two Windows services will be named mongrel_myapp1 and mongrel_myapp2 respectively. Open the command prompt and type:



mongrel_rails service::install -N mongrel_myapp1 -p 3010 -e production -c c:\www\ror\myapp
mongrel_rails service::install -N mongrel_myapp2 -p 3011 -e production -c c:\www\ror\myapp
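If you need more than two instances, typing these by hand gets tedious. Here is a small sketch that prints one install command per port. mongrel_install_commands is a hypothetical helper of mine; the service names, ports and path are the ones from this post:

```ruby
# Hypothetical helper: returns one "mongrel_rails service::install" command
# per port, numbering the service names from 1.
def mongrel_install_commands(app_name, app_path, ports, environment = "production")
  ports.each_with_index.collect do |port, index|
    "mongrel_rails service::install -N mongrel_#{app_name}#{index + 1} " \
      "-p #{port} -e #{environment} -c #{app_path}"
  end
end

puts mongrel_install_commands("myapp", 'c:\www\ror\myapp', [3010, 3011])
```

Run it and paste the printed lines into the command prompt.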

Now open the Windows Services tool and set the 2 new services to the Automatic startup type (so they are started again after a reboot).
Test that your application is now running on both ports:



http://localhost:3010

http://localhost:3011
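Instead of opening each URL in a browser, you could check them from a script. This is only a sketch using Ruby’s standard Net::HTTP; mongrel_urls and mongrel_up? are names I made up for illustration:

```ruby
require "net/http"
require "uri"

# Builds the URLs for the mongrel instances on the given ports.
def mongrel_urls(ports, host = "localhost")
  ports.collect { |port| "http://#{host}:#{port}" }
end

# Returns true when the instance answers with any HTTP response at all;
# refused connections, unknown hosts and timeouts count as "down".
def mongrel_up?(url)
  Net::HTTP.get_response(URI.parse(url))
  true
rescue SystemCallError, SocketError, Timeout::Error
  false
end

# Usage (needs the two services running):
#   mongrel_urls([3010, 3011]).each do |url|
#     puts "#{url}: #{mongrel_up?(url) ? 'up' : 'down'}"
#   end
```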

If everything is working fine, you are ready to place Apache in front of these 2 mongrel services to manage the load balancing of your application.

The best way to configure Apache is to create a Virtual Host for your ROR application. First edit your httpd.conf file, and uncomment the following line:



# Virtual hosts

Include conf/extra/httpd-vhosts.conf

Now edit the httpd-vhosts.conf file, like this (keep the slashes in the *nix fashion!):



NameVirtualHost *:80

#Proxy balancer section (create one for each ruby app cluster)
<Proxy balancer://myapp_cluster>
  BalancerMember http://myapp:3010
  BalancerMember http://myapp:3011
</Proxy>

#Virtual host section (create one for each ruby app you need to publish)

<VirtualHost *:80>
  ServerName myapp
  DocumentRoot c:/www/ror/myapp/public/

  <Directory c:/www/ror/myapp/public/ >
      Options Indexes FollowSymLinks MultiViews
      AllowOverride All
      Order allow,deny
      allow from all
  </Directory>

  #log files (relative paths are resolved against Apache's ServerRoot)
  ErrorLog logs/myapp_error.log
  # Possible values include: debug, info, notice, warn, error, crit,
  # alert, emerg.
  LogLevel warn
  CustomLog logs/myapp_access.log combined

  #Rewrite stuff
   RewriteEngine On

  # Check for maintenance file and redirect all requests
  RewriteCond %{DOCUMENT_ROOT}/system/maintenance.html -f
  RewriteCond %{SCRIPT_FILENAME} !maintenance.html
  RewriteRule ^.*$ /system/maintenance.html [L]

  # Rewrite index to check for static
  RewriteRule ^/$ /index.html [QSA]

  # Rewrite to check for Rails cached page
  RewriteRule ^([^.]+)$ $1.html [QSA]

  # Redirect all non-static requests to cluster
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
  RewriteRule ^/(.*)$ balancer://myapp_cluster%{REQUEST_URI} [P,QSA,L]

</VirtualHost>

Add your app as a host in the hosts file (in the c:\WINNT\system32\drivers\etc folder):



127.0.0.1 localhost
127.0.0.1 myapp

Now restart Apache from the Windows services panel, and if everything is fine you should have your app served by Apache at the following URL:

http://myapp