Starting to work with SOLR
SOLR, a wrapper for Lucene, was developed by a coworker at CNET. It recently graduated from the Apache incubator and is now a full-fledged project. Not only does it wrap Lucene with a simpler interface, but more importantly it exposes a RESTful type of API. I have only been using it for a few months, but I love it. I will be launching it into production at work and for my own personal project.
So what is SOLR/Lucene?
Lucene is an indexing system, much like a database, except faster and narrower. Narrower? Well, essentially an index is a single table with a primary key per entry. It allows extremely fast full-text searching with stemming, which most databases either handle poorly or do not support at all. MySQL's full-text search, for example, especially in MySQL 5, will fall over under heavy load, no ifs, ands, or buts. I’ve seen it happen at work in the lab as well as in production.
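To make the "single table with a primary key" idea concrete, here is a toy inverted index in Python. It is only a sketch of the concept, not how Lucene is actually implemented: the `ToyIndex` class and the `naive_stem` function are names I made up for illustration, and the stemming is a crude suffix strip rather than a real stemmer.

```python
from collections import defaultdict


def naive_stem(word):
    """Crude suffix stripping for illustration; real stemmers are far more careful."""
    word = word.lower()
    for suffix in ("ing", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class ToyIndex:
    """One flat 'table' of documents, each identified by a primary key."""

    def __init__(self):
        self.docs = {}                    # primary key -> original text
        self.postings = defaultdict(set)  # stemmed term -> set of primary keys

    def add(self, doc_id, text):
        # Adding an existing key replaces the old entry.
        self.delete(doc_id)
        self.docs[doc_id] = text
        for token in text.split():
            self.postings[naive_stem(token)].add(doc_id)

    def delete(self, doc_id):
        if self.docs.pop(doc_id, None) is not None:
            for ids in self.postings.values():
                ids.discard(doc_id)

    def search(self, query):
        # Return the keys of documents containing every query term.
        result = None
        for token in query.split():
            ids = self.postings.get(naive_stem(token), set())
            result = set(ids) if result is None else result & ids
        return sorted(result or [])


index = ToyIndex()
index.add(1, "Searching product catalogs")
index.add(2, "Cotton shirts")
print(index.search("shirt"))    # matches "shirts" thanks to stemming
print(index.search("catalog"))
```

The lookup never scans the documents themselves; it only intersects the sets of keys stored per term, which is the basic reason an index can answer text queries so quickly.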
Both Lucene and Solr are written in Java, but Solr also needs a servlet container such as Tomcat, Resin, or WebLogic to run. It runs “next to” your database. All the data in the Lucene index is equivalent to one table in your database. When you write to your DB, you also write to Lucene (via SOLR). When you delete, same thing. Updating, yes. You get the point.
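Here is a rough sketch of what "write to Lucene via SOLR" can look like from client code, using Solr's XML update messages (`<add>` and `<delete>`). The URL, the field names, and the helper function names are my own assumptions for illustration; your servlet container, port, and schema will differ.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed location of the update handler; adjust for your own install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(row):
    """Turn one database row (a dict) into a Solr <add> message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in row.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_delete_xml(doc_id):
    """Build a Solr <delete> message for one primary key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def post_to_solr(xml_body, url=SOLR_UPDATE_URL):
    """POST an update message to Solr; needs a running Solr instance."""
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical row that was just written to the database.
print(build_add_xml({"id": 42, "title": "Cotton shirt"}))
print(build_delete_xml(42))
```

An update is just another `<add>` with the same unique key, which replaces the existing document. Changes only become visible to searches after a `<commit/>` message is posted to the same URL.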
Faceting
Solr, and Lucene of course, also support faceting, which is very powerful. You have seen this on many sites, especially in the world of online comparison shopping. It shows you how many other entries share the same attributes. In the comparison shopping example, you can see how many other products are also under 20 dollars, made of cotton, and from Amazon Marketplace. This is a very powerful feature, and it is perfect for allowing users to drill down through the data.
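As a small illustration of faceting, the sketch below first counts attribute values over a handful of made-up products, then builds the kind of query URL you would send to Solr to have it do the counting for you. The product data, the field names, the base URL, and both function names are assumptions of mine; the `facet`, `facet.field`, and `fq` parameters are Solr's own.

```python
from collections import Counter
from urllib.parse import urlencode


def facet_counts(docs, field):
    """Count how many documents share each value of one attribute."""
    return Counter(doc[field] for doc in docs if field in doc)


def build_facet_query(base_url, query, facet_fields, filters=()):
    """Build a Solr select URL that asks for facet counts on the given fields."""
    params = [("q", query), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", name) for name in facet_fields]
    # Each fq narrows the result set, which is how a user "drills down".
    params += [("fq", fq) for fq in filters]
    return base_url + "?" + urlencode(params)


# Made-up products, standing in for documents in the index.
products = [
    {"id": 1, "material": "cotton", "merchant": "Amazon Marketplace"},
    {"id": 2, "material": "cotton", "merchant": "Other Store"},
    {"id": 3, "material": "wool", "merchant": "Amazon Marketplace"},
]

print(facet_counts(products, "material"))
print(build_facet_query(
    "http://localhost:8983/solr/select",
    "shirt",
    ["material", "merchant"],
    filters=["material:cotton"],
))
```

When a user clicks a facet value such as "cotton", you add it as another `fq` filter and rerun the query, and the remaining counts shrink to match.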
I will post more findings when I have the time, including configurations.