
About Apache Lucene and Lucene Solr – Properties and Advantages

Sunday, October 4th, 2009
by Brian Warren

Apache Lucene is a free, Java-based search library, available as open source under the Apache Software License. Users can modify the technology or embed it in their own products, and any resulting product may be sold, redistributed or kept proprietary. Ports to .NET and other platforms are available, though Lucene itself is written entirely in Java.

Apache Lucene has several strong properties. It is fast, answering most queries in under a second, and its out-of-the-box relevancy ranking equals or exceeds that of the best commercial competitors.

Apache Lucene has complete query capabilities: keyword, Boolean and +/- queries, proximity operators, wildcards, fielded searching, term/field/document weights, find-similar, spell-checking, multi-lingual search and others. It also offers complete results processing, including sorting by relevancy, date or any field, dynamic summaries and hit highlighting.
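To make the +/- operators mentioned above concrete, here is a minimal sketch of how required and prohibited terms can be evaluated against an inverted index. It is a toy written in plain Java, not Lucene's API or its actual implementation: the class ToyIndex and its methods are hypothetical names, and unlike Lucene (where unprefixed terms are optional by default) this sketch treats unprefixed terms as required.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: NOT Lucene's API, only an illustration of +/- queries.
final class ToyIndex {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> allDocs = new TreeSet<>();

    // Index a document: lowercase it, split on non-alphanumerics, record each term.
    void add(int docId, String text) {
        allDocs.add(docId);
        for (String term : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Evaluate a query such as "+lucene -solr".
    // '+' marks a required term, '-' a prohibited one. Assumption of this
    // sketch: terms without a prefix are treated as required, and an empty
    // query matches every document.
    List<Integer> search(String query) {
        Set<Integer> hits = new TreeSet<>(allDocs);
        for (String token : query.trim().toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            boolean prohibited = token.startsWith("-");
            String term = token.replaceFirst("^[+-]", "");
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (prohibited) {
                hits.removeAll(docs);   // drop documents containing the term
            } else {
                hits.retainAll(docs);   // keep only documents containing the term
            }
        }
        return new ArrayList<>(hits);   // matching ids in ascending order
    }
}
```

With three documents indexed as "Lucene is a Java search library" (id 1), "Solr is built on Lucene" (id 2) and "Java is portable" (id 3), the query "+lucene -solr" matches only document 1. A real engine would also score and rank the hits, which this sketch leaves out.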

Apache Lucene is portable: it runs on any platform that supports Java, and its indexes are portable across platforms. It is highly scalable, with production applications holding hundreds of millions to billions of documents or records. Its indexes have low overhead, and incremental indexing is very fast, particularly in version 2.3 and later.

Solr (Lucene Solr) is a layer of code on top of Lucene that turns it into an enterprise search platform. It is free and open source, available under the liberal Apache Software License. It contains many of the capabilities needed to turn a core search library into a full-fledged search application.

Its capabilities include: a Web service interface (Solr exposes Lucene over HTTP, allowing programs written in any language to invoke it); an XML-based schema for managing indexed fields and their characteristics; admin tools for configuration, data loading, index replication, statistics, logging and cache management; large-scale distributed search; fixed or paid result list placement; and faceting, the dynamic clustering of items or search results into categories that lets users drill into search results (or even skip searching entirely) by any value in any field, as seen on popular e-commerce sites such as Amazon or Zappos.
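Because Solr is reached over HTTP, a client only needs to build a URL. The sketch below assembles a faceted search request using Solr's standard q, facet, facet.field and wt parameters. It only builds the URL and never contacts a server. The class name SolrUrlSketch, the base URL and the field names title and category are assumptions made for illustration; the exact path depends on how a given Solr instance is deployed.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a Solr select URL as a string; it does not send the request.
final class SolrUrlSketch {
    // baseUrl is assumed to point at a Solr install, e.g. "http://localhost:8983/solr".
    // query is a Lucene-syntax query; facetField is the field to count values for.
    static String selectUrl(String baseUrl, String query, String facetField) {
        return baseUrl + "/select"
            + "?q=" + encode(query)                  // the search itself
            + "&facet=true"                          // switch faceting on
            + "&facet.field=" + encode(facetField)   // field whose values are counted
            + "&wt=json";                            // ask for a JSON response
    }

    // Percent-encode a parameter value (spaces become '+').
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

Calling SolrUrlSketch.selectUrl("http://localhost:8983/solr", "title:lucene AND java", "category") yields http://localhost:8983/solr/select?q=title%3Alucene+AND+java&facet=true&facet.field=category&wt=json. Any HTTP client in any language could then fetch that URL, and the response would carry the matching documents together with a count per category value for drill-down.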

About the Author: