Create an index and retrieve information from it with Lucene.NET

Nowadays, users rely heavily on search engines to find the information they need. For this reason, when building a web application, it is good practice to let users search for information within the site. Everybody would like a search engine as effective and efficient as the most popular ones.
Reaching this goal on your own, namely by building your own search algorithm, is no small feat.
There are various external solutions that can be embedded in your website using scripts, such as search boxes “powered by Google”. These solutions are very useful but hardly customizable.
In this article, we’re going to discuss an elegant solution for building your own search engine, one that is quick to implement and very effective.
To reach our goal, let’s introduce the library that underlies everything: Lucene, an open-source library written in Java and available for other platforms too.

This article focuses on how to use this library on the .NET platform in C#, through its Lucene.NET port.

Index:

  1. Importing libraries
  2. Creating an index
  3. Including documents in the index
  4. Retrieving information

Creating a project and importing libraries

First of all, we need to create a new project in Visual Studio. Once the project is created, right-click “References” and choose “Manage NuGet Packages”. Search for “Lucene” in the package manager, then import the library by clicking “Install”. All the necessary libraries (.dll) will be imported.

Now that we have the references, let’s start using them.

Creating an index

The following snippet creates a Lucene index.

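// Open the file system directory that holds the index; passing true creates it, erasing any existing contents
Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory(@"C:/INDEX_PATH_LOCATION", true);

// The analyzer splits text into the terms that are written to the index
Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT);

// The writer adds documents to the index; LIMITED caps the number of terms indexed per field
Lucene.Net.Index.IndexWriter writer = new Lucene.Net.Index.IndexWriter(dir, analyzer, Lucene.Net.Index.IndexWriter.MaxFieldLength.LIMITED);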

Once the index is created, documents can be added to it.

Including documents in the index

Lucene can index any kind of textual information, from text files (.txt, .xml) to documents (.doc, .pdf), as long as their text is extracted first.

Which information goes into the Lucene index depends on the application context.

To add documents to the index, we first have to retrieve the IndexWriter created in step 2.

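IndexWriter writer = // retrieve your index in your own location

// Create a document
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();

// "id" is stored so it can be read back from the results, but it is not indexed, so it cannot be searched
doc.Add(new Field("id", instance.property.ToString(), Field.Store.YES, Field.Index.NO));
// These fields are stored and analyzed (tokenized), which makes them searchable
doc.Add(new Field("yourField1", instance.property.ToString(), Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("yourField2", instance.property.ToString(), Field.Store.YES, Field.Index.ANALYZED));

writer.AddDocument(doc);

// Optionally merge the index segments to speed up later searches
writer.Optimize();
// Write the buffered changes, then close the writer
writer.Flush();
writer.Close();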

The last step is to retrieve the stored information and present it to the user in the desired form.

Retrieving information

To retrieve information related to what the user requests, Lucene provides a series of methods that allow us to retrieve the documents in the index. The extra effort the programmer has to make is to turn the Lucene results into a list of business objects that will be presented to the user in the desired form.

To query the index and get the results, take your cue from this snippet:

// Multi fields search
MultiFieldQueryParser parser = new MultiFieldQueryParser(new string[] { "yourField1", "yourField2" }, ANALYZER);
Query query = parser.Parse(userQuery);

IndexSearcher searcher = new IndexSearcher(INDEX_PATH);

Hits hits = searcher.Search(query);
List<T> searchResults = new List<T>();

searchResults.AddRange(ProcessQueryResults(hits)); // ProcessQueryResults, your own method to transform Lucene document in your business object

searcher.Close();

return searchResults;
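
The snippet above leaves ProcessQueryResults to the reader. As a rough illustration only, here is a minimal sketch of what such a method might look like, assuming the same Lucene.NET version as the rest of this article (the one that still exposes the Hits class). The SearchResult class, its property names and the SearchResultMapper class are hypothetical and not part of Lucene; the field names are the placeholders used earlier.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Search;

// Hypothetical business object: the property names are assumptions for illustration
public class SearchResult
{
    public string Id { get; set; }
    public string YourField1 { get; set; }
    public string YourField2 { get; set; }
    public float Score { get; set; }
}

public static class SearchResultMapper
{
    // Turns each Lucene hit into a SearchResult by reading the stored fields
    public static List<SearchResult> ProcessQueryResults(Hits hits)
    {
        List<SearchResult> results = new List<SearchResult>();

        for (int i = 0; i < hits.Length(); i++)
        {
            // Doc(i) loads the stored fields of the i-th hit
            Document doc = hits.Doc(i);

            results.Add(new SearchResult
            {
                // Get returns null when the field is missing or was not stored
                Id = doc.Get("id"),
                YourField1 = doc.Get("yourField1"),
                YourField2 = doc.Get("yourField2"),
                // Relevance score that Lucene assigned to this hit
                Score = hits.Score(i)
            });
        }

        return results;
    }
}
```

Since the Hits object reads documents through the searcher, the conversion should run before searcher.Close() is called, as it does in the snippet above.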

With these few lines of code, plus a closer look at the library when you need more complex queries, you can integrate a real search engine into your website.
