Search Smith

ColdFusion, SQL queries, and, of course, searching


Solr: Showing faceted search stems in human-readable terms

Posted by David Faber on March 12, 2012

A fascinating question came up on StackOverflow. Suppose you have a Solr core (a collection, for you ColdFusion peeps) and you want to return the most common terms found in the index. If you facet on a field that has stemming enabled, Solr returns the indexed stems rather than the words that actually appear in the text: associ, studi, signific, increas, and so on. That is generally not the sort of thing you want to show to your end users. However, if you use highlighting along with faceting, Solr also returns fragments (snippets) of the matching fields alongside the search results and the facet results, and you can examine those snippets for the matching terms in a form that humans can read. For example, if you run the following query:

?q=keyword&facet=true&facet.field=description&hl=true&hl.fl=description&hl.fragsize=0&hl.simple.pre=[&hl.simple.post=]

The matching terms will then be returned in the highlighting section of the response, wrapped in square brackets, and you can pull out the readable terms with a regular expression. One caveat: unless your index is very small, you will likely retrieve only a sampling of the words that map to each stem. That is because highlighting covers only the documents returned by the query, so how much you see depends on the number of rows you request.
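As a rough illustration of the regular-expression step, here is a minimal Python sketch. It assumes the response was requested as JSON (wt=json) and that the highlighting section has Solr's usual shape: a map from each document's unique key to a map of field names to lists of snippets. The function name highlighted_terms and the sample response are hypothetical, made up for this example. The same extraction applies to the two-query approach described in the update.

```python
import re

# Matches one highlighted term wrapped in the [ and ] markers
# set by hl.simple.pre and hl.simple.post.
HIGHLIGHT_RE = re.compile(r"\[([^\[\]]+)\]")


def highlighted_terms(response, field):
    """Collect the distinct highlighted words for `field` from a Solr JSON response.

    Assumes the usual highlighting shape: {doc_id: {field_name: [snippet, ...]}}.
    """
    terms = set()
    for fields in response.get("highlighting", {}).values():
        for snippet in fields.get(field, []):
            for term in HIGHLIGHT_RE.findall(snippet):
                terms.add(term.lower())
    return sorted(terms)


# Hand-written sample response, for illustration only.
sample = {
    "highlighting": {
        "doc1": {"description": ["A study of [associated] risk factors"]},
        "doc2": {"description": ["Factors [associating] risk with age; see [association] studies"]},
        "doc3": {},
    }
}

print(highlighted_terms(sample, "description"))
```

If the field text itself can contain square brackets, pick markers that cannot occur in your data and adjust the pattern to match them.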

Update: It appears I was trying to do too much here. This won’t work as written, because the highlighting marks the search keyword rather than the facet terms, so it can’t be done in a single query. Instead, first use faceting to get the top indexed terms:

?q=*:*&facet=true&facet.field=description&facet.limit=20&rows=0

This returns the top 20 indexed terms (the count is set by facet.limit); rows=0 skips returning documents, since only the facet counts are needed here. For each of those stems, you can then query Solr with highlighting to retrieve the words that actually produced it:

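?q=description:stem&hl=true&hl.fl=description&hl.fragsize=0&hl.simple.pre=[&hl.simple.post=]&rows=20

Here stem is a placeholder for one of the facet values returned by the first query, such as associ. Twenty rows should be enough to find a decent sampling of the matching words.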
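Here is a rough Python sketch of the two-step approach. It only builds the query strings and parses the facet response; sending the requests is left to whatever HTTP client you use. It assumes JSON output (wt=json), where facet_fields comes back as a flat list alternating term and count, and the helper names facet_query, facet_pairs, and highlight_query are hypothetical. Prefixing the stem with the field name is a choice made in this sketch to keep the second query on the stemmed field rather than the default search field.

```python
from urllib.parse import urlencode


def facet_query(field="description", limit=20):
    """Step 1: query string for the top `limit` indexed terms (stems) in `field`."""
    params = {
        "q": "*:*",
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "rows": 0,  # only the facet counts are needed, not the documents
        "wt": "json",
    }
    return "?" + urlencode(params)


def facet_pairs(response, field="description"):
    """Turn Solr's flat [term, count, term, count, ...] list into (term, count) pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


def highlight_query(stem, field="description", rows=20):
    """Step 2: highlighted query string for one stem, wrapping matches in [ and ]."""
    params = {
        "q": f"{field}:{stem}",
        "hl": "true",
        "hl.fl": field,
        "hl.fragsize": 0,
        "hl.simple.pre": "[",
        "hl.simple.post": "]",
        "rows": rows,
        "wt": "json",
    }
    return "?" + urlencode(params)


# Hand-written facet response, for illustration only.
facet_response = {
    "facet_counts": {"facet_fields": {"description": ["associ", 42, "studi", 17]}}
}

print(facet_query())
for stem, count in facet_pairs(facet_response):
    print(stem, count, highlight_query(stem))
```

urlencode also takes care of escaping characters such as *, :, [ and ] in the URL, which some servers and proxies reject when sent raw.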

Posted in Solr

Solr: Knowing which fields match

Posted by David Faber on March 9, 2012

How do you know which fields match your query? For example, if we search our articles index for “malaria,” and we want to know whether we matched the term in title, description, and/or journal_name, how do we go about doing that?

The answer is to turn on highlighting for those fields: hl=on&hl.fl=title,description,journal_name. Solr will then return a highlighting section keyed by the unique key of each matching record; under each key it lists the fields that matched, each with a snippet of that field’s contents with the matching text highlighted. N.B.: Fields to be highlighted must be stored, though they do not have to be indexed. How cool is that?
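A minimal Python sketch of reading that structure, assuming a JSON response (wt=json) and the default <em> highlight markers. The function matched_fields and the sample response are hypothetical illustrations, not part of Solr.

```python
def matched_fields(response):
    """Map each document's unique key to the fields Solr returned highlights for."""
    return {
        doc_id: sorted(field for field, snippets in fields.items() if snippets)
        for doc_id, fields in response.get("highlighting", {}).items()
    }


# Hand-written sample response, for illustration only.
sample = {
    "highlighting": {
        "a1": {
            "title": ["<em>Malaria</em> in sub-Saharan Africa"],
            "description": ["Trends in <em>malaria</em> incidence"],
        },
        "a2": {"journal_name": ["<em>Malaria</em> Journal"]},
        "a3": {},
    }
}

for doc_id, fields in matched_fields(sample).items():
    print(doc_id, fields)
```

One caution: with the default hl.requireFieldMatch=false, Solr highlights the query terms in every requested field that contains them, even if the query itself ran against a different field. Setting hl.requireFieldMatch=true limits highlighting to fields the query actually matched.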

Posted in Solr