Search Smith

ColdFusion, SQL queries, and, of course, searching

Posts Tagged ‘Solr’

Solr: Knowing which fields match

Posted by David Faber on March 9, 2012

How do you know which fields match your query? For example, if we search our articles index for “malaria,” and we want to know whether we matched the term in title, description, and/or journal_name, how do we go about doing that?

The answer is to turn on highlighting for those fields: hl=on&hl.fl=title,description,journal_name. Solr will return a highlighting structure keyed by the unique key of each matching record; each entry lists the fields that matched, along with a snippet of each field’s contents with the matching text highlighted. N.B.: Fields to be highlighted must be stored, but they need not actually be indexed. How cool is that?
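As a rough sketch of reading that structure from ColdFusion: the request below assumes a local Solr instance on the default port, a core named articles, the default /select handler, and JSON output. The core name and variable names are illustrative, and q=malaria relies on whatever default search field your schema defines, so adjust as needed.

```cfml
<!--- Assumed URL: local Solr, a core named "articles", default /select handler --->
<cfhttp url="http://localhost:8983/solr/articles/select?q=malaria&wt=json&hl=on&hl.fl=title,description,journal_name" result="hl_result" />

<cfif isJSON(hl_result.fileContent)>
    <cfset hl_json = deserializeJSON(hl_result.fileContent) />
    <!--- highlighting is keyed by unique key; each entry maps a field name to an array of snippets --->
    <cfif structKeyExists(hl_json, "highlighting")>
        <cfloop collection="#hl_json.highlighting#" item="doc_id">
            <!--- The keys of each entry are the fields that matched for this record --->
            <cfset matched_fields = structKeyList(hl_json.highlighting[doc_id]) />
            <cfoutput>#htmlEditFormat(doc_id)#: #htmlEditFormat(matched_fields)#<br /></cfoutput>
        </cfloop>
    </cfif>
</cfif>
```

A record whose entry is an empty struct matched on a field outside hl.fl, so matched_fields is simply an empty string for it.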

Posted in Solr

Some statistics

Posted by David Faber on March 8, 2012

At some point yesterday this blog passed 200 unique visitors and 400 page views. And today we had our 300th visit.

I guess that’s what one would call starting small. 🙂

On a side note, I’ve recently posted on:

  1. How to handle Solr query results in ColdFusion;
  2. Returning the number of milliseconds elapsed since midnight in Oracle (useful for seeding a random number generator);
  3. Paginating query results in Oracle 10g; and
  4. Changing the port on which Jetty listens if you want to use a port other than the default (8983).

Oh, and aside from the United States, this blog is most popular in:

  1. India,
  2. Germany,
  3. Canada,
  4. The United Kingdom, and
  5. Italy.

Of course, special mention must also be made of Norway! Most unexpected was the single visit from Lagos, Nigeria. Within the U.S., the greatest number of visits come from:

  1. Philadelphia, PA;
  2. Bozeman, MT;
  3. Washington, DC;
  4. Harrisburg, PA; and
  5. Chicago, IL.

Those five actually combine for over 1/3 of my visits from the U.S. and nearly 1/4 of my total visits.

Posted in ColdFusion, Miscellany, Off-Topic, Oracle, Solr

ColdFusion and Solr: Dealing with the results

Posted by David Faber on March 8, 2012

The Solr web service allows you to return results in either XML format or JSON (the format is XML by default; to return results in JSON format, add the parameter “wt=json” to your Solr query string). Suppose we run a query on the Solr collection myindex. There are three ways we can search this collection: (1) we could use the <cfsearch> tag; (2) we could use the Solr web service and return XML; or (3) we could use the Solr web service and return JSON. We’ve talked about <cfsearch> and its limitations before so I’ll limit this discussion to the second and third options.

XML:

<cfhttp url="http://localhost:8983/solr/myindex/select?q=*:*&fl=*,score" result="myresult" />

JSON:

<cfhttp url="http://localhost:8983/solr/myindex/select?q=*:*&fl=*,score&wt=json" result="myresult" />

In each case (XML or JSON), ColdFusion has powerful functions that help you manage the data without needing to parse it manually. Typically I build a friendly query object that the front-end developer can use in place of a SQL query or <cfsearch> result.

<!--- Column names match the Solr field names used when populating the query below;
      the types are the documented queryNew() type names --->
<cfset search_results = queryNew("id,title,description,pub_date,journal_name,author_name,read_cnt,score"
                               , "Integer,VarChar,VarChar,Date,VarChar,VarChar,Integer,Double")
/>
<!--- We're using our articles index mentioned in a previous post --->

XML:

<cfif isXML(myresult.fileContent)>
    <cfset xml_result = XMLParse(myresult.fileContent) />
    <!--- continue processing --->
</cfif>

JSON:

<cfif isJSON(myresult.fileContent)>
    <cfset json_result = deserializeJSON(myresult.fileContent) />
    <!--- continue processing --->
</cfif>

In either case we will iterate over our results structure and put the results in our friendly query.

JSON:
The JSON structure is particularly friendly for our purposes. First, let’s get the number of results and the maximum score:

<cfset result_cnt = json_result.response.numFound />
<cfset max_score = json_result.response.maxScore />

These fields aren’t absolutely necessary, of course. In particular, maxScore may seem superfluous, but it is useful if you’re converting from Verity to Solr and need scores in percentage format. The Verity search engine scores results between 0 and 1, which is easily converted to a percentage, whereas Solr scores have no fixed maximum. So to get a Verity-style score, simply divide each result’s score by maxScore.

Now we simply loop over the docs array of the response:

<cfloop array="#json_result.response.docs#" index="doc">
    <!--- Both id and title are absolutely required --->
    <cfif structKeyExists(doc, "id") AND structKeyExists(doc, "title")>
        <cfset queryAddRow(search_results) />
        <!--- Struct keys must be quoted strings; an unquoted doc[id] would be read as a variable named id --->
        <cfset querySetCell(search_results, "id", doc["id"]) />
        <cfset querySetCell(search_results, "title", doc["title"]) />
        <!--- Normalize the score to the 0-1 range (Verity style) --->
        <cfset querySetCell(search_results, "score", doc["score"] / max_score) />
        <!--- The remaining fields are optional, so check for each before copying it --->
        <cfif structKeyExists(doc, "description")>
            <cfset querySetCell(search_results, "description", doc["description"]) />
        </cfif>
        <cfif structKeyExists(doc, "pub_date")>
            <cfset querySetCell(search_results, "pub_date", doc["pub_date"]) />
        </cfif>
        <cfif structKeyExists(doc, "journal_name")>
            <cfset querySetCell(search_results, "journal_name", doc["journal_name"]) />
        </cfif>
        <cfif structKeyExists(doc, "author_name")>
            <cfset querySetCell(search_results, "author_name", doc["author_name"]) />
        </cfif>
        <cfif structKeyExists(doc, "read_cnt")>
            <cfset querySetCell(search_results, "read_cnt", doc["read_cnt"]) />
        </cfif>
    </cfif>
</cfloop>
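Once the query is populated, the front end can treat it like any other query object. A minimal sketch, assuming search_results was filled by the loop above and that score holds the normalized 0-1 value; the markup is purely illustrative.

```cfml
<!--- Illustrative output only: one paragraph per result, with the score shown as a percentage --->
<cfoutput query="search_results">
    <!--- val() turns an empty score cell into 0 so the arithmetic cannot fail --->
    <p>#htmlEditFormat(title)# (#round(val(score) * 100)#%)</p>
</cfoutput>
```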

Dealing with XML results is a bit more complicated so I will discuss that in a subsequent post or in an update to this one. Suffice it to say that the approach will be very similar – we will iterate over the results and store them in a friendly query.
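In the meantime, here is a rough sketch of how that iteration might look. It is not a full treatment: it assumes xml_result and search_results from earlier in this post, Solr’s standard XML response layout (a result element named response containing doc elements whose children carry a name attribute), and single-valued fields only. Multi-valued fields arrive as nested arr elements and would need extra handling.

```cfml
<!--- A sketch only: assumes xml_result and search_results exist, and single-valued fields --->
<cfset result_node = xmlSearch(xml_result, "/response/result[@name='response']") />
<cfif arrayLen(result_node)>
    <cfset result_cnt = result_node[1].xmlAttributes.numFound />
    <cfloop array="#result_node[1].xmlChildren#" index="doc_node">
        <!--- Collect this doc's fields into a struct keyed by each child's name attribute --->
        <cfset doc = structNew() />
        <cfloop array="#doc_node.xmlChildren#" index="field_node">
            <cfif structKeyExists(field_node.xmlAttributes, "name")>
                <cfset doc[field_node.xmlAttributes.name] = field_node.xmlText />
            </cfif>
        </cfloop>
        <!--- From here on the struct can be handled like a doc from the JSON response --->
        <cfif structKeyExists(doc, "id") AND structKeyExists(doc, "title")>
            <cfset queryAddRow(search_results) />
            <cfset querySetCell(search_results, "id", doc["id"]) />
            <cfset querySetCell(search_results, "title", doc["title"]) />
        </cfif>
    </cfloop>
</cfif>
```

Flattening each doc element into a struct first means the remaining optional fields can be copied with the same structKeyExists checks used for the JSON response.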

Posted in ColdFusion, Solr