Search Smith

ColdFusion, SQL queries, and, of course, searching

ColdFusion Solr Tutorial

Search Smith’s most popular page – 2,000 page views and counting as of 03/15/2013

Integrating ColdFusion with Apache Solr – A Tutorial

I thought it would be helpful to create a page from which I could link all of my posts related to ColdFusion/Solr integration in chronological order (or, at least, in order of their usefulness) along with what I hope is some helpful commentary.

Let’s start by assuming that we’re going to be creating a Solr collection on the same server on which ColdFusion is installed. Creating a collection is easy enough: Log in to ColdFusion Administrator and choose “ColdFusion Collections” under the Data & Services menu on the left side of the page. You will see a screen that looks something like this:

[Screenshot: the ColdFusion Collections page in ColdFusion Administrator]

(You can’t see it in this image, but ColdFusion also creates a Solr collection named core0 in addition to the default Verity collection bookclub. As an aside, a sure sign that Solr is not configured properly will be that no Solr collections are listed on this page despite the fact that the service is running.)

Upgrading from Verity to Solr

Note that in ColdFusion 9, not only is Verity still available, but it is chosen by default. To create a Solr collection, simply type the name of the collection in the appropriate box and choose the radio button next to “Solr collection,” then click “Create Collection.” This will create the file structure for the collection along with the schema.xml file. (Assuming you create the collection in the default directory shown in the image above, the schema.xml file will be found in C:\ColdFusion9\collections\<collection_name>\conf\schema.xml.) Note that a Solr collection supports categories out of the box. This is simply because the category field is defined as multiValued in schema.xml, but it’s nice to have regardless.

Now that our collection has been created, we can edit schema.xml as mentioned in “ColdFusion 9: Upgrading from Verity to Solr.” Note that a lot of the fields in the schema.xml file that ColdFusion creates by default may not be needed for a particular collection. For example, if we’re indexing database content, then we probably don’t have much need for the mime and size fields. On the other hand, there are probably a number of fields that we do need that aren’t found here (yet).

Customizing schema.xml

One change that I think we should make right away is to change the title field from the text_ws type to the text type. The text_ws type is case-sensitive while the text type is not. (Remember to restart the Solr service after editing schema.xml and before indexing new data.) Another good change is to remove the comments around the dynamic field entries. This way, if you forget a field definition, you can simply add the value under a field name that matches a dynamic field pattern and re-index, rather than editing schema.xml and restarting the Solr service.
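As a rough illustration of those two edits, here is what the relevant part of schema.xml might look like afterwards. The attribute values and the *_s pattern are assumptions based on the stock Solr example schema; check them against the file that ColdFusion actually generated for your collection.

```xml
<!-- Before (assumed default): whitespace-tokenized, case-sensitive matching -->
<!-- <field name="title" type="text_ws" indexed="true" stored="true"/> -->

<!-- After: analyzed with the "text" type, so matching is case-insensitive -->
<field name="title" type="text" indexed="true" stored="true"/>

<!-- A dynamic field rule with its surrounding comment markers removed.
     Any field whose name ends in _s (e.g. a hypothetical author_s)
     is then accepted without a dedicated <field> entry. -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```

With a rule like this in place, a call such as temp.addField("author_s", author_name) would be accepted at index time (author_s and author_name are made-up names for the example).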

Once schema.xml is ready and the Solr service restarted, we can start indexing our data. One thing not mentioned in that blog post is that if a field is defined as multiValued in schema.xml, we can make multiple calls to the addField method for that field if we want to store more than one value. Suppose the articles mentioned in the post can each belong to one or more categories. We can then do the following (this is just an example; I don’t ordinarily recommend running queries inside of query loops if it can be helped):

<cfquery name="get_article_categories" datasource="#the_datasource#">
    SELECT category_id
      FROM article_x_category
     WHERE article_id = <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#id#" />
</cfquery>

<cfloop query="get_article_categories">
    <cfset temp.addField("category", category_id) />
</cfloop>

Querying Solr with CFHTTP: Escaping the confines of CFSEARCH

Now that our data are indexed, we need to be able to search them. As we saw in a recent post, the <cfsearch> tag doesn’t simply pass our queries on to the Solr engine, and in any case it won’t return all of our custom fields. Instead, we will query the Solr web service directly:

http://<server>:8983/solr/<collection_name>/select?q=<criteria>&fl=*,score&wt=json

In the above query, <server> is the server on which the Solr collection is located (if we’re running CF and Solr on the same box then we can just use localhost). The Solr web service runs on port 8983 by default. How to change the port depends on the servlet container in which Solr is running: in the default ColdFusion setup, Solr runs on Jetty, and the port can be changed by editing the jetty.xml file found in <CF_HOME>/solr/etc/ (see this recent post for more information on how to do this).

All of the criteria that we want to use for scoring the results go in the “q” parameter. If we want to filter the results without affecting their score, we use the “fq” (filter query) parameter. The “fl” parameter tells Solr which fields to return (in this case, we want them all plus the score pseudofield), while the “wt” parameter determines the format in which the data are returned (if it is omitted, the data will be returned as XML; Solr also ships with response writers for other formats, such as python and ruby).
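To make the difference between “q” and “fq” concrete, here is a minimal sketch that scores on the title but restricts the results to a single category. The collection name, the criteria, and the category value are made up for the example; only the parameter names come from Solr.

```cfml
<!--- Hypothetical values, for illustration only. --->
<cfset the_collection = "articles" />
<cfset the_criteria = "title:solr" />   <!--- contributes to the score --->
<cfset the_filter = "category:12" />    <!--- narrows the results, no effect on score --->

<!--- Build the query string piece by piece, URL-encoding each value. --->
<cfset solr_url = "http://localhost:8983/solr/#the_collection#/select"
    & "?q=" & urlEncodedFormat(the_criteria)
    & "&fq=" & urlEncodedFormat(the_filter)
    & "&fl=" & urlEncodedFormat("*,score")
    & "&wt=json" />

<cfhttp url="#solr_url#" method="get" />
```

Putting the category restriction in “fq” rather than “q” means that articles are ranked purely on how well their titles match, which is usually what we want for a filter.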

A detailed, if obscure, discussion of Solr’s query syntax can be found here (the Dismax parser is beyond the scope of this tutorial). The same syntax is used for both the “q” parameter and the “fq” parameter. Please note that certain operators and search strings need to be URL-encoded. For example, the “+” sign must be URL-encoded (the relevant code is %2B); otherwise it will be converted into a space. The ColdFusion function UrlEncodedFormat can be used to encode query parameters.
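As a small sketch of why the encoding matters, suppose we want to require both a title match and a category match by using the “+” operator. The field values here are hypothetical.

```cfml
<!--- Hypothetical criteria using the "+" (required term) operator. --->
<cfset raw_criteria = "+title:solr +category:12" />

<!--- Encode the whole value before it goes into the URL.
      Each "+" becomes %2B, so Solr still sees it as an operator
      instead of receiving a space in its place. --->
<cfset encoded_criteria = urlEncodedFormat(raw_criteria) />

<cfoutput>q=#encoded_criteria#</cfoutput>
```

Encoding the value as a whole, rather than hand-editing individual characters, also takes care of the colons and spaces in the criteria.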

Other helpful parameters that we can use for our Solr query include the start, rows, and sort parameters. Together, these allow us to sort and paginate the results, which the <cfsearch> tag does not allow us to do under ColdFusion 9 (ColdFusion 10 is another matter*).
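Here is a minimal sketch of pagination with these parameters. The page number, page size, and collection name are assumptions for the example. It passes each parameter with <cfhttpparam type="url">, which, as I understand the ColdFusion documentation, URL-encodes the value for us.

```cfml
<!--- Hypothetical paging values, for illustration only. --->
<cfset the_collection = "articles" />
<cfset the_criteria = "title:solr" />
<cfset page_number = 3 />
<cfset page_size = 10 />

<!--- Solr's start parameter is a zero-based offset, so page 3 starts at row 20. --->
<cfset start_row = (page_number - 1) * page_size />

<cfhttp url="http://localhost:8983/solr/#the_collection#/select" method="get">
    <cfhttpparam type="url" name="q" value="#the_criteria#" />
    <cfhttpparam type="url" name="start" value="#start_row#" />
    <cfhttpparam type="url" name="rows" value="#page_size#" />
    <cfhttpparam type="url" name="sort" value="score desc" />
    <cfhttpparam type="url" name="fl" value="*,score" />
    <cfhttpparam type="url" name="wt" value="json" />
</cfhttp>
```

Because Solr applies the sort before it applies start and rows, each page comes back already in the right order and we never have to pull the full result set into ColdFusion.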

In order to query the Solr service from ColdFusion, we make a CFHTTP call:

<cfhttp url="http://localhost:8983/solr/#the_collection#/select?q=#urlEncodedFormat(the_criteria)#&fl=*,score&wt=json" />

Things are pretty easy once that’s accomplished:

<cfset the_results = {} />
<cfif cfhttp.statusCode EQ "200 OK">
    <!--- We would use ColdFusion's XML functions below (e.g., isXML(), XmlParse()) if we're returning XML data rather than JSON --->
    <cfif isJSON(cfhttp.fileContent)>
        <cfset the_results = deserializeJSON(cfhttp.fileContent) />
    </cfif>
</cfif>

<!--- Now we can turn our results structure into a query if we want, or just use it as-is. --->
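One way to turn the deserialized structure into a query is sketched below. It assumes the JSON response has the usual Solr shape (a response key containing numFound and a docs array) and that we only care about the key, title, and score fields; substitute the fields from your own schema.xml.

```cfml
<!--- Column names are assumptions; use the fields defined in your own schema.xml. --->
<cfset the_columns = "key,title,score" />
<cfset the_query = queryNew(the_columns) />
<cfset total_found = 0 />

<cfif structKeyExists(the_results, "response")>
    <!--- numFound is the total number of matches, not just the rows on this page. --->
    <cfset total_found = the_results.response.numFound />

    <cfloop array="#the_results.response.docs#" index="doc">
        <cfset queryAddRow(the_query) />
        <cfloop list="#the_columns#" index="col">
            <!--- Solr omits fields that have no value, so check before reading. --->
            <cfif structKeyExists(doc, col)>
                <cfif isArray(doc[col])>
                    <!--- Multivalued fields come back as arrays; flatten them to a list. --->
                    <cfset querySetCell(the_query, col, arrayToList(doc[col])) />
                <cfelse>
                    <cfset querySetCell(the_query, col, doc[col]) />
                </cfif>
            </cfif>
        </cfloop>
    </cfloop>
</cfif>
```

The total_found value is handy for building pagination links, since it tells us how many pages there are in all.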

See here for more information on how to handle the results of Solr queries in ColdFusion.

*An orderby attribute was added to <cfsearch> to allow sorting of results. Without this attribute we would not be able to both sort and paginate unless we returned all of the results and applied the sort.
