Search Smith

ColdFusion, SQL queries, and, of course, searching

ColdFusion 9: Upgrading from Verity to Solr

Posted by David Faber on December 22, 2011

I started working with ColdFusion around 1998-9. Since that time my primary interests have been (1) database queries and how they can be optimized, and (2) keyword searches. I’ll consider the latter in this post.

When CF9 was first released I was skeptical of making the upgrade (and it is an upgrade, as I can see in hindsight) from Verity to Solr. While Adobe had implied, at the time, that support for Verity might be dropped in the future, it was not entirely clear that anything would be gained from the change. The tags used (<CFINDEX> and <CFSEARCH>) were the same, while the syntax was slightly different (to search the title field, one used “title:” instead of “<CF_TITLE>” in the search criteria). However, there were supposed to be significant speed gains, so I pushed for my company to make the switch.

Two problems presented themselves immediately. One was that the Solr service ran out of memory very quickly. This was resolved thanks to a very helpful post on ColdFusion Muse. The other problem was that searches on the title field were now case-sensitive where they had not been before (as an aside, Verity searches — at least those using <CFSEARCH> — are case-insensitive unless the search criteria are mixed-case. For example, searching for “cancer” or “CANCER” will return the same results, but searching for “cANcEr” will not). Since I was under a deadline, I did not take the time to look into the underlying issue — instead, I used the CUSTOM1 field to store the title, while putting everything I had been jamming into the CUSTOM1 field into the TITLE field! These were fields that I wanted returned in a search, so as to avoid querying the database again, but which I did not expect to use to filter the search. So whether or not a search of that field was case-insensitive was irrelevant for my purpose — I was using it to store data, not to search. My search criteria went from this: “logic AND <CF_TITLE>fuzzy” to this: “logic +title:fuzzy”.

That solution was clunky, but it worked fine. The search index was not flexible, but it worked, until I needed the flexibility.

The <CFINDEX> tag is very limited. There aren’t many attributes available for custom fields (either for searching or merely for storing) — just CUSTOM1, CUSTOM2, CUSTOM3, and CUSTOM4. (Of course this is a big upgrade over versions of CF in which only two custom fields were available.) I wanted to return more than just four custom fields, so I jammed them into the CUSTOMx fields as delimited lists. This of course makes it impossible (or, at least, difficult) to filter on those custom fields; they store data and nothing more. I wish I could remember where I read this (I certainly didn’t discover it on my own — credit where credit is due), but the key to unlocking some of the power of Solr is to edit the collection’s schema, then break out of the <CFINDEX>/<CFSEARCH> paradigm and use Solr as a web service.
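To make the delimited-list workaround concrete, here is a minimal sketch in Python (not ColdFusion) of packing several values into one CUSTOMx string and unpacking them again. The delimiter and the field names (author, year, doc_type) are hypothetical, picked only for illustration; they are not taken from my actual collection.

```python
# Hypothetical delimiter; it must never occur inside a stored value.
DELIMITER = "|"


def pack_fields(values):
    """Join several values into one string for a single CUSTOMx field."""
    for value in values:
        if DELIMITER in value:
            raise ValueError(f"value contains the delimiter: {value!r}")
    return DELIMITER.join(values)


def unpack_fields(packed, names):
    """Split a packed CUSTOMx string back into a dict keyed by field name."""
    parts = packed.split(DELIMITER)
    if len(parts) != len(names):
        raise ValueError("unexpected number of packed values")
    return dict(zip(names, parts))


# Hypothetical example: three values squeezed into what would be CUSTOM2.
custom2 = pack_fields(["Smith, J.", "2011", "journal"])
record = unpack_fields(custom2, ["author", "year", "doc_type"])
print(record["year"])
```

The search engine only ever sees the packed string as one opaque value, which is why a filter such as "year is 2011" cannot be expressed cleanly against it.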

To configure the collection’s schema, go to the collection’s home directory (on Windows, this will likely be something like C:\ColdFusion9\collections\<collection-name>), then to the conf directory, and open schema.xml. There is a bunch of stuff at the top that is beyond the scope of this blog post. However, if you scroll down to line 440 or thereabouts, you will see the fields defined for this search collection (these are the default fields that ColdFusion creates when a collection is created).

On lines 444 and 445 we see some fields that are probably familiar, then again on lines 478-481 and 483. The first thing that sticks out (to me, at least) is that the “title” field does not have type “text” but something else: “text_ws”. This is a special type, defined earlier in the schema file, that tokenizes the field only on whitespace and leaves case untouched, so searches on it are case-sensitive. For our purposes here we can just replace this with “text”, that is, change the type attribute of the “title” field from “text_ws” to “text”.
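Why the field type matters can be shown with a toy model in Python. This is only a sketch of the idea, not Solr's actual analyzers: the two functions below are hypothetical stand-ins, and the real “text” type typically does more than lowercasing.

```python
def tokens_whitespace_only(text):
    """Toy stand-in for a whitespace-only analyzer: split, keep case as-is."""
    return text.split()


def tokens_lowercased(text):
    """Toy stand-in for an analyzer that also lowercases each token."""
    return [token.lower() for token in text.split()]


def matches(analyzer, stored_title, search_term):
    """True if every analyzed search token appears among the title tokens."""
    title_tokens = set(analyzer(stored_title))
    return all(token in title_tokens for token in analyzer(search_term))


title = "Fuzzy Logic in Practice"  # hypothetical document title
print(matches(tokens_whitespace_only, title, "fuzzy"))  # False
print(matches(tokens_lowercased, title, "fuzzy"))       # True
```

With whitespace-only tokenization the indexed token is “Fuzzy”, so a search for “fuzzy” misses; once both sides are lowercased, the same search matches.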

N.B.: When you make any changes to the schema.xml file, you need to restart the Solr service before you can re-index the collection. I think, but I have not tested it yet, that you can make this change and still use <CFINDEX> and <CFSEARCH> as you did before, only now with the ability to make case-insensitive searches on the title. (If you’re wondering why not simply store the title in all-lowercase or all-uppercase, it’s because I want the title stored exactly, and I do not want to “waste” one of my custom fields on it.)

A second question comes to mind. Can you simply add new custom fields here, in addition to custom1 .. custom4? The answer is yes. The problem then becomes one of indexing and searching: how do you index the new custom fields, and how do you filter searches on them?

Searching on them is the easier part. You can call a Solr collection as a web service and it will return data in XML or JSON format:

Here “core0” is the name of the Solr collection (this is the default collection that ships with CF 9), the “q” parameter is where we submit our criteria, the “fl” parameter lists which fields we want returned (in this case, we want them all as well as the pseudo-field “score”), and the “wt” parameter tells the web server the format in which the data ought to be returned. A value of “json” will return JSON, while omitting the parameter will return XML, the default. (There are other formats available, but, again, this is beyond the scope of this post.)
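The request described above can be sketched in Python as follows. The host, port (8983), and “select” path are assumptions about a default local install, so adjust them to match your own server; the function names are hypothetical, and the search criteria are just the example from earlier in this post.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr is reachable locally on port 8983 under /solr.
SOLR_BASE = "http://localhost:8983/solr"


def build_search_url(collection, criteria, fields="*,score", fmt="json"):
    """Build a search URL from the q, fl, and wt parameters."""
    params = urlencode({"q": criteria, "fl": fields, "wt": fmt})
    return f"{SOLR_BASE}/{collection}/select?{params}"


def fetch_results(url):
    """Send the request and decode a JSON response (needs a running server)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) a request against the "core0" collection.
url = build_search_url("core0", "logic +title:fuzzy")
print(url)
```

Letting urlencode build the query string matters because characters such as “+” and “:” in the criteria have to be percent-encoded before they go into a URL.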

Now comes the hard part — actually indexing your data. There are a couple of tools available on the internet for this, but personally I found them unacceptable for one reason or another. But I also think that it’s important to know what is going on under the hood, so to speak, in the event you need something for your search that isn’t in one of those tools. I’ll go over that in my next post.

