Search Smith

ColdFusion, SQL queries, and, of course, searching

Moving from Oracle Text to Solr

Posted by David Faber on January 11, 2012

I’ve not had the opportunity to use Oracle’s full-text search, but I thought I would offer this link for consideration:

Moving from Oracle Text to Solr/Lucene

I did have the opportunity some years ago to utilize SQL Server’s full-text searching ability (this was first available in SQL Server 7 if memory serves). At the time we were using ColdFusion with Verity as our search engine. I did not see a good way to get around the custom field limitations in Verity, so I searched for my criteria using <CFSEARCH>, then used those results to query a SQL Server database (hopefully using the IN clause instead of querying row-by-row, but it was several years ago and I could not say for certain). Things got much better when I switched to SQL Server’s full-text search as I was then able to do the keyword search and retrieve related data all in one query.

Update: The following article might also be interesting:

Text Search, your Database or Solr

One possible drawback of Oracle’s full-text search is that it has no provision for faceted search. Of course, that might not matter if one can combine regular queries with full-text queries. This was certainly possible with SQL Server, but I don’t know whether Oracle offers this capability.

Posted in Oracle, Solr, SQL

ColdFusion 9: Searching more than one Solr collection

Posted by David Faber on January 3, 2012

The <CFSEARCH> tag allows one to search multiple collections very easily, whether one is searching Verity collections or Solr:

<cfsearch name="the_search" collection="collection1,collection2,collection3,..." criteria="#the_criteria#" />

However, how does one do this when using Solr web services? We saw in a previous post that the HTTP call to the Solr web service looks like this:

http://localhost:8983/solr/core0/select?q=cancer&fl=*,score&wt=json

We call this our “plain-vanilla” search. What if we want to search more than one “core” (Solr collection)? It doesn’t appear that we can put more than one core name into the path of that HTTP call, and the last thing we want is to make multiple calls to the Solr web service. I don’t think I’ve written on the topic of paginating results with Solr, but briefly, Solr lets you page through results by passing two parameters to the web service: “start” (the zero-based offset of the first row to return) and “rows” (the number of rows to return). If multiple calls are required and their results have to be interleaved by the sort order, that feature goes out the window: each call would have to return every row up to the end of the requested page, and the merging and slicing would have to be done on our side. (For example, if you have multiple collections partitioned by date, then sorting by score means that results from different collections will be mixed in together.)
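Here is a minimal Python sketch, purely for illustration, of the “start” and “rows” parameters, and of the client-side work that separate per-core calls would force on us. The helper names page_url and merge_page are hypothetical, and the base URL simply reuses the core0 example above. It also assumes each hit has been parsed from the JSON response into a dict with a “score” key.

```python
import heapq
from itertools import islice
from urllib.parse import urlencode

# Assumed base URL, matching the "plain-vanilla" example above.
SOLR_SELECT = "http://localhost:8983/solr/core0/select"


def page_url(query: str, page: int, page_size: int = 10) -> str:
    """Build a select URL for one page of results (pages are numbered from 1)."""
    params = {
        "q": query,
        "fl": "*,score",
        "wt": "json",
        "start": (page - 1) * page_size,  # zero-based offset of the first row
        "rows": page_size,                # how many rows to return
    }
    # Keep "*" and "," readable in the field list; everything else is escaped.
    return SOLR_SELECT + "?" + urlencode(params, safe="*,")


def merge_page(per_core_hits: list[list[dict]], page: int, page_size: int = 10) -> list[dict]:
    """Page through hits that were fetched by separate per-core calls.

    Each list must already be sorted by descending score and hold at least
    page * page_size hits, because any one core could supply every row on the page.
    """
    merged = heapq.merge(*per_core_hits, key=lambda hit: hit["score"], reverse=True)
    offset = (page - 1) * page_size
    return list(islice(merged, offset, offset + page_size))


print(page_url("cancer", page=3))
# http://localhost:8983/solr/core0/select?q=cancer&fl=*,score&wt=json&start=20&rows=10
```

The merge_page function is the part a single call avoids: to show page 3 from two cores, each core would have to return 30 rows rather than 10, and the results would still need to be merged by score.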

As it turns out, we can search multiple cores by making a single call to the Solr web service. We use the shards parameter. (In fact, we can use shards to search over multiple servers!)

http://localhost:8983/solr/collection1/select?shards=localhost:8983/solr/collection1,localhost:8983/solr/collection2&q=cancer&wt=json

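I recommend sending the request to one of the cores listed in the shards parameter, as in the example above, where collection1 appears both in the path and in the shards list. Truth be told, I am not certain which Solr cores can be used here!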
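A small Python sketch of building that single sharded request follows. The sharded_url helper is hypothetical. Each shard is written the way the example URL writes it (host:port/path, without “http://”), so shards on different servers can be mixed in the same list. The request goes to the first core in the list, following the recommendation above.

```python
from urllib.parse import urlencode


def sharded_url(shards: list[str], query: str) -> str:
    """Build one select URL that searches every core listed in `shards`.

    Each shard is host:port/path with no "http://" prefix, as in the post's URL.
    """
    params = {"shards": ",".join(shards), "q": query, "wt": "json"}
    # Send the request itself to the first core named in the shards list.
    return "http://" + shards[0] + "/select?" + urlencode(params, safe=":/,")


cores = ["localhost:8983/solr/collection1", "localhost:8983/solr/collection2"]
print(sharded_url(cores, "cancer"))
# http://localhost:8983/solr/collection1/select?shards=localhost:8983/solr/collection1,localhost:8983/solr/collection2&q=cancer&wt=json
```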

Posted in ColdFusion, Solr

Verity to Solr Addendum

Posted by David Faber on December 22, 2011

One day into this blog and I’m already lying about upcoming posts! Seriously, I noticed, on reading over my first post, that I had unintentionally left out a couple of things.

The first is a postscript to the ColdFusion Muse blog post I cited. That post dealt with some issues caused by running Solr with the default JVM options (as installed under ColdFusion). One of these options is

-XX:+AggressiveOpts

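The Apache Lucene/Solr project had the following to say on this subject after Oracle, the maintainer of Java since its takeover of Sun, released Java updates fixing JVM bugs that affected Lucene and Solr (kudos to a colleague for turning this up):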

On the same day, Oracle released Java 6u29 fixing the same problems occurring with Java 6, if the JVM switches -XX:+AggressiveOpts or -XX:+OptimizeStringConcat were used. Of course, you should not use experimental JVM options like -XX:+AggressiveOpts in production environments! We recommend everybody to upgrade to this latest version 6u29.

Now I can’t speak for anyone else, but I do find it strange that Solr would be configured by default (at least in its ColdFusion edition) to use an experimental JVM option. It’s not as if this is a new option, or that it was originally used for some other purpose. Sun mentioned that it was an experimental option in a 2005 white paper on Java Tuning:

-XX:+AggressiveOpts
Turns on point performance optimizations that are expected to be on by default in upcoming releases. The changes grouped by this flag are minor changes to JVM runtime compiled code and not distinct performance features (such as BiasedLocking and ParallelOldGC). This is a good flag to try the JVM engineering team’s latest performance tweaks for upcoming releases. Note: this option is experimental! The specific optimizations enabled by this option can change from release to release and even build to build. You should reevaluate the effects of this option with prior to deploying a new release of Java.

The other thing that I omitted was a short discussion on how to search a Solr collection on a custom field, or how to filter a search. Let’s consider the Solr query from the previous post:

http://localhost:8983/solr/core0/select?q=cancer&fl=*,score&wt=json

This is just a plain-vanilla keyword search — there are no custom fields involved at all (well, there could be — but more on that later unless I am lying about future posts again). If we want to search on a custom field, we can do the following:

http://localhost:8983/solr/core0/select?q=custom1:cancer&fl=*,score&wt=json
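As a sketch in Python, the field-qualified query can be built like this. The field_search_url helper is hypothetical, and it handles only a single-word term, since a multi-word term would need quoting in the Solr query itself.

```python
from urllib.parse import urlencode

# Assumed base URL, matching the examples above.
SOLR_SELECT = "http://localhost:8983/solr/core0/select"


def field_search_url(field: str, term: str) -> str:
    """Search for a single term in one named field, e.g. custom1:cancer."""
    # "field:term" limits the match to that field instead of the default search field.
    params = {"q": f"{field}:{term}", "fl": "*,score", "wt": "json"}
    return SOLR_SELECT + "?" + urlencode(params, safe=":*,")


print(field_search_url("custom1", "cancer"))
# http://localhost:8983/solr/core0/select?q=custom1:cancer&fl=*,score&wt=json
```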

If we want to filter the search on a custom field (a filter restricts which documents are returned, but it doesn’t affect the results’ scores), we can do the following:

http://localhost:8983/solr/core0/select?q=cancer&fl=*,score&wt=json&fq=custom1:cancer

The “fq” parameter is used for filter queries, and it accepts the same query syntax as the “q” parameter.
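Here is the same request built in Python, as a sketch using a hypothetical filtered_search_url helper. It emits one “fq” parameter per (field, value) pair. Passing a separate “fq” for each field is another way to require several conditions, because Solr intersects the results of multiple filter queries.

```python
from urllib.parse import urlencode

# Assumed base URL, matching the examples above.
SOLR_SELECT = "http://localhost:8983/solr/core0/select"


def filtered_search_url(query: str, filters: list[tuple[str, str]]) -> str:
    """Keyword search on `query`, restricted by one fq parameter per (field, value)."""
    params = [("q", query), ("fl", "*,score"), ("wt", "json")]
    # Each fq narrows the result set but does not change the scores of matching documents.
    params += [("fq", f"{field}:{value}") for field, value in filters]
    return SOLR_SELECT + "?" + urlencode(params, safe=":*,")


print(filtered_search_url("cancer", [("custom1", "cancer")]))
# http://localhost:8983/solr/core0/select?q=cancer&fl=*,score&wt=json&fq=custom1:cancer
```

With two filters, for example [("custom1", "cancer"), ("custom2", "mesothelioma")], the URL ends in &fq=custom1:cancer&fq=custom2:mesothelioma.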

What if you want to search or filter on TWO (or more) custom fields? That is easy enough.

http://localhost:8983/solr/core0/select?q=cancer&fl=*,score&wt=json&fq=%2Bcustom1:cancer%20%2Bcustom2:mesothelioma

The “%2B” codes are URL-encoded plus (+) signs. They are needed because a bare “+” in a URL is itself read as an encoded space, while “%20” is the other way to encode a space. I suppose one might use a + sign for the space instead of %20, but that might get confusing. (As an aside, I am old enough that some of the oldest code I used was written out on a blackboard for me, and we used an uppercase delta character Δ to denote a space, as spaces are not always obvious when written.) The + signs themselves mark each clause as required. Without them, and assuming the default OR operator has not been changed, a document matching either clause would pass the filter, so the + sign is needed in front of both terms if both conditions must hold.
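The two encodings discussed above can be compared with Python’s standard urllib.parse functions. This sketch encodes the filter query from the URL above both ways and confirms that each decodes back to the original text.

```python
from urllib.parse import quote, quote_plus, unquote_plus

# The filter query from the URL above, before encoding.
fq = "+custom1:cancer +custom2:mesothelioma"

# quote() turns "+" into %2B and the space into %20, as in the post's URL.
with_percent_20 = quote(fq, safe=":")
# quote_plus() also turns "+" into %2B, but encodes the space as a bare "+".
with_plus = quote_plus(fq, safe=":")

print(with_percent_20)  # %2Bcustom1:cancer%20%2Bcustom2:mesothelioma
print(with_plus)        # %2Bcustom1:cancer+%2Bcustom2:mesothelioma

# Both forms decode to the same filter query on the server side.
print(unquote_plus(with_percent_20) == fq, unquote_plus(with_plus) == fq)  # True True
```

Writing the required-clause markers as a literal “+” instead of %2B would be decoded as spaces, silently dropping the requirement.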

Posted in ColdFusion, Oracle, Solr