Search Smith

ColdFusion, SQL queries, and, of course, searching

Posts Tagged ‘search’

ColdFusion 9: Searching more than one Solr collection

Posted by David Faber on January 3, 2012

The <CFSEARCH> tag allows one to search multiple collections very easily, whether one is searching Verity collections or Solr:

<cfsearch name="the_search" collection="collection1,collection2,collection3,..." criteria="#the_criteria#" />

However, how does one do this when using Solr web services? We saw in a previous post that the HTTP call to the Solr web service looks like this:

http://localhost:8983/solr/core0/select?q=cancer&fl=*,score&wt=json

We call this our “plain-vanilla” search. What if we want to search more than one “core” (Solr collection)? It doesn’t appear that we can put more than one Solr core name into the path of that HTTP call, and the last thing we want is to make multiple calls to the Solr web service.

I don’t think I’ve written on the topic of paginating results with Solr, but briefly, Solr lets you request a single page of results by passing two parameters to the web service: start (the zero-based offset of the first row to return) and rows (the number of rows to return). If multiple calls to the web service are required, that feature goes out the window, because each call pages through only its own core’s results and knows nothing about how the combined results would be sorted. (For example, if you have multiple collections partitioned by date, then sorting by score means that results from different collections will be interleaved.)
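To make the pagination parameters concrete, here is a minimal sketch of a paged version of the plain-vanilla search against a single core. The variable names (page_number, rows_per_page, start_row, solr_response) and the page values are my own assumptions for illustration; only the URL, the core name core0, and the q, fl, and wt parameters come from the call above.

```cfml
<!--- Hypothetical paging values, chosen only for illustration --->
<cfset page_number = 3 />
<cfset rows_per_page = 10 />

<!--- Solr's start parameter is a zero-based offset into the sorted result set --->
<cfset start_row = (page_number - 1) * rows_per_page />

<cfhttp url="http://localhost:8983/solr/core0/select" method="get" result="solr_response">
    <cfhttpparam type="url" name="q" value="cancer" />
    <cfhttpparam type="url" name="fl" value="*,score" />
    <cfhttpparam type="url" name="wt" value="json" />
    <cfhttpparam type="url" name="start" value="#start_row#" />
    <cfhttpparam type="url" name="rows" value="#rows_per_page#" />
</cfhttp>

<cfset the_results = deserializeJSON(solr_response.fileContent) />

<!--- numFound is the total hit count; docs holds only the requested page --->
<cfset total_hits = the_results.response.numFound />
<cfset page_docs = the_results.response.docs />
```

Because start and rows are applied after sorting, this only pages correctly when a single call sees every matching document, which is exactly the problem with splitting a search across several calls.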

As it turns out, we can search multiple cores by making a single call to the Solr web service. We use the shards parameter. (In fact, we can use shards to search over multiple servers!)

http://localhost:8983/solr/collection1/select?shards=localhost:8983/solr/collection1,localhost:8983/solr/collection2&q=cancer&wt=json

I recommend that the core named in the path of the HTTP call be one of the cores listed in the shards parameter, as collection1 is in the call above. Truth be told, I am not certain which other Solr cores can be used there!
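As a rough sketch of how this might look from ColdFusion, the snippet below builds the shards value from a list of core names and makes a single call. The variable names (solr_host, core_names, shard_list, entry_core, solr_response) and the third core are assumptions for illustration, and sending the request to the first core in the list is simply one way of following the recommendation above.

```cfml
<!--- Hypothetical values: host, core names, and criteria are for illustration only --->
<cfset solr_host = "localhost:8983" />
<cfset core_names = "collection1,collection2,collection3" />
<cfset the_criteria = "cancer" />

<!--- Build the shards value: host/solr/core for each core, comma-separated, without http:// --->
<cfset shard_list = "" />
<cfloop list="#core_names#" index="core_name">
    <cfset shard_list = listAppend(shard_list, solr_host & "/solr/" & core_name) />
</cfloop>

<!--- Send the request to the first core in the list, which is also one of the shards --->
<cfset entry_core = listFirst(core_names) />

<cfhttp url="http://#solr_host#/solr/#entry_core#/select" method="get" result="solr_response">
    <cfhttpparam type="url" name="shards" value="#shard_list#" />
    <cfhttpparam type="url" name="q" value="#the_criteria#" />
    <cfhttpparam type="url" name="fl" value="*,score" />
    <cfhttpparam type="url" name="wt" value="json" />
</cfhttp>

<!--- The merged results from all shards come back in a single response --->
<cfset the_results = deserializeJSON(solr_response.fileContent) />
```

Since the merged result set comes back from one call, the start and rows parameters can be added here just as they would be for a single core.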

Posted in ColdFusion, Solr

ColdFusion 9: Indexing custom fields in Solr

Posted by David Faber on December 23, 2011

Our first step is to create an object in ColdFusion that we can use to communicate with the Solr server:

<cfset the_server = createObject("java", "org.apache.solr.client.solrj.impl.CommonsHttpSolrServer").init("http://localhost:8983/solr/arts_solr") />

One very nice feature of Solr is that you can use its query syntax to delete records. For example, if you wanted to delete all records with the word “cancer” in the title, you would do the following:

<cfset the_server.deleteByQuery( "title:cancer" ) />

However, in this case we want to delete everything, so we’ll use wildcards:

<cfset the_server.deleteByQuery( "*:*" ) /> <!--- Delete everything --->

Now that the collection has been completely purged, we can add some records. Let’s grab some data from a query:

<cfquery name="get_all_articles" datasource="#the_datasource#">
    SELECT id, title, description, pubdate, journal_name, author_name, num_reads
      FROM articles
</cfquery>

We’re going to index all of the articles we currently have in our database. Here we’ll create an array to store the results of the query.

<cfset the_articles = arrayNew(1) />

Let’s put the results of the query into the array:

<cfloop query="get_all_articles">
    <!--- Strip out HTML from the description --->
    <cfset the_summary = REReplace(description, "<[^>]+>", "", "all") />
    <cfset temp_article = createObject("java", "org.apache.solr.common.SolrInputDocument") />
    <cfset temp_article.addField("uid", id) />
    <cfset temp_article.addField("key", id) />
    <cfset temp_article.addField("size", len(description)) />
    <cfset temp_article.addField("summary", the_summary) />
    <cfset temp_article.addField("title", title) />
    <cfset temp_article.addField("description", description) />
    <cfset temp_article.addField("contents", description & " " & title) />
    <cfset temp_article.addField("pubdate", pubdate) />
    <cfset temp_article.addField("journal_name", journal_name) />
    <cfset temp_article.addField("author_name", author_name) />
    <cfset temp_article.addField("num_reads", num_reads) />
    <cfset temp_article.addField("modified", now()) />
    <cfset arrayAppend(the_articles, temp_article) />
</cfloop>

I am assuming that all of the above fields will already have been defined in the collection’s schema.xml file. The rest is easy:

<cfset the_server.add(the_articles) /> <!--- Add the articles to the index --->
<cfset the_server.commit() /> <!--- Commit the changes --->
<cfset the_server.optimize() /> <!--- Optimize the index --->

And that is really all there is to it, at least for indexes where you don’t expect to have hundreds of thousands of records. If you have many records, you would want either to segment the indexing (adding the records in batches rather than all at once) or to partition the collection horizontally (so, for example, you could have one collection for articles from 2011, another for articles from 2010, etc.). Searching on more than one collection at a time is not much more difficult than searching on a single collection, but it is fodder for a future post.
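For the first option, a minimal sketch of segmented indexing might look like the following. It reuses the_server and the get_all_articles query from above; the batch_size value and the the_batch variable are my own assumptions, and only a few of the fields are repeated here for brevity.

```cfml
<!--- Hypothetical batch size; tune it for your data and available memory --->
<cfset batch_size = 1000 />
<cfset the_batch = arrayNew(1) />

<cfloop query="get_all_articles">
    <cfset temp_article = createObject("java", "org.apache.solr.common.SolrInputDocument") />
    <cfset temp_article.addField("uid", id) />
    <cfset temp_article.addField("key", id) />
    <cfset temp_article.addField("title", title) />
    <!--- Add the remaining fields here, exactly as in the full loop above --->
    <cfset arrayAppend(the_batch, temp_article) />

    <!--- Send a full batch to Solr, then start a new, empty one --->
    <cfif arrayLen(the_batch) EQ batch_size>
        <cfset the_server.add(the_batch) />
        <cfset the_batch = arrayNew(1) />
    </cfif>
</cfloop>

<!--- Send whatever is left over, then commit once at the end --->
<cfif arrayLen(the_batch) GT 0>
    <cfset the_server.add(the_batch) />
</cfif>
<cfset the_server.commit() />
```

This keeps at most batch_size documents in the array at any one time, instead of holding every article in memory before the first call to add().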

Posted in ColdFusion, Solr, SQL