Search Smith

ColdFusion, SQL queries, and, of course, searching

Posts Tagged ‘JVM’

ColdFusion 9 and Solr: MultiValued fields vs. Tokenized

Posted by David Faber on January 17, 2012

I was indexing a new collection yesterday and kept getting out-of-memory errors from the JVM (in truth, I don’t know whether they came from JRun or from whatever JVM Solr was running in). Apparently either CF didn’t like the huge array I was generating, or Solr didn’t like me trying to cram it into a multiValued field of type slong. I decided to try something different: I created a space-delimited list of terms instead (they were all numbers, so there was no need to worry about phrases or anything like that). That worked great.
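As a rough illustration of the idea (the original work was done in ColdFusion, so this Python sketch is only an analogy), the array of numeric terms can be collapsed into a single space-delimited string before it is handed to the indexer. The helper name `to_space_delimited` and the sample IDs are hypothetical, and the sketch assumes the target field is a text type that is tokenized on whitespace.

```python
def to_space_delimited(ids):
    """Join numeric terms into one space-delimited string.

    Each item is passed through int() first, so a non-numeric value
    raises an error instead of slipping into the index as a stray token.
    """
    return " ".join(str(int(i)) for i in ids)


# Hypothetical sample values; the real list was far larger.
terms = [101, 2045, 77]
print(to_space_delimited(terms))
```

The result is one field value rather than thousands of separate ones, which is the difference between the two approaches described above.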

According to this question/answer thread on StackOverflow, there should not be a difference in results if that field is used for filtering. There may be a difference in scoring, but as I was only using the field for filtering anyway, that is not a concern.

Posted in ColdFusion, Solr

Verity to Solr Addendum

Posted by David Faber on December 22, 2011

One day into this blog and I’m already lying about upcoming posts! Seriously, I noticed reading over my first post that I had unintentionally left a couple of things out.

The first is a postscript to the blog post I cited by ColdFusion Muse. This blog post dealt with some issues caused by using Solr with the default (as installed under ColdFusion) options for the JVM. One of these options is

-XX:+AggressiveOpts

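The following announcement, which appears to come from the Apache Lucene and Solr project (it refers to Oracle, the maintainer of Java since its takeover of Sun, in the third person), has this to say on the subject (kudos to a colleague for turning this up):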

On the same day, Oracle released Java 6u29 fixing the same problems occurring with Java 6, if the JVM switches -XX:+AggressiveOpts or -XX:+OptimizeStringConcat were used. Of course, you should not use experimental JVM options like -XX:+AggressiveOpts in production environments! We recommend everybody to upgrade to this latest version 6u29.

Now I can’t speak for anyone else, but I do find it strange that Solr would be configured by default (at least in its ColdFusion edition) to use an experimental JVM option. It’s not as if this is a new option, or that it was originally used for some other purpose. Sun mentioned that it was an experimental option in a 2005 white paper on Java Tuning:

-XX:+AggressiveOpts
Turns on point performance optimizations that are expected to be on by default in upcoming releases. The changes grouped by this flag are minor changes to JVM runtime compiled code and not distinct performance features (such as BiasedLocking and ParallelOldGC). This is a good flag to try the JVM engineering team’s latest performance tweaks for upcoming releases. Note: this option is experimental! The specific optimizations enabled by this option can change from release to release and even build to build. You should reevaluate the effects of this option with prior to deploying a new release of Java.
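If you would rather not run with these switches, they can simply be removed from the JVM argument string. The sketch below is a minimal illustration in Python: the argument string is a made-up example, not the actual ColdFusion or Solr default, and the helper name `drop_flags` is hypothetical. The two flags listed are the ones named in the announcement quoted above.

```python
# The two switches named in the announcement quoted earlier in this post.
FLAGS_TO_DROP = {"-XX:+AggressiveOpts", "-XX:+OptimizeStringConcat"}


def drop_flags(jvm_args, unwanted=FLAGS_TO_DROP):
    """Return the argument string with the unwanted switches removed."""
    return " ".join(arg for arg in jvm_args.split() if arg not in unwanted)


# Hypothetical argument string, for illustration only.
args = "-server -Xmx512m -XX:+AggressiveOpts -XX:+UseParallelGC"
print(drop_flags(args))
```

Where the argument string actually lives depends on how Solr was installed, so check your own configuration before editing anything.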

The other thing that I omitted was a short discussion on how to search a Solr collection on a custom field, or how to filter a search. Let’s consider the Solr query from the previous post:

http://localhost:8983/solr/core0/select?q=cancer&fl=*,score&wt=json

This is just a plain-vanilla keyword search — there are no custom fields involved at all (well, there could be — but more on that later unless I am lying about future posts again). If we want to search on a custom field, we can do the following:

http://localhost:8983/solr/core0/select?q=custom1:cancer&fl=*,score&wt=json

Filtering restricts the results to documents that match the field, but it won’t affect the results’ scores. If we want to filter the search on a custom field, we can do the following:

http://localhost:8983/solr/core0/select?q=cancer&fl=*,score&wt=json&fq=custom1:cancer

The “fq” parameter is used for filter queries, and by default it accepts the same query syntax as the “q” parameter.
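The three query shapes above (plain keyword, custom field, and keyword plus filter) can also be assembled programmatically. The following Python sketch is only an illustration: the helper name `build_select_url` is hypothetical, and the host, port, and core name are simply the ones from the example URLs in this post.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the examples in this post; adjust for your own install.
SOLR_SELECT = "http://localhost:8983/solr/core0/select"


def build_select_url(q, fq=None):
    """Return a select URL for query q, optionally restricted by filter query fq."""
    params = {"q": q, "fl": "*,score", "wt": "json"}
    if fq is not None:
        params["fq"] = fq
    # quote_via=quote encodes a space as %20 rather than "+"; the characters
    # listed in `safe` are left unencoded so the output resembles the URLs above.
    return SOLR_SELECT + "?" + urlencode(params, safe="*,:", quote_via=quote)


print(build_select_url("cancer"))                       # plain keyword search
print(build_select_url("custom1:cancer"))               # search on a custom field
print(build_select_url("cancer", fq="custom1:cancer"))  # keyword search plus filter
```

Letting a library do the encoding avoids having to remember which characters need escaping by hand.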

What if you want to search or filter on TWO (or more) custom fields? That is easy enough.

http://localhost:8983/solr/core0/select?q=cancer&fl=*,score&wt=json&fq=%2Bcustom1:cancer%20%2Bcustom2:mesothelioma

The “%2B” codes are URL-encoded plus (+) signs. They have to be encoded because a literal “+” in a query string is itself read as an encoded space, while “%20” is the other code for a URL-encoded space. I suppose one might use a + sign in place of the “%20” here, but that might get confusing. (As an aside, I am old enough that some of the oldest code I used was written out on a blackboard for me, and we used an uppercase delta character Δ to denote a space as spaces are not always obvious when written.) As far as I know, the + signs are needed in front of both terms: assuming Solr’s default OR operator, a term without a + would be optional and so would not narrow the filter.
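To see the encoding at work, here is a small Python sketch that encodes the two-field filter and then decodes it the way a query string is normally decoded. The helper name `encode_filter` is hypothetical; `quote` and `unquote_plus` are from Python's standard `urllib.parse` module.

```python
from urllib.parse import quote, unquote_plus


def encode_filter(fq):
    """Percent-encode a filter query, leaving ':' readable."""
    # quote() turns "+" into %2B and a space into %20.
    return quote(fq, safe=":")


fq = "+custom1:cancer +custom2:mesothelioma"
encoded = encode_filter(fq)
print(encoded)

# Decoding restores the original filter, plus signs included.
print(unquote_plus(encoded))

# A raw, unencoded "+" is decoded as a space, so the "required" marker is lost.
print(repr(unquote_plus("+custom1:cancer")))
```

The last line shows why the plus signs cannot be left as they are: once decoded, the leading “+” has turned into a space and no longer marks the term as required.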

Posted in ColdFusion, Oracle, Solr