Search Smith

ColdFusion, SQL queries, and, of course, searching

Posts Tagged ‘pagination’

Oracle 10g Pagination

Posted by David Faber on February 28, 2012

A colleague of mine alerted me to the following example of pagination in Oracle 10g:

select * from (
    select /*+ FIRST_ROWS(n) */
          a.*
        , ROWNUM rnum
        , count(*) over() as total_rows
    from ( 
        /* your select statement goes here */ 
    ) a
) b 
where b.rnum >= :start_row and b.rnum <= :end_row

This does have the advantage of working with an arbitrary SQL SELECT statement, but there is a somewhat simpler way of going about it if you know in advance what the SELECT statement will be:

SELECT * FROM (
    SELECT your_columns, ROW_NUMBER() OVER (ORDER BY your_sort) AS rnum
         , COUNT(*) OVER() AS total_rows
      FROM your_table
     ORDER BY your_sort
) WHERE rnum >= :start_row AND rnum <= :end_row

The key is that the value of your_sort must be the same in both places.

Posted in Oracle, SQL | Tagged: , , , , , | 1 Comment »

Solr Query Parameters

Posted by David Faber on January 10, 2012

A list of common Solr query parameters can be found here: http://wiki.apache.org/solr/CommonQueryParameters. I mentioned in a previous post that pagination is “built-in” to Solr. One of the more frustrating things about the <CFSEARCH> tag is that there is no provision to start on a particular row and return a set number of rows if you want to use a sort other than the default. In a previous example we wanted to index a collection of articles. It’s reasonable that a user might want to sort on publication date rather than relevance, but <CFSEARCH> doesn’t give us a sorting option along with the startrow and maxrows attributes. If we limit the search using startrow and maxrows, and then sort after the fact, our results will not be correct.

Solr, however, does have the parameters we need:

  • sort – We can sort on any field defined in schema.xml as long as it is (a) “indexed” (able to be searched) and (b) NOT “multiValue”. The field in question does not have to be “stored”.
  • start – This is actually an offset rather than a start row as it would be used in ColdFusion. So the value of the “start” parameter is zero (0) by default (while in ColdFusion’s <CFSEARCH>, the “startrow” attribute is 1 by default). Just FYI in case you’re seeing some off by one errors.
  • rows – This is the maximum number of rows to be returned by the query.

The combination of these three parameters allows us to paginate results without returning everything, sorting, and then paginating. Furthermore, the ability to customize the schema of the Solr collection allows us to create a search index with all of the data we need already included, alleviating the need to query our database based on the search results.

Update: I was mistaken about the fields on which the sort parameter can be used. In order for it to be used effectively, the field must be (a) indexed, (b) not multiValue, and (c) a single term. This means that fields that are tokenized can’t be sorted upon, unless they use an analyzer that produces a single term. In particular, you can’t sort on text and text_* fields. So if you want to sort on title, for example, you must either define title as something other than text (or text_ws), such as string, or create an additional field with the same contents:

<field name="title_s" type="string" indexed="true" stored="true" required="true" />

Common query parameters – sort

Sorting can be done on the “score” of the document, or on any multiValued="false" indexed="true" field provided that field is either non-tokenized (ie: has no Analyzer) or uses an Analyzer that only produces a single Term (ie: uses the KeywordTokenizer)

Hope this helps.

Posted in ColdFusion, Solr | Tagged: , , , , | 1 Comment »