Search Smith

ColdFusion, SQL queries, and, of course, searching

Oracle: SEO-Friendly URLs II

Posted by David Faber on March 15, 2013

I posted over a year ago about how one might use regular expressions in Oracle to return SEO-friendly URLs in queries. The query I came up with at the time looked something like this:

SELECT id, title, description, pubdate, journal_name, author_name, num_reads
     , REGEXP_REPLACE(REGEXP_REPLACE(LOWER(title), '[^-a-z0-9/ ]', ''), '[/ ]', '-') AS url_keyword
  FROM articles

Since then, I’ve been looking into regular expressions quite a bit. One difficulty with the expression above is that multiple characters will be replaced with multiple hyphens, plus other punctuation will be ignored — they will simply be replaced with nothing. Another, perhaps more obvious difficulty is its complexity! A better approach would be the following:

SELECT id, title, description, pubdate, journal_name, author_name, num_reads
     , REGEXP_REPLACE(LOWER(title), '[^a-z0-9]+', '-') AS url_keyword
  FROM articles

This simply replaces consecutive non-alphanumeric characters with a single hyphen. Since Oracle supports the POSIX character classes in regular expressions, the above could also be written:

SELECT id, title, description, pubdate, journal_name, author_name, num_reads
     , REGEXP_REPLACE(LOWER(title), '[^[:alnum:]]+', '-') AS url_keyword
  FROM articles

where [[:alnum:]] is the POSIX character class for alphanumeric characters and [^[:alnum:]] is its negation. A third option, if you know your data has no underscores or want them to be preserved for some reason (and not transformed into hyphens) would be the following:

SELECT id, title, description, pubdate, journal_name, author_name, num_reads
     , REGEXP_REPLACE(LOWER(title), '\W+', '-') AS url_keyword
  FROM articles

Leave a Comment

Your email address will not be published. Required fields are marked *