Say Goodbye to LIKE, and Hello to Oracle Text

Share Button

Don Burleson recently posted a great article on using Oracle Text indexes to make up for the horrible performance issues caused by ‘%LIKE%’ queries on large blocks of text.

I have personally been enjoying keyword searching CONTEXT indexes for quite some time through its various incarnations as the Context Option, Intermedia Option, and now Oracle Text.

So let’s expand a little bit on Don’s article and talk about some more features; namely, the XML searching capabilities of Oracle Text.

Making It Happen

First, let’s create a table to hold some car data, and insert three rows.


SQL> create table car (id number, make varchar2(20), model varchar2(30), color varchar2(10), description varchar2(255));

Table created.

SQL> insert into car values (1, 'Honda', 'Civic', 'Blue', 'A fun car to drive');

1 row created.

SQL> insert into car values (2, 'Jeep', 'Wrangler', 'Green', 'Better than a Honda');

1 row created.

SQL> insert into car values (3, 'Chevrolet', 'Camaro', 'Black', 'Good when you''re feeling blue');

1 row created.

SQL> commit;

Commit complete.

Note that Car 1 has the color “blue,” but Car 3 has “blue” in the description. Also note that Car 1 has a MAKE of “Honda” but Car 2 has “Honda” in the description.

Now let’s create a table from this data to hold XML for each row:


SQL> create table car_xml as select id, dbms_xmlgen.getxml('select * from car where id = ' || id) xml from car;

Table created.

SQL> desc car_xml
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 XML                                                CLOB

SQL> select * from car_xml where rownum < 2;

        ID
----------
XML
--------------------------------------------------------------------------------
         1
<?xml version="1.0"?>
<ROWSET>
 <ROW>
  <ID>1</ID>
  <MAKE>Honda</MAKE>
  <MODEL>Civic</MODEL>
  <COLOR>Blue</COLOR>
  <DESCRIPTION>A fun car to drive</DESCRIPTION>
 </ROW>
</ROWSET>

As you can see, Oracle has assigned column tags for each column of the row.

Lastly, we create the Oracle Text index:


SQL> create index car_xml_text_idx on car_xml(xml) indextype is ctxsys.context parameters ('section group ctxsys.auto_section_group');

Index created.

Note that we have added a parameter to our context index. This parameter is SECTION GROUP, and we have assigned the AUTO_SECTION_GROUP. This section group will take our data from the column and actually parse it internally based on standard HTML style tags.

Querying XML


SQL> select id from car_xml where contains(xml, 'blue') > 0;

        ID
----------
         1
         3

Notice that when we searched for the word "blue," Oracle returned rows 1 and 3. Row 1 has the color blue, while row 3 mentions blue in the description. How can we keep searching all columns when we want to, but narrow our search as needed? Search WITHIN.


SQL> select id from car_xml where contains(xml, 'blue within color') > 0;

        ID
----------
         1

Notice that now when we use the CONTAINS clause, we specify that "blue" must appear "within color." This instructs Oracle to ONLY return rows where the word "blue" can be found between begin and end COLOR tags.

To illustrate this example once more, we'll try "honda."


SQL> select id from car_xml where contains(xml, 'honda') > 0;

        ID
----------
         1
         2

SQL> select id from car_xml where contains(xml, 'honda within make') > 0;

        ID
----------
         1

Once again, we have narrowed our search with a WITHIN cause. Sometimes, it is very handy to be able to search an entire table's contents for a keyword, while other times drilling down is much more appropriate.

You may have also noticed that the Oracle Text index is case insenstive. Though I searched for 'honda' in lowercase, it picked up 'Honda' in uppercase in both columns.

Better still, there are many more features included with Oracle Text such as synonym searching, preferred terms, fuzzy searches, highlighting, and more, but we will save those for another post.

Share Button

6 comments

  1. donichayen@yahoo.co.in is my second id.

    i am stuck with a requirement to check a database table and retrieve the primary keys for rows containing the xml files with specific nodes inside.

    For example , we will have to retrieve files with value Honda inside node C with parent node B and its parent A. we will have to check for another value for a sibling node to C also, but the problem with simple SQL (using existsnode ) was that it retrieved wrong results because there was duplicate parent nodes. for example Honda may be there in one c, but the other value will be present under another parent node and child node with same name. eg- inside a list.
    A
    -B
    -C hONDA
    -D DIESEL
    -B
    -C BENZ
    -D PETROL

    IT WILL retyrieve true for query for honda petrol.
    Are ypu sure we can check for sibling nodes with oracle text?

    kindly contact me in my email please!

  2. Hello,

    I am stuck while using CONTAIN in my query.
    My problem is when i created a index on the text column it was working fine with contains clause, but as i inserted new rows to the table, it just didn’t fetched that newly inserted rows. while the condition in the contains clause was satisfying all those records.

    But when i dropped the index and recreated it, it then started working fine.
    ie it was then returning the new records also.

    Do i need to recreate the index every time.

    Please Help.
    Regards
    Neeraj Bansal

  3. Just check your syntax of index because if you have not created index like below then you will be not getting new inserted or updated records after search.

    Create index … on … INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS (‘SYNC ( ON COMMIT)’)
    Create index … on … INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ‘SYNC (EVERY “SYSDAT

  4. CREATED THE INDEX:
    ==================
    create index HAS_OP_QUERY_IDX on SF_OP_INFO (XML_CONFIG) indextype is ctxsys.context parameters (‘section group ctxsys.auto_section_group’);

    RAN SEARCH:
    ============
    SELECT * FROM SF_OP_INFO WHERE CONTAINS (xml_config, ‘PROJECT_WORKSHEET_SYN within sql’) > 0;

    RESULTS:
    ========
    Does not show any records. When run with only a partial word (i.e. ‘worksheet’) then it works. as soon as I add the underscores, it give 0 records.

    Any thoughts?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.