Wednesday, July 25, 2007

Tokenizer in Oracle PLSQL

I created a tokenizer for Oracle PL/SQL using the Oracle 10g regexp functions:

To test this,
1. first compile the function below:

create or replace FUNCTION Tokenizer
      (p_string VARCHAR2
      ,p_separators in VARCHAR2)
RETURN dbms_sql.varchar2s IS
   l_token_tbl dbms_sql.varchar2s;
   pattern varchar2(250);
BEGIN

   pattern := '[^(' || p_separators || ')]+' ;

   select regexp_substr(p_string, pattern,1,level) token
      bulk collect into l_token_tbl
      from dual
    where regexp_substr(p_string, pattern,1,level) is not null
    connect by regexp_instr(p_string, pattern,1, level ) > 0;

   RETURN l_token_tbl;
END Tokenizer;
/

2. Then call it using,

Declare
   l_token_tbl dbms_sql.varchar2s; -- table of varchar2
begin
   l_token_tbl := Tokenizer(
                                 'anup.pani @somewhere.com ',
                                ' .@' );
   FOR i IN l_token_tbl.FIRST..l_token_tbl.LAST LOOP
      dbms_output.put_line(l_token_tbl(i));
   END LOOP;
end;
/

3. Result

anup
pani
somewhere
com

Hope you find this useful. Let me know your comments or email me in anuppani@gmail.com