Friday, July 6, 2012

Sample "Where" Clauses for Finding Dates in Text Fields

The following where clauses use SAS PRXMatch functions and simple Perl regular expressions to help find dates within SAS dataset text fields.

Identify records that contain number/number, e.g. 1/2, 12/25, etc.:
where prxmatch('*\d\/\d*', fieldname) >= 1;

Identify records that contain number-number, e.g. 1-2, 12-25, etc.:
where prxmatch('*\d-\d*', fieldname) >= 1;

Identify records that contain four-digit numbers, e.g. 2007:
where prxmatch('*\d\d\d\d*', fieldname) >= 1;

Identify records that contain the word "July":
where prxmatch('*July*', fieldname) >= 1;

Or identify records that contain any of the above, any month name, or any month three-letter abbreviation:
where prxmatch('*\d\/\d*', fieldname) >= 1
OR prxmatch('*\d-\d*', fieldname) >= 1
OR prxmatch('*\d\d\d\d*', fieldname) >= 1
OR prxmatch('*January*', fieldname) >= 1
OR prxmatch('*Feburary*', fieldname) >= 1
OR prxmatch('*March*', fieldname) >= 1
OR prxmatch('*April*', fieldname) >= 1
OR prxmatch('*May*', fieldname) >= 1
OR prxmatch('*June*', fieldname) >= 1
OR prxmatch('*July*', fieldname) >= 1
OR prxmatch('*August*', fieldname) >= 1
OR prxmatch('*September*', fieldname) >= 1
OR prxmatch('*October*', fieldname) >= 1
OR prxmatch('*November*', fieldname) >= 1
OR prxmatch('*December*', fieldname) >= 1
OR prxmatch('*Jan*', fieldname) >= 1
OR prxmatch('*Feb*', fieldname) >= 1
OR prxmatch('*Mar*', fieldname) >= 1
OR prxmatch('*Apr*', fieldname) >= 1
OR prxmatch('*May*', fieldname) >= 1
OR prxmatch('*Jun*', fieldname) >= 1
OR prxmatch('*Jul*', fieldname) >= 1
OR prxmatch('*Aug*', fieldname) >= 1
OR prxmatch('*Sep*', fieldname) >= 1
OR prxmatch('*Oct*', fieldname) >= 1
OR prxmatch('*Nov*', fieldname) >= 1
OR prxmatch('*Dec*', fieldname) >= 1;


Or, a macro that will print any records that match the above:
%macro dsearch(tablename, fieldname);
    proc print data = &tablename;
        var &fieldname;
        where prxmatch('*\d\/\d*', &fieldname) >= 1
            OR prxmatch('*\d-\d*', &fieldname) >= 1
            OR prxmatch('*\d\d\d\d*', &fieldname) >= 1
            OR prxmatch('*January*', &fieldname) >= 1
            OR prxmatch('*Feburary*', &fieldname) >= 1
            OR prxmatch('*March*', &fieldname) >= 1
            OR prxmatch('*April*', &fieldname) >= 1
            OR prxmatch('*May*', &fieldname) >= 1
            OR prxmatch('*June*', &fieldname) >= 1
            OR prxmatch('*July*', &fieldname) >= 1
            OR prxmatch('*August*', &fieldname) >= 1
            OR prxmatch('*September*', &fieldname) >= 1
            OR prxmatch('*October*', &fieldname) >= 1
            OR prxmatch('*November*', &fieldname) >= 1
            OR prxmatch('*December*', &fieldname) >= 1
            OR prxmatch('*Jan*', &fieldname) >= 1
            OR prxmatch('*Feb*', &fieldname) >= 1
            OR prxmatch('*Mar*', &fieldname) >= 1
            OR prxmatch('*Apr*', &fieldname) >= 1
            OR prxmatch('*May*', &fieldname) >= 1
            OR prxmatch('*Jun*', &fieldname) >= 1
            OR prxmatch('*Jul*', &fieldname) >= 1
            OR prxmatch('*Aug*', &fieldname) >= 1
            OR prxmatch('*Sep*', &fieldname) >= 1
            OR prxmatch('*Oct*', &fieldname) >= 1
            OR prxmatch('*Nov*', &fieldname) >= 1
            OR prxmatch('*Dec*', &fieldname) >= 1
    ;
    run;
%mend dsearch;