Wednesday, November 4, 2009

Simple command line grep usage to understand complex regular expressions

I was looking for a simple way to understand regular expressions. In this particular case, I needed to analyze an ignore rule used by the logcheck log monitoring tool. I knew that the Linux environment offered some powerful tools for working with regular expressions, but I wanted something simple: pass in a string and have it show me which part of the string my regular expression matches.

After quite a bit of time looking at man pages for grep, I initially didn't find the simple solution that I was hoping to...

AMD-ubuntu /USR/SBIN/CRON[9999]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)

And it was ignored by the entry in /etc/logcheck/ignore.d.paranoid/cron

^\w{3} [ :0-9]{11} [._[:alnum:]-]+ /USR/SBIN/CRON\[[0-9]+\]: \([_[:alnum:]-]+\) CMD \(.*\)$

The log entry I wanted to ignore was:

Nov  2 19:17:01 AMD-ubuntu CRON[6877]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)

The regular expression that I ended up using to ignore my routine CRON job log entries was:

^\w{3} [ :0-9]{11} [._[:alnum:]-]+ CRON\[[0-9]+\]: \([_[:alnum:]-]+\) CMD \(.*\)$

I wanted an interactive way to determine which part of the regular expression was matching which part of the string. I ultimately ended up playing with egrep, the extended regular expression version of the grep command. The basic syntax was:

echo 'string' | egrep --color 'regular expression'
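As a minimal sketch of that syntax on a small fragment, assuming GNU grep (where grep -E behaves the same as egrep): the variable names sample, pattern and matched_line are mine, chosen only for illustration. On a terminal, the first command highlights CRON[6877] and leaves the rest of the line uncolored.

```shell
# A short piece of the log entry and a short piece of the rule.
sample='CRON[6877]: (root) CMD'
pattern='CRON\[[0-9]+\]'

# Interactive form: the matched part is colored when output goes to a terminal.
printf '%s\n' "$sample" | grep -E --color "$pattern"

# Same match captured in a variable; grep prints the whole matching line.
matched_line=$(printf '%s\n' "$sample" | grep -E "$pattern")
echo "$matched_line"
```

The square brackets are escaped in the pattern because an unescaped [ starts a bracket expression such as [0-9].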

It took me a while to figure out that I needed to use the echo command to pipe the string into the egrep command. The --color output control was helpful because it colors the part of the string that matched the regular expression. This was important to me, because I wanted to understand what type of log entries would be ignored by this regular expression.

If the string matched the regular expression, it was printed to the screen with the matching portion colored; if it did not match, nothing was printed. This allowed me to play with the regular expression to understand the matching characteristics of the expression.
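The match or no-match behavior can also be checked without looking at the screen, since grep exits with status 0 on a match and 1 otherwise. The sketch below assumes GNU grep and uses the rule from this post. The sshd line is a made-up entry for illustration, not something from my logs, and the check function is a hypothetical helper. Note the two spaces after "Nov": syslog pads single-digit days with a space, and the [ :0-9]{11} part of the rule relies on that.

```shell
# The rule from this post, stored in a variable for reuse.
rule='^\w{3} [ :0-9]{11} [._[:alnum:]-]+ CRON\[[0-9]+\]: \([_[:alnum:]-]+\) CMD \(.*\)$'

# Entry from this post (two spaces after "Nov": syslog pads the day).
cron_line='Nov  2 19:17:01 AMD-ubuntu CRON[6877]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)'
# Hypothetical entry, invented for illustration only.
other_line='Nov  2 19:20:13 AMD-ubuntu sshd[7001]: session opened for user alice'

# Report whether a line would be ignored by the rule.
# grep -q prints nothing and only sets the exit status.
check() {
  if printf '%s\n' "$1" | grep -Eq "$rule"; then
    echo 'ignored'
  else
    echo 'reported'
  fi
}

cron_result=$(check "$cron_line")
other_result=$(check "$other_line")
echo "cron entry:  $cron_result"
echo "other entry: $other_result"
```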

echo 'Nov  2 19:17:01 AMD-ubuntu CRON[6877]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)' | egrep --color '^\w{3} [ :0-9]{11} [._[:alnum:]-]+ CRON\[[0-9]+\]: \([_[:alnum:]-]+\) CMD \(.*\)$'

Nov  2 19:17:01 AMD-ubuntu CRON[6877]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)

Quickly adjusting parts of the regular expression and observing the color changes in the output allowed me to understand every aspect of a fairly complex regular expression.
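One way to sketch this kind of experimenting in a non-interactive form is to grow the expression one field at a time and print only the matched text with the -o option, which stands in for the coloring. This assumes GNU grep; the loop and the steps variable are my own illustration, not part of logcheck.

```shell
# Log entry from this post (two spaces after "Nov": syslog pads the day).
line='Nov  2 19:17:01 AMD-ubuntu CRON[6877]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)'

# Each pattern adds one more field of the full rule.
# grep -o prints only the part of the line that matched.
steps=$(
  for pat in \
    '^\w{3}' \
    '^\w{3} [ :0-9]{11}' \
    '^\w{3} [ :0-9]{11} [._[:alnum:]-]+' \
    '^\w{3} [ :0-9]{11} [._[:alnum:]-]+ CRON\[[0-9]+\]:' \
    '^\w{3} [ :0-9]{11} [._[:alnum:]-]+ CRON\[[0-9]+\]: \([_[:alnum:]-]+\)'
  do
    printf '%s\n' "$line" | grep -Eo "$pat"
  done
)
printf '%s\n' "$steps"
```

Each output line is one field longer than the previous one: the month, then the day and time, the host name, the CRON[pid]: tag, and finally the (user) field.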

My bug report to the Ubuntu team was posted here:
Launchpad logcheck 9.10 CRON reporting bug