Bash Regular Expressions

May 26th, 2008 by Mitch Frazier

When working with regular expressions in a shell script, the norm is to use grep, sed, or some other external program. Since version 3 of bash (released in 2004) there is another option: bash's built-in regular expression comparison operator "=~", which is used inside the [[ ]] conditional construct.

Bash's regular expression comparison operator takes a string on the left and an extended regular expression on the right. It returns 0 (success) if the regular expression matches the string, otherwise it returns 1 (failure).
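Because the result is an ordinary exit status, the operator can be used anywhere a command can. The following is a minimal sketch of that behavior; the variable names "input" and "pattern" and the sample strings are made up for illustration:

```bash
#!/bin/bash
# Sketch: capture the exit status of [[ string =~ regex ]].
# "input" and "pattern" are hypothetical names used only for illustration.
input="release-2008"
pattern='^release-[0-9]{4}$'

[[ $input =~ $pattern ]]
match_status=$?        # 0: the pattern matched

[[ "release-08" =~ $pattern ]]
nomatch_status=$?      # 1: the pattern did not match

echo "match: $match_status, no match: $nomatch_status"
```

The bash manual also documents a return value of 2 when the regular expression itself is syntactically invalid, so a script that accepts patterns from users may want to distinguish 1 from 2.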

In addition to doing simple matching, bash regular expressions support sub-patterns surrounded by parentheses for capturing parts of the match. The matches are assigned to the array variable BASH_REMATCH. The entire match is assigned to BASH_REMATCH[0], the first sub-pattern is assigned to BASH_REMATCH[1], and so on.
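As a small illustration of how the array is filled in, the sketch below pulls a date apart with three sub-patterns. The file name and variable names are hypothetical:

```bash
#!/bin/bash
# Sketch: extract the parts of a date with parenthesized sub-patterns.
# The sample string and variable names are hypothetical.
stamp="backup-2008-05-26.tar"
date_re='([0-9]{4})-([0-9]{2})-([0-9]{2})'

if [[ $stamp =~ $date_re ]]; then
    whole=${BASH_REMATCH[0]}   # text matched by the entire expression
    year=${BASH_REMATCH[1]}    # first parenthesized sub-pattern
    month=${BASH_REMATCH[2]}   # second sub-pattern
    day=${BASH_REMATCH[3]}     # third sub-pattern
    echo "whole=$whole year=$year month=$month day=$day"
fi
```

Note that BASH_REMATCH[0] holds only the matched portion ("2008-05-26" here), not the whole input string, because the pattern is not anchored.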

The following example script takes a regular expression as its first argument and one or more strings to match against. It then cycles through the strings and outputs the results of the match process:

#!/bin/bash

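# Require a pattern and at least one string to test.
if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

# Loop on the argument count so that an empty string argument is still tested.
while [[ $# -gt 0 ]]
do
    # Leave $regex unquoted so it is interpreted as a regular expression.
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        # BASH_REMATCH[0] is the whole match; captures start at index 1.
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done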

Assuming the script is saved in "bashre.sh", the following sample shows its output:

  # bash bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
  regex: aa(b{2,3}[xyz])cc

  aabbxcc matches
    capture[1]: bbx
  aabbcc does not match
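
One detail of the script worth spelling out is that $regex appears unquoted on the right side of "=~". In bash 3.2 and later (with default compatibility settings), quoting the right-hand side makes bash compare it as a literal string rather than as a regular expression. The following sketch, which reuses the pattern from the sample run, shows the difference; the variable names "unquoted" and "quoted" are just for illustration:

```bash
#!/bin/bash
# Sketch: quoted vs. unquoted pattern on the right of =~.
# Assumes bash 3.2 or later with default compatibility settings.
regex='aa(b{2,3}[xyz])cc'

# Unquoted: the value of $regex is treated as an extended regular expression.
if [[ aabbxcc =~ $regex ]]; then
    unquoted="match"
else
    unquoted="no match"
fi

# Quoted: the value of $regex is treated as a literal string to search for.
if [[ aabbxcc =~ "$regex" ]]; then
    quoted="match"
else
    quoted="no match"
fi

echo "unquoted: $unquoted"
echo "quoted: $quoted"
```

Under those assumptions the unquoted test matches and the quoted one does not, since the string "aabbxcc" does not literally contain the text of the pattern.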

__________________________

Mitch Frazier is an Associate Editor at Linux Journal.


add color and better indentation to the output

On July 7th, 2008 Albert Bicchi (not verified) says:

#!/bin/bash

if [[ $# -lt 2 ]]; then
    echo "Usage: regex PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo -e "\t\E[42;37m${1} - matches\E[33;0m"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo -e "\t\t\E[43;37mcapture[$i]: ${BASH_REMATCH[$i]}\E[33;0m"
            let i++
        done
    else
        echo -e "\t\E[41;37m${1} - does not match\E[33;0m"
    fi
    shift
done

Is "(( $# < 2 ))" an

On June 25th, 2008 Anonymous (not verified) says:

Is "(( $# < 2 ))" an alternative conditional expression for the line "[[ $# -lt 2 ]]"?

Could you discuss BASH expressions with [[]] (()) and their valid operators. It seems the -lt, -gt, -a, etc, can be replaced with <, >, &&, etc, if used with (()) --- replacing [] with (()) (numeric) and [[]] (strings).

Thank you.

PS: The captcha is really hard to read. It would be nice if there was an option to generate a new one that could possibly be read by a mere human.

Wow, this may be a bit much....

On June 7th, 2008 Image Hosting (not verified) says:

I'm in "learning mode" and I just came across this blog, which is great, but it appears that I have quite a bit to learn. Why didn't I start this several years ago!?!

Image Hosting

That simple?!

On May 28th, 2008 Robert de Bock (not verified) says:

My god, Bash sure is a great tool! Thanks for the information.
