RegExp.com

POSIX Character Class Definitions

POSIX 1003.2 section 2.8.3.2 (6) defines a set of character classes that denote certain common ranges. They tend to look very ugly, but they have the advantage that they also take into account the 'locale', that is, any variant of the local language/coding system. Many utilities/languages provide short-hand ways of invoking these classes. Strictly, the names used, and hence their contents, reference the LC_CTYPE POSIX definition (1003.2 section 2.5.2.1). The contents listed below are those of the POSIX ("C") locale; other locales may add characters, for example accented letters in [:alpha:].

[:digit:] Only the digits 0 to 9

[:alnum:] Any alphanumeric character: 0 to 9, A to Z or a to z.

[:alpha:] Any alpha character A to Z or a to z.

[:blank:] Space and TAB characters only.

[:xdigit:] Hexadecimal notation 0-9, A-F, a-f.

[:punct:] Punctuation symbols . , " ' ? ! ; : # $ % & ( ) * + - / < > = @ [ ] \ ^ _ ` { } | ~

[:print:] Any printable character, including space.

[:space:] Any whitespace character (space, tab, NL, FF, VT, CR). Many systems abbreviate this as \s.

[:graph:] Any printable character except space, that is, [:print:] without the space character.

[:upper:] Any alpha character A to Z.

[:lower:] Any alpha character a to z.

[:cntrl:] Control characters, ASCII codes 0 to 31 and 127: NUL SOH STX ETX EOT ENQ ACK BEL BS TAB LF(NL) VT FF CR SO SI DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC IS4 IS3 IS2 IS1 DEL.

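These class names are only valid inside a bracket expression, so they are always written inside an outer pair of square brackets, in the form [[:alnum:]], or combined with other characters or ranges, as in [[:digit:]a-d].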
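Not every engine understands these names. Python's `re` module, for example, does not support POSIX bracket classes, so one workaround there is to substitute each class's C-locale (ASCII) contents before compiling. The sketch below illustrates that idea under stated assumptions: the `POSIX_CLASSES` table and the `expand_posix()` helper are hypothetical names, and the ranges cover only the C locale, not other locales.

```python
import re

# Hypothetical table: C-locale (ASCII) contents of each POSIX class,
# written as fragments that can sit inside a Python [...] set.
POSIX_CLASSES = {
    "digit":  r"0-9",
    "alnum":  r"0-9A-Za-z",
    "alpha":  r"A-Za-z",
    "blank":  r" \t",
    "xdigit": r"0-9A-Fa-f",
    "punct":  r"!-/:-@\[-`{-~",   # 0x21-0x2F, 0x3A-0x40, 0x5B-0x60, 0x7B-0x7E
    "print":  r" -~",             # 0x20-0x7E, includes space
    "space":  r" \t\n\v\f\r",
    "graph":  r"!-~",             # 0x21-0x7E, i.e. print without space
    "upper":  r"A-Z",
    "lower":  r"a-z",
    "cntrl":  r"\x00-\x1f\x7f",
}

def expand_posix(pattern: str) -> str:
    """Hypothetical helper: replace each [:name:] inside a bracket
    expression with its C-locale ranges, e.g. [[:digit:]a-d] -> [0-9a-d]."""
    # A function replacement is inserted literally (no escape processing).
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

digits_or_a_to_d = re.compile(expand_posix(r"[[:digit:]a-d]+"))  # [0-9a-d]+
print(digits_or_a_to_d.findall("x19ab-zz4e"))              # ['19ab', '4']
print(re.findall(expand_posix(r"[[:punct:]]"), "a`b_c"))   # ['`', '_']
```

Engines that do support the POSIX names (for example grep, sed and awk) need no such translation; the table only shows what each name expands to in the C locale.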

Common Extensions and Abbreviations

Some utilities and most languages provide extensions or abbreviations to simplify(!) regular expressions. These tend to fall into character classes or positional extensions, and the most common are listed below. In general these extensions were defined by PERL and are implemented in what is called PCRE (Perl Compatible Regular Expressions), a library that has been ported to many systems. See the PCRE documentation for full details, and the PERL 5.8.8 regular expression documentation.

While the \x style of syntax can look confusing at first, the backslash precedes a character that does not normally need escaping, so the utility or language can interpret it unambiguously, whereas we simple humans tend to become confused more easily. The following are supported by .NET, PHP, PERL, RUBY, PYTHON and Javascript, as well as many others.

Character Class Abbreviations

\d    Match any character in the range 0 - 9 (equivalent of POSIX [:digit:])

\D    Match any character NOT in the range 0 - 9 (equivalent of POSIX [^[:digit:]])

\s    Match any whitespace character (space, tab etc.) (equivalent of POSIX [:space:], except that some implementations, notably PERL before version 5.18, do not match VT)

\S    Match any character that is NOT whitespace (space, tab etc.) (equivalent of POSIX [^[:space:]])

\w    Match any character in the range 0 - 9, A - Z and a - z, or the underscore _ (equivalent of POSIX [[:alnum:]_])

\W    Match any character NOT in the range 0 - 9, A - Z and a - z and NOT the underscore _ (equivalent of POSIX [^[:alnum:]_])
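One way to check these equivalences is to test every ASCII character against each abbreviation. The sketch below assumes Python's `re` module with the `re.ASCII` flag; without that flag, Python's string patterns use Unicode definitions, so \d and \w would also match non-ASCII digits and letters. The `ascii_members()` helper is a hypothetical name used only for this illustration.

```python
import re

def ascii_members(pattern: str) -> str:
    """Hypothetical helper: return, in code-point order, every ASCII
    character (0-127) matched by a single-character pattern."""
    rx = re.compile(pattern, re.ASCII)  # ASCII-only \d, \s, \w
    return "".join(chr(c) for c in range(128) if rx.fullmatch(chr(c)))

print(ascii_members(r"\d"))          # 0123456789
print(repr(ascii_members(r"\s")))    # '\t\n\x0b\x0c\r '  (VT = \x0b included)
print("_" in ascii_members(r"\w"))   # True: \w is [[:alnum:]_], not [:alnum:]

# Each upper-case form is the complement of its lower-case form.
for lower, upper in ((r"\d", r"\D"), (r"\s", r"\S"), (r"\w", r"\W")):
    assert len(ascii_members(lower)) + len(ascii_members(upper)) == 128
```

Note that Python's \s does match VT, so the VT exception mentioned for \s depends on the implementation in use.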

Positional Abbreviations

\b    Match the position at the beginning or end of a word (a word boundary); it matches a position, not a character. Thus \bton\b will find ton but not tons, whereas \bton will also find tons

\B    Match any position NOT at the beginning or end of a word. Thus \Bton\B will find wantons but not ton or wanton (to find wanton use \Bton)
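The word-boundary examples can be tried directly. This is a minimal sketch assuming Python's `re` module; the sample words are the ones used in the text, plus wantons to show a match for \Bton\B. It also shows that \b consumes no characters: each match is an empty string at a word edge.

```python
import re

samples = ["ton", "tons", "wanton", "wantons"]
patterns = [r"\bton\b", r"\bton", r"\Bton\B", r"\Bton"]

# For each pattern, list the sample words in which it finds a match.
results = {p: [w for w in samples if re.search(p, w)] for p in patterns}
for p, hits in results.items():
    print(p, hits)

# \b is zero-width: each match is an empty string at a word edge.
print([m.span() for m in re.finditer(r"\b", "a ton")])
# [(0, 0), (1, 1), (2, 2), (5, 5)]
```

Here \bton\b finds only ton, \bton finds ton and tons, \Bton\B finds only wantons, and \Bton finds wanton and wantons.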


