Confoo 2012 : Regular Expressions

Talk by Jakob Westhoff (Qafoo) given on Friday 2nd March

Different languages use different regular expression engines : PHP (PCRE), Java, Python, Ruby, etc…

Delimiters : the pattern is enclosed in a delimiter such as /, for example : /foobar/i : the trailing i is a modifier that makes the match case insensitive
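As a small illustration of the modifier idea, here is a sketch using Python's re module (one of the engines listed above). Python has no delimiters, so the pattern is a plain string and the i modifier becomes the re.IGNORECASE flag; the sample subjects are made up for the example.

```python
import re

# PCRE notation: /foobar/i
# In Python the pattern is a plain string and the modifier is a flag.
pattern = re.compile(r"foobar", re.IGNORECASE)

matches_upper = pattern.search("FooBar rocks") is not None
matches_other = pattern.search("foo bar") is not None

print(matches_upper)  # True
print(matches_other)  # False
```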

MetaCharacters :

  • * : any number of occurrences (including none)
  • + : at least once
  • ? : once or not at all
  • {x,y} : between x and y occurrences
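To see the four quantifiers side by side, here is a minimal sketch in Python's re module. The subject strings and the helper name matching are made up for the example; re.fullmatch is used so that the whole subject has to match.

```python
import re

# Illustrative subjects: "a", then zero to four "b", then "c".
subjects = ["ac", "abc", "abbc", "abbbbc"]

def matching(pattern):
    """Return the subjects that the pattern matches in full."""
    return [s for s in subjects if re.fullmatch(pattern, s)]

star = matching(r"ab*c")        # any number of b, including none
plus = matching(r"ab+c")        # at least one b
optional = matching(r"ab?c")    # zero or one b
ranged = matching(r"ab{2,3}c")  # two or three b

print(star)      # ['ac', 'abc', 'abbc', 'abbbbc']
print(plus)      # ['abc', 'abbc', 'abbbbc']
print(optional)  # ['ac', 'abc']
print(ranged)    # ['abbc']
```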

Character classes

  • . : matches any character EXCEPT a newline; with the s modifier, for example : /The.Point/s , it also matches newlines
  • character classes : [abcdef]+ : any character listed in the brackets is matched, one or several times; ranges : [a-cd-f]+ ; most metacharacters lose their meaning inside brackets, except for the range hyphen
  • [^abcef]+ : negates the class (any character not listed); a newline is not listed, so it is matched too; to match everything except a newline : [^\n]+
  • predefined character classes : \d (digit), \s (any whitespace); the capital letters negate : \D is everything but a digit
  • /something$/D : with the D modifier, $ no longer tolerates a newline at the end of the subject
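The points in this list can be tried out with a short sketch in Python's re module. Two assumptions to keep in mind: the s modifier corresponds to the re.DOTALL flag, and Python has no D modifier, so \Z (match only at the very end) is used here as the closest equivalent. All subject strings are made up for the example.

```python
import re

text = "The\nPoint"

# The dot skips newlines unless "dot matches all" mode is enabled.
dot_default = re.search(r"The.Point", text) is not None
dot_all = re.search(r"The.Point", text, re.DOTALL) is not None

# Ranges inside a class, and a negated class that also swallows "\n".
ranges = re.findall(r"[a-cd-f]+", "cafe zebra")
negated = re.findall(r"[^abcef]+", "ab12\n34ef")
negated_no_newline = re.findall(r"[^abcef\n]+", "ab12\n34ef")

# Predefined classes and their capitalized negations.
digits = re.findall(r"\d+", "room 42, floor 7")
non_digits = re.findall(r"\D+", "room 42, floor 7")

# By default "$" tolerates one trailing newline; "\Z" does not.
tolerant = re.search(r"something$", "something\n") is not None
strict_end = re.search(r"something\Z", "something\n") is not None

print(dot_default, dot_all)   # False True
print(ranges)                 # ['cafe', 'eb', 'a']
print(negated)                # ['12\n34']
print(negated_no_newline)     # ['12', '34']
print(digits, non_digits)     # ['42', '7'] ['room ', ', floor ']
print(tolerant, strict_end)   # True False
```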

Alternatives

  • Logical OR : Open|Source : matches either Open or Source, whichever is found first in the subject
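A quick sketch of the alternation in Python's re module; the subject strings are made up. The engine scans the subject from left to right, so the alternative that occurs first in the subject wins, regardless of its position in the pattern.

```python
import re

pattern = re.compile(r"Open|Source")

# "Source" appears earlier in this subject, so it is the first match.
first = pattern.search("Source code can be Open").group()

# findall returns every non-overlapping match, in order of appearance.
all_hits = pattern.findall("Open Source")

print(first)     # Source
print(all_hits)  # ['Open', 'Source']
```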

Escaping

  • \ in front of a character makes it a literal one instead of a special one
  • be careful: depending on your programming language, \ also has a meaning inside string literals, so doubled forms such as \\n become common…
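Both kinds of escaping can be shown with a small sketch in Python's re module. Raw strings (r"...") are Python's way to avoid doubling backslashes, and re.escape is a standard helper that escapes a literal for you; other languages have their own conventions, so treat this as one illustration only.

```python
import re

# Unescaped, the dot matches any character; escaped, only a literal dot.
loose = re.fullmatch(r"1.5", "1x5") is not None
strict = re.fullmatch(r"1\.5", "1x5") is not None
literal = re.fullmatch(r"1\.5", "1.5") is not None

# In a normal string literal the backslash itself must be escaped;
# a raw string yields the same pattern without the doubling.
doubled = "\\d+"
raw = r"\d+"
same = doubled == raw

# re.escape builds a pattern that matches the given text literally.
plus_literal = re.fullmatch(re.escape("a+b"), "a+b") is not None

print(loose, strict, literal)  # True False True
print(same)                    # True
print(plus_literal)            # True
```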

Anchors

  • ^ beginning of the subject
  • $ end of the subject
  • /^abcdef$/m : the m modifier enables multiline mode, where ^ and $ also match at the beginning and end of every line
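The effect of multiline mode is easiest to see on a subject with two lines. This is a sketch in Python's re module, where the m modifier corresponds to the re.MULTILINE flag; the subject string is made up.

```python
import re

text = "first line\nsecond line"

# Without multiline mode, "^" only matches at the start of the subject.
single = re.findall(r"^\w+", text)

# With multiline mode, "^" and "$" also match at every line boundary.
multi = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(single)     # ['first']
print(multi)      # ['first', 'second']
print(line_ends)  # ['line', 'line']
```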

Sub pattern

  • /(abc)(def)/ : parentheses capture parts of the match so they can be extracted : 1 -> abc and 2 -> def
  • named sub-pattern : P syntax, (?P<name>...)
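Here is a short sketch of numbered and named sub-patterns in Python's re module, which shares the (?P<name>...) syntax with PCRE. The group names year and month and the subject strings are invented for the example.

```python
import re

# Numbered sub-patterns: group 0 is the whole match.
m = re.search(r"(abc)(def)", "xxabcdefxx")
whole, first, second = m.group(0), m.group(1), m.group(2)

# Named sub-patterns: the same captures, addressed by name.
named = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "date: 2012-03")
parts = named.groupdict()

print(whole, first, second)  # abcdef abc def
print(parts)                 # {'year': '2012', 'month': '03'}
```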

Readability

  • x : Extend your pattern’s legibility by permitting whitespace and comments.
  • everything from an unescaped # to the end of the line is ignored, so you can use it to comment
  • whitespace is ignored unless it is escaped
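The extended mode can be sketched in Python's re module, where the x modifier corresponds to the re.VERBOSE flag. The pattern and the subject strings below are made up; note the escaped space, which is kept as a literal while all other whitespace is dropped.

```python
import re

date = re.compile(r"""
    (\d{4})  # four-digit year
    -        # separator
    (\d{2})  # two-digit month
    \ UTC    # an escaped space is kept as a literal space
""", re.VERBOSE)

with_space = date.search("2012-03 UTC")
without_space = date.search("2012-03UTC")

print(with_space.groups())  # ('2012', '03')
print(without_space)        # None
```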