Talk by Jakob Westhoff (Qafoo) given on Friday 2nd March
Different languages use different regular expression engines : PHP (PCRE), Java, Python, Ruby, etc…
The pattern is enclosed in delimiters, / for example : /foobar/i : modifiers follow the closing delimiter, and i makes the match case insensitive
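As a rough illustration, here is the i modifier sketched in Python, whose re module is not PCRE and has no delimiters: the modifier becomes the re.IGNORECASE flag. The subject string is made up for the example.

```python
import re

# PCRE's /foobar/i roughly corresponds to re.IGNORECASE in Python.
subject = "FooBar baz"  # hypothetical subject string

case_sensitive = re.search(r"foobar", subject)
case_insensitive = re.search(r"foobar", subject, re.IGNORECASE)

print(case_sensitive)            # None: the case differs
print(case_insensitive.group())  # FooBar
```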
MetaCharacters :
- * : any number of occurrences
- + : at least once
- ? : once or not at all
- {x,y} : occurrences between x and y
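The quantifiers above can be tried out with a small sketch. It uses Python's re module as a stand-in for PCRE (these four quantifiers behave the same way in both); the helper name `matches` and the test strings are made up for the example.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches the whole text."""
    return re.fullmatch(pattern, text) is not None

# * : any number of occurrences (including none)
star = [matches(r"ab*", s) for s in ("a", "ab", "abbb")]
# + : at least once
plus = [matches(r"ab+", s) for s in ("a", "ab", "abbb")]
# ? : once or not at all
optional = [matches(r"ab?", s) for s in ("a", "ab", "abbb")]
# {2,3} : between 2 and 3 occurrences
bounded = [matches(r"ab{2,3}", s) for s in ("ab", "abb", "abbbb")]

print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(optional)  # [True, True, False]
print(bounded)   # [False, True, False]
```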
Character classes
- . : matches any character EXCEPT a newline; with the s modifier, for example (The.Point)s, it also matches newlines
- character classes : [abcdef]+ matches any character listed in the brackets, one or several times; ranges : [a-cd-f]+ ; most of the metacharacters lose their meaning inside brackets, except - for the range
- [^abcef]+ : negates the class; the newline is part of the negated class, so it is matched too; to exclude the newline : [^\n]+
- predefined character classes : \d (digit), \s (any whitespace); the capital letters negate : \D is everything but a digit
- (something$)D : with the D modifier, $ matches only at the very end of the subject, so no newline is tolerated at the end
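A minimal sketch of the dot and of character classes, again in Python rather than PCRE: the s modifier is assumed to map to re.DOTALL, and all subject strings are invented for the example.

```python
import re

text = "The\nPoint"  # hypothetical subject containing a newline

# . does not match a newline unless the s modifier (re.DOTALL) is set.
dot_default = re.search(r"The.Point", text)
dot_all = re.search(r"The.Point", text, re.DOTALL)

# A bracket class with two ranges: any of a-c or d-f, one or more times.
letters = re.findall(r"[a-cd-f]+", "abc xyz fed")

# A negated class matches the newline too, unless \n is listed in it.
negated = re.findall(r"[^a-f]+", "ab\ncd")
no_newline = re.findall(r"[^\n]+", "ab\ncd")

# Predefined classes and their negated capital forms.
digits = re.findall(r"\d+", "room 42, floor 7")
non_digits = re.findall(r"\D+", "42abc7")

print(dot_default is None, dot_all is not None)  # True True
print(letters)                # ['abc', 'fed']
print(negated, no_newline)    # ['\n'] ['ab', 'cd']
print(digits, non_digits)     # ['42', '7'] ['abc']
```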
Alternatives
- Logical OR : Open|Source : matches either Open or Source, whichever is found first in the subject
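To illustrate "first found": in the sketch below (Python's re module, with an invented subject string), the alternative that occurs earliest in the subject wins, regardless of the order in which the alternatives are written.

```python
import re

subject = "Source code is Open"  # hypothetical subject string

# The search scans left to right, so "Source" is found before "Open".
first = re.search(r"Open|Source", subject)
all_hits = re.findall(r"Open|Source", subject)

print(first.group())  # Source
print(all_hits)       # ['Source', 'Open']
```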
Escaping
- put a \ in front of a special character to make it match literally instead of acting as a metacharacter
- be careful: depending on your programming language, \ also has a meaning inside string literals, so doubled backslashes such as \\n become common…
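A short sketch of both kinds of escaping, assuming Python: the regex-level backslash makes a metacharacter literal, and the language-level backslash has to be doubled in an ordinary string literal (Python's raw strings avoid that; other languages have their own rules). The sample strings are made up.

```python
import re

# An unescaped dot matches any character; an escaped one only a literal dot.
unescaped = re.findall(r"1.5", "1.5 and 1x5")
escaped = re.findall(r"1\.5", "1.5 and 1x5")

# In a normal string literal the backslash must be doubled;
# a raw string (r"...") passes it through unchanged.
same_pattern = "1\\.5" == r"1\.5"

# re.escape adds the backslashes needed to match a string literally.
literal = re.escape("a+b")
literal_match = re.fullmatch(literal, "a+b") is not None

print(unescaped)      # ['1.5', '1x5']
print(escaped)        # ['1.5']
print(same_pattern)   # True
print(literal_match)  # True
```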
Anchors
- ^ beginning of the subject
- $ end of the subject
- (^abcdef$)m : the m modifier enables multiline mode, where ^ and $ also match at the start and end of each line
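The anchors can be sketched as follows in Python, assuming the m modifier maps to re.MULTILINE. Python has no D modifier, so \Z (end of string only) is used here as a rough substitute to show the "no trailing newline" behaviour. The subject strings are invented.

```python
import re

text = "first\nsecond"  # hypothetical two-line subject

# Default: ^ and $ anchor to the whole subject, so nothing matches here.
whole = re.findall(r"^\w+$", text)
# m modifier (re.MULTILINE): ^ and $ also anchor at each line.
per_line = re.findall(r"^\w+$", text, re.MULTILINE)

# $ tolerates one trailing newline; \Z (standing in for D) does not.
dollar = re.search(r"abc$", "abc\n") is not None
strict = re.search(r"abc\Z", "abc\n") is not None

print(whole, per_line)  # [] ['first', 'second']
print(dollar, strict)   # True False
```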
Sub-patterns
- ((abc)(def)) : parentheses capture parts of the match for extraction (the outer pair acts as the delimiters here) : 1 -> abc and 2 -> def
- named sub-pattern : (?P<name>...) syntax
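A minimal sketch of numbered and named sub-patterns in Python, whose re module accepts the same (?P<name>...) syntax. The group names `year` and `month` and the subject strings are made up for the example.

```python
import re

# Numbered sub-patterns: group 0 is the whole match, then 1, 2, ...
m = re.search(r"(abc)(def)", "xxabcdefxx")
numbered = (m.group(0), m.group(1), m.group(2))

# Named sub-patterns: the parts can be fetched by name instead of position.
named = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2012-03")
by_name = named.groupdict()

print(numbered)  # ('abcdef', 'abc', 'def')
print(by_name)   # {'year': '2012', 'month': '03'}
```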
Readability
- x : Extend your pattern’s legibility by permitting whitespace and comments.
- everything from an unescaped # to the end of the line is ignored, so you can use it to comment
- whitespace is ignored unless it is escaped
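The x modifier can be sketched in Python, assuming it maps to re.VERBOSE. The date-like pattern and the subject string are invented purely to show comments, ignored whitespace, and an escaped space.

```python
import re

# Verbose mode: layout whitespace and # comments are dropped from the pattern.
date = re.compile(r"""
    (\d{4})    # year
    -          # literal dash
    (\d{2})    # month
    \ release  # escaped space: matched literally, then the word
""", re.VERBOSE)

m = date.search("2012-03 release")
parts = m.groups()

# Unescaped spaces inside the pattern are ignored entirely.
spaces_ignored = re.fullmatch(r"a b c", "abc", re.VERBOSE) is not None

print(parts)           # ('2012', '03')
print(spaces_ignored)  # True
```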