Regex memo

28 May 2018

Written by Paulo

Filed under: Bash, Linux, Miscellaneous Tags: regex

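Memo of the 'character classes' usable in vim, sed, grep and probably many other tools. Python's standard re module does not recognise these POSIX classes (the third-party regex module does).
Note: to use the expressions below, they must be enclosed in an additional pair of [].
e.g. [[:digit:]]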

‘[:alnum:]’
    Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’.

‘[:alpha:]’
    Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[A-Za-z]’.

‘[:blank:]’
    Blank characters: space and tab.

‘[:cntrl:]’
    Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In other character sets, these are the equivalent characters, if any.

‘[:digit:]’
    Digits: 0 1 2 3 4 5 6 7 8 9.

‘[:graph:]’
    Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.

‘[:lower:]’
    Lower-case letters; in the ‘C’ locale and ASCII character encoding, this is a b c d e f g h i j k l m n o p q r s t u v w x y z.

‘[:print:]’
    Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.

‘[:punct:]’
    Punctuation characters; in the ‘C’ locale and ASCII character encoding, this is ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.

‘[:space:]’
    Space characters: in the ‘C’ locale, this is tab, newline, vertical tab, form feed, carriage return, and space.

‘[:upper:]’
    Upper-case letters: in the ‘C’ locale and ASCII character encoding, this is A B C D E F G H I J K L M N O P Q R S T U V W X Y Z.

‘[:xdigit:]’
    Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.
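
Since Python's standard re module does not understand the [[:name:]] syntax, the classes above have to be spelled out. The following is a minimal sketch, assuming the 'C' locale and ASCII only; the table POSIX_ASCII and the helper posix_class are hypothetical names introduced for illustration, not part of any library.

```python
import re
import string

# Hypothetical table: ASCII ('C' locale) equivalents of some POSIX classes,
# written as the inside of a bracket expression.
POSIX_ASCII = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "blank": " \\t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
    "space": " \\t\\n\\v\\f\\r",
    # string.punctuation is the same ASCII list as [:punct:] above;
    # re.escape keeps characters such as ] ^ - \ literal inside the brackets.
    "punct": re.escape(string.punctuation),
}


def posix_class(name, negate=False):
    """Return a Python bracket expression standing in for [[:name:]]."""
    return "[" + ("^" if negate else "") + POSIX_ASCII[name] + "]"


# sed/vim: [[:digit:]]+ (or \+)   Python: [0-9]+
print(re.findall(posix_class("digit") + "+", "eth0.100:1"))
# sed/vim: [[:punct:]]            Python: explicit escaped list
print(re.findall(posix_class("punct"), "eth0.100:1"))
```

Outside ASCII the real POSIX classes depend on the locale, which this table does not attempt to reproduce.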

Source: https://www.gnu.org/software/sed/manual/html_node/Character-Classes-and-Bracket-Expressions.html

Vim metacharacters and quantifiers (source: http://vimregex.com/):

Pattern	Matches	Pattern	Matches
.	any character except newline
\s	whitespace character	\S	non-whitespace character
\d	digit	\D	non-digit
\x	hex digit	\X	non-hex digit
\o	octal digit	\O	non-octal digit
\h	head of word character (a,b,c...z,A,B,C...Z and _)	\H	non-head of word character
\p	printable character	\P	like \p, but excluding digits
\w	word character	\W	non-word character
\a	alphabetic character	\A	non-alphabetic character
\l	lowercase character	\L	non-lowercase character
\u	uppercase character	\U	non-uppercase character

Quantifier	Description
*	matches 0 or more of the preceding characters, ranges or metacharacters; .* matches everything, including an empty line
\+	matches 1 or more of the preceding characters...
\=	matches 0 or 1 of the preceding characters...
\{n,m}	matches from n to m of the preceding characters...
\{n}	matches exactly n of the preceding characters...
\{,m}	matches at most m (from 0 to m) of the preceding characters...
\{n,}	matches at least n of the preceding characters...
where n and m are non-negative integers (>= 0)
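
For comparison, the Vim quantifiers above map onto Python's re syntax by dropping the backslashes (\+ becomes +, \= becomes ?, \{n,m} becomes {n,m}). A small sketch follows; the sample string and the helper whole_words are made up for illustration.

```python
import re

words = "a ab abb abbb abbbb"

# Vim pattern on the left, an equivalent Python pattern on the right.
EQUIVALENTS = {
    r"ab*": r"ab*",            # 0 or more b
    r"ab\+": r"ab+",           # 1 or more b
    r"ab\=": r"ab?",           # 0 or 1 b
    r"ab\{2,3}": r"ab{2,3}",   # from 2 to 3 b
    r"ab\{2}": r"ab{2}",       # exactly 2 b
    r"ab\{,2}": r"ab{,2}",     # at most 2 b
    r"ab\{3,}": r"ab{3,}",     # at least 3 b
}


def whole_words(pattern):
    """Return the sample words that `pattern` matches in full."""
    return [w for w in words.split() if re.fullmatch(pattern, w)]


for vim, py in EQUIVALENTS.items():
    print(f"{vim:10} {py:10} {whole_words(py)}")
```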

Quantifier	Description
\{-}	matches 0 or more of the preceding atom, as few as possible
\{-n,m}	matches from n to m of the preceding characters, as few as possible
\{-n,}	matches at least n of the preceding characters, as few as possible
\{-,m}	matches from 0 to m of the preceding characters, as few as possible
where n and m are non-negative integers (>= 0)
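
In Python's re the "as few as possible" variants are written by appending ? to the quantifier (\{-} corresponds to *?, \{-n,m} to {n,m}?). A short sketch with made-up sample strings shows the difference between greedy and non-greedy matching:

```python
import re

markup = "<b>bold</b> and <i>italic</i>"

# Vim: <.*>      Python: <.*>     greedy, runs up to the last '>'
greedy = re.findall(r"<.*>", markup)

# Vim: <.\{-}>   Python: <.*?>    non-greedy, stops at the first '>'
lazy = re.findall(r"<.*?>", markup)

# Vim: \d\{2,4}  Python: \d{2,4}  takes as many digits as allowed
longest = re.search(r"\d{2,4}", "123456").group()

# Vim: \d\{-2,4} Python: \d{2,4}? takes as few digits as allowed
shortest = re.search(r"\d{2,4}?", "123456").group()

print(greedy)
print(lazy)
print(longest, shortest)
```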

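Python, handling groups:

import re

# define the named groups: device, tag and alias
# (raw strings keep the backslash in \. from being read as a string escape)
RE_IFACE = re.compile(r'^(?P<device>[^.]+)'
                      r'(\.(?P<tag>[0-9]+v?))?'
                      r'(:(?P<alias>.+))?$')

# sample value (an assumption: the original snippet leaves iface undefined)
iface = 'eth0.100:1'

m = RE_IFACE.match(iface)
if m:
    # access the content of each group
    print(m.groups())
    device, tag, alias = m.group('device', 'tag', 'alias')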
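
To see how the optional groups behave, here is a small sketch built on the same pattern. The helper parse_iface and the sample interface names are hypothetical; groupdict() is the standard match method that returns the named groups as a dictionary, with None for groups that did not take part in the match.

```python
import re

RE_IFACE = re.compile(r'^(?P<device>[^.]+)'
                      r'(\.(?P<tag>[0-9]+v?))?'
                      r'(:(?P<alias>.+))?$')


def parse_iface(name):
    """Return the named groups as a dict, or None if the name does not match."""
    m = RE_IFACE.match(name)
    return m.groupdict() if m else None


for name in ("eth0", "eth0.100", "eth0.100:1", "eth0:1"):
    print(name, "->", parse_iface(name))
```

One thing worth noticing: because device is defined as [^.]+, a name such as "eth0:1" (alias but no tag) is captured entirely as the device, with alias left as None. If that is not the intent, excluding ':' from the device class ([^.:]+) would be one possible adjustment.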