Regex memo

Written by Paulo
Filed under: Bash, Linux, Miscellaneous. Tags: regex

Memo on the character classes usable in vim, sed and probably many other tools. Note that Python's standard re module does not support this syntax.
Caution: to use the expressions below, they must be enclosed in an additional pair of [].
e.g. [[:digit:]]

‘[:alnum:]’
    Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’.

‘[:alpha:]’
    Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[A-Za-z]’.

‘[:blank:]’
    Blank characters: space and tab.

‘[:cntrl:]’
    Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In other character sets, these are the equivalent characters, if any.

‘[:digit:]’
    Digits: 0 1 2 3 4 5 6 7 8 9.

‘[:graph:]’
    Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.

‘[:lower:]’
    Lower-case letters; in the ‘C’ locale and ASCII character encoding, this is a b c d e f g h i j k l m n o p q r s t u v w x y z.

‘[:print:]’
    Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.

‘[:punct:]’
    Punctuation characters; in the ‘C’ locale and ASCII character encoding, this is ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.

‘[:space:]’
    Space characters: in the ‘C’ locale, this is tab, newline, vertical tab, form feed, carriage return, and space.

‘[:upper:]’
    Upper-case letters: in the ‘C’ locale and ASCII character encoding, this is A B C D E F G H I J K L M N O P Q R S T U V W X Y Z.

‘[:xdigit:]’
    Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.

Sources: https://www.gnu.org/software/sed/manual/html_node/Character-Classes-and-Bracket-Expressions.html
         http://vimregex.com/
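
Since Python's standard re module does not understand the [:name:] syntax, one rough workaround is to substitute the ASCII equivalents listed above before compiling the pattern. The following is only a sketch, assuming the 'C' locale and ASCII input; POSIX_ASCII and expand_posix are hypothetical names chosen for this illustration and are not part of any library.

```python
import re

# ASCII ('C' locale) equivalents of some POSIX classes, taken from the list above.
# Each value is meant to sit inside a bracket expression.
# Hypothetical helper table, not part of any library.
POSIX_ASCII = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "blank": " \t",
    "digit": "0-9",
    "lower": "a-z",
    "space": " \t\n\v\f\r",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def expand_posix(pattern):
    """Replace each [:name:] in pattern with its ASCII equivalent.

    Raises KeyError for a class name that is not in POSIX_ASCII.
    """
    def repl(match):
        return POSIX_ASCII[match.group(1)]
    return re.sub(r"\[:([a-z]+):\]", repl, pattern)

pattern = expand_posix("[[:alpha:]_][[:alnum:]_]*")
print(pattern)                               # [A-Za-z_][0-9A-Za-z_]*
print(bool(re.fullmatch(pattern, "eth0")))   # True
print(bool(re.fullmatch(pattern, "0eth")))   # False
```

The substitution is purely textual, so it only makes sense for patterns where [:name:] appears inside a bracket expression, as in [[:digit:]].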

 

#     Matching
.     any character except newline
\s    whitespace character
\S    non-whitespace character
\d    digit
\D    non-digit
\x    hex digit
\X    non-hex digit
\o    octal digit
\O    non-octal digit
\h    head of word character (a,b,c...z,A,B,C...Z and _)
\H    non-head of word character
\p    printable character
\P    like \p, but excluding digits
\w    word character
\W    non-word character
\a    alphabetic character
\A    non-alphabetic character
\l    lowercase character
\L    non-lowercase character
\u    uppercase character
\U    non-uppercase character
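
Several of these shorthands (\x, \o, \h, \a, \l, \u) come from Vim and mean something else in Python's re module, where for instance \A anchors the start of the string and \a is the bell character. In Python they can be spelled out as bracket expressions. A minimal sketch, assuming ASCII text; VIM_TO_PYTHON is a hypothetical name used only for this illustration.

```python
import re

# ASCII bracket-expression equivalents of some Vim shorthands from the table above.
# Hypothetical lookup table, not part of any library.
VIM_TO_PYTHON = {
    r"\x": "[0-9A-Fa-f]",   # hex digit
    r"\o": "[0-7]",         # octal digit
    r"\h": "[A-Za-z_]",     # head of word character
    r"\a": "[A-Za-z]",      # alphabetic character
    r"\l": "[a-z]",         # lowercase character
    r"\u": "[A-Z]",         # uppercase character
}

# An identifier written \h\w* in Vim; the same idea in Python:
identifier = re.compile(VIM_TO_PYTHON[r"\h"] + r"\w*")

print(bool(identifier.fullmatch("_tmp1")))  # True
print(bool(identifier.fullmatch("1tmp")))   # False
```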

 

Quantifier   Description
*            matches 0 or more of the preceding characters, ranges or metacharacters; .* matches everything, including an empty line
\+           matches 1 or more of the preceding characters...
\=           matches 0 or 1 of the preceding characters...
\{n,m}       matches from n to m of the preceding characters...
\{n}         matches exactly n of the preceding characters...
\{,m}        matches at most m (from 0 to m) of the preceding characters...
\{n,}        matches at least n of the preceding characters...

where n and m are positive integers (>0)

Quantifier   Description
\{-}         matches 0 or more of the preceding atom, as few as possible
\{-n,m}      matches from n to m of the preceding characters, as few as possible
\{-n,}       matches at least n of the preceding characters, as few as possible
\{-,m}       matches at most m (from 0 to m) of the preceding characters, as few as possible

where n and m are positive integers (>0)
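
The quantifiers above use Vim syntax. Python's re module writes them without backslashes (+, ?, {n,m}) and makes a quantifier non-greedy by appending ? to it (*?, {n,m}?). The sketch below compares the two behaviours on a made-up sample string; the Vim patterns in the comments are the approximate counterparts.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy: .* takes as much as possible (Vim: <b>.*</b>)
greedy = re.search(r"<b>.*</b>", text).group(0)

# Non-greedy: .*? takes as little as possible (Vim: <b>.\{-}</b>)
lazy = re.search(r"<b>.*?</b>", text).group(0)

print(greedy)  # <b>one</b> and <b>two</b>
print(lazy)    # <b>one</b>

# Bounded counts: {2,3} is greedy, {2,3}? is non-greedy
# (Vim: a\{2,3} and a\{-2,3})
most = re.match(r"a{2,3}", "aaaa").group(0)
least = re.match(r"a{2,3}?", "aaaa").group(0)

print(most)   # aaa
print(least)  # aa
```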


Python, working with groups:

import re

# Define three named groups: device, tag and alias
RE_IFACE = re.compile(r'^(?P<device>[^.]+)'
                      r'(\.(?P<tag>[0-9]+v?))?'
                      r'(:(?P<alias>.+))?$')

iface = 'eth0.100:1'  # sample interface name, added for illustration

m = RE_IFACE.match(iface)
if m:
    # Access the content of each group
    print(m.groups())
    device, tag, alias = m.group('device', 'tag', 'alias')
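
As a follow-up to the snippet above, named groups can also be retrieved all at once with groupdict(), which leaves out the unnamed helper groups. This is only a sketch: parse_iface is a hypothetical helper and the interface names are made-up samples.

```python
import re

# Same pattern as above, written with raw strings.
RE_IFACE = re.compile(r'^(?P<device>[^.]+)'
                      r'(\.(?P<tag>[0-9]+v?))?'
                      r'(:(?P<alias>.+))?$')

def parse_iface(iface):
    """Return a dict of the named groups, or None if iface does not match."""
    m = RE_IFACE.match(iface)
    return m.groupdict() if m else None

for name in ("eth0", "eth0.100", "eth0.100:1"):
    print(name, parse_iface(name))

# Groups that did not take part in the match are reported as None.
print(parse_iface("eth0"))  # {'device': 'eth0', 'tag': None, 'alias': None}
```

One caveat with this pattern: because [^.]+ also accepts ':', a name with an alias but no tag, such as eth0:1, is captured entirely by the device group.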

 
