Using the Python re Module | realyee's blog

Two ways to use it

Use the re module directly to produce a Match object

import re
test = """
hello 12 world
Goodbye 224 world
h yet anothet 'h'
"""
pattern = r"\d{1,}"
match_obj = re.search(pattern, test)
print(match_obj)
print(match_obj.group())

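Compile a Pattern object, then use it to produce a Match object

First compile the pattern string into a Pattern object with re.compile, then call that object's match, search, findall, or finditer methods. Their arguments are those of the module-level re functions minus the pattern string; any flags are passed to re.compile instead.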

import re
test = """
hello 12 world
Goodbye 224 world
h yet anothet 'h'
"""
pattern = r"\d{1,}"
pattern_obj = re.compile(pattern)
match_obj = pattern_obj.search(test)
print(match_obj)
print(match_obj.group())

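Usage

findall returns a list, and a list has no group() method.

Note: match, search, and similar functions return None when nothing matches, so check the result before using it. Calling Match methods on None raises an AttributeError.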
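A minimal sketch of the None check described above. The helper name first_number and the sample strings are made up for illustration.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match_obj = re.search(r"\d+", text)
    if match_obj is None:
        # No match: calling match_obj.group() here would raise AttributeError.
        return None
    return match_obj.group()

print(first_number("hello 12 world"))  # 12
print(first_number("no digits here"))  # None
```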

Methods of the Match object

start(): Find the starting index position of a group
end(): Find the ending index position of a group (the index just past its last character)
group(): Retrieve the value of a group by number or name
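As a small illustration of these three methods, the sketch below searches a sample string with a named group. The group name number is an assumption chosen for the example.

```python
import re

text = "Goodbye 224 world"
match_obj = re.search(r"(?P<number>\d+)", text)

if match_obj is not None:
    print(match_obj.group())          # whole match: 224
    print(match_obj.group(1))         # group by number: 224
    print(match_obj.group("number"))  # group by name: 224
    print(match_obj.start())          # 8
    print(match_obj.end())            # 11
    # Slicing with start() and end() recovers the matched text.
    print(text[match_obj.start():match_obj.end()])  # 224
```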

Find the first match

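match, search: return only the first match that satisfies the pattern. match additionally requires the match to begin at the start of the string.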

Find all Matches

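findall and finditer

findall: finds all matches in a single call. It returns a list of matches, which can be printed directly.

finditer: iterates through matches one by one. It returns an iterator; loop over it with for and treat each item as a Match object.

finditer addresses two drawbacks of findall:

On large inputs the search takes a long time and the resulting list can be very large.
The position of each match is unknown.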
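The difference can be sketched as follows, using a shortened version of the earlier sample text. The variable names are chosen for the example only.

```python
import re

text = "hello 12 world\nGoodbye 224 world"
pattern_obj = re.compile(r"\d+")

# findall builds the complete list of matched strings in one call.
all_numbers = pattern_obj.findall(text)
print(all_numbers)  # ['12', '224']

# finditer yields Match objects one at a time, so each position is available.
positions = []
for match_obj in pattern_obj.finditer(text):
    positions.append((match_obj.group(), match_obj.start(), match_obj.end()))
print(positions)  # [('12', 6, 8), ('224', 23, 26)]
```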

Find and Replace Text

Python's re module provides re.sub for replacing matches in a string.

Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

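Parameters:

pattern : the regular expression pattern string.
repl : the replacement, either a plain string (which may reference groups) or a function. Notice: it is not a regex pattern; it can only reference groups, written as \g followed by a group number or name, e.g. \g<1> or \g<name> (g stands for group).
string : the original string to search and replace in.
count : the maximum number of replacements; the default 0 replaces all matches.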

import re

def substitution_example():
    # Capture the amount as the named group "value", followed by "dollar(s)".
    pattern = r"(?P<value>\d+(,\d{3})*(\.\d{2})?)\s+dollar(s)?"

    # \g<value> references the named group; this string is not a regex.
    replacement_pattern = r'**USD \g<value>**'

    text = \
'''Widget Unit cost: 12,000.56 dollars
Taxes: 234.00 dollars
Total: 12,234.56 dollars'''
    print('Pattern: {0}'.format(pattern))
    print('---Text:\n{0}'.format(text))
    # Replace every match with the formatted amount.
    new_text = re.sub(pattern, replacement_pattern, text)

    print('---New Text:\n{0}'.format(new_text))

substitution_example()
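The repl and count parameters can also be exercised with a function instead of a string. The sketch below is only an illustration: the helper to_usd and the one-line sample text are assumptions, not part of the original example.

```python
import re

text = "Taxes: 234.00 dollars, Total: 12,234.56 dollars"
pattern = r"(?P<value>\d+(,\d{3})*(\.\d{2})?)\s+dollars?"

def to_usd(match_obj):
    # Hypothetical helper: re.sub calls it once per match and
    # uses the returned string as the replacement.
    return "USD " + match_obj.group("value")

replaced_all = re.sub(pattern, to_usd, text)
print(replaced_all)    # Taxes: USD 234.00, Total: USD 12,234.56

# count=1 stops after the first replacement.
replaced_first = re.sub(pattern, to_usd, text, count=1)
print(replaced_first)  # Taxes: USD 234.00, Total: 12,234.56 dollars
```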

inline option

[python re inline option](https://docs.python.org/3/library/re.html#:~:text=currently%20supported%20extensions.-,(%3FaiLmsux),-(One%20or%20more)

(?aiLmsux)

The re module accepts options either through the flags argument or directly as inline options inside the pattern.

i: ignore case
a: ASCII-only matching

For example:

(?i)d
Matches both the D and the d in:
Don't d
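A short sketch comparing the two ways of enabling case-insensitive matching on this text. The variable names are chosen for illustration.

```python
import re

text = "Don't d"

# Inline option at the start of the pattern.
inline = re.findall(r"(?i)d", text)
# The same option passed through the flags argument.
flagged = re.findall(r"d", text, flags=re.IGNORECASE)
# Without the option only the lowercase d matches.
case_sensitive = re.findall(r"d", text)

print(inline)          # ['D', 'd']
print(flagged)         # ['D', 'd']
print(case_sensitive)  # ['d']
```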

Single Characters

Negate the whole range with ^

Problem: Find all occurrences of characters NOT in (a,b,c,d,x,y,z,0,1,2,3)
Pattern: [^a-dx-z0-3]
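To see the negated ranges at work, here is a minimal sketch. The sample text is made up for the example.

```python
import re

text = "abcxyz0123hello789"

# Every character outside a-d, x-z, and 0-3 is matched.
outside = re.findall(r"[^a-dx-z0-3]", text)
print("".join(outside))  # hello789
```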

Escape Character

Problem: Find all occurrences of . (dot)
Pattern: \.
Text: This. is. a text
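A brief sketch of why the backslash matters here: unescaped, the dot matches any character except a newline.

```python
import re

text = "This. is. a text"

escaped = re.findall(r"\.", text)   # literal dots only
unescaped = re.findall(r".", text)  # every character in this text

print(escaped)         # ['.', '.']
print(len(unescaped) == len(text))  # True
```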

Control characters (tab, newline, carriage return, and so forth)

Problem: Find all occurrences of tab
Pattern: \t
Text: One tab. Two tabs
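A minimal sketch of matching tabs. It assumes the sample text contains real tab characters, written as \t in the Python string, and that \t is the intended pattern.

```python
import re

# Assumed sample: one tab after "One", two tabs after "Two".
text = "One\ttab. Two\t\ttabs"

tabs = re.findall(r"\t", text)
tab_positions = [m.start() for m in re.finditer(r"\t", text)]

print(len(tabs))      # 3
print(tab_positions)  # [3, 12, 13]
```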

Set-Negation

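Problem: Find all occurrences of characters that are NOT vowels
Pattern: [^aeiou]
Text: this is ^ a big test

[^set]: not in that set. Once ^ appears at the start, the whole set is negated; a ^ anywhere else in the set is just a literal character.

For example: [^ae^iou] applied to Tig matches T and g.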
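The two patterns can be compared with a short sketch; the variable names are chosen for illustration.

```python
import re

text = "this is ^ a big test"

# Leading ^ negates the set: everything except lowercase vowels matches,
# including spaces and the literal ^ in the text.
non_vowels = re.findall(r"[^aeiou]", text)

# A second ^ inside the set is a literal, so ^ itself is excluded too.
with_caret = re.findall(r"[^ae^iou]", "Tig")
caret_only = re.findall(r"[^ae^iou]", "^")

print(with_caret)  # ['T', 'g']
print(caret_only)  # []
```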