Incrementally finding regular expression matches in streaming data in Python

I have data streaming into a number of TCP sockets continuously. For each, I have a different regular expression that I need to pull out matches for. For example, one might match numbers of the format ##.# followed by the letter f:
    r = re.compile(rb'([0-9][0-9]\.[0-9])f')

Another might match numbers of the format ### preceded by the letter Q:
    r = re.compile(rb'Q([0-9][0-9][0-9])')

In reality, the expressions may be of arbitrary length and complexity. They are pulled from configuration files, are not known in advance, and are not hard-coded.
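To make the setup concrete, here is a minimal sketch of compiling such patterns at run time. The dictionary CONFIGURED_PATTERNS, the helper compile_patterns, and the sample data are hypothetical stand-ins for whatever the configuration files actually contain; the only point is that the pattern text is encoded to bytes before compiling, so it can be applied to socket data.

```python
import re

# Hypothetical stand-in for patterns read from a configuration file:
# stream name -> pattern text (the two examples from the question).
CONFIGURED_PATTERNS = {
    "temperature": r"([0-9][0-9]\.[0-9])f",
    "q_code": r"Q([0-9][0-9][0-9])",
}


def compile_patterns(configured):
    """Compile each configured pattern as a bytes pattern for socket data."""
    return {
        name: re.compile(text.encode("ascii"))
        for name, text in configured.items()
    }


patterns = compile_patterns(CONFIGURED_PATTERNS)

# Hypothetical sample of received data.
sample = bytearray(b"noise 72.5f more Q123 tail")
temperature = patterns["temperature"].search(sample)
q_code = patterns["q_code"].search(sample)

# bytes() normalizes the captured group regardless of the buffer type.
print(bytes(temperature.group(1)))  # b'72.5'
print(bytes(q_code.group(1)))       # b'123'
```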
When new data comes in, I append it to a buffer of type bytearray() (here called self.buffer). Then I call a function like this (with self.r being the compiled regular expression):
    def advance(self):
        m = self.r.search(self.buffer)

        # No match. Return.
        if m is None:
            return None

        # Match. Advance the buffer and return the matched groups.
        self.buffer = self.buffer[m.end():]
        return m.groups()

If there is no match yet, it returns None. If there is a match, it returns the matched groups and discards the buffer up to the end of the match, making itself ready to be called again.
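For reference, the approach described above can be exercised end to end with a small self-contained sketch. The class name StreamMatcher, the feed method, and the chunked sample data are assumptions made for illustration; only advance mirrors the function shown in the question.

```python
import re


class StreamMatcher:
    """Illustrative wrapper (hypothetical name) holding one pattern and one buffer."""

    def __init__(self, pattern):
        self.r = re.compile(pattern)
        self.buffer = bytearray()

    def feed(self, data):
        """Append newly received bytes to the buffer (hypothetical helper)."""
        self.buffer += data

    def advance(self):
        m = self.r.search(self.buffer)

        # No match. Return.
        if m is None:
            return None

        # Match. Advance the buffer and return the matched groups.
        self.buffer = self.buffer[m.end():]
        return m.groups()


matcher = StreamMatcher(rb"([0-9][0-9]\.[0-9])f")
results = []

# Simulated socket reads; the first match is split across two chunks.
for chunk in (b"abc 72", b".5f and 68.", b"0f tail"):
    matcher.feed(chunk)
    # Drain every match currently available in the buffer.
    while True:
        groups = matcher.advance()
        if groups is None:
            break
        results.append(groups)
```

After the loop, results holds one tuple of groups per match, in arrival order, and the buffer retains only the bytes that follow the last match. Calling advance in a loop after each read matters because a single chunk may complete more than one match.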