Java Regular Expression Tutorial with Examples

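When I started my Java career, regular expressions were a nightmare for me.

This tutorial aims to help you master Java regular expressions, and writing it is also a chance for me to brush up on them.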

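What Is a Regular Expression?
A regular expression defines a pattern for strings.

Regular expressions can be used to search, edit, or otherwise manipulate text.

Regular expressions are not specific to any one language, but their syntax differs slightly from language to language.

Java regular expressions are most similar to Perl's.

The Java regular expression classes live in the java.util.regex package, which includes three classes: Pattern, Matcher, and PatternSyntaxException.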

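1. A Pattern object is the compiled version of a regular expression.

It has no public constructor; we create a Pattern object by passing a regular expression to the public static method compile.

2. A Matcher is the regex engine object that matches an input string against the Pattern object that created it.

This class has no public constructor either; we obtain a Matcher object by calling the matcher method of a Pattern object with the input string as the argument.

We then call the matches method, which returns a boolean indicating whether the input string matches the regular expression.

3. A PatternSyntaxException is thrown if the syntax of the regular expression is incorrect.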

Let's see how these classes are used in a simple example.

package com.journaldev.util;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternExample {
    // The regex ".xx." and the input "MxxY" are sample values; any matching pair prints true.
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(".xx.");
        Matcher matcher = pattern.matcher("MxxY");
        System.out.println("Input String matches regex - " + matcher.matches());
        // bad regular expression: the leading '*' has nothing to repeat
        pattern = Pattern.compile("*xx*");
    }

}
The output of the above program is:
Input String matches regex - true
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*xx*
^
    at java.util.regex.Pattern.error(Pattern.java:1924)
    at java.util.regex.Pattern.sequence(Pattern.java:2090)
    at java.util.regex.Pattern.expr(Pattern.java:1964)
    at java.util.regex.Pattern.compile(Pattern.java:1665)
    at java.util.regex.Pattern.<init>(Pattern.java:1337)
    at java.util.regex.Pattern.compile(Pattern.java:1022)
    at com.journaldev.util.PatternExample.main(PatternExample.java:13)
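Since regular expressions always work on strings, Java 1.4 extended the String class with a matches method that tests a string against a pattern.

Internally it uses the Pattern and Matcher classes to do the work, but it obviously reduces the number of lines of code.

The Pattern class also has a static matches method that takes a regular expression and an input string as arguments and returns a boolean result.

The following code matches an input string against a regular expression.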

String str = "bbb";
System.out.println("Using String matches method: " + str.matches(".bb"));
System.out.println("Using Pattern matches method: " + Pattern.matches(".bb", str));
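So if all you need is to check whether an input string matches a pattern, you can save time by calling the String matches method.

You need the Pattern and Matcher classes only when you want to manipulate the input string or reuse the pattern.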
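As a rough sketch of the "reuse the pattern" case, the snippet below compiles one pattern once and applies it to several inputs. The class name PatternReuseSketch, the helper countMatching, and the sample inputs are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PatternReuseSketch {
    // Compiled once and shared; calling String.matches(".bb") instead
    // would compile the regex again on every call.
    private static final Pattern ANY_CHAR_THEN_BB = Pattern.compile(".bb");

    // Counts how many inputs match the pattern in their entirety.
    static int countMatching(List<String> inputs) {
        int count = 0;
        for (String input : inputs) {
            if (ANY_CHAR_THEN_BB.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "bbb" and "abb" match; "bb" is too short and "abc" ends in the wrong letter.
        System.out.println(countMatching(Arrays.asList("bbb", "abb", "bb", "abc"))); // 2
    }
}
```

For a one-off check the String matches method is still the shorter option; the compiled Pattern starts to pay off when the same regex is applied many times.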

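Note that a pattern defined by a regular expression is applied from left to right, and once a source character has been used in a match, it is not reused.

For example, the regex "121" matches the string "31212142121" only twice, like this: "_121____121".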
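The "121" example can be checked with a small sketch that records where each successive find() call starts. The class name NonOverlappingSketch and the helper matchStarts are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingSketch {
    // Returns the start index of every match found by successive find() calls.
    static List<Integer> matchStarts(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The "121" that would start at index 3 overlaps the first match, so it is skipped.
        System.out.println(matchStarts("121", "31212142121")); // [1, 8]
    }
}
```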

Common Matching Symbols
Each example is a (regex, input) pair followed by the result of matching the whole input against the regex.

Regex       Description and examples
.           Matches any single character (line terminators only when Pattern.DOTALL is set).
            ("..", "a%") - true; ("..", ".a") - true; ("..", "a") - false
^xxx        Matches regex xxx at the beginning.
            ("^a.c.", "abcd") - true; ("^a", "a") - true; ("^a", "ac") - false
xxx$        Matches regex xxx at the end.
            ("..cd$", "abcd") - true; ("a$", "a") - true; ("a$", "aca") - false
[abc]       Matches any one of the letters a, b, or c. [] is known as a character class.
            ("^[abc]d.", "ad9") - true; ("[ab].d$", "bad") - true; ("[ab]x", "cx") - false
[abc][12]   Matches a, b, or c followed by 1 or 2.
            ("[ab][12].", "a2#") - true; ("[ab]..[12]", "acd2") - true; ("[ab][12]", "c2") - false
[^abc]      When ^ is the first character inside [], it negates the class: matches any character except a, b, or c.
            ("[^ab][^12].", "c3#") - true; ("[^ab]..[^12]", "xcd3") - true; ("[^ab][^12]", "c2") - false
[a-e1-8]    Matches any character in the range a to e or 1 to 8.
            ("[a-e1-3].", "d#") - true; ("[a-e1-3]", "2") - true; ("[a-e1-3]", "f2") - false
xx|yy       Matches regex xx or regex yy.
            ("x.|y", "xa") - true; ("x.|y", "y") - true; ("x.|y", "yz") - false
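A few of the rows above can be reproduced with Pattern.matches, which requires the whole input to match. This is only a sketch; the class name CommonSymbolsSketch and the helper fullMatch are names I made up for it.

```java
import java.util.regex.Pattern;

class CommonSymbolsSketch {
    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("..", "a%"));         // true
        System.out.println(fullMatch("^a", "ac"));         // false: the trailing 'c' is left unmatched
        System.out.println(fullMatch("[ab].d$", "bad"));   // true
        System.out.println(fullMatch("[^ab][^12]", "c2")); // false: '2' is excluded by [^12]
        System.out.println(fullMatch("[a-e1-3].", "d#"));  // true
        System.out.println(fullMatch("x.|y", "yz"));       // false: neither alternative covers "yz"
    }
}
```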
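Regex       Description
\d          Any digit, equivalent to [0-9]
\D          Any non-digit, equivalent to [^0-9]
\s          Any whitespace character, equivalent to [ \t\n\x0B\f\r]
\S          Any non-whitespace character, equivalent to [^\s]
\w          Any word character, equivalent to [a-zA-Z_0-9]
\W          Any non-word character, equivalent to [^\w]
\b          A word boundary
\B          A non-word boundary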
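To get a feel for these shorthand classes, here is a minimal sketch that prints the first match of a few of them. Remember that each backslash has to be doubled inside a Java string literal. The class name PredefinedClassesSketch, the helper firstMatch, and the sample input are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassesSketch {
    // Returns the first substring that matches the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "order 66, item_7";
        System.out.println(firstMatch("\\d+", input));        // 66
        System.out.println(firstMatch("\\w+", input));        // order
        System.out.println(firstMatch("\\bitem\\w*", input)); // item_7 (\w covers '_' and digits)
        System.out.println(firstMatch("\\D+", input));        // "order " (letters plus the space)
    }
}
```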
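Java Regular Expression Metacharacters
There are two ways to use a metacharacter as an ordinary character in a regular expression.

1. Precede the metacharacter with a backslash (\).
2. Place the metacharacter between \Q (which starts the quote) and \E (which ends it).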
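Both approaches can be sketched as follows, using the literal text "*xx*" that would otherwise be an invalid regex. The class name EscapingSketch and the helper fullMatch are hypothetical.

```java
import java.util.regex.Pattern;

class EscapingSketch {
    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // 1. Backslash in front of each metacharacter (doubled in a Java string literal).
        System.out.println(fullMatch("\\*xx\\*", "*xx*"));   // true
        // 2. Everything between \Q and \E is taken literally.
        System.out.println(fullMatch("\\Q*xx*\\E", "*xx*")); // true
        // An unescaped dot matches any character; an escaped dot matches only '.'.
        System.out.println(fullMatch("a.c", "abc"));         // true
        System.out.println(fullMatch("a\\.c", "abc"));       // false
        System.out.println(fullMatch("a\\.c", "a.c"));       // true
    }
}
```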
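Regular Expression Quantifiers
A quantifier specifies how many times a character must occur for a match.

Regex       Description
X?          X occurs once or not at all
X*          X occurs zero or more times
X+          X occurs one or more times
X{n}        X occurs exactly n times
X{n,}       X occurs n or more times
X{n,m}      X occurs at least n times but not more than m times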
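Quantifiers can be used with character classes and capturing groups as well.

For example, [abc]+ means a, b, or c occurring one or more times.

(abc)+ means the capturing group "abc" occurring one or more times.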
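The difference between quantifying a character class and quantifying a group shows up quickly in a small test. This is only a sketch; QuantifierSketch and fullMatch are names invented for it, and the inputs are arbitrary.

```java
import java.util.regex.Pattern;

class QuantifierSketch {
    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc]+", "cab"));    // true: any mix of a, b, c
        System.out.println(fullMatch("[abc]+", ""));       // false: + needs at least one character
        System.out.println(fullMatch("(abc)+", "abcabc")); // true: "abc" repeated as a unit
        System.out.println(fullMatch("(abc)+", "cab"));    // false: the letters must appear in the order abc
        System.out.println(fullMatch("a{2,3}", "aaa"));    // true
        System.out.println(fullMatch("a{2,3}", "aaaa"));   // false: more than 3 occurrences
        System.out.println(fullMatch("a?", ""));           // true: ? allows zero occurrences
    }
}
```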

We will discuss capturing groups next.

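Regular Expression Capturing Groups
A capturing group is used to treat multiple characters as a single unit.

You create a group by using ().

The portion of the input string that matches a capturing group is saved in memory and can be recalled using a backreference.

You can use the matcher.groupCount method to find out the number of capturing groups in a regex pattern.

For example, ((a)(bc)) contains three capturing groups: ((a)(bc)), (a), and (bc).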
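The group numbering can be inspected directly. In the sketch below, groups are numbered by the position of their opening parenthesis, and group 0 (the whole match) is not included in groupCount. The class name GroupCountSketch and the helper capturedGroups are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountSketch {
    // Returns the text captured by groups 1..groupCount(), or an empty array if the input does not match.
    static String[] capturedGroups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            return new String[0];
        }
        String[] groups = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            groups[i - 1] = matcher.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(capturedGroups("((a)(bc))", "abc"))); // [abc, a, bc]
    }
}
```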

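You can use a backreference in a regular expression by writing a backslash (\) followed by the number of the group to be recalled.

Capturing groups and backreferences can be confusing, so let's understand them through an example.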

System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); // true
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); // false
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); // true
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); // false
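In the first example, the first capturing group at run time is (\w\d); when matched against the input string "a2a2", it captures "a2" and saves it in memory.

So \1 refers to "a2", and the match returns true.

For the same reason, the second statement prints false: the group captures "a2", but the text that follows is "b2".

Try to work out the third and fourth statements yourself. :)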
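Now let's look at some important methods of the Pattern and Matcher classes.

We can create a Pattern object with flags.

For example, Pattern.CASE_INSENSITIVE enables case-insensitive matching.

The Pattern class also provides a split(CharSequence) method similar to the split method of the String class.

The Pattern class toString() method returns the regular expression string from which the pattern was compiled.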
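As a small sketch of flags and toString(), the snippet below compiles a pattern with Pattern.CASE_INSENSITIVE and then reads the regex and the flag back. The class name PatternInfoSketch, the helper caseInsensitive, and the regex "a*b" are arbitrary choices for this illustration.

```java
import java.util.regex.Pattern;

class PatternInfoSketch {
    // Compiles the regex so that letter case is ignored when matching.
    static Pattern caseInsensitive(String regex) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern pattern = caseInsensitive("a*b");
        System.out.println(pattern.toString());                                // a*b
        System.out.println((pattern.flags() & Pattern.CASE_INSENSITIVE) != 0); // true
        System.out.println(pattern.matcher("AAB").matches());                  // true: case is ignored
        System.out.println(Pattern.matches("a*b", "AAB"));                     // false without the flag
    }
}
```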

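The Matcher class has the index methods start() and end(), which show exactly where in the input string a match was found.

The Matcher class also provides the string manipulation methods replaceAll(String replacement) and replaceFirst(String replacement).

Now let's see how these methods are used in a simple Java class.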

package com.journaldev.util;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExamples {

    public static void main(String[] args) {
        // using pattern with flags
        Pattern pattern = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("ABcabdAb");
        // using Matcher find(), group(), start() and end() methods
        while (matcher.find()) {
            System.out.println("Found the text \"" + matcher.group()
                    + "\" starting at " + matcher.start()
                    + " index and ending at index " + matcher.end());
        }

        // using Pattern split() method
        pattern = Pattern.compile("\\W");
        String[] words = pattern.split("one@two#three:four$five");
        for (String s : words) {
            System.out.println("Split using Pattern.split(): " + s);
        }

        // using Matcher.replaceFirst() and replaceAll() methods
        pattern = Pattern.compile("1*2");
        matcher = pattern.matcher("11234512678");
        System.out.println("Using replaceAll: " + matcher.replaceAll("_"));
        System.out.println("Using replaceFirst: " + matcher.replaceFirst("_"));
    }

}
The output of the above program is:
Found the text "AB" starting at 0 index and ending at index 2
Found the text "ab" starting at 3 index and ending at index 5
Found the text "Ab" starting at 6 index and ending at index 8
Split using Pattern.split(): one
Split using Pattern.split(): two
Split using Pattern.split(): three
Split using Pattern.split(): four
Split using Pattern.split(): five
Using replaceAll: _345_678
Using replaceFirst: _34512678