22. Regular Expressions
|
22.1 Overview
|
JMeter includes the pattern matching software
Apache Jakarta ORO
There is some documentation for this on the Jakarta web-site, for example
a summary of the pattern matching characters
There is also documentation on an older incarnation of the product at
OROMatcher User's guide
, which might prove useful.
The pattern matching is very similar to the pattern matching in Perl.
A full installation of Perl will include plenty of documentation on regular expressions - look for
perlrequick
,
perlretut
,
perlre
and
perlreref
.
It is worth stressing the difference between "
contains
" and "
matches
", as used on the Response Assertion test element:
-
"
contains
"
-
means that the regular expression matched at least some part of the target,
so '
alphabet
' "
contains
" '
ph.b.
' because the regular expression matches the substring '
phabe
'.
-
"
matches
"
-
means that the regular expression matched the whole target.
So '
alphabet
' is "
matched
" by '
al.*t
'.
In this case, it is equivalent to wrapping the regular expression in
^
and
$
, viz '
^al.*t$
'.
However, this is not always the case.
For example, the regular expression '
alp|.lp.*
' is "
contained
" in '
alphabet
',
but does not "
match
" '
alphabet
'.
Why? Because when the pattern matcher finds the sequence '
alp
' in '
alphabet
', it stops trying any other
combinations - and '
alp
' is not the same as '
alphabet
', as it does not include '
habet
'.
Unlike Perl, there is no need to (i.e. do not) enclose the regular expression in
//
.
|
So how does one use the modifiers
ismx
etc. if there is no trailing
/
?
The solution is to use
extended regular expressions
, i.e.
/abc/i
becomes
(?i)abc
.
See also
Placement of modifiers
below.
|
|
22.2 Examples
|
Extract single string
Suppose you want to match the following portion of a web-page:
name="file" value="readme.txt">
and you want to extract
readme.txt
.
A suitable regular expression would be:
name="file" value="(.+?)">
The special characters above are:
-
(
and
)
-
these enclose the portion of the match string to be returned
-
.
-
match any character
-
+
-
one or more times
-
?
-
don't be greedy, i.e. stop when first match succeeds
Note: without the
?
, the
.+
would continue past the first
">
until it found the last possible
">
- which is probably not what was intended.
Note: although the above expression works, it's more efficient to use the following expression:
name="file" value="([^"]+)">
where
[^"]
- means match anything except
"
In this case, the matching engine can stop looking as soon as it sees the first
"
,
whereas in the previous case the engine has to check that it has found
">
rather than say
" >
.
Extract multiple strings
Suppose you want to match the following portion of a web-page:
name="file.name" value="readme.txt"
and you want to extract both
file.name
and
readme.txt
.
A suitable regular expression would be:
name="([^"]+)" value="([^"]+)"
This would create 2 groups, which could be used in the JMeter Regular Expression Extractor template as
$1$
and
$2$
.
The JMeter Regex Extractor saves the values of the groups in additional variables.
For example, assume:
-
Reference Name:
MYREF
-
Regex:
name="(.+?)" value="(.+?)"
-
Template:
$1$$2$
Do not enclose the regular expression in
/ /
|
The following variables would be set:
-
MYREF
-
file.namereadme.txt
-
MYREF_g0
-
name="file.name" value="readme.txt"
-
MYREF_g1
-
file.name
-
MYREF_g2
-
readme.txt
These variables can be referred to later on in the JMeter test plan, as
${MYREF}
,
${MYREF_g1}
etc.
|
|
22.3 Line mode
|
The pattern matching behaves in various slightly different ways,
depending on the setting of the multi-line and single-line modifiers.
Note that the single-line and multi-line operators have nothing to do with each other;
they can be specified independently.
Single-line mode
Single-line mode only affects how the '
.
' meta-character is interpreted.
Default behaviour is that '
.
' matches any character except newline.
In single-line mode, '
.
' also matches newline.
Multi-line mode
Multi-line mode only affects how the meta-characters '
^
' and '
$
' are interpreted.
Default behaviour is that '
^
' and '
$
' only match at the very beginning and end of the string.
When Multi-line mode is used, the '
^
' metacharacter matches at the beginning of every line,
and the '
$
' metacharacter matches at the end of every line.
|
|
22.4 Meta characters
|
Regular expressions use certain characters as meta characters - these characters have a special meaning to the RE engine.
Such characters must be escaped by preceding them with
\
(backslash) in order to treat them as ordinary characters.
Here is a list of the meta characters and their meaning (please check the ORO documentation if in doubt).
-
(
and
)
-
grouping
-
[
and
]
-
character classes
-
{
and
}
-
repetition
-
*
,
+
and
?
-
repetition
-
.
-
wild-card character
-
\
-
escape character
-
|
-
alternatives
-
^
and
$
-
start and end of string or line
Please note that ORO does not support the
\Q
and
\E
meta-characters.
[In other RE engines, these can be used to quote a portion of an RE so that the meta-characters stand for themselves.]
You can use function to do the equivalent, see
${__escapeOroRegexpChars(valueToEscape)}
.
|
The following Perl5 extended regular expressions are supported by ORO.
-
(?#text)
-
An embedded comment causing text to be ignored.
-
(?:regexp)
-
Groups things like "
()
" but doesn't cause the group match to be saved.
-
(?=regexp)
-
A zero-width positive lookahead assertion. For example,
\w+(?=\s)
matches a word followed by whitespace, without including whitespace in the MatchResult.
-
(?!regexp)
-
A zero-width negative lookahead assertion. For example
foo(?!bar)
matches any occurrence of "
foo
" that
isn't followed by "
bar
". Remember that this is a zero-width assertion, which means that
a(?!b)d
will
match
ad
because
a
is followed by a character that is not
b
(the
d
) and a
d
follows the zero-width assertion.
-
(?imsx)
-
One or more embedded pattern-match modifiers.
i
enables case insensitivity,
m
enables multiline treatment
of the input,
s
enables single line treatment of the input, and
x
enables extended whitespace comments.
Note that
(?<=regexp)
- lookbehind - is not supported.
|
|
|
|
22.6 Testing Regular Expressions
|
Since JMeter 2.4, the listener
View Results Tree
include a RegExp Tester to test regular expressions directly on sampler response data.
There is a
Website
to test Java Regular expressions.
Another approach is to use a simple test plan to test the regular expressions.
The Java Request sampler can be used to generate a sample, or the HTTP Sampler can be used to load a file.
Add a Debug Sampler and a Tree View Listener and changes to the regular expression can be tested quickly,
without needing to access any external servers.
|
|
|