So far, we have learned how to define patterns with regular expressions in order to check whether a string contains a specific pattern.
But what if we need not only to check for a match, but also to extract part of it?
You can easily do so by using so-called capturing groups.
Important note: make sure to check out the previous article, where regular expressions are explained in depth.
A capturing group is created by wrapping part of the pattern in parentheses: (...).
To access the content of a capturing group, use a dollar sign followed by the group number.
Group numbers start at 1 and run from left to right.
The first capturing group can be accessed using $1, the second one using $2, and so on.
Imagine the following situation: you develop a project using font sizes in em units. Some requirements change and you have to use rem instead.
If the project has a lot of CSS code, replacing font size units manually would probably take the whole day. Too long, huh?
Capturing groups to the rescue:
const string = `
body {
font-size: 1.6em;
}
h1 {
font-size: 3.2em;
}
h2 {
font-size: 2.8em;
}
h3 {
font-size: 2.4em;
}
`;
const pattern = /(\d)em/g;
// Result: each "em" is replaced with "rem"
console.log(string.replace(pattern, "$1rem"));
You might ask: "Is there a way to modify the captured group before it is inserted?"
Yes, by using a replacement function:
const string = `
body {
font-size: 1.6em;
}
h1 {
font-size: 3.2em;
}
h2 {
font-size: 2.8em;
}
h3 {
font-size: 2.4em;
}
`;
const pattern = /(\d)em/g;
// Result: each "em" is replaced with "rem"
console.log(string.replace(pattern, (match, $1, offset, string) => `${$1}rem`));
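The callback above returns the captured group unchanged. As a minimal sketch of actually modifying it, the snippet below converts em values to px. It assumes a base font size of 16px; the BASE_PX constant, the emToPx helper and the sample CSS are hypothetical and chosen only for illustration. The pattern captures the whole number, including the decimal part, so that it can be parsed.

```javascript
// Assumption: 1em corresponds to 16px; adjust for your project.
const BASE_PX = 16;

// Capture the whole number (digits and decimal point) in front of "em".
const emPattern = /([\d.]+)em/g;

function emToPx(css) {
  // The second callback argument holds the first capturing group.
  return css.replace(emPattern, (match, value) => `${parseFloat(value) * BASE_PX}px`);
}

console.log(emToPx("h1 { font-size: 2em; } h2 { font-size: 1.5em; }"));
// Prints "h1 { font-size: 32px; } h2 { font-size: 24px; }"
```

Whatever string the callback returns is inserted in place of the full match, so the captured value can be parsed, recalculated or reformatted freely.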
You can pass a function as the second argument to string.replace. The function accepts the following parameters:
match - the matched substring
$1 - the capture group (one parameter for each capturing group in the pattern)
offset - the offset of the matched substring within the whole string being examined
string - the whole string being examined
Capturing groups are really helpful when it's necessary to swap some words/expressions. Consider the following example:
const string = "Andrew, John";
const pattern = /(\w+),\s(\w+)/;
console.log(string.replace(pattern, "$2, $1")); // Prints "John, Andrew"
Note how we access the first group using $1 and the second using $2, and how easy it is to swap their positions.
string.replace(searchValue, newValue | function)
This method searches a string for a specified value, or regular expression, and returns a new string where the specified values are replaced.
It does not change the original string.
Important note: If you are replacing a value (and not a regular expression), only the first instance of the value will be replaced. To replace all occurrences of a specified value, use a regular expression with the global g modifier:
const string = "Today we met John. John was happy to see us.";
// Prints "Today we met Andrew. John was happy to see us."
console.log(string.replace("John", "Andrew"));
// Prints "Today we met Andrew. Andrew was happy to see us."
console.log(string.replace(/John/g, "Andrew"));
Any capturing group can be given a name and referenced by it later, using the following syntax:
(?<name>pattern)
Example:
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
const string = "2019-10-30, 2020-01-01";
// Prints "30/10/2019, 01/01/2020"
console.log(string.replace(pattern, "$<day>/$<month>/$<year>"));
In the example above we defined three groups, each of which is referenced by name: year, month, day.
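Named groups are also available on the match result itself, through its groups property, and not only in replacement strings. A small sketch, assuming an environment that supports named groups (ES2018 or later); the variable names are illustrative:

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = datePattern.exec("2019-10-30");

// Named groups are collected on the "groups" property of the match result.
const { year, month, day } = dateMatch.groups;
console.log(`${day}/${month}/${year}`); // Prints "30/10/2019"

// Named groups are still numbered, so they stay reachable by index too.
console.log(dateMatch[1]); // Prints "2019"
```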
A capturing group can be made optional by using (...)?. If it’s not found, the resulting array slot will contain undefined:
const pattern = /(\d{4})-(\d{2})(-\d{2})?/g;
const string = "2019-10";
const result = pattern.exec(string);
console.log(result[0]); // Prints "2019-10"
console.log(result[1]); // Prints "2019"
console.log(result[2]); // Prints "10"
console.log(result[3]); // Prints "undefined"
Note how result[3] is undefined, since the third capturing group did not take part in the match.
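Because an unmatched optional group yields undefined, code that consumes the result usually needs a fallback. A minimal sketch, in which the parseDate helper and the default day of "01" are assumptions made purely for illustration:

```javascript
// Hypothetical helper: parses "YYYY-MM" or "YYYY-MM-DD".
function parseDate(text) {
  const result = /^(\d{4})-(\d{2})(-\d{2})?$/.exec(text);
  if (result === null) return null; // no match at all

  const [, year, month, dayPart] = result;
  // dayPart is undefined when the optional group did not participate.
  const day = dayPart === undefined ? "01" : dayPart.slice(1); // drop the leading "-"
  return { year, month, day };
}

console.log(parseDate("2019-10").day);    // Prints "01"
console.log(parseDate("2019-10-30").day); // Prints "30"
```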
regex.exec(string)
This method of the RegExp object searches for a match in a specified string. It returns the result as an array, or null if no match is found.
const string = "Today we met John.";
const regex = /John/g;
/*
Prints:
[
"John",
index: 13,
input: "Today we met John.",
groups: undefined
]
*/
console.log(regex.exec(string));
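With the g modifier, exec stores the position where the previous match ended in the regex's lastIndex property, so calling it repeatedly walks through all matches and returns null once none remain. A minimal sketch with illustrative variable names:

```javascript
const text = "Today we met John. John was happy to see us.";
const nameRegex = /John/g;

const positions = [];
let found;
// Each call resumes from nameRegex.lastIndex; null ends the loop.
while ((found = nameRegex.exec(text)) !== null) {
  positions.push(found.index);
}

console.log(positions.join(", ")); // Prints "13, 19"
```

Without the g modifier the same loop would never terminate, because every call would start again from the beginning of the string.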
Capturing groups can be nested. As with non-nested groups, they are numbered from left to right, in the order of their opening parentheses.
Assume you are given the task of searching for <div class="example" /> and getting the tag name and tag attributes (class in our example):
const string = "<div class='example' />";
const regexp = /<(([a-z]+)\s*([^>]*))>/;
const result = string.match(regexp);
console.log(result[0]); // Prints "<div class='example' />"
console.log(result[1]); // Prints "div class='example' /", (([a-z]+)\s*([^>]*)) group
console.log(result[2]); // Prints "div", ([a-z]+) group
console.log(result[3]); // Prints "class='example' /", ([^>]*) group
Note that index 0 always holds the full match, and capturing groups are numbered from left to right.
The first group, returned as result[1], encloses the whole tag content; the second, result[2], holds the tag name; and the third, result[3], holds the rest of the tag: the class attribute plus the trailing slash, because [^>]* matches everything up to the closing >.
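One way to keep nested numbering straight is to count opening parentheses from left to right. A small sketch with a made-up pattern:

```javascript
// Opening parentheses in order: 1 -> ((a)(b)), 2 -> (a), 3 -> (b), 4 -> (c)
const nested = /((a)(b))(c)/;
const parts = "abc".match(nested);

console.log(parts[0]); // Prints "abc", the full match
console.log(parts[1]); // Prints "ab"
console.log(parts[2]); // Prints "a"
console.log(parts[3]); // Prints "b"
console.log(parts[4]); // Prints "c"
```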
string.match(regex)
This method searches a string for a match against a regular expression and returns the matches as an array, or null if there are none.
Important note: If the regular expression does not include the g modifier (to perform a global search), the method will return only the first match in the string.
Without the g modifier:
const string = "Today we met John. John was happy to see us.";
const regex = /John/;
/*
Prints:
[
"John",
index: 13,
input: "Today we met John. John was happy to see us.",
groups: undefined
]
*/
console.log(string.match(regex));
With the g modifier:
const string = "Today we met John. John was happy to see us.";
const regex = /John/g;
console.log(string.match(regex)); // Prints: ["John", "John"]
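Note that with the g modifier, match returns only the full matches and drops the capturing groups. When both are needed, string.matchAll (part of the language since ES2020) returns an iterator of complete match results. A minimal sketch with illustrative sample data:

```javascript
const dates = "2019-10-30, 2020-01-01";
const dateRegex = /(\d{4})-(\d{2})-(\d{2})/g; // matchAll requires the g modifier

// Spread the iterator into an array of match results, then pick group 1.
const years = [...dates.matchAll(dateRegex)].map((m) => m[1]);
console.log(years.join(", ")); // Prints "2019, 2020"

// For comparison, match with g keeps only the full matches.
const fullMatches = dates.match(dateRegex);
console.log(fullMatches.join(" | ")); // Prints "2019-10-30 | 2020-01-01"
```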
Sometimes we need parentheses only to apply a quantifier or to group part of a pattern, and we don’t want their contents in the results. This can be done by putting ?: at the beginning of the group:
const string = "Andrew, John";
const pattern = /(?:\w+),\s(\w+)/;
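console.log(string.replace(pattern, "$2, $1")); // Prints "$2, John"
Have you noticed that in the example above we received $2, John instead of John, Andrew? Because the first group is now non-capturing, (\w+) becomes group 1 and holds John, while $2 refers to no group at all and is left in the output as literal text.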
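A non-capturing group is most useful when the parentheses are needed only for grouping, for example to apply a quantifier to several characters at once. A minimal sketch with made-up input:

```javascript
// (?:ha)+ repeats "ha" as a unit without creating a group; (\w+) is group 1.
const laugh = /(?:ha)+\s(\w+)/;
const laughMatch = "hahaha Andrew".match(laugh);

console.log(laughMatch[0]);     // Prints "hahaha Andrew"
console.log(laughMatch[1]);     // Prints "Andrew"
console.log(laughMatch.length); // Prints 2 (full match plus one capturing group)
```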
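Capturing groups are defined using the (...) syntax and numbered from left to right
They are accessed using the $ sign combined with the group's number $1, $2, ... or, if they were named, by using the name $<name>
A capturing group can be made optional using the (...)? syntax
A group can be excluded from the results using the ?: syntax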