Class CharMatcher
Determines a true or false value for any Java char value, just as Predicate does for any Object. Also offers basic text processing methods based on this function. Implementations are strongly encouraged to be side-effect-free and immutable.
Throughout the documentation of this class, the phrase "matching character" is used to mean "any char value c for which this.matches(c) returns true".
Warning: This class deals only with char values, that is, BMP characters. It does not understand supplementary Unicode code points in the range 0x10000 to 0x10FFFF, which includes the majority of assigned characters, including important CJK characters and emoji.
Supplementary characters are encoded into a String using surrogate pairs, and a CharMatcher treats these just as two separate characters. countIn(java.lang.CharSequence) counts each supplementary character as 2 chars.
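The warning above can be made concrete with a small sketch that uses only java.lang, so it runs without Guava on the classpath. The class SurrogatePairDemo and its countNonAscii helper are hypothetical names introduced for illustration; the helper walks the sequence one char at a time, which is the same granularity a char-based matcher sees.

```java
// Stand-alone illustration; uses only java.lang, not Guava.
class SurrogatePairDemo {
    /** Counts char values (UTF-16 code units) outside ASCII, one char at a time. */
    static int countNonAscii(CharSequence sequence) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            if (sequence.charAt(i) >= 128) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1F600 is a supplementary code point, stored as a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                             // 2 char values
        System.out.println(emoji.codePointCount(0, emoji.length()));    // 1 code point
        System.out.println(countNonAscii(emoji));                       // 2: each surrogate is counted separately
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true
    }
}
```

A per-char loop cannot tell that the two surrogates belong to one character, which is why code-point-aware tools are recommended when supplementary characters matter.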
For up-to-date Unicode character properties (digit, letter, etc.) and support for supplementary code points, use ICU4J UCharacter and UnicodeSet (freeze() after building). For basic text processing based on UnicodeSet use the ICU4J UnicodeSetSpanner.
Example usages:
  String trimmed = whitespace().trimFrom(userInput);
  if (ascii().matchesAllOf(s)) { ... }
See the Guava User Guide article on CharMatcher.
- Since:
- 1.0
- Author:
- Kevin Bourrillion
-
Constructor Summary
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| CharMatcher | and(CharMatcher other) | Returns a matcher that matches any character matched by both this matcher and other. |
| static CharMatcher | any() | Matches any character. |
| static CharMatcher | anyOf(CharSequence sequence) | Returns a char matcher that matches any BMP character present in the given character sequence. |
| boolean | apply(Character character) | Deprecated. Provided only to satisfy the Predicate interface; use matches(char) instead. |
| static CharMatcher | ascii() | Determines whether a character is ASCII, meaning that its code point is less than 128. |
| static CharMatcher | breakingWhitespace() | Determines whether a character is a breaking whitespace (that is, a whitespace which can be interpreted as a break between words for formatting purposes). |
| String | collapseFrom(CharSequence sequence, char replacement) | Returns a string copy of the input character sequence, with each group of consecutive matching BMP characters replaced by a single replacement character. |
| int | countIn(CharSequence sequence) | Returns the number of matching chars found in a character sequence. |
| static CharMatcher | digit() | Deprecated. Many digits are supplementary characters; see the class documentation. |
| static CharMatcher | forPredicate(Predicate<? super Character> predicate) | Returns a matcher with identical behavior to the given Character-based predicate, but which operates on primitive char instances instead. |
| int | indexIn(CharSequence sequence) | Returns the index of the first matching BMP character in a character sequence, or -1 if no matching character is present. |
| int | indexIn(CharSequence sequence, int start) | Returns the index of the first matching BMP character in a character sequence, starting from a given position, or -1 if no character matches after that position. |
| static CharMatcher | inRange(char startInclusive, char endInclusive) | Returns a char matcher that matches any character in a given BMP range (both endpoints are inclusive). |
| static CharMatcher | invisible() | Deprecated. Most invisible characters are supplementary characters; see the class documentation. |
| static CharMatcher | is(char match) | Returns a char matcher that matches only one specified BMP character. |
| static CharMatcher | isNot(char match) | Returns a char matcher that matches any character except the BMP character specified. |
| static CharMatcher | javaDigit() | Deprecated. Many digits are supplementary characters; see the class documentation. |
| static CharMatcher | javaIsoControl() | Determines whether a character is an ISO control character as specified by Character.isISOControl(char). |
| static CharMatcher | javaLetter() | Deprecated. Most letters are supplementary characters; see the class documentation. |
| static CharMatcher | javaLetterOrDigit() | Deprecated. Most letters and digits are supplementary characters; see the class documentation. |
| static CharMatcher | javaLowerCase() | Deprecated. Some lowercase characters are supplementary characters; see the class documentation. |
| static CharMatcher | javaUpperCase() | Deprecated. Some uppercase characters are supplementary characters; see the class documentation. |
| int | lastIndexIn(CharSequence sequence) | Returns the index of the last matching BMP character in a character sequence, or -1 if no matching character is present. |
| abstract boolean | matches(char c) | Determines a true or false value for the given character. |
| boolean | matchesAllOf(CharSequence sequence) | Returns true if a character sequence contains only matching BMP characters. |
| boolean | matchesAnyOf(CharSequence sequence) | Returns true if a character sequence contains at least one matching BMP character. |
| boolean | matchesNoneOf(CharSequence sequence) | Returns true if a character sequence contains no matching BMP characters. |
| CharMatcher | negate() | Returns a matcher that matches any character not matched by this matcher. |
| static CharMatcher | none() | Matches no characters. |
| static CharMatcher | noneOf(CharSequence sequence) | Returns a char matcher that matches any BMP character not present in the given character sequence. |
| CharMatcher | or(CharMatcher other) | Returns a matcher that matches any character matched by either this matcher or other. |
| CharMatcher | precomputed() | Returns a char matcher functionally equivalent to this one, but which may be faster to query than the original; your mileage may vary. |
| String | removeFrom(CharSequence sequence) | Returns a string containing all non-matching characters of a character sequence, in order. |
| String | replaceFrom(CharSequence sequence, char replacement) | Returns a string copy of the input character sequence, with each matching BMP character replaced by a given replacement character. |
| String | replaceFrom(CharSequence sequence, CharSequence replacement) | Returns a string copy of the input character sequence, with each matching BMP character replaced by a given replacement sequence. |
| String | retainFrom(CharSequence sequence) | Returns a string containing all matching BMP characters of a character sequence, in order. |
| static CharMatcher | singleWidth() | Deprecated. Many such characters are supplementary characters; see the class documentation. |
| String | toString() | Returns a string representation of this CharMatcher, such as CharMatcher.or(WHITESPACE, JAVA_DIGIT). |
| String | trimAndCollapseFrom(CharSequence sequence, char replacement) | Collapses groups of matching characters exactly as collapseFrom(java.lang.CharSequence, char) does, except that groups of matching BMP characters at the start or end of the sequence are removed without replacement. |
| String | trimFrom(CharSequence sequence) | Returns a substring of the input character sequence that omits all matching BMP characters from the beginning and from the end of the string. |
| String | trimLeadingFrom(CharSequence sequence) | Returns a substring of the input character sequence that omits all matching BMP characters from the beginning of the string. |
| String | trimTrailingFrom(CharSequence sequence) | Returns a substring of the input character sequence that omits all matching BMP characters from the end of the string. |
| static CharMatcher | whitespace() | Determines whether a character is whitespace according to the latest Unicode standard, as illustrated here. |
-
Constructor Details
-
CharMatcher
protected CharMatcher()
Constructor for use by subclasses. When subclassing, you may want to override toString() to provide a useful description.
-
-
Method Details
-
any
Matches any character.
-
none
Matches no characters.
-
whitespace
Determines whether a character is whitespace according to the latest Unicode standard, as illustrated here. This is not the same definition used by other Java APIs. (See a comparison of several definitions of "whitespace".)
All Unicode White_Space characters are on the BMP and thus supported by this API.
Note: as the Unicode definition evolves, we will modify this matcher to keep it up to date.
- Since:
- 19.0 (since 1.0 as constant WHITESPACE)
-
breakingWhitespace
Determines whether a character is a breaking whitespace (that is, a whitespace which can be interpreted as a break between words for formatting purposes). See whitespace() for a discussion of that term.
- Since:
- 19.0 (since 2.0 as constant BREAKING_WHITESPACE)
-
ascii
Determines whether a character is ASCII, meaning that its code point is less than 128.
- Since:
- 19.0 (since 1.0 as constant ASCII)
-
digit
Deprecated. Many digits are supplementary characters; see the class documentation.
Determines whether a character is a BMP digit according to Unicode. If you only care to match ASCII digits, you can use inRange('0', '9').
- Since:
- 19.0 (since 1.0 as constant DIGIT)
-
javaDigit
Deprecated. Many digits are supplementary characters; see the class documentation.
Determines whether a character is a BMP digit according to Java's definition. If you only care to match ASCII digits, you can use inRange('0', '9').
- Since:
- 19.0 (since 1.0 as constant JAVA_DIGIT)
-
javaLetter
Deprecated. Most letters are supplementary characters; see the class documentation.
Determines whether a character is a BMP letter according to Java's definition. If you only care to match letters of the Latin alphabet, you can use inRange('a', 'z').or(inRange('A', 'Z')).
- Since:
- 19.0 (since 1.0 as constant JAVA_LETTER)
-
javaLetterOrDigit
Deprecated. Most letters and digits are supplementary characters; see the class documentation.
Determines whether a character is a BMP letter or digit according to Java's definition.
- Since:
- 19.0 (since 1.0 as constant JAVA_LETTER_OR_DIGIT).
-
javaUpperCase
Deprecated. Some uppercase characters are supplementary characters; see the class documentation.
Determines whether a BMP character is upper case according to Java's definition.
- Since:
- 19.0 (since 1.0 as constant JAVA_UPPER_CASE)
-
javaLowerCase
Deprecated. Some lowercase characters are supplementary characters; see the class documentation.
Determines whether a BMP character is lower case according to Java's definition.
- Since:
- 19.0 (since 1.0 as constant JAVA_LOWER_CASE)
-
javaIsoControl
Determines whether a character is an ISO control character as specified by Character.isISOControl(char).
All ISO control codes are on the BMP and thus supported by this API.
- Since:
- 19.0 (since 1.0 as constant JAVA_ISO_CONTROL)
-
invisible
Deprecated. Most invisible characters are supplementary characters; see the class documentation.
Determines whether a character is invisible; that is, if its Unicode category is any of SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL, FORMAT, SURROGATE, and PRIVATE_USE according to ICU4J.
See also the Unicode Default_Ignorable_Code_Point property (available via ICU).
- Since:
- 19.0 (since 1.0 as constant INVISIBLE)
-
singleWidth
Deprecated. Many such characters are supplementary characters; see the class documentation.
Determines whether a character is single-width (not double-width). When in doubt, this matcher errs on the side of returning false (that is, it tends to assume a character is double-width).
Note: as the reference file evolves, we will modify this matcher to keep it up to date.
See also UAX #11 East Asian Width.
- Since:
- 19.0 (since 1.0 as constant SINGLE_WIDTH)
-
is
Returns a char matcher that matches only one specified BMP character.
-
isNot
Returns a char matcher that matches any character except the BMP character specified.
To negate another CharMatcher, use negate().
-
anyOf
Returns a char matcher that matches any BMP character present in the given character sequence. Returns a bogus matcher if the sequence contains supplementary characters.
-
noneOf
Returns a char matcher that matches any BMP character not present in the given character sequence. Returns a bogus matcher if the sequence contains supplementary characters.
-
inRange
Returns a char matcher that matches any character in a given BMP range (both endpoints are inclusive). For example, to match any lowercase letter of the English alphabet, use CharMatcher.inRange('a', 'z').
- Throws:
- IllegalArgumentException - if endInclusive < startInclusive
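As a rough illustration of the inclusive-range contract and the argument check described above, here is a minimal sketch. InclusiveRangeSketch is a hypothetical stand-in written for this example, not Guava's implementation; only the documented behavior (both endpoints match, an inverted range is rejected) is modeled.

```java
// Hypothetical stand-in for an inclusive char range; not Guava's code.
class InclusiveRangeSketch {
    private final char startInclusive;
    private final char endInclusive;

    InclusiveRangeSketch(char startInclusive, char endInclusive) {
        // Mirror the documented contract: an inverted range is rejected up front.
        if (endInclusive < startInclusive) {
            throw new IllegalArgumentException("endInclusive < startInclusive");
        }
        this.startInclusive = startInclusive;
        this.endInclusive = endInclusive;
    }

    /** Both endpoints are part of the range. */
    boolean matches(char c) {
        return startInclusive <= c && c <= endInclusive;
    }

    public static void main(String[] args) {
        InclusiveRangeSketch lower = new InclusiveRangeSketch('a', 'z');
        System.out.println(lower.matches('a')); // true
        System.out.println(lower.matches('z')); // true
        System.out.println(lower.matches('A')); // false
        try {
            new InclusiveRangeSketch('z', 'a');
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected inverted range");
        }
    }
}
```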
-
forPredicate
Returns a matcher with identical behavior to the given Character-based predicate, but which operates on primitive char instances instead.
-
matches
Determines a true or false value for the given character.
-
negate
Returns a matcher that matches any character not matched by this matcher.
-
and
Returns a matcher that matches any character matched by both this matcher and other.
-
or
Returns a matcher that matches any character matched by either this matcher or other.
-
precomputed
Returns a char matcher functionally equivalent to this one, but which may be faster to query than the original; your mileage may vary. Precomputation takes time and requires more memory, so it is only likely to be worthwhile if the precomputed matcher is queried very often.
This method has no effect (returns this) when called in GWT: it's unclear whether a precomputed matcher is faster, but it certainly would consume more memory (which doesn't seem like a worthwhile tradeoff in a browser).
-
matchesAnyOf
Returns true if a character sequence contains at least one matching BMP character. Equivalent to !matchesNoneOf(sequence).
The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns true or the end is reached.
- Parameters:
- sequence - the character sequence to examine, possibly empty
- Returns:
- true if this matcher matches at least one character in the sequence
- Since:
- 8.0
-
matchesAllOf
Returns true if a character sequence contains only matching BMP characters.
The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns false or the end is reached.
- Parameters:
- sequence - the character sequence to examine, possibly empty
- Returns:
- true if this matcher matches every character in the sequence, including when the sequence is empty
-
matchesNoneOf
Returns true if a character sequence contains no matching BMP characters. Equivalent to !matchesAnyOf(sequence).
The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns true or the end is reached.
- Parameters:
- sequence - the character sequence to examine, possibly empty
- Returns:
- true if this matcher matches no characters in the sequence, including when the sequence is empty
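The three sequence predicates above (matchesAnyOf, matchesAllOf, matchesNoneOf) and their empty-sequence behavior can be sketched with plain loops. This is only an approximation of the described default implementations: MatchesSketch and its nested CharTest interface are hypothetical names standing in for a matcher, chosen so the example runs without Guava.

```java
// Hypothetical sketch of the described loops; not Guava's source.
class MatchesSketch {
    /** Stand-in for a matcher: one boolean answer per char. */
    interface CharTest {
        boolean matches(char c);
    }

    static boolean matchesAllOf(CharTest test, CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            if (!test.matches(sequence.charAt(i))) {
                return false; // stop at the first non-matching char
            }
        }
        return true; // also reached when the sequence is empty
    }

    static boolean matchesNoneOf(CharTest test, CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            if (test.matches(sequence.charAt(i))) {
                return false; // stop at the first matching char
            }
        }
        return true; // also reached when the sequence is empty
    }

    static boolean matchesAnyOf(CharTest test, CharSequence sequence) {
        return !matchesNoneOf(test, sequence);
    }

    public static void main(String[] args) {
        CharTest ascii = c -> c < 128;
        System.out.println(matchesAllOf(ascii, "hello")); // true
        System.out.println(matchesAllOf(ascii, ""));      // true
        System.out.println(matchesAnyOf(ascii, ""));      // false
        System.out.println(matchesNoneOf(ascii, ""));     // true
    }
}
```

Note that an empty sequence satisfies both matchesAllOf and matchesNoneOf, while matchesAnyOf is false for it.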
-
indexIn
Returns the index of the first matching BMP character in a character sequence, or -1 if no matching character is present.
The default implementation iterates over the sequence in forward order calling matches(char) for each character.
- Parameters:
- sequence - the character sequence to examine from the beginning
- Returns:
- an index, or -1 if no character matches
-
indexIn
Returns the index of the first matching BMP character in a character sequence, starting from a given position, or -1 if no character matches after that position.
The default implementation iterates over the sequence in forward order, beginning at start, calling matches(char) for each character.
- Parameters:
- sequence - the character sequence to examine
- start - the first index to examine; must be nonnegative and no greater than sequence.length()
- Returns:
- the index of the first matching character, guaranteed to be no less than start, or -1 if no character matches
- Throws:
- IndexOutOfBoundsException - if start is negative or greater than sequence.length()
-
lastIndexIn
Returns the index of the last matching BMP character in a character sequence, or -1 if no matching character is present.
The default implementation iterates over the sequence in reverse order calling matches(char) for each character.
- Parameters:
- sequence - the character sequence to examine from the end
- Returns:
- an index, or -1 if no character matches
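The forward and reverse scans described for indexIn and lastIndexIn, including the bounds rule for start, might look roughly like the following. IndexSketch and CharTest are hypothetical names used only for this example, and the exception message is an assumption; the documented contract only says that an IndexOutOfBoundsException is thrown.

```java
// Hypothetical sketch of the described scans; not Guava's source.
class IndexSketch {
    /** Stand-in for a matcher: one boolean answer per char. */
    interface CharTest {
        boolean matches(char c);
    }

    static int indexIn(CharTest test, CharSequence sequence) {
        return indexIn(test, sequence, 0);
    }

    static int indexIn(CharTest test, CharSequence sequence, int start) {
        int length = sequence.length();
        // start == length is allowed and simply finds nothing.
        if (start < 0 || start > length) {
            throw new IndexOutOfBoundsException("start: " + start);
        }
        for (int i = start; i < length; i++) {
            if (test.matches(sequence.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    static int lastIndexIn(CharTest test, CharSequence sequence) {
        for (int i = sequence.length() - 1; i >= 0; i--) {
            if (test.matches(sequence.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        CharTest isA = c -> c == 'a';
        System.out.println(indexIn(isA, "banana"));     // 1
        System.out.println(indexIn(isA, "banana", 2));  // 3
        System.out.println(indexIn(isA, "banana", 6));  // -1
        System.out.println(lastIndexIn(isA, "banana")); // 5
    }
}
```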
-
countIn
Returns the number of matching chars found in a character sequence.
Counts 2 per supplementary character, such as for whitespace().negate().
-
removeFrom
Returns a string containing all non-matching characters of a character sequence, in order. For example:
  CharMatcher.is('a').removeFrom("bazaar")
... returns "bzr".
-
retainFrom
Returns a string containing all matching BMP characters of a character sequence, in order. For example:
  CharMatcher.is('a').retainFrom("bazaar")
... returns "aaa".
-
replaceFrom
Returns a string copy of the input character sequence, with each matching BMP character replaced by a given replacement character. For example:
  CharMatcher.is('a').replaceFrom("radar", 'o')
... returns "rodor".
The default implementation uses indexIn(CharSequence) to find the first matching character, then iterates the remainder of the sequence calling matches(char) for each character.
- Parameters:
- sequence - the character sequence to replace matching characters in
- replacement - the character to append to the result string in place of each matching character in sequence
- Returns:
- the new string
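The two-phase strategy described above (locate the first match, then scan the remainder) can be sketched as follows. This is an illustrative approximation, not Guava's source: ReplaceFromSketch and CharTest are hypothetical names, and returning the unchanged string when nothing matches is an assumption about how the first phase is used.

```java
// Hypothetical sketch of "find first match, then scan the rest"; not Guava's source.
class ReplaceFromSketch {
    /** Stand-in for a matcher: one boolean answer per char. */
    interface CharTest {
        boolean matches(char c);
    }

    static int indexIn(CharTest test, CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            if (test.matches(sequence.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    static String replaceFrom(CharTest test, CharSequence sequence, char replacement) {
        String string = sequence.toString();
        int pos = indexIn(test, string);
        if (pos == -1) {
            return string; // nothing matches, so no copy is needed
        }
        char[] chars = string.toCharArray();
        chars[pos] = replacement;
        // Only the part after the first match still needs testing.
        for (int i = pos + 1; i < chars.length; i++) {
            if (test.matches(chars[i])) {
                chars[i] = replacement;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        CharTest isA = c -> c == 'a';
        System.out.println(replaceFrom(isA, "radar", 'o')); // rodor
        System.out.println(replaceFrom(isA, "xyz", 'o'));   // xyz
    }
}
```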
-
replaceFrom
Returns a string copy of the input character sequence, with each matching BMP character replaced by a given replacement sequence. For example:
  CharMatcher.is('a').replaceFrom("yaha", "oo")
... returns "yoohoo".
Note: If the replacement is a fixed string with only one character, you are better off calling replaceFrom(CharSequence, char) directly.
- Parameters:
- sequence - the character sequence to replace matching characters in
- replacement - the characters to append to the result string in place of each matching character in sequence
- Returns:
- the new string
-
trimFrom
Returns a substring of the input character sequence that omits all matching BMP characters from the beginning and from the end of the string. For example:
  CharMatcher.anyOf("ab").trimFrom("abacatbab")
... returns "cat".
Note that:
  CharMatcher.inRange('\0', ' ').trimFrom(str)
... is equivalent to String.trim().
-
trimLeadingFrom
Returns a substring of the input character sequence that omits all matching BMP characters from the beginning of the string. For example:
  CharMatcher.anyOf("ab").trimLeadingFrom("abacatbab")
... returns "catbab".
-
trimTrailingFrom
Returns a substring of the input character sequence that omits all matching BMP characters from the end of the string. For example:
  CharMatcher.anyOf("ab").trimTrailingFrom("abacatbab")
... returns "abacat".
-
collapseFrom
Returns a string copy of the input character sequence, with each group of consecutive matching BMP characters replaced by a single replacement character. For example:
  CharMatcher.anyOf("eko").collapseFrom("bookkeeper", '-')
... returns "b-p-r".
The default implementation uses indexIn(CharSequence) to find the first matching character, then iterates the remainder of the sequence calling matches(char) for each character.
- Parameters:
- sequence - the character sequence to replace matching groups of characters in
- replacement - the character to append to the result string in place of each group of matching characters in sequence
- Returns:
- the new string
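To show what "each group of consecutive matching characters replaced by a single replacement character" means in practice, here is a simplified single-pass sketch. It deliberately skips the indexIn fast path mentioned above and is not Guava's implementation; CollapseSketch and CharTest are hypothetical names introduced for this example.

```java
// Simplified single-pass sketch of group collapsing; not Guava's source.
class CollapseSketch {
    /** Stand-in for a matcher: one boolean answer per char. */
    interface CharTest {
        boolean matches(char c);
    }

    static String collapseFrom(CharTest test, CharSequence sequence, char replacement) {
        StringBuilder out = new StringBuilder(sequence.length());
        boolean inMatchingGroup = false;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (test.matches(c)) {
                // Emit the replacement once, at the start of each matching group.
                if (!inMatchingGroup) {
                    out.append(replacement);
                    inMatchingGroup = true;
                }
            } else {
                out.append(c);
                inMatchingGroup = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        CharTest eko = c -> "eko".indexOf(c) >= 0;
        System.out.println(collapseFrom(eko, "bookkeeper", '-')); // b-p-r
    }
}
```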
-
trimAndCollapseFrom
Collapses groups of matching characters exactly as collapseFrom(java.lang.CharSequence, char) does, except that groups of matching BMP characters at the start or end of the sequence are removed without replacement.
-
apply
@InlineMe(replacement="this.matches(character)") @Deprecated public boolean apply(Character character)
Deprecated. Provided only to satisfy the Predicate interface; use matches(char) instead.
Description copied from interface: Predicate
Returns the result of applying this predicate to input (Java 8+ users, see notes in the class documentation above). This method is generally expected, but not absolutely required, to have the following properties:
- Its execution does not cause any observable side effects.
- The computation is consistent with equals; that is, Objects.equal(a, b) implies that predicate.apply(a) == predicate.apply(b).
-
toString
Returns a string representation of this CharMatcher, such as CharMatcher.or(WHITESPACE, JAVA_DIGIT).