@Beta @GwtCompatible(emulated=true) public abstract class CharMatcher extends Object implements Predicate<Character>
Determines a true or false value for any Java char value, just as Predicate does for any Object. Also offers basic text processing methods based on this function.
Implementations are strongly encouraged to be side-effect-free and immutable.
Throughout the documentation of this class, the phrase "matching character" is used to mean "any character c for which this.matches(c) returns true".
Note: This class deals only with char values; it does not understand supplementary Unicode code points in the range 0x10000 to 0x10FFFF. Such logical characters are encoded into a String using surrogate pairs, and a CharMatcher treats these just as two separate characters.
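The note about surrogate pairs can be checked with the JDK alone. The following is a minimal sketch, not part of this class: SurrogatePairDemo and countSurrogateChars are hypothetical names introduced for illustration. It shows that one supplementary code point occupies two char values, which is what any char-by-char test sees.

```java
// Plain-JDK illustration of the surrogate-pair note; no Guava classes are used.
// SurrogatePairDemo and countSurrogateChars are hypothetical names.
final class SurrogatePairDemo {

    /** Counts the char values in s that are halves of a surrogate pair. */
    static int countSurrogateChars(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1F600 lies above 0xFFFF, so a String stores it as two char values.
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                      // number of char values
        System.out.println(s.codePointCount(0, s.length())); // number of code points
        System.out.println(countSurrogateChars(s));          // chars that are surrogates
    }
}
```

A matcher that operates on char values visits each half of the pair separately, so it cannot classify the supplementary character as a whole.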
Example usages:
String trimmed = WHITESPACE.trimFrom(userInput);
if (ASCII.matchesAllOf(s)) { ... }
See the Guava User Guide article on CharMatcher.
Modifier and Type | Field and Description |
---|---|
static CharMatcher | ANY: Matches any character. |
static CharMatcher | ASCII: Determines whether a character is ASCII, meaning that its code point is less than 128. |
static CharMatcher | BREAKING_WHITESPACE: Determines whether a character is a breaking whitespace (that is, a whitespace which can be interpreted as a break between words for formatting purposes). |
static CharMatcher | DIGIT: Determines whether a character is a digit according to Unicode. |
static CharMatcher | INVISIBLE: Determines whether a character is invisible; that is, if its Unicode category is any of SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL, FORMAT, SURROGATE, and PRIVATE_USE according to ICU4J. |
static CharMatcher | JAVA_DIGIT: Determines whether a character is a digit according to Java's definition. |
static CharMatcher | JAVA_ISO_CONTROL: Determines whether a character is an ISO control character as specified by Character.isISOControl(char). |
static CharMatcher | JAVA_LETTER: Determines whether a character is a letter according to Java's definition. |
static CharMatcher | JAVA_LETTER_OR_DIGIT: Determines whether a character is a letter or digit according to Java's definition. |
static CharMatcher | JAVA_LOWER_CASE: Determines whether a character is lower case according to Java's definition. |
static CharMatcher | JAVA_UPPER_CASE: Determines whether a character is upper case according to Java's definition. |
static CharMatcher | NONE: Matches no characters. |
static CharMatcher | SINGLE_WIDTH: Determines whether a character is single-width (not double-width). |
static CharMatcher | WHITESPACE: Determines whether a character is whitespace according to the latest Unicode standard, as illustrated here. |
Modifier | Constructor and Description |
---|---|
protected | CharMatcher(): Constructor for use by subclasses. |
Modifier and Type | Method and Description |
---|---|
CharMatcher | and(CharMatcher other): Returns a matcher that matches any character matched by both this matcher and other. |
static CharMatcher | anyOf(CharSequence sequence): Returns a char matcher that matches any character present in the given character sequence. |
boolean | apply(Character character): Equivalent to matches(char); provided only to satisfy the Predicate interface. |
String | collapseFrom(CharSequence sequence, char replacement): Returns a string copy of the input character sequence, with each group of consecutive characters that match this matcher replaced by a single replacement character. |
int | countIn(CharSequence sequence): Returns the number of matching characters found in a character sequence. |
static CharMatcher | forPredicate(Predicate<? super Character> predicate): Returns a matcher with identical behavior to the given Character-based predicate, but which operates on primitive char instances instead. |
int | indexIn(CharSequence sequence): Returns the index of the first matching character in a character sequence, or -1 if no matching character is present. |
int | indexIn(CharSequence sequence, int start): Returns the index of the first matching character in a character sequence, starting from a given position, or -1 if no character matches after that position. |
static CharMatcher | inRange(char startInclusive, char endInclusive): Returns a char matcher that matches any character in a given range (both endpoints are inclusive). |
static CharMatcher | is(char match): Returns a char matcher that matches only one specified character. |
static CharMatcher | isNot(char match): Returns a char matcher that matches any character except the one specified. |
int | lastIndexIn(CharSequence sequence): Returns the index of the last matching character in a character sequence, or -1 if no matching character is present. |
abstract boolean | matches(char c): Determines a true or false value for the given character. |
boolean | matchesAllOf(CharSequence sequence): Returns true if a character sequence contains only matching characters. |
boolean | matchesAnyOf(CharSequence sequence): Returns true if a character sequence contains at least one matching character. |
boolean | matchesNoneOf(CharSequence sequence): Returns true if a character sequence contains no matching characters. |
CharMatcher | negate(): Returns a matcher that matches any character not matched by this matcher. |
static CharMatcher | noneOf(CharSequence sequence): Returns a char matcher that matches any character not present in the given character sequence. |
CharMatcher | or(CharMatcher other): Returns a matcher that matches any character matched by either this matcher or other. |
CharMatcher | precomputed(): Returns a char matcher functionally equivalent to this one, but which may be faster to query than the original; your mileage may vary. |
String | removeFrom(CharSequence sequence): Returns a string containing all non-matching characters of a character sequence, in order. |
String | replaceFrom(CharSequence sequence, char replacement): Returns a string copy of the input character sequence, with each character that matches this matcher replaced by a given replacement character. |
String | replaceFrom(CharSequence sequence, CharSequence replacement): Returns a string copy of the input character sequence, with each character that matches this matcher replaced by a given replacement sequence. |
String | retainFrom(CharSequence sequence): Returns a string containing all matching characters of a character sequence, in order. |
String | toString(): Returns a string representation of this CharMatcher, such as CharMatcher.or(WHITESPACE, JAVA_DIGIT). |
String | trimAndCollapseFrom(CharSequence sequence, char replacement): Collapses groups of matching characters exactly as collapseFrom(java.lang.CharSequence, char) does, except that groups of matching characters at the start or end of the sequence are removed without replacement. |
String | trimFrom(CharSequence sequence): Returns a substring of the input character sequence that omits all characters this matcher matches from the beginning and from the end of the string. |
String | trimLeadingFrom(CharSequence sequence): Returns a substring of the input character sequence that omits all characters this matcher matches from the beginning of the string. |
String | trimTrailingFrom(CharSequence sequence): Returns a substring of the input character sequence that omits all characters this matcher matches from the end of the string. |
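public static final CharMatcher BREAKING_WHITESPACE
Determines whether a character is a breaking whitespace (that is, a whitespace which can be interpreted as a break between words for formatting purposes). See WHITESPACE for a discussion of that term.
public static final CharMatcher ASCII
Determines whether a character is ASCII, meaning that its code point is less than 128.
public static final CharMatcher DIGIT
Determines whether a character is a digit according to Unicode.
public static final CharMatcher JAVA_DIGIT
Determines whether a character is a digit according to Java's definition. If you only care to match ASCII digits, you can use inRange('0', '9').
public static final CharMatcher JAVA_LETTER
Determines whether a character is a letter according to Java's definition. If you only care to match letters of the Latin alphabet, you can use inRange('a', 'z').or(inRange('A', 'Z')).
public static final CharMatcher JAVA_LETTER_OR_DIGIT
Determines whether a character is a letter or digit according to Java's definition.
public static final CharMatcher JAVA_UPPER_CASE
Determines whether a character is upper case according to Java's definition.
public static final CharMatcher JAVA_LOWER_CASE
Determines whether a character is lower case according to Java's definition.
public static final CharMatcher JAVA_ISO_CONTROL
Determines whether a character is an ISO control character as specified by Character.isISOControl(char).
public static final CharMatcher INVISIBLE
Determines whether a character is invisible; that is, if its Unicode category is any of SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL, FORMAT, SURROGATE, and PRIVATE_USE according to ICU4J.
public static final CharMatcher SINGLE_WIDTH
Determines whether a character is single-width (not double-width). When in doubt, this matcher errs on the side of returning false (that is, it tends to assume a character is double-width).
Note: as the reference file evolves, we will modify this constant to keep it up to date.
public static final CharMatcher ANY
Matches any character.
public static final CharMatcher NONE
Matches no characters.
public static final CharMatcher WHITESPACE
Determines whether a character is whitespace according to the latest Unicode standard, as illustrated here.
Note: as the Unicode definition evolves, we will modify this constant to keep it up to date.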
protected CharMatcher()
Constructor for use by subclasses. When subclassing, you may want to override toString() to provide a useful description.
public static CharMatcher is(char match)
Returns a char matcher that matches only one specified character.
public static CharMatcher isNot(char match)
Returns a char matcher that matches any character except the one specified.
To negate another CharMatcher, use negate().
public static CharMatcher anyOf(CharSequence sequence)
Returns a char matcher that matches any character present in the given character sequence.
public static CharMatcher noneOf(CharSequence sequence)
Returns a char matcher that matches any character not present in the given character sequence.
public static CharMatcher inRange(char startInclusive, char endInclusive)
Returns a char matcher that matches any character in a given range (both endpoints are inclusive). For example, to match any lowercase letter of the English alphabet, use CharMatcher.inRange('a', 'z').
Throws:
IllegalArgumentException - if endInclusive < startInclusive
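The contracts of the static factories above can be illustrated without the library. This is a minimal sketch under stated assumptions: FactorySketch and its nested CharTest interface are hypothetical stand-ins, not the classes documented here, and the bodies show only the documented behavior, not how the real factories are implemented.

```java
// Hypothetical stand-in for the documented factory contracts; not Guava's source.
final class FactorySketch {

    /** Hypothetical single-method interface mirroring matches(char). */
    interface CharTest {
        boolean matches(char c);
    }

    static CharTest is(char match) {
        return c -> c == match;
    }

    static CharTest isNot(char match) {
        return c -> c != match;
    }

    static CharTest anyOf(CharSequence sequence) {
        String chars = sequence.toString(); // snapshot, in case the sequence is mutable
        return c -> chars.indexOf(c) >= 0;
    }

    static CharTest noneOf(CharSequence sequence) {
        String chars = sequence.toString();
        return c -> chars.indexOf(c) < 0;
    }

    static CharTest inRange(char startInclusive, char endInclusive) {
        // Mirrors the documented IllegalArgumentException for an inverted range.
        if (endInclusive < startInclusive) {
            throw new IllegalArgumentException("endInclusive < startInclusive");
        }
        return c -> startInclusive <= c && c <= endInclusive;
    }
}
```

With these stand-ins, FactorySketch.inRange('a', 'z').matches('m') is true, and FactorySketch.inRange('z', 'a') throws.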
public static CharMatcher forPredicate(Predicate<? super Character> predicate)
Returns a matcher with identical behavior to the given Character-based predicate, but which operates on primitive char instances instead.
public abstract boolean matches(char c)
Determines a true or false value for the given character.
public CharMatcher negate()
Returns a matcher that matches any character not matched by this matcher.
public CharMatcher and(CharMatcher other)
Returns a matcher that matches any character matched by both this matcher and other.
public CharMatcher or(CharMatcher other)
Returns a matcher that matches any character matched by either this matcher or other.
public CharMatcher precomputed()
Returns a char matcher functionally equivalent to this one, but which may be faster to query than the original; your mileage may vary. Precomputation takes time and is likely to be worthwhile only if the precomputed matcher is queried many thousands of times.
This method has no effect (returns this) when called in GWT: it's unclear whether a precomputed matcher is faster, but it certainly consumes more memory, which doesn't seem like a worthwhile tradeoff in a browser.
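The time-for-memory tradeoff described for precomputed() can be illustrated with a lookup table. This is one possible approach, sketched under assumptions: PrecomputeSketch and CharTest are hypothetical names, and the BitSet table is an illustrative choice, not a statement of how this class precomputes.

```java
import java.util.BitSet;

// Illustrative lookup-table precomputation; not the library's actual strategy.
final class PrecomputeSketch {

    /** Hypothetical single-method interface mirroring matches(char). */
    interface CharTest {
        boolean matches(char c);
    }

    /** Evaluates original once per char value and answers later queries from a table. */
    static CharTest precompute(CharTest original) {
        BitSet table = new BitSet(Character.MAX_VALUE + 1);
        // Use an int counter: a char counter would wrap around and never terminate.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            if (original.matches((char) i)) {
                table.set(i);
            }
        }
        return ch -> table.get(ch);
    }
}
```

Building the table costs 65,536 calls to the original matcher up front, which is why this kind of precomputation pays off only when the result is queried many times.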
public boolean matchesAnyOf(CharSequence sequence)
Returns true if a character sequence contains at least one matching character. Equivalent to !matchesNoneOf(sequence).
The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns true or the end is reached.
Parameters:
sequence - the character sequence to examine, possibly empty
Returns:
true if this matcher matches at least one character in the sequence
public boolean matchesAllOf(CharSequence sequence)
Returns true if a character sequence contains only matching characters.
The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns false or the end is reached.
Parameters:
sequence - the character sequence to examine, possibly empty
Returns:
true if this matcher matches every character in the sequence, including when the sequence is empty
public boolean matchesNoneOf(CharSequence sequence)
Returns true if a character sequence contains no matching characters. Equivalent to !matchesAnyOf(sequence).
The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns true or the end is reached.
Parameters:
sequence - the character sequence to examine, possibly empty
Returns:
true if this matcher matches no characters in the sequence, including when the sequence is empty
public int indexIn(CharSequence sequence)
Returns the index of the first matching character in a character sequence, or -1 if no matching character is present.
The default implementation iterates over the sequence in forward order calling matches(char) for each character.
Parameters:
sequence - the character sequence to examine from the beginning
Returns:
an index, or -1 if no character matches
public int indexIn(CharSequence sequence, int start)
Returns the index of the first matching character in a character sequence, starting from a given position, or -1 if no character matches after that position.
The default implementation iterates over the sequence in forward order, beginning at start, calling matches(char) for each character.
Parameters:
sequence - the character sequence to examine
start - the first index to examine; must be nonnegative and no greater than sequence.length()
Returns:
the index of the first matching character, guaranteed to be no less than start, or -1 if no character matches
Throws:
IndexOutOfBoundsException - if start is negative or greater than sequence.length()
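The default scanning behavior described for matchesAnyOf, matchesAllOf, matchesNoneOf, and indexIn can be sketched in a few lines. This is an illustration of the documented contracts, assuming a hypothetical CharTest interface in place of this class; the method bodies are not copied from the library.

```java
// Sketch of the documented default scans; DefaultScanSketch and CharTest are hypothetical.
final class DefaultScanSketch {

    /** Hypothetical single-method interface mirroring matches(char). */
    interface CharTest {
        boolean matches(char c);
    }

    /** Forward scan from start; start may equal the length, which yields -1. */
    static int indexIn(CharTest m, CharSequence sequence, int start) {
        int length = sequence.length();
        if (start < 0 || start > length) {
            throw new IndexOutOfBoundsException("start: " + start);
        }
        for (int i = start; i < length; i++) {
            if (m.matches(sequence.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    static int indexIn(CharTest m, CharSequence sequence) {
        return indexIn(m, sequence, 0);
    }

    /** Stops at the first match. */
    static boolean matchesAnyOf(CharTest m, CharSequence sequence) {
        return indexIn(m, sequence) != -1;
    }

    /** True for an empty sequence, since nothing in it matches. */
    static boolean matchesNoneOf(CharTest m, CharSequence sequence) {
        return indexIn(m, sequence) == -1;
    }

    /** Stops at the first non-match; vacuously true for an empty sequence. */
    static boolean matchesAllOf(CharTest m, CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            if (!m.matches(sequence.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Note that both matchesAllOf and matchesNoneOf return true for an empty sequence, while matchesAnyOf returns false.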
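public int lastIndexIn(CharSequence sequence)
Returns the index of the last matching character in a character sequence, or -1 if no matching character is present.
The default implementation iterates over the sequence in reverse order calling matches(char) for each character.
Parameters:
sequence - the character sequence to examine from the end
Returns:
an index, or -1 if no character matches
public int countIn(CharSequence sequence)
Returns the number of matching characters found in a character sequence.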
@CheckReturnValue public String removeFrom(CharSequence sequence)
Returns a string containing all non-matching characters of a character sequence, in order. For example:
CharMatcher.is('a').removeFrom("bazaar")
... returns "bzr".
@CheckReturnValue public String retainFrom(CharSequence sequence)
Returns a string containing all matching characters of a character sequence, in order. For example:
CharMatcher.is('a').retainFrom("bazaar")
... returns "aaa".
@CheckReturnValue public String replaceFrom(CharSequence sequence, char replacement)
Returns a string copy of the input character sequence, with each character that matches this matcher replaced by a given replacement character. For example:
CharMatcher.is('a').replaceFrom("radar", 'o')
... returns "rodor".
The default implementation uses indexIn(CharSequence) to find the first matching character, then iterates the remainder of the sequence calling matches(char) for each character.
Parameters:
sequence - the character sequence to replace matching characters in
replacement - the character to append to the result string in place of each matching character in sequence
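The removeFrom, retainFrom, and replaceFrom examples above can be reproduced with a small stand-alone sketch. ReplaceSketch and CharTest are hypothetical names; replaceFrom follows the two-phase approach the documentation describes (find the first match, then scan the remainder), but the code is an illustration rather than the library's source.

```java
// Sketch of the documented remove/retain/replace behavior; names are hypothetical.
final class ReplaceSketch {

    /** Hypothetical single-method interface mirroring matches(char). */
    interface CharTest {
        boolean matches(char c);
    }

    static String removeFrom(CharTest m, CharSequence sequence) {
        return filter(m, sequence, false);
    }

    static String retainFrom(CharTest m, CharSequence sequence) {
        return filter(m, sequence, true);
    }

    /** Keeps the characters whose match result equals keepMatching. */
    private static String filter(CharTest m, CharSequence sequence, boolean keepMatching) {
        StringBuilder out = new StringBuilder(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (m.matches(c) == keepMatching) {
                out.append(c);
            }
        }
        return out.toString();
    }

    static String replaceFrom(CharTest m, CharSequence sequence, char replacement) {
        String string = sequence.toString();
        // Phase 1: find the first match; without one, the input is returned unchanged.
        int pos = -1;
        for (int i = 0; i < string.length(); i++) {
            if (m.matches(string.charAt(i))) {
                pos = i;
                break;
            }
        }
        if (pos == -1) {
            return string;
        }
        // Phase 2: replace that match, then test only the remaining characters.
        char[] chars = string.toCharArray();
        chars[pos] = replacement;
        for (int i = pos + 1; i < chars.length; i++) {
            if (m.matches(chars[i])) {
                chars[i] = replacement;
            }
        }
        return new String(chars);
    }
}
```

Finding the first match before copying means an input with no matching characters needs no char array allocation at all.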
@CheckReturnValue public String replaceFrom(CharSequence sequence, CharSequence replacement)
Returns a string copy of the input character sequence, with each character that matches this matcher replaced by a given replacement sequence. For example:
CharMatcher.is('a').replaceFrom("yaha", "oo")
... returns "yoohoo".
Note: If the replacement is a fixed string with only one character, you are better off calling replaceFrom(CharSequence, char) directly.
Parameters:
sequence - the character sequence to replace matching characters in
replacement - the characters to append to the result string in place of each matching character in sequence
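@CheckReturnValue public String trimFrom(CharSequence sequence)
Returns a substring of the input character sequence that omits all characters this matcher matches from the beginning and from the end of the string. For example:
CharMatcher.anyOf("ab").trimFrom("abacatbab")
... returns "cat".
Note that:
CharMatcher.inRange('\0', ' ').trimFrom(str)
... is equivalent to String.trim().
@CheckReturnValue public String trimLeadingFrom(CharSequence sequence)
Returns a substring of the input character sequence that omits all characters this matcher matches from the beginning of the string. For example:
CharMatcher.anyOf("ab").trimLeadingFrom("abacatbab")
... returns "catbab".
@CheckReturnValue public String trimTrailingFrom(CharSequence sequence)
Returns a substring of the input character sequence that omits all characters this matcher matches from the end of the string. For example:
CharMatcher.anyOf("ab").trimTrailingFrom("abacatbab")
... returns "abacat".
@CheckReturnValue public String collapseFrom(CharSequence sequence, char replacement)
Returns a string copy of the input character sequence, with each group of consecutive characters that match this matcher replaced by a single replacement character. For example:
CharMatcher.anyOf("eko").collapseFrom("bookkeeper", '-')
... returns "b-p-r".
The default implementation uses indexIn(CharSequence) to find the first matching character, then iterates the remainder of the sequence calling matches(char) for each character.
Parameters:
sequence - the character sequence to replace matching groups of characters in
replacement - the character to append to the result string in place of each group of matching characters in sequence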
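The trimming and collapsing contracts above, including the stated equivalence with String.trim(), can be checked with a stand-alone sketch. TrimCollapseSketch and CharTest are hypothetical names, and collapseFrom here uses a single straightforward pass rather than the indexIn-based default implementation the documentation mentions.

```java
// Sketch of the documented trim and collapse behavior; names are hypothetical.
final class TrimCollapseSketch {

    /** Hypothetical single-method interface mirroring matches(char). */
    interface CharTest {
        boolean matches(char c);
    }

    /** Drops matching characters from both ends; interior matches are kept. */
    static String trimFrom(CharTest m, CharSequence sequence) {
        int first = 0;
        int last = sequence.length() - 1;
        while (first <= last && m.matches(sequence.charAt(first))) {
            first++;
        }
        while (last >= first && m.matches(sequence.charAt(last))) {
            last--;
        }
        return sequence.subSequence(first, last + 1).toString();
    }

    /** Replaces each run of consecutive matching characters with one replacement. */
    static String collapseFrom(CharTest m, CharSequence sequence, char replacement) {
        StringBuilder out = new StringBuilder(sequence.length());
        boolean inGroup = false;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (m.matches(c)) {
                if (!inGroup) {
                    out.append(replacement); // emit once per group
                    inGroup = true;
                }
            } else {
                out.append(c);
                inGroup = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        CharTest ab = c -> "ab".indexOf(c) >= 0;
        CharTest eko = c -> "eko".indexOf(c) >= 0;
        CharTest upToSpace = c -> c <= ' '; // same set as inRange('\0', ' ')
        System.out.println(trimFrom(ab, "abacatbab"));
        System.out.println(collapseFrom(eko, "bookkeeper", '-'));
        System.out.println(trimFrom(upToSpace, "  hi \t\n").equals("  hi \t\n".trim()));
    }
}
```

Because char is unsigned, the test c <= ' ' covers exactly the range '\0' through ' ', the same characters String.trim() strips.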
@CheckReturnValue public String trimAndCollapseFrom(CharSequence sequence, char replacement)
Collapses groups of matching characters exactly as collapseFrom(java.lang.CharSequence, char) does, except that groups of matching characters at the start or end of the sequence are removed without replacement.
public boolean apply(Character character)
Equivalent to matches(char); provided only to satisfy the Predicate interface. When using a reference of type CharMatcher, invoke matches(char) directly instead.
Copyright © 2010-2013. All Rights Reserved.