@Beta @GwtCompatible public abstract class ArrayBasedUnicodeEscaper extends UnicodeEscaper
A UnicodeEscaper that uses an array to quickly look up replacement characters for a given code point. An additional safe range is provided that determines whether code points without specific replacements are to be considered safe and left unescaped or should be escaped in a general way.

A good example of usage of this class is HTML escaping, where the replacement array contains information about the named HTML entities such as &amp; and &quot; while escapeUnsafe(int) is overridden to handle general escaping of the form &#NNNNN;.
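The lookup order described above can be illustrated with a minimal, standalone sketch. It does not use this class or any Guava API: the name HtmlSketchEscaper, the four-entry replacement map, and the safe range 0x20 to 0x7E are assumptions chosen for illustration. Unlike the documented escape(String), the sketch does not reject unmatched surrogates.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch; HtmlSketchEscaper is a hypothetical name, not a Guava class.
final class HtmlSketchEscaper {
    // Replacement table indexed by char value; null means "no explicit replacement".
    private final char[][] replacements;
    private final int safeMin;
    private final int safeMax;

    HtmlSketchEscaper(Map<Character, String> replacementMap, int safeMin, int safeMax) {
        int max = -1;
        for (char c : replacementMap.keySet()) {
            max = Math.max(max, c);
        }
        replacements = new char[max + 1][];
        for (Map.Entry<Character, String> e : replacementMap.entrySet()) {
            char key = e.getKey();
            replacements[key] = e.getValue().toCharArray();
        }
        this.safeMin = safeMin;
        this.safeMax = safeMax;
    }

    // Order of checks: explicit replacement, then safe range, then general escaping.
    char[] escape(int cp) {
        if (cp < replacements.length && replacements[cp] != null) {
            return replacements[cp];
        }
        if (cp >= safeMin && cp <= safeMax) {
            return null; // safe: leave unescaped
        }
        return escapeUnsafe(cp);
    }

    // General escaping as a decimal numeric character reference.
    char[] escapeUnsafe(int cp) {
        return ("&#" + cp + ";").toCharArray();
    }

    String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            char[] r = escape(cp);
            if (r == null) {
                out.appendCodePoint(cp);
            } else {
                out.append(r);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Assumed configuration: four named entities, printable ASCII treated as safe.
    static HtmlSketchEscaper create() {
        Map<Character, String> map = new HashMap<>();
        map.put('&', "&amp;");
        map.put('"', "&quot;");
        map.put('<', "&lt;");
        map.put('>', "&gt;");
        return new HtmlSketchEscaper(map, 0x20, 0x7E);
    }
}
```

With this configuration, HtmlSketchEscaper.create().escape("a<b") yields a&lt;b, because only the middle character has an explicit replacement and the other two fall inside the safe range.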
The size of the data structure used by ArrayBasedUnicodeEscaper is proportional to the highest valued code point that requires escaping. For example, a replacement map containing the single character '\u1000' will require approximately 16K of memory. If you need to create multiple escaper instances that have the same character replacement mapping, consider using ArrayBasedEscaperMap.
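The memory figure follows from the table needing one slot per char value up to the highest key. The sketch below is a standalone illustration, not the Guava implementation: ReplacementArraySketch and build are hypothetical names, and the 4 bytes per array reference is an assumption that depends on the JVM.

```java
import java.util.Collections;
import java.util.Map;

// Standalone sketch; ReplacementArraySketch is a hypothetical name, not a Guava class.
final class ReplacementArraySketch {
    // Builds a lookup table indexed by char value; its length is (highest key + 1).
    static char[][] build(Map<Character, String> replacementMap) {
        int max = -1;
        for (char c : replacementMap.keySet()) {
            max = Math.max(max, c);
        }
        char[][] table = new char[max + 1][];
        for (Map.Entry<Character, String> e : replacementMap.entrySet()) {
            char key = e.getKey();
            table[key] = e.getValue().toCharArray();
        }
        return table;
    }

    public static void main(String[] args) {
        char[][] table = build(Collections.singletonMap('\u1000', "[U+1000]"));
        // 0x1000 + 1 = 4097 slots; at an assumed 4 bytes per reference that is roughly 16K.
        System.out.println(table.length + " slots, ~" + (table.length * 4) + " bytes of references");
    }
}
```

Building such a table once and sharing it between escapers is the kind of saving the ArrayBasedEscaperMap recommendation is aimed at.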
Modifier | Constructor and Description |
---|---|
protected | ArrayBasedUnicodeEscaper(ArrayBasedEscaperMap escaperMap, int safeMin, int safeMax, String unsafeReplacement) Creates a new ArrayBasedUnicodeEscaper instance with the given replacement map and specified safe range. |
protected | ArrayBasedUnicodeEscaper(Map<Character,String> replacementMap, int safeMin, int safeMax, String unsafeReplacement) Creates a new ArrayBasedUnicodeEscaper instance with the given replacement map and specified safe range. |
Modifier and Type | Method and Description |
---|---|
protected char[] | escape(int cp) Escapes a single Unicode code point using the replacement array and safe range values. |
String | escape(String s) Returns the escaped form of a given literal string. |
protected abstract char[] | escapeUnsafe(int cp) Escapes a code point that has no direct explicit value in the replacement array and lies outside the stated safe range. |
protected int | nextEscapeIndex(CharSequence csq, int index, int end) Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping. |
Methods inherited from class UnicodeEscaper: codePointAt, escapeSlow
Methods inherited from class Escaper: asFunction
protected ArrayBasedUnicodeEscaper(Map<Character,String> replacementMap, int safeMin, int safeMax, @Nullable String unsafeReplacement)
Creates a new ArrayBasedUnicodeEscaper instance with the given replacement map and specified safe range. If safeMax < safeMin then no code points are considered safe.

If a code point has no mapped replacement then it is checked against the safe range. If it lies outside that, then escapeUnsafe(int) is called, otherwise no escaping is performed.
Parameters:
replacementMap - a map of characters to their escaped representations
safeMin - the lowest character value in the safe range
safeMax - the highest character value in the safe range
unsafeReplacement - the default replacement for unsafe characters or null if no default replacement is required

protected ArrayBasedUnicodeEscaper(ArrayBasedEscaperMap escaperMap, int safeMin, int safeMax, @Nullable String unsafeReplacement)
Creates a new ArrayBasedUnicodeEscaper instance with the given replacement map and specified safe range. If safeMax < safeMin then no code points are considered safe. This constructor is useful when explicit instances of ArrayBasedEscaperMap are used to allow the sharing of large replacement mappings.

If a code point has no mapped replacement then it is checked against the safe range. If it lies outside that, then escapeUnsafe(int) is called, otherwise no escaping is performed.
Parameters:
escaperMap - the map of replacements
safeMin - the lowest character value in the safe range
safeMax - the highest character value in the safe range
unsafeReplacement - the default replacement for unsafe characters or null if no default replacement is required

public final String escape(String s)
Description copied from class: UnicodeEscaper
Returns the escaped form of a given literal string.

If you are escaping input in arbitrary successive chunks, then it is not generally safe to use this method. If an input string ends with an unmatched high surrogate character, then this method will throw IllegalArgumentException. You should ensure your input is valid UTF-16 before calling this method.
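One way to check that input is valid UTF-16 before escaping is to scan for unpaired surrogates. The following is a minimal sketch using only java.lang.Character; Utf16Check and isWellFormed are hypothetical names and are not part of this class or of Guava.

```java
// Standalone sketch; Utf16Check is a hypothetical helper, not part of Guava.
final class Utf16Check {
    // Returns true if every surrogate in the sequence is part of a correctly ordered pair.
    static boolean isWellFormed(CharSequence csq) {
        for (int i = 0; i < csq.length(); i++) {
            char c = csq.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 == csq.length() || !Character.isLowSurrogate(csq.charAt(i + 1))) {
                    return false; // unmatched high surrogate, e.g. at the end of a chunk
                }
                i++; // skip the low surrogate that completes the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high surrogate
            }
        }
        return true;
    }
}
```

When input arrives in chunks, a caller could run such a check on each chunk and carry a trailing high surrogate over to the next chunk rather than passing it to escape(String).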
Note: When implementing an escaper it is a good idea to override this method for efficiency by inlining the implementation of UnicodeEscaper.nextEscapeIndex(CharSequence, int, int) directly. Doing this for PercentEscaper more than doubled the performance for unescaped strings (as measured by CharEscapersBenchmark).
Overrides:
escape in class UnicodeEscaper
Parameters:
s - the literal string to be escaped
Returns:
the escaped form of the string
protected final int nextEscapeIndex(CharSequence csq, int index, int end)
Description copied from class: UnicodeEscaper
Scans a sub-sequence of characters from a given CharSequence, returning the index of the next character that requires escaping.

Note: When implementing an escaper, it is a good idea to override this method for efficiency. The base class implementation determines successive Unicode code points and invokes UnicodeEscaper.escape(int) for each of them. If the semantics of your escaper are such that code points in the supplementary range are either all escaped or all unescaped, this method can be implemented more efficiently using CharSequence.charAt(int).

Note however that if your escaper does not escape characters in the supplementary range, you should either continue to validate the correctness of any surrogate characters encountered or provide a clear warning to users that your escaper does not validate its input.

See PercentEscaper for an example.
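A charAt-based scan of the kind the note describes might look like the sketch below. It is a standalone illustration under stated assumptions, not the implementation used by this class or by PercentEscaper: ScanSketch and its extra parameters are hypothetical, and the choice to stop at every surrogate (so that a per-code-point path can decode and validate it) is one possible way to keep validating input.

```java
// Standalone sketch; ScanSketch and its parameters are hypothetical, not Guava API.
final class ScanSketch {
    // Returns the index of the first char in [index, end) that may need escaping, or end if none.
    static int nextEscapeIndex(CharSequence csq, int index, int end,
                               char[][] replacements, char safeMin, char safeMax) {
        while (index < end) {
            char c = csq.charAt(index);
            boolean hasReplacement = c < replacements.length && replacements[c] != null;
            // Surrogates always stop the scan so a per-code-point path can decode and validate them.
            if (hasReplacement || c < safeMin || c > safeMax || Character.isSurrogate(c)) {
                break;
            }
            index++;
        }
        return index;
    }
}
```

The loop touches each char once and never decodes code points, which is where the efficiency gain over a per-code-point scan would come from.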
Overrides:
nextEscapeIndex in class UnicodeEscaper
Parameters:
csq - a sequence of characters
index - the index of the first character to be scanned
end - the index immediately after the last character to be scanned

protected final char[] escape(int cp)
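Escapes a single Unicode code point using the replacement array and safe range values. If the code point has no explicit replacement and lies outside the safe range, then escapeUnsafe(int) is called.
Specified by:
escape in class UnicodeEscaper
Parameters:
cp - the Unicode code point to escape if necessary
Returns:
the replacement characters, or null if no escaping was needed

protected abstract char[] escapeUnsafe(int cp)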
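Escapes a code point that has no direct explicit value in the replacement array and lies outside the stated safe range.

Note that arrays returned by this method must not be modified once they have been returned. However, it is acceptable to return the same array multiple times (even for different input characters).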
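The array-sharing rule can be illustrated with a small standalone sketch. SharedReplacementSketch, its '?' replacement, and appendEscaped are hypothetical and only demonstrate the contract: the same array is returned for every input, and the caller copies from it without modifying it.

```java
// Standalone sketch; SharedReplacementSketch is a hypothetical name, not a Guava class.
final class SharedReplacementSketch {
    // One array, read-only by convention, reused for every unsafe code point.
    private static final char[] QUESTION_MARK = {'?'};

    // Returns the same array instance for all inputs; callers must treat it as read-only.
    static char[] escapeUnsafe(int cp) {
        return QUESTION_MARK;
    }

    // Copies the replacement into the output instead of modifying or retaining the array.
    static void appendEscaped(StringBuilder out, int cp) {
        out.append(escapeUnsafe(cp));
    }
}
```

Returning a shared array avoids an allocation per escaped code point, which is why the read-only requirement matters.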
Parameters:
cp - the Unicode code point to escape
Returns:
the replacement characters, or null if no escaping was required

Copyright © 2010-2014. All Rights Reserved.