@Beta @GwtCompatible(emulated=true) public final class Utf8 extends Object
The variant of UTF-8 implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1. One implication of this is that it rejects "non-shortest form" byte sequences, even though the JDK decoder may accept them.
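|Modifier and Type||Method and Description|
|static int||encodedLength(CharSequence sequence): Returns the number of bytes in the UTF-8-encoded form of sequence.|
|static boolean||isWellFormed(byte[] bytes): Returns true if bytes is a well-formed UTF-8 byte sequence according to Unicode 6.0.|
|static boolean||isWellFormed(byte[] bytes, int off, int len): Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by isWellFormed(byte[]).|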
public static int encodedLength(CharSequence sequence)
Returns the number of bytes in the UTF-8-encoded form of sequence. For a string, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
Throws: IllegalArgumentException - if sequence contains ill-formed UTF-16 (unpaired surrogates)
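The byte count described above can be derived directly from the UTF-16 code units, without allocating an encoded array. The following is a minimal sketch of that idea using only the JDK; the class name Utf8LengthSketch and the method utf8Length are hypothetical and are not this library's implementation. It assumes the result fits in an int and does not guard against overflow.

```java
// Hypothetical illustration; not the implementation behind encodedLength.
class Utf8LengthSketch {
    /** Counts the bytes that a UTF-8 encoding of {@code sequence} would occupy. */
    static int utf8Length(CharSequence sequence) {
        int bytes = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (c < 0x80) {
                bytes += 1; // ASCII range
            } else if (c < 0x800) {
                bytes += 2; // two-byte range
            } else if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= sequence.length()
                        || !Character.isLowSurrogate(sequence.charAt(i + 1))) {
                    throw new IllegalArgumentException("Unpaired surrogate at index " + i);
                }
                bytes += 4; // supplementary code point
                i++;        // skip the low surrogate that was just consumed
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate is ill-formed.
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else {
                bytes += 3; // rest of the Basic Multilingual Plane
            }
        }
        return bytes;
    }
}
```

For well-formed input, the result should match string.getBytes(StandardCharsets.UTF_8).length; for an unpaired surrogate the sketch throws, whereas getBytes would silently substitute a replacement byte.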
public static boolean isWellFormed(byte[] bytes)
Returns true if bytes is a well-formed UTF-8 byte sequence according to Unicode 6.0. Note that this is a stronger criterion than simply whether the bytes can be decoded. For example, some versions of the JDK decoder will accept "non-shortest form" byte sequences, but encoding never reproduces these. Such byte sequences are not considered well-formed.
This method returns true if and only if Arrays.equals(bytes, new String(bytes, UTF_8).getBytes(UTF_8)) does, but is more efficient in both time and space.
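The round-trip criterion can be written out with JDK classes alone. The sketch below is an illustration of that criterion, not this class's implementation; the names RoundTripCheckSketch and roundTrips are hypothetical. It decodes the bytes to a String, re-encodes them, and compares the result with the input, which is exactly the allocation-heavy path that isWellFormed is meant to avoid.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of the decode-then-encode comparison.
class RoundTripCheckSketch {
    /** Returns true if decoding and re-encoding {@code bytes} reproduces them exactly. */
    static boolean roundTrips(byte[] bytes) {
        // Malformed input is replaced during decoding, so it cannot round-trip.
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(bytes, reencoded);
    }
}
```

A non-shortest form such as the two bytes 0xC0 0x80 fails this check whether the decoder rejects it or accepts it as U+0000, because re-encoding never yields those two bytes.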
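public static boolean isWellFormed(byte[] bytes, int off, int len)
Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by isWellFormed(byte[]). Note that this can be false even when isWellFormed(bytes) is true.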
Parameters:
bytes - the input buffer
off - the offset in the buffer of the first byte to read
len - the number of bytes to read from the buffer
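A slice of a well-formed buffer can itself be ill-formed when off or len cuts through a multi-byte character. The sketch below illustrates this with a strict JDK decoder over a slice; the names SliceCheckSketch and sliceIsWellFormed are hypothetical, and the strict decoder is assumed to approximate the well-formedness rules described here (it may differ on older JDK versions).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; uses a strict JDK decoder rather than this class.
class SliceCheckSketch {
    /** Returns true if bytes[off .. off+len) decodes without any malformed input. */
    static boolean sliceIsWellFormed(byte[] bytes, int off, int len) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes, off, len));
            return true;
        } catch (CharacterCodingException e) {
            // Raised for malformed or truncated sequences inside the slice.
            return false;
        }
    }
}
```

For example, the euro sign U+20AC encodes to three bytes. The full array passes the check, but the slice covering only the first two bytes, or only the last two, does not.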
Copyright © 2010–2020. All rights reserved.