Package com.google.common.base
Class Utf8
- java.lang.Object
  - com.google.common.base.Utf8
@GwtCompatible(emulated=true) public final class Utf8 extends java.lang.Object
Low-level, high-performance utility methods related to the UTF-8 character encoding. UTF-8 is defined in section D92 of The Unicode Standard Core Specification, Chapter 3. The variant of UTF-8 implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1. One implication of this is that it rejects "non-shortest form" byte sequences, even though the JDK decoder may accept them.
- Since: 16.0
- Author: Martin Buchholz, Clément Roux
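To make "non-shortest form" concrete: the bytes 0xC0 0xAF match the 110xxxxx 10yyyyyy pattern of a two-byte sequence, and a decoder that only strips the marker bits obtains U+002F ('/'), whose shortest encoding is the single byte 0x2F. The following is a minimal standalone sketch using only the JDK; the class ShortestForm and its helpers are hypothetical illustrations, not part of Guava.

```java
// Hypothetical illustration class; not part of Guava.
final class ShortestForm {
  private ShortestForm() {}

  // Byte count of the shortest-form UTF-8 encoding of a valid code point.
  static int shortestLength(int codePoint) {
    if (codePoint < 0x80) return 1;
    if (codePoint < 0x800) return 2;
    if (codePoint < 0x10000) return 3;
    return 4;
  }

  // Decodes 110xxxxx 10yyyyyy by stripping the marker bits only.
  // Deliberately naive: it performs no shortest-form check.
  static int naiveDecode2(byte b0, byte b1) {
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F);
  }

  public static void main(String[] args) {
    int cp = naiveDecode2((byte) 0xC0, (byte) 0xAF);
    // Prints "U+002F needs 1 byte(s) but was given 2"
    System.out.printf("U+%04X needs %d byte(s) but was given 2%n", cp, shortestLength(cp));
  }
}
```

A restricted UTF-8 validator rejects such input outright: lead bytes 0xC0 and 0xC1 can only begin overlong two-byte sequences, so they never appear in well-formed UTF-8.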
Method Summary
- static int encodedLength(java.lang.CharSequence sequence): Returns the number of bytes in the UTF-8-encoded form of sequence.
- static boolean isWellFormed(byte[] bytes): Returns true if bytes is a well-formed UTF-8 byte sequence according to Unicode 6.0.
- static boolean isWellFormed(byte[] bytes, int off, int len): Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by isWellFormed(byte[]).
Method Detail
encodedLength
public static int encodedLength(java.lang.CharSequence sequence)
Returns the number of bytes in the UTF-8-encoded form of sequence. For a string, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
- Throws:
  java.lang.IllegalArgumentException - if sequence contains ill-formed UTF-16 (unpaired surrogates)
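A minimal standalone sketch of how such a length can be computed without allocating a byte array: walk the UTF-16 code units, count 1, 2, or 3 bytes by range, and count 4 bytes per surrogate pair. The class Utf8LengthSketch is hypothetical and is not Guava's source; it only mirrors the contract documented above, including the IllegalArgumentException for unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the documented contract; not Guava's implementation.
final class Utf8LengthSketch {
  private Utf8LengthSketch() {}

  static int encodedLength(CharSequence s) {
    long total = 0; // long so the running sum cannot overflow before the final check
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        total += 1; // U+0000..U+007F
      } else if (c < 0x800) {
        total += 2; // U+0080..U+07FF
      } else if (Character.isHighSurrogate(c)) {
        // A high surrogate must be followed by a low surrogate; the pair is one 4-byte code point.
        if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
          throw new IllegalArgumentException("Unpaired surrogate at index " + i);
        }
        total += 4;
        i++; // skip the low surrogate we just consumed
      } else if (Character.isLowSurrogate(c)) {
        throw new IllegalArgumentException("Unpaired surrogate at index " + i);
      } else {
        total += 3; // remaining BMP code points
      }
    }
    if (total > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("UTF-8 length does not fit in an int: " + total);
    }
    return (int) total;
  }

  public static void main(String[] args) {
    String s = "a\u00e9\u20ac\uD83D\uDE00"; // 'a', 'é', '€', U+1F600
    System.out.println(encodedLength(s));                          // 10
    System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 10
  }
}
```

The equivalence with getBytes(UTF_8).length only holds for well-formed UTF-16: String.getBytes replaces an unpaired surrogate with '?' instead of failing, whereas encodedLength throws.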
isWellFormed
public static boolean isWellFormed(byte[] bytes)
Returns true if bytes is a well-formed UTF-8 byte sequence according to Unicode 6.0. Note that this is a stronger criterion than simply whether the bytes can be decoded. For example, some versions of the JDK decoder will accept "non-shortest form" byte sequences, but encoding never reproduces these. Such byte sequences are not considered well-formed.

This method returns true if and only if Arrays.equals(bytes, new String(bytes, UTF_8).getBytes(UTF_8)) does, but is more efficient in both time and space.
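The round-trip equivalence stated above can serve as a slow reference oracle, for example when testing a faster validator. This sketch uses only JDK classes; the class name RoundTripOracle is hypothetical. It relies on new String(byte[], Charset) replacing every malformed sequence with U+FFFD, so the re-encoded bytes match the input only when nothing was replaced.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical reference oracle for the documented equivalence; not Guava's implementation.
final class RoundTripOracle {
  private RoundTripOracle() {}

  static boolean isWellFormedSlow(byte[] bytes) {
    // Decoding substitutes U+FFFD for malformed input; re-encoding then yields
    // EF BF BD in place of the original bytes, so any substitution breaks equality.
    byte[] reencoded = new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    return Arrays.equals(bytes, reencoded);
  }

  public static void main(String[] args) {
    byte[] ok = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
    byte[] overlongSlash = {(byte) 0xC0, (byte) 0xAF}; // non-shortest form of '/'
    System.out.println(isWellFormedSlow(ok));            // true
    System.out.println(isWellFormedSlow(overlongSlash)); // false
  }
}
```

Each call allocates a String and a second byte array, which is the cost isWellFormed is documented to avoid.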
isWellFormed
public static boolean isWellFormed(byte[] bytes, int off, int len)
Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by isWellFormed(byte[]). Note that this can be false even when isWellFormed(bytes) is true.
- Parameters:
  bytes - the input buffer
  off - the offset in the buffer of the first byte to read
  len - the number of bytes to read from the buffer
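The following is a minimal standalone sketch of a slice validator, assuming the well-formed byte sequence ranges of Unicode Table 3-7 (the restricted UTF-8 described in the class overview). It is not Guava's implementation, and Utf8SliceValidator is a hypothetical name. The demonstration shows why the slice check can fail when the whole array passes: a slice that starts or ends inside a multi-byte sequence is ill-formed on its own.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical validator following Unicode Table 3-7; not Guava's source.
final class Utf8SliceValidator {
  private Utf8SliceValidator() {}

  static boolean isWellFormed(byte[] bytes, int off, int len) {
    if (off < 0 || len < 0 || off > bytes.length - len) {
      throw new IndexOutOfBoundsException("off=" + off + ", len=" + len);
    }
    int end = off + len;
    int i = off;
    while (i < end) {
      int b0 = bytes[i] & 0xFF;
      if (b0 < 0x80) { i++; continue; }         // ASCII
      int n;                                     // total length of this sequence
      int lo = 0x80, hi = 0xBF;                  // allowed range for the second byte
      if (b0 >= 0xC2 && b0 <= 0xDF) {
        n = 2;
      } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        n = 3;
        if (b0 == 0xE0) lo = 0xA0;               // reject overlong 3-byte forms
        if (b0 == 0xED) hi = 0x9F;               // reject encoded surrogates
      } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        n = 4;
        if (b0 == 0xF0) lo = 0x90;               // reject overlong 4-byte forms
        if (b0 == 0xF4) hi = 0x8F;               // reject code points above U+10FFFF
      } else {
        return false; // C0, C1, F5..FF, or a stray continuation byte
      }
      if (end - i < n) return false;             // sequence runs past the slice
      int b1 = bytes[i + 1] & 0xFF;
      if (b1 < lo || b1 > hi) return false;
      for (int k = 2; k < n; k++) {
        int bk = bytes[i + k] & 0xFF;
        if (bk < 0x80 || bk > 0xBF) return false; // remaining continuation bytes
      }
      i += n;
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] e = "\u00e9".getBytes(StandardCharsets.UTF_8); // C3 A9
    System.out.println(isWellFormed(e, 0, e.length)); // true
    System.out.println(isWellFormed(e, 1, 1));        // false: lone continuation byte
  }
}
```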