Class Ascii

java.lang.Object
com.google.common.base.Ascii

@GwtCompatible public final class Ascii extends Object
Static methods pertaining to ASCII characters (those in the range of values 0x00 through 0x7F), and to strings containing such characters.

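ASCII utilities also exist in other classes of this package:

  • Charsets.US_ASCII specifies the Charset of ASCII characters.
  • CharMatcher.ascii() matches ASCII characters and provides text processing methods which operate only on the ASCII characters of a string.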

Since:
7.0
Author:
Catherine Berry, Gregory Kick
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final byte
    ACK
    Acknowledge: A communication control character transmitted by a receiver as an affirmative response to a sender.
    static final byte
    BEL
    Bell ('\a'): A character for use when there is a need to call for human attention.
    static final byte
    BS
    Backspace ('\b'): A format effector which controls the movement of the printing position one printing space backward on the same printing line.
    static final byte
    CAN
    Cancel: A control character used to indicate that the data with which it is sent is in error or is to be disregarded.
    static final byte
    CR
    Carriage Return ('\r'): A format effector which controls the movement of the printing position to the first printing position on the same printing line.
    static final byte
    DC1
    Device Control 1.
    static final byte
    DC2
    Device Control 2.
    static final byte
    DC3
    Device Control 3.
    static final byte
    DC4
    Device Control 4.
    static final byte
    DEL
    Delete: This character is used primarily to "erase" or "obliterate" erroneous or unwanted characters in perforated tape.
    static final byte
    DLE
    Data Link Escape: A communication control character which will change the meaning of a limited number of contiguously following characters.
    static final byte
    EM
    End of Medium: A control character associated with the sent data which may be used to identify the physical end of the medium, or the end of the used, or wanted, portion of information recorded on a medium.
    static final byte
    ENQ
    Enquiry: A communication control character used in data communication systems as a request for a response from a remote station.
    static final byte
    EOT
    End of Transmission: A communication control character used to indicate the conclusion of a transmission, which may have contained one or more texts and any associated headings.
    static final byte
    ESC
    Escape: A control character intended to provide code extension (supplementary characters) in general information interchange.
    static final byte
    ETB
    End of Transmission Block: A communication control character used to indicate the end of a block of data for communication purposes.
    static final byte
    ETX
    End of Text: A communication control character used to terminate a sequence of characters started with STX and transmitted as an entity.
    static final byte
    FF
    Form Feed ('\f'): A format effector which controls the movement of the printing position to the first pre-determined printing line on the next form or page.
    static final byte
    FS
    File Separator: These four information separators may be used within data in optional fashion, except that their hierarchical relationship shall be: FS is the most inclusive, then GS, then RS, and US is least inclusive.
    static final byte
    GS
    Group Separator: These four information separators may be used within data in optional fashion, except that their hierarchical relationship shall be: FS is the most inclusive, then GS, then RS, and US is least inclusive.
    static final byte
    HT
    Horizontal Tabulation ('\t'): A format effector which controls the movement of the printing position to the next in a series of predetermined positions along the printing line.
    static final byte
    LF
    Line Feed ('\n'): A format effector which controls the movement of the printing position to the next printing line.
    static final char
    MAX
    The maximum value of an ASCII character.
    static final char
    MIN
    The minimum value of an ASCII character.
    static final byte
    NAK
    Negative Acknowledge: A communication control character transmitted by a receiver as a negative response to the sender.
    static final byte
    NL
    Alternate name for LF.
    static final byte
    NUL
    Null ('\0'): The all-zeros character which may serve to accomplish time fill and media fill.
    static final byte
    RS
    Record Separator: These four information separators may be used within data in optional fashion, except that their hierarchical relationship shall be: FS is the most inclusive, then GS, then RS, and US is least inclusive.
    static final byte
    SI
    Shift In: A control character indicating that the code combinations which follow shall be interpreted according to the standard code table.
    static final byte
    SO
    Shift Out: A control character indicating that the code combinations which follow shall be interpreted as outside of the character set of the standard code table until a Shift In character is reached.
    static final byte
    SOH
    Start of Heading: A communication control character used at the beginning of a sequence of characters which constitute a machine-sensible address or routing information.
    static final byte
    SP
    Space: A normally non-printing graphic character used to separate words.
    static final byte
    SPACE
    Alternate name for SP.
    static final byte
    STX
    Start of Text: A communication control character which precedes a sequence of characters that is to be treated as an entity and entirely transmitted through to the ultimate destination.
    static final byte
    SUB
    Substitute: A character that may be substituted for a character which is determined to be invalid or in error.
    static final byte
    SYN
    Synchronous Idle: A communication control character used by a synchronous transmission system in the absence of any other character to provide a signal from which synchronism may be achieved or retained.
    static final byte
    US
    Unit Separator: These four information separators may be used within data in optional fashion, except that their hierarchical relationship shall be: FS is the most inclusive, then GS, then RS, and US is least inclusive.
    static final byte
    VT
    Vertical Tabulation ('\v'): A format effector which controls the movement of the printing position to the next in a series of predetermined printing lines.
    static final byte
    XOFF
    Transmission off.
    static final byte
    XON
    Transmission On: Although originally defined as DC1, this ASCII control character is now better known as the XON code used for software flow control in serial communications.
  • Method Summary

    Modifier and Type
    Method
    Description
    static boolean
    equalsIgnoreCase(CharSequence s1, CharSequence s2)
    Indicates whether the contents of the given character sequences s1 and s2 are equal, ignoring the case of any ASCII alphabetic characters between 'a' and 'z' or 'A' and 'Z' inclusive.
    static boolean
    isLowerCase(char c)
    Indicates whether c is one of the twenty-six lowercase ASCII alphabetic characters between 'a' and 'z' inclusive.
    static boolean
    isUpperCase(char c)
    Indicates whether c is one of the twenty-six uppercase ASCII alphabetic characters between 'A' and 'Z' inclusive.
    static char
    toLowerCase(char c)
    If the argument is an uppercase ASCII character, returns the lowercase equivalent.
    static String
    toLowerCase(CharSequence chars)
    Returns a copy of the input character sequence in which all uppercase ASCII characters have been converted to lowercase.
    static String
    toLowerCase(String string)
    Returns a copy of the input string in which all uppercase ASCII characters have been converted to lowercase.
    static char
    toUpperCase(char c)
    If the argument is a lowercase ASCII character, returns the uppercase equivalent.
    static String
    toUpperCase(CharSequence chars)
    Returns a copy of the input character sequence in which all lowercase ASCII characters have been converted to uppercase.
    static String
    toUpperCase(String string)
    Returns a copy of the input string in which all lowercase ASCII characters have been converted to uppercase.
    static String
    truncate(CharSequence seq, int maxLength, String truncationIndicator)
    Truncates the given character sequence to the given maximum length.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • NUL

      public static final byte NUL
      Null ('\0'): The all-zeros character which may serve to accomplish time fill and media fill. Normally used as a C string terminator.

      Although RFC 20 names this as "Null", note that it is distinct from the C/C++ "NULL" pointer.
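
      Because the constants are plain byte values, they can be compared directly with raw byte buffers. As a minimal sketch (assuming Guava on the classpath; CStrings and fromNulTerminated are hypothetical names), decoding a C-style NUL-terminated buffer might look like this:

```java
import com.google.common.base.Ascii;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; requires Guava on the classpath.
class CStrings {
  /** Decodes the bytes before the first NUL terminator, or the whole buffer if there is none. */
  static String fromNulTerminated(byte[] buffer) {
    int end = 0;
    while (end < buffer.length && buffer[end] != Ascii.NUL) {
      end++;
    }
    return new String(buffer, 0, end, StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    byte[] buffer = {'h', 'i', Ascii.NUL, 'x', 'y'}; // bytes after the NUL are ignored
    System.out.println(fromNulTerminated(buffer)); // prints hi
  }
}
```

      The sketch decodes as US-ASCII, so any byte above 0x7F becomes a replacement character; a real C interface may use a different encoding.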

      Since:
      8.0
    • SOH

      public static final byte SOH
      Start of Heading: A communication control character used at the beginning of a sequence of characters which constitute a machine-sensible address or routing information. Such a sequence is referred to as the "heading." An STX character has the effect of terminating a heading.
      Since:
      8.0
    • STX

      public static final byte STX
      Start of Text: A communication control character which precedes a sequence of characters that is to be treated as an entity and entirely transmitted through to the ultimate destination. Such a sequence is referred to as "text." STX may be used to terminate a sequence of characters started by SOH.
      Since:
      8.0
    • ETX

      public static final byte ETX
      End of Text: A communication control character used to terminate a sequence of characters started with STX and transmitted as an entity.
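
      The SOH, STX, and ETX descriptions imply a layout of heading, then text, then terminator. The sketch below builds and parses such a frame. It assumes Guava on the classpath and ASCII-only content with no embedded control bytes; the Framing class and its exact layout are illustrative, not a protocol defined by this class.

```java
import com.google.common.base.Ascii;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative framing (SOH heading STX text ETX); requires Guava on the classpath.
class Framing {
  static byte[] frame(String heading, String text) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(Ascii.SOH);
    out.writeBytes(heading.getBytes(StandardCharsets.US_ASCII));
    out.write(Ascii.STX); // also terminates the heading
    out.writeBytes(text.getBytes(StandardCharsets.US_ASCII));
    out.write(Ascii.ETX);
    return out.toByteArray();
  }

  /** Returns the text between the first STX and the ETX that follows it. */
  static String extractText(byte[] frame) {
    int stx = indexOf(frame, Ascii.STX, 0);
    int etx = stx < 0 ? -1 : indexOf(frame, Ascii.ETX, stx + 1);
    if (etx < 0) {
      throw new IllegalArgumentException("missing STX or ETX");
    }
    return new String(frame, stx + 1, etx - stx - 1, StandardCharsets.US_ASCII);
  }

  private static int indexOf(byte[] bytes, byte target, int from) {
    for (int i = from; i < bytes.length; i++) {
      if (bytes[i] == target) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] message = frame("TO:42", "hello");
    System.out.println(extractText(message)); // prints hello
  }
}
```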
      Since:
      8.0
    • EOT

      public static final byte EOT
      End of Transmission: A communication control character used to indicate the conclusion of a transmission, which may have contained one or more texts and any associated headings.
      Since:
      8.0
    • ENQ

      public static final byte ENQ
      Enquiry: A communication control character used in data communication systems as a request for a response from a remote station. It may be used as a "Who Are You" (WRU) to obtain identification, or may be used to obtain station status, or both.
      Since:
      8.0
    • ACK

      public static final byte ACK
      Acknowledge: A communication control character transmitted by a receiver as an affirmative response to a sender.
      Since:
      8.0
    • BEL

      public static final byte BEL
      Bell ('\a'): A character for use when there is a need to call for human attention. It may control alarm or attention devices.
      Since:
      8.0
    • BS

      public static final byte BS
      Backspace ('\b'): A format effector which controls the movement of the printing position one printing space backward on the same printing line. (Applicable also to display devices.)
      Since:
      8.0
    • HT

      public static final byte HT
      Horizontal Tabulation ('\t'): A format effector which controls the movement of the printing position to the next in a series of predetermined positions along the printing line. (Applicable also to display devices and the skip function on punched cards.)
      Since:
      8.0
    • LF

      public static final byte LF
      Line Feed ('\n'): A format effector which controls the movement of the printing position to the next printing line. (Applicable also to display devices.) Where appropriate, this character may have the meaning "New Line" (NL), a format effector which controls the movement of the printing point to the first printing position on the next printing line. Use of this convention requires agreement between sender and recipient of data.
      Since:
      8.0
    • NL

      public static final byte NL
      Alternate name for LF. (LF is preferred.)
      Since:
      8.0
    • VT

      public static final byte VT
      Vertical Tabulation ('\v'): A format effector which controls the movement of the printing position to the next in a series of predetermined printing lines. (Applicable also to display devices.)
      Since:
      8.0
    • FF

      public static final byte FF
      Form Feed ('\f'): A format effector which controls the movement of the printing position to the first pre-determined printing line on the next form or page. (Applicable also to display devices.)
      Since:
      8.0
    • CR

      public static final byte CR
      Carriage Return ('\r'): A format effector which controls the movement of the printing position to the first printing position on the same printing line. (Applicable also to display devices.)
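
      Because these constants are declared as byte rather than char, string concatenation appends their numeric value unless they are cast first. A minimal sketch (assuming Guava on the classpath; LineEndings and joinCrlf are hypothetical names) that builds a CR LF line terminator:

```java
import com.google.common.base.Ascii;

// Hypothetical helper; requires Guava on the classpath.
class LineEndings {
  // Cast each byte constant to char; "" + Ascii.CR alone would append "13".
  static final String CRLF = "" + (char) Ascii.CR + (char) Ascii.LF;

  /** Joins lines with CR LF and terminates the last line as well. */
  static String joinCrlf(String... lines) {
    return String.join(CRLF, lines) + CRLF;
  }

  public static void main(String[] args) {
    System.out.println(CRLF.equals("\r\n")); // prints true
    System.out.println("" + Ascii.CR);        // prints 13
    System.out.print(joinCrlf("first line", "second line"));
  }
}
```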
      Since:
      8.0
    • SO

      public static final byte SO
      Shift Out: A control character indicating that the code combinations which follow shall be interpreted as outside of the character set of the standard code table until a Shift In character is reached.
      Since:
      8.0
    • SI

      public static final byte SI
      Shift In: A control character indicating that the code combinations which follow shall be interpreted according to the standard code table.
      Since:
      8.0
    • DLE

      public static final byte DLE
      Data Link Escape: A communication control character which will change the meaning of a limited number of contiguously following characters. It is used exclusively to provide supplementary controls in data communication networks.
      Since:
      8.0
    • DC1

      public static final byte DC1
      Device Control 1. Characters for the control of ancillary devices associated with data processing or telecommunication systems, more especially switching devices "on" or "off." (If a single "stop" control is required to interrupt or turn off ancillary devices, DC4 is the preferred assignment.)
      Since:
      8.0
    • XON

      public static final byte XON
      Transmission On: Although originally defined as DC1, this ASCII control character is now better known as the XON code used for software flow control in serial communications. The main use is restarting the transmission after the communication has been stopped by the XOFF control code.
      Since:
      8.0
    • DC2

      public static final byte DC2
      Device Control 2. Characters for the control of ancillary devices associated with data processing or telecommunication systems, more especially switching devices "on" or "off." (If a single "stop" control is required to interrupt or turn off ancillary devices, DC4 is the preferred assignment.)
      Since:
      8.0
    • DC3

      public static final byte DC3
      Device Control 3. Characters for the control of ancillary devices associated with data processing or telecommunication systems, more especially switching devices "on" or "off." (If a single "stop" control is required to interrupt or turn off ancillary devices, DC4 is the preferred assignment.)
      Since:
      8.0
    • XOFF

      public static final byte XOFF
      Transmission off. See XON for explanation.
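
      In software flow control, the receiver sends XOFF to pause the sender and XON to resume it. The sketch below models only the sender-side state; it assumes Guava on the classpath, and SoftwareFlowControl and its methods are hypothetical. Since XON has the same value as DC1, receiving either one resumes transmission.

```java
import com.google.common.base.Ascii;

// Hypothetical sender-side state for XON/XOFF flow control; requires Guava on the classpath.
class SoftwareFlowControl {
  private boolean paused = false;

  /** Inspects a byte received from the peer; returns true if it was XON or XOFF. */
  boolean onReceive(byte b) {
    if (b == Ascii.XOFF) {
      paused = true; // receiver asked us to stop sending
      return true;
    }
    if (b == Ascii.XON) {
      paused = false; // receiver is ready again
      return true;
    }
    return false; // ordinary data byte; state unchanged
  }

  boolean canSend() {
    return !paused;
  }

  public static void main(String[] args) {
    SoftwareFlowControl flow = new SoftwareFlowControl();
    flow.onReceive(Ascii.XOFF);
    System.out.println(flow.canSend()); // false
    flow.onReceive(Ascii.XON);
    System.out.println(flow.canSend()); // true
  }
}
```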
      Since:
      8.0
    • DC4

      public static final byte DC4
      Device Control 4. Characters for the control of ancillary devices associated with data processing or telecommunication systems, more especially switching devices "on" or "off." (If a single "stop" control is required to interrupt or turn off ancillary devices, DC4 is the preferred assignment.)
      Since:
      8.0
    • NAK

      public static final byte NAK
      Negative Acknowledge: A communication control character transmitted by a receiver as a negative response to the sender.
      Since:
      8.0
    • SYN

      public static final byte SYN
      Synchronous Idle: A communication control character used by a synchronous transmission system in the absence of any other character to provide a signal from which synchronism may be achieved or retained.
      Since:
      8.0
    • ETB

      public static final byte ETB
      End of Transmission Block: A communication control character used to indicate the end of a block of data for communication purposes. ETB is used for blocking data where the block structure is not necessarily related to the processing format.
      Since:
      8.0
    • CAN

      public static final byte CAN
      Cancel: A control character used to indicate that the data with which it is sent is in error or is to be disregarded.
      Since:
      8.0
    • EM

      public static final byte EM
      End of Medium: A control character associated with the sent data which may be used to identify the physical end of the medium, or the end of the used, or wanted, portion of information recorded on a medium. (The position of this character does not necessarily correspond to the physical end of the medium.)
      Since:
      8.0
    • SUB

      public static final byte SUB
      Substitute: A character that may be substituted for a character which is determined to be invalid or in error.
      Since:
      8.0
    • ESC

      public static final byte ESC
      Escape: A control character intended to provide code extension (supplementary characters) in general information interchange. The Escape character itself is a prefix affecting the interpretation of a limited number of contiguously following characters.
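
      A widespread modern use of ESC as a prefix is the ECMA-48 (often called ANSI) control sequence that terminals interpret for text styling: ESC followed by '[' and parameters, ending in 'm' for style changes. The sketch assumes Guava on the classpath and a terminal that honors these sequences; AnsiText is a hypothetical name.

```java
import com.google.common.base.Ascii;

// Hypothetical helper; requires Guava, and an ECMA-48 capable terminal to see the effect.
class AnsiText {
  // The byte constant is cast to char so it concatenates as a character, not a number.
  private static final char ESC = (char) Ascii.ESC;

  /** Wraps text in "bold on" (parameter 1) and "reset all attributes" (parameter 0). */
  static String bold(String text) {
    return ESC + "[1m" + text + ESC + "[0m";
  }

  public static void main(String[] args) {
    System.out.println(bold("warning"));
  }
}
```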
      Since:
      8.0
    • FS

      public static final byte FS
      File Separator: These four information separators may be used within data in optional fashion, except that their hierarchical relationship shall be: FS is the most inclusive, then GS, then RS, and US is least inclusive. (The content and length of a File, Group, Record, or Unit are not specified.)
      Since:
      8.0
    • GS

      public static final byte GS
      Group Separator: These four information separators may be used within data in optional fashion, except that their hierarchical relationship shall be: FS is the most inclusive, then GS, then RS, and US is least inclusive. (The content and length of a File, Group, Record, or Unit are not specified.)
      Since:
      8.0
    • RS

      public static final byte RS
      Record Separator: These four information separators may be used within data in optional fashion, except that their hierarchical relationship shall be: FS is the most inclusive, then GS, then RS, and US is least inclusive. (The content and length of a File, Group, Record, or Unit are not specified.)
      Since:
      8.0
    • US

      public static final byte US
      Unit Separator: These four information separators may be used within data in optional fashion, except that their hierarchical relationship shall be: FS is the most inclusive, then GS, then RS, and US is least inclusive. (The content and length of a File, Group, Record, or Unit are not specified.)
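
      The hierarchy of separators maps naturally onto nested data: RS between records and US between the units (fields) of each record, with GS and FS available for further levels. A minimal sketch, assuming Guava on the classpath and fields that never contain RS or US themselves (nothing is escaped); SeparatedRecords is a hypothetical name:

```java
import com.google.common.base.Ascii;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; requires Guava on the classpath.
// Assumes no field contains an RS or US character (no escaping is done).
class SeparatedRecords {
  private static final String RS = String.valueOf((char) Ascii.RS); // between records
  private static final String US = String.valueOf((char) Ascii.US); // between units

  static String encode(List<List<String>> records) {
    List<String> encoded = new ArrayList<>();
    for (List<String> record : records) {
      encoded.add(String.join(US, record));
    }
    return String.join(RS, encoded);
  }

  static List<List<String>> decode(String data) {
    List<List<String>> records = new ArrayList<>();
    // Limit -1 keeps trailing empty strings, e.g. a record ending in an empty unit.
    for (String record : data.split(Pattern.quote(RS), -1)) {
      records.add(Arrays.asList(record.split(Pattern.quote(US), -1)));
    }
    return records;
  }

  public static void main(String[] args) {
    List<List<String>> rows = List.of(List.of("id", "name"), List.of("1", "Ada"));
    System.out.println(decode(encode(rows)).equals(rows)); // prints true
  }
}
```

      One limitation of this sketch: an empty list of records encodes to the empty string, which decodes as one record holding one empty unit.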
      Since:
      8.0
    • SP

      public static final byte SP
      Space: A normally non-printing graphic character used to separate words. It is also a format effector which controls the movement of the printing position, one printing position forward. (Applicable also to display devices.)
      Since:
      8.0
    • SPACE

      public static final byte SPACE
      Alternate name for SP.
      Since:
      8.0
    • DEL

      public static final byte DEL
      Delete: This character is used primarily to "erase" or "obliterate" erroneous or unwanted characters in perforated tape.
      Since:
      8.0
    • MIN

      public static final char MIN
      The minimum value of an ASCII character.
      Since:
      9.0 (was type int before 12.0)
    • MAX

      public static final char MAX
      The maximum value of an ASCII character.
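
      Since MIN and MAX are char values bounding the ASCII range, checking whether a character is ASCII is a direct comparison. A minimal sketch, assuming Guava on the classpath; AsciiRange and its methods are hypothetical names:

```java
import com.google.common.base.Ascii;

// Hypothetical helper; requires Guava on the classpath.
class AsciiRange {
  /** True if c lies in the ASCII range, 0x00 through 0x7F inclusive. */
  static boolean isAscii(char c) {
    return c >= Ascii.MIN && c <= Ascii.MAX;
  }

  /** True if every char of s is ASCII (vacuously true for an empty sequence). */
  static boolean isAllAscii(CharSequence s) {
    for (int i = 0; i < s.length(); i++) {
      if (!isAscii(s.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isAllAscii("plain text")); // true
    System.out.println(isAllAscii("caf\u00E9"));   // false: U+00E9 is above MAX
  }
}
```

      Guava's CharMatcher.ascii().matchesAllOf(s) expresses the same whole-string check.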
      Since:
      9.0 (was type int before 12.0)
  • Method Details

    • toLowerCase

      public static String toLowerCase(String string)
      Returns a copy of the input string in which all uppercase ASCII characters have been converted to lowercase. All other characters are copied without modification.
    • toLowerCase

      public static String toLowerCase(CharSequence chars)
      Returns a copy of the input character sequence in which all uppercase ASCII characters have been converted to lowercase. All other characters are copied without modification.
      Since:
      14.0
    • toLowerCase

      public static char toLowerCase(char c)
      If the argument is an uppercase ASCII character, returns the lowercase equivalent. Otherwise returns the argument.
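
      Unlike String.toLowerCase, the three toLowerCase overloads change only 'A' through 'Z', whatever the locale. A short demonstration, assuming Guava on the classpath (AsciiLowerCaseDemo is a hypothetical class name; non-ASCII characters are written as Unicode escapes):

```java
import com.google.common.base.Ascii;
import java.util.Locale;

// Requires Guava on the classpath.
class AsciiLowerCaseDemo {
  public static void main(String[] args) {
    String input = "CAF\u00C9 MENU"; // "CAFÉ MENU"

    // Only 'A'..'Z' change; U+00C9 'É' is copied unchanged.
    System.out.println(Ascii.toLowerCase(input)); // cafÉ menu

    // For comparison, String.toLowerCase applies full Unicode case mapping.
    System.out.println(input.toLowerCase(Locale.ROOT)); // café menu

    // The CharSequence overload accepts, for example, a StringBuilder.
    System.out.println(Ascii.toLowerCase(new StringBuilder("MiXeD"))); // mixed

    // The char overload returns anything other than 'A'..'Z' as-is.
    System.out.println(Ascii.toLowerCase('Q'));      // q
    System.out.println(Ascii.toLowerCase('\u00C9')); // É
  }
}
```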
    • toUpperCase

      public static String toUpperCase(String string)
      Returns a copy of the input string in which all lowercase ASCII characters have been converted to uppercase. All other characters are copied without modification.
    • toUpperCase

      public static String toUpperCase(CharSequence chars)
      Returns a copy of the input character sequence in which all lowercase ASCII characters have been converted to uppercase. All other characters are copied without modification.
      Since:
      14.0
    • toUpperCase

      public static char toUpperCase(char c)
      If the argument is a lowercase ASCII character, returns the uppercase equivalent. Otherwise returns the argument.
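
      Because these methods ignore the default locale, they avoid the well-known Turkish dotted-i problem, where String.toUpperCase under a Turkish locale maps 'i' to 'İ' (U+0130). A short demonstration, assuming Guava on the classpath; AsciiUpperCaseDemo is a hypothetical class name:

```java
import com.google.common.base.Ascii;
import java.util.Locale;

// Requires Guava on the classpath.
class AsciiUpperCaseDemo {
  public static void main(String[] args) {
    String key = "title";
    Locale turkish = Locale.forLanguageTag("tr");

    // Locale-sensitive mapping: 'i' becomes dotted capital I (U+0130) in Turkish.
    System.out.println(key.toUpperCase(turkish)); // TİTLE

    // ASCII-only mapping: the same result under every default locale.
    System.out.println(Ascii.toUpperCase(key)); // TITLE

    // Non-ASCII letters are returned unchanged by the char overload.
    System.out.println(Ascii.toUpperCase('\u00E9')); // é
  }
}
```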
    • isLowerCase

      public static boolean isLowerCase(char c)
      Indicates whether c is one of the twenty-six lowercase ASCII alphabetic characters between 'a' and 'z' inclusive. All others (including non-ASCII characters) return false.
    • isUpperCase

      public static boolean isUpperCase(char c)
      Indicates whether c is one of the twenty-six uppercase ASCII alphabetic characters between 'A' and 'Z' inclusive. All others (including non-ASCII characters) return false.
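
      Because isLowerCase and isUpperCase reject every non-ASCII letter, they suit validation of ASCII-only syntax such as identifiers. A sketch, assuming Guava on the classpath; Identifiers and its lower snake case rule are hypothetical:

```java
import com.google.common.base.Ascii;

// Hypothetical validator; requires Guava on the classpath.
class Identifiers {
  /** Hypothetical rule: non-empty, starts with 'a'..'z', then only 'a'..'z', '0'..'9' or '_'. */
  static boolean isLowerSnakeCase(String s) {
    if (s.isEmpty() || !Ascii.isLowerCase(s.charAt(0))) {
      return false;
    }
    for (int i = 1; i < s.length(); i++) {
      char c = s.charAt(i);
      boolean digit = c >= '0' && c <= '9';
      if (!Ascii.isLowerCase(c) && !digit && c != '_') {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isLowerSnakeCase("user_id2")); // true
    // U+00E9 is lowercase per Character.isLowerCase, but not per Ascii.isLowerCase.
    System.out.println(isLowerSnakeCase("caf\u00E9")); // false
  }
}
```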
    • truncate

      public static String truncate(CharSequence seq, int maxLength, String truncationIndicator)
      Truncates the given character sequence to the given maximum length. If the length of the sequence is greater than maxLength, the returned string will be exactly maxLength chars in length and will end with the given truncationIndicator. Otherwise, the sequence will be returned as a string with no changes to the content.

      Examples:

      
       Ascii.truncate("foobar", 7, "..."); // returns "foobar"
       Ascii.truncate("foobar", 5, "..."); // returns "fo..."
       

      Note: This method may work with certain non-ASCII text, but it is not safe for arbitrary Unicode text. It is intended mainly for text known to be safe for it (such as all-ASCII text) and for simple debugging output. When using this method, consider the following:

      • it may split surrogate pairs
      • it may split characters and combining characters
      • it does not consider word boundaries
      • if truncating for display to users, there are other considerations that must be taken into account
      • the appropriate truncation indicator may be locale-dependent
      • it is safe to use non-ASCII characters in the truncation indicator
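
      The surrogate-pair caution and the IllegalArgumentException documented for this method can both be observed directly. A short demonstration, assuming Guava on the classpath; TruncateDemo is a hypothetical class name:

```java
import com.google.common.base.Ascii;

// Requires Guava on the classpath.
class TruncateDemo {
  public static void main(String[] args) {
    // ASCII input: the result is exactly maxLength chars and ends with the indicator.
    System.out.println(Ascii.truncate("Hello, world", 8, "...")); // Hello...

    // Three emoji, each a surrogate pair, so six chars in total.
    String emoji = "\uD83D\uDE00\uD83D\uDE00\uD83D\uDE00";
    String cut = Ascii.truncate(emoji, 4, ".");
    // Keeping 3 chars leaves a lone high surrogate just before the indicator.
    System.out.println(Character.isHighSurrogate(cut.charAt(2))); // true

    // maxLength shorter than the indicator is rejected.
    try {
      Ascii.truncate("abc", 2, "...");
    } catch (IllegalArgumentException expected) {
      System.out.println("rejected: " + expected.getMessage());
    }
  }
}
```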
      Throws:
      IllegalArgumentException - if maxLength is less than the length of truncationIndicator
      Since:
      16.0
    • equalsIgnoreCase

      public static boolean equalsIgnoreCase(CharSequence s1, CharSequence s2)
      Indicates whether the contents of the given character sequences s1 and s2 are equal, ignoring the case of any ASCII alphabetic characters between 'a' and 'z' or 'A' and 'Z' inclusive.

      This method is significantly faster than String.equalsIgnoreCase(java.lang.String) and should be used in preference if at least one of the parameters is known to contain only ASCII characters.

      Note, however, that this method does not always behave identically to expressions such as:

      • string.toUpperCase().equals("UPPER CASE ASCII")
      • string.toLowerCase().equals("lower case ascii")

      due to case folding of some non-ASCII characters (which does not occur in String.equalsIgnoreCase(java.lang.String)). However, in almost all cases where ASCII strings are used, the author probably wanted the behavior provided by this method rather than the subtle and sometimes surprising behavior of toUpperCase() and toLowerCase().

      Since:
      16.0
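
      The differences between this method, String.equalsIgnoreCase, and comparisons through toUpperCase() can be observed directly. A short demonstration, assuming Guava on the classpath; EqualsIgnoreCaseDemo is a hypothetical class name, and non-ASCII characters are written as Unicode escapes:

```java
import com.google.common.base.Ascii;
import java.util.Locale;

// Requires Guava on the classpath.
class EqualsIgnoreCaseDemo {
  public static void main(String[] args) {
    // ASCII-only input: same answer as String.equalsIgnoreCase.
    System.out.println(Ascii.equalsIgnoreCase("Content-Type", "content-type")); // true

    // Accepts any CharSequence, such as a StringBuilder.
    System.out.println(Ascii.equalsIgnoreCase(new StringBuilder("GET"), "get")); // true

    // Non-ASCII letters must match exactly: U+00E9 'é' vs U+00C9 'É'.
    System.out.println(Ascii.equalsIgnoreCase("caf\u00E9", "CAF\u00C9")); // false
    System.out.println("caf\u00E9".equalsIgnoreCase("CAF\u00C9"));        // true

    // Case folding by toUpperCase: U+00DF 'ß' becomes "SS", changing the length.
    System.out.println("stra\u00DFe".toUpperCase(Locale.ROOT).equals("STRASSE")); // true
    System.out.println("stra\u00DFe".equalsIgnoreCase("STRASSE"));               // false
    System.out.println(Ascii.equalsIgnoreCase("stra\u00DFe", "STRASSE"));         // false
  }
}
```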