- NAME
- encoding — Manipulate encodings
- SYNOPSIS
- INTRODUCTION
- DESCRIPTION
- encoding convertfrom ?encoding? data
- encoding convertto ?encoding? string
- encoding dirs ?directoryList?
- encoding names
- encoding system ?encoding?
- EXAMPLE
- SEE ALSO
- KEYWORDS
NAME
encoding - Manipulate encodings
SYNOPSIS
encoding option ?arg arg ...?
INTRODUCTION
Strings in Tcl are logically a sequence of 16-bit Unicode
characters. These strings are represented in memory as a sequence
of bytes that may be in one of several encodings: modified UTF-8
(which uses 1 to 3 bytes per character), 16-bit “Unicode” (which
uses 2 bytes per character, with an endianness that is dependent on
the host architecture), and binary (which uses a single byte per
character but only handles a restricted range of characters). Tcl
does not guarantee to always use the same encoding for the same
string.
Different operating system interfaces or applications may
generate strings in other encodings such as Shift-JIS. The
encoding command helps to bridge the gap between Unicode and
these other formats.
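As a rough illustration of the difference between a string's logical characters and its bytes in a particular encoding, the following sketch converts one character to UTF-8 and inspects the result. The variable names (ha, utf8, hex) are chosen here for illustration only.

```tcl
# One logical character: HIRAGANA LETTER HA (U+306F).
set ha "\u306F"

# Convert it to an explicit external encoding (UTF-8). The result is a
# binary string holding one byte per character position.
set utf8 [encoding convertto utf-8 $ha]

puts "characters:  [string length $ha]"
puts "utf-8 bytes: [string length $utf8]"

# Show the encoded bytes as hexadecimal digits.
binary scan $utf8 H* hex
puts "hex: $hex"
```

The single character occupies three bytes (e3 81 af) once encoded as UTF-8, even though the script only ever sees it as one character.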
DESCRIPTION
Performs one of several encoding-related operations, depending on
option. The legal options are:
- encoding convertfrom ?encoding? data
- Convert data to Unicode from the specified
encoding. The characters in data are treated as
binary data where the low 8 bits of each character are taken as a
single byte. The resulting sequence of bytes is treated as a string
in the specified encoding. If encoding is not
specified, the current system encoding is used.
- encoding convertto ?encoding? string
- Convert string from Unicode to the specified
encoding. The result is a sequence of bytes that represents
the converted string. Each byte is stored in the low 8 bits of a
Unicode character (indeed, the resulting string is a binary string
as far as Tcl is concerned, at least initially). If encoding
is not specified, the current system encoding is used.
- encoding dirs ?directoryList?
- Tcl can load encoding data files from the file system that
describe additional encodings for it to work with. This command
sets the search path for *.enc encoding data files to the
list of directories directoryList. If directoryList
is omitted then the command returns the current list of directories
that make up the search path. It is an error for
directoryList not to be a valid list. During a search for
an encoding data file, any element of directoryList that
does not refer to a readable, searchable directory is ignored.
- encoding names
- Returns a list containing the names of all of the encodings
that are currently available. The encodings “utf-8” and “iso8859-1”
are guaranteed to be present in the list.
- encoding system ?encoding?
- Set the system encoding to encoding. If encoding
is omitted then the command returns the current system encoding.
The system encoding is used whenever Tcl passes strings to system
calls.
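The subcommands above can be exercised together in a short script. This is a minimal sketch rather than a recommended pattern: it uses only the two encodings that are guaranteed to be present, and the variable names (original, bytes, back, names, saved) are illustrative.

```tcl
# Round trip through ISO8859-1, which is guaranteed to be available.
set original "caf\u00E9"
set bytes [encoding convertto iso8859-1 $original]
set back  [encoding convertfrom iso8859-1 $bytes]
puts "encoded length: [string length $bytes]"
puts "round trip ok:  [string equal $original $back]"

# Both guaranteed encodings should appear in the list of names.
set names [encoding names]
foreach name {utf-8 iso8859-1} {
    puts "$name available: [expr {[lsearch -exact $names $name] >= 0}]"
}

# Query the search path for *.enc files without changing it.
puts "search path entries: [llength [encoding dirs]]"

# Query the system encoding, change it, then restore the saved value.
set saved [encoding system]
encoding system utf-8
puts "system encoding now: [encoding system]"
encoding system $saved
```

Saving and restoring the system encoding, as done here, keeps the change from affecting later interactions with the operating system.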
EXAMPLE
It is common practice to write script files using a text editor
that produces output in the euc-jp encoding, which represents the
ASCII characters as single bytes and Japanese characters as two
bytes. This makes it easy to embed literal strings that correspond
to non-ASCII characters by simply typing the strings in place in
the script. However, because the source command always reads files
using the current system encoding, Tcl will only source such files
correctly when the system encoding is the same as the encoding used
to write the file. This tends not to be true in an internationalized
setting. For example, if such a file were sourced in North America
(where ISO8859-1 is normally used), each byte in the file would be
treated as a separate character that maps to the 00 page in Unicode. The
resulting Tcl strings will not contain the expected Japanese
characters. Instead, they will contain a sequence of Latin-1
characters that correspond to the bytes of the original string. The
encoding command can be used to convert this string to the
expected Japanese Unicode characters. For example,
set s [encoding convertfrom euc-jp "\xA4\xCF"]
would return the Unicode string “\u306F”, which is the Hiragana
letter HA.
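The effect of the conversion can be checked by comparing the string before and after. The sketch below assumes that the euc-jp encoding is installed (only utf-8 and iso8859-1 are guaranteed), so it tests for it first; the variable names misread, fixed, and codepoint are illustrative.

```tcl
# Illustrative misread string: the two EUC-JP bytes A4 CF, each of which
# was read as one ISO8859-1 character.
set misread "\xA4\xCF"

# Assumption: euc-jp is available in this installation.
if {[lsearch -exact [encoding names] euc-jp] >= 0} {
    set fixed [encoding convertfrom euc-jp $misread]
    puts "before: [string length $misread] characters"
    puts "after:  [string length $fixed] character"

    # Report the code point of the converted character.
    scan $fixed %c codepoint
    puts [format "code point: U+%04X" $codepoint]
} else {
    puts "euc-jp is not available in this Tcl installation"
}
```

When euc-jp is available, the two Latin-1 characters collapse into the single character U+306F.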
SEE ALSO
Tcl_GetEncoding
KEYWORDS
encoding, unicode
Copyright © 1998 Scriptics Corporation.