Control character
Characters generally represent the graphemes, or written
symbols, of a language in computer storage or electronic communications.
It is often useful, however, to send additional control information along with
those characters to help process them.
For example, a printer connected to a computer receives
instructions to print characters on paper, and must also receive instructions
that control where the characters are placed on the page, eject
a page, signal the beginning or end of a transmission, and perform other functions.
It is convenient to send these instructions along the same communication path
as the ordinary characters, and control characters serve this purpose.
A character encoding thus typically encodes both printable characters
and control characters.
ASCII, for example, reserves codes 0 through 31 and code 127 as control
characters.
The 8-bit ISO-8859-1 maps 32 further codes (128 through 159), which are unused in ISO/IEC 8859-1, to control characters.
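The code ranges described here can be checked mechanically. The following is a minimal sketch, assuming character codes are handled as plain integers and that the 32 additional codes are 128 through 159, as listed in the second table of this article; the function name `is_control_code` is made up for illustration.

```python
def is_control_code(code: int) -> bool:
    """Return True if the code lies in one of the control ranges."""
    in_ascii_controls = 0 <= code <= 31 or code == 127  # ASCII control characters
    in_high_controls = 128 <= code <= 159               # the 32 additional 8-bit codes
    return in_ascii_controls or in_high_controls


for code in (7, 65, 127, 155, 160):
    print(code, is_control_code(code))
```

Codes 7, 127 and 155 are reported as control characters, while 65 (the letter "A") and 160 are not.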
The codes still in common use include codes 7 (Bell, which may cause the
device receiving it to emit a warning of some kind), 8 (Backspace, used either
to erase the last character printed or to overprint it), 9 (Horizontal tab),
10 (Line feed, used to end lines in most Unix systems and variants),
12 (Form feed, to cause a printer to eject a page), 13 (Carriage return, used
alone to end lines of text on classic Mac OS, and followed by a line feed
to end lines on MS-DOS and its derivatives), and 27 (Escape).
Occasionally one might encounter modern uses of other codes such as code 4
(End of transmission) used to end a Unix shell session or PostScript
printer transmission.
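Because the three line-ending conventions coexist, software that reads text usually has to accept all of them. The short sketch below uses Python's built-in `bytes.splitlines()`, which treats a line feed, a carriage return, and a carriage return followed by a line feed each as one line break; the sample data is invented for illustration.

```python
# One byte string containing all three line-ending conventions.
data = b"unix line\ndos line\r\nold mac line\rlast line"

# bytes.splitlines() accepts LF, CR, and CR+LF as a single line break each.
lines = data.splitlines()

for line in lines:
    print(line.decode("ascii"))
```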
Code 27 (Escape) is a case worth elaborating.
Even though many of these control characters are never used, the concept of
sending device-control information intermixed with printable characters is so
useful that device makers found a way to send hundreds of device instructions.
Specifically, they used a series of multiple characters called a
"control sequence" or "escape sequence".
Typically code 27 was first sent to alert the device that the following
characters were to be interpreted as a control sequence rather than as plain
characters, then one or more characters would follow specifying some detailed
action, after which the device would go back to interpreting characters
normally.
For example, the sequence of code 27, followed by the printable characters
"[2;10H", would cause a Digital VT-102
terminal to move its cursor to the 10th cell of the
2nd line of the screen.
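The cursor-positioning sequence from this example can be assembled as an ordinary string. This is only a sketch: the helper name `cursor_position` is hypothetical, and whether a given terminal honours the sequence depends on the terminal.

```python
ESC = "\x1b"  # code 27 (Escape)


def cursor_position(row: int, column: int) -> str:
    """Build the sequence ESC [ row ; column H."""
    return f"{ESC}[{row};{column}H"


sequence = cursor_position(2, 10)
print(repr(sequence))               # the sequence as a string literal
print([ord(ch) for ch in sequence]) # the individual character codes
```

Only the first character is a control character; everything after it is printable text that the terminal interprets specially because of the leading Escape.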
Some standards exist for these sequences, notably ANSI X3.64 (1979),
which the VT-100 series terminals implemented and helped popularize.
But the number of non-standard variations in use is large, especially among
printers, where technology has advanced far faster than any standards body
can possibly follow.
ASCII-based keyboards have a key labelled "Control" or
"Ctrl", which is used much like a shift key, being depressed in combination
with another letter or symbol key to cause the keyboard to generate one of
the 32 ASCII control codes.
The keyboard produces the code 64 places below the code for the uppercase
letter pressed (in effect, it clears the bit whose value is 64).
Pressing "control" and the letter "G" (code 71), for example, would produce
the code 7 (Bell).
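The arithmetic behind the Control key can be written out directly. The sketch below assumes the key pressed is one of the characters with codes 64 through 95 ("@", the uppercase letters, and a few symbols), with lowercase letters folded to uppercase; `control_code` is a name chosen for illustration.

```python
def control_code(key: str) -> int:
    """Code generated by holding Control with the given key (a sketch)."""
    code = ord(key.upper())
    if not 64 <= code <= 95:
        raise ValueError(f"no control code for {key!r}")
    # Subtracting 64 clears the bit whose value is 64 (0x40).
    return code - 64


for key in ("G", "H", "I", "M", "["):
    print(f"Control-{key} -> {control_code(key)}")
```

Control-G gives 7 (Bell), Control-H gives 8 (Backspace), Control-I gives 9 (Horizontal tab), Control-M gives 13 (Carriage return), and Control-[ gives 27 (Escape).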
Keyboards also have single keys that produce codes in this range.
For example, the key labelled "Backspace" typically produces code 8, "Tab"
code 9, "Enter" or "Return" code 13 (though some keyboards might produce
code 10 for "Enter").
Modern keyboards have many keys that do not correspond to ASCII characters
or control characters, for example cursor control arrows and
word processing functions.
These keyboards communicate these keys to the attached computer by one of
three methods: appropriating some otherwise unused control character for the
new use, using some encoding other than ASCII, or using multi-character
control sequences.
Keyboards attached to stand-alone personal computers typically use one
(or both) of the first two methods.
"Dumb" computer terminals typically use control sequences.
The control characters were designed to fall into a few groups: printing control,
data structuring, transmission control, and miscellaneous.
Printing control characters tell where to put the next character.
Carriage return says to put the character at the edge of the paper at which
writing begins (it may or may not also move to the next line).
Line feed indicates to put the next character at the next line in the
direction new lines occur (and may or may not also move to the beginning
of the line).
Vertical and horizontal tab request the printer to move the print head to the
next tab stop in the direction of reading.
Form feed starts a new sheet of paper.
Shift in and shift out were to select alternate character sets, fonts,
underlining or other printing modes.
Backspace moves the next position one character backwards, so the printer
can overprint characters to make special characters.
The separators (file, group, record, and unit) were made to structure data, usually on
a tape, in order to simulate punch cards.
End of medium warns that the tape (or other medium) is ending.
The transmission control characters were intended to structure a data packet
and control when to retransmit it if it has an error.
Start of heading was to mark the non-data section of a data packet, that is, the
part of a message with addresses and other housekeeping data.
Start of text marked the end of the header and the start of the text.
End of text marked the end of the data of a message.
A standard convention is to make the two characters preceding the end of text
the checksum or CRC of the message.
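The packet layout described here can be sketched in a few lines. Everything beyond the control codes themselves is an assumption made for illustration: the two-byte check value is a simple byte sum modulo 65536, the header is assumed to contain no start-of-text byte, and the names `checksum16`, `build_frame` and `parse_frame` are hypothetical.

```python
SOH, STX, ETX = 0x01, 0x02, 0x03  # start of heading, start of text, end of text


def checksum16(data: bytes) -> bytes:
    """Illustrative check value: byte sum modulo 65536, high byte first."""
    return (sum(data) % 65536).to_bytes(2, "big")


def build_frame(header: bytes, text: bytes) -> bytes:
    """SOH, header, STX, text, two check bytes, ETX."""
    return bytes([SOH]) + header + bytes([STX]) + text + checksum16(text) + bytes([ETX])


def parse_frame(frame: bytes):
    """Split a frame into (header, text), verifying the check bytes."""
    if len(frame) < 5 or frame[0] != SOH or frame[-1] != ETX:
        raise ValueError("malformed frame")
    stx = frame.index(STX, 1)  # the first STX ends the header
    header = frame[1:stx]
    text, check = frame[stx + 1:-3], frame[-3:-1]
    if checksum16(text) != check:
        raise ValueError("checksum mismatch")
    return header, text


frame = build_frame(b"TO:A", b"hello")
print(frame)
print(parse_frame(frame))
```

A receiver that gets a checksum mismatch would typically answer with a negative acknowledge so that the sender retransmits the packet.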
Escape was supposed to preface a binary value in a message that might
otherwise be interpreted as a control character.
For example, the binary value 27 would be sent as Escape Escape.
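The escaping scheme described here amounts to prefixing each troublesome byte with Escape. The sketch below assumes that "might be interpreted as a control character" means any code below 32 or code 127; that choice, and the function names, are illustrative rather than taken from any standard.

```python
ESC = 0x1B  # code 27 (Escape)


def needs_escape(byte: int) -> bool:
    """Assumed rule: escape every ASCII control code, including ESC itself."""
    return byte < 32 or byte == 127


def escape_binary(data: bytes) -> bytes:
    """Prefix each control-range byte with ESC."""
    out = bytearray()
    for byte in data:
        if needs_escape(byte):
            out.append(ESC)
        out.append(byte)
    return bytes(out)


def unescape_binary(data: bytes) -> bytes:
    """Reverse escape_binary: the byte after an ESC is taken literally."""
    out = bytearray()
    stream = iter(data)
    for byte in stream:
        if byte == ESC:
            byte = next(stream, None)
            if byte is None:
                raise ValueError("dangling escape at end of data")
        out.append(byte)
    return bytes(out)


payload = bytes([65, 27, 3, 66])
escaped = escape_binary(payload)
print(list(escaped))
print(list(unescape_binary(escaped)))
```

The value 27 on its own becomes the pair 27, 27, which matches the "Escape Escape" case.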
Substitute was intended to request a translation of the next character from
a printable character to a binary value, usually by setting bit 5 to zero.
This is handy because some transmission media (such as sheets of paper
produced by typewriters) only transmit printable characters.
Cancel would stop a transmission of a packet.
Negative acknowledge requests a retransmission of a packet.
Acknowledge indicates that a transmission was received correctly.
When a transmission medium is half duplex (that is, it can only transmit in
one direction at a time), there is usually a master station that can transmit
at any time, and one or more slave stations that transmit when they have
permission.
Enquiry is used by a master station to ask a slave station to send its next
message.
A slave station indicates that it has completed its transmission by sending
end of transmission.
The device control codes were originally generic, to be defined differently
for each device.
However, a universal need in data transmission is to ask the sender to
stop transmitting when a receiver cannot accept more data for the moment.
Digital Equipment Corporation invented a convention which used code 19
(device control 3, also known as Control-S or "X-OFF") to "S"top transmission,
and code 17 (device control 1, also known as Control-Q or "X-ON") to start transmission.
This lets manufacturers control the transmission without "transmission control"
wires in the data cable.
This saves money and makes operation more reliable by reducing the number
of connections in a cable.
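The stop and start convention can be illustrated with a toy model of the sending side. This is a simulation under simplifying assumptions, not a serial-port driver: the class `Sender` and its methods are hypothetical, and the receiver's control codes are delivered by direct method calls.

```python
XON, XOFF = 0x11, 0x13  # codes 17 (device control 1) and 19 (device control 3)


class Sender:
    """Toy sender that pauses on XOFF and resumes on XON."""

    def __init__(self, data: bytes):
        self.pending = list(data)
        self.paused = False

    def handle_control(self, code: int) -> None:
        """Process a control code that arrived from the receiver."""
        if code == XOFF:
            self.paused = True
        elif code == XON:
            self.paused = False

    def next_byte(self):
        """Return the next byte to transmit, or None while paused or finished."""
        if self.paused or not self.pending:
            return None
        return self.pending.pop(0)


sender = Sender(b"ABCD")
sent = [sender.next_byte()]      # "A" goes out
sender.handle_control(XOFF)      # receiver cannot take more data
sent.append(sender.next_byte())  # nothing is sent while paused
sender.handle_control(XON)       # receiver is ready again
sent.append(sender.next_byte())  # "B" goes out
print(sent)
```

Because the control codes travel in the same data stream as everything else, no extra signalling wires are needed.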
Data link escape tells the other end of the data link to end a session.
Many of the ASCII control characters were designed for devices of the time
that are not often seen today.
For example, code 22, "Synchronous idle", was originally sent by synchronous
modems (which have to send data constantly) when there was no actual data to
send.
(Modern systems typically use a start bit to announce the beginning of a
transmitted word.)
Code 0, null, is a special case.
On paper tape, it corresponds to a position with no holes punched.
It is convenient to treat this as a non-existent character.
Code 127 is likewise a special case.
Its code is all-bits-on in binary, which made it easy to erase a section of
paper tape, a common storage medium of the day, by punching all the holes.
Paper tape became obsolete quickly, so this feature was almost never used.
But because its code is in the range occupied by other printable characters,
many computers used it as an additional printable character (often an
all-black "box" character useful for erasing text by overprinting).
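The reason an all-holes code works as an eraser is that punching can only add holes, never remove them, so punching over an existing code behaves like a bitwise OR. The sketch below models a 7-bit tape as a list of integers; the assumption that a reader skips both code 0 and code 127 is made for illustration, as are the function names.

```python
NUL = 0x00  # blank tape: no holes
DEL = 0x7F  # code 127: all seven holes punched


def overpunch(existing: int, new: int) -> int:
    """Punching adds holes, so the result is the OR of both codes."""
    return existing | new


def read_tape(codes) -> str:
    """Assumed reader behaviour: skip blank (NUL) and rubbed-out (DEL) positions."""
    return "".join(chr(code) for code in codes if code not in (NUL, DEL))


tape = [ord(ch) for ch in "TYPO"]
erased = [overpunch(code, DEL) for code in tape]  # rub out the mistake
corrected = erased + [ord(ch) for ch in "TYPE"]   # punch the correction after it

print(erased)
print(read_tape(corrected))
```

Whatever code was on the tape, punching code 127 over it leaves code 127.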
ASCII control characters (codes 0 through 31 and 127):

dec  hex   abbr  Name
00   0x00  NUL   Null
01   0x01  SOH   Start of Heading
02   0x02  STX   Start of Text
03   0x03  ETX   End of Text
04   0x04  EOT   End of Transmission
05   0x05  ENQ   Enquiry
06   0x06  ACK   Acknowledge
07   0x07  BEL   Bell
08   0x08  BS    Backspace
09   0x09  HT    Horizontal Tab
10   0x0A  LF    Line Feed
11   0x0B  VT    Vertical Tab
12   0x0C  FF    Form Feed
13   0x0D  CR    Carriage Return
14   0x0E  SO    Shift Out
15   0x0F  SI    Shift In
16   0x10  DLE   Data Link Escape
17   0x11  DC1   Device Control 1
18   0x12  DC2   Device Control 2
19   0x13  DC3   Device Control 3
20   0x14  DC4   Device Control 4
21   0x15  NAK   Negative Acknowledge
22   0x16  SYN   Synchronous Idle
23   0x17  ETB   End of Transmission Block
24   0x18  CAN   Cancel
25   0x19  EM    End of Medium
26   0x1A  SUB   Substitute
27   0x1B  ESC   Escape
28   0x1C  FS    File Separator
29   0x1D  GS    Group Separator
30   0x1E  RS    Record Separator
31   0x1F  US    Unit Separator
127  0x7F  DEL   Rubout/Delete
Control characters mapped to codes 128 through 159:

dec  hex   abbr  Name
128  0x80  PAD   Padding Character
129  0x81  HOP   High Octet Preset
130  0x82  BPH   Break Permitted Here
131  0x83  NBH   No Break Here
132  0x84  IND   Index
133  0x85  NEL   Next Line
134  0x86  SSA   Start of Selected Area
135  0x87  ESA   End of Selected Area
136  0x88  HTS   Horizontal Tab Set
137  0x89  HTJ   Horizontal Tab Justified
138  0x8A  VTS   Vertical Tab Set
139  0x8B  PLD   Partial Line Forward
140  0x8C  PLU   Partial Line Backward
141  0x8D  RI    Reverse Line Feed
142  0x8E  SS2   Single-Shift 2
143  0x8F  SS3   Single-Shift 3
144  0x90  DCS   Device Control String
145  0x91  PU1   Private Use 1
146  0x92  PU2   Private Use 2
147  0x93  STS   Set Transmit State
148  0x94  CCH   Cancel Character
149  0x95  MW    Message Waiting
150  0x96  SPA   Start of Protected Area
151  0x97  EPA   End of Protected Area
152  0x98  SOS   Start of String
153  0x99  SGCI  Single Graphic Character Introducer
154  0x9A  SCI   Single Character Introducer
155  0x9B  CSI   Control Sequence Introducer
156  0x9C  ST    String Terminator
157  0x9D  OSC   Operating System Command
158  0x9E  PM    Privacy Message
159  0x9F  APC   Application Program Command