This is a small collection of Unicode characters I sometimes need to copy to the clipboard or reference in some way. It started with just a few whitespace characters, but grew over time as I added an assortment of dashes, mathematical operators, control codes, and other symbols.
Dashes and Hyphens
-
ASCII hyphen, with multiple uses, or “ambiguous semantic value”; the width should be “average”. Sent using the - key.
‐
Unambiguously a hyphen character, as in “top-to-bottom”; narrow width.
‑
As HYPHEN, but not an allowed line break point.
‒
As HYPHEN-MINUS, but has the same width as digits.
–
Indicate a range of values. Width is 1/2 em (or 1 en).
—
Make a break in the flow of a sentence. Width is 1 em.
‾
An overline, overscore, or overbar is a typographical feature: a horizontal line drawn immediately above the text.
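The entries above do not show code points, so here is a minimal sketch for looking them up with Python's standard `unicodedata` module. The `DASHES` list is my reading of which characters appear above, and `describe` is a hypothetical helper name.

```python
import unicodedata

# Code points for the dash-like characters listed above
# (an assumption based on the order and descriptions of the entries).
DASHES = [0x002D, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014, 0x203E]


def describe(code_point: int) -> str:
    """Return a 'U+XXXX NAME' label for a code point."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


for cp in DASHES:
    # repr() makes look-alike characters easier to tell apart in a terminal.
    print(describe(cp), repr(chr(cp)))
```

The same helper works for any named character on this page, which is handy when two glyphs look identical in a given font.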
Mathematical Operators
−
Subtraction arithmetic operator.
±
U+00B1
PLUS-MINUS SIGN
±
Mathematical symbol with multiple meanings, such as an inclusive range of values, a confidence interval, or a measurement uncertainty.
÷
U+00F7
DIVISION SIGN
÷
Division arithmetic operator.
×
U+00D7
MULTIPLICATION SIGN
×
Multiplication arithmetic operator.
Miscellaneous Symbols
°
A typographical symbol used to represent, among other things, degrees of arc and degrees of temperature.
©
U+00A9
COPYRIGHT SIGN
©
The symbol used in copyright notices.
®
U+00AE
REGISTERED SIGN
®
The symbol provides notice that the preceding word or symbol is a registered trademark or service mark.
™
U+2122
TRADE MARK SIGN
™
The symbol to indicate that the preceding mark is an unregistered trademark.
…
U+2026
HORIZONTAL ELLIPSIS
…
The dot dot dot indicates an intentional omission of a word, sentence, or whole section from a text without altering its original meaning.
⋮
U+22EE
VERTICAL ELLIPSIS
⋮
The vertical dot dot dot is useful for showing omissions in matrices, rows, or vertical lists. Also used as a kebab or meatball icon in user interfaces.
≡
U+2261
IDENTICAL TO
≡
The triple bar (tribar) has multiple, context-dependent meanings. Most people know it as the hamburger icon in user interfaces.
Non-Breaking Whitespace
U+FEFF
ZERO WIDTH NO-BREAK SPACE
0 em
U+202F
NARROW NO-BREAK SPACE
Depends on font, typically 1/5 or 1/6 em
U+00A0
NO-BREAK SPACE
Depends on font, typically 1/4 em, but often not adjusted
Whitespace
U+0009
CHARACTER TABULATION (tab)
Love it or hate it. Sent using the Tab key.
U+200B
ZERO WIDTH SPACE
​
0 em
U+2005
FOUR-PER-EM SPACE (mid space)
1/4 em
U+2004
THREE-PER-EM SPACE (thick space)
 
1/3 em
U+2002
EN SPACE (nut)
 
1/2 em (or 1 en)
U+2003
EM SPACE (mutton)
 
1 em
U+200A
HAIR SPACE
Depends on font, narrower than THIN SPACE
U+2009
THIN SPACE
Depends on font, typically 1/5 em (or sometimes 1/6 em)
U+0020
SPACE
Depends on font, typically 1/4 em, often adjusted. Sent using the Space key.
U+2008
PUNCTUATION SPACE
 
Depends on font, the width of a period .
U+2007
FIGURE SPACE
 
Tabular width: depends on font, the width of digits
File Name Alternatives
Most operating systems reserve a set of characters that may not be used in filenames. The reserved characters on Windows include /, \, ?, *, :, |, ", <, and >.
Here are some potential alternatives. Depending on the font used, some options are better than others.
Solidus (Slash, Forward Slash)
None are particularly good.
⧸
Appears to be too large to be used as an alternative in many fonts, but looks fine when viewed on Windows in File Explorer, Terminal, and the Command Prompt.
U+0338
COMBINING LONG SOLIDUS OVERLAY
A space followed by this overlay character.
Reverse Solidus (Backslash)
U+20E5
COMBINING REVERSE SOLIDUS OVERLAY
A space followed by this overlay character.
⟍
U+27CD
MATHEMATICAL FALLING DIAGONAL
⧵
U+29F5
REVERSE SOLIDUS OPERATOR
⧹
U+29F9
BIG REVERSE SOLIDUS
Appears to be too large to be used as an alternative in many fonts, but looks fine when viewed on Windows in File Explorer, Terminal, and the Command Prompt.
Question Mark
⁇
U+2047
DOUBLE QUESTION MARK
❓
U+2753
BLACK QUESTION MARK ORNAMENT
Asterisk
⚹
Possibly the best looking alternative, depending on the font.
٭
U+066D
ARABIC FIVE POINTED STAR
🞶
U+1F7B6
MEDIUM SIX SPOKED ASTERISK
Colon
꞉
U+A789
MODIFIER LETTER COLON
Used as a tone letter in some orthographies, including those of Budu (Congo), Sabaot (Kenya), and several Papua New Guinea languages.
❤️ This can be used in Windows file names, and appears to be the best alternative to the actual colon character.
׃
U+05C3
HEBREW PUNCTUATION SOF PASUQ
May be used as a Hebrew punctuation colon. In RTL (right-to-left) writing systems, this might be the best alternative.
∶
Preferred to U+003A : for denotation of division or scale in mathematical use.
︰
U+FE30
PRESENTATION FORM FOR VERTICAL TWO DOT LEADER
Vertical Line (Vertical Bar, Pipe)
ǀ
U+01C0
LATIN LETTER DENTAL CLICK
Double Quote
ʺ
U+02BA
MODIFIER LETTER DOUBLE PRIME
ˮ
U+02EE
MODIFIER LETTER DOUBLE APOSTROPHE
”
U+201D
RIGHT DOUBLE QUOTATION MARK
“
U+201C
LEFT DOUBLE QUOTATION MARK
Less Than
˂
U+02C2
MODIFIER LETTER LEFT ARROWHEAD
Greater Than
˃
U+02C3
MODIFIER LETTER RIGHT ARROWHEAD
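A minimal sketch of how these alternatives might be applied in practice. The function name `safe_file_name` is hypothetical, and the specific substitution chosen for each reserved character is only one pick from the lists above; as noted, the best choice depends on the font.

```python
# Reserved character -> look-alike taken from the lists above.
# These picks are illustrative, not a recommendation.
ALTERNATIVES = {
    "/": "\u29F8",   # BIG SOLIDUS
    "\\": "\u29F9",  # BIG REVERSE SOLIDUS
    "?": "\u2047",   # DOUBLE QUESTION MARK
    "*": "\u26B9",   # SEXTILE
    ":": "\uA789",   # MODIFIER LETTER COLON
    "|": "\u01C0",   # LATIN LETTER DENTAL CLICK
    '"': "\u02BA",   # MODIFIER LETTER DOUBLE PRIME
    "<": "\u02C2",   # MODIFIER LETTER LEFT ARROWHEAD
    ">": "\u02C3",   # MODIFIER LETTER RIGHT ARROWHEAD
}

# Build the translation table once so repeated calls stay cheap.
_TABLE = str.maketrans(ALTERNATIVES)


def safe_file_name(name: str) -> str:
    """Replace reserved characters with visually similar ones."""
    return name.translate(_TABLE)


print(safe_file_name("Notes: Q1 <draft>"))
```

Keep in mind that the result only looks like the original name. Searching for a plain `:` will not match U+A789, so this is best reserved for names meant for people rather than scripts.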
Control Codes
The following control code characters were historically used by computer systems to embed additional information or instructions in ASCII strings or text data streams, such as the cursor position, or to delineate sections of data.
Some of these are commonplace, such as the Format Effectors, while others are rarely used today.
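Each entry in this section lists a caret notation such as ^H next to its code point. As a rough sketch (the helper names `caret_to_char` and `char_to_caret` are hypothetical), the two are related by flipping a single bit, 0x40, of the character after the caret:

```python
def caret_to_char(notation: str) -> str:
    """Convert caret notation such as '^H' to its control character."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError(f"not caret notation: {notation!r}")
    # Flipping bit 0x40 maps '@'..'_' to U+0000..U+001F and '?' to U+007F.
    code = ord(notation[1].upper()) ^ 0x40
    if code > 0x1F and code != 0x7F:
        raise ValueError(f"not a control character: {notation!r}")
    return chr(code)


def char_to_caret(char: str) -> str:
    """Convert a control character back to caret notation."""
    code = ord(char)
    if code > 0x1F and code != 0x7F:
        raise ValueError(f"not a control character: {char!r}")
    return "^" + chr(code ^ 0x40)


print(repr(caret_to_char("^I")), char_to_caret("\r"))
```

This is also why Ctrl+H in a terminal often behaves like Backspace: the Ctrl key traditionally performs the same bit manipulation on the key pressed.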
U+0008
BS :: Backspace
\b
^H
Move the cursor one position leftwards.
U+0009
HT :: Horizontal (Character) Tabulation
\t
^I
Moves the cursor to the next character tab stop. Sent using the Tab key.
U+000A
LF :: Line Feed
\n
^J
On typewriters, printers, and some terminal emulators, moves the cursor down one row without affecting its column position. It is now generally used to indicate end-of-line in text files.
- On Unix, LF is used on its own to mark end-of-line.
- In DOS and Windows, LF is used following CR as part of the CR LF end-of-line sequence.
Sent using the Enter or Return keys.
U+000B
VT :: Vertical (Line) Tabulation
\v
^K
Position the form at the next line tab stop.
U+000C
FF :: Form Feed
\f
^L
On printers, load the next page. Treated as whitespace in many programming languages, and may be used to separate logical divisions in code. In some terminal emulators, it clears the screen. It still appears in some common plain text files as a page break character.
U+000D
CR :: Carriage Return
\r
^M
Originally used to move the cursor to column zero while staying on the same line. It is now generally used to indicate end-of-line in text files.
- On systems such as the Commodore 64, Apple II, and classic Mac OS (prior to Mac OS X), CR is used on its own to mark end-of-line.
- In DOS and Windows, it is used preceding LF as part of the CR LF end-of-line sequence.
Sent using the Enter or Return keys.
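Because the three end-of-line conventions described for LF and CR all show up in real files, a common first step is to normalize them. A minimal sketch, with `normalize_newlines` as a hypothetical helper name:

```python
def normalize_newlines(text: str) -> str:
    """Convert CR LF and lone CR line endings to LF."""
    # Replace the CR LF pair first; otherwise its CR would be converted
    # on its own and each Windows line ending would become two line feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")


mixed = "unix\nwindows\r\nclassic mac\rend"
print(normalize_newlines(mixed).split("\n"))
```

Python's `str.splitlines()` handles all three conventions too, but it also splits on VT, FF, and the FS, GS, and RS separators, which may not be what you want when those codes carry meaning.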
FS, GS, RS, and US can be used as delimiters to mark fields of data structures. If used for hierarchical levels, US is the lowest level (dividing plain-text data items), while RS, GS, and FS are of increasing level to divide groups made up of items of the level beneath it.
While it’s pretty easy to use JSON, XML, or YAML to serialize data, sometimes a less robust solution is okay. For example, you could use : to join a key and value pair and then ; to join multiple pairs together.
key1:value1;key2:value2
That’s simple enough, but what if the key and/or value contains one of those joining characters? Well, you could use the US and RS control codes instead to get the job done, since they’re far less likely to be used in either the key or value.
key1␟value1␞key2␟value2
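The key/value example above could be sketched like this. The function names `dumps` and `loads` are hypothetical, and rejecting input that already contains a separator is an assumption of mine rather than part of any standard:

```python
US = "\x1f"  # Unit Separator: joins a key to its value
RS = "\x1e"  # Record Separator: joins one pair to the next


def dumps(pairs: dict[str, str]) -> str:
    """Serialize string pairs as key US value RS key US value ..."""
    for key, value in pairs.items():
        if US in key + value or RS in key + value:
            raise ValueError("keys and values must not contain US or RS")
    return RS.join(f"{key}{US}{value}" for key, value in pairs.items())


def loads(data: str) -> dict[str, str]:
    """Parse the output of dumps() back into a dict."""
    if not data:
        return {}
    # Split each record once on US to recover the key and the value.
    return dict(record.split(US, 1) for record in data.split(RS))


encoded = dumps({"url": "https://example.com:8080", "tags": "a;b;c"})
print(loads(encoded))
```

Colons and semicolons in the values survive the round trip untouched, which is the whole point of switching delimiters.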
U+001C
FS :: File Separator
^\
U+001D
GS :: Group Separator
^]
U+001E
RS :: Record Separator
^^
U+001F
US :: Unit Separator
^_
Transmission Controls
Historically used for message transmission, which may include a header, message text, and post-text footer, or even multiple headings and associated texts.
␁header␂text␃␄
␁header␂text␃footer␄
␁header␂text␁header␂text␃footer␄
U+0001
SOH :: Start of Heading
^A
Used to delimit the start of a message header. The header is terminated by STX.
U+0002
STX :: Start of Text
^B
Used to terminate the message header and mark the start of the message text. The text is terminated by ETX.
U+0003
ETX :: End of Text
^C
Used to terminate the message text and mark the start of optional ‘post-text’, such as a structured footer. Followed by EOT. In keyboard input, often used as a ‘break’ character (Ctrl+C) to interrupt or terminate a program or process.
U+0004
EOT :: End of Transmission
^D
Marks the end of a transmitted message. Often used on Unix to indicate end-of-file on a terminal, interpreted by the shell as the command exit or logout.
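To make the framing concrete, here is a minimal sketch of a parser for the three message layouts shown at the start of this section. The function name `parse_message` and its return shape are hypothetical; real protocols built on these codes add their own rules on top.

```python
SOH, STX, ETX, EOT = "\x01", "\x02", "\x03", "\x04"


def parse_message(raw: str) -> tuple[list[tuple[str, str]], str]:
    """Split a framed message into (header, text) pairs and a footer."""
    if not raw.endswith(EOT):
        raise ValueError("message must end with EOT")
    # Everything before ETX is headers and text; the rest is post-text.
    body, etx, footer = raw[:-1].partition(ETX)
    if not etx:
        raise ValueError("message has no ETX")
    if not body.startswith(SOH):
        raise ValueError("message must start with SOH")
    sections = []
    # Each SOH starts a new header, and STX ends it and starts the text.
    for chunk in body.split(SOH)[1:]:
        header, stx, text = chunk.partition(STX)
        if not stx:
            raise ValueError("header is not terminated by STX")
        sections.append((header, text))
    return sections, footer


raw = f"{SOH}to: ops{STX}disk full{ETX}checksum 1234{EOT}"
print(parse_message(raw))
```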
Other transmission control codes are used for back and forth communication between systems, which may involve establishing or terminating connections, handshaking, and the transmission of data.
A very basic example of a two-way handshake and data transmission goes something like this: A host sends ENQ to the server and waits for a response. When the server receives the packet, it responds with ACK if it is ready to receive data or NAK if it’s not. Once the host receives the ACK, the handshake is complete, and the host begins sending data one packet at a time. After each packet is sent, the host waits for an ACK response before sending the next packet. This back and forth continues until all data is sent. The host ends the transmission by sending EOT.
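The exchange just described can be simulated in-process. This is only a toy sketch: the `Server` class and `send_all` function are hypothetical names, there is no real network or retry logic, and the server's empty reply to EOT is my own simplification.

```python
ENQ, ACK, NAK, EOT = "\x05", "\x06", "\x15", "\x04"


class Server:
    """Toy receiver: answers ENQ and acknowledges each packet."""

    def __init__(self, ready: bool = True):
        self.ready = ready
        self.received: list[str] = []
        self.done = False

    def handle(self, packet: str) -> str:
        if packet == ENQ:
            return ACK if self.ready else NAK
        if packet == EOT:
            self.done = True
            return ""  # nothing to acknowledge once the transmission ends
        self.received.append(packet)
        return ACK


def send_all(server: Server, packets: list[str]) -> bool:
    """Handshake with ENQ, send packets one at a time, finish with EOT."""
    if server.handle(ENQ) != ACK:
        return False  # server answered NAK: not ready
    for packet in packets:
        # Wait for an ACK before moving on to the next packet.
        if server.handle(packet) != ACK:
            return False
    server.handle(EOT)
    return True


server = Server()
print(send_all(server, ["part 1", "part 2"]), server.received)
```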
Three-way TCP handshaking uses SYN and ACK for synchronizing communications. In TCP these are flags in the segment header rather than the control characters of the same names. A client sends a SYN along with a sequence number (X). The server responds by sending its own SYN and sequence number (Y) along with an ACK that acknowledges the client’s sequence number (X + 1). When the client receives the ACK with the correct acknowledgment number, it responds by sending its own ACK that acknowledges the server’s sequence number (Y + 1). The handshake is complete with the client knowing the server and the server knowing the client.
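A minimal sketch of the three steps, modelling only the fields involved in the handshake. The `Segment` class and the three function names are hypothetical. One detail worth stating: real TCP acknowledges a sequence number by echoing it plus one, so the sketch uses X + 1 and Y + 1 as the acknowledgment numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Only the header fields that matter for the handshake."""
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: Optional[int] = None


def client_syn(x: int) -> Segment:
    # Step 1: the client proposes its initial sequence number X.
    return Segment(syn=True, seq=x)


def server_syn_ack(syn: Segment, y: int) -> Segment:
    # Step 2: the server sends its own number Y and acknowledges X + 1.
    if not syn.syn:
        raise ValueError("expected a SYN segment")
    return Segment(syn=True, ack=True, seq=y, ack_num=syn.seq + 1)


def client_ack(syn_ack: Segment, x: int) -> Segment:
    # Step 3: the client checks the acknowledgment, then acknowledges Y + 1.
    if not (syn_ack.syn and syn_ack.ack and syn_ack.ack_num == x + 1):
        raise ValueError("unexpected SYN-ACK")
    return Segment(ack=True, seq=x + 1, ack_num=syn_ack.seq + 1)


first = client_syn(100)
second = server_syn_ack(first, 300)
third = client_ack(second, 100)
print(first, second, third, sep="\n")
```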
U+0005
ENQ :: Enquiry
^E
Signal intended to trigger a response at the receiving end, to see if it is still present.
U+0006
ACK :: Acknowledge
^F
Response to an ENQ, or an indication of successful receipt of a message.
U+0010
DLE :: Data Link Escape
^P
Cause a limited number of contiguously following octets to be interpreted in some different way, for example as raw data (as opposed to control codes or graphic characters). The details of this are implementation dependent.
U+0015
NAK :: Negative Acknowledge
^U
Sent by a station as a negative response to the station with which the connection has been set up. In binary synchronous communication protocol, the NAK is used to indicate that an error was detected in the previously received block and that the receiver is ready to accept retransmission of that block. In multipoint systems, the NAK is used as the not-ready reply to a poll.
U+0016
SYN :: Synchronous Idle
^V
Used in synchronous transmission systems to provide a signal from which synchronous correction may be achieved between data terminal equipment, particularly when no other character is being transmitted.
U+0017
ETB :: End of Transmission Block
^W
Indicates the end of a transmission block of data when data are divided into such blocks for transmission purposes. If it is not in use for another purpose, IPTC 7901 recommends interpreting ETB as an end of paragraph character.
Device Controls
These four control codes are reserved for the control of devices, such as the Telex teleprinter, where DC1 (known also as XON) and DC2 were intended for activating the device, while DC3 (known also as XOFF) and DC4 were intended for pausing and turning off the device.
U+0011
DC1 :: Device Control 1 (XON)
^Q
U+0012
DC2 :: Device Control 2
^R
U+0013
DC3 :: Device Control 3 (XOFF)
^S
U+0014
DC4 :: Device Control 4
^T
Locking Shifts
The SO and SI codes were used to convert between 8-bit and 7-bit character codes. In a 7-bit environment, the Shift Out (SO) control would change the meaning of bytes 0x21 through 0x7E (i.e. the graphical codes, excluding the space) to invoke characters from an alternative set, and the Shift In (SI) control would change them back.
U+000E
SO :: Shift Out
^N
Switch to an alternative character set.
U+000F
SI :: Shift In
^O
Return to regular character set after Shift Out.
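A minimal sketch of a decoder that honours the two locking shifts. The alternative set here is made up for illustration: it maps each shifted byte to the matching Unicode fullwidth form (byte value plus 0xFEE0). Historical systems used sets such as katakana or line-drawing characters instead, and `decode_shifted` is a hypothetical name.

```python
SO, SI = 0x0E, 0x0F


def decode_shifted(data: bytes) -> str:
    """Decode 7-bit data, switching character sets on SO and SI."""
    shifted = False
    out = []
    for byte in data:
        if byte == SO:
            shifted = True   # stays in effect until SI (a locking shift)
        elif byte == SI:
            shifted = False
        elif shifted and 0x21 <= byte <= 0x7E:
            # Hypothetical alternative set: the fullwidth forms block.
            out.append(chr(byte + 0xFEE0))
        else:
            # Space and control codes are unaffected by the shift.
            out.append(chr(byte))
    return "".join(out)


print(decode_shifted(b"ab\x0eAB\x0fcd"))
```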
Others