You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
typesetting/quad/qtest/mds/byte-strings.md

92 lines
3.4 KiB
Markdown

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

# Bytes and Byte Strings
A _byte_ is an exact integer between `0` and `255`, inclusive. The
`byte?` predicate recognizes numbers that represent bytes.
Examples:
```racket
> (byte? 0)
#t
> (byte? 256)
#f
```
A _byte string_ is similar to a string—see \[missing\]—but its content
is a sequence of bytes instead of characters. Byte strings can be used
in applications that process pure ASCII instead of Unicode text. The
printed form of a byte string supports such uses in particular, because
a byte string prints like the ASCII decoding of the byte string, but
prefixed with a `#`. Unprintable ASCII characters or non-ASCII bytes in
the byte string are written with octal notation.
> +\[missing\] in \[missing\] documents the fine points of the syntax of
> byte strings.
Examples:
```racket
> #"Apple"
#"Apple"
> (bytes-ref #"Apple" 0)
65
> (make-bytes 3 65)
#"AAA"
> (define b (make-bytes 2 0))
> b
#"\0\0"
> (bytes-set! b 0 1)
> (bytes-set! b 1 255)
> b
#"\1\377"
```
The `display` form of a byte string writes its raw bytes to the current
output port \(see \[missing\]\). Technically, `display` of a normal
\(i.e,. character\) string prints the UTF-8 encoding of the string to
the current output port, since output is ultimately defined in terms of
bytes; `display` of a byte string, however, writes the raw bytes with no
encoding. Along the same lines, when this documentation shows output, it
technically shows the UTF-8-decoded form of the output.
Examples:
```racket
> (display #"Apple")
Apple
> (display "\316\273") ; same as "λ"
λ
> (display #"\316\273") ; UTF-8 encoding of λ
λ
```
For explicitly converting between strings and byte strings, Racket
supports three kinds of encodings directly: UTF-8, Latin-1, and the
current locales encoding. General facilities for byte-to-byte
conversions \(especially to and from UTF-8\) fill the gap to support
arbitrary string encodings.
Examples:
```racket
> (bytes->string/utf-8 #"\316\273")
"λ"
> (bytes->string/latin-1 #"\316\273")
"λ"
> (parameterize ([current-locale "C"]) ; C locale supports ASCII,
(bytes->string/locale #"\316\273")) ; only, so...
bytes->string/locale: byte string is not a valid encoding
for the current locale
byte string: #"\316\273"
> (let ([cvt (bytes-open-converter "cp1253" ; Greek code page
"UTF-8")]
[dest (make-bytes 2)])
(bytes-convert cvt #"\353" 0 1 dest)
(bytes-close-converter cvt)
(bytes->string/utf-8 dest))
"λ"
```
> +\[missing\] in \[missing\] provides more on byte strings and
> byte-string procedures.