You’re probably familiar with the saying “Computers speak in 1s and 0s”, but have you ever really grappled with what that means? It’s amazing to think how much they are capable of doing with just two symbols. While most of my work here on this site will remain at higher levels of abstraction, I thought it would be fun to write a short introduction to how computers represent information at the lower level. This post is mainly geared towards the curious-but-uninitiated reader, so I’ll try to keep it light on the math, though at its core this is, after all, a mathematical topic.
In our counting system, we use 10 symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.
In the binary system—the language of the computer—we only have two symbols:
0 and 1, which we call bits. We say that our counting system is base-10, or
decimal, and the binary system is base-2. Base-2 number systems
have a few interesting properties that make them well suited for digital use. Among them is the fact that they are very easy to represent
physically. For example, a light switch can be either on or off, which
can be represented by 1 or 0, respectively. This is the basis for
binary logic.
But if we only have two symbols, how are computers able to represent so much information? How are long strings of 1s and 0s converted into the letters and symbols we see on our screen?
Bytes
Let us consider one bit. It can either be on (1) or off (0). What if we add another bit? We can now represent four different states:
| First Bit | Second Bit | Decimal Value |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 2 |
| 1 | 1 | 3 |
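If you’d like to see this in code, here’s a tiny Python sketch (Python is just my choice of illustration language for this post) that enumerates the four states, treating the first bit as the more significant one:

```python
# Enumerate every combination of two bits and the decimal value it encodes,
# treating the first bit as the more significant one.
for first_bit in (0, 1):
    for second_bit in (0, 1):
        decimal_value = first_bit * 2 + second_bit
        print(first_bit, second_bit, "->", decimal_value)
```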
We can keep on adding bits, and the number of states we can represent grows
exponentially. For example, with \(2\) bits, we can represent \(4\) states,
with \(3\) bits, \(8\) states, with \(4\) bits, \(16\) states, and so on. With only \(8\)
bits, we can represent \(2^8 = 256\) states. We call this sequence of \(8\)
bits a byte.
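As a quick sanity check, here’s a short Python loop that prints the number of states for one through eight bits:

```python
# The number of distinct states n bits can represent is 2 ** n.
for n in range(1, 9):
    print(f"{n} bit(s) -> {2 ** n} states")
# The last line printed is "8 bit(s) -> 256 states", i.e. one byte.
```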
Generalizing this, \(2^n\) is the number of states that
any sequence of \(n\) bits can represent. This yields
insight into how we can interpret binary strings in our familiar
base-10 world. Consider the byte: 10110111. The first bit (if reading from right to
left) represents \(2^0 = 1\), the second bit \(2^1 = 2\), the third bit
\(2^2 = 4\), and so on, up to \(2^7 = 128\). If a bit is set to \(1\), we add
its value to our total; if it is \(0\), we ignore it. Thus, we can convert
the byte 10110111 to decimal as follows:

\[
1 \cdot 2^7 + 0 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = 128 + 32 + 16 + 4 + 2 + 1 = 183
\]
Optional: A More Formal Definition
For the more math-inclined reader, we can make this more formal. Let \(\lambda\) be a binary string of length \(n\), and let \(\lambda_i\) be the \(i\)th bit of \(\lambda\) (read from right to left; i.e., \(\lambda_0\) is the rightmost bit, called the
least significant bit since it has the least value, \(2^0 = 1\)). Then we can convert \(\lambda\) to decimal as follows:

\[
\text{decimal}(\lambda) = \sum_{i=0}^{n-1} \lambda_i \cdot 2^i
\]
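To make this concrete, here’s a small Python sketch; the helper name binary_to_decimal is just something I made up for illustration, and Python’s built-in int(..., 2) gives us an easy way to double-check the result:

```python
def binary_to_decimal(bits: str) -> int:
    """Sum bit_i * 2**i, where i counts from the rightmost
    (least significant) bit -- exactly the formula above."""
    total = 0
    for i, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** i
    return total

print(binary_to_decimal("10110111"))  # 183
print(int("10110111", 2))             # Python's built-in base-2 parser agrees: 183
```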
Signed and Unsigned Values
It is important that we make the distinction here between positive and negative
values. In the binary system, we have the concept of signed and unsigned
values. Recall that a byte has \(2^8 = 256\) states. If we use all \(8\) bits to
represent a positive number (unsigned), we can represent values from \(0\) to \(255\).
However, if we wish to allow for negative values (signed), we can only represent values in the range
\(-128\) to \(127\).
This changes the way we interpret the bits. For example, if we have the
unsigned byte
11111111, we interpret it as \(255\). If we considered it a signed byte, we would
interpret it as \(-1\). We won’t get into the nitty-gritty of this here, but I’d
encourage you to look up Two’s complement if you’re interested in learning
more.
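That said, if you’d like a small taste, here’s a minimal Python sketch of the two’s complement rule for a single byte: if the highest (sign) bit is set, subtract \(2^8 = 256\) from the unsigned value:

```python
raw = 0b11111111  # the byte 11111111, which Python reads as an unsigned value

unsigned = raw                             # 255
signed = raw - 256 if raw >= 128 else raw  # two's complement: subtract 2**8 when the top bit is set
print(unsigned, signed)                    # 255 -1
```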
Characters, Hex, and More
Cool, so now we can represent numbers with bits. But that doesn’t answer the question: How is this text on my screen understood by the hardware in charge of rendering it?
The answer lies in character encoding schemes. Just as we agreed that the byte 10110111 represents the number 183, we can also agree on mappings between numbers and characters.
The most fundamental encoding is ASCII (American Standard Code for Information Interchange), which uses 7 bits to represent 128 characters—enough for English letters, numbers, punctuation, and control characters. For example:
- The letter A is represented by the decimal value 65 (binary 01000001)
- The letter a is 97 (binary 01100001)
- The digit 0 is 48 (binary 00110000)
When your screen renders this text, each character is stored as a number in memory, and your display driver knows to convert 01000001 into the glyph “A” on your screen.
Modern systems use Unicode (specifically UTF-8), which can represent over a million characters across all writing systems—emojis, mathematical symbols, Chinese characters, and more. UTF-8 cleverly uses 1-4 bytes per character, staying backward-compatible with ASCII while supporting the entire world’s writing systems.
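If you want to poke at these mappings yourself, Python exposes them directly; here’s a small sketch:

```python
# ASCII: characters are just small numbers.
print(ord("A"), bin(ord("A")))  # 65 0b1000001
print(chr(65))                  # A

# UTF-8: ASCII characters still take one byte; other characters take more.
print("A".encode("utf-8"))      # b'A'                  (1 byte)
print("é".encode("utf-8"))      # b'\xc3\xa9'           (2 bytes)
print("🙂".encode("utf-8"))     # b'\xf0\x9f\x99\x82'   (4 bytes)
```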
Hexadecimal Notation
Before we wrap up, there’s one more number system worth mentioning: hexadecimal (base-16). Since bytes can represent values 0-255, and writing long binary strings is tedious, programmers often use hex notation. Hex uses 16 symbols: 0-9 and A-F, where A=10, B=11, …, F=15.
A single hex digit represents 4 bits (a “nibble”), so a byte can be written with just 2 hex digits. Our byte 10110111 becomes B7 in hex—much more compact! You’ll see hex everywhere in computing: memory addresses, color codes (#E93CAC is the hex color code for the pink accent you see here on my site!), and network protocols.
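Here’s the same byte in all three notations, as a quick Python sketch:

```python
byte = 0b10110111
print(format(byte, "08b"))  # 10110111 -- binary, 8 digits
print(format(byte, "02X"))  # B7       -- hex, 2 digits
print(byte)                 # 183      -- decimal
print(int("B7", 16))        # 183      -- parsing the hex form back into a number
```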
And there you have it: from individual bits flipping on and off, to bytes encoding numbers, to numbers representing characters, to characters forming the text you’re reading right now. Every layer of abstraction, from hardware to software, builds on this simple foundation of 1s and 0s. Read on to the next post in this series, Logic, to see how these bits are manipulated to perform computations!