JPEG Decoding for Beginners

A step-by-step walkthrough of the JPEG decoding process using a real 16x16 pixel image, covering everything from reading file markers and Huffman tables to performing inverse DCT and color space conversion.

[FF D8]

If you want to understand how a JPG file works, grab a compiler and a hex editor — we're going to decode it step by step.

For this article, I used a heavily compressed Google favicon (16×16 pixels) as an example. The description is somewhat simplified but should be sufficient for understanding the specification.

[FF D8] — the start of image marker, always found at the very beginning of every JPG file.

The following bytes [FF FE] indicate the beginning of a comment section. The next 2 bytes [00 04] specify the section length, which counts the two length bytes themselves, so two bytes of payload follow. The values [3A 29] are the comment itself: the ASCII codes for ":" and ")", a smiley visible in the right panel of the hex editor.

A Little Theory

Here's a brief overview of the key processes in JPEG compression:

Color space conversion from RGB to YCbCr
Channel subsampling — reducing the Cb and Cr chrominance channels (e.g., 2:1 reduction both horizontally and vertically)
Dividing the image into 8×8 blocks
Discrete Cosine Transform (DCT) — applied to each 8×8 block, producing a matrix of 64 coefficients
The top-left coefficient is the DC coefficient (the most important one), while the remaining 63 are AC coefficients
Quantization — dividing each coefficient by the corresponding quantization matrix element and rounding the result (this is where the lossy compression happens; the decoder later multiplies by the same element)
Huffman encoding — lossless compression of the quantized coefficients

Data for blocks is interleaved in small portions: Y₀₀Y₁₀Y₀₁Y₁₁Cb₀₀Cr₀₀Y₂₀…
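As a rough illustration of that ordering, the sketch below lists block labels in stream order. It assumes 2×2 luma blocks per chroma block (the 2:1 subsampling mentioned above) and labels written as column then row, which is how the sequence above reads. The function name `interleaved_order` and its parameters are made up for this example.

```python
def interleaved_order(mcus_across, mcus_down):
    """List (channel, column, row) block labels in stream order.

    Assumes every group holds 2x2 Y blocks plus one Cb and one Cr block.
    """
    order = []
    for my in range(mcus_down):
        for mx in range(mcus_across):
            # Four Y blocks of this group: left to right, top to bottom.
            for dy in range(2):
                for dx in range(2):
                    order.append(("Y", 2 * mx + dx, 2 * my + dy))
            # Then one Cb block and one Cr block covering the same area.
            order.append(("Cb", mx, my))
            order.append(("Cr", mx, my))
    return order


print(interleaved_order(2, 1)[:7])
```

For the 16×16 example image there is only one such group, so the stream holds four Y blocks followed by one Cb and one Cr block.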

Reading the File

A JPEG file is structured as a sequence of segments, each preceded by a marker (2 bytes, where the first byte is always [FF]). Most segments also store their length in the following 2 bytes.
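To make the segment layout concrete, here is a minimal sketch of a segment walker. It is not a complete parser: it ignores fill bytes, stops at the start of the scan data, and the names `iter_segments` and `STANDALONE` are my own. The sample bytes are the start-of-image marker and comment segment described earlier, followed by an end-of-image marker.

```python
import struct

# Markers that carry no length field: SOI, EOI, TEM and RST0..RST7.
STANDALONE = {0xD8, 0xD9, 0x01} | set(range(0xD0, 0xD8))


def iter_segments(data):
    """Yield (marker_byte, payload) pairs up to and including SOS or EOI."""
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        pos += 2
        if marker in STANDALONE:
            yield marker, b""
            if marker == 0xD9:  # end of image
                return
            continue
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack_from(">H", data, pos)
        yield marker, data[pos + 2 : pos + length]
        pos += length
        if marker == 0xDA:  # entropy-coded data follows SOS; stop here
            return


sample = bytes.fromhex("FFD8 FFFE 0004 3A29 FFD9")
for marker, payload in iter_segments(sample):
    print(f"FF {marker:02X}", payload)
```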

DQT Marker [FF DB] — Quantization Table

[00 43] — Section length: 67 bytes
[0_] — Value precision: 0 (0 = 1 byte, 1 = 2 bytes)
[_0] — Table ID: 0

The remaining 64 bytes fill an 8×8 table in zigzag order:

Quantization matrix values (in hex):

[A0 6E 64 A0 F0 FF FF FF]
[78 78 8C BE FF FF FF FF]
[8C 82 A0 F0 FF FF FF FF]
[8C AA DC FF FF FF FF FF]
[B4 DC FF FF FF FF FF FF]
[F0 FF FF FF FF FF FF FF]
[FF FF FF FF FF FF FF FF]
[FF FF FF FF FF FF FF FF]

In decimal:

[160 110 100 160 240 255 255 255]
[120 120 140 190 255 255 255 255]
[140 130 160 240 255 255 255 255]
[140 170 220 255 255 255 255 255]
[180 220 255 255 255 255 255 255]
[240 255 255 255 255 255 255 255]
[255 255 255 255 255 255 255 255]
[255 255 255 255 255 255 255 255]
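The nibble split and the zigzag fill can be sketched as below. This is an illustration under assumptions: `zigzag_positions` and `parse_dqt` are names invented here, the payload is taken to be the bytes after the length field, and only 1-byte entries are handled.

```python
def zigzag_positions(n=8):
    """Return (row, col) pairs in zigzag scan order for an n x n block."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; odd ones run downwards, even ones upwards.
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def parse_dqt(payload):
    """Split a DQT payload into (precision, table_id, 8x8 matrix)."""
    precision, table_id = payload[0] >> 4, payload[0] & 0x0F
    if precision != 0:
        raise NotImplementedError("2-byte entries are not handled in this sketch")
    table = [[0] * 8 for _ in range(8)]
    for value, (row, col) in zip(payload[1:65], zigzag_positions()):
        table[row][col] = value
    return precision, table_id, table


print(zigzag_positions()[:6])
```

The printed positions show the start of the scan: across the top row first, then down the first anti-diagonal, and so on.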

SOF0 Marker [FF C0] — Baseline DCT

[00 11] — Section length: 17 bytes
[08] — Precision: 8 bits
[00 10] — Image height: 16 pixels
[00 10] — Image width: 16 pixels
[03] — Number of channels: 3

Channel 1:

[01] — Channel ID: 1 (Y)
[2_] — Horizontal sampling factor (H₁): 2
[_2] — Vertical sampling factor (V₁): 2
[00] — Quantization table ID: 0

Channel 2:

[02] — Channel ID: 2 (Cb)
[1_] — Horizontal sampling factor (H₂): 1
[_1] — Vertical sampling factor (V₂): 1
[01] — Quantization table ID: 1

Channel 3:

[03] — Channel ID: 3 (Cr)
[1_] — Horizontal sampling factor (H₃): 1
[_1] — Vertical sampling factor (V₃): 1
[01] — Quantization table ID: 1

Hmax = 2, Vmax = 2. Channel i is subsampled by a factor of Hmax/Hi horizontally and Vmax/Vi vertically.
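The fields above can be read with a few lines of code. The sketch below is one possible way to do it, not a reference implementation: `parse_sof0` and the dictionary keys are names chosen for this example, the payload is the 15 bytes listed above (everything after the length field), and the subsampling ratios are assumed to be whole numbers.

```python
import struct


def parse_sof0(payload):
    """Decode a baseline SOF0 payload (the bytes after the length field)."""
    precision, height, width, count = struct.unpack_from(">BHHB", payload, 0)
    components = []
    for i in range(count):
        cid, sampling, qt_id = payload[6 + 3 * i : 9 + 3 * i]
        components.append(
            {"id": cid, "h": sampling >> 4, "v": sampling & 0x0F, "qt": qt_id}
        )
    h_max = max(c["h"] for c in components)
    v_max = max(c["v"] for c in components)
    for c in components:
        # How many times smaller this channel is than the full image
        # (integer division: whole-number ratios are assumed here).
        c["shrink_x"] = h_max // c["h"]
        c["shrink_y"] = v_max // c["v"]
    return {
        "precision": precision,
        "height": height,
        "width": width,
        "components": components,
        "group_size": (8 * h_max, 8 * v_max),  # pixels covered by one interleaved group
    }


info = parse_sof0(bytes.fromhex("08 0010 0010 03 012200 021101 031101"))
for c in info["components"]:
    print(c["id"], c["shrink_x"], c["shrink_y"])
```

For this file, Y is stored at full resolution while Cb and Cr are each halved in both directions, so one interleaved group covers the whole 16×16 image.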

DHT Marker [FF C4] — Huffman Table

Number of codes per length:

Length:       1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Code count: [01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00]

Values: [03] and [02]. This means there are 2 codes total: one of length 1 with value 3, and one of length 2 with value 2.

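The Huffman tree is built by assigning codes in order of increasing length: the length-1 code is 0 (value 3) and the length-2 code is 10 (value 2). In tree form, the root's left child is the leaf 03, and the root's right child has the leaf 02 as its left child.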
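The counts and values above are enough to derive every code. A minimal sketch of that assignment follows, using a dictionary instead of an explicit tree; `build_codes` is a name picked for this example.

```python
def build_codes(counts, values):
    """Map (length, code) -> value from the 16 counts and the value list."""
    codes = {}
    code = 0
    remaining = iter(values)
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            codes[(length, code)] = next(remaining)
            code += 1
        code <<= 1  # moving on to the next length appends a 0 bit
    return codes


counts = [1, 1] + [0] * 14
for (length, code), value in build_codes(counts, [0x03, 0x02]).items():
    print(format(code, f"0{length}b"), "->", value)
```

A dictionary keyed by (length, code) is slower than a real tree or lookup table, but it keeps the relationship between counts and codes easy to see.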

SOS Marker [FF DA] — Start of Scan

[00 0C] — Section length: 12 bytes
[03] — Number of channels: 3

Channel 1 (Y):

[01] — Channel ID: 1
[0_] — DC Huffman table: 0
[_0] — AC Huffman table: 0

Channel 2 (Cb):

[02] — Channel ID: 2
[1_] — DC Huffman table: 1
[_1] — AC Huffman table: 1

Channel 3 (Cr):

[03] — Channel ID: 3
[1_] — DC Huffman table: 1
[_1] — AC Huffman table: 1

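[00], [3F], [00] — start of spectral selection (0), end of spectral selection (63), and successive approximation (0). These fields only take other values in progressive mode (not covered in this article).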

Encoded Data

After the SOS marker, the actual encoded image data begins.

Finding the DC Coefficient

Read the bit sequence (if you encounter [FF 00], treat it as [FF] — it's not a marker). Traverse the Huffman tree following the bits: 0 means go left, 1 means go right. Stop when you reach a leaf node.
Extract the node's value. A value of 0 means the coefficient is 0. Otherwise, the value is the number of subsequent bits that encode the coefficient; read them as an unsigned number.
If the first of those bits is 1, keep the number as-is. Otherwise, compute: DC = number − 2^length + 1. Place the result in the first zigzag position, which is the top-left cell of the matrix.
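The three steps above can be sketched as follows. This is a simplified illustration: the class and function names are mine, the Huffman table is a `{(length, code): value}` dictionary rather than a tree, restart markers are not handled, and the input byte [A0] in the usage line is made up for the demonstration rather than taken from the file.

```python
class BitReader:
    """Reads bits most-significant first, undoing the FF 00 byte stuffing."""

    def __init__(self, data):
        # A stuffed FF 00 pair stands for a single FF data byte.
        self.data = data.replace(b"\xff\x00", b"\xff")
        self.pos = 0  # position in bits

    def read(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def decode_symbol(reader, codes):
    """Read bits until they match a code in a {(length, code): value} map."""
    code = 0
    for length in range(1, 17):
        code = (code << 1) | reader.read(1)
        if (length, code) in codes:
            return codes[(length, code)]
    raise ValueError("no Huffman code matched")


def extend(bits, length):
    """Turn `length` raw bits into a signed coefficient."""
    if length == 0:
        return 0
    if bits >> (length - 1):         # first bit is 1: keep as-is
        return bits
    return bits - (1 << length) + 1  # first bit is 0: negative value


def decode_dc(reader, dc_codes):
    length = decode_symbol(reader, dc_codes)
    return extend(reader.read(length), length)


codes = {(1, 0b0): 3, (2, 0b10): 2}  # the two codes from the DHT section
print(decode_dc(BitReader(bytes([0xA0])), codes))  # 2
```

In the usage line the bits 10 select the value 2, and the next two bits 10 start with a 1, so the coefficient is 2.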

Finding AC Coefficients

Continue reading the bit sequence, now traversing the AC Huffman tree assigned to the channel.
Get the node's value. A value of 0 means fill the remaining positions in the matrix with zeros. Otherwise, the high nibble tells you how many zeros precede the coefficient, and the low nibble tells you the bit length of the coefficient.
Apply the same sign conversion logic as for DC coefficients.

Continue until the entire 8×8 matrix is filled or a zero code (end of block) is encountered.

Result — the first Y-channel coefficient matrix (before dequantization):

[ 2  0  3  0  0  0  0  0]
[ 0  1  2  0  0  0  0  0]
[ 0 -1 -1  0  0  0  0  0]
[ 1  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]
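The AC steps can be sketched in code as well. To stay self-contained, the function below takes already decoded (value, raw bits) pairs instead of reading a bit stream; that input format and the name `place_ac` are assumptions of this sketch. The pairs in the usage example were worked backwards from the matrix above, not copied from the file.

```python
def extend(bits, length):
    """Same sign rule as for DC: a leading 0 bit marks a negative value."""
    if length == 0 or bits >> (length - 1):
        return bits
    return bits - (1 << length) + 1


def place_ac(dc, pairs):
    """Build the 64-entry zigzag sequence from a DC value and AC pairs."""
    zz = [dc] + [0] * 63
    pos = 1
    for symbol, raw in pairs:
        if symbol == 0x00:  # end of block: the rest stays zero
            break
        run, size = symbol >> 4, symbol & 0x0F
        pos += run          # skip `run` zero coefficients
        if pos > 63:
            raise ValueError("coefficient index past the end of the block")
        zz[pos] = extend(raw, size)
        pos += 1
        if pos > 63:        # matrix is full
            break
    return zz


pairs = [(0x31, 0b1), (0x02, 0b11), (0x12, 0b10),
         (0x01, 0b0), (0x01, 0b1), (0x21, 0b0), (0x00, 0)]
print(place_ac(2, pairs)[:13])  # [2, 0, 0, 0, 1, 3, 0, 2, -1, 1, 0, 0, -1]
```

Writing those 13 values along the zigzag path reproduces the nonzero part of the matrix above.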

The remaining three Y-channel matrices (decoded the same way):

[-4  1  1  1  0  0  0  0]    [ 5 -1  1  0  0  0  0  0]    [-4  2  2  1  0  0  0  0]
[ 0  0  1  0  0  0  0  0]    [-1 -2 -1  0  0  0  0  0]    [-1  0 -1  0  0  0  0  0]
[ 0 -1  0  0  0  0  0  0]    [ 0 -1  0  0  0  0  0  0]    [-1 -1  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [-1  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]

Note that DC coefficients are stored as differences from the previous DC coefficient of the same channel:

Matrix 2 DC: 2 + (−4) = −2
Matrix 3 DC: −2 + 5 = 3
Matrix 4 DC: 3 + (−4) = −1
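The same running sum in code, as a small sketch. It assumes the predictor starts at 0 for the first block of the channel; the variable names are mine.

```python
from itertools import accumulate

dc_diffs = [2, -4, 5, -4]  # DC values as decoded for the four Y blocks
dc_values = list(accumulate(dc_diffs))
print(dc_values)  # [2, -2, 3, -1]
```

Each channel keeps its own running sum, so Cb and Cr are accumulated separately from Y.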

Corrected matrices (with absolute DC values):

[-2  1  1  1  0  0  0  0]    [ 3 -1  1  0  0  0  0  0]    [-1  2  2  1  0  0  0  0]
[ 0  0  1  0  0  0  0  0]    [-1 -2 -1  0  0  0  0  0]    [-1  0 -1  0  0  0  0  0]
[ 0 -1  0  0  0  0  0  0]    [ 0 -1  0  0  0  0  0  0]    [-1 -1  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [-1  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]

The Cb and Cr channel matrices:

Cb:                           Cr:
[-1  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 1  1  0  0  0  0  0  0]    [ 1 -1  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 1  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]
[ 0  0  0  0  0  0  0  0]    [ 0  0  0  0  0  0  0  0]

Calculations

Dequantization

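We multiply each element of a coefficient matrix by the corresponding element of its channel's quantization table: table 0 (shown above) for Y, and table 1 (not listed above) for Cb and Cr, as assigned in the SOF0 segment.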
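As a quick sketch of that multiplication, the example below uses only the top-left 4×4 corner of table 0 and of the first Y matrix to keep the listing short; `dequantize` is a name chosen for this example.

```python
def dequantize(block, table):
    """Multiply two equally sized matrices element by element."""
    return [
        [coeff * q for coeff, q in zip(block_row, table_row)]
        for block_row, table_row in zip(block, table)
    ]


table_corner = [
    [160, 110, 100, 160],
    [120, 120, 140, 190],
    [140, 130, 160, 240],
    [140, 170, 220, 255],
]
y_corner = [
    [2, 0, 3, 0],
    [0, 1, 2, 0],
    [0, -1, -1, 0],
    [1, 0, 0, 0],
]
for row in dequantize(y_corner, table_corner):
    print(row)
```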

Y-channel matrices after dequantization:

[ 320    0  300  0  0  0  0  0]    [-320  110  100  160  0  0  0  0]
[   0  120  280  0  0  0  0  0]    [   0    0  140    0  0  0  0  0]
[   0 -130 -160  0  0  0  0  0]    [   0 -130    0    0  0  0  0  0]
[ 140    0    0  0  0  0  0  0]    [   0    0    0    0  0  0  0  0]
[   0    0    0  0  0  0  0  0]    [   0    0    0    0  0  0  0  0]
[   0    0    0  0  0  0  0  0]    [   0    0    0    0  0  0  0  0]
[   0    0    0  0  0  0  0  0]    [   0    0    0    0  0  0  0  0]
[   0    0    0  0  0  0  0  0]    [   0    0    0    0  0  0  0  0]

[ 480 -110  100  0  0  0  0  0]    [-160  220  200  160  0  0  0  0]
[-120 -240 -140  0  0  0  0  0]    [-120    0 -140    0  0  0  0  0]
[   0 -130    0  0  0  0  0  0]    [-140 -130    0    0  0  0  0  0]
[-140    0    0  0  0  0  0  0]    [   0    0    0    0  0  0  0  0]
[   0    0    0  0  0  0  0  0]    [   0    0    0    0  0  0  0  0]
[   0    0    0  0  0  0  0  0]    [   0    0    0    0  0  0  0  0]
[   0    0    0  0  0  0  0  0]    [   0    0    0    0  0  0  0  0]
[   0    0    0  0  0  0  0  0]    [   0    0    0    0  0  0  0  0]

Cb and Cr matrices after dequantization:

Cb:                                  Cr:
[-170    0  0  0  0  0  0  0]    [   0    0  0  0  0  0  0  0]
[ 180  210  0  0  0  0  0  0]    [ 180 -210  0  0  0  0  0  0]
[   0    0  0  0  0  0  0  0]    [ 240    0  0  0  0  0  0  0]
[   0    0  0  0  0  0  0  0]    [   0    0  0  0  0  0  0  0]
[   0    0  0  0  0  0  0  0]    [   0    0  0  0  0  0  0  0]
[   0    0  0  0  0  0  0  0]    [   0    0  0  0  0  0  0  0]
[   0    0  0  0  0  0  0  0]    [   0    0  0  0  0  0  0  0]
[   0    0  0  0  0  0  0  0]    [   0    0  0  0  0  0  0  0]

Inverse DCT

The formula for the inverse DCT:

s_yx = (1/4) × Σ_u=0..7 Σ_v=0..7 C_uC_v × S_vu × cos((2x+1)uπ/16) × cos((2y+1)vπ/16)

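where C_k = 1/√2 when k = 0 and C_k = 1 otherwise, S_vu is the dequantized coefficient in row v and column u, and s_yx is the output sample in row y and column x.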
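A direct, unoptimized translation of the formula might look like the sketch below. Real decoders use much faster factorizations; this version only mirrors the double sum. The name `idct_8x8` is mine, and the input is the first dequantized Y matrix from the previous section.

```python
import math


def idct_8x8(S):
    """Naive inverse DCT; S[v][u] is indexed by (row, column) frequency."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            total = 0.0
            for v in range(8):
                for u in range(8):
                    total += (
                        c(u) * c(v) * S[v][u]
                        * math.cos((2 * x + 1) * u * math.pi / 16)
                        * math.cos((2 * y + 1) * v * math.pi / 16)
                    )
            out[y][x] = total / 4
    return out


# Nonzero entries of the first dequantized Y block.
S = [[0] * 8 for _ in range(8)]
S[0][0], S[0][2] = 320, 300
S[1][1], S[1][2] = 120, 280
S[2][1], S[2][2] = -130, -160
S[3][0] = 140

samples = idct_8x8(S)
print(round(samples[0][0]), round(samples[0][7]))  # 138 139
```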

After computing the inverse DCT and rounding, the first Y-channel block gives:

[138  92  27 -17 -17  28  93 139]
[136  82   5 -51 -55  -8  61 111]
[143  80  -9 -77 -89 -41  32  86]
[157  95   6 -62 -76 -33  36  86]
[147 103  37 -12 -21  11  62 100]
[ 87  72  50  36  37  55  79  95]
[-10   5  31  56  71  73  68  62]
[-87 -50   6  56  79  72  48  29]

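After adding 128 and clamping values to the [0, 255] range:

[255 220 155 111 111 156 221 255]
[255 210 133  77  73 120 189 239]
[255 208 119  51  39  87 160 214]
[255 223 134  66  52  95 164 214]
[255 231 165 116 107 139 190 228]
[215 200 178 164 165 183 207 223]
[118 133 159 184 199 201 196 190]
[ 41  78 134 184 207 200 176 157]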

The Cb and Cr matrices after inverse DCT (one 8×8 block per channel, because the 2:1 subsampling in each direction shrinks the 16×16 image to 8×8 samples):

Cb:                                         Cr:
[ 60  52  38  20   0 -18 -32 -40]    [ 19  27  41  60  80  99 113 120]
[ 48  41  29  13  -3 -19 -31 -37]    [  0   6  18  34  51  66  78  85]
[ 25  20  12   2  -9 -19 -27 -32]    [-27 -22 -14  -4   7  17  25  30]
[ -4  -6  -9 -13 -17 -20 -23 -25]    [-43 -41 -38 -34 -30 -27 -24 -22]
[-37 -35 -33 -29 -25 -21 -18 -17]    [-35 -36 -39 -43 -47 -51 -53 -55]
[-67 -63 -55 -44 -33 -22 -14 -10]    [ -5  -9 -17 -28 -39 -50 -58 -62]
[-90 -84 -71 -56 -39 -23 -11  -4]    [ 32  26  14  -1 -18 -34 -46 -53]
[-102 -95 -81 -62 -42 -23  -9  -1]    [ 58  50  36  18  -2 -20 -34 -42]

Values are then clamped to [0, 255] after adding 128.
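The level shift and clamping are simple enough to show in a few lines. This is a sketch with a made-up function name, applied to the first row of the Y block above.

```python
def level_shift(block):
    """Add 128 to every sample and clamp the result to 0..255."""
    return [[min(255, max(0, round(s) + 128)) for s in row] for row in block]


first_row = [[138, 92, 27, -17, -17, 28, 93, 139]]
print(level_shift(first_row))  # [[255, 220, 155, 111, 111, 156, 221, 255]]
```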

YCbCr to RGB Conversion

Since the Cb and Cr channels were subsampled (each 8×8 Cb/Cr block covers the entire 16×16 image), we need to upsample them to match the Y channel's resolution. Each Cb/Cr pixel maps to a 2×2 block of Y pixels.
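The simplest way to do this is to repeat each sample, as sketched below. The function name `upsample` is mine, and many real decoders interpolate between neighbouring samples instead of copying them, so treat this as the plain 2×2 mapping described above and nothing more.

```python
def upsample(plane, fx=2, fy=2):
    """Repeat every sample fx times horizontally and fy times vertically."""
    out = []
    for row in plane:
        wide = [s for s in row for _ in range(fx)]
        out.extend(list(wide) for _ in range(fy))
    return out


for row in upsample([[1, 2], [3, 4]]):
    print(row)
```

Applied to an 8×8 Cb or Cr block, this gives a 16×16 plane that lines up with the four Y blocks.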

The conversion formulas:

R = round(Y + 1.402 × (Cr − 128))
G = round(Y − 0.34414 × (Cb − 128) − 0.71414 × (Cr − 128))
B = round(Y + 1.772 × (Cb − 128))
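Here is a minimal sketch of the conversion for a single pixel. The inputs are the level-shifted values (after adding 128), and the final clamp to [0, 255] is an assumption of this sketch, since the formulas alone can overshoot that range. The function names are mine.

```python
def clamp(x):
    return min(255, max(0, x))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one pixel; inputs are the level-shifted 0..255 values."""
    r = round(y + 1.402 * (cr - 128))
    g = round(y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128))
    b = round(y + 1.772 * (cb - 128))
    return clamp(r), clamp(g), clamp(b)


print(ycbcr_to_rgb(128, 128, 128))  # (128, 128, 128)
```

With both chroma values at 128 the colour terms vanish, so the pixel comes out as a neutral grey equal to Y.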

Final RGB values for the upper-left 8×8 block:

R channel:

[255 249 195 149 169 215 255 255]
[255 238 172 116 131 179 255 255]
[255 209 127  58  64 112 209 255]
[255 224 143  73  76 120 212 255]
[217 193 134  84  86 118 185 223]
[177 162 147 132 145 162 201 218]
[ 57  74 101 125 144 146 147 142]
[  0  18  76 125 153 146 128 108]

G channel:

[220 186 118  72  67 113 172 205]
[220 175  95  39  29  77 139 190]
[238 192 100  31  16  64 132 185]
[238 207 116  46  28  72 135 186]
[255 242 175 125 113 145 193 231]
[226 211 188 173 172 189 209 226]
[149 166 192 216 230 232 225 220]
[ 73 110 167 216 239 232 206 186]

B channel:

[255 255 250 204 179 225 255 255]
[255 255 227 171 141 189 224 255]
[255 255 193 124  90 138 186 239]
[255 255 209 139 102 146 189 240]
[255 255 203 153 130 162 195 233]
[255 244 216 201 189 206 211 228]
[108 125 148 172 183 185 173 168]
[ 32  69 123 172 192 185 154 134]

Conclusion

I am not a specialist in this field, but I enjoy figuring out how things work. When I decided to understand the JPEG format, I searched for a worked example with real numbers but couldn't find one — only theoretical descriptions and convoluted specifications. So I did the decoding myself and tried to document it as clearly as possible. I hope this article saves someone some time.

Here are some resources I found helpful:

ru.wikipedia.org/JPEG — a good introductory overview
en.wikipedia.org/JPEG — more detail on the encoding/decoding process
JPEG Standard (ISO/IEC 10918-1 ITU-T Recommendation T.81) — the official specification
impulseadventure.com/photo — great articles with Huffman tree construction examples
JPEGsnoop — a utility for extracting detailed information from JPEG files

[FF D9]