Thursday, April 5, 2012

A minimal encoder for uncompressed PNGs

I've often wondered how hard it is to output a PNG file directly, without using a library or a standard tool like pnmtopng. (I'm not sure when you'd actually want to do this; maybe for a tiny embedded system with a web interface.)

I found that constructing a simple, uncompressed PNG does not require a whole lot of code, but there are some odd details I got wrong on the first try. Here's a crash course in writing a minimal PNG encoder. We'll use only a small subset of the PNG specification, but I'll link to the full spec so you can read more.

The example code is not too fast; it's written in Python and has tons of string copying everywhere. My goal was to express the idea clearly, and let you worry about coding it up in C for your embedded system or whatever. If you're careful, you can avoid ever copying the image data.

We will assume the raw image data is a Python byte string (non-Unicode), consisting of one byte each for red, green, and blue, for each pixel in English reading order. For reference, here is how we'd "encode" this data in the much simpler PPM format.

def to_ppm(width, height, data):
return 'P6\n%d %d\n255\n%s' % (width, height, data)

I lied when I said we'd use no libraries at all. I will import Python's standard struct module. I figured an exercise in converting integers to 4-byte big endian format would be excessively boring. Here's how we do it with struct.

import struct

def be32(n):
return struct.pack('>I', n)

A PNG file contains a sequence of data chunks, each with an associated length, type, and CRC checksum. The type is a 4-byte quantity which can be interpreted as four ASCII letters. We'll implement crc later.

def png_chunk(ty, data):
return be32(len(data)) + ty + data + be32(crc(ty + data))

The IHDR chunk, always the first chunk in a file, contains basic header information such as width and height. We will hardcode a color depth of 8 bits, color type 2 (RGB truecolor), and standard 0 values for the other fields.

def png_header(width, height):
return png_chunk('IHDR',
struct.pack('>IIBBBBB', width, height, 8, 2, 0, 0, 0))

The actual image data is stored in DEFLATE format, the same compression used by gzip and friends. Fortunately for our minimalist project, DEFLATE allows uncompressed blocks. Each one has a 5-byte header: the byte 0 (or 1 for the last block), followed by a 16-bit data length, and then the same length value with all of the bits flipped. Note that these are little-endian numbers, unlike the rest of PNG. Never assume a format is internally consistent!

MAX_DEFLATE = 0xffff
def deflate_block(data, last=False):
n = len(data)
assert n <= MAX_DEFLATE
return struct.pack('<BHH', bool(last), n, 0xffff ^ n) + data

Since a DEFLATE block can only hold 64 kB, we'll need to split our image data into multiple blocks. We will actually want a more general function to split a sequence into chunks of size n (allowing the last chunk to be smaller than n).

def pieces(seq, n):
return [seq[i:i+n] for i in xrange(0, len(seq), n)]

PNG wants the DEFLATE blocks to be encapsulated as a zlib data stream. For our purposes, this means we prefix a header of 78 01 hex, and suffix an Adler-32 checksum of the "decompressed" data. That's right, a self-contained PNG encoder needs to implement two different checksum algorithms.

def zlib_stream(data):
segments = pieces(data, MAX_DEFLATE)

blocks = ''.join(deflate_block(p) for p in segments[:-1])
blocks += deflate_block(segments[-1], last=True)

return '\x78\x01' + blocks + be32(adler32(data))

We're almost done, but there's one more wrinkle. PNG has a pre-compression filter step, which transforms a scanline of data at a time. A filter doesn't change the size of the image data, but is supposed to expose redundancies, leading to better compression. We aren't compressing anyway, so we choose the no-op filter. This means we prefix a zero byte to each scanline.

At last we can build the PNG file. It consists of the magic PNG signature, a header chunk, our zlib stream inside an IDAT chunk, and an empty IEND chunk to mark the end of the file.

def to_png(width, height, data):
lines = ''.join('\0'+p for p in pieces(data, 3*width))

return ('\x89PNG\r\n\x1a\n'
+ png_header(width, height)
+ png_chunk('IDAT', zlib_stream(lines))
+ png_chunk('IEND', ''))

Actually, a PNG file may contain any number of IDAT chunks. The zlib stream is given by the concatenation of their contents. It might be convenient to emit one IDAT chunk per DEFLATE block. But the IDAT boundaries really can occur anywhere, even halfway through the zlib checksum. This flexibility is convenient for encoders, and a hassle for decoders. For example, one of many historical PNG bugs in Internet Explorer is triggered by empty IDAT chunks.

Here are those checksum algorithms we need. My CRC function follows the approach of code fragment 5 from Wikipedia. For better performance you would want to precompute a lookup table, as suggested by the PNG spec.

def crc(data):
c = 0xffffffff
for x in data:
c ^= ord(x)
for k in xrange(8):
v = 0xedb88320 if c & 1 else 0
c = v ^ (c >> 1)
return c ^ 0xffffffff

def adler32(data):
s1, s2 = 1, 0
for x in data:
s1 = (s1 + ord(x)) % 65521
s2 = (s2 + s1) % 65521
return (s2 << 16) + s1

Now we can test this code. We'll generate a grid of red-green-yellow gradients, and write it in both PPM and PNG formats.

w, h = 500, 300
img = ''
for y in xrange(h):
for x in xrange(w):
img += chr(x % 256) + chr(y % 256) + '\0'

open('out.ppm', 'wb').write(to_ppm(w, h, img))
open('out.png', 'wb').write(to_png(w, h, img))

Then we can verify that the two files contain identical image data.

$ pngtopnm out.png | sha1sum - out.ppm
e19c1229221c608b2a45a4488f9959403b8630a0  -
e19c1229221c608b2a45a4488f9959403b8630a0  out.ppm

That's it! As usual, the code is on GitHub. You can also read what others have written on similar subjects here, here, here, or here.

2 comments:

  1. Out of curiosity, why do you set the block size to 3*width? On the line:

    lines = ''.join('\0'+p for p in pieces(data, 3*width))

    ReplyDelete
  2. Ah, 3 bytes per pixel, and width pixels per line.

    Is it a requirement that each DEFLATE block contain a horizontal line of pixels? Or could you stuff all the pixels (% 64000) into a single DEFLATE block?

    ReplyDelete