I've often wondered how hard it is to output a PNG file directly, without using a library or a standard tool like
pnmtopng. (I'm not sure when you'd actually want to do this; maybe for a tiny embedded system with a web interface.)
I found that constructing a simple, uncompressed PNG does not require a whole lot of code, but there are some odd details I got wrong on the first try. Here's a crash course in writing a minimal PNG encoder. We'll use only a small subset of the PNG specification, but I'll link to the full spec so you can read more.
The example code is not too fast; it's written in Python and has tons of string copying everywhere. My goal was to express the idea clearly, and let you worry about coding it up in C for your embedded system or whatever. If you're careful, you can avoid ever copying the image data.
We will assume the raw image data is a Python byte string (non-Unicode), consisting of one byte each for red, green, and blue, for each pixel in English reading order. For reference, here is how we'd "encode" this data in the much simpler PPM format.
def to_ppm(width, height, data):
    return 'P6\n%d %d\n255\n%s' % (width, height, data)
I lied when I said we'd use no libraries at all. I will import Python's standard struct module; I figured an exercise in converting integers to 4-byte big-endian format would be excessively boring. Here's how we do it with struct:

import struct

def be32(n):
    return struct.pack('>I', n)
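For instance, packing the length 13 (which happens to be the size of the IHDR payload we'll build below) gives the expected four big-endian bytes. (Python 3 syntax here and in the later checks, since bytes literals make the output easier to read.)

```python
import struct

def be32(n):
    # 4-byte big-endian integer, same as be32 above (Python 3 bytes)
    return struct.pack('>I', n)

print(be32(13))  # b'\x00\x00\x00\r'
```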
A PNG file contains a sequence of data chunks, each with an associated length, type, and CRC checksum. The type is a 4-byte quantity which can be interpreted as four ASCII letters. We'll implement a helper that builds a single chunk:

def png_chunk(ty, data):
    return be32(len(data)) + ty + data + be32(crc(ty + data))
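As a sanity check on the chunk layout: the empty IEND chunk that ends every PNG file works out to a fixed 12 bytes. In this check (Python 3 bytes) zlib.crc32 stands in for our own crc, which we implement near the end; the shortcut is only for verification, not part of the encoder.

```python
import struct, zlib

def png_chunk(ty, data):
    # same layout as above: length, type, data, CRC over type + data
    # (zlib.crc32 used here just to check; we write crc ourselves later)
    return (struct.pack('>I', len(data)) + ty + data
            + struct.pack('>I', zlib.crc32(ty + data)))

print(png_chunk(b'IEND', b'').hex())  # 0000000049454e44ae426082
```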
The IHDR chunk, always the first chunk in a file, contains basic header information such as the width and height. We will hardcode a bit depth of 8, color type 2 (RGB truecolor), and the standard 0 values for the remaining fields (compression method, filter method, and interlacing).
def png_header(width, height):
    return png_chunk('IHDR',
        struct.pack('>IIBBBBB', width, height, 8, 2, 0, 0, 0))
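The IHDR payload packs to exactly 13 bytes: two 4-byte dimensions plus five single-byte fields. A quick check (Python 3):

```python
import struct

# width, height, bit depth 8, color type 2 (RGB),
# compression 0, filter 0, interlace 0
ihdr = struct.pack('>IIBBBBB', 500, 300, 8, 2, 0, 0, 0)
print(len(ihdr), ihdr.hex())  # 13 000001f40000012c0802000000
```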
The actual image data is stored in DEFLATE format, the same compression used by gzip and friends. Fortunately for our minimalist project, DEFLATE allows uncompressed blocks. Each one has a 5-byte header: the byte 0 (or 1 for the last block), followed by a 16-bit data length, and then the same length value with all of its bits flipped. Note that these are little-endian numbers, unlike the rest of PNG. Never assume a format is internally consistent!
MAX_DEFLATE = 0xffff

def deflate_block(data, last=False):
    n = len(data)
    assert n <= MAX_DEFLATE
    return struct.pack('<BHH', bool(last), n, 0xffff ^ n) + data
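We can check the stored-block format against zlib itself: passing a negative wbits value to zlib.decompress tells it to expect raw DEFLATE data with no zlib wrapper. (Python 3 port of deflate_block; the round-trip is only a test.)

```python
import struct, zlib

MAX_DEFLATE = 0xffff

def deflate_block(data, last=False):
    # stored (uncompressed) block: final flag, length, one's-complement length
    n = len(data)
    assert n <= MAX_DEFLATE
    return struct.pack('<BHH', bool(last), n, 0xffff ^ n) + data

blk = deflate_block(b'hello', last=True)
print(blk[:5].hex())              # 010500faff
print(zlib.decompress(blk, -15))  # b'hello'
```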
Since a DEFLATE block can only hold 64 kB, we'll need to split our image data into multiple blocks. We will actually want a more general function to split a sequence into pieces of size n (allowing the last piece to be smaller than n):

def pieces(seq, n):
    return [seq[i:i+n] for i in xrange(0, len(seq), n)]
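For example (with range in place of Python 2's xrange):

```python
def pieces(seq, n):
    # split seq into n-sized pieces; the last piece may be shorter
    return [seq[i:i+n] for i in range(0, len(seq), n)]

print(pieces('abcdefgh', 3))  # ['abc', 'def', 'gh']
```

Note that pieces returns an empty list for empty input, so the zlib_stream function below, as written, assumes at least one byte of image data.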
PNG wants the DEFLATE blocks to be encapsulated as a zlib data stream. For our purposes, this means we prefix a header of 78 01 hex, and suffix an Adler-32 checksum of the "decompressed" data. That's right: a self-contained PNG encoder needs to implement two different checksum algorithms.

def zlib_stream(data):
    segments = pieces(data, MAX_DEFLATE)
    blocks = ''.join(deflate_block(p) for p in segments[:-1])
    blocks += deflate_block(segments[-1], last=True)
    return '\x78\x01' + blocks + be32(adler32(data))
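Since 78 01 plus stored blocks plus Adler-32 is a perfectly legal zlib stream, zlib.decompress will happily check our work. Here's a Python 3 sketch using zlib.adler32 in place of our own implementation (the real encoder writes adler32 by hand below):

```python
import struct, zlib

MAX_DEFLATE = 0xffff

def deflate_block(data, last=False):
    n = len(data)
    return struct.pack('<BHH', bool(last), n, 0xffff ^ n) + data

def zlib_stream(data):
    # one stored block per 64 kB piece, then the Adler-32 of the raw data
    segs = [data[i:i+MAX_DEFLATE] for i in range(0, len(data), MAX_DEFLATE)]
    blocks = b''.join(deflate_block(p) for p in segs[:-1])
    blocks += deflate_block(segs[-1], last=True)
    return b'\x78\x01' + blocks + struct.pack('>I', zlib.adler32(data))

payload = b'scanline' * 20000   # > 64 kB, so we get multiple blocks
assert zlib.decompress(zlib_stream(payload)) == payload
```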
We're almost done, but there's one more wrinkle. PNG has a pre-compression filter step, which transforms a scanline of data at a time. A filter doesn't change the size of the image data, but is supposed to expose redundancies, leading to better compression. We aren't compressing anyway, so we choose the no-op filter. This means we prefix a zero byte to each scanline.
def to_png(width, height, data):
    lines = ''.join('\0'+p for p in pieces(data, 3*width))
    return ('\x89PNG\r\n\x1a\n'
        + png_header(width, height)
        + png_chunk('IDAT', zlib_stream(lines))
        + png_chunk('IEND', ''))
Actually, a PNG file may contain any number of IDAT chunks; the zlib stream is given by the concatenation of their contents. It might be convenient to emit one IDAT chunk per DEFLATE block, but the IDAT boundaries really can occur anywhere, even halfway through the zlib checksum. This flexibility is convenient for encoders and a hassle for decoders. For example, one of many historical PNG bugs in Internet Explorer is triggered by empty IDAT chunks.
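To make the boundary-anywhere rule concrete, here's a hedged sketch (Python 3, with zlib.crc32 standing in for our crc) that wraps an existing zlib stream in fixed-size IDAT chunks; a decoder simply concatenates the payloads back together. The 8192-byte piece size is an arbitrary choice.

```python
import struct, zlib

def png_chunk(ty, data):
    # length, type, data, CRC over type + data
    return (struct.pack('>I', len(data)) + ty + data
            + struct.pack('>I', zlib.crc32(ty + data)))

def idat_chunks(stream, size=8192):
    # split the zlib stream at arbitrary byte boundaries, one IDAT per piece
    return b''.join(png_chunk(b'IDAT', stream[i:i+size])
                    for i in range(0, len(stream), size))
```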
Here are those checksum algorithms we need. My CRC function follows the approach of code fragment 5 from Wikipedia. For better performance you would want to precompute a lookup table, as suggested by the PNG spec.

def crc(data):
    c = 0xffffffff
    for x in data:
        c ^= ord(x)
        for k in xrange(8):
            v = 0xedb88320 if c & 1 else 0
            c = v ^ (c >> 1)
    return c ^ 0xffffffff
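A quick way to gain confidence in the bit-twiddling: compare against zlib.crc32, which computes the same CRC-32. (Python 3, iterating over bytes directly instead of using ord.)

```python
import zlib

def crc(data):
    # bitwise CRC-32 with the reflected polynomial 0xEDB88320
    c = 0xffffffff
    for x in data:
        c ^= x
        for _ in range(8):
            c = (0xedb88320 if c & 1 else 0) ^ (c >> 1)
    return c ^ 0xffffffff

assert crc(b'IEND') == zlib.crc32(b'IEND') == 0xae426082
```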
def adler32(data):
    s1, s2 = 1, 0
    for x in data:
        s1 = (s1 + ord(x)) % 65521
        s2 = (s2 + s1) % 65521
    return (s2 << 16) + s1
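And the same trick works for Adler-32, comparing against zlib.adler32 (Python 3):

```python
import zlib

def adler32(data):
    # two running sums mod 65521, the largest prime below 2**16
    s1, s2 = 1, 0
    for x in data:
        s1 = (s1 + x) % 65521
        s2 = (s2 + s1) % 65521
    return (s2 << 16) + s1

assert adler32(b'scanline') == zlib.adler32(b'scanline')
```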
Now we can test this code. We'll generate a grid of red-green-yellow gradients, and write it in both PPM and PNG formats.
w, h = 500, 300

img = ''
for y in xrange(h):
    for x in xrange(w):
        img += chr(x % 256) + chr(y % 256) + '\0'

open('out.ppm', 'wb').write(to_ppm(w, h, img))
open('out.png', 'wb').write(to_png(w, h, img))
Then we can verify that the two files contain identical image data.
$ pngtopnm out.png | sha1sum - out.ppm
e19c1229221c608b2a45a4488f9959403b8630a0  -
e19c1229221c608b2a45a4488f9959403b8630a0  out.ppm
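If you don't have pngtopnm handy, the same check can be done in-process. Here's a compact Python 3 port of the whole encoder (bytes instead of str, and zlib's checksums in place of our hand-written ones), with the IDAT payload round-tripped through zlib.decompress and the filter bytes stripped back off. The zlib calls are only to verify; the encoder still emits every byte by hand.

```python
import struct, zlib

def be32(n):
    return struct.pack('>I', n)

def chunk(ty, data):
    return be32(len(data)) + ty + data + be32(zlib.crc32(ty + data))

def to_png(width, height, data):
    # no-op filter byte 0 before each scanline, stored DEFLATE blocks, zlib wrapper
    lines = b''.join(b'\0' + data[i:i+3*width]
                     for i in range(0, len(data), 3*width))
    segs = [lines[i:i+0xffff] for i in range(0, len(lines), 0xffff)]
    blocks = b''
    for j, p in enumerate(segs):
        last = (j == len(segs) - 1)
        blocks += struct.pack('<BHH', last, len(p), 0xffff ^ len(p)) + p
    stream = b'\x78\x01' + blocks + be32(zlib.adler32(lines))
    return (b'\x89PNG\r\n\x1a\n'
            + chunk(b'IHDR', struct.pack('>IIBBBBB', width, height, 8, 2, 0, 0, 0))
            + chunk(b'IDAT', stream)
            + chunk(b'IEND', b''))

w, h = 500, 300
img = bytes(b for y in range(h) for x in range(w)
            for b in (x % 256, y % 256, 0))
png = to_png(w, h, img)

assert png[:8] == b'\x89PNG\r\n\x1a\n'
# skip signature (8) + IHDR chunk (25) + IDAT length/type (8);
# drop the IDAT CRC (4) + IEND chunk (12) from the end
idat = png[8 + 25 + 8 : -(4 + 12)]
raw = zlib.decompress(idat)
recovered = b''.join(raw[i+1:i+1+3*w] for i in range(0, len(raw), 3*w + 1))
assert recovered == img
```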