Saturday, January 10, 2015

151-byte static Linux binary in Rust

Part of the sales pitch for Rust is that it's "as bare metal as C".¹ Rust can do anything C can do, run anywhere C can run,² with code that's just as efficient, and at least as safe (but usually much safer).

I'd say this claim is about 95% true, which is pretty good by the standards of marketing claims. A while back I decided to put it to the test, by making the smallest, most self-contained Rust program possible. After resolving a few issues along the way, I ended up with a 151-byte, statically linked executable for AMD64 Linux. With the release of Rust 1.0-alpha, it's time to show this off.

Here's the Rust code:

#![crate_type="rlib"]
#![allow(unstable)]

#[macro_use] extern crate syscall;

use std::intrinsics;

fn exit(n: usize) -> ! {
    unsafe {
        syscall!(EXIT, n);
        intrinsics::unreachable()
    }
}

fn write(fd: usize, buf: &[u8]) {
    unsafe {
        syscall!(WRITE, fd, buf.as_ptr(), buf.len());
    }
}

#[no_mangle]
pub fn main() {
    write(1, "Hello!\n".as_bytes());
    exit(0);
}

This uses my syscall library, which provides the syscall! macro. We wrap the underlying system calls with Rust functions, each exposing a safe interface to the unsafe syscall! macro. The main function uses these two safe functions and doesn't need its own unsafe annotation. Even in such a small program, Rust allows us to isolate memory unsafety to a subset of the code.

Because of crate_type="rlib", rustc will build this as a static library, from which we extract a single object file tinyrust.o:

$ rustc tinyrust.rs \
    -O -C no-stack-check -C relocation-model=static \
    -L syscall.rs/target
$ ar x libtinyrust.rlib tinyrust.o
$ objdump -dr tinyrust.o
0000000000000000 <main>:
   0:   b8 01 00 00 00          mov    $0x1,%eax
   5:   bf 01 00 00 00          mov    $0x1,%edi
   a:   be 00 00 00 00          mov    $0x0,%esi
                        b: R_X86_64_32  .rodata.str1625
   f:   ba 07 00 00 00          mov    $0x7,%edx
  14:   0f 05                   syscall 
  16:   b8 3c 00 00 00          mov    $0x3c,%eax
  1b:   31 ff                   xor    %edi,%edi
  1d:   0f 05                   syscall

We disable stack exhaustion checking, as well as position-independent code, in order to slim down the output. After optimization, the only instructions that survive come from inline assembly blocks in the syscall library.

Note that main doesn't end in a ret instruction. The exit function (which gets inlined) is marked with a "return type" of !, meaning "doesn't return". We make good on this by invoking the unreachable intrinsic after syscall!. LLVM will optimize under the assumption that we can never reach this point, making no guarantees about the program behavior if it is reached. This represents the fact that the kernel is actually going to kill the process before syscall!(EXIT, n) can return.

Because we use inline assembly and intrinsics, this code is not going to work on a stable-channel build of Rust 1.0. It will require an alpha or nightly build until such time as inline assembly and intrinsics::unreachable are added to the stable language of Rust 1.x.

Note that I didn't even use #![no_std]! This program is so tiny that everything it pulls from libstd is a type definition, macro, or fully inlined function. As a result there's nothing of libstd left in the compiler output. In a larger program you may need #![no_std], although its role is greatly reduced following the removal of Rust's runtime.

Linking

This is where things get weird.

Whether we compile from C or Rust,³ the standard linker toolchain is going to include a bunch of junk we don't need. So I cooked up my own linker script:

SECTIONS {
    . = 0x400078;
  
    combined . : AT(0x400078) ALIGN(1) SUBALIGN(1) {
        *(.text*)
        *(.data*)
        *(.rodata*)
        *(.bss*)
    }
}

We smash all the sections together, with no alignment padding, then extract that section as a headerless binary blob:

$ ld --gc-sections -e main -T script.ld -o payload tinyrust.o
$ objcopy -j combined -O binary payload payload.bin

Finally we stick this on the end of a custom ELF header. The header is written in NASM syntax but contains no instructions, only data fields. The base address 0x400078 seen above is the end of this header, when the whole file is loaded at 0x400000. There's no guarantee that ld will put main at the beginning of the file, so we need to separately determine the address of main and fill that in as the e_entry field in the ELF file header.

$ ENTRY=$(nm -f posix payload | grep '^main ' | awk '{print $3}')
$ nasm -f bin -o tinyrust -D entry=0x$ENTRY elf.s
$ chmod +x ./tinyrust
$ ./tinyrust
Hello!

It works! And the size:

$ wc -c < tinyrust
158

Seven bytes too big!

The final trick

To get down to 151 bytes, I took inspiration from this classic article, which observes that padding fields in the ELF header can be used to store other data. Like, say, a string constant. The Rust code changes to access this constant:

use std::{mem, raw};

#[no_mangle]
pub fn main() {
    let message: &'static [u8] = unsafe {
        mem::transmute(raw::Slice {
            data: 0x00400008 as *const u8,
            len: 7,
        })
    };

    write(1, message);
    exit(0);
}

A Rust slice like &[u8] consists of a pointer to some memory, and a length indicating the number of elements that may be found there. The module std::raw exposes this as an ordinary struct that we build, then transmute to the actual slice type. The transmute function generates no code; it just tells the type checker to treat our raw::Slice<u8> as if it were a &[u8]. We return this value out of the unsafe block, taking advantage of the "everything is an expression" syntax, and then print the message as before.

Trying out the new version:

$ rustc tinyrust.rs \
    -O -C no-stack-check -C relocation-model=static \
    -L syscall.rs/target
$ ar x libtinyrust.rlib tinyrust.o
$ objdump -dr tinyrust.o
0000000000000000 <main>:        
   0:   b8 01 00 00 00          mov    $0x1,%eax
   5:   bf 01 00 00 00          mov    $0x1,%edi
   a:   be 08 00 40 00          mov    $0x400008,%esi
   f:   ba 07 00 00 00          mov    $0x7,%edx
  14:   0f 05                   syscall 
  16:   b8 3c 00 00 00          mov    $0x3c,%eax
  1b:   31 ff                   xor    %edi,%edi
  1d:   0f 05                   syscall 

...
$ wc -c < tinyrust
151
$ ./tinyrust
Hello!

The object code is the same as before, except that the relocation for the string constant has become an absolute address. The binary is smaller by 7 bytes (the size of "Hello!\n") and it still works!

You can find the full code on GitHub. The code in this article works on rustc 1.0.0-dev (44a287e6e 2015-01-08). If I update the code on GitHub, I will also update the version number printed by the included build script.

I'd be curious to hear if anyone can make my program smaller!

C is not really "bare metal", but that's another story. ↩
From a pure language perspective. If you want to talk about availability of compilers and libraries, then Rust still has a bit of a disadvantage ;) ↩
In fact, this code grew out of an earlier experiment with really small binaries in C. ↩