Sunday, May 13, 2012

Automatic binary hardening with Autoconf

How do you protect a C program against memory corruption exploits? We should try to write code with no bugs, but we also need protection against any bugs which may lurk. Put another way, I try not to crash my bike but I still wear a helmet.

Operating systems now support a variety of tricks to make life difficult for would-be attackers. But most of these hardening features need to be enabled at compile time. When I started contributing to Mosh, I made it a goal to build with full hardening on every platform, not just proactive distributions like Ubuntu. This means detecting available hardening features at compile time.

Mosh uses Autotools, so this code is naturally part of the Autoconf script. I know that Autotools has a bad reputation in some circles, and I'm not going to defend it here. But a huge number of existing projects use Autotools. They can benefit today from a drop-in hardening recipe.

I've published an example project which uses Autotools to detect and enable some binary hardening features. To the extent possible under law, I waive all copyright and related or neighboring rights to the code I wrote for this project. (There are some third-party files in the m4/ subdirectory; those are governed by the respective licenses which appear in each file.) I want this code to be widely useful, and I welcome any refinements you have.

This article explains how my auto-detection code works, with some detail about the hardening measures themselves. If you just want to add hardening to your project, you don't necessarily need to read the whole thing. At the end I talk a bit about the performance implications.

How it works

The basic idea is simple. We use AX_CHECK_{COMPILE,LINK}_FLAG from the Autoconf Archive to detect support for each feature. The syntax is

AX_CHECK_COMPILE_FLAG(flag, action-if-supported, action-if-unsupported, extra-flags)

For extra-flags we generally pass -Werror so the compiler will fail on unrecognized flags. Since the project contains both C and C++ code, we check each flag once for the C compiler and once for the C++ compiler. Also, some flags depend on others, or have multiple alternative forms. This is reflected in the nesting structure of the action-if-supported and action-if-unsupported blocks. You can see the full story in configure.ac.

We accumulate all the supported flags into HARDEN_{C,LD}FLAGS and substitute these into each Makefile.am. The hardening flags take effect even if the user overrides CFLAGS on the command line. To explicitly disable hardening, pass

./configure --disable-hardening

A useful command when testing is

grep HARDEN config.log

Complications

Clang will not error out on unrecognized flags, even with -Werror. Instead it prints a message like

clang: warning: argument unused during compilation: '-foo'

and continues on blithely. I don't want these warnings to appear during the actual build, so I hacked around Clang's behavior. The script wrap-compiler-for-flag-check runs a command and errors out if the command prints a line containing "warning: argument unused". Then configure temporarily sets

CC="$srcdir/scripts/wrap-compiler-for-flag-check $CC"

while performing the flag checks.

When I integrated hardening into Mosh, I discovered that Ubuntu's default hardening flags conflict with ours. For example we set -Wstack-protector, meaning "warn about any unprotected functions", and they set --param=ssp-buffer-size=4, meaning "don't protect functions with fewer than 4 bytes of buffers". Our stack-protector flags are strictly more aggressive, so I disabled Ubuntu's by adding these lines to debian/rules:

export DEB_BUILD_MAINT_OPTIONS = hardening=-stackprotector
-include /usr/share/dpkg/buildflags.mk

We did something similar for Fedora.

Yet another problem is that Debian distributes skalibs (a Mosh dependency) as a static-only library, built without -fPIC, which in turn prevents Mosh from using -fPIC. Mosh can build the relevant parts of skalibs internally, but Debian and Ubuntu don't want us doing that. The unfortunate solution is simply to reimplement the small amount of skalibs we were using on Linux.

The flags

Here are the specific protections I enabled.

-D_FORTIFY_SOURCE=2 enables some compile-time and run-time checks on memory and string manipulation. This requires -O1 or higher. See also man 7 feature_test_macros.
-fno-strict-overflow prevents GCC from optimizing away arithmetic overflow tests.
-fstack-protector-all detects stack buffer overflows after they occur, using a stack canary. We also set -Wstack-protector (warn about unprotected functions) and --param ssp-buffer-size=1 (protect regardless of buffer size). (Actually, the "-all" part of -fstack-protector-all might imply ssp-buffer-size=1.)
Attackers can use fragments of legitimate code already in memory to stitch together exploits. This is much harder if they don't know where any of that code is located. Shared libraries get random addresses by default, but your program doesn't. Even an exploit against a shared library can take advantage of that. So we build a position independent executable (PIE), with the goal that every executable page has a randomized address.
Exploits can't overwrite read-only memory. Some areas could be marked as read-only except that the dynamic loader needs to perform relocations there. The GNU linker flag -z relro arranges to set them as read-only once the dynamic loader is done with them.

In particular, this can protect the PLT and GOT, which are classic targets for memory corruption. But PLT entries normally get resolved on demand, which means they're writable as the program runs. We set -z now to resolve PLT entries at startup so they get RELRO protection.

In the example project I also enabled -Wall -Wextra -Werror. These aren't hardening flags and we don't need to detect support, but they're quite important for catching security problems. If you can't make your project -Wall-clean, you can at least add security-relevant checks such as -Wformat-security.

Demonstration

On x86 Linux, we can check the hardening features using Tobias Klein's checksec.sh. First, as a control, let's build with no hardening.

$ ./build.sh --disable-hardening
+ autoreconf -fi
+ ./configure --disable-hardening
...
+ make
...
$ ~/checksec.sh --file src/test
No RELRO       No canary found  NX enabled  No PIE

The no-execute bit (NX) is mainly a kernel and CPU feature. It does not require much compiler support, and is enabled by default these days. Now we'll try full hardening:

$ ./build.sh
+ autoreconf -fi
+ ./configure
...
checking whether C compiler accepts -fno-strict-overflow... yes
checking whether C++ compiler accepts -fno-strict-overflow... yes
checking whether C compiler accepts -D_FORTIFY_SOURCE=2... yes
checking whether C++ compiler accepts -D_FORTIFY_SOURCE=2... yes
checking whether C compiler accepts -fstack-protector-all... yes
checking whether C++ compiler accepts -fstack-protector-all... yes
checking whether the linker accepts -fstack-protector-all... yes
checking whether C compiler accepts -Wstack-protector... yes
checking whether C++ compiler accepts -Wstack-protector... yes
checking whether C compiler accepts --param ssp-buffer-size=1... yes
checking whether C++ compiler accepts --param ssp-buffer-size=1... yes
checking whether C compiler accepts -fPIE... yes
checking whether C++ compiler accepts -fPIE... yes
checking whether the linker accepts -fPIE -pie... yes
checking whether the linker accepts -Wl,-z,relro... yes
checking whether the linker accepts -Wl,-z,now... yes
...
+ make
...
$ ~/checksec.sh --file src/test
Full RELRO     Canary found     NX enabled  PIE enabled

We can dig deeper on some of these. objdump -d shows that the unhardened executable puts main at a fixed address, say 0x4006e0, while the position-independent executable specifies a small offset like 0x9e0. We can also see the stack-canary checks:

b80:  sub   $0x18,%rsp
b84:  mov   %fs:0x28,%rax
b8d:  mov   %rax,0x8(%rsp)

      ... function body ...

b94:  mov   0x8(%rsp),%rax
b99:  xor   %fs:0x28,%rax
ba2:  jne   bb4 <c_fun+0x34>

      ... normal epilogue ...

bb4:  callq 9c0 <__stack_chk_fail@plt>

The function starts by copying a "canary" value from %fs:0x28 to the stack. On return, that value had better still be there; otherwise, an attacker has clobbered our stack frame.

The canary is chosen randomly by glibc at program start. The %fs segment has a random offset in linear memory, which makes it hard for an attacker to discover the canary through an information leak. This also puts it within thread-local storage, so glibc could use a different canary value for each thread (but I'm not sure if it does).

The hardening flags adapt to any other compiler options we specify. For example, let's try a static build:

$ ./build.sh LDFLAGS=-static
+ autoreconf -fi
+ ./configure LDFLAGS=-static
...
checking whether C compiler accepts -fPIE... yes
checking whether C++ compiler accepts -fPIE... yes
checking whether the linker accepts -fPIE -pie... no
...
+ make
...
$ file src/test
src/test: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux),
statically linked, for GNU/Linux 2.6.26, not stripped
$ ~/checksec.sh --file src/test
Partial RELRO  Canary found     NX enabled  No PIE

We can't have position independence with static linking. And checksec.sh thinks we aren't RELRO-protecting the PLT — but that's because we don't have one.

Performance

So what's the catch? These protections can slow down your program significantly. I ran a few benchmarks for Mosh, on three test machines:

A wimpy netbook: 1.6 GHz Atom N270, Ubuntu 12.04 i386
A reasonable laptop: 2.1 GHz Core 2 Duo T8100, Debian sid amd64
A beefy desktop: 3.0 GHz Phenom II X6 1075T, Debian sid amd64

In all three cases I built Mosh using GCC 4.6.3. Here's the relative slowdown, in percent.

Protections	Netbook	Laptop	Desktop
Everything	16.0	4.4	2.1
All except PIE	4.7	3.3	2.2
All except stack protector	11.0	1.0	1.1

PIE really hurts on i386 because data references use an extra register, and registers are scarce to begin with. It's much cheaper on amd64 thanks to PC-relative addressing.

There are other variables, of course. One Debian stable system with GCC 4.4 saw a 30% slowdown, with most of it coming from the stack protector. So this deserves further scrutiny, if your project is performance-critical. Mosh doesn't use very much CPU anyway, so I decided security is the dominant priority.