Showing posts with label systems. Show all posts
Showing posts with label systems. Show all posts

Wednesday, October 29, 2014

A taste of Rust (yum) for C/C++ programmers

If, like me, you've been frustrated with the status quo in systems languages, this article will give you a taste of why Rust is so exciting. In a tiny amount of code, it shows a lot of ways that Rust really kicks ass compared to C and C++. It's not just safe and fast, it's a lot more convenient.

Web browsers do string interning to condense the strings that make up the Web, such as tag and attribute names, into small values that can be compared quickly. I recently added event logging support to Servo's string interner. This will allow us to record traces from real websites, which we can use to guide further optimizations.

Here are the events we can log:

#[deriving(Show)]
pub enum Event {
    Intern(u64),
    Insert(u64, String),
    Remove(u64),
}

Interned strings have a 64-bit ID, which is recorded in every event. The String we store for "insert" events is like C++'s std::string; it points to a buffer in the heap, and it owns that buffer.

This enum is a bit fancier than a C enum, but its representation in memory is no more complex than a C struct. There's a tag for the three alternatives, a 64-bit ID, and a few fields that make up the String. When we pass or return an Event by value, it's at worst a memcpy of a few dozen bytes. There's no implicit heap allocation, garbage collection, or anything like that. We didn't define a way to copy an event; this means the String buffer always has a unique owner who is responsible for freeing it.

The deriving(Show) attribute tells the compiler to auto-generate a text representation, so we can print an Event just as easily as a built-in type.

Next we declare a global vector of events, protected by a mutex:

lazy_static! {
    pub static ref LOG: Mutex<Vec<Event>>
        = Mutex::new(Vec::with_capacity(50_000));
}

lazy_static! will initialize both of them when LOG is first used. Like String, the Vec is a growable buffer. We won't turn on event logging in release builds, so it's fine to pre-allocate space for 50,000 events. (You can put underscores anywhere in a integer literal to improve readability.)

lazy_static!, Mutex, and Vec are all implemented in Rust using gnarly low-level code. But the amazing thing is that all three expose a safe interface. It's simply not possible to use the variable before it's initialized, or to read the value the Mutex protects without locking it, or to modify the vector while iterating over it.

The worst you can do is deadlock. And Rust considers that pretty bad, still, which is why it discourages global state. But it's clearly what we need here. Rust takes a pragmatic approach to safety. You can always write the unsafe keyword and then use the same pointer tricks you'd use in C. But you don't need to be quite so guarded when writing the other 95% of your code. I want a language that assumes I'm brilliant but distracted :)

Rust catches these mistakes at compile time, and produces the same code you'd see with equivalent constructs in C++. For a more in-depth comparison, see Ruud van Asseldonk's excellent series of articles about porting a spectral path tracer from C++ to Rust. The Rust code performs basically the same as Clang / GCC / MSVC on the same platform. Not surprising, because Rust uses LLVM and benefits from the same backend optimizations as Clang.

lazy_static! is not a built-in language feature; it's a macro provided by a third-party library. Since the library uses Cargo, I can include it in my project by adding

[dependencies.lazy_static]
git = "https://github.com/Kimundi/lazy-static.rs"

to Cargo.toml and then adding

#[phase(plugin)]
extern crate lazy_static;

to src/lib.rs. Cargo will automatically fetch and build all dependencies. Code reuse becomes no harder than in your favorite scripting language.

Finally, we define a function that pushes a new event onto the vector:

pub fn log(e: Event) {
    LOG.lock().push(e);
}

LOG.lock() produces an RAII handle that will automatically unlock the mutex when it falls out of scope. In C++ I always hesitate to use temporaries like this because if they're destroyed too soon, my program will segfault or worse. Rust has compile-time lifetime checking, so I can do things that would be reckless in C++.

If you scroll up you'll see a lot of prose and not a lot of code. That's because I got a huge amount of functionality for free. Here's the logging module again:

#[deriving(Show)]
pub enum Event {
    Intern(u64),
    Insert(u64, String),
    Remove(u64),
}

lazy_static! {
    pub static ref LOG: Mutex<Vec<Event>>
        = Mutex::new(Vec::with_capacity(50_000));
}

pub fn log(e: Event) {
    LOG.lock().push(e);
}

This goes in src/event.rs and we include it from src/lib.rs.

#[cfg(feature = "log-events")]
pub mod event;

The cfg attribute is how Rust does conditional compilation. Another project can specify

[dependencies.string_cache]
git = "https://github.com/servo/string-cache"
features = ["log-events"]

and add code to dump the log:

for e in string_cache::event::LOG.lock().iter() {
    println!("{}", e);
}

Any project which doesn't opt in to log-events will see zero impact from any of this.

If you'd like to learn Rust, the Guide is a good place to start. We're getting close to 1.0 and the important concepts have been stable for a while, but the details of syntax and libraries are still in flux. It's not too early to learn, but it might be too early to maintain a large library.

By the way, here are the events generated by interning the three strings foobarbaz foo blockquote:

Insert(0x7f1daa023090, foobarbaz)
Intern(0x7f1daa023090)
Intern(0x6f6f6631)
Intern(0xb00000002)

There are three different kinds of IDs, indicated by the least significant bits. The first is a pointer into a standard interning table, which is protected by a mutex. The other two are created without synchronization, which improves parallelism between parser threads.

In UTF-8, the string foo is smaller than a 64-bit pointer, so we store the characters directly. blockquote is too big for that, but it corresponds to a well-known HTML tag. 0xb is the index of blockquote in a static list of strings that are common on the Web. Static atoms can also be used in pattern matching, and LLVM's optimizations for C's switch statements will apply.

Wednesday, September 17, 2014

Raw system calls for Rust

I wrote a small library for making raw system calls from Rust programs. It provides a macro that expands into in-line system call instructions, with no run-time dependencies at all. Here's an example:

#![feature(phase)]

#[phase(plugin, link)]
extern crate syscall;

fn write(fd: uint, buf: &[u8]) {
    unsafe {
        syscall!(WRITE, fd, buf.as_ptr(), buf.len());
    }
}

fn main() {
    write(1, "Hello, world!\n".as_bytes());
}

Right now it only supports x86-64 Linux, but I'd love to add other platforms. Pull requests are much appreciated. :)

Saturday, January 28, 2012

Writing kernel exploits

Yesterday I gave a talk about writing kernel exploits. I've posted the slides [PDF]. Here is the original description:

Did you know that a NULL pointer can compromise your entire system? Do you know how UNIX pipes, multithreading, and an obscure network protocol from 1981 are combined to take over Linux machines today? OS kernels are full of strange and interesting vulnerabilities, thanks to the subtle nature of systems code. And the kernel's ultimate authority is the ultimate prize for an attacker.

In this talk you will learn how kernel exploits work, with detailed code examples. Compared to userspace, exploiting the kernel requires a whole different bag of tricks, and we'll cover some of the most important ones. We will focus on Linux systems and x86 hardware, though most ideas will generalize. We'll start with a few toy examples, then look at some real, high-profile Linux exploits from the past two years.

You will also see how to protect your own Linux machines against kernel exploits. We'll talk about the continual cat-and-mouse game between system administrators and those who would attack even hardened kernels.

Thanks again to SIPB for giving me a venue to talk about whatever I find interesting.