Saturday, March 19, 2011

x86 disassembly in Haskell with hdis86

The first serious projects I coded in Haskell were compilers, code analyzers, and the like. I think it's a domain that really plays to the strengths of the language. So it makes sense that we'd want a top-notch disassembler library for Haskell.

udis86 is a fast, complete, flexible disassembler for x86 and x86-64 / AMD64. It provides a clean C API, which was no trouble to import using hsc2hs. But any C API is going to feel clunky next to high-level idiomatic Haskell code. So I built two additional layers of wrapping, to make something that will feel like a natural part of your next Haskell app.

You can download my library hdis86 from Hackage, or browse the source at GitHub. By default, a copy of udis86 is embedded in hdis86. If you already have udis86 installed as a shared library, you can link against that instead.

Getting started

Let's try it out. We'll feed in a ByteString of machine code, and pretty-print the result with groom.

$ ghci
λ> :m + Hdis86 Data.ByteString Text.Groom
λ> let code = pack [0xcc, 0xf0, 0xff, 0x44, 0x9e, 0x0f]
λ> Prelude.putStrLn . groom $ disassemble intel64 code
[Inst [] Iint3 [],
 Inst [Lock] Iinc
      (Memory{mSize = Bits32, mBase = Reg64 RSI, mIndex = Reg64 RBX,
              mScale = 4, mOffset = Immediate{iSize = Bits8, iValue = 15}})]]

If you're not familiar with x86, you might be surprised by the range of simple and complicated instructions. The first instruction is a humble int3, which executes a trap to a debugger. It takes no operands and has no prefixes.

The second instruction is an increment of a 32-bit memory region, whose address is computed by summing the value in register RSI, the value in register RBX times a scale factor of 4, and a fixed offset of 15. This instruction also has a lock prefix, which changes its concurrency semantics. Some prefixes have a direct meaning like this; others (like Rex) will change how the disassembler decodes operands.

We can also ask for Intel-style assembly syntax:

λ> let cfg = intel64 { cfgSyntax = SyntaxIntel }
λ> mapM_ (Prelude.putStrLn . mdAssembly) $ disassembleMetadata cfg code
lock inc dword [rsi+rbx*4+0xf]

Analyzing instructions

If we're just printing code, showing an Instruction is needlessly verbose. The real point of the Instruction type is machine-code analysis, with the help of Haskell's pattern matching capabilities.

Sometimes two fragments of code will differ only in which registers they use. Perhaps two compilers made different choices during register allocation. We'll write a program to detect this.

import Hdis86
import Text.Groom
import Control.Monad
import qualified Data.Map as M
import qualified Data.ByteString as B

When one code fragment uses register X the same way that another fragment uses register Y, we record a constraint X :-> Y:

data Constraint = Register :-> Register

We compare the two code fragments to produce a list of Constraints, or Nothing if they differ in ways beyond register selection. We'll use this helper function to check a list of Boolean conditions:

(==>) :: [Bool] -> a -> Maybe a
ps ==> x = guard (and ps) >> Just x

We compare operands pairwise. Register operands are easy:

operand :: Operand -> Operand -> Maybe [Constraint]
operand (Reg rx) (Reg ry) = Just [rx :-> ry]

For a memory operand, we need to check that the size, scale, and offset are equal:

-- size, base register, index register, scale, offset
operand (Mem (Memory sx bx ix kx ox)) (Mem (Memory sy by iy ky oy))
= [sx == sy, kx == ky, ox == oy] ==> [bx :-> by, ix :-> iy]

Immediate operands need a similar check for equality, but they don't produce any register constraints:

operand (Ptr   px) (Ptr   py) = [px == py] ==> []
operand (Imm ix) (Imm iy) = [ix == ix] ==> []
operand (Jump ix) (Jump iy) = [ix == iy] ==> []
operand (Const ix) (Const iy) = [ix == iy] ==> []

In all other cases, we have two different operand constructors, which means they don't match:

operand _ _ = Nothing

To check a pair of instructions, we check their prefixes and opcodes, then check operands pairwise:

inst :: Instruction -> Instruction -> Maybe [Constraint]
inst (Inst px ox rx) (Inst py oy ry) = do
guard $ and [px == py, ox == oy, length rx == length ry]
concat `fmap` zipWithM operand rx ry

Once we have a list of Constraints, we need to check it for consistency. If register X maps to Y in one place, it can't map to Z somewhere else:

unify :: [Constraint] -> Maybe (M.Map Register Register)
unify = foldM f M.empty where
f m (rx :-> ry) = case M.lookup rx m of
Nothing -> Just (M.insert rx ry m)
Just ry' -> [ry == ry'] ==> m

Now we put it all together, checking consistency in both directions:

regMap :: [Instruction] -> [Instruction] -> Maybe (M.Map Register Register)
regMap xs ys = do
cs <- concat `fmap` zipWithM inst xs ys
let swap (x :-> y) = (y :-> x)
_ <- unify $ map swap cs
unify cs

main :: IO ()
main = putStrLn . groom $ regMap (f prog_a) (f prog_b) where
f = disassemble intel64

We'll test these two code fragments:

prog_a, prog_b :: B.ByteString
prog_a = B.pack
[ 0x7e, 0x3a -- jle 0x3c
, 0x48, 0x89, 0xf5 -- mov rbp, rsi
, 0xbb, 1, 0, 0, 0 -- mov ebx, 0x1
, 0x48, 0x8b, 0x7d, 0x08 ] -- mov rdi, [rbp+0x8]
prog_b = B.pack
[ 0x7e, 0x3a -- jle 0x3c
, 0x48, 0x89, 0xf3 -- mov rbx, rsi
, 0xbd, 1, 0, 0, 0 -- mov ebp, 0x1
, 0x48, 0x8b, 0x7b, 0x08 ] -- mov rdi, [rbx+0x8]

And the result is:

$ runhaskell regmap.hs 
     [(RegNone, RegNone), (Reg32 RBX, Reg32 RBP),
      (Reg64 RBP, Reg64 RBX), (Reg64 RSI, Reg64 RSI),
      (Reg64 RDI, Reg64 RDI)])

In reality, this analysis is grossly incomplete. Some instructions have implicit register operands, and some pairs of registers overlap, like EBX and RBX. And we have to understand contextual requirements. It's not okay to remap RAX to RBX if some calling code is expecting a return value in RAX.

The complexity of x86 (not to mention the halting problem) means that binary code analysis will never be easy. Hopefully hdis86 can be one of many useful tools in this domain. As always, suggestions or patches are welcome.


  1. This comment has been removed by a blog administrator.

  2. The 'elf' package should parse ELF files. With it, hdis86, code gens like Harpy, and LLVM bindings Haskell can turn into quite a nice assembly playpen!

  3. You might also want to take a look at LLVM libraries. It's been some 6 months since I last looked at it, but it has a disassembler library, in theory machine-independent. I think x86 and x86_64 work quite well, but the next best working target, ARM, was rather incomplete back then. There's apparently also some extended disassembler code there - not sure if the API of that is fixed yet - which allows you to do some pretty advanced stuff at least within basic blocks, I think like querying something like what's the expression for register EBX at the end of the basic block, and getting corresponding LLVM intermediate representation for the code.

  4. Nice article, thanks for the information. It's very complete information. I will bookmark for next reference
    jaring futsal | jaring golf | jaring pengaman proyek |
    jaring pengaman bangunan | jaring pengaman gedung

  5. Given article is very helpful and very useful for my admin, and pardon me permission to share articles here hopefully helped :

    Obat Stenosis Spinal
    Cara Menghilangkan Benjolan Di Belakang Telinga
    Obat eksim basah paling ampuh
    Obat tradisional gabagen pada bayi

  6. HP Officejet Pro 8610 setup your HP OfficeJet Pro printer and solve all of your printer problems. Download software drivers from 8610.

  7. The students will be able to check their UP Board Result for Class 12 via SMS also. The UP Board 12th Result 2021 would comprise the student's roll UP Intermediate Result 2021 Around 25 lakh students will be able to check their UP Board Result 2021 Class 12 for the UP 12th examination which will be conducted in April and May 2021.

  8. الحشرات الضارة هي الحشرات التي تلحق الضرر بالإنسان وممتلكاتهم ومحاصيلهم ومواشيهم ، وتنتشر في جميع أنحاء العالم ، وقد تسبب الحشرات أضرارًا مباشرة ، أي بامتصاص دماء الكائنات الحية ، أو أكل أنسجتها ، أو بشكل غير مباشر ، عن طريق إزالتها. تنتقل مسببات الأمراض أو الطفيليات إلى الكائنات الحية ، وغالبًا ما يكون الضرر هو البالغين. وأحيانًا يكون الضرر في البالغين ويرقاتهم. قد تعمل هذه الحشرات بمفردها أو قد تهاجم في مجموعات. في أي حال ، يجب مكافحة الآفات والقضاء عليها لتجنب التسبب في ضرر

    شركة رش حشرات

    عندما نتحدث عن شركة لتطهير المنزل ، يجب أن نأخذ في الاعتبار أن الشركة تقوم بمهام متعددة داخل المنزل ، مثل: اعتني بكافة أعمال التنظيف الشامل سواء في شقتك أو في منزل مكون من عدة طوابق ، بما في ذلك جميع الغرف والحمامات والمطابخ والسلالم والجدران والأسقف والأثاث وما إلى ذلك ، ليس هذا فقط ولكن الشركة البولندية محترفة الأثاث والأرضيات والأجهزة والخزائن. ويتم كل هذا بمساعدة معدات محددة وأدوات التنظيف والتطهير لضمان التنظيف الفعال في وقت محدود.

    شركة تعقيم منازل

    نحن شركة مقاولات في الكويت نقوم بجميع الأعمال المتعلقة بالإنشاءات والمباني العامة ونقوم بتحليلها والبحث فيها وإجراء دراسة شاملة وشاملة لأية عيوب قد تحدث مع الأخذ بعين الاعتبار الأخطاء ، لأن كل هذه هي ينعكس في الشركة وسمعتها ، لذلك نقوم ببناء الهياكل الهندسية المختلفة ، وبناء المستشفيات والمدارس والمراكز ، والمولات التجارية ، وجميع الدوائر الحكومية ، والمباني الخاصة والعامة ، بالإضافة إلى التنفيذ الرائع للفيلات والقصور ، فنحن نقدم ما يلي الجوانب خدمة ممتازة في مجالات الدهان ، التجصيص ، الكهرباء ، الفنون الزخرفية ، السباكة ، بلاط السيراميك والديكور المنزلي

    شركة مقاولات الكويت

  9. Gutt Websäit : Zonahobisaya
    Gutt Websäit : Biografi
    Gutt Websäit : Zonahobisaya
    Gutt Websäit : Zonahobisaya
    Gutt Websäit : Zonahobisaya
    Gutt Websäit : Resep
    Gutt Websäit : Logo
    Gutt Websäit : Zonahobisaya