Saturday, March 19, 2011

x86 disassembly in Haskell with hdis86

The first serious projects I coded in Haskell were compilers, code analyzers, and the like. I think it's a domain that really plays to the strengths of the language. So it makes sense that we'd want a top-notch disassembler library for Haskell.

udis86 is a fast, complete, flexible disassembler for x86 and x86-64 / AMD64. It provides a clean C API, which was no trouble to import using hsc2hs. But any C API is going to feel clunky next to high-level idiomatic Haskell code. So I built two additional layers of wrapping, to make something that will feel like a natural part of your next Haskell app.

You can download my library hdis86 from Hackage, or browse the source at GitHub. By default, a copy of udis86 is embedded in hdis86. If you already have udis86 installed as a shared library, you can link against that instead.

Getting started

Let's try it out. We'll feed in a ByteString of machine code, and pretty-print the result with groom.

$ ghci
λ> :m + Hdis86 Data.ByteString Text.Groom
λ> let code = pack [0xcc, 0xf0, 0xff, 0x44, 0x9e, 0x0f]
λ> Prelude.putStrLn . groom $ disassemble intel64 code
[Inst [] Iint3 [],
 Inst [Lock] Iinc
   [Mem
      (Memory{mSize = Bits32, mBase = Reg64 RSI, mIndex = Reg64 RBX,
              mScale = 4, mOffset = Immediate{iSize = Bits8, iValue = 15}})]]

If you're not familiar with x86, you might be surprised by the range of simple and complicated instructions. The first instruction is a humble int3, which executes a trap to a debugger. It takes no operands and has no prefixes.

The second instruction is an increment of a 32-bit memory region, whose address is computed by summing the value in register RSI, the value in register RBX times a scale factor of 4, and a fixed offset of 15. This instruction also has a lock prefix, which changes its concurrency semantics. Some prefixes have a direct meaning like this; others (like Rex) will change how the disassembler decodes operands.

We can also ask for Intel-style assembly syntax:

λ> let cfg = intel64 { cfgSyntax = SyntaxIntel }
λ> mapM_ (Prelude.putStrLn . mdAssembly) $ disassembleMetadata cfg code
int3 
lock inc dword [rsi+rbx*4+0xf]

Analyzing instructions

If we're just printing code, showing an Instruction is needlessly verbose. The real point of the Instruction type is machine-code analysis, with the help of Haskell's pattern matching capabilities.

Sometimes two fragments of code will differ only in which registers they use. Perhaps two compilers made different choices during register allocation. We'll write a program to detect this.

import Hdis86
import Text.Groom
import Control.Monad
import qualified Data.Map as M
import qualified Data.ByteString as B

When one code fragment uses register X the same way that another fragment uses register Y, we record a constraint X :-> Y:

data Constraint = Register :-> Register

We compare the two code fragments to produce a list of Constraints, or Nothing if they differ in ways beyond register selection. We'll use this helper function to check a list of Boolean conditions:

(==>) :: [Bool] -> a -> Maybe a
ps ==> x = guard (and ps) >> Just x

We compare operands pairwise. Register operands are easy:

operand :: Operand -> Operand -> Maybe [Constraint]
operand (Reg rx) (Reg ry) = Just [rx :-> ry]

For a memory operand, we need to check that the size, scale, and offset are equal:

-- size, base register, index register, scale, offset
operand (Mem (Memory sx bx ix kx ox)) (Mem (Memory sy by iy ky oy))
  = [sx == sy, kx == ky, ox == oy] ==> [bx :-> by, ix :-> iy]

Immediate operands need a similar check for equality, but they don't produce any register constraints:

operand (Ptr   px) (Ptr   py) = [px == py] ==> []
operand (Imm   ix) (Imm   iy) = [ix == ix] ==> []
operand (Jump  ix) (Jump  iy) = [ix == iy] ==> []
operand (Const ix) (Const iy) = [ix == iy] ==> []

In all other cases, we have two different operand constructors, which means they don't match:

operand _ _ = Nothing

To check a pair of instructions, we check their prefixes and opcodes, then check operands pairwise:

inst :: Instruction -> Instruction -> Maybe [Constraint]
inst (Inst px ox rx) (Inst py oy ry) = do
  guard $ and [px == py, ox == oy, length rx == length ry]
  concat `fmap` zipWithM operand rx ry

Once we have a list of Constraints, we need to check it for consistency. If register X maps to Y in one place, it can't map to Z somewhere else:

unify :: [Constraint] -> Maybe (M.Map Register Register)
unify = foldM f M.empty where
  f m (rx :-> ry) = case M.lookup rx m of
    Nothing  -> Just (M.insert rx ry m)
    Just ry' -> [ry == ry'] ==> m

Now we put it all together, checking consistency in both directions:

regMap :: [Instruction] -> [Instruction] -> Maybe (M.Map Register Register)
regMap xs ys = do
  cs <- concat `fmap` zipWithM inst xs ys
  let swap (x :-> y) = (y :-> x)
  _ <- unify $ map swap cs
  unify cs

main :: IO ()
main = putStrLn . groom $ regMap (f prog_a) (f prog_b) where
  f = disassemble intel64

We'll test these two code fragments:

prog_a, prog_b :: B.ByteString
prog_a = B.pack
  [ 0x7e, 0x3a               -- jle  0x3c
  , 0x48, 0x89, 0xf5         -- mov  rbp, rsi
  , 0xbb, 1, 0, 0, 0         -- mov  ebx, 0x1
  , 0x48, 0x8b, 0x7d, 0x08 ] -- mov  rdi, [rbp+0x8]
prog_b = B.pack
  [ 0x7e, 0x3a               -- jle  0x3c
  , 0x48, 0x89, 0xf3         -- mov  rbx, rsi
  , 0xbd, 1, 0, 0, 0         -- mov  ebp, 0x1
  , 0x48, 0x8b, 0x7b, 0x08 ] -- mov  rdi, [rbx+0x8]

And the result is:

$ runhaskell regmap.hs 
Just
  (fromList
     [(RegNone, RegNone), (Reg32 RBX, Reg32 RBP),
      (Reg64 RBP, Reg64 RBX), (Reg64 RSI, Reg64 RSI),
      (Reg64 RDI, Reg64 RDI)])

In reality, this analysis is grossly incomplete. Some instructions have implicit register operands, and some pairs of registers overlap, like EBX and RBX. And we have to understand contextual requirements. It's not okay to remap RAX to RBX if some calling code is expecting a return value in RAX.

The complexity of x86 (not to mention the halting problem) means that binary code analysis will never be easy. Hopefully hdis86 can be one of many useful tools in this domain. As always, suggestions or patches are welcome.

19 comments:

AnonymousMarch 19, 2011 at 8:01 PM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
AlexMarch 20, 2011 at 4:54 AM
The 'elf' package should parse ELF files. With it, hdis86, code gens like Harpy, and LLVM bindings Haskell can turn into quite a nice assembly playpen!
ReplyDelete
Replies
SamiMarch 20, 2011 at 5:22 AM
You might also want to take a look at LLVM libraries. It's been some 6 months since I last looked at it, but it has a disassembler library, in theory machine-independent. I think x86 and x86_64 work quite well, but the next best working target, ARM, was rather incomplete back then. There's apparently also some extended disassembler code there - not sure if the API of that is fixed yet - which allows you to do some pretty advanced stuff at least within basic blocks, I think like querying something like what's the expression for register EBX at the end of the basic block, and getting corresponding LLVM intermediate representation for the code.
ReplyDelete
Replies
Arwaa AzizahApril 19, 2019 at 6:30 PM
Given article is very helpful and very useful for my admin, and pardon me permission to share articles here hopefully helped :

Obat Stenosis Spinal
Cara Menghilangkan Benjolan Di Belakang Telinga
Obat eksim basah paling ampuh
Obat tradisional gabagen pada bayi
ReplyDelete
Replies
ZonahobisayaMarch 22, 2022 at 1:57 AM
Gutt Websäit : Zonahobisaya
Gutt Websäit : Biografi
Gutt Websäit : Zonahobisaya
Gutt Websäit : Zonahobisaya
Gutt Websäit : Zonahobisaya
Gutt Websäit : Resep
Gutt Websäit : Logo
Gutt Websäit : Zonahobisaya
ReplyDelete
Replies
Siora Surgical Pvt. Ltd.June 6, 2022 at 12:03 AM
The first severe projects I coded in Haskell had been compilers, code analyzers, and the like. I suppose it's a domain that without a doubt performs to the strengths of the language. So it makes experience that we would need a pinnacle-notch disassembler library for Haskell. Utegration is the various pinnacle corporations using sap consulting firms in USA with the exclusive characteristic of supplying SAP® solutions in all regions of utility operations. Moreover, they combine consumer experience and billing, managed services, analytics, financials, and assets management in the consumer’s business. The agency can also provide an SAP® enterprise consultant to offer assistance in terms of ERP implementation considering commercial enterprise goals and goals.
ReplyDelete
Replies
AnonymousOctober 22, 2022 at 3:56 AM
총판출장샵
총판출장샵
고고출장샵
심심출장샵
서울출장샵
서울출장샵
홍천출장샵
서울출장샵

ReplyDelete
Replies
AnonymousDecember 3, 2022 at 10:10 PM
당진출장샵
총판출장샵
가평출장샵
수원출장샵
강원도출장샵
청주출장샵
충북출장샵
김포출장샵

ReplyDelete
Replies
pgslot มาแรง ไม่แพ้ ลิเวอร์พูลMarch 4, 2023 at 6:53 AM
pgslot มาแรง ไม่แพ้ ลิเวอร์พูล เพราะเว็บของเรานั้นก็กำลัง ขับเคลื่อนเปรียบเสมือนเครื่องจักรที่กำลังมาแรงที่สุดเพราะเว็บ pg-slot.game ของเรานั้นกำลังนำเสนอเกมให้ทุกท่านที่เข้ามาได้รู้จัก
ReplyDelete
Replies
조조출장샵 고조출장샵February 24, 2024 at 6:05 AM
단밤콜걸
콜걸
서울콜걸
부산콜걸
인천콜걸
광주콜걸
세종콜걸
울산콜걸
ReplyDelete
Replies
Mahmoud AminOctober 23, 2024 at 5:18 AM
ما هو أفضل موقع توصيات الأسهم السعودية
ReplyDelete
Replies
toto-siteDecember 13, 2024 at 2:39 AM
Situs web sing apik kanthi konten sing menarik, iki sing dakkarepake. Thanks kanggo njaga situs web iki, aku bakal ngunjungi.

ReplyDelete
Replies
bsc.news-카지노사이트January 17, 2025 at 6:29 PM

INTERESTING!! you should upload more☺️ this is really good!!!
ReplyDelete
Replies
outlookindiacom-안전놀이터January 17, 2025 at 6:30 PM
Everyone loves it when people come together
and share opinions.
ReplyDelete
Replies
AnonymousMarch 29, 2025 at 2:31 PM
Amazing article! You have a talent for explaining complex ideas in a way that’s easy to understand. Thank you for sharing.
ReplyDelete
Replies
CarolynmaryNovember 12, 2025 at 2:30 AM

Read information now. Read information now
ReplyDelete
Replies
카지노사이트 모음January 11, 2026 at 2:55 AM

Your webpage is fantastic and this is a fantastic inspiring submit

ReplyDelete
Replies
토토사이트 순위January 11, 2026 at 2:56 AM

You did cheerfully empower me because of the ideas you have posted.
ReplyDelete
Replies
토토사이트 순위January 11, 2026 at 2:57 AM
Remarkable posting! A lot of useful details and idea

ReplyDelete
Replies

Add comment

main is usually a function

Saturday, March 19, 2011

x86 disassembly in Haskell with hdis86

Getting started

Analyzing instructions

19 comments:

About Me

Followers