The first serious projects I coded in Haskell were compilers, code analyzers, and the like. I think it's a domain that really plays to the strengths of the language. So it makes sense that we'd want a top-notch disassembler library for Haskell.
udis86 is a fast, complete, flexible disassembler for x86 and x86-64 / AMD64. It provides a clean C API, which was no trouble to import using hsc2hs. But any C API is going to feel clunky next to high-level idiomatic Haskell code. So I built two additional layers of wrapping, to make something that will feel like a natural part of your next Haskell app.
You can download my library hdis86 from Hackage, or browse the source at GitHub. By default, a copy of udis86 is embedded in hdis86. If you already have udis86 installed as a shared library, you can link against that instead.
Getting started
Let's try it out. We'll feed in a ByteString of machine code, and pretty-print the result with groom.
$ ghci
λ> :m + Hdis86 Data.ByteString Text.Groom
λ> let code = pack [0xcc, 0xf0, 0xff, 0x44, 0x9e, 0x0f]
λ> Prelude.putStrLn . groom $ disassemble intel64 code
[Inst [] Iint3 [],
 Inst [Lock] Iinc
   [Mem
      (Memory{mSize = Bits32, mBase = Reg64 RSI, mIndex = Reg64 RBX,
              mScale = 4, mOffset = Immediate{iSize = Bits8, iValue = 15}})]]
If you're not familiar with x86, you might be surprised by the range of simple and complicated instructions. The first instruction is a humble int3, which executes a trap to a debugger. It takes no operands and has no prefixes.
The second instruction is an increment of a 32-bit memory region, whose address is computed by summing the value in register RSI, the value in register RBX times a scale factor of 4, and a fixed offset of 15. This instruction also has a lock prefix, which changes its concurrency semantics. Some prefixes have a direct meaning like this; others (like Rex) will change how the disassembler decodes operands.
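The address arithmetic described above can be sketched in a few lines. This is only an illustration: the MemOperand record and its field names are hypothetical stand-ins, not the hdis86 types, and the sketch ignores address-size wraparound.

```haskell
-- Hypothetical stand-in for the values behind a decoded memory operand;
-- these names are illustrative and are not part of hdis86.
data MemOperand = MemOperand
  { base   :: Integer  -- value held in the base register
  , index  :: Integer  -- value held in the index register
  , scale  :: Integer  -- scale factor applied to the index
  , offset :: Integer  -- fixed displacement
  }

-- Address referenced by the operand: base + index * scale + offset.
effectiveAddress :: MemOperand -> Integer
effectiveAddress m = base m + index m * scale m + offset m

-- With RSI = 0x1000 and RBX = 3, [rsi+rbx*4+0xf] refers to 0x101b.
example :: Integer
example = effectiveAddress (MemOperand 0x1000 3 4 0xf)
```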
We can also ask for Intel-style assembly syntax:
λ> let cfg = intel64 { cfgSyntax = SyntaxIntel }
λ> mapM_ (Prelude.putStrLn . mdAssembly) $ disassembleMetadata cfg code
int3
lock inc dword [rsi+rbx*4+0xf]
Analyzing instructions
If we're just printing code, showing an Instruction is needlessly verbose. The real point of the Instruction type is machine-code analysis, with the help of Haskell's pattern matching capabilities.
Sometimes two fragments of code will differ only in which registers they use. Perhaps two compilers made different choices during register allocation. We'll write a program to detect this.
import Hdis86
import Text.Groom
import Control.Monad
import qualified Data.Map as M
import qualified Data.ByteString as B
When one code fragment uses register X the same way that another fragment uses register Y, we record a constraint X :-> Y:
data Constraint = Register :-> Register
We compare the two code fragments to produce a list of Constraints, or Nothing if they differ in ways beyond register selection. We'll use this helper function to check a list of Boolean conditions:
(==>) :: [Bool] -> a -> Maybe a
ps ==> x = guard (and ps) >> Just x
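As a quick illustration of how the helper behaves, here it is applied to a few condition lists. The names allHold, oneFails, and noConditions are made up for this sketch.

```haskell
import Control.Monad (guard)

(==>) :: [Bool] -> a -> Maybe a
ps ==> x = guard (and ps) >> Just x

allHold, oneFails, noConditions :: Maybe String
allHold      = [1 + 1 == (2 :: Int), True] ==> "match"  -- Just "match"
oneFails     = [True, False] ==> "match"                -- Nothing
noConditions = [] ==> "match"                           -- Just "match"
```

The empty list succeeds because and [] is True, which is what lets the same helper express "nothing to check, no constraints".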
We compare operands pairwise. Register operands are easy:
operand :: Operand -> Operand -> Maybe [Constraint]
operand (Reg rx) (Reg ry) = Just [rx :-> ry]
For a memory operand, we need to check that the size, scale, and offset are equal:
-- size, base register, index register, scale, offset
operand (Mem (Memory sx bx ix kx ox)) (Mem (Memory sy by iy ky oy))
  = [sx == sy, kx == ky, ox == oy] ==> [bx :-> by, ix :-> iy]
Immediate operands need a similar check for equality, but they don't produce any register constraints:
operand (Ptr px) (Ptr py) = [px == py] ==> []
operand (Imm ix) (Imm iy) = [ix == iy] ==> []
operand (Jump ix) (Jump iy) = [ix == iy] ==> []
operand (Const ix) (Const iy) = [ix == iy] ==> []
In all other cases, we have two different operand constructors, which means they don't match:
operand _ _ = Nothing
To check a pair of instructions, we check their prefixes and opcodes, then check operands pairwise:
inst :: Instruction -> Instruction -> Maybe [Constraint]
inst (Inst px ox rx) (Inst py oy ry) = do
  guard $ and [px == py, ox == oy, length rx == length ry]
  concat `fmap` zipWithM operand rx ry
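The explicit length check matters because zipWithM, like zipWith, stops at the end of the shorter list. A small sketch of that behaviour in the Maybe monad, using a hypothetical comparison function same in place of operand:

```haskell
import Control.Monad (zipWithM)

-- Hypothetical element comparison: succeed only when the two values are equal.
same :: Int -> Int -> Maybe [Int]
same x y = if x == y then Just [x] else Nothing

equalLengths, extraIgnored, mismatch :: Maybe [Int]
equalLengths = concat `fmap` zipWithM same [1, 2] [1, 2]     -- Just [1,2]
extraIgnored = concat `fmap` zipWithM same [1, 2] [1, 2, 3]  -- Just [1,2]
mismatch     = concat `fmap` zipWithM same [1, 2] [1, 5]     -- Nothing
```

Without a length comparison, the second case would be reported as a match even though one list has an extra element.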
Once we have a list of Constraints, we need to check it for consistency. If register X maps to Y in one place, it can't map to Z somewhere else:
unify :: [Constraint] -> Maybe (M.Map Register Register)
unify = foldM f M.empty where
  f m (rx :-> ry) = case M.lookup rx m of
    Nothing  -> Just (M.insert rx ry m)
    Just ry' -> [ry == ry'] ==> m
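To see the fold at work without the library, here is a self-contained sketch of the same idea. Reg is a hypothetical three-register stand-in for the real Register type, and step spells out the lookup with guards instead of the ==> helper.

```haskell
import Control.Monad (foldM)
import qualified Data.Map as M

-- Hypothetical stand-in for the library's Register type.
data Reg = RAX | RBX | RCX
  deriving (Eq, Ord, Show)

data Constraint = Reg :-> Reg

-- Fold the constraints into a map, failing on the first contradiction.
unify :: [Constraint] -> Maybe (M.Map Reg Reg)
unify = foldM step M.empty
  where
    step m (rx :-> ry) = case M.lookup rx m of
      Nothing              -> Just (M.insert rx ry m)
      Just ry' | ry == ry' -> Just m
               | otherwise -> Nothing

consistent, conflicting, notInjective :: Maybe (M.Map Reg Reg)
consistent   = unify [RAX :-> RBX, RCX :-> RCX, RAX :-> RBX]  -- a two-entry map
conflicting  = unify [RAX :-> RBX, RAX :-> RCX]               -- Nothing
notInjective = unify [RAX :-> RCX, RBX :-> RCX]               -- still succeeds
```

The last case shows that a single pass accepts two different registers mapping to the same target, so the constraints also need to be checked with the arrows reversed.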
Now we put it all together, checking consistency in both directions:
regMap :: [Instruction] -> [Instruction] -> Maybe (M.Map Register Register)
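regMap xs ys = do
  -- zipWithM stops at the shorter list, so reject fragments of unequal length
  guard (length xs == length ys)
  cs <- concat `fmap` zipWithM inst xs ys
  let swap (x :-> y) = (y :-> x)
  _ <- unify $ map swap cs
  unify cs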
main :: IO ()
main = putStrLn . groom $ regMap (f prog_a) (f prog_b) where
  f = disassemble intel64
We'll test these two code fragments:
prog_a, prog_b :: B.ByteString
prog_a = B.pack
  [ 0x7e, 0x3a                -- jle 0x3c
  , 0x48, 0x89, 0xf5          -- mov rbp, rsi
  , 0xbb, 1, 0, 0, 0          -- mov ebx, 0x1
  , 0x48, 0x8b, 0x7d, 0x08 ]  -- mov rdi, [rbp+0x8]
prog_b = B.pack
  [ 0x7e, 0x3a                -- jle 0x3c
  , 0x48, 0x89, 0xf3          -- mov rbx, rsi
  , 0xbd, 1, 0, 0, 0          -- mov ebp, 0x1
  , 0x48, 0x8b, 0x7b, 0x08 ]  -- mov rdi, [rbx+0x8]
And the result is:
$ runhaskell regmap.hs
Just
  (fromList
     [(RegNone, RegNone), (Reg32 RBX, Reg32 RBP),
      (Reg64 RBP, Reg64 RBX), (Reg64 RSI, Reg64 RSI),
      (Reg64 RDI, Reg64 RDI)])
In reality, this analysis is grossly incomplete. Some instructions have implicit register operands, and some pairs of registers overlap, like EBX and RBX. And we have to understand contextual requirements. It's not okay to remap RAX to RBX if some calling code is expecting a return value in RAX.
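One of these gaps, overlapping registers, could be approached along the following lines. This is a rough sketch under stated assumptions: GPR and Register here are simplified stand-ins modelled on the constructors visible in the output above, and overlaps and respectsOverlap are hypothetical helpers, not part of hdis86.

```haskell
-- Simplified stand-ins modelled on the constructors in the output above;
-- the real library types have more cases.
data GPR = RAX | RBX | RBP | RSI | RDI
  deriving (Eq, Show)

data Register = Reg32 GPR | Reg64 GPR
  deriving (Eq, Show)

-- The underlying general-purpose register, ignoring operand width.
gpr :: Register -> GPR
gpr (Reg32 r) = r
gpr (Reg64 r) = r

-- EBX and RBX share storage, so they overlap.
overlaps :: Register -> Register -> Bool
overlaps a b = gpr a == gpr b

-- A mapping respects overlap when overlapping sources map to overlapping targets.
respectsOverlap :: [(Register, Register)] -> Bool
respectsOverlap m =
  and [ overlaps y y' | (x, y) <- m, (x', y') <- m, overlaps x x' ]

fromPost, broken :: [(Register, Register)]
fromPost = [ (Reg32 RBX, Reg32 RBP), (Reg64 RBP, Reg64 RBX)
           , (Reg64 RSI, Reg64 RSI), (Reg64 RDI, Reg64 RDI) ]
broken   = [ (Reg32 RBX, Reg32 RBP), (Reg64 RBX, Reg64 RSI) ]
```

Here respectsOverlap fromPost holds, while broken fails because EBX and RBX are sent to unrelated registers.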
The complexity of x86 (not to mention the halting problem) means that binary code analysis will never be easy. Hopefully hdis86 can be one of many useful tools in this domain. As always, suggestions or patches are welcome.
The 'elf' package should parse ELF files. With it, hdis86, code gens like Harpy, and LLVM bindings, Haskell can turn into quite a nice assembly playpen!
You might also want to take a look at the LLVM libraries. It's been some 6 months since I last looked, but LLVM has a disassembler library that is, in theory, machine-independent. I think x86 and x86_64 work quite well, but the next best working target, ARM, was rather incomplete back then. There's apparently also some extended disassembler code there (not sure if its API is fixed yet) which allows you to do some pretty advanced stuff, at least within basic blocks. I think this includes querying for the expression that describes register EBX at the end of a basic block, and getting the corresponding LLVM intermediate representation for the code.