Someone on #haskell
asked whether you could recover a function's name from its value, i.e.
GHCi> name id
"id"
GHCi> name map
"map"
This is easy in some languages. But Haskell is not designed to provide this kind of run-time information. We'll need some non-portable hacks. I tested this code with GHC 6.12.1 on amd64
Linux; see below for portability notes.
How it works
Most values in GHC Haskell are represented by a closure. Closures representing functions are common in functional-language compilers, but GHC extends the idea to cover many other sorts of values.
Closures exist to store data. But the run-time system also needs operational information about each closure: how to garbage-collect it, how to force its evaluation, etc. This information is known at compile time and is shared between many closures. All algebraic values with the same constructor will share this information, as will all function values created from the same lambda in the program's source.
So each closure stores a pointer to an info table, holding this operational information. Info tables are generated at compile time, and stored as part of an executable's read-only data section. This means that they have statically-known addresses, with associated names in the executable's symbol table. We'll use these symbol names to name our functions.
We can dump an executable's symbol table with nm
:
$ nm -f posix foo
...
ghczmprim_GHCziBool_Bool_closure_tbl D 0000000000749978
ghczmprim_GHCziBool_False_closure D 0000000000749970
ghczmprim_GHCziBool_False_static_info T 00000000004e4dd8
ghczmprim_GHCziBool_True_closure D 0000000000749990
ghczmprim_GHCziBool_True_static_info T 00000000004e4d80
ghczmprim_GHCziDebug_debugErrLn1_closure D 00000000007499a0
ghczmprim_GHCziDebug_debugErrLn1_info T 00000000004e4ec0
...
Haskell identifiers can contain characters not allowed in symbol names. GHC uses a name-mangling scheme to build symbol names. For example, the first symbol above decodes to
ghc-prim_GHC.Bool_Bool_closure_tbl
Reading the symbols
We'll use vacuum to inspect heap objects. vacuum can do a lot of cool things, like visualize heap structures from a running Haskell program. We'll only use one of vacuum's functions here.
We'll also use the GHC API to un-mangle symbol names. GHC is a 20-year effort that has evolved alongside the Haskell language. It follows some legacy conventions like a mostly-flat module hierarchy. So the module we need is named simply Encoding
.
import Data.Word
import Data.Maybe
import Text.Printf
import Control.Parallel ( pseq )
import qualified Data.Map as Map
import qualified System.Posix.Files as Posix
import qualified System.Process as Proc
import qualified Foreign.Ptr as Ptr
import qualified GHC.Vacuum as Vac
import qualified Encoding as GHC
We invoke nm
as a subprocess and parse its output:
type Symbols = Map.Map Word String
getSymbols :: IO Symbols
getSymbols = do
exe <- Posix.readSymbolicLink "/proc/self/exe"
out <- Proc.readProcess "nm" ["-f", "posix", exe] ""
let offset = 0x10
let f (sym:_:addr:_) = Just (read ("0x"++addr) - offset, GHC.zDecodeString sym)
f _ = Nothing
return . Map.fromList . catMaybes . map (f . words) . lines $ out
We're using the Linux proc
filesystem to get a symbolic link to our application's executable.
The symbols in memory appear at an address 0x10
= 16 bytes or 2 machine words lower than in the executable's symbol table. I'm not sure why; perhaps it's because of GHC's "tables next to code" optimization.
Resolving a symbol
Once we have the symbol table, looking up a value is relatively easy:
name :: Symbols -> a -> String
name syms x = fromMaybe unk $ Map.lookup ptr syms where
ptr = x `pseq` (fromIntegral . Ptr.ptrToWordPtr . Vac.getInfoPtr $ x)
unk = printf "<unknown info table at 0x%016x>" ptr
We use vacuum to get the value's info table pointer, convert this to a Word
, then look it up in the symbol table.
We explicitly evaluate x
with pseq
, to avoid seeing a thunk.
Testing it
We'll test with
$ ghc --make name.hs -package ghc
$ ./name
Each test below is commented with the expected output. First, let's try a few non-function values:
main :: IO ()
main = do
syms <- getSymbols
let test = putStrLn . name syms
test 3 -- integer-gmp_GHC.Integer.Type_S#_con_info
test (3 :: Int) -- ghc-prim_GHC.Types_I#_static_info
test "xyz" -- ghc-prim_GHC.Types_:_con_info
GHC defaults to 3 :: Integer
, as -Wall
will tell you. As we see, Integer
and Int
are both implemented as algebraic data:
data Integer
= S# Int#
| J# Int# ByteArray#
data Int = I# Int#
The string "xyz"
is a list built out of (:)
cells.
Next let's try a few functions:
test map -- base_GHC.Base_map_info
test getChar -- base_System.IO_getChar_info
test (+) -- integer-gmp_GHC.Integer_plusInteger_info
Not bad! (+)
defaults to operating on Integer
, and GHC inlines the type class dictionary, giving us the underlying plusInteger
function.
Now let's see the limits of this technique:
test (\_ -> 'x') -- s1jD_info
test (const 'x') -- stg_PAP_info
test test -- stg_PAP_info
Our lambda expression gets a useless compiler-generated name. The application of const
is worse; it uses an info table common to all partial applications. However, we could use vacuum to follow the fields of the PAP
closure, which I'll leave as an exercise to the reader. ;)
test
itself is also a partial application. It's defined by applying two arguments to the function (.)
defined as
(.) f g x = f (g x)
If we eta-expand test
:
let test x = putStrLn $ name syms x
then we'll get another compiler-generated name like s1mh_info
.
Notes
This is a hack, and probably not suitable for any serious purpose. Shelling out to nm
to get a symbol table is particularly ugly. I tried to use bindings to BFD, but ran into some segfaults.
The above code will work only on 64-bit machines, but could be adapted for 32-bit. I bet the magic offset would change. It works on GHC 6.12.1, and should work on other recent versions, if you can get vacuum to build.
It definitely requires a Unix system, and specifically Linux or something emulating Linux's proc
filesystem. You'll need nm
from GNU Binutils, which is standard on a system configured for C development.
It won't work if you run strip
on your binaries...
Nice hack but maybe the Haskell Prime team will see this and consider some changes to the next spec. Maybe.
ReplyDeleteI think you have a bug. "We explicitly evaluate x with seq, to avoid seeing a thunk." is incorrect.
ReplyDeleteThe "x `seq` foo" does not promise that x is in WHNF before starting to evaluate foo. You need `pseq` instead.
See section 7.18.4 of the GHC user manual: "Note that we use pseq rather than seq. The two are almost equivalent, but differ in their runtime behaviour in a subtle way: seq can evaluate its arguments in either order, but pseq is required to evaluate its first argument before its second, which makes it more suitable for controlling the evaluation order in conjunction with par."
@TuringTest:
ReplyDeleteI was wondering about that. Since you've confirmed I'm not crazy, I'll change it.
This article is very good. We would like to leave a good article. To read more articles, click here >>>
ReplyDeletewww.bloglovin.com
5e68b4dadffdc.site123.me
babyedok.wixsite.com/mysite
manop1012.blogspot.com
medium.com/@babyedok
Such an informative article that you shared with us.
ReplyDeletehttps://premiumcrack.com/traktor-pro-crack/
I will be looking forward to your next post. Thank you
ReplyDelete"เจ็บแต่จบ" ติ๊ก บิ๊กบราเธอร์ ตัดใจ
This is my blog. Click here.
ReplyDeleteเว็บแทงหวยออนไลน์ สมัครเลยตอนนี้ "
FL Studio Crack\
ReplyDeleteFL Studio Full Version receives further updates and fixes them automatically. The user should have the features they selected, fix bugs, continue to develop and update while we develop the FL Studio Reg Keys. It is clear to note that as soon as we start using them, we will feel their imagination and reaction. Additionally, we find that this software shows a person that they can do more with their fingers.
อยากรู้ไหมทำไมถึงแนะนำให้ใช้บริการเดิมพันออนไลน์กับ 789bet ซึ่งเรามีหตุผลและคำแนะนำที่น่าสนใจ ที่เราอยากจะบอกคุณว่าทำไมถึงควรทำเงินกับการบริการของเราแนะนำว่า ที่นี่จัดบริการให้คุณครบแน่นอนเพราะบริการที่ทันสมัย พัฒนาการบริการตลอดเวลา และมีพันธมิตรเป็นคาสิโนดังหลายแห่งร่วมบริการ สมาชิกสามารถเล่นได้ทั้งวันพร้อมทั้งยังมีเทคโนโลยีสมัยใหม่อยู่ตลอดเวลา คลิกเข้าเล่นที่นี่ https://www.789betnow.com
ReplyDeleteGutt Websäit : Zonahobisaya
ReplyDeleteGutt Websäit : Zonahobisaya
Gutt Websäit : Zonahobisaya
Gutt Websäit : Zonahobisaya
Gutt Websäit : Zonahobisaya
Gutt Websäit : Zonahobisaya
Gutt Websäit : Resep Masakan
Gutt Websäit : Zonahobisaya
Great article,thanks for sharing with us!
ReplyDelete영월출장샵
ReplyDelete서귀포출장샵
계룡출장샵
당진출장샵
서귀포출장샵
경기도출장샵
경기도출장샵
충북출장샵
당진출장샵
ReplyDelete총판출장샵
가평출장샵
수원출장샵
강원도출장샵
청주출장샵
충북출장샵
김포출장샵
pg slot asia สล็อต เป็นเว็บไซต์พนันสล็อตออนไลน์ ลำดับหนึ่ง ที่คนอีกหลายๆคนนั้นต้องเคยได้ฟังชื่อนี่กันมาบ้างแล้วเพราะเว็บไซต์ของพวกเรานั้น PG SLOT เปิดให้บริการโกรธยะหนึ่ง
ReplyDeleteking kong slot สามารถเล่นบนสมาร์โฟน ที่ได้ก็ได้ พวกเราเอาเกมยอดนิยม เกมสล็อตที่ โบนัสแตกบ่อยมากที่สุด pg เป็นผู้ให้บริการรองรับความต้องการของผู้เล่น ได้เป็นอย่างดี 24 ชม.
ReplyDeleteThis article give lot of inspiration
ReplyDeletePost more good contents pls
ReplyDeleteThis was an excellent page.
ReplyDeletepgsoft pgslot เกมยอดฮิตรายปี 2023 ที่ได้รับเสียงตอบรับจากแฟนเกมสล็อตอย่างล้นหลาม pgslot ด้วยต้นแบบการเล่นที่ไม่ซ้ำใครและก็เข้าใจง่าย เล่นได้ทั้งยังในแนวดิ่งแล้วก็แนวระดับ
ReplyDeletepg soft เครดิตฟรี50 ในคู่มือฉบับสมบูรณ์นี้ เราจะสำรวจทุกสิ่งที่คุณจำเป็นต้องรู้เกี่ยวกับการรับเครดิตฟรี รวมถึงวิธีการทำงาน PG วิธีหาเครดิตฟรี และคำถามที่พบบ่อยคุณเป็นแฟนของเกมการพนันออนไลน์และคาสิโนหรือไม่
ReplyDeleteHelpful info here. Please stay us updated like this. Thanks for sharing this
ReplyDeleteGreat site. Plenty of helpful information here. I am sending it to several buddies
ReplyDeleteWow that was unusual. Just wanted to say wonderful blog! Keep it up
ReplyDeleteThank goodness I found it. You’ve made my day! Thanks again for this information
ReplyDelete