This will be part of a multi-part post on Go binaries and reverse engineering them. Part 2
At the time of writing, the most recent major version of Go is Go 1.13.
Why Go?
Go has been part of a new wave of malware for a variety of reasons. Large binary size, relative portability, and lack of tooling are some of the main offenders. We are working to resolve the lack of tooling to provide better insight into what Go binaries are doing.
The _start and main functions on a Go binary instantiate the Go runtime system (garbage collector, functions to control goroutines, etc). The real start of a Go binary is in main.main.
The interface{} type
Every non-primitive type in Go can be treated as an interface{}. Internally, this is referred to as an iface:
type iface struct {
	tab  *itab
	data unsafe.Pointer
}
As you can see in the code snippet, ifaces contain two pointers: a pointer to the actual data of the object and a pointer to the information table (called the itab within Go source) which contains metadata about the type.
type itab struct {
	inter *interfacetype
	_type *_type
	hash  uint32 // copy of _type.hash. Used for type switches.
	_     [4]byte
	fun   [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
}
The fun field is a virtual dispatch table for the object.
The interfacetype and _type structs (the latter is also referred to as the rtype struct) contain metadata about the type of the variable:
type interfacetype struct {
	typ     _type
	pkgpath name
	mhdr    []imethod
}
type _type struct {
	size       uintptr
	ptrdata    uintptr // size of memory prefix holding all pointers
	hash       uint32
	tflag      tflag
	align      uint8
	fieldalign uint8
	kind       uint8
	alg        *typeAlg
	// gcdata stores the GC type data for the garbage collector.
	// If the KindGCProg bit is set in kind, gcdata is a GC program.
	// Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
	gcdata    *byte
	str       nameOff
	ptrToThis typeOff
}
interfacetype serves primarily as a wrapper around the _type struct that adds additional metadata to it.
The kind field is an enum that gives us exact types.
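The same enum is visible from Go code through the reflect package, whose Kind values are derived from this kind field with the flag bits masked off. A minimal sketch, where kindOf is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"reflect"
)

// kindOf reports the kind of the dynamic type stored in v.
// It panics for a nil interface, since there is no type to inspect.
func kindOf(v interface{}) string {
	return reflect.TypeOf(v).Kind().String()
}

func main() {
	samples := []interface{}{1, "s", []int{}, map[string]int{}, struct{}{}}
	for _, v := range samples {
		fmt.Printf("%T -> %s\n", v, kindOf(v))
	}
}
```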
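Note the fields with offset types: nameOff and typeOff. Rather than absolute pointers, these are 32-bit offsets that the runtime resolves against the base of the module's type data (the types field of moduledata); the .typelink section holds a list of offsets to type descriptors in that same region. In particular, the str field resolves to the type's name string, and the bits of the tflag field describe how to interpret it (for example, whether the type is named).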
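Resolving such an offset by hand is mostly arithmetic. The sketch below is an assumption-laden illustration, not the runtime's code: resolveName is a hypothetical helper, the base region is passed in as a byte slice, and the name encoding is assumed to be the Go 1.13 one (one flag byte, a two-byte big-endian length, then the name bytes). Later Go releases changed the length encoding, so check the version of the binary first.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// resolveName decodes the name stored at offset off inside base, assuming
// the Go 1.13 layout: flags (1 byte), length (2 bytes, big endian), bytes.
func resolveName(base []byte, off int32) (string, error) {
	start := int(off)
	if start <= 0 || start+3 > len(base) {
		return "", errors.New("offset out of range")
	}
	n := int(binary.BigEndian.Uint16(base[start+1 : start+3]))
	end := start + 3 + n
	if end > len(base) {
		return "", errors.New("name runs past the end of the data")
	}
	return string(base[start+3:end]), nil
}

func main() {
	// Synthetic data: 4 bytes of padding, then an encoded name "string".
	blob := append([]byte{0, 0, 0, 0, 0x01, 0x00, 0x06}, "string"...)
	name, err := resolveName(blob, 4)
	fmt.Println(name, err)
}
```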
An example of the interfacetype of a string. When reversing, we can often reconstruct itabs of variables to find the programmer’s original string for them
The Go compiler sometimes optimizes this structure away: primitive types whose data fits in one word of memory are stored directly in the data field of an interface instead of being referenced through a pointer.
0x2 gets directly placed on rsp+0x20 (the associated itab for the variable is in rsp+0x18)
Types whose itab would contain nothing but their type struct (e.g. because they have no methods that require dynamic dispatch) store a pointer to the type struct directly instead of an itab. This is referred to within the Go source as an eface. The primary reason for this is to support the generic empty interface{} type.
To instantiate these structures, Go uses the convXXX family of functions, which initialize them and also convert between the different representations described above. The XXX in each function name denotes the type of transformation. For example, runtime.convT2E converts a type struct to an eface.
Some common prefixes for convXXX:
Prefix | Type |
---|---|
I | iface |
E | eface |
T | rtype |
These convXXX calls, from a reversing standpoint, denote type conversions of variables.
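At the source level, these conversions look like plain assignments or returns. The sketch below shows the three directions named in the table; point, toEmpty, toStringer, and widen are hypothetical names, and which runtime helper (if any) the compiler actually emits for each one depends on the Go version and on optimizations, so treat the labels in the comments as the conversion direction and not as a promise about the call you will see in a disassembly.

```go
package main

import "fmt"

type point struct{ x, y int }

func (p point) String() string { return fmt.Sprintf("(%d,%d)", p.x, p.y) }

// toEmpty boxes a concrete value into interface{}: a T -> E conversion.
func toEmpty(p point) interface{} { return p }

// toStringer boxes a concrete value into a non-empty interface: T -> I.
func toStringer(p point) fmt.Stringer { return p }

// widen turns a non-empty interface into interface{}: I -> E.
func widen(s fmt.Stringer) interface{} { return s }

func main() {
	p := point{1, 2}
	// The dynamic type survives every conversion.
	fmt.Printf("%T %T %T\n", toEmpty(p), toStringer(p), widen(toStringer(p)))
}
```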
Function Calling Conventions
When a goroutine is created, the runtime first allocates a small stack for it (2 KB since Go 1.4; the often-quoted 8 KB figure applied to Go 1.2 and 1.3). Almost every function has a prologue that checks whether the remaining stack is sufficient (functions marked nosplit are the exception). If it is not, the prologue calls runtime.morestack to allocate more space as needed, after which the contents of the old stack are copied over to the new stack space.
Function prologue to give main enough stack space
Historically, this was implemented with each goroutine maintaining its own segmented stack within the heap. Now, each Go function instead owns its sub-section of the stack, which morestack dynamically resizes and reallocates as needed.
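Stack growth is easy to trigger on purpose. In the sketch below, recurse and deepInGoroutine are hypothetical helpers: each recursive call keeps a 256-byte array alive, so a deep enough recursion needs far more stack than a fresh goroutine starts with and only completes because the stack is grown along the way.

```go
package main

import "fmt"

// recurse returns n+1 while forcing every frame to hold a local array.
func recurse(n int) int {
	var pad [256]byte
	pad[n%len(pad)] = 1
	if n == 0 {
		return int(pad[0])
	}
	return recurse(n-1) + int(pad[n%len(pad)])
}

// deepInGoroutine runs the recursion on a fresh goroutine, which starts
// with a small stack that has to grow as the recursion deepens.
func deepInGoroutine(n int) int {
	done := make(chan int)
	go func() { done <- recurse(n) }()
	return <-done
}

func main() {
	fmt.Println(deepInGoroutine(100000))
}
```

Setting a breakpoint on runtime.morestack while this runs is a simple way to watch the prologue check fire.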
Arguments and return values are placed on the stack owned by the caller's function. Space for the return values is pre-allocated immediately after the arguments, and the callee writes into that reserved space. By doing this, Go can support multiple return values, since more stack space can be allocated after the arguments for more return values.
Here, fmt.Errorf's arguments are on the stack starting from rsp+110h. Its return values are in rsp+110h+var_E8 and rsp+110h+var_E8+8. Since they're immediately used, the compiler inserts movs to place them in registers. Note that the entire area from rsp+110h is actually owned by the caller: fmt.Errorf will use morestack to own an entirely different offset of rsp that's potentially very far from rsp+110h.
The calling convention in Go is based on Plan 9's calling convention. Registers are caller-saved onto the stack. Arguments to functions are allocated stack space, and so are all of the functions' return values.
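The resulting frame layout can be worked out with simple arithmetic. The sketch below is a simplification under stated assumptions: layout and alignUp are hypothetical helpers, the target is 64-bit, and every value is rounded up to an 8-byte slot (the real compiler packs smaller types according to their own alignment). Offsets are relative to the first argument slot in the caller's frame.

```go
package main

import "fmt"

const wordSize = 8 // assumption: 64-bit target

// alignUp rounds n up to the next multiple of wordSize.
func alignUp(n int) int { return (n + wordSize - 1) &^ (wordSize - 1) }

// layout places arguments first and result slots right after them,
// returning the offset of each slot and the total size reserved.
func layout(argSizes, resultSizes []int) (argOffsets, resultOffsets []int, total int) {
	off := 0
	for _, s := range argSizes {
		argOffsets = append(argOffsets, off)
		off = alignUp(off + s)
	}
	for _, s := range resultSizes {
		resultOffsets = append(resultOffsets, off)
		off = alignUp(off + s)
	}
	return argOffsets, resultOffsets, off
}

func main() {
	// Hypothetical signature: func f(a int64, s string) (int, error)
	// int64 = 8 bytes, string = 16, int = 8, error (interface) = 16.
	args, results, total := layout([]int{8, 16}, []int{8, 16})
	fmt.Println(args, results, total)
}
```

For that hypothetical signature the arguments land at offsets 0 and 8, the results at 24 and 32, and the caller reserves 48 bytes in total.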
We can see interactions with interfaces and their itabs for the calling convention
Every interface-typed parameter will be visible as two associated “parameters”: one containing the pointer to its itab and the other containing a pointer to its data section. Looking at these together, we can figure out the types and names of parameters and return values in a function.
Note that sometimes the compiler will also optimize away the pointer to the itab entirely if the parameter is a constant value that fits in a single word (e.g. a constant int).
Goroutines and Defer
The special implementation of separate function stacks is intended to support goroutines, which are meant to be “lightweight threads”. The actual machinery used to support the go keyword uses the runtime.newproc function:
func newproc(siz int32, fn *funcval)
doNothing()’s itable (unk_106F600) is used as a parameter for runtime.newproc
The expected stack layout for calling a goroutine is as follows:
Stack Layout |
---|
total size of all arguments (8 * number of arguments) |
function pointer of the goroutine to call |
pointer to argument 1’s interface |
pointer to argument 2’s interface |
… |
pointer to the nth argument’s interface |
The Go runtime will rely on type information present in each argument’s itable as per the aforementioned calling convention.
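For reference, this is the kind of source that produces such a call. The sketch is only an illustration for the Go version discussed here: add and sumAsync are hypothetical names, and the size mentioned in the comment simply applies the rule from the table (8 bytes per argument on a 64-bit target).

```go
package main

import (
	"fmt"
	"sync"
)

// add is the function started as a goroutine; it takes four word-sized
// arguments.
func add(a, b int, out *int, wg *sync.WaitGroup) {
	defer wg.Done()
	*out = a + b
}

// sumAsync starts add with the go keyword and waits for it to finish.
func sumAsync(a, b int) int {
	var wg sync.WaitGroup
	var result int
	wg.Add(1)
	// By the table's rule, the size word for this call would be
	// 4 arguments * 8 bytes = 32, followed by the function pointer for
	// add and then the four arguments themselves.
	go add(a, b, &result, &wg)
	wg.Wait()
	return result
}

func main() {
	fmt.Println(sumAsync(2, 3))
}
```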
This pattern is similarly used to implement the defer keyword. A function pointer and the size of the arguments that follow it are given to the runtime.deferproc function, which adds the function to a defer stack. The compiler also adds a runtime.deferreturn stub at the end of any function that uses defer. This deferreturn function pops through the defer stack to execute our deferred functions.
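Because deferred calls are pushed onto a stack and popped on return, they run in last-in, first-out order. A minimal sketch, with deferOrder as a hypothetical helper, using defers inside a loop:

```go
package main

import "fmt"

// deferOrder registers three deferred calls in a loop and records the
// order in which they actually run.
func deferOrder() []int {
	var order []int
	func() {
		for i := 1; i <= 3; i++ {
			// The argument i is evaluated now; the call itself runs when
			// the enclosing function literal returns.
			defer func(n int) { order = append(order, n) }(i)
		}
	}()
	return order
}

func main() {
	fmt.Println(deferOrder()) // [3 2 1]
}
```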
The same machinery as our doNothing() example, where its itable (unk_10CA900) is used as a parameter.
Update: I’ve been made aware that in Go 1.14 and on, deferred functions are sometimes inlined at the end of the function rather than being passed to a defer stack. The old pattern will still exist in some cases, such as defers that can happen more than once (e.g. in a loop). However, there’s a new type of defer:
A function with defers will maintain a bitvector with each bit representing a different deferred function in the function. If the deferred function is called, its corresponding defer bit will be set in the bitvector. At the end of the function, the bitvector is checked and all functions whose defer bit is set are executed.
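Written out by hand, the bitvector scheme looks roughly like the following. This is a simulation of the idea and not compiler output: simulateOpenCodedDefers and the deferBits variable are hypothetical names, and the two flags stand in for whether each defer statement was reached at run time.

```go
package main

import "fmt"

// simulateOpenCodedDefers mimics a function with two defer statements,
// each of which may or may not be reached.
func simulateOpenCodedDefers(reachFirst, reachSecond bool) []string {
	var deferBits uint8
	var ran []string

	if reachFirst {
		deferBits |= 1 << 0 // first defer statement reached
	}
	if reachSecond {
		deferBits |= 1 << 1 // second defer statement reached
	}

	// Function exit: check the bits from last to first so that later
	// defers still run before earlier ones.
	if deferBits&(1<<1) != 0 {
		ran = append(ran, "second")
	}
	if deferBits&(1<<0) != 0 {
		ran = append(ran, "first")
	}
	return ran
}

func main() {
	fmt.Println(simulateOpenCodedDefers(true, true))
	fmt.Println(simulateOpenCodedDefers(true, false))
}
```

In a disassembly, the giveaway is a small integer on the stack that gets OR-ed with a constant at each defer site and tested bit by bit before the function returns.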
.pclntab
The line tables in a Go binary map functions to lines in the source. They are referenced by the runtime during panics to give relevant stack trace information, and are also used by the garbage collector to traverse the stack.
From Go 1.2 and on, there is a global LineTable, called the pclntab, kept in the .gopclntab section of the binary.
Every function will have an associated entry in the .pclntab, which also includes a string containing the function’s name. This allows us to recover function names from trivially stripped binaries (see @timstrazz’ previous work on doing this).
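The runtime exposes the same lookup to ordinary programs, which is a quick way to confirm that function names live in the binary itself. A minimal sketch, where currentFunction is a hypothetical helper built on runtime.Callers and runtime.CallersFrames:

```go
package main

import (
	"fmt"
	"runtime"
)

// currentFunction returns the fully qualified name of the function that
// called runtime.Callers, as recorded in the binary's function tables.
func currentFunction() string {
	pcs := make([]uintptr, 1)
	// Skip 1 frame so that the first recorded PC belongs to this function
	// and not to runtime.Callers itself.
	n := runtime.Callers(1, pcs)
	frame, _ := runtime.CallersFrames(pcs[:n]).Next()
	return frame.Function
}

func main() {
	fmt.Println(currentFunction())
}
```

The name is resolved from a program counter at run time, without debug symbols, which is the same data a reversing tool reads statically.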
Channels
All of the channel features in Go are syntactic sugar for the following function calls:
Syntactic Sugar | Runtime Function |
---|---|
make(chan <type>) | runtime.makechan |
<-channel | runtime.chanrecv1 |
channel <- data | runtime.chansend1 |
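A short function that exercises all three operations makes these calls easy to locate in a disassembly. In the sketch below, roundTrip is a hypothetical helper, and the comments name the runtime function each statement is expected to lower to; the exact helper can vary with the Go version and with forms such as select or the two-value receive.

```go
package main

import "fmt"

// roundTrip sends a value through a buffered channel and reads it back.
func roundTrip(v int) int {
	ch := make(chan int, 1) // expected: runtime.makechan
	ch <- v                 // expected: runtime.chansend1
	return <-ch             // expected: runtime.chanrecv1
}

func main() {
	fmt.Println(roundTrip(42))
}
```

The channel is buffered so that the send does not block waiting for a receiver on the same goroutine.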