Reverse Engineering Go, Part I

This is the first part of a multi-part series on Go binaries and how to reverse engineer them.

At the time of writing, the most recent major version of Go is Go 1.13.

Why Go?

Go has been part of a new wave of malware for a variety of reasons: large binaries, relative portability, and a lack of analysis tooling are among the main ones. We are working to close the tooling gap and provide better insight into what Go binaries are doing.

The _start and main functions of a Go binary set up the Go runtime (the garbage collector, the goroutine scheduler, etc.). The program’s own code starts at main.main.

The interface{} type

Any Go value can be stored in an interface. Internally, a value of a non-empty interface type is represented as an iface:

type iface struct {
	tab  *itab
	data unsafe.Pointer
}

An iface contains two pointers: data points to the object’s actual data, and tab points to the interface table (called the itab in the Go source), which contains metadata about the type.

type itab struct {
	inter *interfacetype
	_type *_type
	hash  uint32 // copy of _type.hash. Used for type switches.
	_     [4]byte
	fun   [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
}

The fun field is the object’s virtual dispatch table. The inter field points to an interfacetype describing the interface, and the _type field points to the type struct (mirrored as rtype in the reflect package) describing the concrete type of the variable:

type interfacetype struct {
	typ     _type
	pkgpath name
	mhdr    []imethod
}

type _type struct {
	size       uintptr
	ptrdata    uintptr // size of memory prefix holding all pointers
	hash       uint32
	tflag      tflag
	align      uint8
	fieldalign uint8
	kind       uint8
	alg        *typeAlg
	// gcdata stores the GC type data for the garbage collector.
	// If the KindGCProg bit is set in kind, gcdata is a GC program.
	// Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
	gcdata    *byte
	str       nameOff
	ptrToThis typeOff
}

interfacetype wraps a _type struct and adds interface-specific metadata: the package path and the list of the interface’s methods (mhdr).

The kind field is an enum identifying the type’s kind (bool, int, string, struct, pointer, and so on).

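Note the fields whose types are offsets: nameOff and typeOff. These are not pointers but offsets relative to the start of the module’s type data (moduledata.types); the .typelink section itself is a list of such offsets, one per type. In particular, str is a nameOff to the type’s name string; when the tflagExtraStar bit of tflag is set, that string carries a leading ‘*’ that must be dropped.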
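Resolving a nameOff by hand looks roughly like the sketch below. It assumes the Go 1.13 name encoding (one flag byte, a two-byte big-endian length, then the bytes; later Go versions switched to a varint length) and a byte slice that already starts at moduledata.types. readName and typeString are hypothetical helpers; typeString mirrors the tflagExtraStar rule in runtime._type.string.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// tflagExtraStar as defined in runtime/type.go (Go 1.13).
const tflagExtraStar = 1 << 1

// readName decodes the name record at offset off within types, a byte slice
// assumed to start at moduledata.types. Go 1.13 layout: one flag byte, a
// two-byte big-endian length, then the name bytes.
func readName(types []byte, off int32) (string, error) {
	o := int(off)
	if o < 0 || o+3 > len(types) {
		return "", errors.New("name header out of range")
	}
	n := int(binary.BigEndian.Uint16(types[o+1 : o+3]))
	if o+3+n > len(types) {
		return "", errors.New("name data out of range")
	}
	return string(types[o+3 : o+3+n]), nil
}

// typeString resolves str and drops the leading '*' when tflagExtraStar is set.
func typeString(types []byte, str int32, tflag uint8) (string, error) {
	s, err := readName(types, str)
	if err != nil {
		return "", err
	}
	if tflag&tflagExtraStar != 0 && len(s) > 0 {
		return s[1:], nil
	}
	return s, nil
}

func main() {
	// Hypothetical type data: four bytes of padding, then a name record
	// (flags 0x01, length 7) for "*main.T".
	types := append([]byte{0, 0, 0, 0, 0x01, 0x00, 0x07}, "*main.T"...)
	fmt.Println(typeString(types, 4, tflagExtraStar))
}
```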

An example of the interfacetype of a string. When reversing, we can often reconstruct the itabs of variables to recover the programmer’s original type names.

The Go compiler sometimes optimizes this structure away: primitive values whose data fits in one word of memory are stored directly in the data field of an interface instead of behind a pointer to the data.

0x2 is placed directly at rsp+0x20 (the associated itab for the variable is at rsp+0x18)

When the interface has no methods, as with the empty interface{}, there is nothing to dispatch, so instead of an itab the interface stores a pointer to the type struct directly. The Go source calls this representation an eface; it exists primarily to support the generic empty interface{} type.

To build these structures, Go uses the runtime’s convXXX family of functions, which both initialize them and convert between the different representations described above. The letters in each function name denote the kind of conversion. For example, runtime.convT2E converts a value of a concrete type (T) into an eface (E).

The letters used in convXXX names:

Letter	Meaning
I	iface
E	eface
T	concrete type (rtype)

From a reversing standpoint, these convXXX calls mark the points where a variable is converted to, or between, interface representations.
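When annotating disassembly, the letter scheme can be decoded mechanically. describeConv below is a hypothetical helper that handles only names of the form runtime.convX2Y with an optional suffix (for example runtime.convT2E or runtime.convT2Inoptr); specialized helpers such as runtime.convT64 do not follow that pattern and are reported as unknown.

```go
package main

import (
	"fmt"
	"strings"
)

// convLetters maps the letters used in runtime convX2Y names to what they denote.
var convLetters = map[byte]string{
	'I': "iface",
	'E': "eface",
	'T': "concrete type",
}

// describeConv decodes runtime.convX2Y[suffix] symbol names. It returns false
// for names outside that pattern.
func describeConv(sym string) (string, bool) {
	name := strings.TrimPrefix(sym, "runtime.")
	if len(name) < 7 || !strings.HasPrefix(name, "conv") || name[5] != '2' {
		return "", false
	}
	from, okFrom := convLetters[name[4]]
	to, okTo := convLetters[name[6]]
	if !okFrom || !okTo {
		return "", false
	}
	desc := from + " -> " + to
	if suffix := name[7:]; suffix != "" {
		desc += " (" + suffix + ")" // e.g. "noptr" variants
	}
	return desc, true
}

func main() {
	for _, sym := range []string{"runtime.convT2E", "runtime.convI2I", "runtime.convT2Inoptr", "runtime.convT64"} {
		desc, ok := describeConv(sym)
		fmt.Println(sym, desc, ok)
	}
}
```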

Function Calling Conventions

When a goroutine is created, it starts with a small stack (2 KB since Go 1.4). Every function has a prologue that checks whether the remaining stack is sufficient for its frame. If it is not, the function calls runtime.morestack, which allocates a larger stack and copies the contents of the old stack over to it.

Function prologue to give main enough stack space

Historically (before Go 1.3), each goroutine used a segmented stack made of separate chunks linked together in the heap. Now, each goroutine has a single contiguous stack in which every function owns its own frame, and morestack resizes and reallocates that stack as needed.
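Stack copying is visible from Go code: a local variable’s address changes once morestack moves the stack. The sketch below records a local’s address (as a uintptr, so it is not adjusted by the copy) before and after deep recursion on a fresh goroutine. The depth of 100000 is an arbitrary value assumed to be far beyond the initial stack size.

```go
package main

import (
	"fmt"
	"unsafe"
)

// recurse calls itself n times. Each call needs its own frame, so a deep
// enough recursion exhausts the goroutine's stack and the function prologue
// calls runtime.morestack.
//
//go:noinline
func recurse(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + recurse(n-1)
}

// stackMoved compares a local's address before and after deep recursion.
// Because morestack copies the stack to a new allocation, the address is
// expected to differ.
func stackMoved(depth int) (moved bool, result int) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		var local int
		before := uintptr(unsafe.Pointer(&local))
		result = recurse(depth)
		after := uintptr(unsafe.Pointer(&local))
		moved = before != after
	}()
	<-done
	return moved, result
}

func main() {
	fmt.Println(stackMoved(100000))
}
```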

Arguments and return values are placed in the caller’s stack frame. Space for the return values is reserved immediately after the arguments, and the callee writes its results into that reserved space. This is how Go supports multiple return values: the caller simply reserves more space after the arguments for each additional result.

Here, fmt.Errorf’s arguments are on the stack starting from rsp+110h. Its return values are at rsp+110h+var_E8 and rsp+110h+var_E8+8; since they are used immediately, the compiler inserts movs to load them into registers. Note that this entire area is owned by the caller: fmt.Errorf sets up its own frame below it, and if it calls morestack, the whole stack (including this area) is moved to a new location.

Go’s calling convention is based on Plan 9’s. All registers are caller-saved (spilled to the stack around calls), and both the arguments and the return values of a function are allocated stack space.
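The stack layout can be computed from a signature. The sketch below models the stack-based convention (ABI0) in use at the time of writing; Go 1.17 later moved amd64 to a register-based ABI. It assumes a 64-bit target and lays arguments out like struct fields from 0(SP) at the call site, then starts the results at the next pointer-aligned offset. value and frameOffsets are hypothetical names, and the model ignores receivers.

```go
package main

import "fmt"

// value describes one argument or result by size and alignment in bytes.
type value struct{ size, align int }

func alignUp(off, a int) int { return (off + a - 1) / a * a }

// frameOffsets returns each argument's and result's offset from the start
// of the argument area, plus the total size of the area.
func frameOffsets(args, results []value) (argOffs, resOffs []int, total int) {
	const ptrSize = 8
	off := 0
	for _, v := range args {
		off = alignUp(off, v.align)
		argOffs = append(argOffs, off)
		off += v.size
	}
	off = alignUp(off, ptrSize) // results begin pointer-aligned after the arguments
	for _, v := range results {
		off = alignUp(off, v.align)
		resOffs = append(resOffs, off)
		off += v.size
	}
	total = alignUp(off, ptrSize)
	return argOffs, resOffs, total
}

func main() {
	// func f(a int64, b int32) (int64, error)
	args := []value{{8, 8}, {4, 4}}
	results := []value{{8, 8}, {16, 8}} // error is an interface: itab + data words
	fmt.Println(frameOffsets(args, results))
}
```

Here the error result occupies two words, which is why interface-typed values show up as an itab/data pair in the caller’s frame.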

Interfaces and their itabs also show up in the calling convention

Every interface-typed parameter is visible as two associated “parameters”: one holding the pointer to its itab and the other holding the pointer to its data. Looking at the two together, we can work out the types and names of a function’s parameters and return values.

Note that the compiler sometimes optimizes away the itab pointer entirely when the parameter is a constant value that fits in a single word (e.g. a constant int).

Goroutines and Defer

These small, growable per-goroutine stacks exist to support goroutines, which are meant to be “lightweight threads”. The go keyword itself is implemented with a call to runtime.newproc:

func newproc(siz int32, fn *funcval)

doNothing()’s funcval (unk_106F600) is passed as the fn parameter of runtime.newproc

The expected stack layout for calling a goroutine is as follows:

Stack Layout
total size of all arguments in bytes (8 × the number of arguments when each is one word)
function pointer of the goroutine to call
pointer to argument 1’s interface
pointer to argument 2’s interface
…
pointer to the nth argument’s interface

newproc copies these siz bytes of arguments to the new goroutine’s stack as-is, so they follow the same layout as in the calling convention described above.
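When reading this layout out of a debugger or emulator, it helps to decode the words at rsp right before the call. The sketch below is a hypothetical helper that assumes a 64-bit target, an argument area that is a multiple of 8 bytes, and that siz (an int32) may have been written with a 32-bit store, so only the low half of its 8-byte slot is trusted.

```go
package main

import (
	"errors"
	"fmt"
)

// newprocFrame is a hypothetical decoded view of the outgoing stack words a
// caller sets up before calling runtime.newproc(siz int32, fn *funcval).
type newprocFrame struct {
	siz  uint32   // total argument size in bytes
	fn   uint64   // address of the funcval for the goroutine's function
	args []uint64 // the argument area, split into 8-byte words
}

// parseNewprocFrame decodes words read upward from rsp at the call site.
func parseNewprocFrame(words []uint64) (newprocFrame, error) {
	if len(words) < 2 {
		return newprocFrame{}, errors.New("need at least the siz and fn slots")
	}
	// Keep only the low 32 bits: the upper half of siz's slot may be stale.
	siz := uint32(words[0])
	n := int(siz / 8)
	if siz%8 != 0 || len(words)-2 < n {
		return newprocFrame{}, errors.New("argument area does not match siz")
	}
	return newprocFrame{siz: siz, fn: words[1], args: words[2 : 2+n]}, nil
}

func main() {
	// Hypothetical stack words: stale upper bits over siz=16, a funcval
	// address, then two argument words.
	f, err := parseNewprocFrame([]uint64{0xdeadbeef00000010, 0x106f600, 0xa, 0xb})
	fmt.Println(f.siz, f.fn, f.args, err)
}
```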

This pattern is also used to implement the defer keyword. A function pointer and the size of the arguments that follow it are passed to runtime.deferproc, which pushes the function onto a defer stack. The compiler also inserts a call to runtime.deferreturn at the end of any function that uses defer; deferreturn pops entries off the defer stack and executes the deferred functions.

The same machinery as in the doNothing() example: its funcval (unk_10CA900) is passed as a parameter

Update: I’ve been made aware that in Go 1.14 and later, deferred functions are often inlined (open-coded) at the end of the function rather than being pushed onto a defer stack. The old pattern still appears in some cases, such as defers that can execute more than once (e.g. in a loop). However, there’s a new type of defer:

A function with open-coded defers maintains a bitvector in which each bit represents one of the function’s defer statements. When a defer statement is executed, its corresponding bit is set. At the end of the function, the bitvector is checked and every deferred function whose bit is set is called, in reverse order.
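The bookkeeping can be simulated in ordinary Go. This is a sketch of the behavior, not compiler output: openDefers, deferAt, and exit are hypothetical names. It assumes the compiler’s limit of at most 8 open-coded defers per function (hence a uint8 bitvector), runs set bits from the highest index down to get LIFO order, and clears each bit before the call, as the compiler does so a deferred call that panics is not run again.

```go
package main

import "fmt"

// openDefers simulates the bookkeeping for open-coded defers in one function.
type openDefers struct {
	deferBits uint8     // one bit per defer statement; at most 8 are open-coded
	fns       [8]func() // the deferred calls, indexed by defer statement
}

// deferAt records that defer statement i was reached.
func (d *openDefers) deferAt(i uint, fn func()) {
	d.fns[i] = fn
	d.deferBits |= 1 << i
}

// exit runs every deferred call whose bit is set, highest index first (LIFO),
// clearing each bit before making the call.
func (d *openDefers) exit() {
	for i := 7; i >= 0; i-- {
		bit := uint8(1) << uint(i)
		if d.deferBits&bit != 0 {
			d.deferBits &^= bit
			d.fns[i]()
		}
	}
}

// example mimics a function with three defer statements, the second of which
// is only reached when cond is true.
func example(cond bool) []string {
	var calls []string
	var d openDefers
	d.deferAt(0, func() { calls = append(calls, "first") })
	if cond {
		d.deferAt(1, func() { calls = append(calls, "second") })
	}
	d.deferAt(2, func() { calls = append(calls, "third") })
	d.exit()
	return calls
}

func main() {
	fmt.Println(example(false), example(true))
}
```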

.pclntab

The line table in a Go binary maps program counters to functions and source lines. The runtime consults it during panics to produce stack traces, and the garbage collector uses it to traverse goroutine stacks.

Since Go 1.2, there is a single global line table, called the pclntab, kept in the .gopclntab section of ELF binaries.

Every function has an associated entry in the pclntab, which includes a string containing the function’s name. This lets us recover function names from trivially stripped binaries (see @timstrazz’s previous work on doing this).

Channels

The basic channel operations in Go are syntactic sugar for the following runtime calls:

Syntactic Sugar	Runtime Function
`make(chan <type>)`	`runtime.makechan`
`<-channel`	`runtime.chanrecv1`
`v, ok := <-channel`	`runtime.chanrecv2`
`channel <- data`	`runtime.chansend1`
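The short program below exercises each row of the table. The comments name the runtime call the gc compiler is expected to emit for each statement on the Go 1.13 toolchain discussed here (close, which is not in the table, goes through runtime.closechan); checking the disassembly of your own build is the reliable way to confirm. channelOps is a hypothetical name.

```go
package main

import "fmt"

// channelOps performs a send, a one-value receive, a close, and a two-value
// receive on a buffered channel.
func channelOps() (v, w int, ok bool) {
	ch := make(chan int, 1) // runtime.makechan
	ch <- 42                // runtime.chansend1
	v = <-ch                // runtime.chanrecv1
	close(ch)               // runtime.closechan
	w, ok = <-ch            // runtime.chanrecv2: closed and drained, so zero value and false
	return v, w, ok
}

func main() {
	fmt.Println(channelOps())
}
```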