# Go internal ABI specification
2
3Self-link: [go.dev/s/regabi](https://go.dev/s/regabi)
4
5This document describes Go’s internal application binary interface
6(ABI), known as ABIInternal.
7Go's ABI defines the layout of data in memory and the conventions for
8calling between Go functions.
9This ABI is *unstable* and will change between Go versions.
10If you’re writing assembly code, please instead refer to Go’s
11[assembly documentation](/doc/asm.html), which describes Go’s stable
12ABI, known as ABI0.
13
14All functions defined in Go source follow ABIInternal.
15However, ABIInternal and ABI0 functions are able to call each other
16through transparent *ABI wrappers*, described in the [internal calling
17convention proposal](https://golang.org/design/27539-internal-abi).
18
19Go uses a common ABI design across all architectures.
20We first describe the common ABI, and then cover per-architecture
21specifics.
22
23*Rationale*: For the reasoning behind using a common ABI across
24architectures instead of the platform ABI, see the [register-based Go
25calling convention proposal](https://golang.org/design/40724-register-calling).
26
## Memory layout
28
29Go's built-in types have the following sizes and alignments.
30Many, though not all, of these sizes are guaranteed by the [language
31specification](/doc/go_spec.html#Size_and_alignment_guarantees).
32Those that aren't guaranteed may change in future versions of Go (for
33example, we've considered changing the alignment of int64 on 32-bit).
34
| Type                        | 64-bit size | 64-bit align | 32-bit size | 32-bit align |
|-----------------------------|-------------|--------------|-------------|--------------|
| bool, uint8, int8           | 1           | 1            | 1           | 1            |
| uint16, int16               | 2           | 2            | 2           | 2            |
| uint32, int32               | 4           | 4            | 4           | 4            |
| uint64, int64               | 8           | 8            | 8           | 4            |
| int, uint                   | 8           | 8            | 4           | 4            |
| float32                     | 4           | 4            | 4           | 4            |
| float64                     | 8           | 8            | 8           | 4            |
| complex64                   | 8           | 4            | 8           | 4            |
| complex128                  | 16          | 8            | 16          | 4            |
| uintptr, *T, unsafe.Pointer | 8           | 8            | 4           | 4            |
48
49The types `byte` and `rune` are aliases for `uint8` and `int32`,
50respectively, and hence have the same size and alignment as these
51types.
52
53The layout of `map`, `chan`, and `func` types is equivalent to *T.
54
55To describe the layout of the remaining composite types, we first
56define the layout of a *sequence* S of N fields with types
57t<sub>1</sub>, t<sub>2</sub>, ..., t<sub>N</sub>.
58We define the byte offset at which each field begins relative to a
59base address of 0, as well as the size and alignment of the sequence
60as follows:
61
```
offset(S, i) = 0  if i = 1
             = align(offset(S, i-1) + sizeof(t_(i-1)), alignof(t_i))
alignof(S)   = 1  if N = 0
             = max(alignof(t_i) | 1 <= i <= N)
sizeof(S)    = 0  if N = 0
             = align(offset(S, N) + sizeof(t_N), alignof(S))
```
70
71Where sizeof(T) and alignof(T) are the size and alignment of type T,
72respectively, and align(x, y) rounds x up to a multiple of y.
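
As a rough illustration of the sequence rules above, the following Go sketch computes field offsets, size, and alignment from a list of (size, alignment) pairs. The names `field`, `alignUp`, and `layout` are invented for this example and are not part of the compiler or runtime.

```go
package main

import "fmt"

// field describes one element of a sequence by its size and alignment.
// (Hypothetical helper type, used only for this illustration.)
type field struct{ size, align uintptr }

// alignUp rounds x up to a multiple of a, like align(x, y) in the text.
func alignUp(x, a uintptr) uintptr { return (x + a - 1) / a * a }

// layout applies the offset/alignof/sizeof definitions to a sequence.
func layout(fields []field) (offsets []uintptr, size, align uintptr) {
	align = 1         // alignof(S) = 1 if N = 0
	var end uintptr   // offset just past the previous field
	for _, f := range fields {
		off := alignUp(end, f.align)
		offsets = append(offsets, off)
		end = off + f.size
		if f.align > align {
			align = f.align
		}
	}
	size = alignUp(end, align) // 0 when the sequence is empty
	return offsets, size, align
}

func main() {
	// uint8, uint64, uint16 using the 64-bit sizes and alignments.
	offsets, size, align := layout([]field{{1, 1}, {8, 8}, {2, 2}})
	fmt.Println(offsets, size, align) // prints "[0 8 16] 24 8"
}
```

The trailing `alignUp` in `layout` is what rounds the sequence size up to a multiple of its alignment, which is why the three-field example occupies 24 bytes rather than 18.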
73
74The `interface{}` type is a sequence of 1. a pointer to the runtime type
75description for the interface's dynamic type and 2. an `unsafe.Pointer`
76data field.
77Any other interface type (besides the empty interface) is a sequence
78of 1. a pointer to the runtime "itab" that gives the method pointers and
79the type of the data field and 2. an `unsafe.Pointer` data field.
80An interface can be "direct" or "indirect" depending on the dynamic
81type: a direct interface stores the value directly in the data field,
82and an indirect interface stores a pointer to the value in the data
83field.
84An interface can only be direct if the value consists of a single
85pointer word.
86
87An array type `[N]T` is a sequence of N fields of type T.
88
89The slice type `[]T` is a sequence of a `*[cap]T` pointer to the slice
90backing store, an `int` giving the `len` of the slice, and an `int`
91giving the `cap` of the slice.
92
93The `string` type is a sequence of a `*[len]byte` pointer to the
94string backing store, and an `int` giving the `len` of the string.
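
The word counts implied by these definitions can be checked with `unsafe.Sizeof`. The sketch below is only an observation of what the current gc toolchain does, not a guarantee of the language; `word` and `headerWords` are names made up for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// word is the pointer size on the platform this program runs on.
const word = unsafe.Sizeof(uintptr(0))

// headerWords reports how many pointer-sized words each type occupies.
func headerWords() map[string]uintptr {
	var (
		str   string
		slice []int
		iface interface{}
		fn    func()
		m     map[int]int
		ch    chan int
	)
	return map[string]uintptr{
		"string":    unsafe.Sizeof(str) / word,   // pointer + len
		"slice":     unsafe.Sizeof(slice) / word, // pointer + len + cap
		"interface": unsafe.Sizeof(iface) / word, // type pointer + data
		"func":      unsafe.Sizeof(fn) / word,    // laid out like *T
		"map":       unsafe.Sizeof(m) / word,     // laid out like *T
		"chan":      unsafe.Sizeof(ch) / word,    // laid out like *T
	}
}

func main() {
	fmt.Println(headerWords())
}
```

Under the layout described here, strings and interfaces should report two words, slices three, and `func`, `map`, and `chan` values one.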
95
96A struct type `struct { f1 t1; ...; fM tM }` is laid out as the
97sequence t1, ..., tM, tP, where tP is either:
98
- Type `byte` if sizeof(tM) = 0 and any of sizeof(t*i*) ≠ 0.
- Empty (size 0 and align 1) otherwise.

The padding byte prevents creating a past-the-end pointer by taking
the address of the final, empty fM field.
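
The effect of the padding byte is visible in struct sizes. The following sketch assumes the gc compiler and uses made-up type names; it compares a struct whose empty field comes first with one whose empty field comes last.

```go
package main

import (
	"fmt"
	"unsafe"
)

// emptyFirst has its zero-sized field first, so no padding byte is needed.
type emptyFirst struct {
	e struct{}
	x int32
}

// emptyLast ends in a zero-sized field after a non-empty one, so the
// layout rule appends a padding byte, which is then rounded up to the
// struct's alignment.
type emptyLast struct {
	x int32
	e struct{}
}

// allEmpty has only zero-sized fields, so tP is empty as well.
type allEmpty struct {
	a, b struct{}
}

// sizes returns the sizes of the three example structs.
func sizes() (first, last, empty uintptr) {
	return unsafe.Sizeof(emptyFirst{}), unsafe.Sizeof(emptyLast{}), unsafe.Sizeof(allEmpty{})
}

func main() {
	fmt.Println(sizes())
}
```

With the rule as stated, `emptyLast` is expected to be 8 bytes (4 for `x`, 1 padding byte, rounded up to a multiple of 4), while `emptyFirst` stays at 4 bytes and `allEmpty` at 0.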
104
105Note that user-written assembly code should generally not depend on Go
106type layout and should instead use the constants defined in
107[`go_asm.h`](/doc/asm.html#data-offsets).
108
## Function call argument and result passing
110
111Function calls pass arguments and results using a combination of the
112stack and machine registers.
113Each argument or result is passed either entirely in registers or
114entirely on the stack.
115Because access to registers is generally faster than access to the
116stack, arguments and results are preferentially passed in registers.
117However, any argument or result that contains a non-trivial array or
118does not fit entirely in the remaining available registers is passed
119on the stack.
120
121Each architecture defines a sequence of integer registers and a
122sequence of floating-point registers.
123At a high level, arguments and results are recursively broken down
124into values of base types and these base values are assigned to
125registers from these sequences.
126
127Arguments and results can share the same registers, but do not share
128the same stack space.
129Beyond the arguments and results passed on the stack, the caller also
130reserves spill space on the stack for all register-based arguments
131(but does not populate this space).
132
133The receiver, arguments, and results of function or method F are
134assigned to registers or the stack using the following algorithm:
135
1. Let NI and NFP be the lengths of the integer and floating-point
   register sequences defined by the architecture.
   Let I and FP be 0; these are the indexes of the next integer and
   floating-point register.
   Let S, the type sequence defining the stack frame, be empty.
1. If F is a method, assign F’s receiver.
1. For each argument A of F, assign A.
1. Add a pointer-alignment field to S. This has size 0 and the same
   alignment as `uintptr`.
1. Reset I and FP to 0.
1. For each result R of F, assign R.
1. Add a pointer-alignment field to S.
1. For each register-assigned receiver and argument of F, let T be its
   type and add T to the stack sequence S.
   This is the argument's (or receiver's) spill space and will be
   uninitialized at the call.
1. Add a pointer-alignment field to S.
153
154Assigning a receiver, argument, or result V of underlying type T works
155as follows:
156
1. Remember I and FP.
1. If T has zero size, add T to the stack sequence S and return.
1. Try to register-assign V.
1. If step 3 failed, reset I and FP to the values from step 1, add T
   to the stack sequence S, and assign V to this field in S.
162
163Register-assignment of a value V of underlying type T works as follows:
164
1. If T is a boolean or integral type that fits in an integer
   register, assign V to register I and increment I.
1. If T is an integral type that fits in two integer registers, assign
   the least significant and most significant halves of V to registers
   I and I+1, respectively, and increment I by 2.
1. If T is a floating-point type and can be represented without loss
   of precision in a floating-point register, assign V to register FP
   and increment FP.
1. If T is a complex type, recursively register-assign its real and
   imaginary parts.
1. If T is a pointer type, map type, channel type, or function type,
   assign V to register I and increment I.
1. If T is a string type, interface type, or slice type, recursively
   register-assign V’s components (2 for strings and interfaces, 3 for
   slices).
1. If T is a struct type, recursively register-assign each field of V.
1. If T is an array type of length 0, do nothing.
1. If T is an array type of length 1, recursively register-assign its
   one element.
1. If T is an array type of length > 1, fail.
1. If I > NI or FP > NFP, fail.
1. If any recursive assignment above fails, fail.
187
188The above algorithm produces an assignment of each receiver, argument,
189and result to registers or to a field in the stack sequence.
190The final stack sequence looks like: stack-assigned receiver,
191stack-assigned arguments, pointer-alignment, stack-assigned results,
192pointer-alignment, spill space for each register-assigned argument,
193pointer-alignment.
194The following diagram shows what this stack frame looks like on the
195stack, using the typical convention where address 0 is at the bottom:
196
    +------------------------------+
    |             . . .            |
    | 2nd reg argument spill space |
    | 1st reg argument spill space |
    | <pointer-sized alignment>    |
    |             . . .            |
    | 2nd stack-assigned result    |
    | 1st stack-assigned result    |
    | <pointer-sized alignment>    |
    |             . . .            |
    | 2nd stack-assigned argument  |
    | 1st stack-assigned argument  |
    | stack-assigned receiver      |
    +------------------------------+ ↓ lower addresses
211
212To perform a call, the caller reserves space starting at the lowest
213address in its stack frame for the call stack frame, stores arguments
214in the registers and argument stack fields determined by the above
215algorithm, and performs the call.
216At the time of a call, spill space, result stack fields, and result
217registers are left uninitialized.
218Upon return, the callee must have stored results to all result
219registers and result stack fields determined by the above algorithm.
220
221There are no callee-save registers, so a call may overwrite any
222register that doesn’t have a fixed meaning, including argument
223registers.
224
### Example
226
227Consider the function `func f(a1 uint8, a2 [2]uintptr, a3 uint8) (r1
228struct { x uintptr; y [2]uintptr }, r2 string)` on a 64-bit
229architecture with hypothetical integer registers R0–R9.
230
231On entry, `a1` is assigned to `R0`, `a3` is assigned to `R1` and the
232stack frame is laid out in the following sequence:
233
    a2      [2]uintptr
    r1.x    uintptr
    r1.y    [2]uintptr
    a1Spill uint8
    a3Spill uint8
    _       [6]uint8  // alignment padding
240
241In the stack frame, only the `a2` field is initialized on entry; the
242rest of the frame is left uninitialized.
243
244On exit, `r2.base` is assigned to `R0`, `r2.len` is assigned to `R1`,
245and `r1.x` and `r1.y` are initialized in the stack frame.
246
247There are several things to note in this example.
248First, `a2` and `r1` are stack-assigned because they contain arrays.
249The other arguments and results are register-assigned.
250Result `r2` is decomposed into its components, which are individually
251register-assigned.
252On the stack, the stack-assigned arguments appear at lower addresses
253than the stack-assigned results, which appear at lower addresses than
254the argument spill area.
255Only arguments, not results, are assigned a spill area on the stack.
256
### Rationale
258
259Each base value is assigned to its own register to optimize
260construction and access.
261An alternative would be to pack multiple sub-word values into
262registers, or to simply map an argument's in-memory layout to
263registers (this is common in C ABIs), but this typically adds cost to
264pack and unpack these values.
265Modern architectures have more than enough registers to pass all
266arguments and results this way for nearly all functions (see the
267appendix), so there’s little downside to spreading base values across
268registers.
269
270Arguments that can’t be fully assigned to registers are passed
271entirely on the stack in case the callee takes the address of that
272argument.
273If an argument could be split across the stack and registers and the
274callee took its address, it would need to be reconstructed in memory,
275a process that would be proportional to the size of the argument.
276
277Non-trivial arrays are always passed on the stack because indexing
278into an array typically requires a computed offset, which generally
279isn’t possible with registers.
280Arrays in general are rare in function signatures (only 0.7% of
281functions in the Go 1.15 standard library and 0.2% in kubelet).
282We considered allowing array fields to be passed on the stack while
283the rest of an argument’s fields are passed in registers, but this
284creates the same problems as other large structs if the callee takes
285the address of an argument, and would benefit <0.1% of functions in
286kubelet (and even these very little).
287
288We make exceptions for 0 and 1-element arrays because these don’t
289require computed offsets, and 1-element arrays are already decomposed
290in the compiler’s SSA representation.
291
292The ABI assignment algorithm above is equivalent to Go’s stack-based
293ABI0 calling convention if there are zero architecture registers.
294This is intended to ease the transition to the register-based internal
295ABI and make it easy for the compiler to generate either calling
296convention.
297An architecture may still define register meanings that aren’t
298compatible with ABI0, but these differences should be easy to account
299for in the compiler.
300
301The assignment algorithm assigns zero-sized values to the stack
302(assignment step 2) in order to support ABI0-equivalence.
303While these values take no space themselves, they do result in
304alignment padding on the stack in ABI0.
305Without this step, the internal ABI would register-assign zero-sized
306values even on architectures that provide no argument registers
307because they don't consume any registers, and hence not add alignment
308padding to the stack.
309
310The algorithm reserves spill space for arguments in the caller’s frame
311so that the compiler can generate a stack growth path that spills into
312this reserved space.
313If the callee has to grow the stack, it may not be able to reserve
314enough additional stack space in its own frame to spill these, which
315is why it’s important that the caller do so.
316These slots also act as the home location if these arguments need to
317be spilled for any other reason, which simplifies traceback printing.
318
319There are several options for how to lay out the argument spill space.
320We chose to lay out each argument according to its type's usual memory
321layout but to separate the spill space from the regular argument
322space.
323Using the usual memory layout simplifies the compiler because it
324already understands this layout.
325Also, if a function takes the address of a register-assigned argument,
326the compiler must spill that argument to memory in its usual memory
327layout and it's more convenient to use the argument spill space for
328this purpose.
329
330Alternatively, the spill space could be structured around argument
331registers.
332In this approach, the stack growth spill path would spill each
333argument register to a register-sized stack word.
334However, if the function takes the address of a register-assigned
335argument, the compiler would have to reconstruct it in memory layout
336elsewhere on the stack.
337
338The spill space could also be interleaved with the stack-assigned
339arguments so the arguments appear in order whether they are register-
340or stack-assigned.
341This would be close to ABI0, except that register-assigned arguments
342would be uninitialized on the stack and there's no need to reserve
343stack space for register-assigned results.
344We expect separating the spill space to perform better because of
345memory locality.
346Separating the space is also potentially simpler for `reflect` calls
347because this allows `reflect` to summarize the spill space as a single
348number.
349Finally, the long-term intent is to remove reserved spill slots
350entirely – allowing most functions to be called without any stack
351setup and easing the introduction of callee-save registers – and
352separating the spill space makes that transition easier.
353
## Closures
355
356A func value (e.g., `var x func()`) is a pointer to a closure object.
357A closure object begins with a pointer-sized program counter
358representing the entry point of the function, followed by zero or more
359bytes containing the closed-over environment.
360
361Closure calls follow the same conventions as static function and
362method calls, with one addition. Each architecture specifies a
363*closure context pointer* register and calls to closures store the
364address of the closure object in the closure context pointer register
365prior to the call.
366
## Software floating-point mode
368
369In "softfloat" mode, the ABI simply treats the hardware as having zero
370floating-point registers.
371As a result, any arguments containing floating-point values will be
372passed on the stack.
373
374*Rationale*: Softfloat mode is about compatibility over performance
375and is not commonly used.
376Hence, we keep the ABI as simple as possible in this case, rather than
377adding additional rules for passing floating-point values in integer
378registers.
379
## Architecture specifics
381
382This section describes per-architecture register mappings, as well as
383other per-architecture special cases.
384
### amd64 architecture
386
387The amd64 architecture uses the following sequence of 9 registers for
388integer arguments and results:
389
    RAX, RBX, RCX, RDI, RSI, R8, R9, R10, R11
391
392It uses X0 – X14 for floating-point arguments and results.
393
394*Rationale*: These sequences are chosen from the available registers
395to be relatively easy to remember.
396
397Registers R12 and R13 are permanent scratch registers.
398R15 is a scratch register except in dynamically linked binaries.
399
400*Rationale*: Some operations such as stack growth and reflection calls
401need dedicated scratch registers in order to manipulate call frames
402without corrupting arguments or results.
403
404Special-purpose registers are as follows:
405
| Register | Call meaning | Return meaning | Body meaning |
| --- | --- | --- | --- |
| RSP | Stack pointer | Same | Same |
| RBP | Frame pointer | Same | Same |
| RDX | Closure context pointer | Scratch | Scratch |
| R12 | Scratch | Scratch | Scratch |
| R13 | Scratch | Scratch | Scratch |
| R14 | Current goroutine | Same | Same |
| R15 | GOT reference temporary if dynlink | Same | Same |
| X15 | Zero value (*) | Same | Scratch |
416
417(*) Except on Plan 9, where X15 is a scratch register because SSE
418registers cannot be used in note handlers (so the compiler avoids
419using them except when absolutely necessary).
420
421*Rationale*: These register meanings are compatible with Go’s
422stack-based calling convention except for R14 and X15, which will have
423to be restored on transitions from ABI0 code to ABIInternal code.
424In ABI0, these are undefined, so transitions from ABIInternal to ABI0
425can ignore these registers.
426
427*Rationale*: For the current goroutine pointer, we chose a register
428that requires an additional REX byte.
429While this adds one byte to every function prologue, it is hardly ever
430accessed outside the function prologue and we expect making more
431single-byte registers available to be a net win.
432
433*Rationale*: We could allow R14 (the current goroutine pointer) to be
434a scratch register in function bodies because it can always be
435restored from TLS on amd64.
436However, we designate it as a fixed register for simplicity and for
437consistency with other architectures that may not have a copy of the
438current goroutine pointer in TLS.
439
440*Rationale*: We designate X15 as a fixed zero register because
441functions often have to bulk zero their stack frames, and this is more
442efficient with a designated zero register.
443
444*Implementation note*: Registers with fixed meaning at calls but not
445in function bodies must be initialized by "injected" calls such as
446signal-based panics.
447
#### Stack layout
449
450The stack pointer, RSP, grows down and is always aligned to 8 bytes.
451
452The amd64 architecture does not use a link register.
453
454A function's stack frame is laid out as follows:
455
    +------------------------------+
    | return PC                    |
    | RBP on entry                 |
    | ... locals ...               |
    | ... outgoing arguments ...   |
    +------------------------------+ ↓ lower addresses
462
463The "return PC" is pushed as part of the standard amd64 `CALL`
464operation.
465On entry, a function subtracts from RSP to open its stack frame and
466saves the value of RBP directly below the return PC.
467A leaf function that does not require any stack space may omit the
468saved RBP.
469
470The Go ABI's use of RBP as a frame pointer register is compatible with
471amd64 platform conventions so that Go can inter-operate with platform
472debuggers and profilers.
473
#### Flags
475
476The direction flag (D) is always cleared (set to the “forward”
477direction) at a call.
478The arithmetic status flags are treated like scratch registers and not
479preserved across calls.
480All other bits in RFLAGS are system flags.
481
482At function calls and returns, the CPU is in x87 mode (not MMX
483technology mode).
484
485*Rationale*: Go on amd64 does not use either the x87 registers or MMX
486registers. Hence, we follow the SysV platform conventions in order to
487simplify transitions to and from the C ABI.
488
489At calls, the MXCSR control bits are always set as follows:
490
| Flag | Bit | Value | Meaning |
| --- | --- | --- | --- |
| FZ | 15 | 0 | Do not flush to zero |
| RC | 14/13 | 0 (RN) | Round to nearest |
| PM | 12 | 1 | Precision masked |
| UM | 11 | 1 | Underflow masked |
| OM | 10 | 1 | Overflow masked |
| ZM | 9 | 1 | Divide-by-zero masked |
| DM | 8 | 1 | Denormal operations masked |
| IM | 7 | 1 | Invalid operations masked |
| DAZ | 6 | 0 | Do not zero de-normals |
502
503The MXCSR status bits are callee-save.
504
505*Rationale*: Having a fixed MXCSR control configuration allows Go
506functions to use SSE operations without modifying or saving the MXCSR.
507Functions are allowed to modify it between calls (as long as they
508restore it), but as of this writing Go code never does.
509The above fixed configuration matches the process initialization
510control bits specified by the ELF AMD64 ABI.
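
To make the table concrete, the following sketch assembles the control bits into a single MXCSR value. The constant and function names are invented for this example; only the bit positions come from the table.

```go
package main

import "fmt"

// Bit positions of the MXCSR control fields, as listed in the table.
const (
	bitDAZ = 6
	bitIM  = 7
	bitDM  = 8
	bitZM  = 9
	bitOM  = 10
	bitUM  = 11
	bitPM  = 12
	bitRC  = 13 // two-bit field occupying bits 13 and 14
	bitFZ  = 15
)

// mxcsrControl returns the control-bit pattern that holds at calls:
// all six exception masks set, and DAZ, RC, and FZ left at zero.
func mxcsrControl() uint32 {
	var v uint32
	for _, bit := range []uint{bitIM, bitDM, bitZM, bitOM, bitUM, bitPM} {
		v |= 1 << bit
	}
	return v
}

func main() {
	fmt.Printf("%#x\n", mxcsrControl()) // prints "0x1f80"
}
```

The result, 0x1F80, is the same value the processor loads into MXCSR at reset, which is consistent with the note above that the configuration matches the process initialization state.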
511
512The x87 floating-point control word is not used by Go on amd64.
513
### arm64 architecture
515
516The arm64 architecture uses R0 – R15 for integer arguments and results.
517
518It uses F0 – F15 for floating-point arguments and results.
519
*Rationale*: 16 integer registers and 16 floating-point registers are
more than enough for passing arguments and results for practically all
functions (see Appendix). While there are more registers available,
using more registers provides little benefit. Additionally, it would add
overhead on code paths where the number of arguments is not statically
known (e.g. reflect call), and would consume more stack space when there
is only limited stack space available to fit in the nosplit limit.
527
528Registers R16 and R17 are permanent scratch registers. They are also
529used as scratch registers by the linker (Go linker and external
530linker) in trampolines.
531
532Register R18 is reserved and never used. It is reserved for the OS
533on some platforms (e.g. macOS).
534
535Registers R19 – R25 are permanent scratch registers. In addition,
536R27 is a permanent scratch register used by the assembler when
537expanding instructions.
538
539Floating-point registers F16 – F31 are also permanent scratch
540registers.
541
542Special-purpose registers are as follows:
543
| Register | Call meaning | Return meaning | Body meaning |
| --- | --- | --- | --- |
| RSP | Stack pointer | Same | Same |
| R30 | Link register | Same | Scratch (non-leaf functions) |
| R29 | Frame pointer | Same | Same |
| R28 | Current goroutine | Same | Same |
| R27 | Scratch | Scratch | Scratch |
| R26 | Closure context pointer | Scratch | Scratch |
| R18 | Reserved (not used) | Same | Same |
| ZR | Zero value | Same | Same |
554
555*Rationale*: These register meanings are compatible with Go’s
556stack-based calling convention.
557
558*Rationale*: The link register, R30, holds the function return
559address at the function entry. For functions that have frames
560(including most non-leaf functions), R30 is saved to stack in the
561function prologue and restored in the epilogue. Within the function
562body, R30 can be used as a scratch register.
563
564*Implementation note*: Registers with fixed meaning at calls but not
565in function bodies must be initialized by "injected" calls such as
566signal-based panics.
567
#### Stack layout
569
570The stack pointer, RSP, grows down and is always aligned to 16 bytes.
571
572*Rationale*: The arm64 architecture requires the stack pointer to be
57316-byte aligned.
574
575A function's stack frame, after the frame is created, is laid out as
576follows:
577
    +------------------------------+
    | ... locals ...               |
    | ... outgoing arguments ...   |
    | return PC                    | ← RSP points to
    | frame pointer on entry       |
    +------------------------------+ ↓ lower addresses
584
585The "return PC" is loaded to the link register, R30, as part of the
586arm64 `CALL` operation.
587
588On entry, a function subtracts from RSP to open its stack frame, and
589saves the values of R30 and R29 at the bottom of the frame.
590Specifically, R30 is saved at 0(RSP) and R29 is saved at -8(RSP),
591after RSP is updated.
592
593A leaf function that does not require any stack space may omit the
594saved R30 and R29.
595
The Go ABI's use of R29 as a frame pointer register is compatible with
arm64 architecture requirements so that Go can inter-operate with platform
debuggers and profilers.
599
600This stack layout is used by both register-based (ABIInternal) and
601stack-based (ABI0) calling conventions.
602
#### Flags
604
605The arithmetic status flags (NZCV) are treated like scratch registers
606and not preserved across calls.
607All other bits in PSTATE are system flags and are not modified by Go.
608
The floating-point status register (FPSR) is treated like a scratch
register and not preserved across calls.
611
612At calls, the floating-point control register (FPCR) bits are always
613set as follows:
614
| Flag | Bit | Value | Meaning |
| --- | --- | --- | --- |
| DN | 25 | 0 | Propagate NaN operands |
| FZ | 24 | 0 | Do not flush to zero |
| RC | 23/22 | 0 (RN) | Round to nearest, choose even if tied |
| IDE | 15 | 0 | Denormal operations trap disabled |
| IXE | 12 | 0 | Inexact trap disabled |
| UFE | 11 | 0 | Underflow trap disabled |
| OFE | 10 | 0 | Overflow trap disabled |
| DZE | 9 | 0 | Divide-by-zero trap disabled |
| IOE | 8 | 0 | Invalid operations trap disabled |
| NEP | 2 | 0 | Scalar operations do not affect higher elements in vector registers |
| AH | 1 | 0 | No alternate handling of de-normal inputs |
| FIZ | 0 | 0 | Do not zero de-normals |
629
630*Rationale*: Having a fixed FPCR control configuration allows Go
631functions to use floating-point and vector (SIMD) operations without
632modifying or saving the FPCR.
633Functions are allowed to modify it between calls (as long as they
634restore it), but as of this writing Go code never does.
635
### loong64 architecture
637
The loong64 architecture uses R4 – R19 for integer arguments and results.

It uses F0 – F15 for floating-point arguments and results.

Registers R20 – R21, R23 – R28, R30 – R31, and F16 – F31 are permanent scratch registers.

Register R2 is reserved and never used.

Registers R20 and R21 are used by runtime.duffcopy and runtime.duffzero.
647
648Special-purpose registers used within Go generated code and Go assembly code
649are as follows:
650
| Register | Call meaning | Return meaning | Body meaning |
| --- | --- | --- | --- |
| R0 | Zero value | Same | Same |
| R1 | Link register | Link register | Scratch |
| R3 | Stack pointer | Same | Same |
| R20, R21 | Scratch | Scratch | Used by duffcopy, duffzero |
| R22 | Current goroutine | Same | Same |
| R29 | Closure context pointer | Same | Same |
| R30, R31 | Used by the assembler | Same | Same |
660
661*Rationale*: These register meanings are compatible with Go’s stack-based
662calling convention.
663
#### Stack layout
665
666The stack pointer, R3, grows down and is aligned to 8 bytes.
667
668A function's stack frame, after the frame is created, is laid out as
669follows:
670
    +------------------------------+
    | ... locals ...               |
    | ... outgoing arguments ...   |
    | return PC                    | ← R3 points to
    +------------------------------+ ↓ lower addresses
676
677This stack layout is used by both register-based (ABIInternal) and
678stack-based (ABI0) calling conventions.
679
680The "return PC" is loaded to the link register, R1, as part of the
681loong64 `JAL` operation.
682
#### Flags

All bits in CSR are system flags and are not modified by Go.
685
### ppc64 architecture
687
688The ppc64 architecture uses R3 – R10 and R14 – R17 for integer arguments
689and results.
690
691It uses F1 – F12 for floating-point arguments and results.
692
693Register R31 is a permanent scratch register in Go.
694
695Special-purpose registers used within Go generated code and Go
696assembly code are as follows:
697
| Register | Call meaning | Return meaning | Body meaning |
| --- | --- | --- | --- |
| R0 | Zero value | Same | Same |
| R1 | Stack pointer | Same | Same |
| R2 | TOC register | Same | Same |
| R11 | Closure context pointer | Scratch | Scratch |
| R12 | Function address on indirect calls | Scratch | Scratch |
| R13 | TLS pointer | Same | Same |
| R20, R21 | Scratch | Scratch | Used by duffcopy, duffzero |
| R30 | Current goroutine | Same | Same |
| R31 | Scratch | Scratch | Scratch |
| LR | Link register | Link register | Scratch |

*Rationale*: These register meanings are compatible with Go’s
stack-based calling convention.
712
713The link register, LR, holds the function return
714address at the function entry and is set to the correct return
715address before exiting the function. It is also used
716in some cases as the function address when doing an indirect call.
717
718The register R2 contains the address of the TOC (table of contents) which
719contains data or code addresses used when generating position independent
720code. Non-Go code generated when using cgo contains TOC-relative addresses
721which depend on R2 holding a valid TOC. Go code compiled with -shared or
722-dynlink initializes and maintains R2 and uses it in some cases for
723function calls; Go code compiled without these options does not modify R2.
724
When making a function call, R12 contains the function address for use by the
code that generates R2 at the beginning of the function. R12 can be used for
other purposes within the body of the function, such as trampoline generation.
728
R20 and R21 are used in duffcopy and duffzero, which could be generated
before arguments are saved, so they should not be used for register arguments.
731
732The Count register CTR can be used as the call target for some branch instructions.
733It holds the return address when preemption has occurred.
734
On PPC64, loading a float32 converts it to a float64 in the register. This
differs from other platforms, and the internal implementation of reflection
must account for it so that float32 arguments are passed correctly.

Registers R18 - R29 and F13 - F31 are considered scratch registers.

#### Stack layout

The stack pointer, R1, grows down and is aligned to 8 bytes in Go, but the
alignment is changed to 16 bytes when calling cgo.

A function's stack frame, after the frame is created, is laid out as
follows:

    +------------------------------+
    |      ... locals ...          |
    |  ... outgoing arguments ...  |
    |  24  TOC register R2 save    | When compiled with -shared/-dynlink
    |  16  Unused in Go            | Not used in Go
    |   8  CR save                 | nonvolatile CR fields
    |   0  return PC               | ← R1 points to
    +------------------------------+ ↓ lower addresses

The "return PC" is loaded to the link register, LR, as part of the
ppc64 `BL` operation.

On entry to a non-leaf function, the stack frame size is subtracted from R1 to
create its stack frame, and the value of LR is saved at the bottom of the frame.

A leaf function that does not require any stack space does not modify R1 and
does not save LR.
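
The fixed slots in the frame diagram and the alignment rule can be written down as a small Go sketch. The constant names, `alignUp`, and `frameSize` are hypothetical, and the size computation is a simplified model (fixed slots plus outgoing arguments plus locals, rounded up to the stack alignment), not the compiler's actual algorithm.

```go
package main

import "fmt"

// Offsets from R1 of the fixed slots at the bottom of a ppc64 frame,
// taken from the layout diagram. The names are hypothetical.
const (
	offReturnPC = 0  // saved LR (return PC)
	offCRSave   = 8  // nonvolatile CR fields
	offUnused   = 16 // not used in Go
	offTOCSave  = 24 // R2 save, when compiled with -shared/-dynlink
	fixedSize   = 32 // first byte above the fixed slots (24 + 8)
)

// alignUp rounds n up to a multiple of align, which must be a power of two.
func alignUp(n, align int64) int64 {
	return (n + align - 1) &^ (align - 1)
}

// frameSize is a simplified model of a non-leaf frame: the fixed slots,
// the outgoing argument area, and the locals, rounded to the alignment.
func frameSize(outArgs, locals, align int64) int64 {
	return alignUp(fixedSize+outArgs+locals, align)
}

func main() {
	fmt.Println(frameSize(16, 8, 8))  // Go alignment: 56
	fmt.Println(frameSize(16, 8, 16)) // alignment when calling cgo: 64
}
```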

*NOTE*: We might need to save the frame pointer on the stack as
in the PPC64 ELF v2 ABI so Go can interoperate with platform debuggers
and profilers.

This stack layout is used by both register-based (ABIInternal) and
stack-based (ABI0) calling conventions.

#### Flags

The condition register consists of 8 condition code register fields,
CR0-CR7. Go-generated code only sets and uses CR0, which is commonly set by
compare instructions and used to determine the target of a conditional
branch. The generated code does not set or use CR1-CR7.

The floating point status and control register (FPSCR) is initialized
to 0 by the kernel at startup of the Go program and is not changed by
Go-generated code.

### riscv64 architecture

The riscv64 architecture uses X10 – X17, X8, X9, X18 – X23 for integer arguments
and results.

It uses F10 – F17, F8, F9, F18 – F23 for floating-point arguments and results.

Special-purpose registers used within Go generated code and Go
assembly code are as follows:

| Register | Call meaning | Return meaning | Body meaning |
| --- | --- | --- | --- |
| X0 | Zero value | Same | Same |
| X1 | Link register | Link register | Scratch |
| X2 | Stack pointer | Same | Same |
| X3 | Global pointer | Same | Used by dynamic linker |
| X4 | TLS (thread pointer) | TLS | Scratch |
| X24,X25 | Scratch | Scratch | Used by duffcopy, duffzero |
| X26 | Closure context pointer | Scratch | Scratch |
| X27 | Current goroutine | Same | Same |
| X31 | Scratch | Scratch | Scratch |

*Rationale*: These register meanings are compatible with Go’s
stack-based calling convention. The context register will change from X20 to X26,
and the duffcopy and duffzero registers will change to X24 and X25, before this register ABI is adopted.
X10 – X17, X8, X9, X18 – X23 follow the same order as A0 – A7, S0 – S7 in the platform ABI.
F10 – F17, F8, F9, F18 – F23 follow the same order as FA0 – FA7, FS0 – FS7 in the platform ABI.
X8 – X23 and F8 – F15 are used for compressed instructions (RVC), which will benefit code size in the future.
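
The integer register sequence and its correspondence to the platform ABI names can be spelled out with a short Go sketch. The function names `intArgRegs` and `platformName` are hypothetical; the ordering itself is the one stated above.

```go
package main

import "fmt"

// intArgRegs returns the X register numbers used for integer arguments
// and results, in assignment order: X10–X17, X8, X9, X18–X23.
func intArgRegs() []int {
	var regs []int
	for r := 10; r <= 17; r++ {
		regs = append(regs, r)
	}
	regs = append(regs, 8, 9)
	for r := 18; r <= 23; r++ {
		regs = append(regs, r)
	}
	return regs
}

// platformName returns the platform ABI name of the i'th register in the
// sequence: A0–A7 for the first eight, then S0–S7.
func platformName(i int) string {
	if i < 8 {
		return fmt.Sprintf("A%d", i)
	}
	return fmt.Sprintf("S%d", i-8)
}

func main() {
	for i, r := range intArgRegs() {
		fmt.Printf("slot %2d: X%-2d (%s)\n", i, r, platformName(i))
	}
}
```

The floating-point sequence F10 – F17, F8, F9, F18 – F23 follows the same pattern with the FA and FS names.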

#### Stack layout

The stack pointer, X2, grows down and is aligned to 8 bytes.

A function's stack frame, after the frame is created, is laid out as
follows:

    +------------------------------+
    |      ... locals ...          |
    |  ... outgoing arguments ...  |
    |  return PC                   | ← X2 points to
    +------------------------------+ ↓ lower addresses

The "return PC" is loaded to the link register, X1, as part of the
riscv64 `CALL` operation.

#### Flags

The riscv64 architecture has the Zicsr extension for control and status
registers (CSRs), which are treated as scratch registers.
All bits in the CSRs are system flags and are not modified by Go.

## Future directions

### Spill path improvements

The ABI currently reserves spill space for argument registers so the
compiler can statically generate an argument spill path before calling
into `runtime.morestack` to grow the stack.
This ensures there will be sufficient spill space even when the stack
is nearly exhausted and keeps stack growth and stack scanning
essentially unchanged from ABI0.

However, this wastes stack space (the median wastage is 16 bytes per
call), resulting in larger stacks and increased cache footprint.
A better approach would be to reserve stack space only when spilling.
One way to ensure enough space is available to spill would be for
every function to ensure there is enough space for the function's own
frame *as well as* the spill space of all functions it calls.
For most functions, this would change the threshold for the prologue
stack growth check.
For `nosplit` functions, this would change the threshold used in the
linker's static stack size check.
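
As a rough illustration of the proposed threshold change, the following Go sketch computes how many bytes a function's stack check would need to guarantee. The `funcInfo` type and `stackCheckSize` function are hypothetical, not compiler data structures. Taking the maximum callee spill size rather than the sum is an assumption of this sketch, made on the reasoning that only one callee is active at a time.

```go
package main

import "fmt"

// funcInfo is a hypothetical, simplified summary of a function.
type funcInfo struct {
	frameSize int // bytes in the function's own frame
	spillSize int // bytes needed to spill its register arguments
	callees   []*funcInfo
}

// stackCheckSize returns the bytes the function's stack check would have
// to guarantee under the proposed scheme: its own frame plus the largest
// spill area needed by any function it calls directly.
func stackCheckSize(f *funcInfo) int {
	maxSpill := 0
	for _, c := range f.callees {
		if c.spillSize > maxSpill {
			maxSpill = c.spillSize
		}
	}
	return f.frameSize + maxSpill
}

func main() {
	leafA := &funcInfo{frameSize: 32, spillSize: 16}
	leafB := &funcInfo{frameSize: 48, spillSize: 40}
	caller := &funcInfo{frameSize: 64, spillSize: 8, callees: []*funcInfo{leafA, leafB}}
	fmt.Println(stackCheckSize(caller)) // 64 + max(16, 40) = 104
	fmt.Println(stackCheckSize(leafA))  // no callees: 32
}
```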

Allocating spill space in the callee rather than the caller may also
allow for faster reflection calls in the common case where a function
takes only register arguments, since it would allow reflection to make
these calls directly without allocating any frame.

The statically generated spill path also increases code size.
It is possible to instead have a generic spill path in the runtime, as
part of `morestack`.
However, this complicates reserving the spill space, since spilling
all possible register arguments would, in most cases, take
significantly more space than spilling only those used by a particular
function.
Some options are to spill to a temporary space and copy back only the
registers used by the function, or to grow the stack if necessary
before spilling to it (using a temporary space if necessary), or to
use a heap-allocated space if insufficient stack space is available.
These options all add enough complexity that we will have to make this
decision based on the actual code size growth caused by the static
spill paths.

### Clobber sets

As defined, the ABI does not use callee-save registers.
This significantly simplifies the garbage collector and the compiler's
register allocator, but at some performance cost.
A potentially better balance for Go code would be to use *clobber
sets*: for each function, the compiler records the set of registers it
clobbers (including those clobbered by functions it calls), and any
register not clobbered by function F can remain live across calls to
F.

This is generally a good fit for Go because Go's package DAG allows
function metadata like the clobber set to flow up the call graph, even
across package boundaries.
Clobber sets would require relatively little change to the garbage
collector, unlike general callee-save registers.
One disadvantage of clobber sets over callee-save registers is that
they don't help with indirect function calls or interface method
calls, since static information isn't available in these cases.
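
A minimal sketch of how clobber sets could flow up a call graph is shown below. The `regSet` and `fn` types and both functions are hypothetical. The sketch assumes a call graph with only static calls and no recursion; a real implementation would have to handle recursive functions and treat indirect calls conservatively.

```go
package main

import "fmt"

// regSet is a bitmask of registers; bit i stands for register i.
type regSet uint32

// fn is a hypothetical call-graph node.
type fn struct {
	direct  regSet // registers written by the function body itself
	callees []*fn  // statically known callees
}

// clobberSet returns the registers clobbered by f, including those
// clobbered by the functions it calls. Results are memoized so each
// function is summarized once. Assumes the call graph is acyclic.
func clobberSet(f *fn, memo map[*fn]regSet) regSet {
	if s, ok := memo[f]; ok {
		return s
	}
	s := f.direct
	for _, c := range f.callees {
		s |= clobberSet(c, memo)
	}
	memo[f] = s
	return s
}

// liveAcross reports whether register r may stay live across a call to f.
func liveAcross(r uint, f *fn, memo map[*fn]regSet) bool {
	return clobberSet(f, memo)&(regSet(1)<<r) == 0
}

func main() {
	leaf := &fn{direct: regSet(1)<<3 | regSet(1)<<5}
	mid := &fn{direct: regSet(1) << 1, callees: []*fn{leaf}}
	memo := map[*fn]regSet{}
	fmt.Println(liveAcross(2, mid, memo)) // register 2 is untouched: true
	fmt.Println(liveAcross(5, mid, memo)) // register 5 is clobbered by leaf: false
}
```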

### Large aggregates

Go encourages passing composite values by value, and this simplifies
reasoning about mutation and races.
However, this comes at a performance cost for large composite values.
It may be possible to instead transparently pass large composite
values by reference and delay copying until it is actually necessary.
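
The idea can be modeled by hand in ordinary Go. The sketch below is only an illustration of the intended semantics, written with explicit pointers; the type `big` and all three functions are hypothetical, and nothing here reflects how a compiler would actually implement the transformation.

```go
package main

import "fmt"

// big stands in for a large composite value; its size is arbitrary.
type big struct {
	data [1024]int
}

// first is what the programmer writes today: b is passed by value, so the
// callee conceptually receives its own copy even though it only reads it.
func first(b big) int {
	return b.data[0]
}

// firstByRef models the hypothetical lowered form of first: the callee
// only reads its argument, so no copy is ever needed.
func firstByRef(b *big) int {
	return b.data[0]
}

// bumpedByRef models a callee that writes to its argument: the copy is
// delayed until just before the first write, so the caller's value is
// left intact.
func bumpedByRef(b *big) int {
	local := *b // the delayed copy
	local.data[0]++
	return local.data[0]
}

func main() {
	var v big
	v.data[0] = 41
	fmt.Println(first(v), firstByRef(&v), bumpedByRef(&v), v.data[0]) // 41 41 42 41
}
```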

## Appendix: Register usage analysis

To understand the impact of the above design on register usage, we
[analyzed](https://github.com/aclements/go-misc/tree/master/abi) its
effect on a large code base: cmd/kubelet from
[Kubernetes](https://github.com/kubernetes/kubernetes) at tag v1.18.8.

The following table shows the impact of different numbers of available
integer and floating-point registers on argument assignment:

```
|      |        |       |      stack args |          spills |     stack total |
| ints | floats | % fit | p50 | p95 | p99 | p50 | p95 | p99 | p50 | p95 | p99 |
|    0 |      0 |  6.3% |  32 | 152 | 256 |   0 |   0 |   0 |  32 | 152 | 256 |
|    0 |      8 |  6.4% |  32 | 152 | 256 |   0 |   0 |   0 |  32 | 152 | 256 |
|    1 |      8 | 21.3% |  24 | 144 | 248 |   8 |   8 |   8 |  32 | 152 | 256 |
|    2 |      8 | 38.9% |  16 | 128 | 224 |   8 |  16 |  16 |  24 | 136 | 240 |
|    3 |      8 | 57.0% |   0 | 120 | 224 |  16 |  24 |  24 |  24 | 136 | 240 |
|    4 |      8 | 73.0% |   0 | 120 | 216 |  16 |  32 |  32 |  24 | 136 | 232 |
|    5 |      8 | 83.3% |   0 | 112 | 216 |  16 |  40 |  40 |  24 | 136 | 232 |
|    6 |      8 | 87.5% |   0 | 112 | 208 |  16 |  48 |  48 |  24 | 136 | 232 |
|    7 |      8 | 89.8% |   0 | 112 | 208 |  16 |  48 |  56 |  24 | 136 | 232 |
|    8 |      8 | 91.3% |   0 | 112 | 200 |  16 |  56 |  64 |  24 | 136 | 232 |
|    9 |      8 | 92.1% |   0 | 112 | 192 |  16 |  56 |  72 |  24 | 136 | 232 |
|   10 |      8 | 92.6% |   0 | 104 | 192 |  16 |  56 |  72 |  24 | 136 | 232 |
|   11 |      8 | 93.1% |   0 | 104 | 184 |  16 |  56 |  80 |  24 | 128 | 232 |
|   12 |      8 | 93.4% |   0 | 104 | 176 |  16 |  56 |  88 |  24 | 128 | 232 |
|   13 |      8 | 94.0% |   0 |  88 | 176 |  16 |  56 |  96 |  24 | 128 | 232 |
|   14 |      8 | 94.4% |   0 |  80 | 152 |  16 |  64 | 104 |  24 | 128 | 232 |
|   15 |      8 | 94.6% |   0 |  80 | 152 |  16 |  64 | 112 |  24 | 128 | 232 |
|   16 |      8 | 94.9% |   0 |  16 | 152 |  16 |  64 | 112 |  24 | 128 | 232 |
|    ∞ |      8 | 99.8% |   0 |   0 |   0 |  24 | 112 | 216 |  24 | 120 | 216 |
```

The first two columns show the number of available integer and
floating-point registers.
The first row shows the results for 0 integer and 0 floating-point
registers, which is equivalent to ABI0.
We found that any reasonable number of floating-point registers has
the same effect, so we fixed it at 8 for all other rows.

The “% fit” column gives the fraction of functions where all arguments
and results are register-assigned and no arguments are passed on the
stack.
The three “stack args” columns give the median, 95th and 99th
percentile number of bytes of stack arguments.
The “spills” columns likewise summarize the number of bytes in
on-stack spill space.
And “stack total” summarizes the sum of stack arguments and on-stack
spill slots.
Note that these are three different distributions; for example,
there’s no single function that takes 0 stack argument bytes, 16 spill
bytes, and 24 total stack bytes.
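
The point about separate distributions can be checked with a small Go sketch. The per-function byte counts are made up for illustration and do not come from the kubelet analysis, and the nearest-rank percentile method is an assumption; the analysis tool may compute percentiles differently.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p'th percentile of xs (1 <= p <= 100) using the
// nearest-rank method. xs must be non-empty.
func percentile(xs []int, p int) int {
	s := append([]int(nil), xs...) // sort a copy, leave xs untouched
	sort.Ints(s)
	rank := (p*len(s) + 99) / 100 // ceil(p/100 * n) in integer arithmetic
	return s[rank-1]
}

func main() {
	// Made-up per-function byte counts, chosen only for illustration.
	stackArgs := []int{0, 0, 0, 32, 48}
	spills := []int{16, 24, 8, 0, 16}
	total := make([]int, len(stackArgs))
	for i := range total {
		total[i] = stackArgs[i] + spills[i]
	}
	// The three medians are taken over three separately sorted lists.
	fmt.Println(percentile(stackArgs, 50), percentile(spills, 50), percentile(total, 50)) // 0 16 24
}
```

In this made-up data set the medians are 0, 16, and 24, yet no single function has that combination, since 0 + 16 is not 24.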

From this, we can see that the fraction of functions that fit entirely
in registers grows very slowly once it reaches about 90%, though
curiously there is a small minority of functions that could benefit
from a huge number of registers.
Making 9 integer registers available on amd64 puts it in this realm.
We also see that the stack space required for most functions is fairly
small.
While the increasing space required for spills largely balances out
the decreasing space required for stack arguments as the number of
available registers increases, there is a general reduction in the
total stack space required with more available registers.
This does, however, suggest that eliminating spill slots in the future
would noticeably reduce stack requirements.