// Copyright 2019 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*
Package ppc64 implements a PPC64 assembler that assembles Go asm into
the corresponding PPC64 instructions as defined by the Power ISA 3.0B.

This document provides information on how to write code in Go assembler
for PPC64, focusing on the differences between Go and PPC64 assembly language.
It assumes some knowledge of PPC64 assembler. The original implementation of
PPC64 in Go defined many opcodes that are different from PPC64 opcodes, but
updates to the Go assembly language used mnemonics that are mostly similar if not
identical to the PPC64 mnemonics, such as VMX and VSX instructions. Not all detail
is included here; refer to the Power ISA document if interested in more detail.

Starting with Go 1.15 the Go objdump supports the -gnu option, which provides a
side by side view of the Go assembler and the PPC64 assembler output. This is
extremely helpful in determining what final PPC64 assembly is generated from the
corresponding Go assembly.

In the examples below, the Go assembly is on the left, PPC64 assembly on the right.

1. Operand ordering

In Go asm, the last operand (right) is the target operand, but with PPC64 asm,
the first operand (left) is the target. The order of the remaining operands is
not consistent: in general opcodes with 3 operands that perform math or logical
operations have their operands in reverse order. Opcodes for vector instructions
and those with more than 3 operands usually have operands in the same order except
for the target operand, which is first in PPC64 asm and last in Go asm.

Example:

	ADD R3, R4, R5		<=>	add r5, r4, r3

2. Constant operands

In Go asm, an operand that starts with '$' indicates a constant value.
If the instruction using the constant has an immediate version of the opcode, then an
immediate value is used with the opcode if possible.

Example:

	ADD $1, R3, R4		<=>	addi r4, r3, 1

3. Opcodes setting condition codes

In PPC64 asm, some instructions other than compares have variations that can set
the condition code where meaningful. This is indicated by adding '.' to the end
of the PPC64 instruction. In Go asm, these instructions have 'CC' at the end of
the opcode. The possible settings of the condition code depend on the instruction.
CR0 is the default for fixed-point instructions; CR1 for floating point; CR6 for
vector instructions.

Example:

	ANDCC R3, R4, R5		<=>	and. r5, r3, r4 (set CR0)

4. Loads and stores from memory

In Go asm, opcodes starting with 'MOV' indicate a load or store. When the target
is a memory reference, then it is a store; when the target is a register and the
source is a memory reference, then it is a load.

MOV{B,H,W,D} variations identify the size as byte, halfword, word, doubleword.

Adding 'Z' to the opcode for a load indicates zero extend; if omitted it is sign extend.
Adding 'U' to a load or store indicates an update of the base register with the offset.
Adding 'BR' to an opcode indicates byte-reversed load or store, or the order opposite
of the expected endian order. If 'BR' is used then zero extend is assumed.

Memory references n(Ra) indicate the address in Ra + n. When used with an update form
of an opcode, the value in Ra is incremented by n.

Memory references (Ra+Rb) or (Ra)(Rb) indicate the address Ra + Rb, used by indexed
loads or stores. Both forms are accepted. When used with an update then the base register
is updated by the value in the index register.

Examples:

	MOVD (R3), R4		<=>	ld r4,0(r3)
	MOVW (R3), R4		<=>	lwa r4,0(r3)
	MOVWZU 4(R3), R4		<=>	lwzu r4,4(r3)
	MOVWZ (R3+R5), R4		<=>	lwzx r4,r3,r5
	MOVHZ (R3), R4		<=>	lhz r4,0(r3)
	MOVHU 2(R3), R4		<=>	lhau r4,2(r3)
	MOVBZ (R3), R4		<=>	lbz r4,0(r3)

	MOVD R4,(R3)		<=>	std r4,0(r3)
	MOVW R4,(R3)		<=>	stw r4,0(r3)
	MOVW R4,(R3+R5)		<=>	stwx r4,r3,r5
	MOVWU R4,4(R3)		<=>	stwu r4,4(r3)
	MOVH R4,2(R3)		<=>	sth r4,2(r3)
	MOVBU R4,(R3)(R5)		<=>	stbux r4,r3,r5

4. Compares

When an instruction does a compare or other operation that might
result in a condition code, then the resulting condition is set
in a field of the condition register. The condition register consists
of 8 4-bit fields named CR0 - CR7. When a compare instruction
identifies a CR then the resulting condition is set in that field
to be read by a later branch or isel instruction. Within these fields,
bits are set to indicate less than, greater than, or equal conditions.

Once an instruction sets a condition, then a subsequent branch, isel or
other instruction can read the condition field and operate based on the
bit settings.

Examples:

	CMP R3, R4			<=>	cmp r3, r4	(CR0 assumed)
	CMP R3, R4, CR1		<=>	cmp cr1, r3, r4

Note that the condition register is the target operand of compare opcodes, so
the remaining operands are in the same order for Go asm and PPC64 asm.
When CR0 is used then it is implicit and does not need to be specified.

5. Branches

Many branches are represented as a form of the BC instruction. There are
other extended opcodes to make it easier to see what type of branch is being
used.

The following is a brief description of the BC instruction and its commonly
used operands.

	BC op1, op2, op3

	  op1: type of branch
	      16 -> bctr (branch on ctr)
	      12 -> bcr  (branch if cr bit is set)
	      8  -> bcr+bctr (branch on ctr and cr values)
	      4  -> bcr != 0 (branch if specified cr bit is not set)

	      There are more combinations but these are the most common.

	  op2: condition register field and condition bit

	      This contains an immediate value indicating which condition field
	      to read and what bits to test. Each field is 4 bits long with CR0
	      at bit 0, CR1 at bit 4, etc. The value is computed as 4*CR+condition
	      with these condition values:

	      0 -> LT
	      1 -> GT
	      2 -> EQ
	      3 -> OVG

	      Thus 0 means test CR0 for LT, 5 means CR1 for GT, 30 means CR7 for EQ.

	  op3: branch target

Examples:

	BC 12, 0, target		<=>	blt cr0, target
	BC 12, 2, target		<=>	beq cr0, target
	BC 12, 5, target		<=>	bgt cr1, target
	BC 12, 30, target		<=>	beq cr7, target
	BC 4, 6, target		<=>	bne cr1, target
	BC 4, 5, target		<=>	ble cr1, target

The following extended opcodes are available for ease of use and readability:

	BNE CR2, target		<=>	bne cr2, target
	BEQ CR4, target		<=>	beq cr4, target
	BLT target			<=>	blt target (cr0 default)
	BGE CR7, target		<=>	bge cr7, target

Refer to the ISA for more information on additional values for the BC instruction,
how to handle OVG information, and much more.

5. Align directive

Starting with Go 1.12, Go asm supports the PCALIGN directive, which indicates
that the next instruction should be aligned to the specified value. Currently
8 and 16 are the only supported values, and a maximum of 2 NOPs will be added
to align the code. That means in the case where the code is aligned to 4 but
PCALIGN $16 is at that location, the code will only be aligned to 8 to avoid
adding 3 NOPs.
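
The padding rule above can be modeled with a short Go sketch. This is only an
illustration for reasoning about the rule as described, not the assembler's
actual code: the name nopsForPCALIGN and its parameters are hypothetical, and
it assumes the PC is already 4-byte aligned and that the requested alignment
is 8 or 16.

```go
package main

import "fmt"

// nopsForPCALIGN is a hypothetical helper that models the rule described
// above: pad to the requested alignment if that takes at most 2 NOPs,
// otherwise fall back to the next smaller alignment (down to 8 bytes).
// pc is the byte offset of the next instruction; align is 8 or 16.
func nopsForPCALIGN(pc, align int64) int64 {
	const insnSize = 4 // each NOP is one 4-byte instruction
	const maxNops = 2  // limit stated in the text above
	for a := align; a >= 8; a /= 2 {
		pad := (a - pc%a) % a // bytes needed to reach alignment a
		if n := pad / insnSize; n <= maxNops {
			return n
		}
	}
	return 0
}

func main() {
	for _, pc := range []int64{0, 4, 8, 12} {
		fmt.Printf("pc%%16 = %2d: PCALIGN $16 adds %d NOP(s)\n", pc, nopsForPCALIGN(pc, 16))
	}
}
```

With the PC at offset 4 within a 16-byte block, reaching 16-byte alignment
would take 3 NOPs, so the sketch falls back to 8-byte alignment and reports
1 NOP, matching the case described in the text.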

The purpose of this directive is to improve performance for cases like loops
where better alignment (8 or 16 instead of 4) might be helpful. This directive
exists in PPC64 assembler and is frequently used by PPC64 assembler writers.

	PCALIGN $16
	PCALIGN $8

By default, functions in Go are aligned to 16 bytes, as is the case in all
other compilers for PPC64. If there is a PCALIGN directive requesting alignment
greater than 16, then the alignment of the containing function must be
promoted to that same alignment or greater.

The behavior of PCALIGN is changed in Go 1.21 to be more straightforward to
ensure the alignment required for some instructions in power10. The acceptable
values are 8, 16, 32 and 64, and the use of those values will always provide the
specified alignment.

6. Shift instructions

The simple scalar shifts on PPC64 expect a shift count that fits in 5 bits for
32-bit values or 6 bits for 64-bit values. If the shift count is a constant value
greater than the max then the assembler sets it to the max for that size (31 for
32-bit values, 63 for 64-bit values). If the shift count is in a register, then
only the low 5 or 6 bits of the register will be used as the shift count. The
Go compiler will add appropriate code to compare the shift value to achieve the
correct result, and the assembler does not add extra checking.

Examples:

	SRAD $8,R3,R4		=>	sradi r4,r3,8
	SRD $8,R3,R4		=>	rldicl r4,r3,56,8
	SLD $8,R3,R4		=>	rldicr r4,r3,8,55
	SRAW $16,R4,R5		=>	srawi r5,r4,16
	SRW $40,R4,R5		=>	rlwinm r5,r4,0,0,31
	SLW $12,R4,R5		=>	rlwinm r5,r4,12,0,19

Some non-simple shifts have operands in the Go assembly which don't map directly
onto operands in the PPC64 assembly.
When an operand in a shift instruction in the
Go assembly is a bit mask, that mask is represented as a start and end bit in the
PPC64 assembly instead of a mask. See the ISA for more detail on these types of shifts.
Here are a few examples:

	RLWMI $7,R3,$65535,R6	=>	rlwimi r6,r3,7,16,31
	RLDMI $0,R4,$7,R6		=>	rldimi r6,r4,0,61

More recently, Go opcodes were added which map directly onto the PPC64 opcodes. It is
recommended to use the newer opcodes to avoid confusion.

	RLDICL $0,R4,$15,R6		=>	rldicl r6,r4,0,15
	RLDICR $0,R4,$15,R6		=>	rldicr r6,r4,0,15

# Register naming

1. Special register usage in Go asm

The following registers should not be modified by user Go assembler code.

	R0: Go code expects this register to contain the value 0.
	R1: Stack pointer
	R2: TOC pointer when compiled with -shared or -dynlink (a.k.a. position-independent code)
	R13: TLS pointer
	R30: g (goroutine)

Register names:

	Rn is used for general purpose registers. (0-31)
	Fn is used for floating point registers. (0-31)
	Vn is used for vector registers. Slot 0 of Vn overlaps with Fn. (0-31)
	VSn is used for vector-scalar registers. V0-V31 overlap with VS32-VS63. (0-63)
	CTR represents the count register.
	LR represents the link register.
	CR represents the condition register.
	CRn represents a condition register field. (0-7)
	CRnLT represents CR bit 0 of CR field n. (0-7)
	CRnGT represents CR bit 1 of CR field n. (0-7)
	CRnEQ represents CR bit 2 of CR field n. (0-7)
	CRnSO represents CR bit 3 of CR field n. (0-7)

# GOPPC64 >= power10 and its effects on Go asm

When GOPPC64=power10 is used to compile a Go program for ppc64le/linux, MOV*, FMOV*, and ADD
opcodes which would require 2 or more machine instructions to emulate a 32-bit constant or
symbolic reference are implemented using prefixed instructions.

A user who wishes granular control over the generated machine code is advised to use Go asm
opcodes which explicitly translate to one PPC64 machine instruction. Most common opcodes
are supported.

Some examples of how pseudo-op assembly changes with GOPPC64:

	Go asm                       GOPPC64 <= power9          GOPPC64 >= power10
	MOVD mypackage·foo(SB), R3   addis r2, r3, ...          pld r3, ...
	                             ld    r3, r3, ...

	MOVD 131072(R3), R4          addis r31, r4, 2           pld r4, 131072(r3)
	                             ld    r4, 0(R3)

	ADD $131073, R3              lis  r31, 2                paddi r3, r3, 131073
	                             addi r31, 1
	                             add  r3,r31,r3

	MOVD $131073, R3             lis  r3, 2                 pli r3, 131073
	                             addi r3, 1

	MOVD $mypackage·foo(SB), R3  addis r2, r3, ...          pla r3, ...
	                             addi  r3, r3, ...
*/
package ppc64