Apple Microarchitecture Research by Dougall Johnson

STP (pre-index, Q)

Test 1: uops

Code:

  stp q0, q1, [x6, #0x10]!
  nop ; nop ; nop ; nop ; nop ; nop ; nop

(no loop instructions)

1000 unrolls and 1 iteration

Retires (minus 7 nops): 2.000

Issues: 3.000

Integer unit issues: 1.001

Load/store unit issues: 2.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
9005 | 4270 | 3033 | 1015 | 2018 | 1014 | 2000 | 3000 | 9961 | 3000 | 2000 | 4000 | 1001 | 2000
9004 | 2415 | 3001 | 1001 | 2000 | 1000 | 2000 | 3000 | 10936 | 3000 | 2000 | 4000 | 1001 | 2000
9004 | 2316 | 3001 | 1001 | 2000 | 1000 | 2000 | 3000 | 11368 | 3000 | 2000 | 4000 | 1001 | 2000
9004 | 2307 | 3001 | 1001 | 2000 | 1000 | 2000 | 3000 | 10958 | 3000 | 2000 | 4000 | 1001 | 2000
9004 | 2307 | 3001 | 1001 | 2000 | 1000 | 2000 | 3000 | 10750 | 3000 | 2000 | 4000 | 1001 | 2000
9004 | 2316 | 3001 | 1001 | 2000 | 1000 | 2000 | 3000 | 10810 | 3000 | 2000 | 4000 | 1001 | 2000
9004 | 2321 | 3001 | 1001 | 2000 | 1000 | 2000 | 3000 | 11029 | 3000 | 2000 | 4000 | 1001 | 2000
9004 | 2315 | 3001 | 1001 | 2000 | 1000 | 2000 | 3000 | 10847 | 3000 | 2000 | 4000 | 1001 | 2000
9004 | 2311 | 3001 | 1001 | 2000 | 1000 | 2000 | 3000 | 11312 | 3000 | 2000 | 4000 | 1001 | 2000
9004 | 2313 | 3001 | 1001 | 2000 | 1000 | 2000 | 3000 | 11207 | 3000 | 2000 | 4000 | 1001 | 2000

Test 2: Latency 3->3

Code:

  stp q0, q1, [x6, #0x10]!

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0136

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
20223 | 22625 | 30499 | 10310 | 20189 | 10309 | 20003 | 35283 | 342343 | 30106 | 200 | 20010 | 200 | 40020 | 10004 | 20000 | 0 | 100
20204 | 20169 | 30103 | 10103 | 20000 | 10102 | 20001 | 35275 | 342028 | 30103 | 200 | 20008 | 200 | 40016 | 10003 | 20000 | 0 | 100
20204 | 20163 | 30103 | 10103 | 20000 | 10102 | 20001 | 35275 | 341992 | 30103 | 200 | 20008 | 200 | 40016 | 10003 | 20000 | 0 | 100
20204 | 20158 | 30103 | 10103 | 20000 | 10102 | 20001 | 35275 | 342028 | 30103 | 200 | 20008 | 200 | 40016 | 10003 | 20000 | 0 | 100
20204 | 20156 | 30103 | 10103 | 20000 | 10102 | 20001 | 35275 | 341992 | 30103 | 200 | 20008 | 200 | 40016 | 10003 | 20000 | 0 | 100
20204 | 20158 | 30103 | 10103 | 20000 | 10102 | 20001 | 35275 | 342028 | 30103 | 200 | 20008 | 200 | 40016 | 10003 | 20000 | 0 | 100
20204 | 20156 | 30103 | 10103 | 20000 | 10102 | 20001 | 35275 | 343450 | 30103 | 200 | 20008 | 200 | 40016 | 10003 | 20000 | 0 | 100
20204 | 20325 | 30103 | 10103 | 20000 | 10102 | 20001 | 35275 | 343180 | 30103 | 200 | 20008 | 921635910116541839545460
20204 | 20476 | 30102 | 10102 | 20000 | 10103 | 20001 | 35275 | 341596 | 30103 | 200 | 20008 | 200 | 40020 | 10004 | 20000 | 0 | 100
20204 | 20134 | 30103 | 10103 | 20000 | 10102 | 20001 | 35275 | 341614 | 30103 | 200 | 20008 | 200 | 40016 | 10003 | 20000 | 0 | 100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0141

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20043 | 23006 | 30407 | 10218 | 20189 | 10217 | 20003 | 35058 | 342073 | 30016 | 20 | 20010 | 20 | 40000 | 10001 | 20000 | 10
20024 | 20125 | 30011 | 10011 | 20000 | 10010 | 20000 | 35041 | 341429 | 30010 | 20 | 20000 | 20 | 40000 | 10001 | 20000 | 10
20024 | 20124 | 30011 | 10011 | 20000 | 10010 | 20000 | 35041 | 341429 | 30010 | 20 | 20000 | 20 | 40000 | 10001 | 20000 | 10
20024 | 20124 | 30011 | 10011 | 20000 | 10010 | 20000 | 35041 | 341429 | 30010 | 20 | 20000 | 20 | 40000 | 10001 | 20000 | 10
20024 | 20124 | 30011 | 10011 | 20000 | 10010 | 20000 | 35041 | 341429 | 30010 | 20 | 20000 | 20 | 40000 | 10001 | 20000 | 10
20024 | 20124 | 30011 | 10011 | 20000 | 10010 | 20000 | 35041 | 341429 | 30010 | 20 | 20000 | 20 | 40000 | 10001 | 20000 | 10
20024 | 20124 | 30011 | 10011 | 20000 | 10010 | 20000 | 35041 | 341429 | 30010 | 20 | 20000 | 20 | 40000 | 10001 | 20000 | 10
20024 | 20124 | 30011 | 10011 | 20000 | 10010 | 20000 | 35041 | 341429 | 30010 | 20 | 20000 | 20 | 40000 | 10001 | 20000 | 10
20024 | 20124 | 30011 | 10011 | 20000 | 10010 | 20000 | 35041 | 341429 | 30010 | 20 | 20000 | 20 | 40000 | 10001 | 20000 | 10
20024 | 20124 | 30011 | 10011 | 20000 | 10010 | 20000 | 35041 | 341429 | 30010 | 20 | 20000 | 20 | 40000 | 10001 | 20000 | 10

Test 3: throughput

Count: 8

Code:

  stp q0, q1, [x6, #0x10]!
  stp q0, q1, [x7, #0x10]!
  stp q0, q1, [x8, #0x10]!
  stp q0, q1, [x9, #0x10]!
  stp q0, q1, [x10, #0x10]!
  stp q0, q1, [x11, #0x10]!
  stp q0, q1, [x12, #0x10]!
  stp q0, q1, [x13, #0x10]!
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 2.0007

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
160224 | 161969 | 240527 | 80321 | 160206 | 80322 | 160001 | 240306 | 2720082 | 0 | 240103 | 200 | 160008 | 0 | 200 | 320016 | 80003 | 160000 | 100
160204 | 160051 | 240103 | 80103 | 160000 | 80102 | 160034 | 240360 | 2720516 | 0 | 240154 | 200 | 160048 | 0 | 200 | 320016 | 80003 | 160000 | 100
160204 | 160051 | 240103 | 80103 | 160000 | 80102 | 160034 | 240360 | 2720804 | 0 | 240154 | 200 | 160048 | 0 | 200 | 320016 | 80003 | 160000 | 100
160205 | 160127 | 240136 | 80119 | 160017 | 80120 | 160001 | 240306 | 2720118 | 0 | 240103 | 200 | 160008 | 0 | 200 | 320016 | 80003 | 160000 | 100
160204 | 160053 | 240103 | 80103 | 160000 | 80102 | 160001 | 240306 | 2720154 | 0 | 240103 | 200 | 160008 | 0 | 200 | 320016 | 80003 | 160000 | 100
160204 | 160053 | 240103 | 80103 | 160000 | 80102 | 160001 | 240306 | 2720154 | 0 | 240103 | 200 | 160008 | 0 | 200 | 320016 | 80003 | 160000 | 100
160204 | 160053 | 240103 | 80103 | 160000 | 80102 | 160001 | 240306 | 2720154 | 0 | 240103 | 200 | 160008 | 0 | 200 | 320016 | 80003 | 160000 | 100
160205 | 160100 | 240136 | 80119 | 160017 | 80120 | 160001 | 240306 | 2720154 | 0 | 240103 | 200 | 160008 | 0 | 200 | 320016 | 80003 | 160000 | 100
160204 | 160053 | 240103 | 80103 | 160000 | 80102 | 160001 | 240306 | 2720154 | 0 | 240103 | 200 | 160008 | 0 | 200 | 320016 | 80003 | 160000 | 100
160204 | 160053 | 240103 | 80103 | 160000 | 80102 | 160001 | 240306 | 2720154 | 0 | 240103 | 200 | 160008 | 0 | 200 | 320016 | 80003 | 160000 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 2.0006

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
160024 | 160126 | 240013 | 80013 | 160000 | 80012 | 160000 | 240030 | 2720078 | 240010 | 20 | 160000 | 20 | 320000 | 80001 | 160000 | 0 | 10
160024 | 160051 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 2720115 | 240010 | 20 | 160000 | 20 | 320000 | 80001 | 160000 | 0 | 10
160024 | 160051 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 2720115 | 240010 | 20 | 160000 | 20 | 320000 | 80001 | 160000 | 0 | 10
160024 | 160051 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 2720115 | 240010 | 20 | 160000 | 20 | 320000 | 80001 | 160000 | 0 | 10
160024 | 160051 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 2720115 | 240010 | 20 | 160000 | 20 | 320000 | 80001 | 160000 | 0 | 10
160024 | 160051 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 2720115 | 240010 | 20 | 160000 | 20 | 320000 | 80001 | 160000 | 0 | 10
160024 | 160051 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 2720115 | 240010 | 20 | 160000 | 20 | 320000 | 80001 | 160000 | 0 | 10
160025 | 160098 | 240046 | 80029 | 160017 | 80030 | 160000 | 240030 | 2719955 | 240010 | 20 | 160000 | 20 | 320000 | 80001 | 160000 | 0 | 10
160024 | 160051 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 2720115 | 240010 | 20 | 160000 | 20 | 320000 | 80001 | 160000 | 0 | 10
160024 | 160051 | 240011 | 80011 | 160000 | 80010 | 160001 | 240036 | 2720477 | 240013 | 20 | 160008 | 20 | 320016 | 80003 | 160000 | 0 | 10
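
The two Test 3 results (2.0007 and 2.0006) appear consistent with taking the median of the cycle (02) counts across the ten runs and dividing by unrolls × iterations × count. The sketch below is only an illustration of that arithmetic, not the page's actual tooling: the function name `cycles_per_instruction` is hypothetical, and it assumes the ten listed runs are the ones the published median is taken from.

```python
from statistics import median


def cycles_per_instruction(cycle_counts, unrolls, iterations, count=1):
    """Median cycle count divided by the number of measured instructions.

    Hypothetical helper: assumes the published result is the median of the
    cycle (02) counter over the listed runs, normalised per instruction.
    """
    return median(cycle_counts) / (unrolls * iterations * count)


# cycle (02) values of the ten runs with 100 unrolls and 100 iterations
cycles_100x100 = [161969, 160051, 160051, 160127, 160053,
                  160053, 160053, 160100, 160053, 160053]

# cycle (02) values of the ten runs with 1000 unrolls and 10 iterations
cycles_1000x10 = [160126, 160051, 160051, 160051, 160051,
                  160051, 160051, 160098, 160051, 160051]

# Count is 8 because each unrolled block contains eight STP instructions.
print(f"{cycles_per_instruction(cycles_100x100, 100, 100, count=8):.4f}")
print(f"{cycles_per_instruction(cycles_1000x10, 1000, 10, count=8):.4f}")
```

Using the median rather than the mean keeps the slow first run of each configuration (161969 and 160126 cycles) from skewing the per-instruction figure.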