Apple Microarchitecture Research by Dougall Johnson

STR (register, lsl, Q)

Test 1: uops

Code:

  str q0, [x6, x7, lsl #4]
  mov x0, 0
  mov x7, 8

(no loop instructions)

1000 unrolls and 1 iteration
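
For reference, the addressing mode in the instruction under test resolves to the base register plus the index register shifted left by four, i.e. scaled by the 16-byte size of a Q register. A minimal Python sketch of that arithmetic follows, assuming x7 holds 8 as set by the mov x7, 8 line. The value of x6 is not given on this page, so the base address below is a hypothetical placeholder.

```python
def effective_address(base: int, index: int, shift: int = 4) -> int:
    """Address used by `str q0, [base, index, lsl #shift]` (64-bit wraparound)."""
    return (base + (index << shift)) & 0xFFFF_FFFF_FFFF_FFFF

X6_BASE = 0x1000  # hypothetical buffer address; the real x6 value is not shown
X7_INDEX = 8      # from `mov x7, 8`

addr = effective_address(X6_BASE, X7_INDEX)
print(hex(addr))        # 0x1080: base + 8 * 16 bytes
print(addr - X6_BASE)   # 128
```

The str form used here has no writeback, so it leaves x6 and x7 unchanged. Under that assumption every unrolled copy stores to the same 16 bytes.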

Retires: 2.000

Issues: 2.000

Integer unit issues: 1.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

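retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
2006 | 1216 | 2051 | 1033 | 1018 | 1032 | 1000 | 3000 | 17067 | 2000 | 1000 | 1000 | 1000 | 3000 | 1001 | 1000 | 1000
2004 | 1048 | 2001 | 1001 | 1000 | 1000 | 1000 | 3000 | 17067 | 2000 | 1000 | 1000 | 1000 | 3000 | 1001 | 1000 | 1000
2004 | 1048 | 2001 | 1001 | 1000 | 1000 | 1000 | 3000 | 17067 | 2000 | 1000 | 1000 | 1000 | 3000 | 1001 | 1000 | 1000
2004 | 1048 | 2001 | 1001 | 1000 | 1000 | 1000 | 3000 | 17067 | 2000 | 1000 | 1000 | 1000 | 3000 | 1001 | 1000 | 1000
2004 | 1048 | 2001 | 1001 | 1000 | 1000 | 1000 | 3000 | 17067 | 2000 | 1000 | 1000 | 1000 | 3000 | 1001 | 1000 | 1000
2004 | 1048 | 2001 | 1001 | 1000 | 1000 | 1000 | 3000 | 17067 | 2000 | 1000 | 1000 | 1000 | 3000 | 1001 | 1000 | 1000
2004 | 1048 | 2001 | 1001 | 1000 | 1000 | 1000 | 3000 | 17067 | 2000 | 1000 | 1000 | 1000 | 3000 | 1001 | 1000 | 1000
2004 | 1048 | 2001 | 1001 | 1000 | 1000 | 1000 | 3000 | 17067 | 2000 | 1000 | 1000 | 1000 | 3000 | 1001 | 1000 | 1000
2004 | 1048 | 2001 | 1001 | 1000 | 1000 | 1000 | 3000 | 17067 | 2000 | 1000 | 1000 | 1000 | 3000 | 1001 | 1000 | 1000
2004 | 1048 | 2001 | 1001 | 1000 | 1000 | 1000 | 3000 | 17067 | 2000 | 1000 | 1000 | 1000 | 3000 | 1001 | 1000 | 1000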
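
The summary figures above are per-instruction values, while the counter rows are totals over the 1000 unrolled copies. As a rough sketch (the page does not state exactly which counters feed each summary line, so the mapping here is an assumption), dividing the steady-state counter totals from the repeated rows above by the unroll count gives comparable numbers:

```python
UNROLLS = 1000  # "1000 unrolls and 1 iteration"

# Steady-state totals read from the repeated counter rows above.
steady_state = {
    "schedule int uop (53)": 1001,
    "schedule ldst uop (55)": 1000,
    "dispatch uop (78)": 2000,
    "map ldst uop inputs (80)": 3000,
}

# Normalize each total to a per-instruction figure.
per_instruction = {name: total / UNROLLS for name, total in steady_state.items()}

for name, value in per_instruction.items():
    print(f"{name}: {value:.3f}")
```

The 1.001 and 1.000 values line up with the reported integer and load/store unit issue counts, and three ldst inputs per instruction is consistent with the store reading x6, x7 and q0.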

Test 2: throughput

Count: 8

Code:

  str q0, [x6, x7, lsl #4]
  str q0, [x6, x7, lsl #4]
  str q0, [x6, x7, lsl #4]
  str q0, [x6, x7, lsl #4]
  str q0, [x6, x7, lsl #4]
  str q0, [x6, x7, lsl #4]
  str q0, [x6, x7, lsl #4]
  str q0, [x6, x7, lsl #4]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0006

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
160206 | 80216 | 160155 | 80137 | 80018 | 80138 | 80001 | 240318 | 1359998 | 0 | 160107 | 80206 | 80006 | 0 | 80206 | 240018 | 80005 | 80000 | 80100
160204 | 80047 | 160105 | 80105 | 80000 | 80106 | 80001 | 240318 | 1359998 | 0 | 160107 | 80206 | 80006 | 0 | 80206 | 240018 | 80005 | 80000 | 80100
160204 | 80046 | 160105 | 80105 | 80000 | 80106 | 80001 | 240318 | 1360034 | 0 | 160107 | 80206 | 80006 | 0 | 80206 | 240018 | 80005 | 80000 | 80100
160204 | 80046 | 160105 | 80105 | 80000 | 80106 | 80001 | 240318 | 1360034 | 0 | 160107 | 80206 | 80006 | 0 | 80206 | 240018 | 80005 | 80000 | 80100
160204 | 80046 | 160105 | 80105 | 80000 | 80106 | 80001 | 240318 | 1360034 | 0 | 160107 | 80206 | 80006 | 0 | 80206 | 240018 | 80005 | 80000 | 80100
160204 | 80046 | 160105 | 80105 | 80000 | 80106 | 80001 | 240318 | 1360034 | 0 | 160107 | 80206 | 80006 | 0 | 80206 | 240018 | 80005 | 80000 | 80100
160204 | 80046 | 160105 | 80105 | 80000 | 80106 | 80001 | 240318 | 1360034 | 0 | 160107 | 80206 | 80006 | 0 | 80206 | 240018 | 80005 | 80000 | 80100
160204 | 80046 | 160105 | 80105 | 80000 | 80106 | 80001 | 240318 | 1360034 | 0 | 160107 | 80206 | 80006 | 0 | 80206 | 240018 | 80005 | 80000 | 80100
160204 | 80046 | 160105 | 80105 | 80000 | 80106 | 80001 | 240318 | 1360034 | 0 | 160107 | 80206 | 80006 | 0 | 80206 | 240018 | 80005 | 80000 | 80100
160204 | 80046 | 160105 | 80105 | 80000 | 80106 | 80001 | 240318 | 1360034 | 0 | 160107 | 80206 | 80006 | 0 | 80206 | 240018 | 80005 | 80000 | 80100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 1.0005

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
160026 | 80245 | 160065 | 80047 | 80018 | 80048 | 80001 | 240048 | 1359886 | 160017 | 80026 | 80006 | 80026 | 240018 | 80005 | 80000 | 80010
160024 | 80040 | 160015 | 80015 | 80000 | 80016 | 80001 | 240048 | 1360392 | 160017 | 80026 | 80006 | 80020 | 240000 | 80001 | 80000 | 80010
160024 | 80040 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359919 | 160010 | 80020 | 80000 | 80020 | 240000 | 80001 | 80000 | 80010
160024 | 80040 | 160011 | 80011 | 80000 | 80010 | 80030 | 240120 | 1361078 | 160070 | 80050 | 80030 | 80020 | 240000 | 80001 | 80000 | 80010
160024 | 80040 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359919 | 160010 | 80020 | 80000 | 80020 | 240000 | 80001 | 80000 | 80010
160024 | 80040 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359919 | 160010 | 80020 | 80000 | 80020 | 240000 | 80001 | 80000 | 80010
160024 | 80040 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359919 | 160010 | 80020 | 80000 | 80020 | 240000 | 80001 | 80000 | 80010
160024 | 80040 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359919 | 160010 | 80020 | 80000 | 80020 | 240000 | 80001 | 80000 | 80010
160024 | 80040 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359919 | 160010 | 80020 | 80000 | 80020 | 240000 | 80001 | 80000 | 80010
160024 | 80040 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359919 | 160010 | 80020 | 80000 | 80020 | 240000 | 80001 | 80000 | 80010
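
The two "Result" lines in Test 2 can be reproduced from the cycle (02) column. A small sketch follows, assuming the result is the median cycle total over the ten samples divided by the total number of measured instructions (count × unrolls × iterations). The cycle values are the second field of each counter row in the two runs above.

```python
from statistics import median

COUNT = 8  # str instructions per unrolled copy ("Count: 8")

def cycles_per_instruction(cycles, unrolls, iterations, count=COUNT):
    """Median cycle total divided by the number of measured instructions."""
    return median(cycles) / (count * unrolls * iterations)

# cycle (02) samples for each run, in row order
run_100x100 = [80216, 80047] + [80046] * 8
run_1000x10 = [80245] + [80040] * 9

print(round(cycles_per_instruction(run_100x100, 100, 100), 4))  # 1.0006
print(round(cycles_per_instruction(run_1000x10, 1000, 10), 4))  # 1.0005
```

Both runs come out at roughly one cycle per store, matching the reported 1.0006 and 1.0005.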