Apple Microarchitecture Research by Dougall Johnson


TBNZ (taken)

Test 1: uops

Code:

  tbnz x0, #1, .+4

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

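retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 4942 | 3326 | 3326 | 4342 | 11613 | 3871 | 4452 | 1148 | 1
1004 | 631 | 1009 | 1009 | 1012 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 612 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 612 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 612 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 612 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 612 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 612 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 612 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 612 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1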

Test 2: throughput

Count: 8

Code:

  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  mov x0, 2

(fused SUBS/B.cc loop)
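
For readers less familiar with the instruction, the sketch below models what `tbnz x0, #1, .+4` does, following the architectural definition of TBNZ (test a single bit, branch if it is nonzero). It is an illustrative Python model, not part of the measurement harness; the function name `tbnz` and the example PC value `0x1000` are hypothetical. With `x0 = 2`, as set by the `mov x0, 2` in the code above, bit 1 is set, so every branch is taken, and the target `.+4` is simply the next instruction.

```python
def tbnz(reg_value, bit, pc, offset):
    """Model of AArch64 TBNZ: branch to pc + offset if the given bit is nonzero.

    Returns (taken, next_pc). Instructions are 4 bytes wide, so the
    fall-through address is pc + 4.
    """
    taken = (reg_value >> bit) & 1 == 1
    next_pc = pc + offset if taken else pc + 4
    return taken, next_pc


# x0 = 2 (binary 10): bit 1 is set, so the branch is taken.
taken, next_pc = tbnz(reg_value=2, bit=1, pc=0x1000, offset=4)
print(taken, hex(next_pc))  # True 0x1004

# For contrast, x0 = 1 (binary 01): bit 1 is clear, so the branch is not taken.
# It still lands on the same address, because the target .+4 is the next instruction.
not_taken, fallthrough_pc = tbnz(reg_value=1, bit=1, pc=0x1000, offset=4)
print(not_taken, hex(fallthrough_pc))  # False 0x1004
```

Because the taken and not-taken paths lead to the same address, the test isolates the cost of the taken branch itself rather than any change in which instructions execute.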

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0615
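
As a rough cross-check of the figure above, the sketch below takes the median of the `cycle (02)` values from the ten runs listed for this configuration and divides it by the number of measured instructions (count × unrolls × iterations). Two things here are assumptions: the cycle values are the second field of each counter row as read from the data below, and the normalization is a guess at the method, since the harness's exact calculation (for example, whether loop overhead is subtracted) is not shown. The result therefore only approximately matches the reported 1.0615.

```python
from statistics import median

# cycle (02) counter for each of the ten runs (assumed reading of the rows below)
cycles = [94506, 84728, 85044, 84934, 85011, 84925, 85171, 84895, 84906, 84918]

count = 8          # TBNZ instructions per unrolled copy
unrolls = 100
iterations = 100


def cycles_per_instruction(cycles, count, unrolls, iterations):
    """Median cycles across runs divided by the number of measured instructions."""
    return median(cycles) / (count * unrolls * iterations)


result = cycles_per_instruction(cycles, count, unrolls, iterations)
print(f"{result:.3f}")  # prints 1.062
```

Using the median rather than the mean keeps the first, slower warm-up run (94506 cycles) from skewing the estimate.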

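retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 94506 | 83586 | 83586 | 0 | 0 | 85078 | 0 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84728 | 80105 | 80105 | 0 | 0 | 80106 | 0 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 85044 | 80105 | 80105 | 0 | 0 | 80106 | 0 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84934 | 80105 | 80105 | 0 | 0 | 80106 | 0 | 240318 | 80106 | 80206 | 80268 | 1 | 100
80204 | 85011 | 80105 | 80105 | 0 | 0 | 80106 | 0 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84925 | 80105 | 80105 | 0 | 0 | 80106 | 0 | 254043 | 84681 | 85361 | 80206 | 1 | 100
80204 | 85171 | 80105 | 80105 | 0 | 0 | 80106 | 0 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84895 | 80105 | 80105 | 0 | 0 | 80106 | 0 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84906 | 80105 | 80105 | 0 | 0 | 80106 | 0 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84918 | 80105 | 80105 | 0 | 0 | 80106 | 0 | 240318 | 80106 | 80206 | 80206 | 1 | 100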

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 3.9560

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 359458 | 109545 | 109545 | 120420 | 240141 | 80047 | 80057 | 80021 | 1 | 10
80024 | 317809 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80039 | 1 | 10
80024 | 316601 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80024 | 317101 | 80012 | 80012 | 80011 | 240124 | 80042 | 80052 | 80021 | 1 | 10
80024 | 317105 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80024 | 316688 | 80012 | 80012 | 80011 | 240080 | 80027 | 80037 | 80021 | 1 | 10
80024 | 316689 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80024 | 316603 | 80012 | 80012 | 80011 | 240105 | 80036 | 80046 | 80024 | 1 | 10
80025 | 316617 | 80024 | 80024 | 80026 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80024 | 316414 | 80012 | 80012 | 80011 | 240050 | 80017 | 80027 | 80021 | 1 | 10