Apple Microarchitecture Research by Dougall Johnson

TBZ (taken)

Test 1: uops

Code:

  tbz x0, #1, .+4

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

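retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 9942 | 5892 | 5892 | 8100 | 6504 | 2168 | 2227 | 1009 | 1
1004 | 3391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2614 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2551 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2752 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 8981 | 1
1004 | 4081 | 1254 | 1254 | 1323 | 3009 | 1003 | 1003 | 1000 | 1
1004 | 2815 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2629 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2683 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2737 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 9378 | 1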

Test 2: throughput

Count: 8

Code:

  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  mov x0, 0

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0614

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 95223 | 83484 | 83484 | 84910 | 240318 | 80106 | 80206 | 80209 | 1 | 100
80204 | 85259 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80223 | 1 | 100
80204 | 85006 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84896 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84908 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80240 | 1 | 100
80204 | 84370 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84930 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84916 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84928 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84934 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
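The headline result for this configuration can be approximated from the cycle (02) column. The following is a minimal Python sketch, not the author's harness. It assumes the result is the median of the ten cycle counts divided by the total number of tbz instructions executed (count x unrolls x iterations); the function name is hypothetical.

```python
from statistics import median

# cycle (02) values of the ten runs for 100 unrolls and 100 iterations
cycles = [95223, 85259, 85006, 84896, 84908,
          84370, 84930, 84916, 84928, 84934]

COUNT = 8         # tbz instructions in one copy of the code
UNROLLS = 100     # copies of the code inside the loop body
ITERATIONS = 100  # loop iterations


def cycles_per_instruction(cycles, count, unrolls, iterations):
    """Median cycle count divided by the number of measured instructions.

    Assumption: this mirrors "median cycles for code divided by count".
    """
    return median(cycles) / (count * unrolls * iterations)


raw = cycles_per_instruction(cycles, COUNT, UNROLLS, ITERATIONS)
print(f"{raw:.4f}")  # 1.0616
```

The raw figure (about 1.0616) is slightly above the reported 1.0614. The page does not say where the difference comes from; a correction for measurement overhead is one possible explanation, but that is an assumption.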

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 3.9552

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 369959 | 118049 | 118049 | 132112 | 240336 | 80112 | 80122 | 80030 | 1 | 10
80024 | 317327 | 80022 | 80022 | 80022 | 240068 | 80023 | 80033 | 80021 | 1 | 10
80024 | 316878 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80025 | 316593 | 80025 | 80025 | 80032 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80024 | 316875 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80025 | 316893 | 80015 | 80015 | 80015 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80024 | 316195 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80025 | 316231 | 80020 | 80020 | 80020 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80024 | 316788 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80025 | 316826 | 80015 | 80015 | 80015 | 240033 | 80011 | 80021 | 80039 | 1 | 10
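The retire uop (01) counts across the three configurations (1004, 80204, and 80024 in six of the last ten runs, 80025 in the rest) fit a simple accounting. The Python sketch below is one reading of the numbers, not something the page states. It assumes one retired uop per tbz, two extra retired uops per loop iteration in the fused SUBS/B.cc loop (how these divide between the mov and the SUBS/B.cc pair is not stated), and a fixed overhead of four uops per measurement. The function and parameter names are hypothetical.

```python
def expected_retires(count, unrolls, iterations, loop_uops, fixed_overhead=4):
    """Predicted retire count under the assumed accounting.

    count * unrolls * iterations: measured tbz instructions, one uop each
    loop_uops * iterations:       assumed per-iteration loop overhead
    fixed_overhead:               assumed constant cost per measurement
    """
    measured = count * unrolls * iterations
    return measured + loop_uops * iterations + fixed_overhead


# (label, count, unrolls, iterations, loop_uops, observed retire uop count)
cases = [
    ("Test 1, 1000 unrolls x 1 iteration", 1, 1000, 1, 0, 1004),
    ("Test 2, 100 unrolls x 100 iterations", 8, 100, 100, 2, 80204),
    ("Test 2, 1000 unrolls x 10 iterations", 8, 1000, 10, 2, 80024),
]

for label, count, unrolls, iterations, loop_uops, observed in cases:
    predicted = expected_retires(count, unrolls, iterations, loop_uops)
    print(f"{label}: predicted {predicted}, observed {observed}")
```

Under these assumptions the prediction matches the observed count in all three cases, which is consistent with the trailing mov x0, 0 executing once per loop iteration rather than once per unrolled copy.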