Apple Microarchitecture Research by Dougall Johnson

TBNZ (not taken)

Test 1: uops

Code:

  tbnz x0, #1, .+4

(no loop instructions)

1000 unrolls and 1 iteration
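`tbnz x0, #1, .+4` tests bit 1 of x0 and branches only if that bit is 1. `.+4` is the address four bytes past the TBNZ itself, which is simply the next instruction. The Python sketch below models that rule. The function names are illustrative and not taken from any library. It shows that the target coincides with the fall-through path, so the branch outcome never changes which instruction runs next:

```python
def is_taken(xn: int, bit: int) -> bool:
    """TBNZ branches when bit `bit` of the 64-bit register value is 1."""
    if not 0 <= bit <= 63:
        raise ValueError("bit must be in 0..63 for an X register")
    return ((xn >> bit) & 1) == 1


def tbnz_next_pc(xn: int, bit: int, pc: int, offset: int) -> int:
    """Address executed after a TBNZ at `pc` whose target is pc + offset."""
    return pc + offset if is_taken(xn, bit) else pc + 4


pc = 0x1000  # hypothetical address of `tbnz x0, #1, .+4`
for x0 in (0, 0b10):
    # Both values end up at pc + 4; only the taken/not-taken flag differs.
    print(x0, is_taken(x0, 1), hex(tbnz_next_pc(x0, 1, pc, 4)))
```

Only the bit test decides "taken" versus "not taken". A clear bit 1, as with x0 = 0, gives the not-taken case measured here.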

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9)

    01     02     52     53     56     59     78     7c     7f  e9
  1004   4942   3326   3326   4342  11613   3871   4452   1148   1
  1004    631   1009   1009   1012   3000   1000   1000   1000   1
  1004    612   1001   1001   1000   3000   1000   1000   1000   1
  1004    612   1001   1001   1000   3000   1000   1000   1000   1
  1004    612   1001   1001   1000   3000   1000   1000   1000   1
  1004    612   1001   1001   1000   3000   1000   1000   1000   1
  1004    612   1001   1001   1000   3000   1000   1000   1000   1
  1004    612   1001   1001   1000   3000   1000   1000   1000   1
  1004    612   1001   1001   1000   3000   1000   1000   1000   1
  1004    612   1001   1001   1000   3000   1000   1000   1000   1
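The summary figures for Test 1 are per-instruction rates. The sketch below assumes they come from dividing steady-state counter totals by the 1000 unrolled TBNZ instructions, and applies that to the repeated rows of the table. The dictionary keys simply reuse the counter names:

```python
# Steady-state counters from the Test 1 table (rows 3 to 10 are identical),
# for 1000 unrolled TBNZ instructions with no loop around them.
N_TBNZ = 1000
steady = {
    "retire uop (01)": 1004,
    "cycle (02)": 612,
    "schedule int uop (53)": 1001,
    "dispatch int uop (56)": 1000,
    "map int uop inputs (7f)": 1000,
}

# Per-instruction rate for each counter.
per_tbnz = {name: total / N_TBNZ for name, total in steady.items()}

for name, rate in per_tbnz.items():
    print(f"{name:<24} {rate:.3f}")
```

This gives 0.612 cycles and 1.001 scheduled integer uops per TBNZ. The latter matches "Integer unit issues: 1.001". One mapped integer input per TBNZ fits its single register operand (x0). The raw retire ratio is 1.004 rather than the reported 1.000, which suggests a small fixed overhead is subtracted, but that adjustment is an assumption.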

Test 2: throughput

Count: 8

Code:

  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  tbnz x0, #1, .+4
  mov x0, 0

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5836

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

     01      02      52      53      56      59      78      7c      7f  e9   ef
  80204   54626   84376   84376   86272  241287   80429   80610   80325   1  100
  80205   46978   80289   80289   80364  240417   80139   80252   80212   1  100
  80204   46686   80107   80107   80110  240330   80110   80212   80212   1  100
  80204   46686   80107   80107   80110  240330   80110   80212   80212   1  100
  80204   46686   80107   80107   80110  240330   80110   80212   80212   1  100
  80204   46686   80107   80107   80110  240330   80110   80212   80212   1  100
  80204   46686   80107   80107   80110  240330   80110   80212   80212   1  100
  80204   46686   80107   80107   80110  240330   80110   80212   80212   1  100
  80204   46686   80107   80107   80110  240330   80110   80212   80212   1  100
  80204   46686   80107   80107   80110  240330   80110   80212   80212   1  100
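This sketch assumes the reported result is the median of the cycle (02) column divided by the number of TBNZ instructions executed (unrolls × iterations × count). Under that assumption, the 100 × 100 table reproduces it:

```python
from statistics import median

# cycle (02) column of the 100 unrolls x 100 iterations table.
cycles = [54626, 46978] + [46686] * 8
unrolls, iterations, count = 100, 100, 8

tbnz_executed = unrolls * iterations * count  # 80000 TBNZ instructions
cycles_per_tbnz = median(cycles) / tbnz_executed

print(f"{cycles_per_tbnz:.6f}")  # 0.583575, reported as 0.5836
print(f"{1 / cycles_per_tbnz:.2f} TBNZ per cycle")  # 1.71 TBNZ per cycle
```

That is roughly 1.7 not-taken TBNZ per cycle. Applying the same arithmetic to the 1000 × 10 table (median cycle count 46751) gives about 0.5844 rather than the reported 0.5837. The reported result is therefore probably derived from separate or differently processed timing runs.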

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5837

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

     01      02      52      53      56      59      78      7c      7f  e9   ef
  80024  122622  119657  119657  136403  249075   83025   83772   82433   1   10
  80024   47937   80761   80761   81094  241935   80645   80835   80158   1   10
  80024   46806   80079   80079   80106  240237   80079   80101   80087   1   10
  80024   46752   80045   80045   80054  240162   80054   80070   80070   1   10
  80024   46751   80045   80045   80054  240162   80054   80072   80124   1   10
  80024   46751   80045   80045   80054  240138   80046   80060   80060   1   10
  80024   46742   80037   80037   80046  240309   80103   80119   80060   1   10
  80024   46742   80037   80037   80046  240279   80093   80105   80060   1   10
  80024   46742   80037   80037   80046  240138   80046   80060   80060   1   10
  80024   46742   80037   80037   80046  240138   80046   80060   80060   1   10
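The retire uop (01) counts exceed the number of measured TBNZ instructions by 4 (Test 1), 204 (100 × 100) and 24 (1000 × 10). This sketch fits a fixed cost plus a per-iteration cost to the two looped configurations. The fit gives 4 + 2 × iterations, which also accounts for the 4 extra uops in the loop-free Test 1. The ? int retires (ef) column likewise equals the iteration count (100 and 10). The measurements do not say which instructions make up this overhead, so attributing the per-iteration part to loop control and setup code is an assumption:

```python
# Retired uops (retire uop, 01) and run configuration from each table:
# (unrolls, iterations, TBNZ per unroll, retired).
runs = {
    "test 1": (1000, 1, 1, 1004),  # no loop instructions
    "test 2, 100x100": (100, 100, 8, 80204),
    "test 2, 1000x10": (1000, 10, 8, 80024),
}


def extra_uops(unrolls: int, iterations: int, count: int, retired: int) -> int:
    """Retired uops beyond the unrolled TBNZ instructions themselves."""
    return retired - unrolls * iterations * count


extra = {name: extra_uops(*cfg) for name, cfg in runs.items()}

# Solve extra = fixed + per_iter * iterations using the two looped runs.
i1, e1 = runs["test 2, 100x100"][1], extra["test 2, 100x100"]
i2, e2 = runs["test 2, 1000x10"][1], extra["test 2, 1000x10"]
per_iter = (e1 - e2) / (i1 - i2)
fixed = e1 - per_iter * i1

print(extra)  # {'test 1': 4, 'test 2, 100x100': 204, 'test 2, 1000x10': 24}
print(fixed, per_iter)  # 4.0 2.0
```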