Apple Microarchitecture Research by Dougall Johnson

CBNZ (not taken)

Test 1: uops

Code:

  cbnz x0, .+4

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

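retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 9401 | 5306 | 5306 | 7182 | 4509 | 1503 | 1509 | 1003 | 1
1004 | 4082 | 1032 | 1032 | 1038 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2459 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2503 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2438 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2805 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2776 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2618 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2497 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2738 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1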

Test 2: throughput

Count: 8

Code:

  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  mov x0, 0

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5836

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 52883 | 83513 | 83513 | 85011 | 241848 | 80616 | 80882 | 80378 | 1 | 100
80204 | 46764 | 80154 | 80154 | 80177 | 240456 | 80152 | 80264 | 80212 | 1 | 100
80204 | 46686 | 80107 | 80107 | 80110 | 240330 | 80110 | 80212 | 80626 | 1 | 100
80204 | 46686 | 80107 | 80107 | 80110 | 240330 | 80110 | 80212 | 80212 | 1 | 100
80204 | 46686 | 80107 | 80107 | 80110 | 240330 | 80110 | 80212 | 80212 | 1 | 100
80204 | 46686 | 80107 | 80107 | 80110 | 240330 | 80110 | 80212 | 80212 | 1 | 100
80204 | 46686 | 80107 | 80107 | 80110 | 240330 | 80110 | 80212 | 80212 | 1 | 100
80204 | 46686 | 80107 | 80107 | 80110 | 240330 | 80110 | 80212 | 80212 | 1 | 100
80204 | 46686 | 80107 | 80107 | 80110 | 240330 | 80110 | 80212 | 80212 | 1 | 100
80204 | 46686 | 80107 | 80107 | 80110 | 240330 | 80110 | 80212 | 80212 | 1 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5837

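retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
80024 | 118697 | 117121 | 117121 | 132992 | 249192 | 83064 | 83869 | 81888 | 0 | 0 | 1 | 0 | 0 | 10
80024 | 124426 | 119963 | 119963 | 137155 | 246546 | 82182 | 82793 | 81494 | 0 | 0 | 1 | 0 | 0 | 10
80024 | 47679 | 80611 | 80611 | 80879 | 240186 | 80062 | 80092 | 80020 | 0 | 0 | 1 | 0 | 0 | 10
80024 | 46695 | 80011 | 80011 | 80010 | 240030 | 80010 | 80020 | 80020 | 0 | 0 | 1 | 0 | 0 | 10
80024 | 46695 | 80011 | 80011 | 80010 | 240030 | 80010 | 80020 | 80020 | 0 | 0 | 1 | 0 | 0 | 10
80024 | 46695 | 80011 | 80011 | 80010 | 240030 | 80010 | 80020 | 80020 | 0 | 0 | 1 | 0 | 0 | 10
80024 | 46695 | 80011 | 80011 | 80010 | 240030 | 80010 | 80020 | 80020 | 0 | 0 | 1 | 0 | 0 | 10
80024 | 46695 | 80011 | 80011 | 80010 | 240030 | 80010 | 80020 | 80020 | 0 | 0 | 1 | 0 | 0 | 10
80024 | 46695 | 80011 | 80011 | 80010 | 240030 | 80010 | 80020 | 80020 | 0 | 0 | 1 | 0 | 0 | 10
80024 | 46695 | 80011 | 80011 | 80010 | 240030 | 80010 | 80020 | 80020 | 0 | 0 | 1 | 0 | 0 | 10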
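As a cross-check on how the "Result" lines relate to the raw counters, the Python sketch below recomputes them from the cycle (02) column of the two throughput runs. It assumes (our reading, not something the page states) that Result is the median cycle count over the runs divided by unrolls × iterations × Count; the function name cycles_per_instruction is hypothetical. Under that assumption the median discards the noisy warm-up rows, and 46686 / 80000 and 46695 / 80000 round to the reported 0.5836 and 0.5837.

```python
from statistics import median

# cycle (02) readings copied from the two Test 2 tables.
# The first rows of each run are warm-up outliers; taking the median ignores them.
CYCLES_100x100 = [52883, 46764] + [46686] * 8
CYCLES_1000x10 = [118697, 124426, 47679] + [46695] * 7


def cycles_per_instruction(cycles, unrolls, iterations, count):
    """Hypothetical helper: median cycles for one pass over the code,
    divided by the number of instructions in the code (Count).
    Assumes every unroll of every iteration executes `count` instructions."""
    return median(cycles) / (unrolls * iterations * count)


r1 = cycles_per_instruction(CYCLES_100x100, unrolls=100, iterations=100, count=8)
r2 = cycles_per_instruction(CYCLES_1000x10, unrolls=1000, iterations=10, count=8)
print(f"{r1:.4f}")  # 0.5836
print(f"{r2:.4f}")  # 0.5837
```

Both runs execute 8 × 100 × 100 = 8 × 1000 × 10 = 80000 CBNZ instructions, which is why the two results agree so closely despite the different unroll/iteration split.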