Apple Microarchitecture Research by Dougall Johnson

CBNZ (taken)

Test 1: uops

Code:

  cbnz x0, .+4

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

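retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 9401 | 5306 | 5306 | 7182 | 4509 | 1503 | 1509 | 1003 | 1
1004 | 4082 | 1032 | 1032 | 1038 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2459 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2503 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2438 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2805 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2776 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2618 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2497 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1
1004 | 2738 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1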

Test 2: throughput

Count: 8

Code:

  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  cbnz x0, .+4
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0616

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 94706 | 83319 | 83319 | 84731 | 240318 | 80106 | 80206 | 80225 | 1 | 100
80204 | 85835 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84906 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84896 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84902 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84899 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84899 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84902 | 80105 | 80105 | 80106 | 240447 | 80150 | 80250 | 80206 | 1 | 100
80204 | 84893 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100
80204 | 84923 | 80105 | 80105 | 80106 | 240318 | 80106 | 80206 | 80206 | 1 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 3.9612

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 358614 | 108673 | 108673 | 118886 | 240296 | 80099 | 80109 | 80037 | 1 | 10
80024 | 316874 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80025 | 316752 | 80015 | 80015 | 80015 | 240050 | 80017 | 80027 | 80021 | 1 | 10
80024 | 316745 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80037 | 1 | 10
80025 | 316762 | 80027 | 80027 | 80033 | 240083 | 80028 | 80038 | 80021 | 1 | 10
80024 | 316861 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80025 | 316928 | 80029 | 80029 | 80031 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80024 | 316572 | 80012 | 80012 | 80011 | 240053 | 80018 | 80028 | 80021 | 1 | 10
80025 | 316648 | 80030 | 80030 | 80034 | 240033 | 80011 | 80021 | 80021 | 1 | 10
80024 | 316577 | 80012 | 80012 | 80011 | 240033 | 80011 | 80021 | 80021 | 1 | 10
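
The "median cycles for code divided by count" figures can be approximated from the "cycle (02)" columns of the two throughput runs. The sketch below is a rough reconstruction, not the page's own tooling: the helper name cycles_per_instruction is hypothetical, and dividing by unrolls x iterations x count is an assumption about how the result is normalised. Under that assumption the values land close to, but not exactly on, the reported 1.0616 and 3.9612, so the page presumably handles overhead or sampling slightly differently.

```python
from statistics import median

def cycles_per_instruction(cycles, unrolls, iterations, count):
    """Median cycle count divided by the total number of measured instructions.

    Assumption: the code block runs unrolls * iterations times and contains
    `count` copies of the instruction under test.
    """
    return median(cycles) / (unrolls * iterations * count)

# "cycle (02)" column, 100 unrolls and 100 iterations.
run_100x100 = [94706, 85835, 84906, 84896, 84902,
               84899, 84899, 84902, 84893, 84923]

# "cycle (02)" column, 1000 unrolls and 10 iterations.
run_1000x10 = [358614, 316874, 316752, 316745, 316762,
               316861, 316928, 316572, 316648, 316577]

short_run = cycles_per_instruction(run_100x100, unrolls=100, iterations=100, count=8)
long_run = cycles_per_instruction(run_1000x10, unrolls=1000, iterations=10, count=8)

print(f"100 x 100:  {short_run:.4f} cycles per taken CBNZ")
print(f"1000 x 10:  {long_run:.4f} cycles per taken CBNZ")
```

Using the median rather than the mean keeps the first, noticeably slower run of each table from skewing the estimate.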