Apple Microarchitecture Research by Dougall Johnson


B.cc (taken)

Test 1: uops

Code:

  b.ne .+4

(no loop instructions)

1000 unrolls and 1 iteration
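`.+4` is the address of the next instruction, so `b.ne .+4` lands in the same place whether or not it is taken. The unrolled body is therefore one straight-line run of identical branches. Below is a minimal sketch of generating that body as raw A64 machine code. It assumes a hypothetical harness that copies little-endian instruction words into an executable buffer, which is not the author's tool. `encode_b_cond` and `unrolled_body` are illustrative names.

```python
# Hypothetical helper: emits the unrolled test body as little-endian A64 words.
COND_CODES = {"eq": 0b0000, "ne": 0b0001}  # subset of the A64 condition codes


def encode_b_cond(cond: str, offset: int) -> int:
    """Encode B.<cond> with a byte offset relative to the branch itself."""
    if offset % 4:
        raise ValueError("branch offset must be a multiple of 4")
    imm19 = offset // 4
    if not -(1 << 18) <= imm19 < (1 << 18):
        raise ValueError("offset does not fit in imm19")
    # Layout: 0101 0100 | imm19 (bits 23..5) | 0 | cond (bits 3..0)
    return 0x54000000 | ((imm19 & 0x7FFFF) << 5) | COND_CODES[cond]


def unrolled_body(count: int, unrolls: int) -> bytes:
    """`count` copies of `b.ne .+4`, repeated `unrolls` times."""
    word = encode_b_cond("ne", 4).to_bytes(4, "little")
    return word * (count * unrolls)


print(hex(encode_b_cond("ne", 4)))  # 0x54000021
print(len(unrolled_body(count=1, unrolls=1000)))  # 4000 (bytes, Test 1 body)
```

With `count=8, unrolls=100` the same helper produces a 3,200-byte body, matching the size of Test 2's 100-unroll configuration.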

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

Columns, by counter event number: 01 retire uop, 02 cycle, 52 schedule uop, 53 schedule int uop, 56 dispatch int uop, 59 int uops in schedulers, 78 dispatch uop, 7c map int uop, 7f map int uop inputs, e9 ? int output thing

     01     02     52     53     56     59     78     7c     7f     e9
   1004   9771   5675   5675   7779   4125   1375   1384   1004      1
   1004   3196   1001   1001   1000   3000   1000   1000   1000      1
   1004   2770   1001   1001   1000   3000   1000   1000   1000      1
   1004   2836   1001   1001   1000   3000   1000   1000   1000      1
   1004   2761   1001   1001   1000   3000   1000   1000   1000      1
   1004   2698   1001   1001   1000   3000   1000   1000   1000      1
   1004   2683   1001   1001   1000   3000   1000   1000   1000      1
   1004   2743   1001   1001   1000   3000   1000   1000   1000      1
   1004   2974   1001   1001   1000   3000   1000   1000   1000      1
   1004   2776   1001   1001   1000   3000   1000   1000   1000      1

Test 2: throughput

Count: 8

Code:

  b.ne .+4
  b.ne .+4
  b.ne .+4
  b.ne .+4
  b.ne .+4
  b.ne .+4
  b.ne .+4
  b.ne .+4

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.1544
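"Median cycles for code divided by count" can be read as the median total cycle count divided by the number of `b.ne` instructions executed, here 8 × 100 × 100 = 80,000. Below is a minimal sketch of that arithmetic. It assumes the median is taken over the ten runs listed in the table that follows, and `cycles_per_instruction` is a hypothetical name. With the cycle (02) values from that table it reproduces the published figure.

```python
from statistics import median


def cycles_per_instruction(cycle_counts, count, unrolls, iterations):
    """Median cycles across runs, divided by the test instructions executed."""
    executed = count * unrolls * iterations
    return median(cycle_counts) / executed


# cycle (02) column of the 100-unroll, 100-iteration runs
runs = [102127, 92710, 92351, 92347, 92344, 92691, 92347, 92356, 92353, 92353]
print(round(cycles_per_instruction(runs, count=8, unrolls=100, iterations=100), 4))  # 1.1544
```

Applied to the ten 1000-unroll rows further down (`unrolls=1000, iterations=10`, also 80,000 instructions), the same arithmetic gives about 3.9581 rather than the published 3.9602. The published figures are therefore probably not computed from exactly the listed rows.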

Columns, by counter event number: 01 retire uop, 02 cycle, 52 schedule uop, 53 schedule int uop, 56 dispatch int uop, 59 int uops in schedulers, 78 dispatch uop, 7c map int uop, 7f map int uop inputs, 80 map ldst uop inputs, 81 map simd uop inputs, e9 ? int output thing, ed ? ldst retires, ee ? simd retires, ef ? int retires

     01     02     52     53     56     59     78     7c     7f     80     81     e9     ed     ee     ef
  80204 102127  83182  83182  84489 240318  80106  80206  80203      0      0      1      0      0    100
  80204  92710  80102  80102  80103 240309  80103  80203  80206      0      0      1      0      0    100
  80204  92351  80102  80102  80103 240309  80103  80203  80203      0      0      1      0      0    100
  80204  92347  80102  80102  80103 240309  80103  80203  80203      0      0      1      0      0    100
  80204  92344  80102  80102  80103 240309  80103  80203  80203      0      0      1      0      0    100
  80205  92691  80139  80139  80148 240318  80106  80206  80203      0      0      1      0      0    100
  80204  92347  80102  80102  80103 240309  80103  80203  80203      0      0      1      0      0    100
  80204  92356  80102  80102  80103 240309  80103  80203  80203      0      0      1      0      0    100
  80204  92353  80102  80102  80103 240309  80103  80203  80203      0      0      1      0      0    100
  80204  92353  80102  80102  80103 240309  80103  80203  80203      0      0      1      0      0    100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 3.9602

Columns, by counter event number: 01 retire uop, 02 cycle, 52 schedule uop, 53 schedule int uop, 56 dispatch int uop, 59 int uops in schedulers, 78 dispatch uop, 7c map int uop, 7f map int uop inputs, e9 ? int output thing, ef ? int retires

     01     02     52     53     56     59     78     7c     7f     e9     ef
  80025 366887 115785 115785 128783 240218  80073  80083  80023      1     10
  80024 318313  80012  80012  80011 240033  80011  80021  80033      1     10
  80024 317735  80012  80012  80011 240033  80011  80021  80021      1     10
  80024 317755  80012  80012  80011 240033  80011  80021  80029      1     10
  80024 316056  80012  80012  80011 240033  80011  80021  80027      1     10
  80024 316680  80012  80012  80011 240033  80011  80021  80029      1     10
  80024 316520  80012  80012  80011 240033  80011  80021  80021      1     10
  80024 316588  80012  80012  80011 240033  80011  80021  80031      1     10
  80024 316606  80012  80012  80011 240033  80011  80021  80021      1     10
  80024 316610  80012  80012  80011 240033  80011  80021  80021      1     10