Apple Microarchitecture Research by Dougall Johnson

CMN (shifted immediate, 32-bit)

Test 1: uops

Code:

  cmn w0, #3, lsl #12
  mov x0, 1

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

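retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 556 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 396 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 389 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 390 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001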

Test 2: Latency 2->1

Chain cycles: 1

Code:

  cmn w0, #3, lsl #12
  cset x0, cc
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20107 | 519442 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 20032 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20025 | 20060 | 20025 | 20025 | 20050 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
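
The latency figure can be reproduced from the cycle (02) counter. The following is a minimal Python sketch, not the harness that produced this page. It assumes the result is the median cycle count divided by the number of executed copies of the code (unrolls times iterations), minus the stated chain cycles. It also assumes the cycle (02) value is the second field of each result row. The function name and the choice of `median_low` are illustrative.

```python
from statistics import median_low

def latency_per_instruction(cycles, unrolls, iterations, chain_cycles):
    """Median cycles per copy of the code, minus the known chain latency.

    Assumption: this mirrors how the "Result" line is derived; the real
    harness may correct for loop overhead differently.
    """
    per_copy = median_low(cycles) / (unrolls * iterations)
    return per_copy - chain_cycles

# cycle (02) counts, one per run, read from the Test 2 result rows
run_100x100 = [20030] * 10
run_1000x10 = [20030] * 8 + [20060, 20030]

lat_a = latency_per_instruction(run_100x100, 100, 100, chain_cycles=1)
lat_b = latency_per_instruction(run_1000x10, 1000, 10, chain_cycles=1)
print(f"{lat_a:.4f} {lat_b:.4f}")  # prints "1.0030 1.0030"
```

Using the median keeps the single slow run (20060 cycles) in the second configuration from affecting the result.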

Test 3: throughput

Count: 8

Code:

  cmn w0, #3, lsl #12
  cmn w0, #3, lsl #12
  cmn w0, #3, lsl #12
  cmn w0, #3, lsl #12
  cmn w0, #3, lsl #12
  cmn w0, #3, lsl #12
  cmn w0, #3, lsl #12
  cmn w0, #3, lsl #12
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3634

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29226 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 80220 | 80015 | 100
80204 | 29070 | 80114 | 80114 | 80118 | 240354 | 80118 | 80220 | 80220 | 80015 | 100
80204 | 29071 | 80113 | 80113 | 80118 | 240357 | 80119 | 80220 | 80220 | 80013 | 100
80204 | 29051 | 80115 | 80115 | 80119 | 240354 | 80118 | 80218 | 80220 | 80015 | 100
80204 | 29072 | 80112 | 80112 | 80117 | 240357 | 80119 | 80220 | 80220 | 80013 | 100
80204 | 29082 | 80115 | 80115 | 80119 | 240351 | 80117 | 80220 | 80220 | 80013 | 100
80204 | 29118 | 80115 | 80115 | 80119 | 240354 | 80118 | 80220 | 80220 | 80013 | 100
80204 | 29075 | 80112 | 80112 | 80117 | 240354 | 80118 | 80220 | 80220 | 80015 | 100
80204 | 29058 | 80115 | 80115 | 80120 | 240357 | 80119 | 80220 | 80220 | 80013 | 100
80204 | 29051 | 80115 | 80115 | 80119 | 240360 | 80120 | 80220 | 80220 | 80013 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3628

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 29977 | 80035 | 80035 | 80039 | 240090 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29099 | 80021 | 80021 | 80020 | 240070 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28979 | 80021 | 80021 | 80020 | 240077 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28955 | 80021 | 80021 | 80020 | 240084 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29084 | 80021 | 80021 | 80020 | 240080 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29028 | 80021 | 80021 | 80020 | 240100 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28979 | 80021 | 80021 | 80020 | 240077 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28876 | 80021 | 80021 | 80020 | 240079 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29860 | 80035 | 80035 | 80039 | 240184 | 80058 | 80058 | 80044 | 80027 | 10
80025 | 29101 | 80075 | 80075 | 80079 | 240066 | 80020 | 80020 | 80020 | 80011 | 10
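
The throughput figures can be checked the same way. The following is a minimal Python sketch under stated assumptions, not the harness that produced this page. It assumes the result is the median cycle (02) count divided by the total number of CMN instructions executed (unrolls times iterations times count). It also assumes the cycle value is the second field of each result row. The exact median convention is not stated on the page; `median_low` is a guess that agrees with both reported figures to within rounding. The function name is illustrative.

```python
from statistics import median_low

def cycles_per_instruction(cycles, unrolls, iterations, count):
    """Median cycles divided by the total number of instructions under test.

    Assumption: loop overhead is ignored, as the "Result" line suggests.
    """
    return median_low(cycles) / (unrolls * iterations * count)

# cycle (02) counts, one per run, read from the Test 3 result rows
run_100x100 = [29226, 29070, 29071, 29051, 29072,
               29082, 29118, 29075, 29058, 29051]
run_1000x10 = [29977, 29099, 28979, 28955, 29084,
               29028, 28979, 28876, 29860, 29101]

cpi_a = cycles_per_instruction(run_100x100, 100, 100, count=8)
cpi_b = cycles_per_instruction(run_1000x10, 1000, 10, count=8)

# The reciprocal is the sustained rate in instructions per cycle.
ipc_a = 1 / cpi_a
ipc_b = 1 / cpi_b
print(cpi_a, cpi_b, ipc_a, ipc_b)
```

Both configurations come out at roughly 0.363 cycles per CMN, which corresponds to a sustained rate of a little under three per cycle.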