Apple Microarchitecture Research by Dougall Johnson

CMN (immediate, 32-bit)

Test 1: uops

Code:

  cmn w0, #3
  mov x0, 1

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

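retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 501 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 396 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 395 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 395 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 400 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001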

Test 2: latency 2->1

Chain cycles: 1

Code:

  cmn w0, #3
  cset x0, cc
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20108 | 519339 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519454 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20209 | 20085 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20205 | 20085 | 10010
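
The reported latency can be reproduced from the cycle (02) column of the runs above. The following is a minimal sketch in Python, assuming that "cycles for code" means the median cycle count divided by the number of executed copies of the code (unrolls × iterations). That normalization and the function name `latency_per_instruction` are assumptions inferred from the numbers, not something the page states.

```python
from statistics import median

def latency_per_instruction(cycle_samples, unrolls, iterations, chain_cycles):
    """Median cycles per executed copy of the code, minus the chain's own latency."""
    copies = unrolls * iterations  # assumed normalization: one copy per unroll per iteration
    return median(cycle_samples) / copies - chain_cycles

# cycle (02) values as read from the ten runs of each configuration (all identical here)
runs_100x100 = [20030] * 10
runs_1000x10 = [20030] * 10

print(f"{latency_per_instruction(runs_100x100, 100, 100, 1):.4f}")
print(f"{latency_per_instruction(runs_1000x10, 1000, 10, 1):.4f}")
```

Under these assumptions both calls print 1.0030, which agrees with the reported result for each configuration.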

Test 3: throughput

Count: 8

Code:

  cmn w0, #3
  cmn w0, #3
  cmn w0, #3
  cmn w0, #3
  cmn w0, #3
  cmn w0, #3
  cmn w0, #3
  cmn w0, #3
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3636

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29185 | 80114 | 80114 | 80118 | 240357 | 80119 | 80220 | 80218 | 80014 | 100
80204 | 29177 | 80113 | 80113 | 80118 | 240357 | 80119 | 80220 | 80220 | 80013 | 100
80204 | 29082 | 80115 | 80115 | 80119 | 240354 | 80118 | 80220 | 80220 | 80013 | 100
80204 | 29120 | 80112 | 80112 | 80117 | 240354 | 80118 | 80220 | 80220 | 80015 | 100
80204 | 29084 | 80112 | 80112 | 80117 | 240351 | 80117 | 80220 | 80220 | 80013 | 100
80204 | 29072 | 80112 | 80112 | 80117 | 240357 | 80119 | 80220 | 80220 | 80013 | 100
80204 | 29051 | 80115 | 80115 | 80119 | 240354 | 80118 | 80220 | 80220 | 80013 | 100
80204 | 29166 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 80220 | 80012 | 100
80204 | 29066 | 80113 | 80113 | 80118 | 240351 | 80117 | 80220 | 80220 | 80015 | 100
80204 | 29046 | 80112 | 80112 | 80117 | 240357 | 80119 | 80220 | 80220 | 80013 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3631

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 30097 | 80035 | 80035 | 80039 | 240122 | 80039 | 80040 | 80042 | 80024 | 10
80024 | 29238 | 80021 | 80021 | 80020 | 240070 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29707 | 80034 | 80034 | 80038 | 240135 | 80039 | 80040 | 80042 | 80027 | 10
80024 | 29094 | 80035 | 80035 | 80040 | 240073 | 80020 | 80020 | 80020 | 80011 | 10
80025 | 29075 | 80074 | 80074 | 80079 | 240107 | 80020 | 80020 | 80080 | 80065 | 10
80024 | 29086 | 80035 | 80035 | 80039 | 240060 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29038 | 80021 | 80021 | 80020 | 240076 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28992 | 80021 | 80021 | 80020 | 240060 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28921 | 80021 | 80021 | 80020 | 240084 | 80020 | 80020 | 80084 | 80065 | 10
80024 | 29012 | 80021 | 80021 | 80020 | 240070 | 80020 | 80020 | 80020 | 80011 | 10
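
The throughput figures can be approximated from the cycle (02) column in the same way. The sketch below is a rough illustration in Python, assuming the result is the median cycle count divided by the total number of instructions executed (unrolls × iterations × count). The function name `cycles_per_instruction` and that normalization are assumptions, since the page only says "median cycles for code divided by count".

```python
from statistics import median

def cycles_per_instruction(cycle_samples, unrolls, iterations, count):
    """Median cycles divided by the total number of instructions executed."""
    executed = unrolls * iterations * count  # assumed normalization
    return median(cycle_samples) / executed

# cycle (02) values as read from the two sets of runs above
runs_100x100 = [29185, 29177, 29082, 29120, 29084, 29072, 29051, 29166, 29066, 29046]
runs_1000x10 = [30097, 29238, 29707, 29094, 29075, 29086, 29038, 28992, 28921, 29012]

for label, runs, unrolls, iterations in [
    ("100 x 100", runs_100x100, 100, 100),
    ("1000 x 10", runs_1000x10, 1000, 10),
]:
    cpi = cycles_per_instruction(runs, unrolls, iterations, count=8)
    print(f"{label}: {cpi:.4f} cycles/instruction, {1 / cpi:.2f} instructions/cycle")
```

Under these assumptions both configurations come out at about 0.3635 cycles per instruction, or roughly 2.75 instructions per cycle. That is close to, but not exactly, the reported 0.3636 and 0.3631. The page does not say how the published figures treat loop overhead, so the small gap is left unexplained here.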