Apple Microarchitecture Research by Dougall Johnson

CMN (immediate, 64-bit)

Test 1: uops

Code:

  cmn x0, #3
  mov x0, 1

(no loop instructions)
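For reference, CMN (immediate) is an alias of ADDS (immediate) with XZR as the destination. It adds the immediate to Xn, discards the sum and sets the NZCV flags. The Python sketch below models only those architectural flag semantics, not timing. The helper names `cmn_imm64` and `to_signed64` are illustrative, and the immediate is assumed to be already decoded to a non-negative integer; the shifted-by-12 encoding is ignored.

```python
MASK64 = (1 << 64) - 1

def to_signed64(v: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement integer."""
    return v - (1 << 64) if v >> 63 else v

def cmn_imm64(xn: int, imm: int) -> dict:
    """NZCV flags after CMN Xn, #imm (alias of ADDS XZR, Xn, #imm).

    Hypothetical helper: `imm` is the already-decoded, non-negative immediate.
    The sum itself is discarded because the destination is XZR.
    """
    xn &= MASK64                # treat the input as a 64-bit register value
    wide = xn + imm             # exact sum, may need 65 bits
    result = wide & MASK64      # the 64-bit result that would have been written
    return {
        "N": result >> 63,                                       # sign bit of result
        "Z": int(result == 0),                                   # result is zero
        "C": int(wide > MASK64),                                 # unsigned carry out
        "V": int(to_signed64(xn) + imm != to_signed64(result)),  # signed overflow
    }

print(cmn_imm64(1, 3))    # {'N': 0, 'Z': 0, 'C': 0, 'V': 0}
print(cmn_imm64(-3, 3))   # {'N': 0, 'Z': 1, 'C': 1, 'V': 0}
```

Only the flags are architecturally visible, which is why the counters below show an integer uop being scheduled even though no general-purpose register is written.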

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

Performance counters (columns below are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9)

     01    02    52    53    56    59    78    7c    7f    e9
   1004   534  1001  1001  1000  3000  1000  1000  1000  1001
   1004   398  1001  1001  1000  3000  1000  1000  1000  1001
   1004   395  1001  1001  1000  3000  1000  1000  1000  1001
   1004   395  1001  1001  1000  3000  1000  1000  1000  1001
   1004   391  1001  1001  1000  3000  1000  1000  1000  1001
   1004   390  1001  1001  1000  3000  1000  1000  1000  1001
   1004   392  1001  1001  1000  3000  1000  1000  1000  1001
   1004   389  1001  1001  1000  3000  1000  1000  1000  1001
   1004   392  1001  1001  1000  3000  1000  1000  1000  1001
   1004   389  1001  1001  1000  3000  1000  1000  1000  1001
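The per-instruction figures can be checked against the raw counters by dividing a column's median by the 1000 unrolled instructions. The sketch below does this for two columns copied from the Test 1 table. It assumes, without confirmation from the page, that "Integer unit issues" is derived from the schedule int uop (53) column; under that assumption it reproduces the reported 1.001.

```python
from statistics import median

UNROLLS = 1000  # Test 1: one cmn per unroll, a single iteration

# Two columns copied from the Test 1 table: cycle (02) and schedule int uop (53).
cycle_02 = [534, 398, 395, 395, 391, 390, 392, 389, 392, 389]
sched_int_53 = [1001] * 10

# Assumption: the reported "Integer unit issues" is median(53) / instruction count.
int_issues_per_cmn = median(sched_int_53) / UNROLLS
print(f"integer unit issues per cmn: {int_issues_per_cmn:.3f}")  # 1.001

# The median is insensitive to the slow first run (534 cycles).
print(f"median cycles: {median(cycle_02)}")  # 392.0
```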

Test 2: Latency 2->1

Chain cycles: 1

Code:

  cmn x0, #3
  cset x0, cc
  mov x0, 1

(fused SUBS/B.cc loop)
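Here each `cmn` reads the x0 produced by the previous `cset`, and each `cset` reads the flags produced by the `cmn` just before it, so the pair forms one serial dependency chain through x0 and NZCV. The Python sketch below traces only the values along that chain, not the timing. The helper names are hypothetical; `cc` is the "carry clear" condition (C == 0), and the starting value of x0 is an arbitrary choice for illustration.

```python
MASK64 = (1 << 64) - 1

def cmn_carry(xn: int, imm: int) -> int:
    """C flag from CMN Xn, #imm: 1 if the 64-bit unsigned add carries out."""
    return int((xn & MASK64) + imm > MASK64)

def cset_cc(carry: int) -> int:
    """CSET Xd, cc: Xd = 1 if the carry flag is clear, else 0."""
    return 1 if carry == 0 else 0

def run_chain(x0: int, steps: int) -> int:
    # Each step is x0 -> NZCV (cmn) -> x0 (cset); every step needs the previous one.
    for _ in range(steps):
        x0 = cset_cc(cmn_carry(x0, 3))
    return x0

print(run_chain(1, 100))   # 1: 1 + 3 never carries, so x0 stays 1
print(run_chain(-1, 1))    # 0: 0xFFFF_FFFF_FFFF_FFFF + 3 carries, so cc is false
```

Because the values settle immediately, every iteration is the same dependent pair, and the measured time per iteration is the sum of the two instructions' latencies.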

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

Performance counters (columns below are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01     02     52     53     56     59     78     7c     7f     e9     ef
   20204  20030  20101  20101  20108 519312  20107  20212  20216  20001  10100
   20204  20030  20101  20101  20108 519548  20108  20216  20216  20001  10100
   20204  20030  20101  20101  20108 519548  20108  20216  20216  20001  10100
   20204  20030  20101  20101  20108 519548  20108  20216  20216  20001  10100
   20204  20030  20101  20101  20108 519548  20108  20216  20216  20001  10100
   20204  20030  20101  20101  20108 519548  20108  20216  20216  20001  10100
   20204  20030  20101  20101  20108 519548  20108  20216  20216  20001  10100
   20204  20030  20101  20101  20108 519548  20108  20216  20216  20001  10100
   20204  20030  20101  20101  20108 519548  20108  20216  20216  20001  10100
   20204  20030  20101  20101  20108 519548  20108  20216  20216  20001  10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

Performance counters (columns below are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01     02     52     53     56     59     78     7c     7f     e9     ef
   20024  20030  20011  20011  20018 519495  20018  20036  20020  20001  10010
   20024  20030  20011  20011  20010 519598  20010  20020  20020  20001  10010
   20024  20030  20011  20011  20010 519598  20010  20020  20020  20001  10010
   20024  20030  20011  20011  20010 519598  20010  20020  20020  20001  10010
   20024  20030  20011  20011  20010 519598  20010  20020  20020  20001  10010
   20024  20030  20011  20011  20010 519598  20010  20020  20036  20001  10010
   20024  20030  20011  20011  20018 519476  20010  20020  20020  20001  10010
   20024  20030  20011  20011  20010 519598  20010  20020  20020  20001  10010
   20024  20030  20011  20011  20010 519598  20010  20020  20020  20001  10010
   20024  20030  20011  20011  20010 519598  20010  20020  20020  20001  10010
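The reported latency can be reproduced from the cycle (02) column. Both configurations run 10,000 cmn/cset pairs (100 × 100 or 1000 × 10), every row shows 20030 cycles, and 20030 / 10000 - 1 = 1.003. The sketch below is one reading of "median cycles for code, minus 1 chain cycle" that matches the published 1.0030; the harness's exact formula is not shown on this page, and the function name `latency` is hypothetical.

```python
from statistics import median

CHAIN_CYCLES = 1  # "Chain cycles: 1" from the test description

def latency(cycle_samples, unrolls, iterations):
    """Median cycles per cmn+cset pair, minus the cycles attributed to the chain."""
    return median(cycle_samples) / (unrolls * iterations) - CHAIN_CYCLES

# The cycle (02) column is 20030 in every row of both Test 2 tables.
lat_100x100 = latency([20030] * 10, 100, 100)
lat_1000x10 = latency([20030] * 10, 1000, 10)
print(f"{lat_100x100:.4f} {lat_1000x10:.4f}")  # 1.0030 1.0030
```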

Test 3: throughput

Count: 8

Code:

  cmn x0, #3
  cmn x0, #3
  cmn x0, #3
  cmn x0, #3
  cmn x0, #3
  cmn x0, #3
  cmn x0, #3
  cmn x0, #3
  mov x0, 1

(fused SUBS/B.cc loop)
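The eight `cmn` instructions all read x0, which none of them writes, and write only the flags, so no instruction has to wait for another one in the block. A rough way to see the difference from the latency test is to measure the longest read-after-write chain when each test body is repeated. The sketch below does that; it follows only true (RAW) dependencies, assuming registers and flags are renamed so that write-after-write on NZCV costs nothing, and `longest_raw_chain` is a hypothetical helper.

```python
def longest_raw_chain(body, unrolls):
    """Length of the longest read-after-write chain when `body` is repeated.

    Each instruction is (reads, writes) over register names; "nzcv" stands for
    the flags. WAW/WAR hazards are ignored, i.e. renaming is assumed.
    """
    depth = {}    # register -> chain depth of the instruction that last wrote it
    longest = 0
    for _ in range(unrolls):
        for reads, writes in body:
            d = 1 + max((depth.get(r, 0) for r in reads), default=0)
            for w in writes:
                depth[w] = d
            longest = max(longest, d)
    return longest

CMN = ({"x0"}, {"nzcv"})    # cmn x0, #3: reads x0, writes flags
CSET = ({"nzcv"}, {"x0"})   # cset x0, cc: reads flags, writes x0

print(longest_raw_chain([CMN, CSET], 100))  # 200: latency test, fully serial
print(longest_raw_chain([CMN] * 8, 100))    # 1: throughput test, all independent
```

With a chain length of 1, the cycle count is bounded by how many `cmn` the core can issue per cycle rather than by latency.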

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3635

Performance counters (columns below are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01     02     52     53     56     59     78     7c     7f     e9     ef
   80204  29287  80113  80113  80118 240354  80118  80220  80216  80012    100
   80204  29157  80114  80114  80118 240354  80118  80220  80220  80013    100
   80204  29166  80113  80113  80118 240360  80120  80220  80220  80013    100
   80204  29052  80113  80113  80118 240354  80118  80220  80220  80012    100
   80204  29166  80113  80113  80118 240354  80118  80220  80220  80013    100
   80204  29077  80113  80113  80118 240354  80118  80220  80220  80013    100
   80204  29168  80115  80115  80120 240354  80118  80220  80220  80013    100
   80204  29068  80113  80113  80118 240354  80118  80220  80220  80013    100
   80204  29166  80113  80113  80118 240354  80118  80220  80220  80013    100
   80204  29072  80112  80112  80117 240474  80158  80260  80220  80013    100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3631

Performance counters (columns below are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01     02     52     53     56     59     78     7c     7f     e9     ef
   80024  30077  80035  80035  80039 240143  80039  80042  80020  80011     10
   80024  29168  80021  80021  80020 240099  80020  80020  80020  80011     10
   80024  29086  80021  80021  80020 240068  80020  80020  80020  80011     10
   80024  29130  80021  80021  80020 240063  80020  80020  80040  80024     10
   80024  29113  80060  80060  80059 240075  80020  80020  80020  80011     10
   80024  29132  80021  80021  80020 240190  80058  80058  80020  80011     10
   80024  29045  80021  80021  80020 240082  80020  80020  80020  80011     10
   80024  28992  80021  80021  80020 240307  80099  80099  80020  80011     10
   80024  28941  80021  80021  80020 240299  80095  80095  80020  80011     10
   80024  29119  80021  80021  80020 240066  80020  80020  80020  80011     10
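Inverting the reported cycles per instruction gives the sustained rate: roughly 2.75 `cmn` per cycle in both configurations. The sketch below is only arithmetic on the two published results; by itself it does not say which execution units the instruction uses.

```python
# Reported Test 3 results (cycles per cmn), copied from the text above.
results = {
    "100 unrolls x 100 iterations": 0.3635,
    "1000 unrolls x 10 iterations": 0.3631,
}

for config, cycles_per_cmn in results.items():
    ipc = 1 / cycles_per_cmn  # cmn instructions completed per cycle
    print(f"{config}: {ipc:.2f} cmn/cycle")  # 2.75 for both configurations
```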