Apple Microarchitecture Research by Dougall Johnson

CMN (shifted immediate, 64-bit)

Test 1: uops

Code:

  cmn x0, #3, lsl #12
  mov x0, 1

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

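retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 501 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 396 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 395 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 388 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 390 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001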

Test 2: Latency 2->1

Chain cycles: 1

Code:

  cmn x0, #3, lsl #12
  cset x0, cc
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

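retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20107 | 0 | 519442 | 0 | 0 | 20108 | 20216 | 0 | 0 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 0 | 20108 | 20216 | 0 | 0 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 0 | 20108 | 20216 | 0 | 0 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 0 | 20108 | 20216 | 0 | 0 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 0 | 20108 | 20216 | 0 | 0 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 0 | 20108 | 20216 | 0 | 0 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 0 | 20108 | 20216 | 0 | 0 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 0 | 20108 | 20216 | 0 | 0 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 0 | 20108 | 20216 | 0 | 0 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 0 | 20108 | 20216 | 0 | 0 | 20216 | 20001 | 10100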
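
The latency figure can be checked by hand from the cycle (02) values of the ten runs above. The following is a minimal sketch, assuming the result is the median cycle count divided by unrolls times iterations, with the one chain cycle subtracted afterwards. The function name `chain_latency` and the variable names are illustrative and not part of the original test harness.

```python
from statistics import median

# cycle (02) values from the ten runs above (100 unrolls x 100 iterations).
cycles = [20030] * 10

def chain_latency(cycles, unrolls, iterations, chain_cycles):
    """Median cycles per unrolled copy of the code, minus the chain cycles.

    Assumption: 'chain cycles' is the known cost of the extra instruction
    (here cset) that closes the dependency loop back into x0.
    """
    per_copy = median(cycles) / (unrolls * iterations)
    return per_copy - chain_cycles

latency = chain_latency(cycles, unrolls=100, iterations=100, chain_cycles=1)
print(f"{latency:.4f}")  # prints 1.0030
```

Under these assumptions, each cmn/cset pair takes about 2.003 cycles, which leaves roughly one cycle for the cmn itself once the chain cycle is removed.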

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519956 | 20058 | 20080 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010

Test 3: throughput

Count: 8

Code:

  cmn x0, #3, lsl #12
  cmn x0, #3, lsl #12
  cmn x0, #3, lsl #12
  cmn x0, #3, lsl #12
  cmn x0, #3, lsl #12
  cmn x0, #3, lsl #12
  cmn x0, #3, lsl #12
  cmn x0, #3, lsl #12
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3634

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29176 | 80114 | 80114 | 80118 | 240354 | 80118 | 80220 | 80220 | 80015 | 100
80204 | 29030 | 80114 | 80114 | 80118 | 240354 | 80118 | 80220 | 80220 | 80015 | 100
80204 | 29067 | 80113 | 80113 | 80118 | 240357 | 80119 | 80220 | 80220 | 80013 | 100
80204 | 29035 | 80115 | 80115 | 80119 | 240360 | 80120 | 80220 | 80220 | 80013 | 100
80204 | 29113 | 80114 | 80114 | 80119 | 240351 | 80117 | 80220 | 80220 | 80013 | 100
80204 | 29086 | 80112 | 80112 | 80116 | 240354 | 80118 | 80220 | 80220 | 80015 | 100
80204 | 29060 | 80115 | 80115 | 80120 | 240354 | 80118 | 80220 | 80220 | 80013 | 100
80204 | 29040 | 80115 | 80115 | 80120 | 240354 | 80118 | 80220 | 80216 | 80012 | 100
80204 | 29123 | 80113 | 80113 | 80118 | 240351 | 80117 | 80220 | 80220 | 80013 | 100
80204 | 29117 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 80220 | 80015 | 100
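
The throughput figure can be approximated the same way from the cycle (02) values of the ten runs above. This is a rough sketch, assuming the result is the median cycle count divided by unrolls times iterations and then by the instruction count of 8. The name `cycles_per_instruction` is illustrative. The value computed here only lands close to the reported 0.3634, since the published result may come from a separate run or a different rounding.

```python
from statistics import median

# cycle (02) values from the ten runs above (100 unrolls x 100 iterations).
cycles = [29176, 29030, 29067, 29035, 29113,
          29086, 29060, 29040, 29123, 29117]

def cycles_per_instruction(cycles, unrolls, iterations, count):
    """Median cycles divided by the total number of tested instructions.

    Assumption: 'count' is the number of copies of the tested instruction
    in each unrolled block (8 cmn instructions here).
    """
    return median(cycles) / (unrolls * iterations * count)

cpi = cycles_per_instruction(cycles, unrolls=100, iterations=100, count=8)
ipc = 1 / cpi  # implied instructions per cycle

print(f"cycles per instruction: {cpi:.3f}")
print(f"instructions per cycle: {ipc:.2f}")
```

Read this way, the measurement corresponds to somewhat fewer than three cmn instructions completing per cycle.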

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3631

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 30115 | 80034 | 80034 | 80038 | 240125 | 80039 | 80042 | 80020 | 80011 | 10
80024 | 29092 | 80021 | 80021 | 80020 | 240089 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29052 | 80021 | 80021 | 80020 | 240062 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28980 | 80021 | 80021 | 80020 | 240071 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28949 | 80021 | 80021 | 80020 | 240068 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29099 | 80021 | 80021 | 80020 | 240071 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29065 | 80021 | 80021 | 80020 | 240067 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28980 | 80021 | 80021 | 80020 | 240070 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28941 | 80021 | 80021 | 80020 | 240085 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29117 | 80021 | 80021 | 80020 | 240071 | 80020 | 80020 | 80020 | 80011 | 10