Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

CMN (uxtx, 64-bit)

Test 1: uops

Code:

  cmn x0, x1, uxtx
  mov x0, 1
  mov x1, 2

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

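retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 522 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 394 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 395 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 389 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001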

Test 2: Latency 3->1

Chain cycles: 1

Code:

  cmn x0, x1, uxtx
  cset x0, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20107 | 519435 | 20107 | 20214 | 30292 | 0 | 20015 | 0 | 10100
20204 | 20030 | 20101 | 20101 | 20107 | 519548 | 20108 | 20216 | 30224 | 0 | 20001 | 0 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 0 | 20001 | 0 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 0 | 20001 | 0 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 0 | 20001 | 0 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 0 | 20001 | 0 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 0 | 20001 | 0 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 0 | 20001 | 0 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 0 | 20001 | 0 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 0 | 20001 | 0 | 10100
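
The reported latency can be reproduced from the cycle (02) samples of the 100-unroll, 100-iteration run above. This is a minimal sketch, assuming the harness normalizes the median cycle count by unrolls × iterations before subtracting the chain cycles; that normalization is inferred from the numbers rather than stated on the page, and the function name `latency_per_instruction` is hypothetical.

```python
from statistics import median

# cycle (02) samples from the ten runs of the 100-unroll, 100-iteration test.
cycles = [20030] * 10

def latency_per_instruction(cycle_samples, unrolls, iterations, chain_cycles):
    """Median cycles per unrolled copy of the code, minus the chain cycles.

    Assumption: the median is divided by unrolls * iterations; the page
    only says "median cycles for code, minus 1 chain cycle".
    """
    per_copy = median(cycle_samples) / (unrolls * iterations)
    return per_copy - chain_cycles

result = latency_per_instruction(cycles, unrolls=100, iterations=100, chain_cycles=1)
print(f"{result:.4f}")  # prints 1.0030
```

Under that assumption, each cmn/cset pair takes about 2.003 cycles, and removing the 1-cycle cset chain leaves roughly 1 cycle for cmn.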

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519476 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30116 | 20015 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010

Test 3: Latency 3->2

Chain cycles: 1

Code:

  cmn x0, x1, uxtx
  cset x1, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20108 | 519311 | 20107 | 20214 | 30221 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519476 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010

Test 4: throughput

Count: 8

Code:

  cmn x0, x1, uxtx
  cmn x0, x1, uxtx
  cmn x0, x1, uxtx
  cmn x0, x1, uxtx
  cmn x0, x1, uxtx
  cmn x0, x1, uxtx
  cmn x0, x1, uxtx
  cmn x0, x1, uxtx
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3635

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29237 | 80112 | 80112 | 80117 | 240354 | 80118 | 80218 | 160242 | 80014 | 100
80204 | 29112 | 80114 | 80114 | 80119 | 240351 | 80117 | 80220 | 160240 | 80013 | 100
80204 | 29086 | 80113 | 80113 | 80118 | 240357 | 80119 | 80221 | 160240 | 80013 | 100
80204 | 29170 | 80113 | 80113 | 80118 | 240357 | 80119 | 80220 | 160240 | 80013 | 100
80204 | 29059 | 80115 | 80115 | 80119 | 240354 | 80118 | 80220 | 160240 | 80012 | 100
80204 | 29060 | 80115 | 80115 | 80120 | 240354 | 80118 | 80220 | 160240 | 80015 | 100
80204 | 29082 | 80113 | 80113 | 80118 | 240351 | 80117 | 80220 | 160240 | 80015 | 100
80204 | 29160 | 80115 | 80115 | 80119 | 240357 | 80119 | 80220 | 160240 | 80013 | 100
80204 | 29051 | 80115 | 80115 | 80119 | 240351 | 80117 | 80220 | 160240 | 80013 | 100
80204 | 29060 | 80115 | 80115 | 80120 | 240354 | 80118 | 80220 | 160240 | 80013 | 100
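
The throughput figure can be approximated from the ten cycle (02) samples of the 100-unroll, 100-iteration run above. This is a sketch under the assumption that the median cycle count is divided by unrolls × iterations and then by the count of 8 instructions per copy; the page only states "median cycles for code divided by count", and the function name `cycles_per_instruction` is hypothetical.

```python
from statistics import median

# cycle (02) samples from the ten runs of the 100-unroll, 100-iteration test.
cycles = [29237, 29112, 29086, 29170, 29059,
          29060, 29082, 29160, 29051, 29060]

def cycles_per_instruction(cycle_samples, unrolls, iterations, count):
    """Median cycles per unrolled copy, divided by instructions per copy.

    Assumption: the median is normalized by unrolls * iterations first;
    this is inferred from the data, not documented on the page.
    """
    per_copy = median(cycle_samples) / (unrolls * iterations)
    return per_copy / count

result = cycles_per_instruction(cycles, unrolls=100, iterations=100, count=8)
print(f"{result:.5f}")  # prints 0.36355
```

The value lands within rounding distance of the reported 0.3635. Its reciprocal is about 2.75 instructions per cycle.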

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3630

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 30238 | 80035 | 80035 | 80039 | 240155 | 80042 | 80046 | 160020 | 80011 | 10
80024 | 29015 | 80021 | 80021 | 80020 | 240113 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29064 | 80021 | 80021 | 80020 | 240108 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29098 | 80021 | 80021 | 80020 | 240289 | 80081 | 80082 | 160020 | 80011 | 10
80024 | 28906 | 80021 | 80021 | 80020 | 240103 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 28980 | 80021 | 80021 | 80020 | 240108 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29070 | 80021 | 80021 | 80020 | 240096 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29059 | 80021 | 80021 | 80020 | 240116 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29112 | 80021 | 80021 | 80020 | 240097 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 28933 | 80021 | 80021 | 80020 | 240113 | 80020 | 80020 | 160020 | 80011 | 10