Apple Microarchitecture Research by Dougall Johnson

CMN (register, 32-bit)

Test 1: uops

Code:

  cmn w0, w1
  mov x0, 1
  mov x1, 2

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

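retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 503 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 403 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 399 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 402 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 397 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 397 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 397 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 397 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 397 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 402 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001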
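As a rough cross-check, the raw counters for this test can be normalised by the 1000 unrolls. The sketch below assumes the first data row splits into the fields retire uop = 1004, schedule int uop = 1001, map int uop = 1000 and map int uop inputs = 2000; the dictionary and variable names are illustrative only. The ratios are not corrected for any harness overhead, so they need not match the summary lines exactly.

```python
# Normalise raw counter values by the number of unrolled instructions.
# The counts are read from the first row of the Test 1 table (assumed field split).
UNROLLS = 1000

first_row = {
    "retire uop": 1004,
    "schedule int uop": 1001,
    "map int uop": 1000,
    "map int uop inputs": 2000,
}

per_instruction = {name: count / UNROLLS for name, count in first_row.items()}

for name, value in per_instruction.items():
    print(f"{name}: {value:.3f}")
```

Under that split, "map int uop inputs" comes out at 2 per instruction, which is consistent with cmn w0, w1 reading two integer source registers.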

Test 2: Latency 3->1

Chain cycles: 1

Code:

  cmn w0, w1
  cset x0, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20107 | 519464 | 20110 | 20218 | 30221 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20107 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20107 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
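The latency result for this configuration can be reproduced from the cycle counter. This is a minimal sketch, assuming the "cycle (02)" field is 20030 in each of the ten runs and that the result is the median cycle count divided by the number of executed copies of the code (unrolls times iterations), minus the stated chain cycles. The function and variable names are illustrative and not taken from the test harness.

```python
from statistics import median

# "cycle (02)" values for the ten runs (assumed field split of the rows above).
cycles = [20030] * 10

UNROLLS = 100
ITERATIONS = 100
CHAIN_CYCLES = 1  # latency attributed to the cset that closes the dependency chain


def latency(cycle_counts, copies, chain_cycles):
    """Median cycles per copy of the code, minus the chain instruction's latency."""
    return median(cycle_counts) / copies - chain_cycles


result = latency(cycles, UNROLLS * ITERATIONS, CHAIN_CYCLES)
print(f"{result:.4f}")  # prints 1.0030
```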

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 30041 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 30044 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 30044 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 30044 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 30044 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 30110 | 20015 | 10010
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 30044 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 30044 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010

Test 3: Latency 3->2

Chain cycles: 1

Code:

  cmn w0, w1
  cset x1, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20107 | 519327 | 20107 | 20214 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519516 | 20018 | 20034 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010

Test 4: throughput

Count: 8

Code:

  cmn w0, w1
  cmn w0, w1
  cmn w0, w1
  cmn w0, w1
  cmn w0, w1
  cmn w0, w1
  cmn w0, w1
  cmn w0, w1
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3635

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29112 | 80114 | 80114 | 80118 | 240351 | 80117 | 80218 | 160242 | 80014 | 100
80204 | 29019 | 80114 | 80114 | 80119 | 240351 | 80117 | 80217 | 160240 | 80015 | 100
80204 | 29082 | 80115 | 80115 | 80119 | 240360 | 80120 | 80220 | 160234 | 80013 | 100
80204 | 29004 | 80113 | 80113 | 80118 | 240351 | 80117 | 80220 | 160240 | 80013 | 100
80204 | 29056 | 80113 | 80113 | 80118 | 240360 | 80120 | 80220 | 160240 | 80013 | 100
80204 | 29092 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 160240 | 80015 | 100
80204 | 29118 | 80115 | 80115 | 80119 | 240357 | 80119 | 80220 | 160240 | 80013 | 100
80204 | 29052 | 80115 | 80115 | 80119 | 240351 | 80117 | 80220 | 160312 | 80052 | 100
80204 | 29077 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 160240 | 80013 | 100
80204 | 29092 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 160240 | 80013 | 100
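The throughput figure for this configuration can be reproduced in the same way. This is a minimal sketch, assuming the "cycle (02)" field of the ten runs reads 29112, 29019, 29082, 29004, 29056, 29092, 29118, 29052, 29077 and 29092, and that the result is the median cycle count per copy of the code divided by the count of 8 instructions. The function and variable names are illustrative and not taken from the test harness.

```python
from statistics import median

# "cycle (02)" values for the ten runs (assumed field split of the rows above).
cycles = [29112, 29019, 29082, 29004, 29056, 29092, 29118, 29052, 29077, 29092]

UNROLLS = 100
ITERATIONS = 100
COUNT = 8  # cmn instructions per copy of the code


def cycles_per_instruction(cycle_counts, copies, count):
    """Median cycles per copy of the code, divided by the instructions per copy."""
    return median(cycle_counts) / copies / count


result = cycles_per_instruction(cycles, UNROLLS * ITERATIONS, COUNT)
print(f"{result:.4f}")  # prints 0.3635
print(f"{1 / result:.2f} instructions per cycle")
```

The reciprocal is printed only as a convenience for reading the result as instructions per cycle.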

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3631

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 29983 | 80035 | 80035 | 80039 | 240166 | 80040 | 80041 | 160062 | 80026 | 10
80024 | 29090 | 80034 | 80034 | 80039 | 240159 | 80039 | 80041 | 160020 | 80011 | 10
80024 | 28891 | 80021 | 80021 | 80020 | 240112 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 28980 | 80021 | 80021 | 80020 | 240101 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29050 | 80021 | 80021 | 80020 | 240103 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29117 | 80021 | 80021 | 80020 | 240110 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 28937 | 80021 | 80021 | 80020 | 240112 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 28984 | 80021 | 80021 | 80020 | 240095 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29049 | 80021 | 80021 | 80020 | 240103 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29100 | 80021 | 80021 | 80020 | 240114 | 80020 | 80020 | 160020 | 80011 | 10