Apple Microarchitecture Research by Dougall Johnson

CMN (uxtw, 32-bit)
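
For orientation, here is a minimal Python sketch of what the measured instruction computes architecturally. It assumes the standard AArch64 definition, where `cmn wn, wm, uxtw` is an alias of `adds wzr, wn, wm, uxtw`: the 32-bit sum is discarded and only the NZCV flags are kept. The function names `cmn_w_uxtw` and `cset_cc` are hypothetical helpers for illustration and are not part of the original test harness.

```python
MASK32 = 0xFFFFFFFF

def cmn_w_uxtw(wn: int, wm: int) -> dict:
    """Flags produced by `cmn wn, wm, uxtw` (32-bit form), modelled in software."""
    a = wn & MASK32
    b = wm & MASK32                      # uxtw: zero-extend the low 32 bits
    total = a + b                        # unbounded sum, used to detect carry
    result = total & MASK32              # 32-bit result (discarded by cmn)
    return {
        "N": (result >> 31) & 1,                         # sign bit of the result
        "Z": int(result == 0),                           # result is zero
        "C": int(total > MASK32),                        # unsigned carry out
        "V": (((a ^ result) & (b ^ result)) >> 31) & 1,  # signed overflow
    }

def cset_cc(flags: dict) -> int:
    """`cset xd, cc`: 1 if the carry flag is clear, otherwise 0."""
    return int(flags["C"] == 0)

# Register values used by the tests on this page: x0 = 1, x1 = 2.
flags = cmn_w_uxtw(1, 2)
print(flags, cset_cc(flags))
```

With x0 = 1 and x1 = 2 the carry flag stays clear, so a following `cset x0, cc` writes 1 back into x0. That is what lets the latency tests below form a dependency chain through the flags without changing the operand values.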

Test 1: uops

Code:

  cmn w0, w1, uxtw
  mov x0, 1
  mov x1, 2

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

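retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 517 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 390 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 395 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001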

Test 2: Latency 3->1

Chain cycles: 1

Code:

  cmn w0, w1, uxtw
  cset x0, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20107 | 0 | 519338 | 0 | 20108 | 20214 | 0 | 30221 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 20108 | 20216 | 0 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 20108 | 20216 | 0 | 30224 | 20001 | 10100
20205 | 20060 | 20115 | 20115 | 20147 | 0 | 519443 | 0 | 20108 | 20216 | 0 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 20108 | 20216 | 0 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 20108 | 20216 | 0 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 20108 | 20216 | 0 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 20108 | 20216 | 0 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 20108 | 20216 | 0 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 0 | 519548 | 0 | 20108 | 20216 | 0 | 30224 | 20001 | 10100
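
The reported latency can be reproduced from the cycle (02) counts of the ten runs above, which read 20030 in nine runs and 20060 in one. The sketch below assumes the result is the median cycle count divided by the number of executed copies of the code (unrolls times iterations), minus the stated chain cycles. That formula is inferred from the "median cycles for code, minus 1 chain cycle" label and is not taken from the original tooling; `latency` is a hypothetical helper name.

```python
from statistics import median

# cycle (02) counter for the ten runs of Test 2 (100 unrolls, 100 iterations)
cycles = [20030, 20030, 20030, 20060, 20030,
          20030, 20030, 20030, 20030, 20030]

def latency(cycles, unrolls, iterations, chain_cycles):
    """Median cycles per copy of the code, minus the chain instruction's latency."""
    per_copy = median(cycles) / (unrolls * iterations)
    return per_copy - chain_cycles

result = latency(cycles, unrolls=100, iterations=100, chain_cycles=1)
print(f"{result:.4f}")  # 1.0030
```

Each copy of the code takes about two cycles in total: one for the `cset` that closes the chain and one attributed to the `cmn` itself.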

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519496 | 20018 | 20034 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010

Test 3: Latency 3->2

Chain cycles: 1

Code:

  cmn w0, w1, uxtw
  cset x1, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20107 | 519464 | 20110 | 20218 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20107 | 519435 | 20107 | 20214 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519956 | 20058 | 20084 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519916 | 20050 | 20068 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30158 | 20043 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30092 | 20022 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010

Test 4: throughput

Count: 8

Code:

  cmn w0, w1, uxtw
  cmn w0, w1, uxtw
  cmn w0, w1, uxtw
  cmn w0, w1, uxtw
  cmn w0, w1, uxtw
  cmn w0, w1, uxtw
  cmn w0, w1, uxtw
  cmn w0, w1, uxtw
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3634

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29274 | 80114 | 80114 | 80119 | 240351 | 80117 | 80218 | 160236 | 80014 | 100
80204 | 28992 | 80113 | 80113 | 80117 | 240360 | 80120 | 80220 | 160240 | 80013 | 100
80204 | 29056 | 80113 | 80113 | 80118 | 240360 | 80120 | 80220 | 160240 | 80012 | 100
80204 | 29157 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 160240 | 80015 | 100
80204 | 29049 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 160240 | 80013 | 100
80204 | 29113 | 80114 | 80114 | 80119 | 240351 | 80117 | 80220 | 160240 | 80015 | 100
80204 | 29060 | 80115 | 80115 | 80120 | 240351 | 80117 | 80220 | 160240 | 80015 | 100
80204 | 29123 | 80113 | 80113 | 80118 | 240360 | 80120 | 80220 | 160240 | 80015 | 100
80204 | 29058 | 80115 | 80115 | 80120 | 240360 | 80120 | 80220 | 160240 | 80013 | 100
80204 | 29128 | 80115 | 80115 | 80120 | 240354 | 80118 | 80220 | 160240 | 80015 | 100
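
As a rough cross-check of the throughput figure, the sketch below divides the median of the cycle (02) counts from the ten runs above by the total number of `cmn` instructions executed (count times unrolls times iterations). This formula is an assumption based on the "median cycles for code divided by count" label. It lands close to the reported 0.3634 but not exactly on it, so the original tool presumably handles loop overhead or the median slightly differently; the source does not say. `cycles_per_instruction` is a hypothetical helper name.

```python
from statistics import median

# cycle (02) counter for the ten runs of Test 4 (100 unrolls, 100 iterations)
cycles = [29274, 28992, 29056, 29157, 29049,
          29113, 29060, 29123, 29058, 29128]

def cycles_per_instruction(cycles, count, unrolls, iterations):
    """Median total cycles divided by the number of measured instructions."""
    return median(cycles) / (count * unrolls * iterations)

cpi = cycles_per_instruction(cycles, count=8, unrolls=100, iterations=100)
print(f"cycles per instruction: {cpi:.4f}")       # 0.3636
print(f"instructions per cycle: {1 / cpi:.2f}")   # 2.75
```

Read the other way round, a reciprocal throughput of about 0.36 cycles corresponds to roughly 2.75 of these instructions completing per cycle.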

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3631

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 30195 | 80034 | 80034 | 80038 | 240142 | 80040 | 80042 | 160020 | 80011 | 10
80024 | 29022 | 80021 | 80021 | 80020 | 240127 | 80020 | 80020 | 160020 | 80011 | 10
80025 | 29155 | 80073 | 80073 | 80078 | 240118 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29082 | 80021 | 80021 | 80020 | 240118 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29098 | 80021 | 80021 | 80020 | 240113 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29006 | 80021 | 80021 | 80020 | 240103 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29106 | 80021 | 80021 | 80020 | 240109 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 28996 | 80021 | 80021 | 80020 | 240113 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29106 | 80021 | 80021 | 80020 | 240101 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 28990 | 80021 | 80021 | 80020 | 240106 | 80020 | 80020 | 160020 | 80011 | 10