Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

CMP (register, 64-bit)

Test 1: uops

Code:

  cmp x0, x1
  mov x0, 1
  mov x1, 2

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)
100453410011001100030001000100020001001
100439410011001100030001000100020001001
100439110011001100030001000100020001001
100439110011001100030001000100020001001
100439110011001100030001000100020001001
100439410011001100030001000100020001001
100439210011001100030001000100020001001
100439310011001100030001000100020001001
100439210011001100030001000100020001001
100439110011001100030001000100020001001

Test 2: Latency 3->1

Chain cycles: 1

Code:

  cmp x0, x1
  cset x0, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
20204200302010120101201075193112010720214302242000110100
20204200302010120101201075195482010820216302202000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302202000110100
20204200302010120101201085195482010820213302242000110100
20204200302010120101201085195482010820213302242000110100
20204200302010120101201085195482010820213302202000110100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
20024200302001120011200185196382001820036300202000110010
20025200602002520025200585195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010

Test 3: Latency 3->2

Chain cycles: 1

Code:

  cmp x0, x1
  cset x1, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
20204200302010120101201075193382010820214302242000110100
20204200302010120101201075198872015020260302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
20024200302001120011200185196382001820036300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020301102001510010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010

Test 4: throughput

Count: 8

Code:

  cmp x0, x1
  cmp x0, x1
  cmp x0, x1
  cmp x0, x1
  cmp x0, x1
  cmp x0, x1
  cmp x0, x1
  cmp x0, x1
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3634

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
8020429214801138011380117240474801588025816031480049100
8020429165801148011480119240351801178021816023280012100
8020429082801158011580119240357801198022016024080015100
8020429066801138011380118240351801178022016024080013100
8020429039801158011580119240357801198022016024080015100
8020429085801128011280117240354801188022016024080014100
8020429082801158011580119240354801188022016024080015100
8020429066801138011380118240351801178022016024080013100
8020429034801158011580119240357801198022016024080015100
8020429117801138011380118240477801598026016024080015100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3619

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
800243007480035800358003924010580020800201600208001110
800242910880021800218002024011280020800201600208001110
800242903480021800218002024010680020800201600208001110
800242904680021800218002024011980020800201600208001110
800242905080021800218002024011180020800201600208001110
800242907080021800218002024012080020800201600208001110
800242905080021800218002024009580020800201600208001110
800242908980021800218002024011980020800201600208001110
800242903480021800218002024012580020800201600208001110
800252912380075800758007924011280020800201600208001110