Apple Microarchitecture Research by Dougall Johnson

CMP (uxtw, 32-bit)

Test 1: uops

Code:

  cmp w0, w1, uxtw
  mov x0, 1
  mov x1, 2

(no loop instructions)
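For reference, here is a small Python model of what the measured instruction computes architecturally. It says nothing about timing.

It is a sketch that assumes the standard Arm semantics:

- `cmp w0, w1, uxtw` is an alias of `subs wzr, w0, w1, uxtw`.
- In this 32-bit form, the UXTW extend of a W register is a no-op, so the flags match those of a plain `cmp w0, w1`.

The function names are hypothetical. `cset_cc` models the `cset xN, cc` used in the latency tests.

```python
MASK32 = 0xFFFF_FFFF


def uxtw(reg: int) -> int:
    """UXTW: take the low 32 bits of a register, zero-extended."""
    return reg & MASK32


def cmp_w_uxtw(x0: int, x1: int) -> dict[str, int]:
    """Model of `cmp w0, w1, uxtw`, i.e. `subs wzr, w0, w1, uxtw`.

    Only the NZCV flags are produced; the difference is written to wzr.
    """
    a = x0 & MASK32   # w0 is the low half of x0
    b = uxtw(x1)      # extending a 32-bit value to 32 bits changes nothing
    diff = (a - b) & MASK32
    return {
        "N": diff >> 31,                          # sign bit of the result
        "Z": int(diff == 0),                      # result is zero
        "C": int(a >= b),                         # no unsigned borrow
        "V": (((a ^ b) & (a ^ diff)) >> 31) & 1,  # signed overflow
    }


def cset_cc(flags: dict[str, int]) -> int:
    """`cset xd, cc`: 1 if carry is clear (w0 < w1 unsigned), else 0."""
    return int(flags["C"] == 0)


flags = cmp_w_uxtw(1, 2)      # example inputs
print(flags, cset_cc(flags))  # {'N': 1, 'Z': 0, 'C': 0, 'V': 0} 1
```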

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

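retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 538 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 390 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 390 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 2000 | 1001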

Test 2: Latency 3->1

Chain cycles: 1

Code:

  cmp w0, w1, uxtw
  cset x0, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20107 | 519311 | 20107 | 20214 | 30221 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20107 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

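retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 0 | 519496 | 0 | 0 | 20018 | 20034 | 0 | 0 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 0 | 519956 | 0 | 0 | 20058 | 20079 | 0 | 0 | 30044 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 0 | 519526 | 0 | 0 | 20019 | 20034 | 0 | 0 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | (columns 58 through 7e ran together and cannot be split reliably: 4250297697998511581466112630495920) | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 0 | 519598 | 0 | 0 | 20010 | 20020 | 0 | 0 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 0 | 519598 | 0 | 0 | 20010 | 20020 | 0 | 0 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 0 | 519598 | 0 | 0 | 20010 | 20020 | 0 | 0 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 0 | 519598 | 0 | 0 | 20010 | 20020 | 0 | 0 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 0 | 519598 | 0 | 0 | 20010 | 20020 | 0 | 0 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 0 | 519598 | 0 | 0 | 20010 | 20020 | 0 | 0 | 30020 | 20001 | 10010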
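The latency results above can be rechecked from the cycle (02) column. This is a minimal sketch that assumes the result is computed as follows:

1. Take the median cycle count.
2. Divide it by the number of unrolled copies executed (unrolls × iterations).
3. Subtract the 1 chain cycle attributed to the CSET.

Every Test 2 run reads 20030 cycles, so both configurations give the same answer.

```python
from statistics import median

CHAIN_CYCLES = 1  # cycles attributed to the CSET that closes the chain

# (unrolls, iterations, cycle (02) count of each of the ten runs)
RUNS = {
    "100 unrolls x 100 iterations": (100, 100, [20030] * 10),
    "1000 unrolls x 10 iterations": (1000, 10, [20030] * 10),
}


def latency(unrolls: int, iterations: int, cycles: list[int]) -> float:
    """Median cycles per unrolled copy of the code, minus the chain cycles."""
    copies = unrolls * iterations
    return median(cycles) / copies - CHAIN_CYCLES


for name, (unrolls, iterations, cycles) in RUNS.items():
    print(f"{name}: {latency(unrolls, iterations, cycles):.4f}")  # 1.0030
```

Each copy is one CMP plus one CSET, and the copies take about 2.003 cycles each. Charging 1 cycle to the CSET leaves about 1 cycle for the CMP.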

Test 3: Latency 3->2

Chain cycles: 1

Code:

  cmp w0, w1, uxtw
  cset x1, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20107 | 519434 | 20107 | 20214 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20107 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519476 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010

Test 4: throughput

Count: 8

Code:

  cmp w0, w1, uxtw
  cmp w0, w1, uxtw
  cmp w0, w1, uxtw
  cmp w0, w1, uxtw
  cmp w0, w1, uxtw
  cmp w0, w1, uxtw
  cmp w0, w1, uxtw
  cmp w0, w1, uxtw
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3635

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29279 | 80113 | 80113 | 80118 | 240354 | 80118 | 80218 | 160242 | 80014 | 100
80205 | 29102 | 80154 | 80154 | 80158 | 240357 | 80119 | 80220 | 160240 | 80015 | 100
80204 | 29095 | 80113 | 80113 | 80118 | 240351 | 80117 | 80220 | 160240 | 80014 | 100
80204 | 29067 | 80115 | 80115 | 80119 | 240357 | 80119 | 80220 | 160240 | 80013 | 100
80204 | 29092 | 80113 | 80113 | 80118 | 240357 | 80119 | 80220 | 160240 | 80013 | 100
80204 | 29075 | 80115 | 80115 | 80119 | 240357 | 80119 | 80220 | 160240 | 80015 | 100
80204 | 29114 | 80114 | 80114 | 80119 | 240351 | 80117 | 80220 | 160232 | 80012 | 100
80204 | 29004 | 80113 | 80113 | 80118 | 240357 | 80119 | 80220 | 160240 | 80015 | 100
80204 | 29114 | 80114 | 80114 | 80119 | 240354 | 80118 | 80220 | 160240 | 80013 | 100
80204 | 29051 | 80115 | 80115 | 80119 | 240357 | 80119 | 80220 | 160240 | 80012 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3631

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 30182 | 80034 | 80034 | 80039 | 240138 | 80038 | 80038 | 160020 | 80011 | 10
80024 | 29100 | 80021 | 80021 | 80020 | 240101 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29132 | 80021 | 80021 | 80020 | 240097 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 28936 | 80021 | 80021 | 80020 | 240111 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 28974 | 80021 | 80021 | 80020 | 240081 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29070 | 80021 | 80021 | 80020 | 240087 | 80020 | 80020 | 160140 | 80062 | 10
80024 | 29005 | 80021 | 80021 | 80020 | 240140 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29058 | 80021 | 80021 | 80020 | 240106 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 29109 | 80021 | 80021 | 80020 | 240111 | 80020 | 80020 | 160020 | 80011 | 10
80024 | 28933 | 80021 | 80021 | 80020 | 240111 | 80020 | 80020 | 160020 | 80011 | 10
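The throughput results can be approximated from the cycle (02) column of the two Test 4 tables. Divide the median cycle count by the number of CMPs executed (8 × unrolls × iterations).

This naive recomputation is a sketch, not the harness's exact method:

- It gives about 0.3637 and 0.3633, within about 0.0002 of the reported 0.3635 and 0.3631.
- The page doesn't say what further adjustment, if any, accounts for the difference.

The reciprocal gives the sustained rate, about 2.75 CMPs per cycle.

```python
from statistics import median

COUNT = 8  # CMP instructions per unrolled copy of the code

# (unrolls, iterations, cycle (02) count of each of the ten runs)
RUNS = {
    "100 unrolls x 100 iterations": (100, 100, [
        29279, 29102, 29095, 29067, 29092, 29075, 29114, 29004, 29114, 29051]),
    "1000 unrolls x 10 iterations": (1000, 10, [
        30182, 29100, 29132, 28936, 28974, 29070, 29005, 29058, 29109, 28933]),
}


def cycles_per_cmp(unrolls: int, iterations: int, cycles: list[int]) -> float:
    """Median cycles divided by the total number of CMPs executed."""
    return median(cycles) / (COUNT * unrolls * iterations)


for name, (unrolls, iterations, cycles) in RUNS.items():
    t = cycles_per_cmp(unrolls, iterations, cycles)
    # about 0.3637 and 0.3633 cycles per CMP, i.e. about 2.75 CMPs per cycle
    print(f"{name}: {t:.4f} cycles per CMP, {1 / t:.2f} CMPs per cycle")
```

The median is used, as in the reported results, so the single slow 30182-cycle run in the second table does not skew the figure.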