Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

TST (register, 32-bit)

Test 1: uops

Code:

  tst w0, w1
  mov x0, 1
  mov x1, 2

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)
100450110011001100030001000100020001001
100439610011001100030001000100020001001
100439210011001100030001000100020001001
100439210011001100030001000100020001001
100439210011001100030001000100020001001
100439110011001100030001000100020001001
100438910011001100030001000100020001001
100439210011001100030001000100020001001
100439510011001100030001000100020001001
100439110011001100030001000100020001001

Test 2: Latency 3->1

Chain cycles: 1

Code:

  tst w0, w1
  cset x0, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
20204200302010120101201085193112010720214302212000110100
20204200302010120101201075195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
20024200302001120011200185194962001820034300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020301162001510010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010

Test 3: Latency 3->2

Chain cycles: 1

Code:

  tst w0, w1
  cset x1, cc
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
20204200302010120101201075193112010720214302212000110100
20204200302010120101201075195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085198572014720260302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100
20204200302010120101201085195482010820216302242000110100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
20024200302001120011200185195162001820034300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105199662005920084300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20024200302001120011200105195982001020020300202000110010
20025200602002520025200595195982001020020300202000110010
20024200302001120011200105199562005820080301162001610010
20024200302001120011200105195982001020020300202000110010

Test 4: throughput

Count: 8

Code:

  tst w0, w1
  tst w0, w1
  tst w0, w1
  tst w0, w1
  tst w0, w1
  tst w0, w1
  tst w0, w1
  tst w0, w1
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3635

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
8020429106801138011380118240468801568025616024080012100
8020429162801148011480119240354801188021816024080013100
8020429076801158011580119240351801178021716024080015100
8020429081801128011280117240354801188022016024080013100
8020429051801158011580119240357801198022016024080015100
8020429078801128011280117240354801188022016023280012100
8020429026801158011580119240360801208022016024080015100
8020429092801138011380118240354801188022016024080013100
8020429060801158011580119240357801198022016024080015100
8020429066801138011380118240351801178022016024080013100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3630

retire uop (01)cycle (02)schedule uop (52)schedule int uop (53)dispatch int uop (56)int uops in schedulers (59)dispatch uop (78)map int uop (7c)map int uop inputs (7f)? int output thing (e9)? int retires (ef)
800243009780037800378004124016880041800441600208001110
800242914880021800218002024010480020800201600208001110
800242911780021800218002024009780020800201600208001110
800242893280021800218002024010680020800201600208001110
800242897280021800218002024010180020800201600208001110
800242907080021800218002024009180020800201600208001110
800242912780021800218002024011480020800201600208001110
800242891380021800218002024011280020800201600208001110
800242897380021800218002024009980020800201601368006210
800242898980021800218002024010580020800201600208001110