Apple Microarchitecture Research by Dougall Johnson

TST (immediate, 32-bit)

Test 1: uops

Code:

  tst w0, #3
  mov x0, 1

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

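| retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1004 | 521 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001 |
| 1004 | 395 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001 |
| 1004 | 389 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001 |
| 1004 | 396 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001 |
| 1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001 |
| 1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001 |
| 1004 | 396 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001 |
| 1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001 |
| 1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001 |
| 1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001 |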

Test 2: Latency 2->1

Chain cycles: 1

Code:

  tst w0, #3
  cset x0, cc
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

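| retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 20204 | 20030 | 20101 | 20101 | 20107 | 519312 | 20107 | 20212 | 20216 | 20001 | 10100 |
| 20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100 |
| 20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100 |
| 20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100 |
| 20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100 |
| 20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100 |
| 20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100 |
| 20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100 |
| 20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100 |
| 20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100 |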
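As a worked check on the result above, the reported latency can be reproduced from the cycle (02) counter, which reads 20030 in all ten runs. The following is a minimal sketch in Python, assuming the result is the median cycle count divided by unrolls times iterations, less the chain cycle. The variable names are illustrative and are not taken from the original test harness.

```python
# Values read from the 100-unroll, 100-iteration run of Test 2.
cycles = 20030        # cycle (02) counter; identical in all ten runs, so also the median
unrolls = 100
iterations = 100
chain_cycles = 1      # "Chain cycles: 1" from the test description

# Assumption: each unrolled copy is one tst/cset pair on the dependency chain.
cycles_per_copy = cycles / (unrolls * iterations)
latency = cycles_per_copy - chain_cycles

print(f"{latency:.4f}")  # prints 1.0030
```

The same arithmetic applies to the 1000-unroll, 10-iteration run, which also reports 20030 cycles over 10000 copies of the code.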

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

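| retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 20024 | 20030 | 20011 | 20011 | 20018 | 519454 | 20010 | 20020 | 20020 | 20001 | 10010 |
| 20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010 |
| 20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010 |
| 20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010 |
| 20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010 |
| 20024 | 20030 | 20011 | 20011 | 20010 | 519930 | 20060 | 20081 | 20020 | 20001 | 10010 |
| 20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010 |
| 20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010 |
| 20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010 |
| 20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20080 | 20015 | 10010 |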

Test 3: throughput

Count: 8

Code:

  tst w0, #3
  tst w0, #3
  tst w0, #3
  tst w0, #3
  tst w0, #3
  tst w0, #3
  tst w0, #3
  tst w0, #3
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3635

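| retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 80204 | 29215 | 80114 | 80114 | 0 | 80118 | 240351 | 80117 | 80218 | 80220 | 80015 | 100 |
| 80204 | 29128 | 80112 | 80112 | 0 | 80117 | 240354 | 80118 | 80220 | 80220 | 80013 | 100 |
| 80204 | 29087 | 80115 | 80115 | 0 | 80119 | 240354 | 80118 | 80220 | 80220 | 80015 | 100 |
| 80204 | 29063 | 80113 | 80113 | 0 | 80118 | 240357 | 80119 | 80220 | 80220 | 80013 | 100 |
| 80204 | 29051 | 80115 | 80115 | 0 | 80119 | 240348 | 80116 | 80216 | 80220 | 80012 | 100 |
| 80204 | 29166 | 80113 | 80113 | 0 | 80118 | 240354 | 80118 | 80220 | 80220 | 80013 | 100 |
| 80204 | 29107 | 80115 | 80115 | 0 | 80119 | 240354 | 80118 | 80220 | 80220 | 80015 | 100 |
| 80204 | 29072 | 80112 | 80112 | 0 | 80117 | 240357 | 80119 | 80220 | 80220 | 80015 | 100 |
| 80204 | 29052 | 80113 | 80113 | 0 | 80118 | 240357 | 80119 | 80220 | 80220 | 80015 | 100 |
| 80204 | 29051 | 80115 | 80115 | 0 | 80119 | 240354 | 80118 | 80220 | 80220 | 80013 | 100 |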
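The throughput figure for this run can be checked against the ten cycle (02) readings. The following is a minimal sketch in Python, assuming the result is the median of those readings divided by the total number of tst instructions executed (count times unrolls times iterations). The names are illustrative and are not part of the original test harness.

```python
from statistics import median

# cycle (02) readings for the ten runs of the 100-unroll, 100-iteration test
cycles = [29215, 29128, 29087, 29063, 29051,
          29166, 29107, 29072, 29052, 29051]

count = 8                        # tst instructions per unrolled copy ("Count: 8")
unrolls, iterations = 100, 100

total_tst = count * unrolls * iterations          # 80000 tst instructions
cycles_per_tst = median(cycles) / total_tst       # median is 29079.5 cycles

print(f"{cycles_per_tst:.4f}")       # prints 0.3635
print(f"{1 / cycles_per_tst:.2f}")   # prints 2.75
```

Read the other way around, 0.3635 cycles per instruction corresponds to roughly 2.75 tst instructions completed per cycle in this measurement.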

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3630

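| retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 80024 | 30103 | 80035 | 80035 | 80039 | 240088 | 80020 | 80020 | 80020 | 80011 | 10 |
| 80024 | 29076 | 80021 | 80021 | 80020 | 240084 | 80020 | 80020 | 80020 | 80011 | 10 |
| 80024 | 28986 | 80021 | 80021 | 80020 | 240089 | 80020 | 80020 | 80020 | 80011 | 10 |
| 80024 | 29119 | 80021 | 80021 | 80020 | 240082 | 80020 | 80020 | 80020 | 80011 | 10 |
| 80024 | 28983 | 80021 | 80021 | 80020 | 240069 | 80020 | 80020 | 80020 | 80011 | 10 |
| 80024 | 29098 | 80021 | 80021 | 80020 | 240072 | 80020 | 80020 | 80020 | 80011 | 10 |
| 80024 | 28981 | 80021 | 80021 | 80020 | 240072 | 80020 | 80020 | 80020 | 80011 | 10 |
| 80024 | 29098 | 80021 | 80021 | 80020 | 240095 | 80020 | 80020 | 80020 | 80011 | 10 |
| 80024 | 28991 | 80021 | 80021 | 80020 | 240066 | 80020 | 80020 | 80020 | 80011 | 10 |
| 80024 | 29132 | 80021 | 80021 | 80020 | 240084 | 80020 | 80020 | 80020 | 80011 | 10 |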