Apple Microarchitecture Research by Dougall Johnson

TST (immediate, 64-bit)

Test 1: uops

Code:

  tst x0, #3
  mov x0, 1

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

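retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004            | 525        | 1001              | 1001                  | 1000                  | 3000                        | 1000              | 1000             | 1000                    | 1001
1004            | 391        | 1001              | 1001                  | 1000                  | 3000                        | 1000              | 1000             | 1000                    | 1001
1004            | 392        | 1001              | 1001                  | 1000                  | 3000                        | 1000              | 1000             | 1000                    | 1001
1004            | 392        | 1001              | 1001                  | 1000                  | 3000                        | 1000              | 1000             | 1000                    | 1001
1004            | 396        | 1001              | 1001                  | 1000                  | 3000                        | 1000              | 1000             | 1000                    | 1001
1004            | 391        | 1001              | 1001                  | 1000                  | 3000                        | 1000              | 1000             | 1000                    | 1001
1004            | 390        | 1001              | 1001                  | 1000                  | 3000                        | 1000              | 1000             | 1000                    | 1001
1004            | 391        | 1001              | 1001                  | 1000                  | 3000                        | 1000              | 1000             | 1000                    | 1001
1004            | 392        | 1001              | 1001                  | 1000                  | 3000                        | 1000              | 1000             | 1000                    | 1001
1004            | 395        | 1001              | 1001                  | 1000                  | 3000                        | 1000              | 1000             | 1000                    | 1001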

Test 2: Latency 2->1

Chain cycles: 1

Code:

  tst x0, #3
  cset x0, cc
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204           | 20030      | 20101             | 20101                 | 20108                 | 519318                      | 20108             | 20216            | 20216                   | 20001                   | 10100
20204           | 20030      | 20101             | 20101                 | 20108                 | 519548                      | 20108             | 20216            | 20216                   | 20001                   | 10100
20204           | 20030      | 20101             | 20101                 | 20108                 | 519548                      | 20108             | 20216            | 20216                   | 20001                   | 10100
20204           | 20030      | 20101             | 20101                 | 20108                 | 519548                      | 20108             | 20216            | 20216                   | 20001                   | 10100
20204           | 20030      | 20101             | 20101                 | 20108                 | 519548                      | 20108             | 20216            | 20216                   | 20001                   | 10100
20204           | 20030      | 20101             | 20101                 | 20108                 | 519548                      | 20108             | 20216            | 20216                   | 20001                   | 10100
20204           | 20030      | 20101             | 20101                 | 20108                 | 519548                      | 20108             | 20216            | 20216                   | 20001                   | 10100
20204           | 20030      | 20101             | 20101                 | 20108                 | 519548                      | 20108             | 20216            | 20216                   | 20001                   | 10100
20204           | 20030      | 20101             | 20101                 | 20108                 | 519548                      | 20108             | 20216            | 20216                   | 20001                   | 10100
20204           | 20030      | 20101             | 20101                 | 20108                 | 519548                      | 20108             | 20216            | 20216                   | 20001                   | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024           | 20030      | 20011             | 20011                 | 20018                 | 519495                      | 20018             | 20036            | 20020                   | 20001                   | 10010
20024           | 20030      | 20011             | 20011                 | 20010                 | 519598                      | 20010             | 20020            | 20020                   | 20001                   | 10010
20024           | 20030      | 20011             | 20011                 | 20010                 | 519598                      | 20010             | 20020            | 20020                   | 20001                   | 10010
20024           | 20030      | 20011             | 20011                 | 20010                 | 519598                      | 20010             | 20020            | 20020                   | 20001                   | 10010
20024           | 20030      | 20011             | 20011                 | 20010                 | 519924                      | 20057             | 20077            | 20020                   | 20001                   | 10010
20024           | 20030      | 20011             | 20011                 | 20010                 | 519598                      | 20010             | 20020            | 20020                   | 20001                   | 10010
20024           | 20030      | 20011             | 20011                 | 20010                 | 519598                      | 20010             | 20020            | 20020                   | 20001                   | 10010
20024           | 20030      | 20011             | 20011                 | 20010                 | 519598                      | 20010             | 20020            | 20020                   | 20001                   | 10010
20024           | 20030      | 20011             | 20011                 | 20010                 | 519598                      | 20010             | 20020            | 20020                   | 20001                   | 10010
20024           | 20030      | 20011             | 20011                 | 20010                 | 519598                      | 20010             | 20020            | 20020                   | 20001                   | 10010
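
As a rough check on how the Test 2 result lines relate to the raw counters, the sketch below recomputes the latency figure from the cycle (02) samples of both runs. It assumes the reported value is the median cycle count divided by (unrolls x iterations), minus the chain cycle, which is how the "Result" lines read. The name `latency_cycles` and its parameters are hypothetical, and `median_high` is only a guess at the median convention; every sample here is 20030, so the choice makes no difference.

```python
from statistics import median_high


def latency_cycles(cycle_samples, unrolls, iterations, chain_cycles):
    """Cycles per pass through the measured code, minus the chain overhead.

    Assumption: one "pass" is a single copy of the tst/cset pair, so the
    total number of passes is unrolls * iterations.
    """
    per_pass = median_high(cycle_samples) / (unrolls * iterations)
    return per_pass - chain_cycles


# cycle (02) samples from the two Test 2 runs (ten samples each)
run_100x100 = [20030] * 10
run_1000x10 = [20030] * 10

print(f"{latency_cycles(run_100x100, 100, 100, chain_cycles=1):.4f}")  # 1.0030
print(f"{latency_cycles(run_1000x10, 1000, 10, chain_cycles=1):.4f}")  # 1.0030
```

Both runs give 20030 / 10000 = 2.0030 cycles for the TST plus CSET chain, and subtracting the 1 chain cycle leaves the 1.0030 shown above.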

Test 3: throughput

Count: 8

Code:

  tst x0, #3
  tst x0, #3
  tst x0, #3
  tst x0, #3
  tst x0, #3
  tst x0, #3
  tst x0, #3
  tst x0, #3
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3635

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204           | 29176      | 80115             | 80115                 | 80119                 | 240351                      | 80117             | 80220            | 80218                   | 80013                   | 100
80204           | 29057      | 80114             | 80114                 | 80118                 | 240351                      | 80117             | 80220            | 80220                   | 80013                   | 100
80204           | 29075      | 80112             | 80112                 | 80117                 | 240354                      | 80118             | 80220            | 80220                   | 80015                   | 100
80204           | 29077      | 80113             | 80113                 | 80118                 | 240357                      | 80119             | 80220            | 80220                   | 80013                   | 100
80204           | 29067      | 80115             | 80115                 | 80119                 | 240354                      | 80118             | 80220            | 80220                   | 80013                   | 100
80204           | 29164      | 80114             | 80114                 | 80118                 | 240354                      | 80118             | 80220            | 80220                   | 80012                   | 100
80204           | 29066      | 80115             | 80115                 | 80119                 | 240354                      | 80118             | 80220            | 80220                   | 80012                   | 100
80204           | 29166      | 80113             | 80113                 | 80118                 | 240357                      | 80119             | 80220            | 80220                   | 80012                   | 100
80204           | 29085      | 80112             | 80112                 | 80117                 | 240354                      | 80118             | 80220            | 80220                   | 80015                   | 100
80204           | 29058      | 80115             | 80115                 | 80120                 | 240357                      | 80119             | 80220            | 80220                   | 80015                   | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3630

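retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024           | 30144      | 80035             | 80035                 | 0                      | 0                      | 80039                 | 0                      | 240137                      | 80041             | 80042            | 80020                   | 80011                   | 10
80024           | 28998      | 80021             | 80021                 | 0                      | 0                      | 80020                 | 0                      | 240096                      | 80020             | 80020            | 80020                   | 80011                   | 10
80024           | 29132      | 80021             | 80021                 | 0                      | 0                      | 80020                 | 0                      | 240069                      | 80020             | 80020            | 80020                   | 80011                   | 10
80024           | 29038      | 80021             | 80021                 | 0                      | 0                      | 80020                 | 0                      | 240084                      | 80020             | 80020            | 80020                   | 80011                   | 10
80024           | 28929      | 80021             | 80021                 | 0                      | 0                      | 80020                 | 0                      | 240069                      | 80020             | 80020            | 80020                   | 80011                   | 10
80024           | 29083      | 80021             | 80021                 | 0                      | 0                      | 80020                 | 0                      | 240069                      | 80020             | 80020            | 80020                   | 80011                   | 10
80024           | 29042      | 80021             | 80021                 | 0                      | 0                      | 80020                 | 0                      | 240077                      | 80020             | 80020            | 80020                   | 80011                   | 10
80024           | 28964      | 80021             | 80021                 | 0                      | 0                      | 80020                 | 0                      | 240068                      | 80020             | 80020            | 80020                   | 80011                   | 10
80024           | 28956      | 80021             | 80021                 | 0                      | 0                      | 80020                 | 0                      | 240086                      | 80020             | 80020            | 80020                   | 80011                   | 10
80024           | 28931      | 80021             | 80021                 | 0                      | 0                      | 80020                 | 0                      | 240100                      | 80020             | 80020            | 80020                   | 80011                   | 10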
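
The Test 3 result lines can be recomputed from the cycle (02) samples in the same spirit. This sketch assumes the figure is the median cycle count divided by (unrolls x iterations x count). The page does not say which median is used for an even number of samples; taking the upper of the two middle values (`median_high`) is an assumption that happens to reproduce both published numbers. The function name `cycles_per_instruction` is hypothetical.

```python
from statistics import median_high


def cycles_per_instruction(cycle_samples, unrolls, iterations, count):
    """Median cycles divided by the total number of measured instructions.

    Assumption: each unrolled copy contains `count` instances of the
    instruction under test, so the total is unrolls * iterations * count.
    """
    return median_high(cycle_samples) / (unrolls * iterations * count)


# cycle (02) samples from the two Test 3 runs
run_100x100 = [29176, 29057, 29075, 29077, 29067,
               29164, 29066, 29166, 29085, 29058]
run_1000x10 = [30144, 28998, 29132, 29038, 28929,
               29083, 29042, 28964, 28956, 28931]

for samples, unrolls, iterations in ((run_100x100, 100, 100),
                                     (run_1000x10, 1000, 10)):
    result = cycles_per_instruction(samples, unrolls, iterations, count=8)
    print(f"{unrolls} unrolls x {iterations} iterations: {result:.4f}")
# 100 unrolls x 100 iterations: 0.3635
# 1000 unrolls x 10 iterations: 0.3630
```

Taking the reciprocal of 0.3635 gives roughly 2.75 TST instructions per cycle for this test.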