Apple Microarchitecture Research by Dougall Johnson


CMP (shifted immediate, 64-bit)

Test 1: uops

Code:

  cmp x0, #3, lsl #12
  mov x0, 1

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

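retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 512 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 390 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 388 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 395 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 395 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001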

Test 2: Latency 2->1

Chain cycles: 1

Code:

  cmp x0, #3, lsl #12
  cset x0, cc
  mov x0, 1

(fused SUBS/B.cc loop)
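The dependency chain in this test can be modelled in a few lines of Python. This is a rough sketch, not the benchmark harness: it assumes that `mov x0, 1` is the setup that seeds x0, and the helper names `cmp_shifted_imm` and `cset_cc` are made up for illustration. CMP with a shifted immediate is an alias of SUBS with XZR as the destination, so it only writes the NZCV flags; `cset x0, cc` then turns the carry flag back into a register value that feeds the next CMP.

```python
MASK64 = (1 << 64) - 1

def cmp_shifted_imm(xn, imm12, shift):
    """Model CMP Xn, #imm12, LSL #shift and return the (N, Z, C, V) flags."""
    assert 0 <= imm12 < 4096 and shift in (0, 12)
    operand = imm12 << shift           # #3, lsl #12 -> 0x3000
    xn &= MASK64
    result = (xn - operand) & MASK64   # 64-bit wrap-around subtraction
    n = result >> 63                   # sign bit of the result
    z = int(result == 0)
    c = int(xn >= operand)             # carry set means "no borrow"
    # Signed overflow: operands differ in sign and the result's sign differs from xn.
    v = ((xn ^ operand) & (xn ^ result)) >> 63
    return n, z, c, v

def cset_cc(flags):
    """Model CSET Xd, cc: 1 if the carry flag is clear, else 0."""
    _n, _z, c, _v = flags
    return int(c == 0)

x0 = 1  # assumed initial value, from "mov x0, 1"
for _ in range(3):
    # Each CMP needs the x0 produced by the previous CSET, forming a serial chain.
    x0 = cset_cc(cmp_shifted_imm(x0, 3, 12))
print(x0)  # 1
```

With x0 starting at 1, every CMP sees a value below 0x3000, the carry flag stays clear, and CSET writes 1 again, so the chain is stable across iterations.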

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 0 | 0 | 20108 | 0 | 519434 | 20107 | 20212 | 20212 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 0 | 0 | 20108 | 0 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 0 | 0 | 20108 | 0 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 0 | 0 | 20108 | 0 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 0 | 0 | 20108 | 0 | 519548 | 20108 | 20216 | 20255 | 20015 | 10100
20204 | 20030 | 20101 | 20101 | 0 | 0 | 20108 | 0 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 0 | 0 | 20108 | 0 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 0 | 0 | 20108 | 0 | 519548 | 20108 | 20216 | 20402 | 20085 | 10100
20204 | 20030 | 20101 | 20101 | 0 | 0 | 20108 | 0 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 0 | 0 | 20108 | 0 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519638 | 20018 | 20036 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
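The reported latency can be reproduced from the cycle (02) counter, which reads 20030 in all ten runs of both configurations. The sketch below assumes the result is the median cycle count divided by the number of executed code instances (unrolls times iterations), minus the chain cycles. That normalisation is inferred from the "Result" caption, not taken from the harness source, and `latency_per_instruction` is a hypothetical helper name.

```python
from statistics import median

def latency_per_instruction(cycle_samples, unrolls, iterations, chain_cycles):
    """Median cycles per code instance, minus the cycles spent in the chain instruction."""
    instances = unrolls * iterations
    return median(cycle_samples) / instances - chain_cycles

# cycle (02) values from the ten runs at 100 unrolls and 100 iterations
cycles_100x100 = [20030] * 10

result = latency_per_instruction(cycles_100x100, unrolls=100, iterations=100, chain_cycles=1)
print(f"{result:.4f}")  # 1.0030
```

The 1000-unroll, 10-iteration runs execute the same 10000 instances in the same 20030 cycles, so they give the same figure.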

Test 3: throughput

Count: 8

Code:

  cmp x0, #3, lsl #12
  cmp x0, #3, lsl #12
  cmp x0, #3, lsl #12
  cmp x0, #3, lsl #12
  cmp x0, #3, lsl #12
  cmp x0, #3, lsl #12
  cmp x0, #3, lsl #12
  cmp x0, #3, lsl #12
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3634

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29217 | 80114 | 80114 | 80118 | 240351 | 80117 | 80220 | 80218 | 80013 | 100
80204 | 29119 | 80113 | 80113 | 80118 | 240357 | 80119 | 80220 | 80220 | 80012 | 100
80204 | 29098 | 80113 | 80113 | 80118 | 240360 | 80120 | 80220 | 80218 | 80013 | 100
80204 | 29072 | 80112 | 80112 | 80117 | 240357 | 80119 | 80220 | 80220 | 80012 | 100
80204 | 29075 | 80115 | 80115 | 80119 | 240360 | 80120 | 80220 | 80220 | 80013 | 100
80204 | 29151 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 80220 | 80013 | 100
80204 | 29092 | 80113 | 80113 | 80118 | 240351 | 80117 | 80220 | 80220 | 80015 | 100
80204 | 29049 | 80113 | 80113 | 80118 | 240357 | 80119 | 80220 | 80220 | 80015 | 100
80204 | 29051 | 80115 | 80115 | 80119 | 240354 | 80118 | 80220 | 80220 | 80015 | 100
80204 | 29166 | 80113 | 80113 | 80118 | 240348 | 80116 | 80216 | 80258 | 80051 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3628

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 30054 | 80034 | 80034 | 80039 | 240066 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28996 | 80021 | 80021 | 80020 | 240068 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28972 | 80021 | 80021 | 80020 | 240085 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29055 | 80021 | 80021 | 80020 | 240091 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29086 | 80021 | 80021 | 80020 | 240088 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28929 | 80021 | 80021 | 80020 | 240072 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28981 | 80021 | 80021 | 80020 | 240080 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29054 | 80021 | 80021 | 80020 | 240071 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29089 | 80021 | 80021 | 80020 | 240071 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28943 | 80021 | 80021 | 80020 | 240078 | 80020 | 80020 | 80020 | 80011 | 10
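The throughput figure for the 1000-unroll, 10-iteration run can be reproduced from its ten cycle (02) readings. As a sketch, this assumes "divided by count" means dividing the median cycle count by the total number of CMP instructions executed (count times unrolls times iterations); the helper name `cycles_per_instruction` is hypothetical.

```python
from statistics import median

def cycles_per_instruction(cycle_samples, count, unrolls, iterations):
    """Median cycles divided by the total number of instructions under test."""
    return median(cycle_samples) / (count * unrolls * iterations)

# cycle (02) values from the ten runs at 1000 unrolls and 10 iterations
cycles_1000x10 = [30054, 28996, 28972, 29055, 29086,
                  28929, 28981, 29054, 29089, 28943]

cpi = cycles_per_instruction(cycles_1000x10, count=8, unrolls=1000, iterations=10)
print(f"{cpi:.4f}")      # 0.3628
print(f"{1 / cpi:.2f}")  # 2.76
```

The reciprocal, about 2.76 CMP instructions per cycle, is a more direct reading of the same measurement.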