Apple Microarchitecture Research by Dougall Johnson

CMP (shifted immediate, 32-bit)

Test 1: uops

Code:

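  cmp w0, #3, lsl #12    // compares w0 with 3 << 12 (12288); only the NZCV flags are written
  mov x0, 1              // initialization of x0; not part of the unrolled sequence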

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

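retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 539 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 396 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 394 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 394 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001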
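
The per-instruction figures in Test 1 can be related to the raw counters by dividing each counter by the number of unrolled instructions. The following is a minimal Python sketch of that normalization, not the page's actual tooling. The counter values are read from the rows above (they are identical across all ten runs), the function name `per_instruction` is hypothetical, and the pairing of "Integer unit issues" with the "schedule int uop (53)" counter is an assumption.

```python
# Counter values as read from the Test 1 rows (identical in every run).
counters = {
    "retire uop (01)": 1004,
    "schedule uop (52)": 1001,
    "schedule int uop (53)": 1001,
    "dispatch int uop (56)": 1000,
    "dispatch uop (78)": 1000,
    "map int uop (7c)": 1000,
}

UNROLLS = 1000    # "1000 unrolls"
ITERATIONS = 1    # "1 iteration"


def per_instruction(raw, unrolls, iterations):
    """Divide each raw counter by the number of measured instruction copies."""
    copies = unrolls * iterations
    return {name: value / copies for name, value in raw.items()}


normalized = per_instruction(counters, UNROLLS, ITERATIONS)
for name, value in normalized.items():
    print(f"{name}: {value:.3f}")
```

Under this reading, "schedule int uop (53)" normalizes to 1.001, which agrees with the reported "Integer unit issues: 1.001". The raw retire counter normalizes to 1.004 rather than the reported 1.000, so the summary figures are presumably corrected for a small fixed overhead that this sketch does not model.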

Test 2: Latency 2->1

Chain cycles: 1

Code:

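  cmp w0, #3, lsl #12    // reads w0, writes the NZCV flags
  cset x0, cc            // reads the flags, writes x0, closing the dependency chain
  mov x0, 1              // initialization of x0; not part of the unrolled sequence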

(fused SUBS/B.cc loop)
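
To see why this sequence forms a stable dependency chain, it helps to model the two instructions. The sketch below is a simplified Python model, assuming the standard AArch64 definitions (CMP with an immediate is an alias of SUBS into the zero register, and the `cc` condition means carry clear) and assuming x0 starts at 1 as set by the `mov x0, 1` line. The function names `cmp32_imm` and `cset_cc` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def cmp32_imm(wn, imm12, shift12=False):
    """Model CMP Wn, #imm12{, LSL #12}: a 32-bit subtract that only sets NZCV."""
    operand = (imm12 << 12) if shift12 else imm12
    wn &= MASK32
    result = (wn - operand) & MASK32
    n = result >> 31                      # sign bit of the 32-bit result
    z = int(result == 0)
    c = int(wn >= operand)                # carry set means "no borrow"
    v = (((wn ^ operand) & (wn ^ result)) >> 31) & 1   # signed overflow
    return {"N": n, "Z": z, "C": c, "V": v}


def cset_cc(flags):
    """Model CSET Xd, cc: Xd = 1 if the carry flag is clear, else 0."""
    return int(flags["C"] == 0)


x0 = 1                                    # value assumed from "mov x0, 1"
flags = cmp32_imm(x0, 3, shift12=True)    # compare 1 against 3 << 12 = 12288
for _ in range(5):                        # a few links of the chain
    flags = cmp32_imm(x0, 3, shift12=True)
    x0 = cset_cc(flags)

print(flags, x0)
```

Because 1 is below 12288 as an unsigned value, the subtraction borrows, the carry flag is clear, and `cset x0, cc` writes 1 back to x0. Each CMP therefore depends on the previous CSET while the register value never changes, which is what lets the test attribute the measured cycles to the CMP and CSET latencies.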

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20108 | 519312 | 20107 | 20212 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519442 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519454 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
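
The latency result can be reproduced from the "cycle (02)" column. The following Python sketch assumes the result is the median cycle count divided by the number of executed copies of the code (unrolls times iterations), minus the stated chain cycle. The cycle values are as read from the Test 2 rows, where every run reports 20030 in both configurations, and `latency_estimate` is a hypothetical helper name.

```python
from statistics import median


def latency_estimate(cycle_counts, unrolls, iterations, chain_cycles):
    """Median cycles per executed copy of the code, minus the known chain latency."""
    cycles_per_copy = median(cycle_counts) / (unrolls * iterations)
    return cycles_per_copy - chain_cycles


# "cycle (02)" values as read from the rows above: all ten runs agree.
cycles_100x100 = [20030] * 10   # 100 unrolls, 100 iterations
cycles_1000x10 = [20030] * 10   # 1000 unrolls, 10 iterations

lat_a = latency_estimate(cycles_100x100, unrolls=100, iterations=100, chain_cycles=1)
lat_b = latency_estimate(cycles_1000x10, unrolls=1000, iterations=10, chain_cycles=1)
print(f"{lat_a:.4f} {lat_b:.4f}")   # prints "1.0030 1.0030"
```

Both configurations execute 10,000 copies of the CMP/CSET pair in about 20,030 cycles, so each pair takes roughly 2 cycles. Subtracting the 1-cycle CSET leaves about 1 cycle for the CMP, matching the reported 1.0030.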

Test 3: throughput

Count: 8

Code:

  cmp w0, #3, lsl #12
  cmp w0, #3, lsl #12
  cmp w0, #3, lsl #12
  cmp w0, #3, lsl #12
  cmp w0, #3, lsl #12
  cmp w0, #3, lsl #12
  cmp w0, #3, lsl #12
  cmp w0, #3, lsl #12
  mov x0, 1              // initialization of x0; not part of the unrolled sequence

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3635

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29333 | 80115 | 80115 | 80119 | 240354 | 80118 | 80220 | 80220 | 80013 | 100
80204 | 29086 | 80113 | 80113 | 80118 | 240357 | 80119 | 80220 | 80220 | 80013 | 100
80204 | 29095 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 80220 | 80012 | 100
80204 | 29035 | 80115 | 80115 | 80119 | 240357 | 80119 | 80220 | 80220 | 80015 | 100
80204 | 29107 | 80115 | 80115 | 80119 | 240354 | 80118 | 80220 | 80220 | 80012 | 100
80204 | 29075 | 80115 | 80115 | 80119 | 240357 | 80119 | 80220 | 80220 | 80015 | 100
80204 | 29092 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 80220 | 80013 | 100
80204 | 29082 | 80115 | 80115 | 80119 | 240357 | 80119 | 80220 | 80220 | 80015 | 100
80204 | 29084 | 80112 | 80112 | 80117 | 240354 | 80118 | 80220 | 80260 | 80053 | 100
80204 | 29072 | 80112 | 80112 | 80117 | 240354 | 80118 | 80220 | 80220 | 80013 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3624

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 30084 | 80035 | 80035 | 80039 | 240144 | 80039 | 80040 | 80020 | 80011 | 10
80024 | 28970 | 80021 | 80021 | 80020 | 240095 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29201 | 80021 | 80021 | 80020 | 240087 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29069 | 80021 | 80021 | 80020 | 240072 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28981 | 80021 | 80021 | 80020 | 240068 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28932 | 80021 | 80021 | 80020 | 240088 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29132 | 80021 | 80021 | 80020 | 240085 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29036 | 80021 | 80021 | 80020 | 240088 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28977 | 80021 | 80021 | 80020 | 240068 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28934 | 80021 | 80021 | 80020 | 240079 | 80020 | 80020 | 80020 | 80011 | 10
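
The throughput figures can be approximated from the "cycle (02)" column in the same way. The Python sketch below assumes the result is the median cycle count divided by the total number of CMP instructions executed (unrolls times iterations times count). The cycle values are as read from the Test 3 rows, and `cycles_per_instruction` is a hypothetical helper name. The page's exact median or overhead handling is not stated, so the output is expected to land close to, not exactly on, the reported values.

```python
from statistics import median


def cycles_per_instruction(cycle_counts, unrolls, iterations, count):
    """Median cycles divided by the total number of measured instructions."""
    return median(cycle_counts) / (unrolls * iterations * count)


# "cycle (02)" values as read from the two Test 3 tables.
cycles_100x100 = [29333, 29086, 29095, 29035, 29107,
                  29075, 29092, 29082, 29084, 29072]
cycles_1000x10 = [30084, 28970, 29201, 29069, 28981,
                  28932, 29132, 29036, 28977, 28934]

cpi_a = cycles_per_instruction(cycles_100x100, unrolls=100, iterations=100, count=8)
cpi_b = cycles_per_instruction(cycles_1000x10, unrolls=1000, iterations=10, count=8)

for label, cpi in (("100x100", cpi_a), ("1000x10", cpi_b)):
    # The reciprocal gives sustained instructions per cycle.
    print(f"{label}: {cpi:.4f} cycles/instruction, {1 / cpi:.2f} instructions/cycle")
```

Both estimates fall within about 0.0003 of the reported 0.3635 and 0.3624 cycles per instruction. Taking the reciprocal, this corresponds to a sustained rate of roughly 2.75 CMP instructions per cycle in this test.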