Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

CMP (immediate, 64-bit)

Test 1: uops

Code:

  cmp x0, #3
  mov x0, 1

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

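retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 563 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 390 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 390 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 394 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 389 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001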

Test 2: Latency 2->1

Chain cycles: 1

Code:

  cmp x0, #3
  cset x0, cc
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20108 | 519339 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20205 | 20060 | 20115 | 20115 | 20148 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519454 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519638 | 20018 | 20034 | 20036 | 20001 | 10010
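The "Result" lines in Test 2 can be reproduced from the cycle (02) column. The following is a minimal Python sketch, assuming the result is the median cycle count divided by the number of executed copies of the test code (unrolls times iterations), minus the stated chain cycle. The function name and the way the cycle values are listed are illustrative assumptions, not part of the original measurement harness.

```python
from statistics import median


def latency_per_instruction(cycle_samples, unrolls, iterations, chain_cycles):
    """Median cycles per copy of the test code, minus the known chain latency.

    Hypothetical helper: this mirrors the wording of the "Result" line,
    not the page's actual tooling.
    """
    copies = unrolls * iterations
    return median(cycle_samples) / copies - chain_cycles


# cycle (02) values as read from the two Test 2 runs.
run_100x100 = [20030, 20030, 20060, 20030, 20030,
               20030, 20030, 20030, 20030, 20030]
run_1000x10 = [20030] * 10

lat_a = latency_per_instruction(run_100x100, unrolls=100, iterations=100, chain_cycles=1)
lat_b = latency_per_instruction(run_1000x10, unrolls=1000, iterations=10, chain_cycles=1)

print(f"{lat_a:.4f} {lat_b:.4f}")
```

Under these assumptions both runs come out at 1.0030, matching the reported results. Using the median rather than the mean keeps the single 20060-cycle outlier in the first run from shifting the figure.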

Test 3: throughput

Count: 8

Code:

  cmp x0, #3
  cmp x0, #3
  cmp x0, #3
  cmp x0, #3
  cmp x0, #3
  cmp x0, #3
  cmp x0, #3
  cmp x0, #3
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3636

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29216 | 80115 | 80115 | 80119 | 240354 | 80118 | 80220 | 80218 | 80014 | 100
80204 | 29064 | 80113 | 80113 | 80117 | 240360 | 80120 | 80220 | 80220 | 80013 | 100
80204 | 29066 | 80113 | 80113 | 80118 | 240354 | 80118 | 80220 | 80220 | 80015 | 100
80204 | 29058 | 80115 | 80115 | 80120 | 240357 | 80119 | 80220 | 80220 | 80012 | 100
80204 | 29291 | 80233 | 80233 | 80238 | 240354 | 80118 | 80220 | 80299 | 80092 | 100
80204 | 29093 | 80153 | 80153 | 80158 | 240351 | 80117 | 80220 | 80292 | 80088 | 100
80204 | 29243 | 80190 | 80190 | 80194 | 240579 | 80193 | 80294 | 80220 | 80015 | 100
80204 | 29072 | 80112 | 80112 | 80117 | 240357 | 80119 | 80220 | 80220 | 80013 | 100
80204 | 29035 | 80115 | 80115 | 80119 | 240354 | 80118 | 80220 | 80220 | 80013 | 100
80204 | 29163 | 80113 | 80113 | 80118 | 240357 | 80119 | 80220 | 80220 | 80013 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3626

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 29989 | 80034 | 80034 | 80039 | 240071 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29043 | 80021 | 80021 | 80020 | 240071 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28925 | 80021 | 80021 | 80020 | 240084 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29082 | 80021 | 80021 | 80020 | 240086 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29061 | 80021 | 80021 | 80020 | 240077 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28987 | 80021 | 80021 | 80020 | 240071 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28948 | 80021 | 80021 | 80020 | 240093 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29116 | 80021 | 80021 | 80020 | 240077 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29027 | 80021 | 80021 | 80020 | 240081 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28979 | 80021 | 80021 | 80020 | 240080 | 80020 | 80020 | 80020 | 80011 | 10
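The throughput figure can be approximated from the cycle (02) column in the same way. The following is a rough Python sketch, assuming the result is the median cycle count divided by the number of executed copies of the code and then by the instruction count (8). The helper name is hypothetical, and the exact procedure behind the reported 0.3626 is not stated on the page, so this is expected to land near that figure rather than reproduce it exactly.

```python
from statistics import median


def cycles_per_instruction(cycle_samples, unrolls, iterations, count):
    """Median cycles per copy of the code, divided by instructions per copy.

    Hypothetical helper: an approximation of the "Result" line, not the
    page's actual tooling (loop overhead is not subtracted here).
    """
    copies = unrolls * iterations
    return median(cycle_samples) / copies / count


# cycle (02) values as read from the "1000 unrolls and 10 iterations" run.
run_1000x10 = [29989, 29043, 28925, 29082, 29061,
               28987, 28948, 29116, 29027, 28979]

cpi = cycles_per_instruction(run_1000x10, unrolls=1000, iterations=10, count=8)
ipc = 1 / cpi  # reciprocal throughput expressed as instructions per cycle

print(f"{cpi:.4f} cycles/instruction, {ipc:.2f} instructions/cycle")
```

Under these assumptions the estimate is about 0.363 cycles per CMP, within a fraction of a percent of the reported value. Its reciprocal, roughly 2.75 instructions per cycle, is a convenient way to read the same number.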