Apple Microarchitecture Research by Dougall Johnson

CMP (immediate, 32-bit)

Test 1: uops

Code:

  cmp w0, #3
  mov x0, 1

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

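retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 520 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 398 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 390 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 392 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 391 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 390 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001
1004 | 393 | 1001 | 1001 | 1000 | 3000 | 1000 | 1000 | 1000 | 1001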

Test 2: Latency 2->1

Chain cycles: 1

Code:

  cmp w0, #3
  cset x0, cc
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519442 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 20216 | 20001 | 10100
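
The 1.0030 result can be reproduced from the cycle (02) counter, which reads 20030 in all ten samples of this run. The following is a minimal Python sketch, not the author's tooling. It assumes that "cycles for code" means the median cycle count divided by unrolls times iterations, and that the chain cycle being subtracted is the "Chain cycles: 1" value listed for this test. The variable names are illustrative only.

```python
from statistics import median

# cycle (02) samples for the 100-unroll x 100-iteration run (all ten read 20030)
cycle_samples = [20030] * 10
unrolls, iterations = 100, 100
chain_cycles = 1  # "Chain cycles: 1" from the test description

# Assumption: the median is normalised by the number of executed copies of the code
copies = unrolls * iterations
cycles_per_copy = median(cycle_samples) / copies

# Subtract the chain instruction's cycle to isolate the CMP latency
latency = cycles_per_copy - chain_cycles
print(f"{latency:.4f}")  # prints 1.0030
```

The remaining 0.0030 above one cycle corresponds to the 30 extra cycles spread over 10000 copies, which is plausibly loop and measurement overhead.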

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20018 | 519507 | 20017 | 20032 | 20036 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 20020 | 20001 | 10010

Test 3: throughput

Count: 8

Code:

  cmp w0, #3
  cmp w0, #3
  cmp w0, #3
  cmp w0, #3
  cmp w0, #3
  cmp w0, #3
  cmp w0, #3
  cmp w0, #3
  mov x0, 1

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.3635

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 29227 | 80113 | 80113 | 80118 | 240351 | 80117 | 80220 | 80220 | 80015 | 100
80204 | 29237 | 80153 | 80153 | 80157 | 240354 | 80118 | 80220 | 80220 | 80013 | 100
80204 | 29039 | 80115 | 80115 | 80119 | 240360 | 80120 | 80220 | 80220 | 80013 | 100
80204 | 29128 | 80115 | 80115 | 80120 | 240360 | 80120 | 80220 | 80220 | 80013 | 100
80204 | 29121 | 80113 | 80113 | 80118 | 240351 | 80117 | 80220 | 80220 | 80015 | 100
80204 | 29081 | 80154 | 80154 | 80158 | 240357 | 80119 | 80220 | 80332 | 80126 | 100
80204 | 29298 | 80232 | 80232 | 80236 | 240588 | 80196 | 80298 | 80220 | 80013 | 100
80204 | 29406 | 80270 | 80270 | 80274 | 240468 | 80156 | 80258 | 80220 | 80012 | 100
80204 | 29095 | 80113 | 80113 | 80118 | 240351 | 80117 | 80220 | 80220 | 80015 | 100
80204 | 29341 | 80230 | 80230 | 80234 | 240357 | 80119 | 80220 | 80220 | 80013 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.3631

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
82178 | 45391 | 82128 | 81315 | 813 | 81282 | 240119 | 80039 | 80040 | 80020 | 80011 | 10
80024 | 29095 | 80021 | 80021 | 0 | 80020 | 240062 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28981 | 80021 | 80021 | 0 | 80020 | 240070 | 80020 | 80020 | 80020 | 80011 | 10
80025 | 29129 | 80072 | 80072 | 0 | 80076 | 240061 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29000 | 80021 | 80021 | 0 | 80020 | 240074 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28963 | 80021 | 80021 | 0 | 80020 | 240066 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29019 | 80021 | 80021 | 0 | 80020 | 240067 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 28940 | 80021 | 80021 | 0 | 80020 | 240079 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29108 | 80021 | 80021 | 0 | 80020 | 240080 | 80020 | 80020 | 80020 | 80011 | 10
80024 | 29043 | 80021 | 80021 | 0 | 80020 | 240070 | 80020 | 80020 | 80020 | 80011 | 10
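
The throughput figure can be approximated from the cycle (02) samples of this 1000-unroll, 10-iteration run. The following is a minimal Python sketch under the same assumption as before: the median cycle count is divided by unrolls times iterations, then by the instruction count of 8. It is not the author's tooling, and the variable names are illustrative. The samples listed here give about 0.3629 cycles per CMP, close to but not identical to the reported 0.3631, so the reported value may come from a different set of samples or a slightly different normalisation.

```python
from statistics import median

# cycle (02) samples for the 1000-unroll x 10-iteration run;
# the first sample (45391) is an outlier that the median absorbs
cycle_samples = [45391, 29095, 28981, 29129, 29000,
                 28963, 29019, 28940, 29108, 29043]
unrolls, iterations = 1000, 10
count = 8  # "Count: 8", the number of CMPs per copy of the code

# Assumption: normalise by executed copies, then by instructions per copy
copies = unrolls * iterations
cycles_per_insn = median(cycle_samples) / copies / count
insns_per_cycle = 1 / cycles_per_insn

print(f"{cycles_per_insn:.4f} cycles per CMP")  # prints 0.3629 cycles per CMP
print(f"{insns_per_cycle:.2f} CMPs per cycle")
```

Using the median rather than the mean keeps the slow first sample from skewing the estimate.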