Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

STNP (64-bit)

Test 1: uops

Code:

  stnp x0, x1, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch ldst uop (58), simd uops in schedulers (5a), dispatch uop (78), map ldst uop (7d), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed)

  01    02    52  53    55    58     5a    78    7d    80  e9    ed
1005  1154  1019   1  1018  1000  17421  1000  1000  3000   1  1000
1004  1068  1001   1  1000  1000  17279  1000  1000  3000   1  1000
1004  1061  1001   1  1000  1000  17279  1000  1000  3000   1  1000
1004  1061  1001   1  1000  1000  17279  1000  1000  3000   1  1000
1004  1061  1001   1  1000  1000  17279  1000  1000  3000   1  1000
1004  1061  1001   1  1000  1000  17279  1000  1000  3000   1  1000
1004  1061  1001   1  1000  1000  17279  1000  1000  3000   1  1000
1004  1061  1001   1  1000  1000  17279  1000  1000  3000   1  1000
1004  1061  1001   1  1000  1000  17279  1000  1000  3000   1  1000
1004  1061  1001   1  1000  1000  17279  1000  1000  3000   1  1000

Test 2: throughput

Code:

  stnp x0, x1, [x6]
  add x6, x6, 16

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0060

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

   01     02     52     53     55     56     58     59      5a     78     7c     7d     7f     80     e9     ed     ef
20214  11285  20702  10522  10180  10523  10003  45202  171286  20110  10209  10009  10209  30027  10006  10000  10100
20204  10065  20105  10105  10000  10106  10001  85968  170320  20107  10206  10006  10206  30018  10005  10000  10100
20204  10060  20105  10105  10000  10106  10001  85968  170320  20107  10206  10006  10206  30018  10005  10000  10100
20204  10060  20105  10105  10000  10106  10001  85968  170320  20107  10206  10006  10206  30018  10005  10000  10100
20204  10060  20105  10105  10000  10106  10001  85968  170320  20107  10206  10006  10206  30018  10005  10000  10100
20204  10060  20105  10105  10000  10106  10001  85968  170320  20107  10206  10006  10206  30018  10005  10000  10100
20204  10060  20105  10105  10000  10106  10001  85968  170320  20107  10206  10006  10206  30018  10005  10000  10100
20204  10060  20105  10105  10000  10106  10001  85968  170320  20107  10206  10006  10206  30018  10005  10000  10100
20204  10120  20105  10105  10000  10106  10001  85968  170320  20107  10206  10006  10206  30018  10005  10000  10100
20204  10060  20105  10105  10000  10106  10001  64316  172858  20107  10206  10006  10206  30018  10005  10000  10100
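
The reported result for this run is consistent with taking the median of the cycle (02) samples and dividing it by the number of times the code ran (100 unrolls × 100 iterations = 10000). The Python sketch below illustrates that arithmetic. It is an assumption about how the figure could be derived, not the page's actual method; the cycle values are as read from the ten samples above, and the function name is hypothetical.

```python
from statistics import median

def cycles_per_unroll(cycle_samples, unrolls, iterations):
    """Median cycle count divided by the number of times the code snippet ran.

    Hypothetical helper: assumes the result is median(cycles) / (unrolls * iterations).
    """
    return median(cycle_samples) / (unrolls * iterations)

# cycle (02) values as read from the ten samples of the
# 100-unroll, 100-iteration run of Test 2
samples = [11285, 10065, 10060, 10060, 10060, 10060, 10060, 10060, 10120, 10060]

result = cycles_per_unroll(samples, unrolls=100, iterations=100)
print(f"{result:.4f}")  # prints 1.0060
```

The median discards the slow first sample (11285 cycles), which is likely warm-up. The reported figures may be based on more samples than the ten listed, so this calculation need not reproduce every result on the page exactly.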

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0102

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

   01     02     52     53     55     56     58     59      5a     78     7c     7d     7f     80     e9     ed     ef
20034  11521  20607  10427  10180  10428  10003  48944  171389  20020  10027  10008  10020  30000  10001  10000  10010
20024  10119  20011  10011  10000  10010  10000  50819  171055  20010  10020  10000  10020  30000  10001  10000  10010
20024  10101  20011  10011  10000  10010  10000  48967  171055  20010  10020  10000  10020  30000  10001  10000  10010
20024  10101  20011  10011  10000  10010  10000  50819  171055  20010  10020  10000  10020  30000  10001  10000  10010
20024  10111  20011  10011  10000  10010  10000  50819  171595  20010  10020  10000  10020  30000  10001  10000  10010
20024  10111  20011  10011  10000  10010  10000  38992  173125  20010  10020  10000  10020  30000  10001  10000  10010
20024  10121  20011  10011  10000  10010  10000  50819  171415  20010  10020  10000  10020  30000  10001  10000  10010
20024  10101  20011  10011  10000  10010  10000  50819  171055  20010  10020  10000  10020  30000  10001  10000  10010
20024  10101  20011  10011  10000  10010  10000  50819  171055  20010  10020  10000  10020  30000  10001  10000  10010
20024  10101  20011  10011  10000  10010  10000  50819  171055  20010  10020  10000  10020  30000  10001  10000  10010

Test 3: throughput

Code:

  stnp x0, x1, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0408

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

   01     02     52   53     55   56     58   59      5a     78   7c     7d   7f     80  e9     ed   ef
10205  10152  10119  101  10018  100  10001  300  176446  10101  200  10008  200  30024   1  10000  100
10204  10408  10101  101  10000  100  10001  300  176544  10101  200  10008  200  30024   1  10000  100
10204  10408  10101  101  10000  100  10001  300  176544  10101  200  10008  200  30024   1  10000  100
10204  10408  10101  101  10000  100  10001  300  176544  10101  200  10008  200  30024   1  10000  100
10204  10408  10101  101  10000  100  10001  300  176544  10101  200  10008  200  30024   1  10000  100
10204  10408  10101  101  10000  100  10001  300  176544  10101  200  10008  200  30024   1  10000  100
10205  10429  10120  103  10017  102  10001  300  176544  10101  200  10008  200  30024   1  10000  100
10204  10408  10101  101  10000  100  10001  300  176544  10101  200  10008  200  30024   1  10000  100
10204  10408  10101  101  10000  100  10001  300  176544  10101  200  10008  200  30024   1  10000  100
10204  10408  10101  101  10000  100  10001  300  176544  10101  200  10008  200  30024   1  10000  100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0401

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

   01     02     52  53     55  56     58  59      5a     78  7c     7d  7f     80  e9     ed  ef
10025  10146  10029  11  10018  10  10001  30  176508  10011  20  10008  20  30000   1  10000  10
10024  10408  10011  11  10000  10  10000  30  176541  10010  20  10000  20  30000   1  10000  10
10024  10408  10011  11  10000  10  10030  30  176406  10040  20  10044  20  30000   1  10000  10
10024  10408  10011  11  10000  10  10000  30  176541  10010  20  10000  20  30000   1  10000  10
10024  10401  10011  11  10000  10  10000  30  176541  10010  20  10000  20  30000   1  10000  10
10024  10408  10011  11  10000  10  10000  30  176399  10010  20  10000  20  30000   1  10000  10
10024  10401  10011  11  10000  10  10000  30  176399  10010  20  10000  20  30000   1  10000  10
10024  10408  10011  11  10000  10  10000  30  176399  10010  20  10000  20  30000   1  10000  10
10024  10401  10011  11  10000  10  10000  30  176399  10010  20  10000  20  30000   1  10000  10
10024  10401  10011  11  10000  10  10000  30  176399  10010  20  10000  20  30000   1  10000  10