Apple Microarchitecture Research by Dougall Johnson

STP (pre-index, 64-bit)

Test 1: uops

Code:

  stp x0, x1, [x6, #8]!

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.000

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000
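
For reference, the pre-index form adds the immediate to the base register, stores both source registers at the resulting address, and writes that address back to the base. One plausible reading of the counts above is that the base update accounts for the integer uop and the 16-byte store for the load/store uop, but the source does not label them, so treat that mapping as an assumption. A minimal Python sketch of the architectural behaviour follows; the function name, the register dictionary, and the byte buffer are hypothetical and used only for illustration.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def stp_pre_index_64(regs, mem, rt, rt2, rn, imm):
    """Model `stp Xt, Xt2, [Xn, #imm]!` on a little-endian byte buffer."""
    addr = (regs[rn] + imm) & MASK64                         # base + offset
    struct.pack_into("<QQ", mem, addr, regs[rt], regs[rt2])  # store 2 x 64 bits
    regs[rn] = addr                                          # write back the base

# Three back-to-back copies of the tested instruction: stp x0, x1, [x6, #8]!
regs = {"x0": 0x1111111111111111, "x1": 0x2222222222222222, "x6": 0}
mem = bytearray(64)
for _ in range(3):
    stp_pre_index_64(regs, mem, "x0", "x1", "x6", 8)

print(regs["x6"])  # base register after three write-backs
```

Each call reads the x6 value produced by the previous one, which is the dependency chain that the latency test below measures.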

retire uop (01) | cycle (02) | 03 | mmu table walk data (08) | l2 tlb miss data (0b) | 1e | 1f | 20 | 22 | 23 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst int store (96) | inst ldst (9b) | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | aa | ab | ac | af | ldst x64 uop (b1) | bc | l1d cache miss st nonspec (c0) | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
1005104080063240000102501612520001000100010001000507784582411040104082438982000100030001040124111001100010001000076430100003026010271250857311611103710001000100010411041104110411041
1004104080003200060102515117225200010001000100010005077845824110401040824389820001000300010401241110011000100010260691010000160310321250697311611103710001000100010411041104110411041
10041040700454181000102591502520001000100010001000507784582411040104082438982000100030001040124111001100010001014060021100101820110001251567311611103710001000100010411041104110411041
10041040800041800624102515141252000100010001000100050778458241104010408243898200010003000104012411100110001000101407121210010146010181250347311611103710001000100010411041104110411041
100410408000300000102501512520001000100010001000507784582411040104082438982000100030001040124111001100010001012643018100011812010161250607311611103710001000100010411041104110411041
1004104080063141080102562912520001000100010001000507784582411040104082438982000100030001040124111001100010001020123401410010144010081250347311611103710001000100010411041104110411041
100410407100314101501025101812520001000100010001000507784582411040104082438982000100030001040124111001100010001000060001000000010001250527311611103710001000100010411041104110411041
100410408005751410120102591412520001000100010001000507784582411040104082438982000100030001040124111001100010001000052001000000010001250257311611103710001000100010411041104110411041
100410408000520107010251724125200010001000100010005077845824110401040824389820001000300010401241110011000100010260342121001000010001250517311611103710001000100010411041104110411041
1004104080003101010010251359125200010001000100010005077845824110401040824389820001000300010401241110011000100010140393910000328310221250657311611103710001000100010411041104110411041

Test 2: Latency 3->3

Code:

  stp x0, x1, [x6, #8]!

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0040

retire uop (01) | cycle (02) | 03 | mmu table walk data (08) | 1e | 1f | 20 | 22 | 29 | 3a | 3c | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | aa | ab | ac | af | ldst x64 uop (b1) | ldst xpg uop (b2) | bc | l1d cache miss st nonspec (c0) | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ea | eb | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
10209100407502010727921752870100100257661241893425201001010010000101001000052216346882404969601004010040867438747201002001000020030000100401221110201100991001000010010000100108620133638206651025524391840710109011250421119971011711100371000059010000101001004110041100411004110041
1020410040750203153792173683211210025758120162252520100101001000010100100005222034688240496960100401004086743874720100200100002003000010040122111020110099100100001001000010010858012973820660102582318604075410903125042112707101171110037100007010000101001004110041100411004110041
10204100407501878728021720781116100257611031682325201001010010000101001000052220346882404969601004010040867438747201002001000020030000100401221110201100991001000010010000100108860141935306651024024085434731108881250420129871011711100371000077010000101001004110041100411004110041
1020410040750186665813170488016010025769921592225201001010010000101001000052212346882404969601004010040867438747201002001000020030000100401221110201100991001000010010000100108780141936506311023625592236806109061250417119671011711100371000018010000101001004110041100411004110041
102041004075018275676717368541161002575790176322520100101001000010100100005221314688240496960100401004086743874720100200100002003000010040122111020110099100100001001000010010862013153560677102602338743268910907125041911947101171110037100007010000101001004110041100411004110041
1020410040750191178768172887288100257701141853725201001010010000101001000052212346882404969601004010040867438747201002001000020030000100401221110201100991001000010010000100108920132435706451025623589436769108951250417116171011711100371000017010000101001004110041100411004110041
10204100407501839587631720750108100257741081642725201001010010000101001000052217946882414969601009310040867438747201002001000020030000100401221110201100991001000010010000100109120138336006321025122489832736108941250426130071011711100371000054110000101001004110041100411004110041
10204100407502049747971752700160100257631091673225201001010010000101001000052215546882404969601004010040867438747201002001000020030000100401221110201100991001000010010000100109060137738006451023123486236717109191250422122471011711100371000072010000101001004110041100411004110041
1020410040750192667813166475414410025794113157342520100101001000010100100005221334688241496960100401004086743874720100200100002003000010040122111020110099100100001001000010010874014033800673102302578763274510884125042212547101171110037100007010000101001004110041100411004110041
10204100407501953648191664980108100257691111422625201001010010000101001000052213346882404969601004010040867438747201002001000020030000100401221110201100991001000010010000100109000132936706531025725686798786109361250422124071011711100371000081010000101001004110041100411004110041
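
The reported result can be cross-checked against the rows above, assuming the first two fused fields of each row are the retire uop (01) and cycle (02) counters. Under that split, every row of this run reads 10040 cycles. The sketch below normalises by unrolls times iterations; whether the original harness subtracts any loop overhead first is not stated, so this is only an approximation of its procedure.

```python
from statistics import median

# Cycle counts for the ten rows of the 100-unroll, 100-iteration run.
# Assumption: the second column of each fused row is the cycle counter.
cycles = [10040] * 10

unrolls, iterations = 100, 100
latency = median(cycles) / (unrolls * iterations)  # cycles per chained STP

print(f"{latency:.4f}")  # 1.0040
```

A value this close to 1 suggests that the base-register update of one STP is available to the next after about one cycle.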

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0040

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | 09 | l2 tlb miss data (0b) | 1e | 1f | 20 | 22 | 29 | 3a | 3c | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 61 | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | ldst x64 uop (b1) | ldst xpg uop (b2) | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | cf | d0 | d2 | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ea | eb | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
100291004075110022236881717366501081002579012020440252001010010100001001010000521049468824104969601004010040869638770200102010000203000010040124111002110910100001010000101089314142136063010278242089011473510904125043412051426400031633100371000093010000100101004110041100411004110041
10024100407520002121768091744853144100257741412013925200101001010000100101000052106546882410496960100401004086963877020010201000020300001004012411100211091010000101000010109341814013666671028323108896476710948125043412991406400021633100371000011010000100101004110041100411004110041
1002410040752200207679834175272110010025754125183272520010100101000010010100005210094688241049696010040100408696387692001020100002030000100401241110021109101000010100001010900141434350670102842930913588251090112504361353140640003163210037100006010000100101004110041100411004110041
10024100407520002016828151776810284100257671212004425200101001010000100101000052103346882410496960100401004086963877020010201000020300001004012411100211091010000101000010109271614633926391027626908684680510926125043712291426400031622100371000077110000100101004110041100411004110041
100241004075202020917577116967321161002578213818541252001010010100001001010000521025468824104969601004010040869638770200102010000203000010040124111002110910100001010000101092871424376650102752820937387921089412504261381706400031633100371000086010000100101004110041100411004110041
100241004076100019836380817287901121002579310214431252001010010100001001010000521065468824104969601004010040869638770200102010000203000010040124111002110910100001010000101092771294409643102702720858647671090512504321327726400031633100371000087010000100101004110041100411004110041
100241004075111021697579717127221121002578816120428252001010010100001001010000521073468824104969601004010040869638770200102010000203000010040124111002110910100001010000101091371365370690102902770879528081090012504321107706400021633100371000018010000100101004110041100411004110041
1002410040751000207678853172875113610025779147184312520010100101000010010100005210814688241049696010040100408696387702001020100002030000100401241110021109101000010100001010929712943806981025824828673881310921125043112357464000316331003710000100010000100101004110041100411004110041
100241004075100020678480917688301441002577613719729252001010010100001001010000521097468824154969601004010040869668770200102010000203000010040124111002110910100001010000101091891401381655102732391916461568109031250429131072640003163310037100006010000100101004110041100411004110041
100241004075110022418081217608111001002579411822036252001010010100001001010000521073468824154969601004010040869638770200102010000203000010040124111002110910100001010000101091871434382658102642621903448461095012504271292706405431633100371000015010000100101004110041100411004110041

Test 3: throughput

Count: 8

Code:

  stp x0, x1, [x6, #8]!
  stp x0, x1, [x7, #8]!
  stp x0, x1, [x8, #8]!
  stp x0, x1, [x9, #8]!
  stp x0, x1, [x10, #8]!
  stp x0, x1, [x11, #8]!
  stp x0, x1, [x12, #8]!
  stp x0, x1, [x13, #8]!
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5237

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | 09 | l2 tlb miss instruction (0a) | l2 tlb miss data (0b) | 18 | 19 | 1e | 1f | 20 | 22 | 23 | 29 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 67 | 69 | 6a | 6b | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | ldst x64 uop (b1) | ldst xpg uop (b2) | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | branch mispred nonspec (cb) | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
80209419283130000000173183379810712135964186176020622908130525161812808078000080100800004035401928056265493888204200141916318263318451601002008000020024000041898751180201100991008000010080000100809550495648087385480578250090276149781447100003266252011210511011611419668297180000801004196041931418794190041893
8020441833314000000016808107781070413512041908794171625361325251633648222080000801008000040187219272402264938759041860417903179733191116010020080000200240000419988211802011009910080000100800001008093313537347586988580568262091144151381545100003263657201320511011711418658059780000801004191041898419304193841961
8020441897314100000018457808031076812314441883776164225111329251615138345180000801008000040661019261844214938899041967419603181933185016010020080000200240000418477511802011009910080000100800001008095113579845186986280574266290446163781451100003258745361320511011711419058184380000801004196541816418524185341916
80204417833140000000184582076410680146144418177982100252013302516145382621800158010080000402302192829634249387450418614201231793331939160100200800002002400004195675118020110099100800001008000010080942053204888629008063425009244615398148510000326065002000511011711418758044380000801004188641893418534195241928
8020441874314000000015758217761067214317241929844194225961372251608788063680000801008000040401519316563434938874041822418903174133186616010020080000200240000418487511802011009910080000100800001008097314457244985989880545273091732157981499100003263545721300511011711418628229780000801004189141857418764195041968
80204418993130000000183673180110680135100419498151619223713032516082783804800008010080000402480192671233349388550418804185031817331897160100200800002002400004198175118020110099100800001008000010080928854464708789008074429608563816228146910000326505076000511011711418228408180000801004192141952419274182241920
80204419303141000000169884081110736116132420758312015269713072516085681084800008010080000402133192692828449388600418564188831854331783160320200800002002400004195175118020110099100800001008000010080970052764658638928067228929234415968151610000325944845000511011711420378228880000801004190741866418994190541873
802044181531500000001851920791106801231044183477120512415132425162194829118000080100800004041721927751281493882104182041927318533319401601002008000020024000041943751180201100991008000010080000100809470501148785788680656276093388157281477100003258350601410511011711418468075980000801004199641892419554190242022
802044190531510100001749759773106321141044186877718482676127325160858823448000080100800004023541928896203493885104186241935318013318691601002008000020024000041845751180201100991008000010080000100809210491542685389080569241086446155481518100003261255331310511011711419118039080000801004187941825419144198341855
8020441965314000000018038328231073611988419107811758239912612516076883341800008010080000402139192666481249388130419624195431857332022160100200800002002400004197882118020110099100800001008000010080952053714818879068058224108923614898149310000325375588000511011611419868117380000801004206441880419314187541893
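
As a rough cross-check of the throughput figure, the sketch below takes the cycle (02) value from each of the ten rows above, assuming the first two fused fields of a row are the retire uop and cycle counters. It then divides the median by count times unrolls times iterations. The source does not say whether loop overhead is subtracted or which median convention is used, so this approximates the reported procedure rather than reproducing it.

```python
from statistics import median

# Cycle counts read from the ten rows of the 100-unroll, 100-iteration run.
# Assumption: the second column of each fused row is the cycle counter.
cycles = [41928, 41833, 41897, 41783, 41874,
          41899, 41930, 41815, 41905, 41965]

count, unrolls, iterations = 8, 100, 100
cycles_per_stp = median(cycles) / (count * unrolls * iterations)
stp_per_cycle = 1 / cycles_per_stp  # reciprocal throughput -> STPs per cycle

print(f"{cycles_per_stp:.4f} cycles per STP")  # 0.5237
print(f"{stp_per_cycle:.2f} STPs per cycle")   # 1.91
```

Read this way, the core sustains a little under two pre-index STP instructions per cycle when the eight base registers are independent.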

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5258

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | 19 | 1e | 1f | 20 | 22 | 29 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 5f | 60 | 67 | 69 | 6a | 6b | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | ldst x64 uop (b1) | ldst xpg uop (b2) | bc | l1d cache miss st nonspec (c0) | cd | cf | d5 | map dispatch bubble (d6) | d9 | dd | fetch restart (de) | e0 | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
80029421783153016329148191688149120421417911758264014132516356280408800008001080000404213193715610327493893604204542063320633321351600102080360202400004204776118002110910800001080000108098105412486905850806862750886701674816181000032600529305020916042420378069380000800104215542184420974209542062
800244206131530184285185717521621524209777819952764144425162376810648000080010800004122101938668001352493896204203342015319343320811600102080000202400004214675118002110910800001080000108097506224463913852805732680902341659816661000032608488705020416024420838115180000800104208341998420844203042135
80024421523162016628198301704139220421178011815245014122516344781049800008001080000402832193518700812493906304201542053319953320781600102080000202400004207776118002110910800001080000108092905899510915867806432490885301638816001000032635473705020516024420098145180000800104204542089420834201442125
800244205231520177984480416561491244201378416832797146825160555803918024280010800004016701941485006294939019042101420653203021319861600102080000202400004208975118002110910800001080000108099505758481911837806112660877821682815881000032605521205020415042420648108880000800104208542066421584195541999
80024420263142016688128141632138144420297731847274914082516074680753800008001080000401515193609610391493895204208342213321173321771600102080000202400004211675118002110910800001080000108096505868517882910805922600891681538815931000032644504405020416042420778130580000800104195541984420444211842029
80024421033152019358718071688122192421888021901311813892516098584259800028001080000402230193223201350493898304199542116319573321091600102080000202400004204775118002110910800001080000108098105373489900926806882610904541563816061000032589430305020215042421598064580000800104212942117421174205242189
80024420443152017678938001520140100419967881791272014242516049681331800008001080000402697194006000374493906204206142160320563320311600102080000202400004207576118002110910800001080000108094105403473884898806292532878321717815151000032664552405020448042420698222880000800104199842566421574205242178
80024421903152018818297921760127152422868011859234814242516119880639800188001080000402936193710400542493892404211842006320323321571600102080000202400004198175218002110910800001080000108095105491443916901806702310883321619815781000032695552105020418063424988293280000800104218842039422054204442137
800244205531610175212198061688135140422848002153229014042516209583825800008001080000403246193796800268493905804204742011320333321091600102080000202400004193176118002210910800001080000108098205670499880913806192720902381640815751000032656546305020216024420468053780000800104210241992421064200942031
80024421133152018668168011736136164420637791873280113812516077180482800008001080000401804193484800327493895704211342047320473320421600102080000202400004196675118002110910800001080000108094305289469901946806322680920361631815711000032616511205020416024421388114980000800104208942027420534207542008