Apple Microarchitecture Research by Dougall Johnson

STP (pre-index, 32-bit)

Test 1: uops

Code:

  stp w0, w1, [x6, #8]!

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.000

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000
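
As a quick consistency check on the figures above, the per-unit issue counts should add up to the total issue count. The following is a minimal sketch; the variable names are illustrative, and the values are copied from this test.

```python
# Per-instruction figures reported by Test 1 (copied from the text above).
retires = 1.000
issues = 2.000
unit_issues = {"integer": 1.000, "load/store": 1.000, "simd/fp": 0.000}

# The per-unit issue counts should account for every issued uop.
total_unit_issues = sum(unit_issues.values())
assert abs(total_unit_issues - issues) < 1e-9

# Ratio of issues to retires, as reported.
issues_per_retire = issues / retires
print(f"{issues_per_retire:.3f} issues per retire")
```

One possible reading, which the page itself does not state, is that the load/store uop performs the store while the integer uop handles the pre-index base-register update.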

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | 09 | l2 tlb miss data (0b) | 1e | 1f | 20 | 22 | 29 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst int store (96) | inst ldst (9b) | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
100510407000001721101501025962525200010001000100010005074645824110401040824389820001000300010401241110011000100010238643351014102981510451263707331633103710001000100010411041104110411041
1004104071111016000801025111208252000100010001000100050746458240104010408243898200010003000104012411100110001000102575332610140131121510341270707331633103710001000100010411041104110411041
100410408101101518205010251715108252000100010001000100050738458241104010408243898200010003000104012411100110001000102877772210121128121810241471717331633103710001000100010411041104110411041
1004104081110014181070102516151052520001000100010001000507384582401040104082438982000100030001040124111001100010001047316901101220001210051271727331633103710001000100010411041104110411041
10041040811016202120121210250026252000100010001000100050754458240104010408243898200010003000104012411100110001000101977072710130021182010281256727331633103710001000100010411041104110411041
100410408110061617101101025161155252000100010001000100050730458240104010408243898200010003000104012411100110001000102486172110200030181110231271017331633103710001000100010411041104110411041
100410408000001414108010251900325200010001000100010005076245824010401040824389820001000300010401241110011000100010088686101015022801210051247727331633103710001000100010411041104110411041
100410408111101514209010250527252000100010001000100050746458241104010408243898200010003000104012411100110001000102485462410161125121310271259707331633103710001000100010411041104110411041
100410407100161820107121025010825200010001000100010005073845824110401040824389820001000300010401241110011000100010338616231017002261510191363717331733103710001000100010411041104110411041
100410408110161817107010252205102520001000100010001000507544582411040104082438982000100030001040124111001100010001007868511020113201210221279727331633103710001000100010411041104110411041

Test 2: Latency 3->3

Code:

  stp w0, w1, [x6, #8]!

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0040
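
The reported figure can be reproduced by dividing the cycle count by the number of executed instructions (unrolls times iterations). The sketch below assumes that the second field of each fused sample row that follows is the "cycle (02)" counter and that it reads 10040 in every sample; the variable names are illustrative.

```python
from statistics import median

unrolls = 100
iterations = 100

# Assumption: the "cycle (02)" counter, read as the second field of each
# of the ten sample rows for this configuration, is 10040 every time.
cycle_samples = [10040] * 10

# Median cycles per executed stp instruction.
cycles_per_instruction = median(cycle_samples) / (unrolls * iterations)
print(f"{cycles_per_instruction:.4f}")  # 1.0040
```

Because every stp in the chain both reads and writes x6, this figure is most naturally read as the latency of the base-register update, roughly one cycle.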

retire uop (01) | cycle (02) | 03 | mmu table walk data (08) | 1e | 1f | 20 | 22 | 29 | 3a | 3c | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | bc | l1d cache miss st nonspec (c0) | branch cond mispred nonspec (c5) | branch mispred nonspec (cb) | cd | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ea | eb | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
1020910040750229888811170486017210025813691065925201001010010000101061000052205546882414969601004010040868178743201062001000820030024100401221110201100991001000010010000100109241422399069210240302090940829109411112331117180160010037100000010000101001004110041100411004110041
102041004075020648182717849201201002580482735625201001010010000101061000052213346882414969601004010040868138747201002001000020030000100401221110201100991001000010010000100109081325380064410237298093234857108991511430007101171110037100001010000101001004110041100411004110041
1020410040750213088840172881096100257988195722520100101001000010100100005220594688241496960100401004086743874720100200100002003000010040122111020110099100100001001000010010904141241966791025027009054093610929810620007101171110037100001010000101001004110041100411004110041
10204100407502148878471776700120100258051001056025201001010010000101001000052207946882414969601004010040867438747201002001000020030000100401221110201100991001000010010000100109021378440067110241293093262829109231111660007101171110037100003010000101001004110041100411004110041
102041004075022237084517127401641002577171638425201001010010000101001000052216546882404969601004010040867438747201002001000020030000100401221110201100991001000010010000100109221359379068310250257090134839108821311240007101171110037100000010000101001004110041100411004110041
102041004075022507781116969601521002576093815525201001010010000101001000052207946882414969601004010040867438747201002001015620030000100401221110201100991001000010010000100109101140363068410244266091932854109471111590007101171110037100000010000101001004110041100411004110041
102041004075022716683117207701601002581469765525201001010010000101001000052202946882414969601004010040867438747201002001000020030000100401221110201100991001000010010000100109241361372266110257287090534884109501012560007101171110037100000010000101001004110041100411004110041
1020410040750224774806174463012010025800104705225201001010010000101001000052202746882414969601004010040867438747201002001000020030000100401221110201100991001000010010000100108881206379068510245295088140788109141012030007101171110037100000010000101001004110041100411004110041
1020410040750219993823176880096100258098010456252010010100100001010010000522117468824149696010040100408674387472010020010000200300001004012211102011009910010000100100001001094213114411169310256296092034909109431711080007101171110037100001010000101001004110041100411004110041
1020410040750240389840172889012010025783741025625201001010010000101001000052212146882404969601004010040867438747201002001000020030000100401221110201100991001000010010000100109361227375068710247279091740951109371012450007101171110037100000010000101001004110041100411004110041

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0040

retire uop (01) | cycle (02) | 03 | 18 | 19 | 1e | 1f | 20 | 22 | 29 | 3a | 3c | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d cache miss st (a2) | l1d cache miss ld (a3) | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | bc | l1d cache miss st nonspec (c0) | c2 | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ea | eb | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
10029100407511219074809174471115210025801102116982520010100101000010010100005188894688241496960100401004086963877020010201000020300001004012411100211091010000101000010109221303408693102233080942388001092614117806406163310037100007010000100101004110041100411004110041
1002410040750023319084017688109610025804103104632520010100101000010010100005210974688240496960100401004086963877020010201000020300001004012411100211091010000101000010109221181410689102442820912428601091113111406402163310037100002010000100101004110041100411004110041
100241004076002265698071688860152100258098892562520010100101000010010100005210814688240496960100401004086963877020010201000020300001004012411100211091010000101000010108981295400688102322900914368401092710107706402162210037100001010000100101004110041100411004110041
1002410040750020107680417528401481002580994106572520010100101000010010100005211134688240496960100401004086963877020010201000020300001004012411100211091010000101000010109141286378683102162840936368631094610103306403163310037100001010000100101004110041100411004110041
1002410040750023379081517767601561002580211199532520010100101000010010100005211134688240496960100401004086963877020010201000020300001004012411100211091010000101000010109301218378684102522980868328761091712104306403162210037100000010000100101004110041100411004110041
1002410040750021878584217208211201002579910896552520010100101000010010100005211054688240496960100401004086963877020010201000020300001004012411100211091010000101000010108821336395667102403050914408571091711119406403163310037100001010000100101004110041100411004110041
1002410040750022207983217288109210025807101106532520010100101000010010100005210974688240496960100401004086963877020010201000020300001004012411100211091010000101000010108941239388680102352600922428931091114113606403163210037100001010000100101004110041100411004110041
100241004075002310748341752940116100258111101106025200101001010000100101000052111346882404969601004010040869611877020010201000020302401004012411100211091010000101000010108901337367710102463050902468151089014106806403162310037100006010000100101004110041100411004110041
1002410040750020949284217527661601002582110082482520010100101000010010100005211134688240496960100401004086963877020010201000020300001004012411100211091010000101000010109321350389700102382810948429471093416119906402163310037100003010000100101004110041100411004110041
10024100407500218481830170473116010025765122110612520010100101000010010100005210974688240496960100401004086963877020010201000020300001004012411100211091010000101000010108861197363698102432810934368581095214122536403162310037100000010000100101004110041100411004110041

Test 3: throughput

Count: 8

Code:

  stp w0, w1, [x6, #8]!
  stp w0, w1, [x7, #8]!
  stp w0, w1, [x8, #8]!
  stp w0, w1, [x9, #8]!
  stp w0, w1, [x10, #8]!
  stp w0, w1, [x11, #8]!
  stp w0, w1, [x12, #8]!
  stp w0, w1, [x13, #8]!
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5103
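
The per-instruction result can be converted into a throughput figure and into the total cycle count it implies for one run. This is a minimal sketch; it assumes "count" refers to the 8 stp instructions in the block (the mov instructions are not counted) and that the total is normalized by unrolls times iterations. The variable names are illustrative.

```python
# Reported result: median cycles per stp, already divided by count.
cycles_per_instruction = 0.5103

# Assumption: count is the number of stp instructions in the block.
count = 8
unrolls, iterations = 100, 100

# Reciprocal throughput -> sustained stp instructions per cycle.
instructions_per_cycle = 1 / cycles_per_instruction

# Total cycles implied for one full run of the loop.
implied_cycles = cycles_per_instruction * count * unrolls * iterations

print(f"{instructions_per_cycle:.2f} stp per cycle")  # 1.96
print(f"{implied_cycles:.0f} cycles per run")         # 40824
```

A value just under 2 per cycle suggests that two of these stores can be sustained per cycle when their base registers are independent.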

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | 09 | l2 tlb miss instruction (0a) | l2 tlb miss data (0b) | 1e | 1f | 20 | 22 | 29 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 67 | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | c3 | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
80209405603041100018333508311728102148405208202002223712825160717809388001480100800004021441859584135349374324036140466304303303831601002008000020024000040363761180201100991008000010080000100809450501653168758029727309157011468119929250071400511012161311404498024280000801004048640469404484039640438
80204404093031010117523938501664132140404868912010219413825160581804898000080100800004121041861072047493750440454403913032233044216010020080000200240000403707511802011009910080000100800001008093312490748478888025927819277412508115725945351320511013181014404628071080000801004043340460404024041540477
8020440472303000001860306830179210211240502794217020161142516048285977800008010080000401327186200802504937298404134044730347330401160100200800002002400004037475118020110099100800001008000010080905049194719939802742880929841245812022754754000511011161413403658217180000801004041140439404134044540477
8020440477303100001989380821174411214840611803206420451372516423980369800008010080000409727185860011016493740640441404583039333044416010020080000200240000404317511802011009910080000100800001008095013493048669128028827019155211508121129648211310511015181211404298328980000801004076540835408574094040863
80204409443071010016839098151680136152407808081925180021025160821809898002680100800004152021873768118394937740408914081730732330744160100200800002002400004096475118020110099100800001008000010080893044124646895805872510937981546813525734370000511012161313408218145480000801004082340781408164077740681
80204408823060000018578457931752137104409317952026184917425160806811048000080100800004029151874152016764937764407474093630696330760160100200800002002400004085175118020110099100800001008000010080919045325036878805362740917401575814524884575000511012171312408448462280000801004088040770408174084140821
8020440821305000001887737802172813410440783801190419811572516050281108800008010080000402963187967213344937740408124086330746330820160100200800002002400004088675118020110099100800001008000010080889041954529867804532560893361503815064854091000511010161414409128061680000801004084140807409534084941038
802044075430500010175575782116801221844080179017861969146251607448122780007801008000040336318721431244149379254083340922307453308641601002008000020024000040869751180201100991008000010080000100808970426849210930804972420937361683813985214435000511012171311408908022180000801004089240751408894083440759
802044077330500000165387878316961301444074180317071810169251607358059480000801008000040192518722561414493779240751408883081233080716010020080000200240000407887511802011009910080000100800001008091504568484139548057125008638416268143447444581200511014171214407338062380000801004081540997408814090740859
8020440812323100011788868835166410710040759796200717601702516480980701800008010080000419900187396012491493766540824409963069033076316010020080000200240000407787511802011009910080000100800001008092915451648158798057827609052616798140252846931400511013161210409038085680000801004085140850409264086940893

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5103

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | l2 tlb miss data (0b) | 1e | 1f | 20 | 22 | 29 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 5f | 60 | 67 | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | cf | d0 | d5 | map dispatch bubble (d6) | db | dd | fetch restart (de) | e0 | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
8002940725306200198371178217201061004085579615101772181251626848247880166800108000040164118811640174493781740839408213077833086216001020800002024000040899881180021109108000010800001080827274601503589780493297294032175181411588400627050200317047408818078280000800104084140814407824088540852
800244086830520218368117811672115128408407921639195713546160664805958002380010801084023791876844013574938184408644090730820330773160010208000020240360408888811800211091080000108000010810282543225121086980594260089840165081472542427921250200717056406978797480000800104080140818408554077840758
800244086930520019328318061704123132408467811665206116925168370805838000080010800004020591881164012199493776640781408003072933082416001020800002024000040807871180021109108000010800001080931215105504585380534298089446162781354584404527050200217179408288063680000800104073440808408574090040843
80024410043072201632889796172011410040906816178618041922516040380265800008001080000406969188032400352493769540838408133081233092016001020800002024000040836871180021109108000010800001080896294494494186180564272085744164881415575403427050200518035408478718880000800104087040851407704085640828
80024409413062001833837765172011610040764758188118491682516043880520800008001080000405122187797200141493768840749408223073233086616001020800002024000040785881180021109108000010800001080938274591513187480509281090628161681345487441628450200517056407908038780000800104080840855407474088140899
8002440809305200169294784816401141004078277415271714200251602158232280000800108000040058518758600033374937744408484092830668330899160010208000020240000409248811800211091080000108000010809152747454274898805822602868110168981450552483827050200717075408698061880000800104094140815408464093640757
8002440885307200195079681916648620040863822155716441922516037182438801678001080000401830187446800337493827640899408433076033077416001020800002024000040917881180021109108000010800001080944274145430587280471251489332178981432630403922050200316053407948047080000800104088940890407084090140816
80024407873062001797892845169693100407047651586169715625160650815348000080010800004066951878621111754937776407194075330767330805160010208000020240000409209611800211091080000108000010809522741284751699880548288086744167681384555458428050200417057407778054180000800104087840812408804086540727
8002440757306220174987583115281189640804762167818841382516043080630800008001080000402137187950800369493783540844409023071033083716001020800002024000040910841180021109108000010800001080926284805478590680519292089034173881413492446127250200416064409068045680000800104090740775409244079940862
8002440763305222184579981116569613640739748174117071732516054687382800298001080000401008188042301975493774340761409123083633083516001020800002024000040808881180021109108000010800001080940243784479688380551264085046177881470484405228050200716055408078050380000800104080440850409134084940770