Apple Microarchitecture Research by Dougall Johnson

STP (post-index, 64-bit)

Test 1: uops

Code:

  stp x0, x1, [x6], #8

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.000

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000
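
As a quick consistency check on the summary above, the per-unit issue counts should add up to the total issue count, and dividing issues by retires gives the number of uops issued per retired instruction. The following is a minimal Python sketch using only the figures listed above; the variable names are illustrative and not part of the original tooling. One plausible reading (an assumption, not stated in the source) is that the post-index address update issues to an integer unit while the store itself issues to a load/store unit.

```python
# Per-instruction figures copied from the Test 1 summary above.
retires = 1.000
issues = 2.000
unit_issues = {"integer": 1.000, "load/store": 1.000, "simd/fp": 0.000}

# The per-unit issue counts should add up to the total issue count.
total_from_units = sum(unit_issues.values())
assert abs(total_from_units - issues) < 1e-9

# Uops issued per retired instruction.
uops_per_instruction = issues / retires
print(uops_per_instruction)  # 2.0
```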

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | 09 | l2 tlb miss data (0b) | 1e | 1f | 20 | 22 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst int store (96) | inst ldst (9b) | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | ldst x64 uop (b1) | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
1005104081111010141202810250075252000100010001000100050738458241104010408243898200010003000104012411100110001000102286031210081112871012125752717341622103710001000100010411041104110411092
1004104080000068181210250361252000100010001000100050778458241104010408243898200010003000104012411100110001000100006121610001024001000125077007321622103710001000100010411041104110411041
10041040801000418013010259162252000100010001000100050778458241104010408243898200010003000104012411100110001000100006801410000020061020125035007321622103710001000100010411041104110411041
10041040711110111411112102518155252000100010001000100050738458241104010408243898200010003000104012411100110001000102784411410070114671010125744717321622103710001000100010411041104110411041
100410408111101122115010258211425200010001000100010005074645824110401040824389820001000300010401241110011000100010238430910071118471010125744717321622103710001000100010411041104110411041
100410408101001091144102592632520001000100010001000507624582411040104082438982000100030001040124111001100010001008743011007121212101016125743727321622103710001000100010411041104110411041
100410408110101014116010258394252000100010001000100050762458241104010408243898200010003000104012411100110001000100887000100711100101000125752707321622103710001000100010411041104110411041
100410408101001000901025100432520001000100010001000507464582411040104082438982000100030001040124111001100010001018852315100711156111012125744717321622103710001000100010411041104110411041
10041040711110918110010256255252000100010001000100050754458241104010408243898200010003000104012411100110001000100873502100700221671000125744707321622103710001000100010411041104110411041
1004104081110011141140102510295252000100010001000100050746458241104010408243898200010003000104012411100110001000102384311410080114471016125751717321622103710001000100010411041104110411041

Test 2: Latency 3->3

Code:

  stp x0, x1, [x6], #8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0040

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | 18 | 19 | 1e | 1f | 20 | 22 | 29 | 3a | 3c | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 67 | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | ldst x64 uop (b1) | ldst xpg uop (b2) | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ea | eb | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
1020910040750000212176799173674092100257807816540252010010100100001010010000522019468824004969601004010040867438747201002001000020030000100401221110201100991001000010010000100109280143436406691025128909024087710908125041512890071011711100371000012010000101001004110041100411004110041
10204100407500002259798211744740120100257857415243252010010100100001010010000522157468824004969601004010040867438747201002001000020030000100401221110201100991001000010010000100109460147842816721024327709164085710917125042312080071011711100371000068010000101001004110041100411004110041
1020410040750000226573800173670010010025828881415325201001010010000101001000052207546882400496960100401004086743874720100200100002003000010040122111020110099100100001001000010010870013133850654102572710900329721090912504211228007101171110037100009010000101001004110041100411004110041
1020410040750000222071807176082015210025805811686525201001010010000101001000052206946882400496960103441004087081188492071820610232205306961019612251102011009910010000100100001001100601393351068310343288088864293211021126342012120077514811100761012752010000101001023110249101931027210245
10204102467601042616332799168890088102328349512422286203311024210102104191027952091947576100497165102451027287412187472071120610312204309421019212241102011009910010000100100001001096901504339063110329270089438363011031126241712890077824912102451013672110000101001019110296102311024310197
1020410273770044243051081217767701241023378975164232107203981024310000104321030752010847316900497165102451024687591688772071120010339208309541019512251102011009910010000100100001001096801403394065310360267088646452710939126441613170079315011102421012813110000101001024610249102481024410303
1020410247770045230743279817128401041023279291114222135203381023510102103541027951942347330601497280102411027186742589272075020210234208307021024612261102011009910010000100100001001092301392365067510311286088438204410946125941812130076014321101571005181110000101001019610245100411024410143
10204100407700312616337785176879014010074775991469085202851010010076102731021052005846990302497063100401009187071287802010020410159202304681014212211102011009910010000100100001001098591297380167910298297090242228110898125542114780772713321100371007097010000101001004110092101421014310146
1020410143760020220574807170494096100257898012940252010010100100001010010000522187468824104969601004010040867438747201002001000020030000100401221110201100991001000010010000100108960139539606851025326809063886610936125041512360071011711100371000063010000101001004110041100411004110041
10204100407500002121788141736820144100257637412843252010010100100001010010000522147468824004969601004010040867438747201002001000020230000100401221110201100991001000010010000100109360133339706551026326908724486810924125042012960071011711100371000083010000101001004110041100411004110041
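
The 1.0040 result for the 100 unrolls and 100 iterations run can be reproduced, approximately, by taking the median cycle count over the ten runs and dividing by the number of STP instructions executed. The following is a minimal Python sketch. It assumes that the second field of each data row is the "cycle (02)" counter; the rows are fused without separators in this copy, so the field boundaries used here are a guess rather than something the source states.

```python
import statistics

# Assumption: "cycle (02)" values for the ten runs, read as the second field
# of each fused data row. The field boundaries are a guess.
cycles_per_run = [10040, 10040, 10040, 10040, 10246,
                  10273, 10247, 10040, 10143, 10040]

unrolls = 100
iterations = 100
instructions = unrolls * iterations  # 10,000 STPs per run

# Median over runs, then normalise to cycles per instruction.
median_cycles = statistics.median(cycles_per_run)
cycles_per_instruction = median_cycles / instructions
print(f"{cycles_per_instruction:.4f}")  # 1.0040
```

Using the median rather than the mean keeps the handful of slower runs (those above 10040 cycles) from shifting the reported figure.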

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0040

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | l2 tlb miss data (0b) | 1e | 1f | 20 | 22 | 29 | 3a | 3c | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | ldst x64 uop (b1) | ldst xpg uop (b2) | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ea | eb | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
1002910040752201986908501688764196100257921131654725200101001010000100101000052096946882449696010040100408696387702001020100002030000100401241110021109101000010100001010944241352367063710283253086434742109351250441127314364021633100371000018010000100101004110041100411004110041
1002410040753002232707761712790116100257781082004925200101001010000100101000052104146882449696010040100408696387702001020100002030000100401241110021109101000010100001010911211280366062910264269087444811109101250441112021064021633100371000093010000100101004110041100411004110041
1002410040753302070897941704812120100257931241654625200101001010000100101000052097746882449696010040100408696387702001020100002030000100401241110021109101000010100001010953211197386066410284255388448841109191250440127821064031622100371000090110000100101004110041100411004110041
1002410040753331767847871696721156100257771111644825200101001010000100101000052103346882449696010040100408696387702001020100002030000100401241110021109101000010100001010916211368324066610265249089036805108751250447126721064031622100371000011010000100101004110041100411004110041
100241004076303199290802173696111610025791135198392520010100101000010010100005210494688244969601004010040869638770200102010000203000010040124111002110910100001010000101089921130335606211027126308344076510896125045212852166403162210037100005010000100101004110041100411004110041
1002410040753301881957611712681152100258111111814425200101001010000100101000052104946882449696010040100408696387702001020100002030000100401241110021109101000010100001010899211333345063710276244089432747108681250443122521064021622100371000010010000100101004110041100411004110041
100241004075333226284810168870011610025804102180492520010100101000010010100005210254688244969601004010040869638770200102010000203000010040124111002110910100001010000101088814125833006691027124508983268910903125042912687064031632100371000066010000100101004110041100411004110041
1002410040752001944828211720590160100257691151704225200101001010000100101000052100946882449696010040100408696387702001020100002030000100401241110021109101000010100001010945141357377068710279268088082730109411250432115514464031632100371000063010000100101004110041100411004110041
1002410040752022055848291776861144100257741301864325200101001010000100101000052104946882449696010040100408696387702001020100002030000100401241110021109101000010100001010889271203358066810277222089042811109231250441124121064021633100371000057010000100101004110041100411004110041
100241004075330210982774171286215610025769112194402520010100101000010010100005210414688244969601004010040869638790200102010000203000010040124111002110910100001010000101088828128037106601027425709164276210913125044710917864021633100371000064110000100101004110041100411004110041

Test 3: throughput

Count: 8

Code:

  stp x0, x1, [x6], #8
  stp x0, x1, [x7], #8
  stp x0, x1, [x8], #8
  stp x0, x1, [x9], #8
  stp x0, x1, [x10], #8
  stp x0, x1, [x11], #8
  stp x0, x1, [x12], #8
  stp x0, x1, [x13], #8
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5238

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | l2 tlb miss data (0b) | 18 | 1e | 1f | 20 | 22 | 29 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 67 | 69 | 6a | 6b | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | ldst x64 uop (b1) | ldst xpg uop (b2) | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | c3 | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
802094180331221201704349787172014512041713810174229241447251616148336780011801008000040293919162246054939071041635417203152033167616010020080000200240000417347511802011009910080000100800001008099615554747891887380310238090180111481195100003232059252700511011711416548083380000801004213241759416724169841684
802044173331211101800706795176011713641676788213333641387251611378087680015801008000040231519184325344938630041778416873158833164116032020080000200240000416837511802011009910080000100800001008095213560246290590980607270090738106381203100003233457621300511011611416878284480000801004165841754416484165341672
802044164531210001692344778170411714041650778194928841392251608458126680007801008000040148419200646254938580041690416843158933158416010020080136200240000416388211802011009910080000100800001008094313514949090888380348267086644110381221100003238044561400511011711419238075880000801004189841953419054188641938
802044188839610101728790836173612423241897729205025931373251608148193480015801008000040274019317526214938759041878418623180233182616010020080000200240000418807511802011009910080000100800001008096115442451785289880588224083544163081486100003257545221300511011711419318202180000801004193041961418714186541829
802044191831410101896874802173612226041994784191827321279251608698083080000801008000040326719300962584938884042119419533178633193016010020080000200240000420917511802011009910080000100800001008091113505948985786780637240088938157581453100003264050041300511011711418118051080000801004177141878418944196141894
802044189231410001653820767168014110841924754176924041320251607738104380000801008043241183319298082334938851041899419793190733187916010020080000200240000419358211802011009910080000100800001008093715582046888188080619240186342156081470100003260854881400511014811418748290880000801004188241877419194198341957
802044183131411001605813791168013310841856776197323951362251627948084580000801008000040786319320161614938798042004419293180033187816010020080000200240000419538111802011009910080000100800001008096617520843987187180590264188644158881517100003260651781310516511611418458063880000801004193941839419544200541939
80204417923131100227186778216881511484192378420842106126825160605833948000080100800004089461930840426493881504187841974317993319061601002008000020024000041879751180201100991008000010080000100809511451724278549018061423908834416878150810000326064942141256511011711418938090580000801004196041904419634239941991
802044190231411101725757815168812324841921771179723161380251644648053880000801008000040345419352121994938994042034421483185133184016010020080000200240000419208111802011009910080000100800001008095713531145485690280662250092180156681552100003265147281410511011711418958223680000801004194041977419214182542351
802044192331311001578857782168813711642401754174324161271251610558200880000801008000040372119269284664938786041867418273209633182016010020080000200240000418257551802011009910080000100800001008094913441645987087780616251187244162381459100003255247041300511011711419928069680000801004187042049419374191341928
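
The 0.5238 figure reported for the 100 unrolls and 100 iterations run is a reciprocal throughput (cycles per STP). The following is a minimal Python sketch converting it into instructions per cycle and store bandwidth. It uses only the reported result and the count of 8; the 16 bytes per instruction follows from each 64-bit STP storing two 8-byte registers. The variable names are illustrative.

```python
# Figures copied from the Test 3 result for 100 unrolls and 100 iterations.
cycles_per_stp = 0.5238   # median cycles for the block divided by count
count = 8                 # STP instructions per block (the MOVs are not counted)

# Reciprocal throughput -> sustained STPs per cycle.
stps_per_cycle = 1.0 / cycles_per_stp

# Cycles for one pass over the whole block of 8 STPs plus the MOVs.
cycles_per_block = cycles_per_stp * count

# Each 64-bit STP writes two 8-byte registers, i.e. 16 bytes.
bytes_per_cycle = stps_per_cycle * 16

print(f"{stps_per_cycle:.2f} STP/cycle, {cycles_per_block:.2f} cycles/block, "
      f"{bytes_per_cycle:.1f} bytes/cycle")
# 1.91 STP/cycle, 4.19 cycles/block, 30.5 bytes/cycle
```

The result is slightly under two STPs per cycle, which is what the summary figure alone supports; it does not by itself say which resource sets that limit.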

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5246

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | 09 | l2 tlb miss data (0b) | 1e | 1f | 20 | 22 | 29 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 67 | 69 | 6a | 6b | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | ldst x64 uop (b1) | ldst xpg uop (b2) | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | branch cond mispred nonspec (c5) | branch mispred nonspec (cb) | cf | d2 | l1i cache miss demand (d3) | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ec | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
8002941861314110017523838651688131156417887521951296615952516101080088800008001080000404091192932811554938738041915418513177133181716001020800002024000041866851180021109108000010800001080996155399474956868803412650923421043812331000032329622312100502000117114184280143080000800104188641866417944179741875
800244182631310101854426783168813522441768776178728771587251603478133580000800108000040262219253201804938766041777419493175833179216001020800002024000041864751180021109108000010800001080973135793467996938803392522877321186812481000032315585214000503800117114175980651080000800104174541777417904176141765
800244172531311111845363825274414618441745782180929411574251608638147480000800108000040036019231600322493871104181941727316843317201600102080000202400004178475118002110910800001080000108092075278438971892803352661899481024812081000032329530414000502000117114186481276080000800104175941841417144178441856
8002441889313110016413397981720137232418158092016290215772516067580784800008001080000402287192085604124938599041815419053176833186816001020800002024000041841751180021109108000010800001080989155850522983947803182910944301127811811000032308536813100502000117114174280591080000800104171441857418614182741790
80024417173131000182136479716801131444192676222272548154725160740803468000080010800004006411921912013749386900417994185631794331832160010208000020240000418117511800211091080000108000010809640588049198685680279243087980110081220100003231353440000502000117114183180575080000800104187241782418444173941896
80024418163140000176432680117121031124182179220952805153825160861806328000080010800004039051921240035449387770417844177931743331822160010208000020240000418037611800211091080000108000010809360588244498585380322255086358113081217100003232454730000502000117114178180943080000800104181541887418614181141872
80024417963130000170733978517281379241728798188728081520251602248026880000800108000040235219237840385493875404186841874318433317521600102080000202400004186285518002110910800001080000108098105782478100885780278249089342109081171100003231058850000502000117114178380926080000800104178141827417884172541711
80024418783120000169537682316881142364179074617093063155225160299810748000080010800004010131926136017949387540418764181931724331822160010208000020240000419507611800211091080000108000010809240518546598193580335238090062110681172100003229252590000502000117114181480151080000800104188141753419404192141817
80024417883130000168636282616801251444171982019102908164325160671803858000080010800004028671926424010149386090417974183631772331792160010208000020240000418477511800211091080000108000010809380552247398590680296264088374115081178100003232957970000502000117114181680448080000800104177641878419814180641828
80024417533140000170433081017281261564187379519942869163925160828806948000080010800004018351923328031849388130418274187931651331699160010208000020240000418387511800211091080000108000010809700599250595688080297217091134103981228100003231657030000502000118114182780797080000800104189041816419254177041746