Apple Microarchitecture Research by Dougall Johnson

STR (post-index, 32-bit)

Test 1: uops

Code:

  str w0, [x6], #8

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.000

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | 09 | l2 tlb miss data (0b) | 1e | 1f | 20 | 22 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst int store (96) | inst ldst (9b) | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
10051040811116141201201025242372520001000100010001000507544582410401040824389820001000200010401241110011000100010317567211008003422111000756717351633103710001000100010411041104110411041
10041040810100132211001025001642520001000100010001000507544582410401040824389820001000200010401241110011000100010387486231008001014101028764727331633103710001000100010411041104110411041
10041040810116111601316102501116252000100010001000100050754458241040104082438982000100020001040124111001100010001037997730100810246101010772717331633103710001000100010411041104110411041
100410408110161318110102526166252000100010001000100050754458241040104082438982000100020001040124111001100010001008972034100712332071016764727331633103710001000100010411041104110411041
100410408110101214211410251811262520001000100010001000507544582410401040824389820001000200010401241110011000100010369723151008223016131024872717331633103710001000100010411041104110411041
100410408111161312081210252620225200010001000100010005075445824104010408243898200010002000104012411100110001000103988901510222200101028752707331633103710001000100010411041104110411041
1004104071111012120101025181042520001000100010001000507544582410401040824389820001000200010401241110011000100010378727301018001812101024852717331633103710001000100010411041104110411041
1004104081111612222512102514324252000100010001000100050762458241040104082438982000100020001040124111001100010001022773622100720341871031852707331633103710001000100010411041104110411041
10041040810011213005010252004425200010001000100010005074645824104010408243898200010002000104012411100110001000104181129271007102616131000788707331633103710001000100010411041104110411041
1004104071110012241140102522905252000100010001000100050746458241040104082438982000100020001040124111001100010001027972532100911120101013852717331633103710001000100010411041104110411041

Test 2: Latency 2->2

Code:

  str w0, [x6], #8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0040

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | l2 tlb miss data (0b) | 18 | 19 | 1e | 1f | 20 | 22 | 23 | 29 | 3a | 3c | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 67 | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ea | eb | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
1020910040783440023551208121081693012010025784119819325201001010010000101001000052195746882400496960100401004186743874720100200100002002000010040122111020110099100100001001000010010978351503367067710290308092242980109484513033507101171110037100000010000101001004110041100411004110041
1020410040775000021241258271080869011210025794104905625201001010010000101001000052203946882410496960100401004086743874720100200100002002000010040122111020110099100100001001000010010944281506359067810289322495638882109573812852807291252110077100321010000101001009210092100921014510092
10204100907940011246620787310776860156100777951429859452015810130100251019810070521296469939004970121019610091869378781202682041007820220156100921223110201100991001000010010000100109782115384230655102982910948201609109492513401407272343110078100451010000101001004110041100911004110041
10204100408120000239492823107688419610025811115784625201001010010000101001000052213946882400496960100401004086743874720100200100002002000010040122111020110099100100001001000010010924161443365069710272309293434933109502712601427101171110037100003010000101001004110041100411004110041
10204100407720000258611185210752740124100258071061034125201001010010000101001000052209546882610496960100401004086743874720100200100002002000010040122111020110099100100001001000010010960141442370067510268303092836881109702613531407101171110037100000010000101001004110041100411004110041
102041004078200002373938641077690210010025834114924925201001010010000101001000052206146882410496960100401004086743874720100200100002002000010040122111020110099100100001001000010010912141500369068410278300493842932109382513301427101171110037100001010000101001004110041100411004110041
102041004078220002277928361080081116410025819921035025201001010010000101001000052206346882400496960100401004086743874720100200100002002000010040122111020110099100100001001000010010942141484363069510277299093442979109522213631407101171110037100002010000101001004110041100411004110041
1020410040782200024668984310760760116100257961171245225201001010010000101001000052201746882400496960100401004086743874720100200100002002000010040122111020110099100100001001000010010952141528399066210282294093032985109542712751407101171110037100001010000101001004110041100411004110041
102041004075222002328958331078482011610025821105965525201001010010000101001000052206346882410496960100401004086743874720100200100002002000010040122111020110099100100001001000010010954161488417070410267324093856894109573013451427101171110037100003010000101001004110041100411004110041
1020410040752200021878485310744810116100258121151274725201001010010000101001000052209946882400496960100401004086743874720100200100002002000010040122111020110099100100001001000010010956121429407066310274293090234897109322513511427101171110037100002010000101001004110041100411004110041
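
The "Result" figure for the 100-unroll, 100-iteration run can be reproduced from the cycle (02) counter, as the minimal Python sketch below shows. It rests on an assumption: that each fused data row above begins with the retire uop (01) value followed by the cycle (02) value, which gives ten cycle readings of 10040 with a single 10090 outlier. The helper name `cycles_per_instruction` is hypothetical and is not part of the original measurement tooling.

```python
from statistics import median


def cycles_per_instruction(cycle_samples, unrolls, iterations, count=1):
    """Normalize the median total cycle count to a single instruction.

    count is the number of measured instructions per unrolled copy
    (1 for the latency test, where the code is a single STR).
    """
    executed = unrolls * iterations * count
    return median(cycle_samples) / executed


# Assumed cycle (02) readings, taken from the start of each data row
# of the 100-unroll, 100-iteration latency run.
samples = [10040, 10040, 10090, 10040, 10040,
           10040, 10040, 10040, 10040, 10040]

print(f"{cycles_per_instruction(samples, 100, 100):.4f}")  # prints 1.0040
```

Using the median rather than the mean keeps the single 10090 reading from shifting the result.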

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0040

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | 09 | l2 tlb miss data (0b) | 1e | 1f | 20 | 22 | 29 | 3a | 3c | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ea | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
100291004075333321338979817607301641002577897973425200101001010000100101000052092346882404969601004010040869638770200102010000202000010040124111002110910100001010000101092721127935026661028725508583286110900321066210640316331003710000010000100101004110041100411004110041
1002410040762000213910083613521040961002580063784425200101001010000100101000052102546882404969601004010040869638770200102010000202000010040124111002110910100001010000101088423148038706491026925408824088210896231105140640416331003710000110000100101004110041100411004110041
1002410040752000211892777176082216810025785327033252001010010100001001010000521089468824049696010040100408696387702001020100002020000100401241110021109101000010100001010910712143330666102642640912327481094614104572640416331003710000110000100101004110041100411004110041
1002410040751100215471795174473016410025786395438252001010010100001001010000521113468824049696010040100408696387702001020100002020000100401242110021109101000010100001010929813103990670102632640916388331087720113870640416441003710000010000100101004110041100411004110041
1002410040751100234690817175287011610025782908516252001010010100001001010000521121468824049696010040100408696387702001020100002020000100401241110021109101000010100001010925814053750652102732620902368021093218108571640416431003710000110000100101004110041100411004110041
1002410040751100217589818175276111210025797726538252001010010100001001010000521073468824049696010040100408696387702001020100002020000100401241110021109101000010100001010921713673840704102832720906467771092017124770640316441003710000110000100101004110041100411004110041
10024100407510012187858041696770164100257761025852252001010010100001001010000521057468824049696010040100408696387702001020100002020000100401241110021109101000010100001010873813094010650102422510854367671091216114670640416441003710000010000100101004110041100411004110041
10024100407510011932917871744770116100257927653432520010100101000010010100005211054688241496960100401004086963877020010201000020200001004012411100211091010000101000010108777133837806781026627808843273810932151105726404164410037100002510000100101004110041100411004110041
1002410040751000198682820176079010810025790716341252001010010100001001010000521073468824049696010040100408696387702001020100002020000100401241110021109101000010100001010903712994090678102652830902328991093215997716404164410037100001210000100101004110041100411004110041
1002410040751010201978827173676011210025769486135252001010010100001001010000521097468824049696010040100408696387702001020100002020000100401241110021109101000010100001010905812213570676102582700862367621087620115371640416331003710000010000100101004110041100411004110041

Test 3: throughput

Count: 8

Code:

  str w0, [x6], #8
  str w0, [x7], #8
  str w0, [x8], #8
  str w0, [x9], #8
  str w0, [x10], #8
  str w0, [x11], #8
  str w0, [x12], #8
  str w0, [x13], #8
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5056

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | l2 tlb miss data (0b) | 18 | 19 | 1e | 1f | 20 | 22 | 29 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 60 | 67 | 69 | 6a | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | branch mispred nonspec (cb) | cf | d5 | map dispatch bubble (d6) | dd | fetch restart (de) | e0 | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
80209405773024040020373908151712109144404377821900188710625160756819918000680100800004178021859191019714937373404594046530343330392160100200800002001600004043475118020110099100800001008000010080991414353537138828030130609194413598114927744554100511013161111405148094780000801004045740456404884042740468
8020440424303200001746386820172010714440451799173016041052516068380672800058010080000401313185884403844937369404274044330323330486160100200800002001602404049785118020110099100800001008000010080926354446481198818026928608967411208113625042382200511012161111404238071480000801004045540534403944046540451
8020440444303200002067385822174410410040457787205819381152516072082755800008010080000403414185920403294937411404574051630391330391160100200800002001600004046491118020110099100800001008000010080952414423521169528026330709245212018118225445962600511011161111404558216080000801004051340422404334044440483
8020440490303330002040419841180011414440478807200719661222516051787634800008010080000402536185960002683493732140526404183030933051116010020080000200160000404467511802011009910080000100800001008090026421351588908028026039027012558118527440844100511013161012405028230480000801004045940417403794058140378
802044039530222200190535879917281131564042779818352196104251615208030180001801008000041077418585760362493730040338403873039333036116010020080000200160000404397511802011009910080000100800001008093113453051398958026928029172811238117226745612800511010171312404458096380000801004047440370404004044240443
80204404243021100018513498091712109964035680217971918108251608568079380003801008000041041118605200151493737740474403903037033035716010020080000200160000403888111802011009910080000100800001008093015421950599098027826809284412638117928438192700511013161312404388443780000801004045340418404414046240365
8020440442303110002001382834170412992404297961900200212225160923809478000180100800004187251859472015534937312404254037430438330347160100200800002001600004046885118020110099100800001008000010080930254279481228638024927909004211238118527242851300511011161013404308037580000801004043440438404544053040370
802044039030311000195340280717121121284042283118172093128251604508356480000801008000040397818579320273493738740409404363031133039316010020080000200160000404667511802011009910080000100800001008091428448648369318027028009208213088114327740453860511011161013404448455580000801004044040539404844041040455
802044044730320000183637079117681021124050479521691927113251648978051380000801008000040263218632620333493736540461404763025933039116010020080000200160000404727511802011009910080000100800001008091528409048349038043225989544820288113826743524200511011171213404878535680000801004046940439404874046940446
80205404983033000020943568211720123132405057931782208411625161239806678000080100800004111131860288055349373494049340478303103303871601002008000020016000040453921180201100991008000010080000100812803143824921658818062828209074842988151924738662760511010171113403958076180000801004044040475404724048740447

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5050

retire uop (01) | cycle (02) | 03 | l1d tlb fill (05) | mmu table walk data (08) | 09 | l2 tlb miss data (0b) | 18 | 1e | 1f | 20 | 22 | 29 | 3a | 3e | 3f | 40 | 46 | 49 | 4f | 51 | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | 5f | 60 | 67 | 69 | 6a | 6b | 6d | 6e | map stall dispatch (70) | map rewind (75) | map stall (76) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | 82 | 83 | flush restart other nonspec (84) | 85 | inst all (8c) | inst branch (8d) | inst branch taken (90) | inst branch cond (94) | inst int store (96) | inst int alu (97) | inst ldst (9b) | 9f | l1d tlb access (a0) | l1d tlb miss (a1) | l1d cache miss st (a2) | l1d cache miss ld (a3) | a4 | ld unit uop (a6) | st unit uop (a7) | l1d cache writeback (a8) | a9 | aa | ab | ac | af | bc | l1d cache miss st nonspec (c0) | l1d tlb miss nonspec (c1) | c2 | branch cond mispred nonspec (c5) | cf | d0 | d2 | d5 | map dispatch bubble (d6) | da | db | dd | fetch restart (de) | e0 | ? int output thing (e9) | ea | ? ldst retires (ed) | ? int retires (ef) | f5 | f6 | f7 | f8 | fd
8002940637302000001947311851243210325640393806171018561302516718182272800008001080000420001185363200180493724004032940489302663302931600102080000201600004039276118002110910800001080000108087904049526688380242286088564122281175225448100050200041604424039680371080000800104039140497403784041440339
8002440425302000001806279818251210627640390788190719141572516028980572800008001080000401208185483200232493735004045140480303163303951600102080000201600004038975118002110910800001080000108091804699513590180218296093798112581128215385900050200031604534040280312080000800104030940330404104043340427
800244034730200000184530484225529224040381773180319861462516038380436800008001080000413089186244000136493737704043640442304643303681600102080000201600004040475118002110910800001080000108085704481506487480256264086172115681115234427700050200031604444038087742080000800104041540379404094035940334
8002440502302111101863356784250498240404208151990169712825162133806508000280010800004098811856944001375493727304041540403303403304521600102080000201600004040176118002110910800001080000108087413405955368678022931718741481117810862384599130050200031604324041680387080000800104041440396405504041640296
80024404003021011020673058032432952564051278517961755107251603378669780000800108000041475618500320025349373510403474033230343330347160010208000020160000403097511800211091080000108000010809381346945418880802542770906521158811762284348130050200031704334037180570080000800104040740431403774037440443
8002440482303100001959275839252812720040346769194218411052516054580286800008001080000401487185660800143493730604044140373303823303431600102080000201600004035475118002110910800001080000108084714421452118558027428328911561122811242484369130050200031604134036180564080000800104047040393403974037340416
800244036630320010193835079024961082444049579017851647123251606258033280000800108000040596118556480075349373610403834043130327330500160010208000020160000403327611800211091080000108000010809171248375596853802292830880801099811332554768140050200021714234034280583080000800104037440407403894037940418
8002440373302101001881279818248812536040376774187719011422516078984021800008001080000400561185586400196493726004037140387303273303261600102080000201602404034175118002110910800001080000108086715433150411864802392560912661137811102614234131050200031804274030882539080000800104037640367403714040940459
80024404113031001018182968662440106228403077881744191613825160183807668000080010800004158691855768001957493727504044140342303383303001602522080000201600004049576118002110910800001080000108089414400254758698022526709511081121811002484283132050200021804354039984716080000800104032340383404774039740320
800244038030310000179129779425041042164037377919671957142251605318051980007800108000041435918558160020549373210404734056030405330454160010208000020160000404927611800211091080000108000010808861541305154870802472830881921088811262475046140050200031804524032680643080000800104040940384404754038540426
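
The throughput results are reported as cycles per store (median cycles divided by the count of 8). As a rough sketch, taking the reciprocal converts them to stores per cycle, which may be easier to compare against issue width. The function name `stores_per_cycle` is hypothetical, and reading the outcome as a sustained rate of about two stores per cycle is an inference from the two reported figures, not a statement made by the source.

```python
def stores_per_cycle(cycles_per_store):
    """Convert a reported cycles-per-instruction result to a per-cycle rate."""
    return 1.0 / cycles_per_store


# The two "Result" values reported for Test 3.
reported = [
    ("100 unrolls and 100 iterations", 0.5056),
    ("1000 unrolls and 10 iterations", 0.5050),
]

for label, result in reported:
    print(f"{label}: {stores_per_cycle(result):.2f} stores per cycle")
```

Both runs come out at roughly 1.98 stores per cycle. The measured code also includes the seven `mov` instructions that reset the address registers, and the reported result divides only by the eight stores.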