Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

STR (pre-index, S)

Test 1: uops

Code:

  str s0, [x6, #0x10]!

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.000

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000
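A minimal Python model of what the tested instruction does, assuming standard AArch64 pre-index semantics: the address is the base plus the immediate, the store uses that address, and the sum is written back to the base register. Splitting the model into a base-writeback step and a store step is an assumption chosen to mirror the one integer-unit issue and one load/store-unit issue counted above; the function name and the register/memory dictionaries are hypothetical.

```python
import struct

def str_s_pre_index(regs, mem, rt, rn, imm):
    """Model `str s<rt>, [x<rn>, #imm]!` (pre-index, 32-bit FP store)."""
    # Assumed integer-side work: compute the new base and write it back to x<rn>.
    addr = (regs[f"x{rn}"] + imm) & 0xFFFF_FFFF_FFFF_FFFF
    regs[f"x{rn}"] = addr
    # Load/store-side work: store the 32-bit value of s<rt> at the new address.
    mem[addr] = struct.pack("<f", regs[f"s{rt}"])
    return addr

regs = {"x6": 0x1000, "s0": 1.5}
mem = {}
for _ in range(3):  # three unrolled copies, each advancing x6 by 0x10
    str_s_pre_index(regs, mem, rt=0, rn=6, imm=0x10)
print(hex(regs["x6"]), sorted(hex(a) for a in mem))
```

Because each copy reads the x6 produced by the previous one, back-to-back copies form the dependency chain that Test 2 measures.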

Columns: retire uop (01), cycle (02), 03, l1d tlb fill (05), mmu table walk data (08), 09, l2 tlb miss data (0b), 1e, 1f, 20, 22, 29, 3a, 3e, 3f, 40, 46, 49, 4f, 51, schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), 60, 69, 6d, 6e, map stall dispatch (70), map rewind (75), map stall (76), dispatch uop (78), map ldst uop (7d), map ldst uop inputs (80), 82, 83, flush restart other nonspec (84), 85, inst all (8c), inst simd store (99), inst ldst (9b), l1d tlb access (a0), l1d tlb miss (a1), l1d cache miss st (a2), l1d cache miss ld (a3), ld unit uop (a6), st unit uop (a7), l1d cache writeback (a8), a9, aa, ab, ac, af, bc, l1d cache miss st nonspec (c0), l1d tlb miss nonspec (c1), c2, cf, d5, map dispatch bubble (d6), dd, fetch restart (de), e0, ? int output thing (e9), ea, ? ldst retires (ed), ? int retires (ef), f5, f6, f7, f8, fd
1005104081111613321081210250408252000100010001000100050754458241101510401040824389820001000200010401040111001100010001064952034100902120710007517173116111037100001000100010411041104110411041
100410408123131246081601025340062520001000100010001000507624582401015104010408243898200010002000104010401110011000100010099310010071200710007317273116111037100001000100010411041104110411041
1004104081011012421010441025490072520001000100010001000507544582411015104010408243898200010002000104010401110011000100010088272101010015214710227357173116111037100001000100010411041104110411041
1004104081110181012001241025240062520001000100010001000507544582411015104010408243898200010002000104010401110011000100010648432191007013016710087477173116111037100001000100010411041104110411041
100410408123191235009410251201525200010001000100010005075445824010151040104082438982000100020001040104011100110001000100874301100701014710327277073116111037100001000100010411041104110411041
100410408101178112218801025013725200010001000100010005075445824110151040104082438982000100020001040104011100110001000104074774210130000710327317273116111037100001000100010411041104110411041
1004104081101961256002010250007252000100010001000100050754458241101510401040824389820001000200010401040111001100010001008831028101701014710227357173116111037100001000100010411041104110411041
10041040811011212400017121025350152520001000100010001000507544582401015104010408243898200010002000104010401110011000100010647394431013003601010327517173116111037100001000100010411041104110411041
100410408110227104900236102540005252000100010001000100050762458240101510401040824389820001000200010401040111001100010001052243511010090200710007517273116111037100001000100010411041104110411041
10041040810011811301054010251012725200010001000100010005075445824110151040104082438982000100020001040104011100110001000103317515241007013624710367397173116111037100001000100010411041104110411041

Test 2: Latency 3->3

Code:

  str s0, [x6, #0x10]!

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0040

Columns: retire uop (01), cycle (02), 03, l1d tlb fill (05), mmu table walk data (08), l2 tlb miss data (0b), 1e, 1f, 20, 22, 29, 3a, 3c, 3e, 3f, 40, 46, 49, 4f, 51, schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), 60, 61, 69, 6d, 6e, map stall dispatch (70), map rewind (75), map stall (76), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), 82, 83, flush restart other nonspec (84), 85, inst all (8c), inst branch (8d), inst branch taken (90), inst branch cond (94), inst int alu (97), inst simd store (99), inst ldst (9b), 9f, l1d tlb access (a0), l1d tlb miss (a1), l1d cache miss st (a2), l1d cache miss ld (a3), a4, ld unit uop (a6), st unit uop (a7), l1d cache writeback (a8), a9, aa, ab, ac, af, bc, l1d cache miss st nonspec (c0), l1d tlb miss nonspec (c1), c2, branch cond mispred nonspec (c5), branch mispred nonspec (cb), cd, cf, d0, d2, d5, map dispatch bubble (d6), dd, fetch restart (de), e0, ? int output thing (e9), ea, eb, ec, ? ldst retires (ed), ? int retires (ef), f5, f6, f7, f8, fd
10214100407544410353752253117122007241002522281652083625201001010010000101061000052215346882400100171004010040868168742201062001000820020016100401004011102011009910010010000100001001249539178714320146111030149202493504550124692176700111717000160010037100002752110000101001004110041100411004110041
10204100407555010545762275117122007241002522291761894725201001010010000101061000052191746882400100171004010040868178742201062001000820020016100401004011102011009910010010000100001001248134170614430148711009150142497504612124942675404111718000160010037100003507110000101001004110041100411004110041
10204100407550010506602294114968171610025222222321837252010010100100001010610000522159468824001001710040100408681787432010620010008200200161004010040111020110099100100100001000010012493361713147101501110411535024975044901248722874710111717000160010037100003496110000101001004110041100411004110041
10204100407540010449622275117201217161002522341791953425201001010010000101061000052218346882400100171004010040868168742201062001000820020016100401004011102011009910010010000100001001247924165014710148610987153202489504593124942584200111718000160010037100002202110000101001004110041100411004110041
10204100407544010386562306115841709521002522392372083125201001010010000101061000052218346882400100171004010040868168742201062001000820020016100401004011102011009910010010000100001001248939175614680150711014152502473504671124863375004000710001171110037100002224110000101001004110041100411004110041
10204100407640010410932267117041717241002522431772194825201001010010000101001000052203946882400100171004010040867438747201002001000020020000100401004011102011009910010010000100001001251331181814450145311145151302481504528124742275800000710001171110037100002554110000101001004110041100411004110041
10204100407540410551512276116881117641002522282311772025201001010010000101001000052210346882400100171004010040867438747201002001000020020000100401004011102011009910010010000100001001248135173514170150411018155852481504494124672680505000710001171110037100003137110000101001004110041100411004110041
10204100407630310230742278116961509321002522421772222625201001010010000101001000052180346882400100171004010040867438747201002001000020020000100401004011102011009910010010000100001001249334171814740151111015150302485504469124752778300000710001171110037100002749110000101001004110041100411004110041
10204100407540410317652267116961107241002522171991774025201001010010000101001000052216546882400100171004010040867438747201002001000020020000100401004011102011009910010010000100001001248924172914400148211021155162493504679124815975506000710001171110037100002201110000101001004110041100411004110041
10204100407544010404652308117201019321002522371871823825201001010010000101001000052215146882400100171004010040867438747201002001000020020000100401004011102011009910010010000100001001248135159914960150310996153142481504538124883077200000710001171110037100002573110000101001004110041100411004110041

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0040

Columns: retire uop (01), cycle (02), 03, l1d tlb fill (05), mmu table walk data (08), l2 tlb miss data (0b), 1e, 1f, 20, 22, 29, 3a, 3c, 3e, 3f, 40, 44, 46, 49, 4f, 51, schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), 60, 69, 6d, 6e, map stall dispatch (70), map rewind (75), map stall (76), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), 82, 83, flush restart other nonspec (84), 85, inst all (8c), inst branch (8d), inst branch taken (90), inst branch cond (94), inst int alu (97), inst simd store (99), inst ldst (9b), 9f, l1d tlb access (a0), l1d tlb miss (a1), l1d cache miss st (a2), l1d cache miss ld (a3), ld unit uop (a6), st unit uop (a7), l1d cache writeback (a8), a9, aa, ab, ac, af, bc, l1d cache miss st nonspec (c0), l1d tlb miss nonspec (c1), c2, cf, d5, map dispatch bubble (d6), dd, fetch restart (de), e0, ? int output thing (e9), ea, eb, ? ldst retires (ed), ? int retires (ef), f5, f6, f7, f8, fd
10034100407511110134742246117208270410025222202692233725200101001010000100101000052106546882411002210040100408696387702001020100002020000100401004011100211091010100001000010124770158515551534109771557024735046301249831700006404163310037100003361210000100101004110041100411004110041
1002410040750001012575226011496309401002522180233252392520010100101000010010100005210334688241100221004010040869638770200102010000202000010040100401110021109101010000100001012485015731555154610965151602481504570125092573800640316331003710000388310000100101004110041100411004110041
1002410040750001018271227511664309121002522330223268372520010100101000010010100005205934688241100221004010040869638770200102010000202000010040100401110021109101010000100001012469816971534155210944149102481504598124943777900640316331003710000291110000100101004110041100411004110041
1002410040751001020380227211688507201002522190259276382520010100101000010010100005210094688241100221004010040869638770200102010000202000010040100401110021109101010000100001012477816721500157010949152202457504576124913567800640316331003710000353110000100101004110041100411004110041
1002410040750001025453229111656707081002522590252235412520010100101000010010100005210894688241100221004010040869638770200102010000202000010040100401110021109101010000100001012473816781548154910948150412473504626125233274000640216321003710000260310000100101004110041100411004110041
10024100407510010299712264116643075610025222602672503625200101001010000100101000052098546882411002210040100408696387702001020100002020000100401004011100211091010100001000010124941016571533156010972150612485504707124963881400640316331003710000220110000100101004110041100411004110041
1002410040751101011666227911576319361002522360226222312520010100101000010010100005207854688241100221004010040869638770200102010000202000010040100401110021109101010000100001012481817611555155810961150112473504586125133660000640316331003710000196610000100101004110041100411004110041
10024100407600010164872268115123080410025221902752713725200101001010000100101000052089746882411002210040100408696387702001020100002020000100401004011100211091010100001000010124811016911579156810966151222473504621125103482600640216221003710000305110000100101004110041100411004110041
1002410040750001031174224611488207881002522430222294312520010100101000010010100005210894688241100221004010040869638770200102010000202000010040100401110021109101010000100001012493817201512159810956151502473504545125123569100640316331003710000220210000100101004110041100411004110041
10024100407511010116712262114964071610025222602602493925200101001010000100101000052114546882411002210040100408696387702001020100002020000100401004011100211091010100001000010124741016481548155010958149902477504493125004368100640316321003710000382310000100101004110041100411004110041
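The reported figure can be reproduced with simple arithmetic, assuming it is the median cycle count across runs divided by the number of times the tested code executes (unrolls × iterations). The per-run cycle counts below are hypothetical placeholders chosen to reproduce the 1.0040 result; they are not readings taken from the tables above, and `cycles_per_copy` is a hypothetical helper.

```python
from statistics import median

def cycles_per_copy(cycle_counts, unrolls, iterations):
    """Median cycles divided by the number of times the tested code ran (assumed formula)."""
    return median(cycle_counts) / (unrolls * iterations)

# Hypothetical per-run cycle counts, chosen so the median reproduces 1.0040.
runs = [10041, 10040, 10040, 10039, 10040]
a = cycles_per_copy(runs, unrolls=100, iterations=100)
b = cycles_per_copy(runs, unrolls=1000, iterations=10)
print(f"{a:.4f} {b:.4f}")
```

Both unroll/iteration splits execute the instruction 10,000 times, so matching results suggest the fused loop branch adds little. A result near 1.0 on a chain in which every copy consumes the x6 written by the previous copy points to a base-register writeback latency of about one cycle.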

Test 3: throughput

Count: 8

Code:

  str s0, [x6, #0x10]!
  str s0, [x7, #0x10]!
  str s0, [x8, #0x10]!
  str s0, [x9, #0x10]!
  str s0, [x10, #0x10]!
  str s0, [x11, #0x10]!
  str s0, [x12, #0x10]!
  str s0, [x13, #0x10]!
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6
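To see why this sequence is not limited by the one-cycle writeback latency from Test 2, one can compute the critical path of the unrolled code. The sketch below is a simplified dataflow model that ignores all resource limits. It assumes a 1-cycle latency for the `str` base writeback (taken from Test 2) and, as a guess, a 1-cycle latency for `mov`; `critical_path` is a hypothetical helper. Only x6 carries a dependency from one copy to the next, because the `mov` instructions overwrite x7 to x13 with the freshly updated x6.

```python
def critical_path(copies, str_lat=1, mov_lat=1):
    """Cycle at which all base registers are ready, ignoring resource limits."""
    bases = [f"x{n}" for n in range(6, 14)]  # x6..x13
    ready = {r: 0 for r in bases}  # cycle at which each register becomes available
    for _ in range(copies):
        # Eight pre-index stores: each reads its base and writes it back str_lat later.
        for r in bases:
            ready[r] += str_lat
        # mov x7..x13, x6: copy the updated x6 into the other base registers.
        for r in bases[1:]:
            ready[r] = ready["x6"] + mov_lat
    return max(ready.values())

copies = 10_000
per_store = critical_path(copies) / (8 * copies)
print(f"latency bound: {per_store:.4f} cycles per store")
```

Under these assumptions the latency bound is about 0.125 cycles per store, well below the measured result, so some other limit must set the throughput.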

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5017

Columns: retire uop (01), cycle (02), 03, l1d tlb fill (05), mmu table walk data (08), 09, l2 tlb miss data (0b), 18, 19, 1e, 1f, 20, 22, 23, 24, 29, 3a, 3e, 3f, 40, 46, 49, 4f, 51, schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), 60, 67, 69, 6d, 6e, map stall dispatch (70), map rewind (75), map stall (76), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), 82, 83, flush restart other nonspec (84), 85, inst all (8c), inst branch (8d), inst branch taken (90), inst branch cond (94), inst int alu (97), inst simd store (99), inst ldst (9b), 9f, l1d tlb access (a0), l1d tlb miss (a1), l1d cache miss st (a2), l1d cache miss ld (a3), a4, ld unit uop (a6), st unit uop (a7), l1d cache writeback (a8), a9, aa, ab, ac, af, bc, l1d cache miss st nonspec (c0), l1d tlb miss nonspec (c1), c2, c3, cf, d5, map dispatch bubble (d6), dd, fetch restart (de), e0, ? int output thing (e9), ? ldst retires (ed), ? int retires (ef), f5, f6, f7, f8, fd
80214401673001000001003832229510013285256401182265427379472516010280102800008010080000400535184364812400694010240146300833300791601002008000020016000040119401361180201100991001008000080000100824760167424639248380034153802474986461482520341136010511011611401578000280000801004013840134401544014840208
80204401013011100001005043226510015283232400952290568578732516010280102800008010080000400535184448812401334015440188301003300871601002008000020016000040175401621180201100991001008000080000100825082144724473247680043151522492766455282510211928000511011611401178000280000801004015840119401694012440139
80204401463011010001016140230410012564232401332264473477672516010280102800008010080000400535184429612400874019340166300483301081601002008000020016000040127401171180201100991001008000080000100824892153824420248280027150202480762456882514231433000511011611401118000280000801004016940141401614012240124
8020440121301110000997287228310014808196401372282487379442516010280102800008010080000400535184441612401384013740151300213301281601002008000020016000040184401261180201100991001008000080000100824805210124462246480061153802496754449882519361849010511011611401078000280000801004015340110400944014640108
8020440190301100100992741228910014725196401442272404352252516010280102800008010080000400535184379212401124019340170300673300711601002008000020016000040150400921180201100991001008000080000100824722139724557246480035149402480766463482510191479000511011611401448000280000801004013540166401544016640113
80204400963001001001012825231510014725184400892261508575362516010280102800008010080000400535184458412401354014440146300263300941601002008000020016000040119401241180201100991001008000080000100824892176724578247080024150312480766456782510341612010511011611401158000280000801004012740140401484014340102
80204401483011000001000240231910014803232401432299221598242516010280102800008010080000400535184403212400934017540126301153300951601002008000020016000040133401511180201100991001008000080000100824694165024518247780040151202480766454682515411446010511011611401718000280000801004013040157401394013940165
802044013730111010010011512288100147241964012222755004595725160102801028000080100800004005351845136124009740126401623002933009016010020080000200160000401404011311802011009910010080000800001008248081689243113247280030151902484762461382506201654000511011611401668000280000801004013340118401314014640128
8020440116300100000983133229810014646248401452270423464492516010280102800008010080000400535184376812401074016040187300403300551601002008000020016000040116401561180201100991001008000080000100824642143524276249580036152112476766455582506241162020511011611401008000280000801004014940150401244015340113
80204401333001000001000258230210015208228401392254580537912516010280102800008010080000400535184312012401324012340200300733301151601002008000020016000040117401461180201100991001008000080000100824883161524162246480045153602496756459482499311442000511011611401228000280000801004018740115401634012840162

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5017

Columns: retire uop (01), cycle (02), 03, l1d tlb fill (05), mmu table walk data (08), l2 tlb miss data (0b), 18, 19, 1e, 1f, 20, 22, 29, 3a, 3e, 3f, 40, 46, 49, 4f, 51, schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), 60, 61, 67, 69, 6d, 6e, map stall dispatch (70), map rewind (75), map stall (76), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), 82, 83, flush restart other nonspec (84), 85, inst all (8c), inst branch (8d), inst branch taken (90), inst branch cond (94), inst int alu (97), inst simd store (99), inst ldst (9b), 9f, l1d tlb access (a0), l1d tlb miss (a1), l1d cache miss st (a2), l1d cache miss ld (a3), a4, ld unit uop (a6), st unit uop (a7), l1d cache writeback (a8), a9, aa, ab, ac, af, bc, l1d cache miss st nonspec (c0), l1d tlb miss nonspec (c1), c2, cd, cf, d0, d2, d5, map dispatch bubble (d6), dd, fetch restart (de), e0, ? int output thing (e9), ? ldst retires (ed), ? int retires (ef), f5, f6, f7, f8, fd
80034401963003330010305582333112481226440128226067552377251600128001280000800108000040008318460761024010140138401323009433011816001020800002016000040143401211180021109101080000800001082504261456243610247880054154302501986474482509471828140050200051642401158000280000800104012440116401074013340159
800244013530033000102756123441146410268401442314597649242516001280012800008001080000400083184329210240118402934028930049330089160010208000020160000401234015911800211091010800008000010825041891924608249180046153802490754467682521531067140050200031752401288000280000800104011340140401284014940129
80024401503002020010116812278114722776401022313412451242516001280012800008001080000400083184312410240076401164011030057330099160010208000020160000401124015611800211091010800008000010824991416252476124958003915140249674845948252247754140050200041644401298000280000800104009040135401304013640100
8002440148301101001007450234511512419640076230525139221251600128001280000800108000040008318424041024008340093401453004333006216001020800002016000040123401021180021109101080000800001082510161384247211247580033148402504254460082510521110140050200041742401988000280000800104016740142401024012240101
800244013230110000102124423161127242364007222903252023425160012800128000080010800004000831842141102400954011840137300273300801600102080000201600004012740112118002110910108000080000108250614865247010250280036152302492100045838253333987140050200041743401048000280000800104008940156401064015040134
800244014330022100101495823411100852044008823164054493125160012800128000080010800004000831840989102400794015340091300443300891600102080000201600004012340111118002110910108000080000108252015149824875250680049154022500254466082499351371142050200021724401468000280000800104014440127401134016040153
800244015130010000101495123391172842004010922986402341725160012800128000080010801194000831844204002402344010540074300303301291600102080000201600004009840118118002110910108000080000108251614459247612250080042149612504113846938252631814140050200041742401028000280000800104010640129401214010340110
8002440103301200001016158231011448672840126224856644436251600128001280000800108000040008318443241024011840118401113009633008316001020800002016000040127401181180021109101080000800001082510158832468152495800361531025017724668825103412611400502010321724401358000280000800104013740150401184014540105
80024401213011110099664923431149642244008222733478424625160012800128000080010800004000831842886110240133401394010930033330142160010208000020160000400854012711800211091010800008000010825071411262431112493800311556024882544526825023011131401502010621724401748000280000800104015640104400944009440105
80024404313001010010185542310114485292400862274394189182516001280012800008001080000400083184425201024011240141401283008133012716001020800002016000040121401281180021109101080000800001082508157202434102490800441548024962544679824994215551400502010941744401598000280000800104013840157401074013140131
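A simple bound check ties the throughput result to the latency result: cycles per store cannot be lower than either the latency bound (one x6 writeback per eight stores) or the resource bound set by the units that can execute stores. The store-unit count below is an assumption, picked because two units is the value that reproduces the measured 0.5017; this page does not state how many store-capable units the core has, and `cycles_per_store` is a hypothetical helper.

```python
def cycles_per_store(chain_latency, stores_per_chain_step, store_units):
    """Lower bound on cycles per store: the larger of the latency and resource bounds."""
    latency_bound = chain_latency / stores_per_chain_step
    resource_bound = 1 / store_units
    return max(latency_bound, resource_bound)

# Assumed values: 1-cycle writeback latency (Test 2), 8 stores per x6 update,
# and a hypothetical count of 2 store-capable units.
bound = cycles_per_store(chain_latency=1, stores_per_chain_step=8, store_units=2)
print(bound)  # 0.5
```

The measured 0.5017 sits just above this bound. The model does not account for the small remainder, and it would give the same answer for any store-unit count that yields a resource bound of 0.5.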