Apple M1 Microarchitecture Research by Dougall Johnson

STR (pre-index, S)

Test 1: uops

Code:

  str s0, [x6, #0x10]!

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.000

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

retire (01)cycle (02)030508090b1e1f2022293a3e3f4046494f51inst issue (52)~issue int (53)~issue ld/st (55)~dispatch int (56)~dispatch ld/st (58)huge thing int (59)huge thing ld/st (5a)60696d6edispatch stall (70)scheduler rewind (75)scheduler stall (76)~dispatch op (78)~map op ld/st (7d)~map lookup ld/st (80)8283pipeline redirect (84)85inst all (8c)inst fp/simd store (99)inst ldst (9b)a0a1a2a3a6a7a8a9aaabacafbcdcache store miss (c0)dtlb miss (c1)c2cfd5d6ddinst fetch restart (de)e0? int output thing (e9)eald/st retires (ed)gpr retires (ef)f5f6f7f8fd
1005104081111613321081210250408252000100010001000100050754458241101510401040824389820001000200010401040111001100010001064952034100902120710007517173116111037100001000100010411041104110411041
100410408123131246081601025340062520001000100010001000507624582401015104010408243898200010002000104010401110011000100010099310010071200710007317273116111037100001000100010411041104110411041
1004104081011012421010441025490072520001000100010001000507544582411015104010408243898200010002000104010401110011000100010088272101010015214710227357173116111037100001000100010411041104110411041
1004104081110181012001241025240062520001000100010001000507544582411015104010408243898200010002000104010401110011000100010648432191007013016710087477173116111037100001000100010411041104110411041
100410408123191235009410251201525200010001000100010005075445824010151040104082438982000100020001040104011100110001000100874301100701014710327277073116111037100001000100010411041104110411041
100410408101178112218801025013725200010001000100010005075445824110151040104082438982000100020001040104011100110001000104074774210130000710327317273116111037100001000100010411041104110411041
1004104081101961256002010250007252000100010001000100050754458241101510401040824389820001000200010401040111001100010001008831028101701014710227357173116111037100001000100010411041104110411041
10041040811011212400017121025350152520001000100010001000507544582401015104010408243898200010002000104010401110011000100010647394431013003601010327517173116111037100001000100010411041104110411041
100410408110227104900236102540005252000100010001000100050762458240101510401040824389820001000200010401040111001100010001052243511010090200710007517273116111037100001000100010411041104110411041
10041040810011811301054010251012725200010001000100010005075445824110151040104082438982000100020001040104011100110001000103317515241007013624710367397173116111037100001000100010411041104110411041
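
The per-instruction figures reported for this test are consistent with dividing each raw counter by the number of unrolled copies of the instruction. The following is a minimal Python sketch of that normalisation, not the author's tooling. It assumes the fused table rows split so that inst issue (52), ~issue int (53) and ~issue ld/st (55) read 2000, 1000 and 1000; that column split is inferred from the flattened table, and the helper name per_instruction is hypothetical.

```python
UNROLLS = 1000  # from "1000 unrolls and 1 iteration"

# Assumed readings of three raw counters (inferred column split, see lead-in).
raw_counters = {
    "inst issue (52)": 2000,
    "~issue int (53)": 1000,
    "~issue ld/st (55)": 1000,
}


def per_instruction(raw, unrolls):
    """Divide each raw counter value by the number of unrolled instructions."""
    return {name: count / unrolls for name, count in raw.items()}


normalised = per_instruction(raw_counters, UNROLLS)
for name, value in normalised.items():
    print(f"{name}: {value:.3f}")
```

Two issues per instruction, one on an integer unit and one on a load/store unit, is consistent with the pre-index store being handled as a base-register update plus the store itself.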

Test 2: Latency 3->3

Code:

  str s0, [x6, #0x10]!

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0040

retire (01)cycle (02)0305080b1e1f2022293a3c3e3f4046494f51inst issue (52)~issue int (53)~issue ld/st (55)~dispatch int (56)~dispatch ld/st (58)huge thing int (59)huge thing ld/st (5a)6061696d6edispatch stall (70)scheduler rewind (75)scheduler stall (76)~dispatch op (78)~map op int (7c)~map op ld/st (7d)~map lookup int (7f)~map lookup ld/st (80)8283pipeline redirect (84)85inst all (8c)inst branch (8d)inst branch taken (90)inst b.cc (94)inst integer (97)inst fp/simd store (99)inst ldst (9b)9fa0a1a2a3a4a6a7a8a9aaabacafbcdcache store miss (c0)dtlb miss (c1)c2c5branch mispredict (cb)cdcfd0d2d5d6ddinst fetch restart (de)e0? int output thing (e9)eaebecld/st retires (ed)gpr retires (ef)f5f6f7f8fd
10214100407544410353752253117122007241002522281652083625201001010010000101061000052215346882400100171004010040868168742201062001000820020016100401004011102011009910010010000100001001249539178714320146111030149202493504550124692176700111717000160010037100002752110000101001004110041100411004110041
10204100407555010545762275117122007241002522291761894725201001010010000101061000052191746882400100171004010040868178742201062001000820020016100401004011102011009910010010000100001001248134170614430148711009150142497504612124942675404111718000160010037100003507110000101001004110041100411004110041
10204100407550010506602294114968171610025222222321837252010010100100001010610000522159468824001001710040100408681787432010620010008200200161004010040111020110099100100100001000010012493361713147101501110411535024975044901248722874710111717000160010037100003496110000101001004110041100411004110041
10204100407540010449622275117201217161002522341791953425201001010010000101061000052218346882400100171004010040868168742201062001000820020016100401004011102011009910010010000100001001247924165014710148610987153202489504593124942584200111718000160010037100002202110000101001004110041100411004110041
10204100407544010386562306115841709521002522392372083125201001010010000101061000052218346882400100171004010040868168742201062001000820020016100401004011102011009910010010000100001001248939175614680150711014152502473504671124863375004000710001171110037100002224110000101001004110041100411004110041
10204100407640010410932267117041717241002522431772194825201001010010000101001000052203946882400100171004010040867438747201002001000020020000100401004011102011009910010010000100001001251331181814450145311145151302481504528124742275800000710001171110037100002554110000101001004110041100411004110041
10204100407540410551512276116881117641002522282311772025201001010010000101001000052210346882400100171004010040867438747201002001000020020000100401004011102011009910010010000100001001248135173514170150411018155852481504494124672680505000710001171110037100003137110000101001004110041100411004110041
10204100407630310230742278116961509321002522421772222625201001010010000101001000052180346882400100171004010040867438747201002001000020020000100401004011102011009910010010000100001001249334171814740151111015150302485504469124752778300000710001171110037100002749110000101001004110041100411004110041
10204100407540410317652267116961107241002522171991774025201001010010000101001000052216546882400100171004010040867438747201002001000020020000100401004011102011009910010010000100001001248924172914400148211021155162493504679124815975506000710001171110037100002201110000101001004110041100411004110041
10204100407544010404652308117201019321002522371871823825201001010010000101001000052215146882400100171004010040867438747201002001000020020000100401004011102011009910010010000100001001248135159914960150310996153142481504538124883077200000710001171110037100002573110000101001004110041100411004110041

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0040

retire (01)cycle (02)0305080b1e1f2022293a3c3e3f404446494f51inst issue (52)~issue int (53)~issue ld/st (55)~dispatch int (56)~dispatch ld/st (58)huge thing int (59)huge thing ld/st (5a)60696d6edispatch stall (70)scheduler rewind (75)scheduler stall (76)~dispatch op (78)~map op int (7c)~map op ld/st (7d)~map lookup int (7f)~map lookup ld/st (80)8283pipeline redirect (84)85inst all (8c)inst branch (8d)inst branch taken (90)inst b.cc (94)inst integer (97)inst fp/simd store (99)inst ldst (9b)9fa0a1a2a3a6a7a8a9aaabacafbcdcache store miss (c0)dtlb miss (c1)c2cfd5d6ddinst fetch restart (de)e0? int output thing (e9)eaebld/st retires (ed)gpr retires (ef)f5f6f7f8fd
10034100407511110134742246117208270410025222202692233725200101001010000100101000052106546882411002210040100408696387702001020100002020000100401004011100211091010100001000010124770158515551534109771557024735046301249831700006404163310037100003361210000100101004110041100411004110041
1002410040750001012575226011496309401002522180233252392520010100101000010010100005210334688241100221004010040869638770200102010000202000010040100401110021109101010000100001012485015731555154610965151602481504570125092573800640316331003710000388310000100101004110041100411004110041
1002410040750001018271227511664309121002522330223268372520010100101000010010100005205934688241100221004010040869638770200102010000202000010040100401110021109101010000100001012469816971534155210944149102481504598124943777900640316331003710000291110000100101004110041100411004110041
1002410040751001020380227211688507201002522190259276382520010100101000010010100005210094688241100221004010040869638770200102010000202000010040100401110021109101010000100001012477816721500157010949152202457504576124913567800640316331003710000353110000100101004110041100411004110041
1002410040750001025453229111656707081002522590252235412520010100101000010010100005210894688241100221004010040869638770200102010000202000010040100401110021109101010000100001012473816781548154910948150412473504626125233274000640216321003710000260310000100101004110041100411004110041
10024100407510010299712264116643075610025222602672503625200101001010000100101000052098546882411002210040100408696387702001020100002020000100401004011100211091010100001000010124941016571533156010972150612485504707124963881400640316331003710000220110000100101004110041100411004110041
1002410040751101011666227911576319361002522360226222312520010100101000010010100005207854688241100221004010040869638770200102010000202000010040100401110021109101010000100001012481817611555155810961150112473504586125133660000640316331003710000196610000100101004110041100411004110041
10024100407600010164872268115123080410025221902752713725200101001010000100101000052089746882411002210040100408696387702001020100002020000100401004011100211091010100001000010124811016911579156810966151222473504621125103482600640216221003710000305110000100101004110041100411004110041
1002410040750001031174224611488207881002522430222294312520010100101000010010100005210894688241100221004010040869638770200102010000202000010040100401110021109101010000100001012493817201512159810956151502473504545125123569100640316331003710000220210000100101004110041100411004110041
10024100407511010116712262114964071610025222602602493925200101001010000100101000052114546882411002210040100408696387702001020100002020000100401004011100211091010100001000010124741016481548155010958149902477504493125004368100640316321003710000382310000100101004110041100411004110041

Test 3: Throughput

Count: 8

Code:

  str s0, [x6, #0x10]!
  str s0, [x7, #0x10]!
  str s0, [x8, #0x10]!
  str s0, [x9, #0x10]!
  str s0, [x10, #0x10]!
  str s0, [x11, #0x10]!
  str s0, [x12, #0x10]!
  str s0, [x13, #0x10]!
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5017

retire (01)cycle (02)030508090b18191e1f20222324293a3e3f4046494f51inst issue (52)~issue int (53)~issue ld/st (55)~dispatch int (56)~dispatch ld/st (58)huge thing int (59)huge thing ld/st (5a)6067696d6edispatch stall (70)scheduler rewind (75)scheduler stall (76)~dispatch op (78)~map op int (7c)~map op ld/st (7d)~map lookup int (7f)~map lookup ld/st (80)8283pipeline redirect (84)85inst all (8c)inst branch (8d)inst branch taken (90)inst b.cc (94)inst integer (97)inst fp/simd store (99)inst ldst (9b)9fa0a1a2a3a4a6a7a8a9aaabacafbcdcache store miss (c0)dtlb miss (c1)c2c3cfd5d6ddinst fetch restart (de)e0? int output thing (e9)ld/st retires (ed)gpr retires (ef)f5f6f7f8fd
80214401673001000001003832229510013285256401182265427379472516010280102800008010080000400535184364812400694010240146300833300791601002008000020016000040119401361180201100991001008000080000100824760167424639248380034153802474986461482520341136010511011611401578000280000801004013840134401544014840208
80204401013011100001005043226510015283232400952290568578732516010280102800008010080000400535184448812401334015440188301003300871601002008000020016000040175401621180201100991001008000080000100825082144724473247680043151522492766455282510211928000511011611401178000280000801004015840119401694012440139
80204401463011010001016140230410012564232401332264473477672516010280102800008010080000400535184429612400874019340166300483301081601002008000020016000040127401171180201100991001008000080000100824892153824420248280027150202480762456882514231433000511011611401118000280000801004016940141401614012240124
8020440121301110000997287228310014808196401372282487379442516010280102800008010080000400535184441612401384013740151300213301281601002008000020016000040184401261180201100991001008000080000100824805210124462246480061153802496754449882519361849010511011611401078000280000801004015340110400944014640108
8020440190301100100992741228910014725196401442272404352252516010280102800008010080000400535184379212401124019340170300673300711601002008000020016000040150400921180201100991001008000080000100824722139724557246480035149402480766463482510191479000511011611401448000280000801004013540166401544016640113
80204400963001001001012825231510014725184400892261508575362516010280102800008010080000400535184458412401354014440146300263300941601002008000020016000040119401241180201100991001008000080000100824892176724578247080024150312480766456782510341612010511011611401158000280000801004012740140401484014340102
80204401483011000001000240231910014803232401432299221598242516010280102800008010080000400535184403212400934017540126301153300951601002008000020016000040133401511180201100991001008000080000100824694165024518247780040151202480766454682515411446010511011611401718000280000801004013040157401394013940165
802044013730111010010011512288100147241964012222755004595725160102801028000080100800004005351845136124009740126401623002933009016010020080000200160000401404011311802011009910010080000800001008248081689243113247280030151902484762461382506201654000511011611401668000280000801004013340118401314014640128
8020440116300100000983133229810014646248401452270423464492516010280102800008010080000400535184376812401074016040187300403300551601002008000020016000040116401561180201100991001008000080000100824642143524276249580036152112476766455582506241162020511011611401008000280000801004014940150401244015340113
80204401333001000001000258230210015208228401392254580537912516010280102800008010080000400535184312012401324012340200300733301151601002008000020016000040117401461180201100991001008000080000100824883161524162246480045153602496756459482499311442000511011611401228000280000801004018740115401634012840162
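
The result for the 100 unrolls and 100 iterations run is consistent with taking the median of the cycle (02) column and dividing by the number of stores executed. The following is a minimal Python sketch under two assumptions: that the second field of each fused row above is the cycle count (an inferred column split), and that the number of stores is count × unrolls × iterations. The function name cycles_per_instruction is hypothetical, and the author's exact normalisation may differ.

```python
from statistics import median

COUNT, UNROLLS, ITERATIONS = 8, 100, 100  # from the test description

# Assumed cycle (02) readings for the ten samples (inferred column split).
cycle_samples = [40167, 40101, 40146, 40121, 40190,
                 40096, 40148, 40137, 40116, 40133]


def cycles_per_instruction(samples, count, unrolls, iterations):
    """Median cycle count divided by the total number of measured instructions."""
    return median(samples) / (count * unrolls * iterations)


cpi = cycles_per_instruction(cycle_samples, COUNT, UNROLLS, ITERATIONS)
print(f"{cpi:.4f} cycles per store")
print(f"{1 / cpi:.2f} stores per cycle")
```

Under these assumptions the sketch reproduces the reported 0.5017 cycles per store, which corresponds to roughly two pre-index stores per cycle.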

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5017

retire (01)cycle (02)0305080b18191e1f2022293a3e3f4046494f51inst issue (52)~issue int (53)~issue ld/st (55)~dispatch int (56)~dispatch ld/st (58)huge thing int (59)huge thing ld/st (5a)606167696d6edispatch stall (70)scheduler rewind (75)scheduler stall (76)~dispatch op (78)~map op int (7c)~map op ld/st (7d)~map lookup int (7f)~map lookup ld/st (80)8283pipeline redirect (84)85inst all (8c)inst branch (8d)inst branch taken (90)inst b.cc (94)inst integer (97)inst fp/simd store (99)inst ldst (9b)9fa0a1a2a3a4a6a7a8a9aaabacafbcdcache store miss (c0)dtlb miss (c1)c2cdcfd0d2d5d6ddinst fetch restart (de)e0? int output thing (e9)ld/st retires (ed)gpr retires (ef)f5f6f7f8fd
80034401963003330010305582333112481226440128226067552377251600128001280000800108000040008318460761024010140138401323009433011816001020800002016000040143401211180021109101080000800001082504261456243610247880054154302501986474482509471828140050200051642401158000280000800104012440116401074013340159
800244013530033000102756123441146410268401442314597649242516001280012800008001080000400083184329210240118402934028930049330089160010208000020160000401234015911800211091010800008000010825041891924608249180046153802490754467682521531067140050200031752401288000280000800104011340140401284014940129
80024401503002020010116812278114722776401022313412451242516001280012800008001080000400083184312410240076401164011030057330099160010208000020160000401124015611800211091010800008000010824991416252476124958003915140249674845948252247754140050200041644401298000280000800104009040135401304013640100
8002440148301101001007450234511512419640076230525139221251600128001280000800108000040008318424041024008340093401453004333006216001020800002016000040123401021180021109101080000800001082510161384247211247580033148402504254460082510521110140050200041742401988000280000800104016740142401024012240101
800244013230110000102124423161127242364007222903252023425160012800128000080010800004000831842141102400954011840137300273300801600102080000201600004012740112118002110910108000080000108250614865247010250280036152302492100045838253333987140050200041743401048000280000800104008940156401064015040134
800244014330022100101495823411100852044008823164054493125160012800128000080010800004000831840989102400794015340091300443300891600102080000201600004012340111118002110910108000080000108252015149824875250680049154022500254466082499351371142050200021724401468000280000800104014440127401134016040153
800244015130010000101495123391172842004010922986402341725160012800128000080010801194000831844204002402344010540074300303301291600102080000201600004009840118118002110910108000080000108251614459247612250080042149612504113846938252631814140050200041742401028000280000800104010640129401214010340110
8002440103301200001016158231011448672840126224856644436251600128001280000800108000040008318443241024011840118401113009633008316001020800002016000040127401181180021109101080000800001082510158832468152495800361531025017724668825103412611400502010321724401358000280000800104013740150401184014540105
80024401213011110099664923431149642244008222733478424625160012800128000080010800004000831842886110240133401394010930033330142160010208000020160000400854012711800211091010800008000010825071411262431112493800311556024882544526825023011131401502010621724401748000280000800104015640104400944009440105
80024404313001010010185542310114485292400862274394189182516001280012800008001080000400083184425201024011240141401283008133012716001020800002016000040121401281180021109101080000800001082508157202434102490800441548024962544679824994215551400502010941744401598000280000800104013840157401074013140131