Apple Microarchitecture Research by Dougall Johnson
Code:
ldpsw x0, x1, [x6, #8]!

mov x0, 1
mov x1, 2
mov x8, 0
(no loop instructions)
Retires: 3.000
Issues: 2.000
Integer unit issues: 1.001
Load/store unit issues: 1.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
3005 | 1367 | 2036 | 1020 | 1016 | 1028 | 1000 | 13094 | 14879 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1119 | 2001 | 1001 | 1000 | 1000 | 1000 | 13576 | 14860 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1131 | 2001 | 1001 | 1000 | 1000 | 1000 | 13551 | 14818 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1101 | 2001 | 1001 | 1000 | 1000 | 1000 | 13185 | 14475 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1099 | 2001 | 1001 | 1000 | 1000 | 1000 | 13580 | 14900 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1097 | 2001 | 1001 | 1000 | 1000 | 1000 | 13348 | 14594 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1098 | 2001 | 1001 | 1000 | 1000 | 1000 | 13406 | 14668 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1135 | 2001 | 1001 | 1000 | 1000 | 1000 | 13434 | 14994 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1098 | 2001 | 1001 | 1000 | 1000 | 1000 | 13695 | 14958 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1139 | 2001 | 1001 | 1000 | 1000 | 1000 | 13772 | 14939 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
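The summary figures above (Retires, Issues, and the per-unit issue counts) appear to be per-instruction averages of the counter columns. A minimal Python sketch of that normalization follows. It assumes the instruction ran 1000 times per sample, which is inferred from the `? ldst retires` column rather than stated on the page, and the helper name `per_instruction` is hypothetical. The values it yields are close to the reported figures but not always identical (about 3.004 retires against the reported 3.000), so the page presumably applies a correction that is not shown here.

```python
from statistics import median

# Counter samples copied from the ten rows of the table above.
retire_uop = [3005, 3004, 3004, 3004, 3004, 3004, 3004, 3004, 3004, 3004]
schedule_uop = [2036, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001]
schedule_int_uop = [1020, 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001]
schedule_ldst_uop = [1016, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]

# Assumption: 1000 executions of the instruction per sample
# (inferred from the "? ldst retires" column, which reads 1000).
EXECUTIONS = 1000


def per_instruction(samples, executions=EXECUTIONS):
    """Median counter value divided by the number of executions."""
    return median(samples) / executions


for name, samples in [
    ("retire uop", retire_uop),
    ("schedule uop", schedule_uop),
    ("schedule int uop", schedule_int_uop),
    ("schedule ldst uop", schedule_ldst_uop),
]:
    print(f"{name}: {per_instruction(samples):.3f}")
```

Taking the median rather than the mean keeps the first, noisier sample (for example 2036 scheduled uops) from skewing the estimate.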
Chain cycles: 3
Code:
ldpsw x0, x1, [x6, #8]!
eor x8, x8, x0
eor x8, x8, x0
add x6, x6, x8

mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 3 chain cycles): 4.0140
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60209 | 71246 | 50261 | 40256 | 10005 | 40348 | 10003 | 1850123 | 548908 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70094 | 50203 | 40203 | 10000 | 40206 | 10003 | 1849852 | 548855 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70092 | 50203 | 40203 | 10000 | 40206 | 10003 | 1849852 | 548855 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70092 | 50203 | 40203 | 10000 | 40206 | 10003 | 1849852 | 548855 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70092 | 50203 | 40203 | 10000 | 40206 | 10003 | 1849852 | 548855 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70092 | 50203 | 40203 | 10000 | 40206 | 10003 | 1849852 | 548855 | 50209 | 40212 | 20008 | 70291 | 20028 | 40114 | 10000 | 50100 |
60205 | 70259 | 50218 | 40216 | 10002 | 40239 | 10003 | 1851067 | 549253 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70092 | 50203 | 40203 | 10000 | 40206 | 10003 | 1849852 | 548855 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70092 | 50203 | 40203 | 10000 | 40206 | 10003 | 1849852 | 548855 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70094 | 50203 | 40203 | 10000 | 40206 | 10003 | 1852228 | 549645 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
Result (median cycles for code, minus 3 chain cycles): 4.0106
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60029 | 71232 | 50080 | 40075 | 10005 | 40166 | 10003 | 1850545 | 549833 | 50029 | 40032 | 20008 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70113 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850326 | 549746 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70104 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850380 | 549764 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70104 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850326 | 549746 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70104 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850326 | 549746 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70104 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850326 | 549746 | 50020 | 40020 | 20000 | 70111 | 20028 | 40024 | 10000 | 50010 |
60024 | 70131 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850326 | 549746 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70104 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850326 | 549746 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70104 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850326 | 549746 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70104 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850326 | 549746 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
Chain cycles: 3
Code:
ldpsw x0, x1, [x6, #8]!
eor x8, x8, x1
eor x8, x8, x1
add x6, x6, x8

mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 3 chain cycles): 4.0112
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60209 | 71571 | 50261 | 40256 | 10005 | 40348 | 10003 | 1850473 | 549059 | 50209 | 40212 | 20008 | 70221 | 20008 | 40104 | 10000 | 50100 |
60204 | 70117 | 50204 | 40204 | 10000 | 40206 | 10013 | 1851879 | 550901 | 50253 | 40252 | 20028 | 70221 | 20008 | 40103 | 10000 | 50100 |
60205 | 70189 | 50216 | 40214 | 10002 | 40240 | 10003 | 1850959 | 549228 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70106 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850230 | 548985 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60205 | 70197 | 50216 | 40214 | 10002 | 40240 | 10003 | 1851040 | 549255 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70134 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850824 | 549183 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70119 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850986 | 549237 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70117 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850797 | 549174 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70132 | 50203 | 40203 | 10000 | 40206 | 10003 | 1851013 | 549244 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60205 | 70240 | 50218 | 40216 | 10002 | 40240 | 10003 | 1851013 | 549246 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
Result (median cycles for code, minus 3 chain cycles): 4.0142
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60029 | 71226 | 50079 | 40074 | 10005 | 40166 | 10003 | 1850599 | 549851 | 50029 | 40032 | 20008 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70122 | 50025 | 40025 | 10000 | 40020 | 10000 | 1850650 | 549840 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60025 | 70186 | 50037 | 40035 | 10002 | 40059 | 10000 | 1850488 | 549793 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70111 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850515 | 549802 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70111 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850650 | 549840 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70111 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850650 | 549840 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70111 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850650 | 549840 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70111 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850650 | 549840 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70111 | 50024 | 40024 | 10000 | 40020 | 10013 | 1852601 | 550443 | 50073 | 40072 | 20028 | 70020 | 20000 | 40015 | 10000 | 50010 |
60024 | 70111 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850650 | 549840 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
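Each "Result" line in the latency tests is described as the median cycles for the code minus the chain cycles. The sketch below applies that arithmetic to the `cycle (02)` column of the table directly above. It assumes 10000 executions of the loop body per sample, inferred from the `? ldst retires` column, and `latency_estimate` is a hypothetical helper name. It gives roughly 4.011 cycles, which is near the reported values of about 4.01 but does not reproduce them digit for digit; the page's exact procedure is not shown.

```python
from statistics import median

# "cycle (02)" column of the table directly above.
cycles = [71226, 70122, 70186, 70111, 70111, 70111, 70111, 70111, 70111, 70111]

# Assumption: the loop body ran 10000 times per sample
# (the "? ldst retires" column reads 10000, one per ldpsw).
EXECUTIONS = 10000

# From the "Chain cycles: 3" line: the two eor and one add in the
# dependency chain are taken to contribute one cycle each.
CHAIN_CYCLES = 3


def latency_estimate(samples, executions=EXECUTIONS, chain=CHAIN_CYCLES):
    """Cycles per loop body, minus the cycles spent in the helper chain."""
    return median(samples) / executions - chain


print(f"estimated ldpsw latency: {latency_estimate(cycles):.4f} cycles")
```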
Count: 8
Code:
ldpsw x0, x1, [x6, #8]!
ldpsw x0, x1, [x7, #8]!
ldpsw x0, x1, [x8, #8]!
ldpsw x0, x1, [x9, #8]!
ldpsw x0, x1, [x10, #8]!
ldpsw x0, x1, [x11, #8]!
ldpsw x0, x1, [x12, #8]!
ldpsw x0, x1, [x13, #8]!

mov x7, x6
mov x8, x6
mov x9, x6
mov x10, x6
mov x11, x6
mov x12, x6
mov x13, x6
(fused SUBS/B.cc loop)
Result (median cycles for code divided by count): 0.7516
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
240209 | 61205 | 160352 | 80240 | 80112 | 80241 | 80009 | 240475 | 251298 | 160117 | 80208 | 160018 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60153 | 160111 | 80106 | 80005 | 80108 | 80008 | 240476 | 251278 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60144 | 160111 | 80106 | 80005 | 80108 | 80008 | 240475 | 251274 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60133 | 160111 | 80106 | 80005 | 80108 | 80008 | 240474 | 251261 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60131 | 160111 | 80106 | 80005 | 80108 | 80008 | 240487 | 251244 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60133 | 160111 | 80106 | 80005 | 80108 | 80008 | 240475 | 251312 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60130 | 160111 | 80106 | 80005 | 80108 | 80008 | 240472 | 251288 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60136 | 160111 | 80106 | 80005 | 80108 | 80008 | 240476 | 251296 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60134 | 160111 | 80106 | 80005 | 80108 | 80008 | 240476 | 251317 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60134 | 160111 | 80106 | 80005 | 80108 | 80008 | 240475 | 251268 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
Result (median cycles for code divided by count): 0.7515
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
240029 | 61368 | 160255 | 80149 | 80106 | 80151 | 80008 | 240203 | 251593 | 160026 | 80028 | 160016 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60091 | 160011 | 80011 | 80000 | 80010 | 80000 | 240168 | 251304 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60089 | 160011 | 80011 | 80000 | 80010 | 80000 | 240167 | 251274 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60088 | 160011 | 80011 | 80000 | 80010 | 80000 | 240167 | 251281 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60081 | 160011 | 80011 | 80000 | 80010 | 80000 | 240168 | 251280 | 160010 | 80020 | 160000 | 80054 | 160069 | 80032 | 80000 | 160010 |
240024 | 60081 | 160011 | 80011 | 80000 | 80010 | 80000 | 240167 | 251271 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60095 | 160011 | 80011 | 80000 | 80010 | 80000 | 240168 | 251275 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60088 | 160011 | 80011 | 80000 | 80010 | 80000 | 240167 | 251281 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240025 | 60262 | 160073 | 80043 | 80030 | 80045 | 80000 | 240167 | 251126 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60084 | 160011 | 80011 | 80000 | 80010 | 80036 | 240388 | 254162 | 160081 | 80055 | 160073 | 80020 | 160000 | 80001 | 80000 | 160010 |
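The throughput "Result" is described as the median cycles for the code divided by the count. As a rough check, the following sketch derives cycles per instruction from the `cycle (02)` column of the table directly above. It assumes that each `ldpsw` accounts for exactly one entry in the `? ldst retires` column, so that 80000 is the total number of instructions executed; that reading is an inference, not something the page states. The estimate comes out near 0.7511, close to but not identical with the reported 0.7515.

```python
from statistics import median

COUNT = 8  # from the "Count: 8" line of this test

# "cycle (02)" column of the table directly above.
cycles = [61368, 60091, 60089, 60088, 60081, 60081, 60095, 60088, 60262, 60084]

# "? ldst retires" column, identical in every row.
# Assumption: one load/store retire per ldpsw executed.
ldst_retires = 80000

loop_bodies = ldst_retires // COUNT  # executions of the 8-instruction body
cycles_per_instruction = median(cycles) / ldst_retires
instructions_per_cycle = 1 / cycles_per_instruction

print(f"loop bodies executed:   {loop_bodies}")
print(f"cycles per ldpsw:       {cycles_per_instruction:.4f}")
print(f"ldpsw per cycle:        {instructions_per_cycle:.3f}")
```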