Apple Microarchitecture Research by Dougall Johnson
Code:
ldp w0, w1, [x6], #8
mov x0, 1
mov x1, 2
mov x8, 0
(no loop instructions)
Retires: 3.000
Issues: 2.000
Integer unit issues: 1.001
Load/store unit issues: 1.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
3005 | 1298 | 2038 | 1023 | 1015 | 1028 | 1000 | 13226 | 15065 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1113 | 2001 | 1001 | 1000 | 1000 | 1000 | 13277 | 14749 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1108 | 2001 | 1001 | 1000 | 1000 | 1000 | 13806 | 15667 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1086 | 2001 | 1001 | 1000 | 1000 | 1000 | 13529 | 15036 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1075 | 2001 | 1001 | 1000 | 1000 | 1000 | 13547 | 15098 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1076 | 2001 | 1001 | 1000 | 1000 | 1000 | 13467 | 14608 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1093 | 2001 | 1001 | 1000 | 1000 | 1000 | 13374 | 15011 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1136 | 2001 | 1001 | 1000 | 1000 | 1000 | 13642 | 14973 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1120 | 2001 | 1001 | 1000 | 1000 | 1000 | 13389 | 14657 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1080 | 2001 | 1001 | 1000 | 1000 | 1000 | 13440 | 14997 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
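The per-instruction figures in the summary can be approximated from the ten samples above by taking each counter's median and dividing by the number of unrolled copies. The sketch below assumes 1000 copies per sample. That number is inferred from the magnitude of the counts and is not stated on the page. With it, the integer and load/store figures come out as 1.001 and 1.000, matching the summary. The retire and total-issue figures come out marginally higher than the summary's 3.000 and 2.000, so the page presumably applies a correction that this sketch does not attempt.

```python
from statistics import median

# Assumption: 1000 unrolled copies of the instruction per sample
# (inferred from the counter magnitudes, not stated in the source).
UNROLLS = 1000

# Counter columns copied from the ten samples above.
samples = {
    "retire uop (01)": [3005] + [3004] * 9,
    "schedule uop (52)": [2038] + [2001] * 9,
    "schedule int uop (53)": [1023] + [1001] * 9,
    "schedule ldst uop (55)": [1015] + [1000] * 9,
}

# Median per counter, normalised to a single instruction.
per_instruction = {
    name: median(values) / UNROLLS for name, values in samples.items()
}

for name, value in per_instruction.items():
    print(f"{name}: {value:.3f}")
```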
Chain cycles: 3
Code:
ldp w0, w1, [x6], #8
eor x8, x8, x0
eor x8, x8, x0
add x6, x6, x8
mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 3 chain cycles): 4.0108
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60209 | 71439 | 50262 | 40257 | 10005 | 40348 | 10003 | 1850933 | 549182 | 50209 | 40212 | 20008 | 70221 | 20008 | 40104 | 10000 | 50100 |
60204 | 70145 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850447 | 549027 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70132 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850635 | 549120 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70118 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850635 | 549120 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70118 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850554 | 549093 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70110 | 50203 | 40203 | 10000 | 40206 | 10013 | 1852446 | 549673 | 50253 | 40252 | 20028 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70140 | 50203 | 40203 | 10000 | 40206 | 10003 | 1851634 | 549453 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70136 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850851 | 549192 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70124 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850959 | 549228 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70168 | 50203 | 40203 | 10000 | 40206 | 10013 | 1852097 | 549557 | 50253 | 40252 | 20028 | 70293 | 20028 | 40116 | 10000 | 50100 |
Result (median cycles for code, minus 3 chain cycles): 4.0180
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60029 | 71247 | 50080 | 40075 | 10005 | 40166 | 10003 | 1849952 | 549608 | 50029 | 40032 | 20008 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70094 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849948 | 549616 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60025 | 70177 | 50036 | 40034 | 10002 | 40060 | 10000 | 1849948 | 549616 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70090 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849948 | 549616 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70090 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849948 | 549616 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70090 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849948 | 549616 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70090 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849948 | 549616 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70095 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849948 | 549616 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70090 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849948 | 549616 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70090 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849948 | 549616 | 50020 | 40020 | 20000 | 70111 | 20028 | 40024 | 10000 | 50010 |
Chain cycles: 3
Code:
ldp w0, w1, [x6], #8
eor x8, x8, x1
eor x8, x8, x1
add x6, x6, x8
mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 3 chain cycles): 4.0160
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60209 | 71227 | 50259 | 40254 | 10005 | 40348 | 10003 | 1851447 | 549297 | 50209 | 40212 | 20008 | 70221 | 20008 | 40106 | 10000 | 50100 |
60204 | 70159 | 50206 | 40206 | 10000 | 40206 | 10003 | 1851688 | 549431 | 50209 | 40212 | 20008 | 70221 | 20008 | 40107 | 10000 | 50100 |
60204 | 70160 | 50207 | 40207 | 10000 | 40206 | 10003 | 1851688 | 549431 | 50209 | 40212 | 20008 | 70221 | 20008 | 40107 | 10000 | 50100 |
60204 | 70160 | 50207 | 40207 | 10000 | 40206 | 10003 | 1851742 | 549449 | 50209 | 40212 | 20008 | 70221 | 20008 | 40107 | 10000 | 50100 |
60204 | 70160 | 50207 | 40207 | 10000 | 40206 | 10003 | 1851688 | 549431 | 50209 | 40212 | 20008 | 70291 | 20028 | 40119 | 10000 | 50100 |
60204 | 70208 | 50207 | 40207 | 10000 | 40206 | 10003 | 1852255 | 549620 | 50209 | 40212 | 20008 | 70221 | 20008 | 40107 | 10000 | 50100 |
60204 | 70198 | 50207 | 40207 | 10000 | 40206 | 10003 | 1853416 | 550005 | 50209 | 40212 | 20008 | 70221 | 20008 | 40107 | 10000 | 50100 |
60204 | 70160 | 50207 | 40207 | 10000 | 40206 | 10003 | 1851472 | 549359 | 50209 | 40212 | 20008 | 70221 | 20008 | 40108 | 10000 | 50100 |
60204 | 70206 | 50207 | 40207 | 10000 | 40206 | 10003 | 1852417 | 549674 | 50209 | 40212 | 20008 | 70221 | 20008 | 40107 | 10000 | 50100 |
60204 | 70149 | 50207 | 40207 | 10000 | 40206 | 10003 | 1851796 | 549467 | 50209 | 40212 | 20008 | 70221 | 20008 | 40107 | 10000 | 50100 |
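The latency figure has the form "median cycles per copy of the code, minus the chain cycles". In the code for this test, the two `eor` instructions cancel each other (x8 ^ r ^ r = x8), so x8 keeps its value while the final `add` makes the next load address depend on the loaded register. Those three single-cycle instructions are the 3 chain cycles that get subtracted. The sketch below applies the formula to the cycle column of the ten samples directly above. It assumes 10,000 copies of the code per sample, which is inferred from the counter magnitudes rather than stated. For these ten samples it gives the reported 4.0160. Results reported elsewhere on the page do not always equal the median of the ten rows shown, so treat this as an illustration of the formula only.

```python
from statistics import median

CHAIN_CYCLES = 3      # "Chain cycles: 3" as given on the page
EXECUTIONS = 10_000   # assumption: copies of the code per sample (inferred)

# "cycle (02)" column copied from the ten samples directly above.
cycles = [71227, 70159, 70160, 70160, 70160,
          70208, 70198, 70160, 70206, 70149]


def load_latency(cycle_samples, executions=EXECUTIONS, chain=CHAIN_CYCLES):
    """Median cycles per copy of the code, minus the dependency-chain cycles."""
    return median(cycle_samples) / executions - chain


print(f"{load_latency(cycles):.4f}")  # prints 4.0160
```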
Result (median cycles for code, minus 3 chain cycles): 4.0133
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60029 | 71306 | 50080 | 40075 | 10005 | 40166 | 10003 | 1850896 | 549950 | 50029 | 40032 | 20008 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70119 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850677 | 549856 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70117 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850677 | 549856 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70117 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850677 | 549856 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70117 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850677 | 549856 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70117 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850677 | 549856 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70117 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850677 | 549856 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70117 | 50024 | 40024 | 10000 | 40020 | 10013 | 1852431 | 550391 | 50073 | 40072 | 20028 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70117 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850677 | 549856 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70117 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850677 | 549856 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
Count: 8
Code:
ldp w0, w1, [x6], #8
ldp w0, w1, [x7], #8
ldp w0, w1, [x8], #8
ldp w0, w1, [x9], #8
ldp w0, w1, [x10], #8
ldp w0, w1, [x11], #8
ldp w0, w1, [x12], #8
ldp w0, w1, [x13], #8
mov x7, x6
mov x8, x6
mov x9, x6
mov x10, x6
mov x11, x6
mov x12, x6
mov x13, x6
(fused SUBS/B.cc loop)
Result (median cycles for code divided by count): 0.7517
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
240209 | 61200 | 160330 | 80240 | 80090 | 80241 | 80008 | 240475 | 251434 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60135 | 160111 | 80106 | 80005 | 80108 | 80008 | 240474 | 251386 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60137 | 160111 | 80106 | 80005 | 80108 | 80008 | 240475 | 251480 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60138 | 160111 | 80106 | 80005 | 80108 | 80008 | 240477 | 251422 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60135 | 160111 | 80106 | 80005 | 80108 | 80008 | 240475 | 251398 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60137 | 160111 | 80106 | 80005 | 80108 | 80008 | 240486 | 251388 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60137 | 160111 | 80106 | 80005 | 80108 | 80008 | 240472 | 251381 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60138 | 160111 | 80106 | 80005 | 80108 | 80008 | 240475 | 251367 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60136 | 160111 | 80106 | 80005 | 80108 | 80008 | 240476 | 251388 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60133 | 160111 | 80106 | 80005 | 80108 | 80008 | 240475 | 251362 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
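The throughput figure is "median cycles for code divided by count", where the count is the 8 independent `ldp` instructions in the block. The sketch below applies that to the cycle column of the ten samples directly above. It again assumes 10,000 copies of the 8-instruction block per sample, which is inferred and not stated on the page. For these samples it gives the reported 0.7517 cycles per instruction, or about 1.33 instructions per cycle.

```python
from statistics import median

COUNT = 8             # "Count: 8" as given on the page
EXECUTIONS = 10_000   # assumption: copies of the 8-instruction block per sample

# "cycle (02)" column copied from the ten samples directly above.
cycles = [61200, 60135, 60137, 60138, 60135,
          60137, 60137, 60138, 60136, 60133]


def cycles_per_instruction(cycle_samples, executions=EXECUTIONS, count=COUNT):
    """Median cycles per block, divided by the instructions in the block."""
    return median(cycle_samples) / executions / count


cpi = cycles_per_instruction(cycles)
print(f"{cpi:.4f} cycles per instruction")      # prints 0.7517 cycles per instruction
print(f"{1 / cpi:.2f} instructions per cycle")  # prints 1.33 instructions per cycle
```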
Result (median cycles for code divided by count): 0.7509
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
240029 | 61228 | 160249 | 80149 | 80100 | 80151 | 80008 | 240205 | 251392 | 160026 | 80028 | 160016 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60080 | 160011 | 80011 | 80000 | 80010 | 80000 | 240116 | 251216 | 160010 | 80020 | 160000 | 80056 | 160072 | 80034 | 80000 | 160010 |
240024 | 60081 | 160011 | 80011 | 80000 | 80010 | 80000 | 240116 | 251221 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60082 | 160011 | 80011 | 80000 | 80010 | 80009 | 240409 | 251992 | 160027 | 80028 | 160018 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60103 | 160011 | 80011 | 80000 | 80010 | 80000 | 240307 | 251549 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60103 | 160011 | 80011 | 80000 | 80010 | 80000 | 240310 | 251549 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60108 | 160011 | 80011 | 80000 | 80010 | 80000 | 240307 | 251444 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60103 | 160011 | 80011 | 80000 | 80010 | 80000 | 240305 | 251551 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60108 | 160011 | 80011 | 80000 | 80010 | 80000 | 240311 | 251549 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60104 | 160011 | 80011 | 80000 | 80010 | 80000 | 240306 | 251548 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |