Apple Microarchitecture Research by Dougall Johnson
Code:
swplh w0, w1, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 2.000
Issues: 2.000
Integer unit issues: 0.001
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
72005 | 34563 | 2005 | 1 | 2004 | 2000 | 11765 | 2000 | 2000 | 0 | 4000 | 1 | 2000 | 0 | 0 |
72004 | 34339 | 2001 | 1 | 2000 | 2000 | 11766 | 2000 | 2000 | 297 | 4246 | 153 | 2118 | 1 | 217 |
72004 | 34185 | 2001 | 1 | 2000 | 2000 | 11762 | 2000 | 2000 | 0 | 4004 | 1 | 2000 | 0 | 0 |
72004 | 34188 | 2001 | 1 | 2000 | 2000 | 11762 | 2000 | 2000 | 0 | 4000 | 1 | 2000 | 0 | 0 |
72004 | 34411 | 2001 | 1 | 2000 | 2002 | 11789 | 2002 | 2002 | 0 | 4000 | 1 | 2000 | 0 | 0 |
72004 | 34360 | 2001 | 1 | 2000 | 2000 | 11760 | 2000 | 2000 | 0 | 4000 | 1 | 2000 | 0 | 0 |
72004 | 34523 | 2001 | 1 | 2000 | 2000 | 11760 | 2000 | 2000 | 0 | 4000 | 1 | 2000 | 0 | 0 |
72004 | 34293 | 2001 | 1 | 2000 | 2000 | 11760 | 2000 | 2000 | 0 | 4000 | 1 | 2000 | 0 | 0 |
72004 | 34471 | 2001 | 1 | 2000 | 2000 | 11760 | 2000 | 2000 | 0 | 4000 | 1 | 2000 | 0 | 0 |
72004 | 34204 | 2001 | 1 | 2000 | 2000 | 11760 | 2000 | 2000 | 0 | 4000 | 1 | 2000 | 0 | 0 |
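The summary figures above can be related to the raw counter rows with a little arithmetic. The following is a minimal sketch, not the author's tooling. It assumes each row covers 1000 copies of the code sequence (an inference from the counter totals, not stated in this section), and the names `UNROLLS`, `NOPS_PER_COPY`, and `per_copy` are hypothetical.

```python
from statistics import median

# "retire uop (01)" column, one value per row of the table above.
retire_uop = [72005, 72004, 72004, 72004, 72004,
              72004, 72004, 72004, 72004, 72004]
# "dispatch ldst uop (58)" column from the same rows.
dispatch_ldst = [2000, 2000, 2000, 2000, 2002,
                 2000, 2000, 2000, 2000, 2000]

UNROLLS = 1000       # assumption: copies of the code sequence per row
NOPS_PER_COPY = 70   # from "Retires (minus 70 nops)"

def per_copy(samples, unrolls=UNROLLS):
    """Median counter value divided by the assumed number of copies."""
    return median(samples) / unrolls

retires_minus_nops = per_copy(retire_uop) - NOPS_PER_COPY
ldst_issues = per_copy(dispatch_ldst)

print(f"retires minus nops: {retires_minus_nops:.3f}")
print(f"load/store issues:  {ldst_issues:.3f}")
```

Under these assumptions the retire figure comes out a few thousandths above the reported 2.000. The difference is presumably fixed harness overhead that the original tool accounts for, but that is a guess.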
Code:
swplh w0, w1, [x6]
add x6, x6, 2
(fused SUBS/B.cc loop)
Result (median cycles for code): 6.0064
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
30206 | 60619 | 30181 | 10128 | 20053 | 10129 | 20004 | 32888 | 132716 | 30106 | 10202 | 20004 | 10202 | 40008 | 10001 | 20000 | 0 | 10100 |
30204 | 60064 | 30101 | 10101 | 20000 | 10101 | 20004 | 32894 | 132724 | 30106 | 10202 | 20004 | 10222 | 40090 | 10022 | 20000 | 0 | 10100 |
30204 | 60064 | 30103 | 10101 | 20002 | 10102 | 20002 | 32895 | 132757 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 0 | 10100 |
30204 | 60064 | 30101 | 10101 | 20000 | 10101 | 20002 | 32893 | 132706 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 0 | 10100 |
30204 | 60064 | 30101 | 10101 | 20000 | 10101 | 20002 | 32893 | 132727 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 0 | 10100 |
30204 | 60064 | 30101 | 10101 | 20000 | 10101 | 20002 | 32893 | 132749 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 0 | 10100 |
30204 | 60064 | 30101 | 10101 | 20000 | 10101 | 20002 | 32893 | 132744 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 0 | 10100 |
30204 | 60064 | 30101 | 10101 | 20000 | 10101 | 20002 | 32891 | 132685 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 0 | 10100 |
30204 | 60064 | 30101 | 10101 | 20000 | 10101 | 20002 | 32893 | 132751 | 30103 | 10201 | 20003 | 10223 | 40089 | 10023 | 20000 | 0 | 10100 |
30204 | 60064 | 30101 | 10101 | 20000 | 10101 | 20002 | 32893 | 132740 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 0 | 10100 |
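The reported 6.0064 cycles lines up with the cycle column of the table above. A short sketch of that arithmetic follows. It assumes each row covers 10000 executions of the code sequence (inferred from the counter totals, not stated here), and `EXECUTIONS` is a hypothetical name. This is not necessarily how the original tool derives its result.

```python
from statistics import median

# "cycle (02)" column, one value per row of the table above.
cycles = [60619, 60064, 60064, 60064, 60064,
          60064, 60064, 60064, 60064, 60064]
# "retire uop (01)" column from the same rows.
retire_uop = [30206, 30204, 30204, 30204, 30204,
              30204, 30204, 30204, 30204, 30204]

EXECUTIONS = 10000  # assumption: executions of the code sequence per row

cycles_per_execution = median(cycles) / EXECUTIONS
uops_per_execution = median(retire_uop) / EXECUTIONS

print(cycles_per_execution)          # cycles per swplh + add pair
print(round(uops_per_execution, 2))  # roughly three retired uops per pair
```

The roughly three retired uops per execution agree with the per-class retire columns: 20000 load/store and about 10000 integer retires per row.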
Result (median cycles for code): 6.0057
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30026 | 60260 | 30071 | 10035 | 20036 | 10035 | 20002 | 32615 | 132958 | 30013 | 10021 | 20003 | 10041 | 40086 | 10021 | 20000 | 10010 |
30024 | 60061 | 30011 | 10011 | 20000 | 10010 | 20000 | 32583 | 132863 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60061 | 30011 | 10011 | 20000 | 10010 | 20000 | 32581 | 132812 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60058 | 30011 | 10011 | 20000 | 10010 | 20000 | 32614 | 132996 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60054 | 30011 | 10011 | 20000 | 10010 | 20000 | 32581 | 132818 | 30010 | 10020 | 20000 | 10041 | 40089 | 10022 | 20000 | 10010 |
30024 | 60054 | 30011 | 10011 | 20000 | 10010 | 20000 | 32583 | 132893 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60054 | 30011 | 10011 | 20000 | 10010 | 20000 | 32612 | 132952 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60054 | 30011 | 10011 | 20000 | 10010 | 20000 | 32583 | 132859 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60054 | 30011 | 10011 | 20000 | 10010 | 20000 | 32583 | 132848 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60054 | 30011 | 10011 | 20000 | 10010 | 20002 | 32648 | 133445 | 30013 | 10021 | 20003 | 10022 | 40008 | 10001 | 20000 | 10010 |
Code:
swplh w0, w1, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 9.8428
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
20205 | 100247 | 21105 | 109 | 20996 | 108 | 20567 | 379 | 1759585 | 20667 | 208 | 21842 | 206 | 43508 | 1 | 20000 | 100 |
20204 | 99552 | 20951 | 106 | 20845 | 105 | 20089 | 398 | 1732332 | 20189 | 200 | 20278 | 200 | 40232 | 1 | 20000 | 100 |
20204 | 99313 | 20807 | 101 | 20706 | 100 | 20841 | 395 | 1760597 | 20941 | 200 | 22730 | 200 | 43296 | 1 | 20000 | 100 |
20204 | 99366 | 20821 | 101 | 20720 | 100 | 20183 | 328 | 1747920 | 20283 | 200 | 20620 | 200 | 43312 | 1 | 20000 | 100 |
20204 | 98428 | 20496 | 101 | 20395 | 100 | 20113 | 332 | 1763266 | 20213 | 200 | 20378 | 200 | 45904 | 1 | 20000 | 100 |
20204 | 98144 | 20189 | 101 | 20088 | 100 | 20637 | 563 | 1727853 | 20753 | 290 | 22024 | 262 | 47440 | 1 | 20000 | 100 |
20204 | 99391 | 20941 | 118 | 20823 | 117 | 20236 | 512 | 1728515 | 20348 | 258 | 20736 | 200 | 43856 | 1 | 20000 | 100 |
20204 | 98318 | 20504 | 101 | 20403 | 100 | 20319 | 396 | 1729837 | 20419 | 200 | 21044 | 200 | 41384 | 1 | 20000 | 100 |
20204 | 98116 | 20168 | 101 | 20067 | 100 | 20407 | 500 | 1752948 | 20507 | 200 | 21286 | 292 | 42588 | 1 | 20000 | 100 |
20204 | 96593 | 20382 | 101 | 20281 | 100 | 20194 | 500 | 1726916 | 20294 | 200 | 20596 | 204 | 40276 | 1 | 20000 | 100 |
Result (median cycles for code): 10.0391
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
20025 | 100719 | 21531 | 11 | 21520 | 10 | 21564 | 52 | 1774675 | 21575 | 22 | 24772 | 24 | 51868 | 1 | 20000 | 0 | 10 |
20025 | 100705 | 21624 | 12 | 21612 | 11 | 21482 | 49 | 1781598 | 21492 | 20 | 24622 | 26 | 47476 | 1 | 20000 | 0 | 10 |
20024 | 100025 | 21278 | 13 | 21265 | 12 | 21756 | 57 | 1781900 | 21768 | 26 | 25536 | 24 | 48084 | 1 | 20000 | 0 | 10 |
20024 | 100574 | 21520 | 12 | 21508 | 11 | 21896 | 50 | 1780449 | 21907 | 22 | 25804 | 20 | 50744 | 1 | 20000 | 0 | 10 |
20024 | 100441 | 21405 | 15 | 21390 | 14 | 21378 | 56 | 1776676 | 21391 | 26 | 24138 | 26 | 48372 | 1 | 20000 | 0 | 10 |
20024 | 100491 | 21249 | 15 | 21234 | 14 | 21786 | 54 | 1783185 | 21798 | 24 | 25530 | 28 | 51112 | 1 | 20000 | 0 | 10 |
20024 | 100394 | 21329 | 17 | 21312 | 16 | 21939 | 63 | 1778291 | 21953 | 28 | 25694 | 20 | 49664 | 1 | 20000 | 0 | 10 |
20024 | 100421 | 21453 | 13 | 21440 | 12 | 21645 | 52 | 1780252 | 21656 | 22 | 25030 | 26 | 48768 | 1 | 20000 | 0 | 10 |
20024 | 100187 | 21126 | 13 | 21113 | 12 | 21670 | 48 | 1774721 | 21680 | 20 | 25192 | 22 | 47036 | 1 | 20000 | 0 | 10 |
20024 | 100402 | 21324 | 14 | 21310 | 13 | 21051 | 56 | 1775323 | 21061 | 34 | 23334 | 24 | 51108 | 1 | 20000 | 0 | 10 |