Apple Microarchitecture Research by Dougall Johnson
Code:
swph w0, w1, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 2.000
Issues: 2.000
Integer unit issues: 0.001
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) |
72005 | 34428 | 2005 | 1 | 2004 | 2000 | 11767 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34206 | 2001 | 1 | 2000 | 2000 | 11767 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34169 | 2001 | 1 | 2000 | 2000 | 11767 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34158 | 2001 | 1 | 2000 | 2000 | 11767 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34344 | 2001 | 1 | 2000 | 2000 | 11841 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34593 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34509 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 35043 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 35013 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34561 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
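The per-instance figures listed above the table (retires, issues, per-unit issues) can be approximated from the raw counter rows. The following is a minimal Python sketch, not the page's actual tooling. It assumes the test code ran 1000 times per sample, which is inferred from the counter magnitudes and not stated on the page. It also assumes "Issues" corresponds to the `schedule uop (52)` counter. The names `row`, `REPS`, `NOPS`, and `per_instance` are hypothetical.

```python
# Counter values copied from one steady-state row of the table above.
row = {
    "retire uop (01)": 72004,
    "schedule uop (52)": 2001,
    "schedule int uop (53)": 1,
    "schedule ldst uop (55)": 2000,
}

REPS = 1000  # ASSUMPTION: repetitions per sample, inferred from the counters
NOPS = 70    # padding nops per repetition, from the "minus 70 nops" label


def per_instance(count, reps=REPS):
    """Average counter value per execution of the test code."""
    return count / reps


# Subtract the nop padding before averaging the retire count.
retires = per_instance(row["retire uop (01)"] - NOPS * REPS)
issues = per_instance(row["schedule uop (52)"])          # ASSUMPTION: mapping
int_issues = per_instance(row["schedule int uop (53)"])
ldst_issues = per_instance(row["schedule ldst uop (55)"])

print(f"Retires (minus {NOPS} nops): {retires:.3f}")
print(f"Issues: {issues:.3f}")
print(f"Integer unit issues: {int_issues:.3f}")
print(f"Load/store unit issues: {ldst_issues:.3f}")
```

Under these assumptions the sketch lands within a few thousandths of the listed values. The small residual in the retire and issue counts is presumably measurement-harness overhead, which the original tooling may account for differently.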
Code:
swph w0, w1, [x6]
add x6, x6, 2
(fused SUBS/B.cc loop)
Result (median cycles for code): 3.0062
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30206 | 30405 | 30195 | 10137 | 0 | 20058 | 10137 | 0 | 20006 | 32902 | 125763 | 30109 | 10203 | 20007 | 10203 | 40013 | 10003 | 20000 | 10100 |
30204 | 30065 | 30105 | 10102 | 0 | 20003 | 10102 | 0 | 20005 | 32877 | 125719 | 30107 | 10202 | 20006 | 10202 | 40012 | 10002 | 20000 | 10100 |
30204 | 30058 | 30105 | 10102 | 0 | 20003 | 10102 | 0 | 20005 | 32877 | 125743 | 30107 | 10202 | 20006 | 10202 | 40012 | 10002 | 20000 | 10100 |
30204 | 30065 | 30105 | 10102 | 0 | 20003 | 10102 | 0 | 20005 | 32877 | 125657 | 30107 | 10202 | 20006 | 10202 | 40012 | 10002 | 20000 | 10100 |
30204 | 30665 | 30105 | 10102 | 0 | 20003 | 10102 | 0 | 20005 | 32903 | 125881 | 30107 | 10202 | 20006 | 10202 | 40012 | 10002 | 20000 | 10100 |
30204 | 30062 | 30105 | 10102 | 0 | 20003 | 10102 | 0 | 20005 | 32906 | 125838 | 30107 | 10202 | 20006 | 10202 | 40012 | 10002 | 20000 | 10100 |
30204 | 30062 | 30105 | 10102 | 0 | 20003 | 10102 | 0 | 20005 | 32906 | 125998 | 30107 | 10202 | 20006 | 10202 | 40012 | 10002 | 20000 | 10100 |
30204 | 30062 | 30105 | 10102 | 0 | 20003 | 10102 | 0 | 20005 | 32906 | 125952 | 30107 | 10202 | 20006 | 10202 | 40012 | 10002 | 20000 | 10100 |
30204 | 30062 | 30105 | 10102 | 0 | 20003 | 10102 | 0 | 20005 | 32906 | 125898 | 30107 | 10202 | 20006 | 10202 | 40012 | 10002 | 20000 | 10100 |
30204 | 30062 | 30105 | 10102 | 0 | 20003 | 10102 | 0 | 20005 | 32906 | 125942 | 30107 | 10202 | 20006 | 10202 | 40012 | 10002 | 20000 | 10100 |
Result (median cycles for code): 3.0058
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30026 | 30369 | 30098 | 10045 | 20053 | 10045 | 20005 | 32625 | 127284 | 30017 | 10022 | 20006 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30065 | 30011 | 10011 | 20000 | 10010 | 20000 | 32601 | 126991 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30058 | 30011 | 10011 | 20000 | 10010 | 20000 | 32601 | 126343 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30058 | 30011 | 10011 | 20000 | 10010 | 20000 | 32601 | 126771 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30058 | 30011 | 10011 | 20000 | 10010 | 20000 | 32594 | 126288 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30058 | 30011 | 10011 | 20000 | 10010 | 20000 | 32601 | 126573 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30058 | 30011 | 10011 | 20000 | 10010 | 20040 | 32735 | 127120 | 30070 | 10040 | 20040 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30058 | 30011 | 10011 | 20000 | 10010 | 20000 | 32601 | 126979 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30058 | 30011 | 10011 | 20000 | 10010 | 20000 | 32594 | 126448 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30058 | 30011 | 10011 | 20000 | 10010 | 20000 | 32601 | 126453 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
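The "Result (median cycles for code)" lines for this test can be cross-checked against the `cycle (02)` column of the two tables above. The sketch below is an illustration only. It assumes each sample covers 10000 executions of the code, inferred from the roughly 20000 load/store uops per sample at two per `swph`. The helper name `median_cycles_per_instance` is hypothetical.

```python
from statistics import median

# "cycle (02)" column of the two tables above, one value per sample.
cycles_first = [30405, 30065, 30058, 30065, 30665,
                30062, 30062, 30062, 30062, 30062]
cycles_second = [30369, 30065, 30058, 30058, 30058,
                 30058, 30058, 30058, 30058, 30058]

INSTANCES = 10_000  # ASSUMPTION: executions of the code per sample


def median_cycles_per_instance(samples, instances=INSTANCES):
    """Median cycle count across samples, divided by executions per sample."""
    return median(samples) / instances


first = median_cycles_per_instance(cycles_first)
second = median_cycles_per_instance(cycles_second)

print(f"{first:.4f}")
print(f"{second:.4f}")
```

For these two tables the calculation agrees with the listed results to four decimal places. The same calculation on the displayed rows of another table need not match its listed result exactly, since the listed median may be taken over samples that are not shown.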
Code:
swph w0, w1, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 10.4944
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
20206 | 102013 | 20226 | 101 | 20125 | 100 | 20072 | 499 | 1879656 | 20172 | 200 | 20198 | 200 | 40152 | 1 | 20000 | 100 |
20205 | 107651 | 20200 | 101 | 20099 | 100 | 20009 | 500 | 1948291 | 20109 | 200 | 20028 | 200 | 40040 | 1 | 20000 | 100 |
20204 | 100057 | 20109 | 101 | 20008 | 100 | 20009 | 500 | 1778625 | 20109 | 200 | 20020 | 200 | 40040 | 1 | 20000 | 100 |
20204 | 103796 | 20172 | 101 | 20071 | 100 | 20037 | 500 | 1866111 | 20137 | 200 | 20100 | 204 | 40156 | 1 | 20000 | 100 |
20204 | 104308 | 20101 | 101 | 20000 | 100 | 20032 | 404 | 1857090 | 20132 | 200 | 20092 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 105066 | 20235 | 101 | 20134 | 100 | 20019 | 400 | 1876779 | 20119 | 200 | 20044 | 200 | 40136 | 1 | 20000 | 100 |
20204 | 106494 | 20154 | 101 | 20053 | 100 | 20075 | 514 | 1967675 | 20177 | 202 | 20130 | 200 | 40248 | 1 | 20000 | 100 |
20204 | 105338 | 20116 | 101 | 20015 | 100 | 20209 | 500 | 1847557 | 20309 | 200 | 20552 | 200 | 40360 | 1 | 20000 | 100 |
20204 | 104001 | 20125 | 101 | 20024 | 100 | 20020 | 417 | 1839194 | 20120 | 200 | 20052 | 204 | 40196 | 1 | 20000 | 100 |
20204 | 102651 | 20122 | 101 | 20021 | 100 | 20044 | 503 | 1892650 | 20145 | 202 | 20096 | 200 | 40040 | 1 | 20000 | 100 |
Result (median cycles for code): 10.0701
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
20025 | 100179 | 20045 | 11 | 20034 | 10 | 20009 | 50 | 1778625 | 20019 | 20 | 20020 | 20 | 40032 | 1 | 20000 | 0 | 10 |
20024 | 100061 | 20019 | 11 | 20008 | 10 | 20013 | 49 | 1781990 | 20023 | 20 | 20026 | 20 | 40044 | 1 | 20000 | 0 | 10 |
20024 | 100057 | 20019 | 11 | 20008 | 10 | 20009 | 50 | 1778499 | 20019 | 20 | 20016 | 20 | 40032 | 1 | 20000 | 0 | 10 |
20024 | 100060 | 20019 | 11 | 20008 | 10 | 20013 | 49 | 1782008 | 20023 | 20 | 20026 | 20 | 40052 | 1 | 20000 | 0 | 10 |
20024 | 100410 | 20017 | 11 | 20006 | 10 | 20029 | 49 | 1784376 | 20039 | 20 | 20070 | 20 | 40032 | 1 | 20000 | 0 | 10 |
20024 | 100322 | 20019 | 11 | 20008 | 10 | 20091 | 50 | 1788953 | 20101 | 20 | 20152 | 20 | 40324 | 1 | 20000 | 0 | 10 |
20024 | 100702 | 20075 | 11 | 20064 | 10 | 20024 | 49 | 1790658 | 20034 | 20 | 20066 | 20 | 40188 | 1 | 20000 | 0 | 10 |
20024 | 100482 | 20031 | 11 | 20020 | 10 | 20053 | 49 | 1782360 | 20063 | 20 | 20134 | 20 | 40224 | 1 | 20000 | 0 | 10 |
20024 | 100069 | 20019 | 11 | 20008 | 10 | 20009 | 50 | 1778499 | 20019 | 20 | 20016 | 20 | 40032 | 1 | 20000 | 0 | 10 |
20024 | 100185 | 20031 | 11 | 20020 | 10 | 20012 | 50 | 1782706 | 20022 | 20 | 20032 | 20 | 40072 | 1 | 20000 | 0 | 10 |