Apple Microarchitecture Research by Dougall Johnson
Code:
swpb w0, w1, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 2.000
Issues: 2.000
Integer unit issues: 0.001
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) |
72006 | 34862 | 2019 | 1 | 2018 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34097 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34069 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34072 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34072 | 2001 | 1 | 2000 | 2002 | 11782 | 2002 | 2002 | 4000 | 1 | 2000 |
72004 | 34212 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34068 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34071 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34075 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34068 | 2001 | 1 | 2000 | 2000 | 11770 | 2000 | 2000 | 4000 | 1 | 2000 |
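The per-unit issue figures reported for this test appear consistent with dividing the median scheduler counts in the table above by the number of copies of the instruction. The sketch below illustrates that arithmetic; it is not necessarily how the page's figures were produced. The value 1000 for `UNROLLS` is an assumption (it is not stated in this section), and the variable names are hypothetical.

```python
from statistics import median

# Assumption: the measured block contains 1000 copies of the instruction.
UNROLLS = 1000

# Values copied from the "schedule int uop (53)" column of the ten runs above.
schedule_int = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# Values copied from the "schedule ldst uop (55)" column of the ten runs above.
schedule_ldst = [2018, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000]

# Take the median across runs to suppress outliers such as the first run.
int_per_instr = median(schedule_int) / UNROLLS
ldst_per_instr = median(schedule_ldst) / UNROLLS

print(f"Integer unit issues:    {int_per_instr:.3f}")
print(f"Load/store unit issues: {ldst_per_instr:.3f}")
```

Under that assumption, each swpb is scheduled as two load/store uops and essentially no integer uops.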
Code:
swpb w0, w1, [x6]
add x6, x6, 2
(fused SUBS/B.cc loop)
Result (median cycles for code): 3.0062
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
30206 | 30342 | 30195 | 10137 | 20058 | 10137 | 20006 | 32901 | 125769 | 30109 | 10203 | 20007 | 10203 | 40013 | 0 | 10003 | 20000 | 0 | 10100 |
30204 | 30065 | 30107 | 10103 | 20004 | 10103 | 20005 | 32870 | 125788 | 30107 | 10202 | 20006 | 10202 | 40012 | 0 | 10002 | 20000 | 0 | 10100 |
30204 | 30058 | 30105 | 10102 | 20003 | 10102 | 20005 | 32870 | 125708 | 30107 | 10202 | 20006 | 10202 | 40012 | 0 | 10002 | 20000 | 0 | 10100 |
30204 | 30058 | 30105 | 10102 | 20003 | 10102 | 20005 | 32895 | 125715 | 30107 | 10202 | 20006 | 10202 | 40012 | 0 | 10002 | 20000 | 0 | 10100 |
30204 | 30058 | 30105 | 10102 | 20003 | 10102 | 20005 | 32870 | 125744 | 30107 | 10202 | 20006 | 10202 | 40012 | 0 | 10002 | 20000 | 0 | 10100 |
30204 | 30058 | 30105 | 10102 | 20003 | 10102 | 20005 | 32895 | 125743 | 30107 | 10202 | 20006 | 10202 | 40012 | 0 | 10002 | 20000 | 0 | 10100 |
30204 | 30065 | 30105 | 10102 | 20003 | 10102 | 20005 | 32870 | 125754 | 30107 | 10202 | 20006 | 10202 | 40012 | 0 | 10002 | 20000 | 0 | 10100 |
30204 | 30065 | 30105 | 10102 | 20003 | 10102 | 20005 | 32870 | 125668 | 30107 | 10202 | 20006 | 10202 | 40012 | 0 | 10002 | 20000 | 0 | 10100 |
30204 | 30065 | 30105 | 10102 | 20003 | 10102 | 20005 | 32870 | 125686 | 30107 | 10202 | 20006 | 10202 | 40012 | 0 | 10002 | 20000 | 0 | 10100 |
30204 | 30058 | 30105 | 10102 | 20003 | 10102 | 20005 | 32870 | 125762 | 30107 | 10202 | 20006 | 10202 | 40012 | 0 | 10002 | 20000 | 0 | 10100 |
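As a rough cross-check, the median of the "cycle (02)" column in the table above can be turned into a cycles-per-execution figure. This is a sketch only: the divisor of 10000 executions is an assumption inferred from the magnitude of the counters, not something stated in this section, and the reported result may have been derived from a different set of runs than the ten shown.

```python
from statistics import median

# Values copied from the "cycle (02)" column of the ten runs above.
cycles = [30342, 30065, 30058, 30058, 30058,
          30058, 30065, 30065, 30065, 30058]

# Assumption: the counters cover 10000 executions of the code in total.
EXECUTIONS = 10_000

# With an even number of runs, median() averages the two middle values.
median_cycles = median(cycles)
cycles_per_execution = median_cycles / EXECUTIONS

print(f"median cycles: {median_cycles}")
print(f"cycles per execution: {cycles_per_execution:.5f}")
```

For this table the value comes out close to the reported result of about 3 cycles per swpb/add pair.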
Result (median cycles for code): 3.0065
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30027 | 30498 | 30138 | 10062 | 20076 | 10062 | 20005 | 32625 | 126456 | 30017 | 10022 | 20006 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30065 | 30011 | 10011 | 20000 | 10010 | 20000 | 32628 | 127200 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30065 | 30011 | 10011 | 20000 | 10010 | 20000 | 32628 | 126560 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30065 | 30011 | 10011 | 20000 | 10010 | 20000 | 32619 | 126941 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30065 | 30011 | 10011 | 20000 | 10010 | 20000 | 32628 | 126360 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30065 | 30011 | 10011 | 20000 | 10010 | 20000 | 32628 | 126695 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30065 | 30011 | 10011 | 20000 | 10010 | 20000 | 32628 | 126374 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30065 | 30011 | 10011 | 20000 | 10010 | 20000 | 32628 | 126596 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30065 | 30011 | 10011 | 20000 | 10010 | 20000 | 32628 | 126650 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 30065 | 30011 | 10011 | 20000 | 10010 | 20000 | 32628 | 127148 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
Code:
swpb w0, w1, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 10.5995
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
20206 | 102617 | 20178 | 101 | 20077 | 100 | 20000 | 500 | 1803287 | 20100 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 101792 | 20112 | 101 | 20011 | 100 | 20011 | 500 | 1809866 | 20111 | 200 | 20022 | 202 | 40316 | 1 | 20000 | 100 |
20204 | 105995 | 20130 | 101 | 20029 | 100 | 20016 | 350 | 1867080 | 20116 | 200 | 20034 | 200 | 40560 | 1 | 20000 | 100 |
20204 | 109623 | 20120 | 101 | 20019 | 100 | 20024 | 500 | 1829620 | 20124 | 200 | 20060 | 200 | 40496 | 1 | 20000 | 100 |
20204 | 105581 | 20122 | 101 | 20021 | 100 | 20047 | 364 | 1857617 | 20149 | 204 | 20114 | 200 | 40712 | 1 | 20000 | 100 |
20205 | 107197 | 20204 | 103 | 20101 | 102 | 20008 | 447 | 1913497 | 20108 | 200 | 20028 | 200 | 40096 | 1 | 20000 | 100 |
20204 | 109823 | 20120 | 101 | 20019 | 100 | 20041 | 434 | 1946942 | 20141 | 200 | 20106 | 202 | 40216 | 1 | 20000 | 100 |
20204 | 110296 | 20104 | 101 | 20003 | 100 | 20000 | 500 | 1985314 | 20100 | 200 | 20004 | 200 | 40152 | 1 | 20000 | 100 |
20204 | 106286 | 20158 | 101 | 20057 | 100 | 20016 | 500 | 1932411 | 20116 | 200 | 20048 | 200 | 40304 | 1 | 20000 | 100 |
20204 | 106307 | 20144 | 101 | 20043 | 100 | 20015 | 500 | 1915486 | 20115 | 200 | 20040 | 200 | 40144 | 1 | 20000 | 100 |
Result (median cycles for code): 10.0672
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
20025 | 100157 | 20046 | 11 | 20035 | 10 | 20009 | 50 | 1778625 | 20019 | 20 | 20016 | 20 | 40052 | 1 | 20000 | 0 | 10 |
20024 | 100251 | 20017 | 11 | 20006 | 10 | 20012 | 50 | 1782994 | 20022 | 20 | 20032 | 20 | 40032 | 1 | 20000 | 0 | 10 |
20024 | 100322 | 20019 | 11 | 20008 | 10 | 20025 | 50 | 1782653 | 20035 | 20 | 20056 | 20 | 40228 | 1 | 20000 | 0 | 10 |
20024 | 100479 | 20070 | 11 | 20059 | 10 | 20054 | 49 | 1785044 | 20064 | 20 | 20142 | 20 | 40124 | 1 | 20000 | 0 | 10 |
20024 | 100140 | 20045 | 11 | 20034 | 10 | 20009 | 50 | 1783395 | 20019 | 20 | 20016 | 20 | 40248 | 1 | 20000 | 0 | 10 |
20024 | 100693 | 20034 | 11 | 20023 | 10 | 20049 | 49 | 1784261 | 20059 | 20 | 20122 | 20 | 40204 | 1 | 20000 | 0 | 10 |
20024 | 100673 | 20031 | 11 | 20020 | 10 | 20043 | 49 | 1789553 | 20053 | 20 | 20106 | 20 | 40296 | 1 | 20000 | 0 | 10 |
20024 | 100419 | 20099 | 11 | 20088 | 10 | 20063 | 49 | 1785864 | 20073 | 20 | 20166 | 20 | 40180 | 1 | 20000 | 0 | 10 |
20024 | 100582 | 20045 | 11 | 20034 | 10 | 20109 | 49 | 1797192 | 20119 | 20 | 20218 | 20 | 40260 | 1 | 20000 | 0 | 10 |
20024 | 100885 | 20048 | 11 | 20037 | 10 | 20049 | 50 | 1792567 | 20059 | 20 | 20124 | 20 | 40188 | 1 | 20000 | 0 | 10 |