Apple Microarchitecture Research by Dougall Johnson
Code:
swpal w0, w1, [x6]
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 2.000
Issues: 2.000
Integer unit issues: 0.001
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) |
72005 | 34377 | 2005 | 1 | 2004 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34131 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34173 | 2001 | 1 | 2000 | 2000 | 11794 | 2000 | 2000 | 4012 | 1 | 2000 |
72004 | 34210 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34112 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34112 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34103 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34082 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34164 | 2001 | 1 | 2000 | 2002 | 11799 | 2002 | 2002 | 4008 | 1 | 2000 |
72004 | 34627 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
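The summary lines above can be related to the raw counter rows with a little arithmetic. The following is a minimal Python sketch, assuming the code was unrolled 1000 times per run (inferred from the counter magnitudes, not stated in the listing). The names `UNROLLS`, `NOPS`, and `per_instance` are illustrative only and are not part of the original tooling.

```python
from statistics import median

UNROLLS = 1000  # assumption: inferred from the counter magnitudes, not stated above
NOPS = 70       # nops padding each copy of the code, per the "Retires" line

# Columns copied from the ten runs above.
retire_uop = [72005] + [72004] * 9          # retire uop (01)
schedule_ldst_uop = [2004] + [2000] * 9     # schedule ldst uop (55)
schedule_int_uop = [1] * 10                 # schedule int uop (53)


def per_instance(counts, unrolls=UNROLLS):
    """Median counter value divided by the number of unrolled copies."""
    return median(counts) / unrolls


retires = per_instance(retire_uop) - NOPS
ldst_issues = per_instance(schedule_ldst_uop)
int_issues = per_instance(schedule_int_uop)

print(f"retires minus nops: {retires:.3f}")
print(f"load/store issues:  {ldst_issues:.3f}")
print(f"integer issues:     {int_issues:.3f}")
```

Under these assumptions the load/store and integer figures match the summary lines, while the raw retire count gives roughly 2.004 rather than 2.000. The small excess is presumably measurement overhead; how the original tool corrects for it is not shown here.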
Code:
swpal w0, w1, [x6]
add x6, x6, 4
(fused SUBS/B.cc loop)
Result (median cycles for code): 6.0057
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30207 | 60456 | 30181 | 10126 | 20055 | 10127 | 20004 | 35252 | 125608 | 30106 | 10202 | 20004 | 10202 | 40008 | 10001 | 20000 | 10100 |
30204 | 60054 | 30103 | 10101 | 20002 | 10102 | 20002 | 35233 | 125527 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60054 | 30101 | 10101 | 20000 | 10101 | 20002 | 35233 | 125577 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60054 | 30101 | 10101 | 20000 | 10101 | 20026 | 35293 | 125845 | 30139 | 10213 | 20027 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60055 | 30101 | 10101 | 20000 | 10101 | 20051 | 35334 | 126651 | 30177 | 10227 | 20051 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60054 | 30101 | 10101 | 20000 | 10101 | 20002 | 35233 | 125505 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60054 | 30101 | 10101 | 20000 | 10101 | 20002 | 35233 | 125508 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60054 | 30101 | 10101 | 20000 | 10101 | 20002 | 35233 | 125493 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60054 | 30101 | 10101 | 20000 | 10101 | 20002 | 35233 | 125501 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30205 | 60100 | 30137 | 10113 | 20024 | 10113 | 20002 | 35249 | 125624 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
Result (median cycles for code): 6.0064
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30027 | 60441 | 30099 | 10039 | 20060 | 10039 | 20000 | 35066 | 126019 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125988 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125976 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125999 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125978 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125995 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125991 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125987 | 30010 | 10020 | 20000 | 10033 | 40053 | 10013 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125926 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125980 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
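The "Result (median cycles for code)" figure appears to be the median of the cycle (02) column divided by the number of executed copies of the code. A minimal Python sketch, assuming 10,000 copies per run (inferred from the roughly 20000 load/store uops at two per SWPAL, not stated in the listing), recomputes it from the table directly above. `INSTANCES` is a hypothetical name used only for this illustration.

```python
from statistics import median

INSTANCES = 10_000  # assumption: unrolls x loop iterations, inferred from the counters

# cycle (02) column of the table directly above
cycles = [60441, 60064, 60064, 60064, 60064, 60064, 60064, 60064, 60064, 60064]
# retire uop (01) column of the same table
retired = [30027] + [30024] * 9

cycles_per_instance = median(cycles) / INSTANCES
uops_per_instance = median(retired) / INSTANCES

print(f"cycles per SWPAL+ADD pair: {cycles_per_instance:.4f}")
print(f"retired uops per pair:     {uops_per_instance:.4f}")
```

Using the median rather than the mean keeps the first, slower run (60441 cycles) from skewing the result. The retired count of about three uops per pair is consistent with two uops for SWPAL plus one for ADD.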
Code:
swpal w0, w1, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 24.0046
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
20205 | 240150 | 20125 | 101 | 0 | 20024 | 100 | 0 | 20004 | 300 | 2365831 | 20104 | 200 | 20004 | 200 | 40048 | 1 | 20000 | 100 |
20204 | 240046 | 20105 | 101 | 0 | 20004 | 100 | 0 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240046 | 20105 | 101 | 0 | 20004 | 100 | 0 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240046 | 20105 | 101 | 0 | 20004 | 100 | 0 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240046 | 20105 | 101 | 0 | 20004 | 100 | 0 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240046 | 20105 | 101 | 0 | 20004 | 100 | 0 | 20004 | 300 | 2365851 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240046 | 20105 | 101 | 0 | 20004 | 100 | 0 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240047 | 20105 | 101 | 0 | 20004 | 100 | 0 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240046 | 20105 | 101 | 0 | 20004 | 100 | 0 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20205 | 240114 | 20125 | 101 | 0 | 20024 | 100 | 0 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
Result (median cycles for code): 24.0039
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
20025 | 240153 | 20035 | 11 | 0 | 20024 | 10 | 0 | 20024 | 30 | 2363544 | 20034 | 20 | 20024 | 20 | 40052 | 1 | 20000 | 0 | 10 |
20024 | 240040 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362644 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20024 | 240039 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20020 | 30 | 2363132 | 20030 | 20 | 20020 | 20 | 40048 | 1 | 20000 | 0 | 10 |
20024 | 240039 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362644 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20024 | 240039 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362644 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20024 | 240039 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20024 | 30 | 2362824 | 20034 | 20 | 20024 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20024 | 240039 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362644 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20024 | 240039 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362644 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20026 | 240093 | 20055 | 11 | 0 | 20044 | 10 | 0 | 20000 | 30 | 2362644 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20024 | 240039 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362644 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
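The counter tables on this page share one layout: a header row of counter names followed by one row per run, with cells separated by `|` and a trailing `|`. As a sketch of how such a table could be loaded for further analysis, the Python below parses the header and the first three rows of the table above into named columns. `parse_table` and `INSTANCES` are hypothetical names, and the figure of 10,000 executed copies is an assumption inferred from the counters.

```python
from statistics import median

INSTANCES = 10_000  # assumption: unrolls x loop iterations, inferred from the counters

# Header and first three rows copied from the table above.
HEADER = (
    "retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | "
    "schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | "
    "dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | "
    "simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | "
    "map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | "
    "? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | "
    "? int retires (ef) |"
)
ROWS = [
    "20025 | 240153 | 20035 | 11 | 0 | 20024 | 10 | 0 | 20024 | 30 | 2363544 | 20034 | 20 | 20024 | 20 | 40052 | 1 | 20000 | 0 | 10 |",
    "20024 | 240040 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362644 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |",
    "20024 | 240039 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20020 | 30 | 2363132 | 20030 | 20 | 20020 | 20 | 40048 | 1 | 20000 | 0 | 10 |",
]


def parse_table(header, rows):
    """Split a pipe-delimited table into {counter name: [value per run]}."""
    names = [cell.strip() for cell in header.split("|") if cell.strip()]
    columns = {name: [] for name in names}
    for row in rows:
        values = [int(cell) for cell in row.split("|") if cell.strip()]
        if len(values) != len(names):
            raise ValueError("row width does not match header")
        for name, value in zip(names, values):
            columns[name].append(value)
    return columns


columns = parse_table(HEADER, ROWS)
for name in ("cycle (02)", "? ldst retires (ed)"):
    print(f"{name}: {median(columns[name]) / INSTANCES:.4f} per copy")
```

With only three of the ten rows loaded, the median differs slightly from the full-table result; passing all ten rows gives the complete picture.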