Apple Microarchitecture Research by Dougall Johnson
Code:
swpalh w0, w1, [x6]
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 2.000
Issues: 2.000
Integer unit issues: 0.001
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) |
72005 | 35698 | 2005 | 1 | 2004 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34654 | 2001 | 1 | 2000 | 2000 | 11802 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34807 | 2001 | 1 | 2000 | 2000 | 11792 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34236 | 2001 | 1 | 2000 | 2000 | 11784 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34124 | 2001 | 1 | 2000 | 2000 | 11784 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34125 | 2001 | 1 | 2000 | 2000 | 11784 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34127 | 2001 | 1 | 2000 | 2000 | 11784 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34126 | 2001 | 1 | 2000 | 2000 | 11784 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34126 | 2001 | 1 | 2000 | 2000 | 11784 | 2000 | 2000 | 4001 | 1 | 2000 |
72004 | 34159 | 2001 | 1 | 2000 | 2000 | 11782 | 2000 | 2000 | 4000 | 1 | 2000 |
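The per-instruction figures listed before the table (retires, issues, unit issues) look like raw counter values divided by the number of unrolled copies of the code. The sketch below redoes that arithmetic for one steady-state row. `COPIES = 1000` is an assumption inferred from the counter magnitudes, not something the page states, and `per_copy` is a hypothetical helper name.

```python
# Counter values copied from one steady-state row of the table above.
row = {
    "retire uop (01)": 72004,
    "schedule uop (52)": 2001,
    "schedule int uop (53)": 1,
    "schedule ldst uop (55)": 2000,
}

COPIES = 1000        # assumption: unrolled copies of the code in one run
NOPS_PER_COPY = 70   # from the "minus 70 nops" label


def per_copy(count, copies=COPIES):
    """Normalize a raw counter value to a per-copy figure."""
    return count / copies


retires = per_copy(row["retire uop (01)"] - NOPS_PER_COPY * COPIES)
issues = per_copy(row["schedule uop (52)"])
int_issues = per_copy(row["schedule int uop (53)"])
ldst_issues = per_copy(row["schedule ldst uop (55)"])

print(f"retires (minus nops): {retires:.3f}")
print(f"issues:               {issues:.3f}")
print(f"integer unit issues:  {int_issues:.3f}")
print(f"load/store issues:    {ldst_issues:.3f}")
```

The raw row gives about 2.004 retires and 2.001 issues per copy. The 2.000 figures reported above presumably exclude a few uops of measurement overhead, which this sketch does not model.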
Code:
swpalh w0, w1, [x6]
add x6, x6, 2
(fused SUBS/B.cc loop)
Result (median cycles for code): 6.0057
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30206 | 60508 | 30157 | 10118 | 20039 | 10119 | 20024 | 35288 | 125791 | 30134 | 10210 | 20024 | 10202 | 40008 | 10001 | 20000 | 10100 |
30204 | 60064 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125377 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60057 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125345 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60057 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125330 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60057 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125348 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60057 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125328 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60057 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125360 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60057 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125352 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60057 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125355 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30205 | 60103 | 30140 | 10114 | 20026 | 10114 | 20002 | 35249 | 125385 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
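The "median cycles for code" figure for this table can be reproduced from the cycle (02) column. This is a minimal sketch, assuming the run executes 10000 copies of the code in total. That count is inferred from the 20000 load/store retires at two load/store uops per `swpalh`, and is not stated on the page.

```python
from statistics import median

# cycle (02) column of the ten runs in the table above
cycles = [60508, 60064, 60057, 60057, 60057,
          60057, 60057, 60057, 60057, 60103]

COPIES = 10000  # assumption: total executions of the code per run

# The median discards the warm-up run and occasional disturbed runs.
cycles_per_copy = median(cycles) / COPIES
print(round(cycles_per_copy, 4))
```

Using the median rather than the mean keeps the first, slower run (60508 cycles) from skewing the result.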
Result (median cycles for code): 6.0057
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30026 | 60335 | 30061 | 10027 | 20034 | 10027 | 20002 | 35069 | 125565 | 0 | 30013 | 10021 | 20003 | 0 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60054 | 30011 | 10011 | 20000 | 10010 | 34112 | 305138 | 289440 | 1063 | 63779 | 35692 | 35699 | 67 | 10021 | 40005 | 10001 | 20000 | 10010 |
30024 | 60461 | 30011 | 10011 | 20000 | 10010 | 20000 | 35050 | 125498 | 0 | 30010 | 10020 | 20000 | 0 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60057 | 30011 | 10011 | 20000 | 10010 | 20000 | 35050 | 125519 | 0 | 30010 | 10020 | 20000 | 0 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60057 | 30011 | 10011 | 20000 | 10010 | 20000 | 35050 | 125516 | 0 | 30010 | 10020 | 20000 | 0 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60057 | 30011 | 10011 | 20000 | 10010 | 20000 | 35050 | 125522 | 0 | 30010 | 10020 | 20000 | 0 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60057 | 30011 | 10011 | 20000 | 10010 | 20000 | 35050 | 125519 | 0 | 30010 | 10020 | 20000 | 0 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60057 | 30011 | 10011 | 20000 | 10010 | 20026 | 35108 | 125662 | 0 | 30049 | 10033 | 20027 | 0 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60067 | 30011 | 10011 | 20000 | 10010 | 20000 | 35050 | 125520 | 0 | 30010 | 10020 | 20000 | 0 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60057 | 30011 | 10011 | 20000 | 10010 | 20000 | 35050 | 125508 | 0 | 30010 | 10020 | 20000 | 0 | 10020 | 40000 | 10001 | 20000 | 10010 |
Code:
swpalh w0, w1, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 24.0044
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
20205 | 240165 | 20125 | 101 | 20024 | 100 | 20004 | 300 | 2365791 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240044 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365697 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240037 | 20105 | 101 | 20004 | 100 | 20024 | 300 | 2371529 | 20124 | 200 | 20024 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240037 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365697 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240037 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365697 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20205 | 240095 | 20125 | 101 | 20024 | 100 | 20004 | 300 | 2365802 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240037 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365697 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240037 | 20105 | 101 | 20004 | 100 | 20024 | 300 | 2365859 | 20124 | 200 | 20024 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240037 | 20105 | 101 | 20004 | 100 | 20024 | 300 | 2366312 | 20124 | 200 | 20024 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240037 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365697 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
Result (median cycles for code): 24.0046
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
20025 | 240150 | 20035 | 11 | 0 | 20024 | 10 | 0 | 20004 | 30 | 2362756 | 20014 | 20 | 20004 | 20 | 40000 | 0 | 1 | 20000 | 0 | 10 |
20024 | 240046 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362749 | 20010 | 20 | 20000 | 20 | 40000 | 0 | 1 | 20000 | 0 | 10 |
20024 | 240046 | 20015 | 11 | 0 | 20004 | 10 | 0 | 20000 | 30 | 2362749 | 20010 | 20 | 20000 | 20 | 40000 | 0 | 1 | 20000 | 0 | 10 |
29240 | 267814 | 27894 | 5082 | 4 | 22808 | 4607 | 4 | 20000 | 30 | 2362781 | 20010 | 20 | 20000 | 20 | 40048 | 0 | 1 | 20000 | 0 | 10 |
20025 | 240080 | 20035 | 11 | 0 | 20024 | 10 | 0 | 20000 | 30 | 2362749 | 20010 | 20 | 20000 | 20 | 40000 | 0 | 1 | 20000 | 0 | 10 |
20024 | 240046 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362749 | 20010 | 20 | 20000 | 20 | 40000 | 0 | 1 | 20000 | 0 | 10 |
20024 | 240046 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20026 | 30 | 2363061 | 20036 | 20 | 20026 | 20 | 40000 | 0 | 1 | 20000 | 0 | 10 |
20024 | 240046 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362749 | 20010 | 20 | 20000 | 20 | 40000 | 0 | 1 | 20000 | 0 | 10 |
20024 | 240046 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362749 | 20010 | 20 | 20000 | 20 | 40048 | 0 | 1 | 20000 | 0 | 10 |
20025 | 240087 | 20035 | 11 | 0 | 20024 | 10 | 0 | 20000 | 30 | 2362749 | 20010 | 20 | 20000 | 20 | 40000 | 0 | 1 | 20000 | 0 | 10 |
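Some rows in these tables are clearly disturbed runs, such as the one above reporting 267814 cycles. The sketch below shows one way to parse the pipe-delimited counter tables and flag such rows. The names `parse_table` and `noisy_rows` and the 1% threshold are illustrative assumptions, and the sample is only the first three columns of the first four rows of the table above.

```python
from statistics import median

SAMPLE = """
retire uop (01) | cycle (02) | schedule uop (52) |
20025 | 240150 | 20035 |
20024 | 240046 | 20011 |
20024 | 240046 | 20015 |
29240 | 267814 | 27894 |
"""


def split_cells(line):
    """Split one pipe-delimited line into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table(text):
    """Return (header, rows), with each row a dict of counter name to int."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = split_cells(lines[0])
    rows = [dict(zip(header, map(int, split_cells(ln)))) for ln in lines[1:]]
    return header, rows


def noisy_rows(rows, column="cycle (02)", tolerance=0.01):
    """Indices of rows whose value deviates from the column median by more
    than `tolerance`, expressed as a fraction of the median."""
    mid = median(r[column] for r in rows)
    return [i for i, r in enumerate(rows) if abs(r[column] - mid) > tolerance * mid]


header, rows = parse_table(SAMPLE)
flagged = noisy_rows(rows)
print(flagged)
```

On this excerpt only the fourth row (index 3) is flagged. The other rows sit within a few hundred cycles of the median.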