Apple Microarchitecture Research by Dougall Johnson
Code:
staddb w0, [x6]
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 3.000
Issues: 3.002
Integer unit issues: 1.003
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
73006 | 34510 | 3039 | 1025 | 2014 | 1007 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34170 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34164 | 3003 | 1003 | 2000 | 1000 | 2000 | 7772 | 10525 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34453 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34184 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34152 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34160 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34171 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34179 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34159 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
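The per-instruction issue figures quoted above the table can be related to the raw counter columns with a short calculation. The Python sketch below is illustrative only: it assumes the test body was repeated 1000 times per sample (an inference from the roughly 70000 nop retires for 70 nops, not something the page states), and the helper name `per_copy` is hypothetical. Under that assumption, the medians of the "schedule int uop (53)" and "schedule ldst uop (55)" columns agree with the reported integer and load/store issue counts.

```python
from statistics import median

# Assumption: 1000 copies of the test body per sample (inferred, not stated in the source).
UNROLL = 1000

# Raw samples copied from the "schedule int uop (53)" and
# "schedule ldst uop (55)" columns of the table above.
schedule_int = [1025, 1003, 1003, 1003, 1003, 1003, 1003, 1003, 1003, 1003]
schedule_ldst = [2014, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000]


def per_copy(samples, unroll=UNROLL):
    """Median counter value divided by the assumed number of copies."""
    return median(samples) / unroll


int_issues = per_copy(schedule_int)
ldst_issues = per_copy(schedule_ldst)

print(f"Integer unit issues: {int_issues:.3f}")
print(f"Load/store unit issues: {ldst_issues:.3f}")
```

Taking the median rather than the mean keeps the slightly inflated first sample (1025 and 2014) from skewing the result.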
Code:
staddb w0, [x6] add x6, x6, 2
(fused SUBS/B.cc loop)
Result (median cycles for code): 3.0056
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
40206 | 30458 | 40298 | 20255 | 20043 | 20168 | 20007 | 115871 | 106071 | 40114 | 20207 | 20007 | 30211 | 40013 | 0 | 20011 | 20000 | 0 | 20100 |
40204 | 30066 | 40112 | 20110 | 20002 | 20104 | 20004 | 115901 | 105872 | 40108 | 20204 | 20004 | 30206 | 40008 | 0 | 20009 | 20000 | 0 | 20100 |
40204 | 30066 | 40112 | 20110 | 20002 | 20104 | 20004 | 115924 | 105916 | 40108 | 20204 | 20004 | 30206 | 40008 | 0 | 20010 | 20000 | 0 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115946 | 105960 | 40108 | 20204 | 20004 | 30206 | 40008 | 0 | 20009 | 20000 | 0 | 20100 |
40205 | 30129 | 40186 | 20150 | 20036 | 20138 | 20035 | 111530 | 117927 | 40172 | 20237 | 20035 | 30206 | 40008 | 0 | 20009 | 20000 | 0 | 20100 |
40204 | 30066 | 40112 | 20110 | 20002 | 20104 | 20004 | 115924 | 105916 | 40108 | 20204 | 20004 | 30206 | 40008 | 0 | 20010 | 20000 | 0 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115927 | 105922 | 40108 | 20204 | 20004 | 30206 | 40008 | 0 | 20009 | 20000 | 0 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115918 | 105895 | 40108 | 20204 | 20004 | 30206 | 40008 | 0 | 20009 | 20000 | 0 | 20100 |
40204 | 30066 | 40112 | 20110 | 20002 | 20104 | 20004 | 115918 | 105904 | 40108 | 20204 | 20004 | 30206 | 40008 | 0 | 20009 | 20000 | 0 | 20100 |
40204 | 30066 | 40112 | 20110 | 20002 | 20104 | 20004 | 115914 | 105896 | 40108 | 20204 | 20004 | 30206 | 40008 | 0 | 20009 | 20000 | 0 | 20100 |
Result (median cycles for code): 3.0066
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40026 | 30364 | 40157 | 20117 | 20040 | 20075 | 20004 | 115760 | 105963 | 40018 | 20024 | 20004 | 30020 | 40000 | 20010 | 20000 | 20010 |
40024 | 30066 | 40021 | 20021 | 20000 | 20010 | 20036 | 114302 | 107697 | 40082 | 20056 | 20036 | 30020 | 40000 | 20017 | 20000 | 20010 |
40024 | 30066 | 40025 | 20025 | 20000 | 20010 | 20000 | 115772 | 105946 | 40010 | 20020 | 20000 | 30020 | 40000 | 20016 | 20000 | 20010 |
40024 | 30066 | 40027 | 20027 | 20000 | 20010 | 20000 | 115769 | 105929 | 40010 | 20020 | 20000 | 30020 | 40000 | 20017 | 20000 | 20010 |
40024 | 30066 | 40026 | 20026 | 20000 | 20010 | 20000 | 115774 | 105943 | 40010 | 20020 | 20000 | 30020 | 40000 | 20018 | 20000 | 20010 |
40024 | 30066 | 40020 | 20020 | 20000 | 20010 | 20000 | 115773 | 105947 | 40010 | 20020 | 20000 | 30020 | 40000 | 20017 | 20000 | 20010 |
40024 | 30066 | 40026 | 20026 | 20000 | 20010 | 20030 | 114304 | 107744 | 40071 | 20051 | 20030 | 30020 | 40000 | 20017 | 20000 | 20010 |
40024 | 30066 | 40023 | 20023 | 20000 | 20010 | 20000 | 115749 | 105904 | 40010 | 20020 | 20000 | 30020 | 40000 | 20011 | 20000 | 20010 |
40024 | 30066 | 40024 | 20024 | 20000 | 20010 | 20000 | 115756 | 105936 | 40010 | 20020 | 20000 | 30020 | 40000 | 20012 | 20000 | 20010 |
40025 | 30132 | 40103 | 20067 | 20036 | 20049 | 20000 | 115786 | 105976 | 40010 | 20020 | 20000 | 30020 | 40000 | 20016 | 20000 | 20010 |
Code:
staddb w0, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 12.9761
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
30206 | 124053 | 40827 | 20760 | 0 | 20067 | 10145 | 0 | 20006 | 2450368 | 2312371 | 0 | 30113 | 10207 | 20013 | 0 | 20200 | 40000 | 21310 | 20000 | 0 | 10100 |
30204 | 126467 | 41203 | 21203 | 0 | 20000 | 10100 | 0 | 20000 | 2450048 | 2311933 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21310 | 20000 | 0 | 10100 |
30204 | 129761 | 41410 | 21410 | 0 | 20000 | 10100 | 0 | 20000 | 2450048 | 2311933 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21310 | 20000 | 0 | 10100 |
30204 | 129761 | 41410 | 21410 | 0 | 20000 | 10100 | 0 | 20000 | 2450048 | 2311933 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21310 | 20000 | 0 | 10100 |
30204 | 129761 | 41410 | 21410 | 0 | 20000 | 10100 | 0 | 20011 | 2504885 | 2364296 | 0 | 30119 | 10209 | 20019 | 0 | 20200 | 40000 | 21539 | 20000 | 0 | 10100 |
30204 | 131270 | 41543 | 21535 | 0 | 20008 | 10100 | 0 | 20049 | 2488588 | 2350202 | 0 | 30178 | 10229 | 20056 | 0 | 20256 | 40109 | 21336 | 20000 | 0 | 10100 |
30204 | 124632 | 40963 | 20963 | 0 | 20000 | 10100 | 0 | 20000 | 2344612 | 2219536 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 20864 | 20000 | 0 | 10100 |
30204 | 124642 | 40964 | 20964 | 0 | 20000 | 10100 | 0 | 20000 | 2450048 | 2311933 | 0 | 30100 | 10200 | 20000 | 0 | 20256 | 40110 | 20891 | 20000 | 0 | 10100 |
30204 | 129761 | 41410 | 21410 | 0 | 20000 | 10100 | 0 | 20000 | 2450048 | 2311933 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21310 | 20000 | 0 | 10100 |
30204 | 129761 | 41410 | 21410 | 0 | 20000 | 10100 | 0 | 20000 | 2450048 | 2311933 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21310 | 20000 | 0 | 10100 |
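As a rough cross-check, the "median cycles for code" figure quoted before this table can be recomputed from the "cycle (02)" column. The sketch below parses the pipe-delimited layout used on this page; only the first three columns of each row are copied in to keep it short. The divisor of 10000 executions per sample is an assumption inferred from the counter magnitudes, and `parse_table` is a hypothetical helper, not part of the original tooling.

```python
from statistics import median

# Assumption: each sample covers 10000 executions of the test code (inferred, not stated).
EXECUTIONS = 10000

# First three columns of the table above, in the page's pipe-delimited layout.
HEADER = "retire uop (01) | cycle (02) | schedule uop (52) |"
ROWS = """\
30206 | 124053 | 40827 |
30204 | 126467 | 41203 |
30204 | 129761 | 41410 |
30204 | 129761 | 41410 |
30204 | 129761 | 41410 |
30204 | 131270 | 41543 |
30204 | 124632 | 40963 |
30204 | 124642 | 40964 |
30204 | 129761 | 41410 |
30204 | 129761 | 41410 |
"""


def parse_table(header, rows):
    """Return {column name: [int samples]} from a pipe-delimited table."""
    names = [c.strip() for c in header.split("|") if c.strip()]
    columns = {name: [] for name in names}
    for line in rows.strip().splitlines():
        cells = [c.strip() for c in line.split("|") if c.strip()]
        for name, cell in zip(names, cells):
            columns[name].append(int(cell))
    return columns


columns = parse_table(HEADER, ROWS)
cycles_per_execution = median(columns["cycle (02)"]) / EXECUTIONS
print(f"median cycles for code: {cycles_per_execution:.4f}")
```

With these samples the median cycle count is 129761, so the assumed divisor gives 12.9761, which agrees with the result reported for this run.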
Result (median cycles for code): 12.6476
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30028 | 127906 | 41226 | 21052 | 20174 | 10124 | 20000 | 2426841 | 2287403 | 30010 | 10020 | 20000 | 20020 | 40000 | 21185 | 20000 | 10010 |
30024 | 128353 | 41193 | 21193 | 20000 | 10010 | 20000 | 2456248 | 2313083 | 30010 | 10020 | 20000 | 20020 | 40000 | 21273 | 20000 | 10010 |
30024 | 129761 | 41283 | 21283 | 20000 | 10010 | 20000 | 2456248 | 2313083 | 30010 | 10020 | 20000 | 20020 | 40000 | 21273 | 20000 | 10010 |
30024 | 129761 | 41283 | 21283 | 20000 | 10010 | 20000 | 2456248 | 2313083 | 30010 | 10020 | 20000 | 20032 | 40024 | 21315 | 20000 | 10010 |
30024 | 129761 | 41283 | 21283 | 20000 | 10010 | 20000 | 2456248 | 2313083 | 30010 | 10020 | 20000 | 20020 | 40000 | 21273 | 20000 | 10010 |
30024 | 129761 | 41283 | 21283 | 20000 | 10010 | 20000 | 2456248 | 2313083 | 30010 | 10020 | 20000 | 20020 | 40000 | 21273 | 20000 | 10010 |
30024 | 129761 | 41283 | 21283 | 20000 | 10010 | 20000 | 2456248 | 2313083 | 30010 | 10020 | 20000 | 20020 | 40000 | 21269 | 20000 | 10010 |
30502 | 130153 | 41656 | 21397 | 20259 | 10344 | 20130 | 2449573 | 2306986 | 30211 | 10091 | 20141 | 20040 | 40040 | 20390 | 20000 | 10010 |
30024 | 126437 | 41107 | 21107 | 20000 | 10010 | 20050 | 2352447 | 2222159 | 30088 | 10048 | 20056 | 20020 | 40000 | 21099 | 20000 | 10010 |
30024 | 126490 | 41113 | 21113 | 20000 | 10010 | 20000 | 2388821 | 2253634 | 30010 | 10020 | 20000 | 20020 | 40000 | 21102 | 20000 | 10010 |