Apple Microarchitecture Research by Dougall Johnson
Code:
steorb w0, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 3.000
Issues: 3.002
Integer unit issues: 1.003
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
73005 | 34554 | 3018 | 1014 | 2004 | 1002 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34226 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34242 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34298 | 3003 | 1003 | 2000 | 1000 | 2000 | 7773 | 10527 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34220 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34215 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34200 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34257 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73005 | 34398 | 3006 | 1004 | 2002 | 1001 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34219 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2002 | 4004 | 1004 | 2000 | 1000 |
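The per-iteration unit figures listed above the table appear to follow from dividing the raw counters by the repetition count. The following is a minimal sketch of that arithmetic, not the author's tooling: it assumes 1000 repetitions of the test sequence per run (inferred from the counter magnitudes, not stated on the page) and takes the median over the ten runs shown. The names `REPS` and `per_iteration` are hypothetical.

```python
from statistics import median

# Assumption: the test sequence is repeated 1000 times per run.
# This is inferred from the counter magnitudes, not stated in the source.
REPS = 1000

# Raw values copied from the table above (one entry per run).
schedule_int = [1014, 1003, 1003, 1003, 1003, 1003, 1003, 1003, 1004, 1003]
schedule_ldst = [2004, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2002, 2000]


def per_iteration(samples, reps=REPS):
    """Median counter value across runs, divided by the repetition count."""
    return median(samples) / reps


int_issues = per_iteration(schedule_int)
ldst_issues = per_iteration(schedule_ldst)

print(f"Integer unit issues:    {int_issues:.3f}")   # 1.003
print(f"Load/store unit issues: {ldst_issues:.3f}")  # 2.000
```

Under these assumptions the sketch matches the reported integer and load/store figures. It does not reproduce the reported "Issues" total of 3.002 exactly (the schedule uop column gives 3.003 by the same method), so the original tool presumably processes the counters somewhat differently.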
Code:
steorb w0, [x6]
add x6, x6, 2
(fused SUBS/B.cc loop)
Result (median cycles for code): 3.0063
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40206 | 30365 | 40254 | 20211 | 20043 | 20168 | 20004 | 115706 | 106111 | 40108 | 20204 | 20004 | 30211 | 40013 | 20013 | 20000 | 20100 |
40204 | 30063 | 40114 | 20112 | 20002 | 20104 | 20004 | 115842 | 105831 | 40108 | 20204 | 20004 | 30206 | 40008 | 20012 | 20000 | 20100 |
40204 | 30063 | 40114 | 20112 | 20002 | 20104 | 20007 | 115852 | 106082 | 40114 | 20207 | 20007 | 30254 | 40072 | 20051 | 20000 | 20100 |
40204 | 30063 | 40114 | 20112 | 20002 | 20104 | 20004 | 115898 | 105892 | 40108 | 20204 | 20004 | 30206 | 40008 | 20012 | 20000 | 20100 |
40204 | 30285 | 40113 | 20111 | 20002 | 20104 | 20007 | 115866 | 106057 | 40114 | 20207 | 20007 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115929 | 105926 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115937 | 105942 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115938 | 105944 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115950 | 105968 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115935 | 105938 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
Result (median cycles for code): 3.0066
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40027 | 30487 | 40218 | 20148 | 20070 | 20104 | 20004 | 116136 | 106378 | 40018 | 20024 | 20004 | 30020 | 40000 | 20008 | 20000 | 20010 |
40024 | 30066 | 40018 | 20018 | 20000 | 20010 | 20000 | 115723 | 105887 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
40024 | 30066 | 40018 | 20018 | 20000 | 20010 | 20000 | 115706 | 105895 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
40024 | 30066 | 40018 | 20018 | 20000 | 20010 | 20000 | 115735 | 105911 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
40024 | 30066 | 40018 | 20018 | 20000 | 20010 | 20000 | 115733 | 105905 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
40025 | 30132 | 40096 | 20060 | 20036 | 20048 | 20000 | 115714 | 105911 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
40024 | 30066 | 40018 | 20018 | 20000 | 20010 | 20000 | 115703 | 105845 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
40024 | 30066 | 40018 | 20018 | 20000 | 20010 | 20000 | 115725 | 105889 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
40024 | 30066 | 40018 | 20018 | 20000 | 20010 | 20000 | 115730 | 105901 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
40024 | 30066 | 40018 | 20018 | 20000 | 20010 | 20000 | 115737 | 105915 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
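The "median cycles for code" figure for this second table can be recovered from the cycle (02) column. This is a rough sketch under one assumption: the code body runs 10000 times per measurement, which is inferred from the counters (20000 load/store retires at two load/store uops per steorb) rather than stated on the page. The name `ITERATIONS` is hypothetical.

```python
from statistics import median

# Assumption: 10000 executions of the code body per measurement,
# inferred from the counters rather than stated in the source.
ITERATIONS = 10_000

# Cycle (02) column of the table above, one entry per run.
cycles = [30487, 30066, 30066, 30066, 30066, 30132, 30066, 30066, 30066, 30066]

# The median discards the slower outlier runs (30487 and 30132).
cycles_per_iteration = median(cycles) / ITERATIONS

print(f"{cycles_per_iteration:.4f}")  # 3.0066
```

Applied to the rows displayed in the first table of this test, the same method yields 3.0066 rather than the reported 3.0063, so the published figure may be based on more runs than the ten shown.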
Code:
steorb w0, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 12.9754
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30206 | 126776 | 41189 | 21120 | 20069 | 10145 | 20000 | 2368801 | 2242338 | 0 | 30100 | 10200 | 20000 | 0 | 20256 | 40109 | 21150 | 20000 | 10100 |
30204 | 129822 | 41427 | 21419 | 20008 | 10100 | 20000 | 2368800 | 2242338 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 20983 | 20000 | 10100 |
30204 | 129754 | 41409 | 21409 | 20000 | 10100 | 20000 | 2449943 | 2311807 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21309 | 20000 | 10100 |
30204 | 129754 | 41409 | 21409 | 20000 | 10100 | 20000 | 2449943 | 2311807 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21309 | 20000 | 10100 |
30204 | 129754 | 41409 | 21409 | 20000 | 10100 | 20000 | 2449943 | 2311807 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21309 | 20000 | 10100 |
30204 | 129754 | 41409 | 21409 | 20000 | 10100 | 20048 | 2479152 | 2338895 | 0 | 30176 | 10228 | 20056 | 0 | 20200 | 40000 | 21273 | 20000 | 10100 |
30204 | 129826 | 41382 | 21373 | 20009 | 10100 | 20000 | 2449053 | 2312542 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21273 | 20000 | 10100 |
30204 | 129826 | 41382 | 21373 | 20009 | 10100 | 20000 | 2449053 | 2312542 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21273 | 20000 | 10100 |
30204 | 129754 | 41409 | 21409 | 20000 | 10100 | 20000 | 2449943 | 2311807 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21309 | 20000 | 10100 |
30204 | 129754 | 41409 | 21409 | 20000 | 10100 | 20000 | 2449943 | 2311807 | 0 | 30100 | 10200 | 20000 | 0 | 20200 | 40000 | 21309 | 20000 | 10100 |
Result (median cycles for code): 12.9754
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30026 | 130460 | 41329 | 21246 | 20083 | 10060 | 20000 | 2456139 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129749 | 41281 | 21281 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20062 | 40085 | 21279 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |