Apple Microarchitecture Research by Dougall Johnson
Code:
steor w0, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 3.000
Issues: 3.001
Integer unit issues: 1.002
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
73006 | 34630 | 3039 | 1025 | 2014 | 1007 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34172 | 3003 | 1003 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34097 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34111 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34113 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34115 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34098 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34100 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34096 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34095 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
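The summary figures above (retires, issues, integer and load/store unit issues) appear to be per-iteration normalizations of the raw counter rows. The sketch below shows one way to approximate them: take the median of each column and divide by an iteration count. The iteration count of 1000 is an assumption inferred from the counter magnitudes, and the names `ROWS`, `ITERATIONS`, `NOPS`, and `per_iteration` are hypothetical. The page does not state its exact normalization, so this is not necessarily the author's method.

```python
from statistics import median

# Selected columns copied from the ten rows of the table above:
# (retire uop, schedule uop, schedule int uop, schedule ldst uop)
ROWS = [
    (73006, 3039, 1025, 2014),
    (73004, 3003, 1003, 2000),
] + [(73004, 3002, 1002, 2000)] * 8

ITERATIONS = 1000  # assumption: inferred from the counter magnitudes
NOPS = 70          # padding nops in the test body


def per_iteration(rows, iterations=ITERATIONS, nops=NOPS):
    """Return median counter values normalized to one copy of the test body."""
    retire, sched, sched_int, sched_ldst = (median(col) for col in zip(*rows))
    return {
        "retires": retire / iterations - nops,  # subtract the padding nops
        "issues": sched / iterations,
        "int_issues": sched_int / iterations,
        "ldst_issues": sched_ldst / iterations,
    }


if __name__ == "__main__":
    for name, value in per_iteration(ROWS).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the integer and load/store figures reproduce the reported 1.002 and 2.000. Retires and issues come out near 3.004 and 3.002, slightly above the reported 3.000 and 3.001, so the page presumably applies a further correction (for example, subtracting measurement overhead) that is not modeled here.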
Code:
steor w0, [x6]
add x6, x6, 4
(fused SUBS/B.cc loop)
Result (median cycles for code): 3.0066
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40208 | 30733 | 40382 | 20294 | 20088 | 20233 | 20007 | 115993 | 106394 | 40114 | 20207 | 20007 | 30211 | 40013 | 20010 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 116097 | 106327 | 40108 | 20204 | 20004 | 30211 | 40013 | 20010 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 116088 | 106309 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 116095 | 106321 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 116098 | 106329 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 116100 | 106333 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 116087 | 106309 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 116094 | 106321 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 116090 | 106311 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 116085 | 106303 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
Result (median cycles for code): 3.0066
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40028 | 30598 | 40305 | 20214 | 20091 | 20134 | 20004 | 115895 | 106330 | 40018 | 20024 | 20004 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30063 | 40017 | 20017 | 20000 | 20010 | 20000 | 115889 | 106327 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30063 | 40017 | 20017 | 20000 | 20010 | 20000 | 115886 | 106332 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30063 | 40017 | 20017 | 20000 | 20010 | 20000 | 115882 | 106324 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30063 | 40017 | 20017 | 20000 | 20010 | 20000 | 115869 | 106296 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30063 | 40017 | 20017 | 20000 | 20010 | 20000 | 115889 | 106340 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30063 | 40017 | 20017 | 20000 | 20010 | 20000 | 115890 | 106340 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30063 | 40017 | 20017 | 20000 | 20010 | 20000 | 115886 | 106332 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
40024 | 30063 | 40018 | 20018 | 20000 | 20010 | 20000 | 115835 | 106240 | 40010 | 20020 | 20000 | 30020 | 40000 | 20008 | 20000 | 20010 |
40024 | 30063 | 40018 | 20018 | 20000 | 20010 | 20038 | 116442 | 108864 | 40086 | 20058 | 20038 | 30020 | 40000 | 20008 | 20000 | 20010 |
Code:
steor w0, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 12.9761
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
30206 | 131806 | 41684 | 21615 | 20069 | 10145 | 20000 | 2466551 | 2327626 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21408 | 20000 | 0 | 10100 |
30204 | 130606 | 41513 | 21508 | 20005 | 10100 | 20000 | 2382541 | 2252434 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21102 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382667 | 2252542 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21102 | 20000 | 0 | 10100 |
30204 | 126489 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382875 | 2252722 | 30100 | 10200 | 20000 | 20258 | 40110 | 0 | 20743 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382541 | 2252434 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21098 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382724 | 2252596 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21102 | 20000 | 0 | 10100 |
30204 | 126708 | 41533 | 21178 | 20355 | 10316 | 20000 | 2217860 | 2118061 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21103 | 20000 | 0 | 10100 |
30204 | 126245 | 41183 | 21172 | 20011 | 10100 | 20000 | 2382121 | 2252057 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21101 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382541 | 2252434 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21102 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382541 | 2252434 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21102 | 20000 | 0 | 10100 |
Result (median cycles for code): 12.9754
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30025 | 132252 | 41534 | 21497 | 20037 | 10041 | 20000 | 2453857 | 2310541 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21341 | 20000 | 10010 |
30024 | 129613 | 41351 | 21351 | 20000 | 10010 | 20000 | 2453755 | 2310415 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21341 | 20000 | 10010 |
30024 | 129613 | 41351 | 21351 | 20000 | 10010 | 20000 | 2453755 | 2310415 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21341 | 20000 | 10010 |
30024 | 129613 | 41351 | 21351 | 20000 | 10010 | 20054 | 2462944 | 2318491 | 0 | 30095 | 10051 | 20061 | 0 | 20020 | 40000 | 21273 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20051 | 2379238 | 2245505 | 0 | 30091 | 10050 | 20058 | 0 | 20020 | 40000 | 21273 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20076 | 40110 | 21195 | 20000 | 10010 |
30024 | 129761 | 41283 | 21283 | 20000 | 10010 | 20000 | 2454927 | 2311979 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 0 | 30010 | 10020 | 20000 | 0 | 20020 | 40000 | 21272 | 20000 | 10010 |