Apple Microarchitecture Research by Dougall Johnson
Code:
stclr w0, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 3.000
Issues: 3.002
Integer unit issues: 1.003
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
73005 | 34375 | 3019 | 1015 | 2004 | 1002 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34102 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34095 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34119 | 3002 | 1002 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34122 | 3003 | 1003 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34108 | 3002 | 1002 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34107 | 3003 | 1003 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34108 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34119 | 3002 | 1002 | 2000 | 1000 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34119 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
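The per-iteration figures above appear to be the raw counters divided by the number of repetitions of the test code. The sketch below illustrates that arithmetic on one row of the table. The repetition count of 1000 is an assumption inferred from the counter magnitudes, and the pairing of "Issues" with `schedule uop (52)` and "Load/store unit issues" with `schedule ldst uop (55)` is likewise an assumption; neither is stated in the source. The helper names are hypothetical.

```python
# First five counter names and values, copied from one row of the table above.
HEADER = ("retire uop (01) | cycle (02) | schedule uop (52) | "
          "schedule int uop (53) | schedule ldst uop (55)")
ROW = "73004 | 34095 | 3002 | 1002 | 2000"

ASSUMED_ITERATIONS = 1000  # assumption: inferred from counter magnitudes
NOPS_PER_ITERATION = 70    # from "Retires (minus 70 nops)"


def parse_row(header, row):
    """Pair each counter name with its integer value."""
    names = [n.strip() for n in header.split("|") if n.strip()]
    values = [int(v) for v in row.split("|") if v.strip()]
    return dict(zip(names, values))


def per_iteration(counters, iterations=ASSUMED_ITERATIONS):
    """Normalize raw counter totals to a per-iteration rate."""
    return {name: value / iterations for name, value in counters.items()}


counters = parse_row(HEADER, ROW)
rates = per_iteration(counters)

# Subtract the padding nops to isolate the uops of the instruction under test.
retires_minus_nops = rates["retire uop (01)"] - NOPS_PER_ITERATION

print(f"retires minus nops: {retires_minus_nops:.3f}")
print(f"issues:             {rates['schedule uop (52)']:.3f}")
print(f"load/store issues:  {rates['schedule ldst uop (55)']:.3f}")
```

For this row the raw retire figure comes out slightly above the reported 3.000. The difference is presumably fixed measurement overhead, which this sketch does not model.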
Code:
stclr w0, [x6]
add x6, x6, 4
(fused SUBS/B.cc loop)
Result (median cycles for code): 3.0063
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40207 | 30647 | 40390 | 20324 | 20066 | 20201 | 20007 | 116011 | 106424 | 40114 | 20207 | 20007 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30063 | 40116 | 20111 | 20005 | 20107 | 20007 | 116064 | 106526 | 40114 | 20207 | 20007 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 116089 | 106317 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30068 | 40112 | 20110 | 20002 | 20104 | 20004 | 116102 | 106326 | 40108 | 20204 | 20004 | 30206 | 40008 | 20010 | 20000 | 20100 |
40204 | 30063 | 40111 | 20109 | 20002 | 20104 | 20004 | 116141 | 106356 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30063 | 40111 | 20109 | 20002 | 20104 | 20004 | 116118 | 106354 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30063 | 40111 | 20109 | 20002 | 20104 | 20036 | 110420 | 114970 | 40172 | 20236 | 20036 | 30206 | 40008 | 20008 | 20000 | 20100 |
40205 | 30141 | 40185 | 20152 | 20033 | 20137 | 20004 | 116090 | 106294 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30063 | 40111 | 20109 | 20002 | 20104 | 20004 | 116068 | 106272 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30063 | 40111 | 20109 | 20002 | 20104 | 20004 | 116099 | 106325 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
Result (median cycles for code): 3.0056
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40027 | 30514 | 40225 | 20166 | 20059 | 20104 | 20004 | 115698 | 106084 | 40018 | 20024 | 20004 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30058 | 40018 | 20018 | 20000 | 20010 | 20000 | 115728 | 106158 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115752 | 106204 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115733 | 106168 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30059 | 40018 | 20018 | 20000 | 20010 | 20000 | 115709 | 106120 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115705 | 106120 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30058 | 40017 | 20017 | 20000 | 20010 | 20004 | 115500 | 107857 | 40018 | 20024 | 20004 | 30020 | 40000 | 20028 | 20000 | 20010 |
40024 | 30059 | 40037 | 20037 | 20000 | 20010 | 20000 | 115974 | 106311 | 40010 | 20020 | 20000 | 30020 | 40000 | 20023 | 20000 | 20010 |
40024 | 30059 | 40037 | 20037 | 20000 | 20010 | 20000 | 115999 | 106375 | 40010 | 20020 | 20000 | 30020 | 40000 | 20027 | 20000 | 20010 |
40024 | 30059 | 40037 | 20037 | 20000 | 20010 | 20000 | 115998 | 106373 | 40010 | 20020 | 20000 | 30074 | 40072 | 20104 | 20000 | 20010 |
Code:
stclr w0, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 12.6476
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
30205 | 127077 | 41237 | 21200 | 20037 | 10131 | 20000 | 2400951 | 2270397 | 30100 | 10200 | 20000 | 20200 | 40000 | 21102 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382436 | 2252308 | 30100 | 10200 | 20000 | 20200 | 40000 | 21101 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20051 | 2363121 | 2235503 | 30181 | 10230 | 20058 | 20200 | 40000 | 21097 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382436 | 2252308 | 30100 | 10200 | 20000 | 20200 | 40000 | 21101 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382436 | 2252308 | 30100 | 10200 | 20000 | 20200 | 40000 | 21101 | 20000 | 0 | 10100 |
30205 | 122873 | 40889 | 20847 | 20042 | 10128 | 20000 | 2382436 | 2252308 | 30100 | 10200 | 20000 | 20200 | 40000 | 21101 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382436 | 2252308 | 30100 | 10200 | 20000 | 20200 | 40000 | 21101 | 20000 | 0 | 10100 |
30205 | 125041 | 41000 | 20953 | 20047 | 10130 | 20000 | 2382436 | 2252308 | 30100 | 10200 | 20000 | 20200 | 40000 | 21101 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382436 | 2252308 | 30100 | 10200 | 20000 | 20200 | 40000 | 21101 | 20000 | 0 | 10100 |
30204 | 126317 | 41171 | 21171 | 20000 | 10100 | 20000 | 2381283 | 2251320 | 30100 | 10200 | 20000 | 20200 | 40000 | 21101 | 20000 | 0 | 10100 |
Result (median cycles for code): 12.9754
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30025 | 134184 | 41662 | 21626 | 20036 | 10041 | 20000 | 2520726 | 2370481 | 30010 | 10020 | 20000 | 20020 | 40000 | 21519 | 20000 | 10010 |
30024 | 132965 | 41529 | 21529 | 20000 | 10010 | 20000 | 2520730 | 2370481 | 30010 | 10020 | 20000 | 20020 | 40000 | 21519 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 30010 | 10020 | 20000 | 20020 | 40000 | 21259 | 20000 | 10010 |
30024 | 130466 | 41231 | 21231 | 20000 | 10010 | 20000 | 2470374 | 2325224 | 30010 | 10020 | 20000 | 20020 | 40000 | 21220 | 20000 | 10010 |
30024 | 130454 | 41230 | 21230 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 30010 | 10020 | 20000 | 20020 | 40000 | 21102 | 20000 | 10010 |
30024 | 129764 | 41278 | 21278 | 20000 | 10010 | 20000 | 2455686 | 2312561 | 30010 | 10020 | 20000 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20050 | 2392361 | 2257216 | 30088 | 10048 | 20056 | 20076 | 40110 | 21208 | 20000 | 10010 |
30024 | 129761 | 41263 | 21263 | 20000 | 10010 | 20000 | 2455279 | 2312541 | 30010 | 10020 | 20000 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 30010 | 10020 | 20000 | 20020 | 40000 | 21272 | 20000 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456143 | 2312957 | 30010 | 10020 | 20000 | 20020 | 40000 | 21272 | 20000 | 10010 |