Apple Microarchitecture Research by Dougall Johnson
Code:
stadd w0, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 3.000
Issues: 3.002
Integer unit issues: 1.003
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
73005 | 34877 | 3019 | 1015 | 0 | 2004 | 1002 | 0 | 2000 | 7769 | 10520 | 0 | 3000 | 1000 | 2000 | 0 | 2002 | 4004 | 0 | 1004 | 2000 | 0 | 1000 |
73004 | 34572 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 0 | 3000 | 1000 | 2000 | 0 | 2000 | 4000 | 0 | 1003 | 2000 | 0 | 1000 |
73004 | 34764 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 0 | 3000 | 1000 | 2000 | 0 | 2002 | 4004 | 0 | 1004 | 2000 | 0 | 1000 |
73005 | 35353 | 3012 | 1006 | 0 | 2006 | 1003 | 0 | 2000 | 7770 | 10521 | 0 | 3000 | 1000 | 2000 | 0 | 2000 | 4000 | 0 | 1003 | 2000 | 0 | 1000 |
73004 | 35609 | 3006 | 1006 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 0 | 3000 | 1000 | 2000 | 0 | 2000 | 4000 | 0 | 1003 | 2000 | 0 | 1000 |
73004 | 34176 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 0 | 3000 | 1000 | 2000 | 0 | 2000 | 4000 | 0 | 1003 | 2000 | 0 | 1000 |
73004 | 34146 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 0 | 3000 | 1000 | 2000 | 0 | 2000 | 4000 | 0 | 1003 | 2000 | 0 | 1000 |
73004 | 34213 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 0 | 3000 | 1000 | 2000 | 0 | 2000 | 4000 | 0 | 1003 | 2000 | 0 | 1000 |
73004 | 34201 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 0 | 3000 | 1000 | 2000 | 0 | 2000 | 4000 | 0 | 1003 | 2000 | 0 | 1000 |
73004 | 34173 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 0 | 3000 | 1000 | 2000 | 0 | 2000 | 4000 | 0 | 1003 | 2000 | 0 | 1000 |
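The per-iteration figures listed before the table can be approximated from the raw counter samples. The following is a minimal sketch, not the page's own method: it assumes each sample row covers 1000 iterations of the code (inferred from the 2000 load/store dispatches at 2 per iteration), that "Issues" corresponds to the `schedule uop (52)` counter, and that the median across samples is the right summary. The names `ROWS`, `ITERATIONS`, and `per_iteration` are hypothetical.

```python
from statistics import median

# First six columns of the counter table, in the order the page lists them.
HEADER = [
    "retire uop (01)", "cycle (02)", "schedule uop (52)",
    "schedule int uop (53)", "schedule simd uop (54)", "schedule ldst uop (55)",
]

# The same six columns from the ten sample rows, copied from the table.
ROWS = """\
73005 | 34877 | 3019 | 1015 | 0 | 2004
73004 | 34572 | 3003 | 1003 | 0 | 2000
73004 | 34764 | 3003 | 1003 | 0 | 2000
73005 | 35353 | 3012 | 1006 | 0 | 2006
73004 | 35609 | 3006 | 1006 | 0 | 2000
73004 | 34176 | 3003 | 1003 | 0 | 2000
73004 | 34146 | 3003 | 1003 | 0 | 2000
73004 | 34213 | 3003 | 1003 | 0 | 2000
73004 | 34201 | 3003 | 1003 | 0 | 2000
73004 | 34173 | 3003 | 1003 | 0 | 2000
"""

ITERATIONS = 1000  # assumption: iterations covered by one sample row
NOPS = 70          # padding nops per iteration, as stated on the page


def column_medians(text):
    """Parse pipe-delimited rows and return the median of each column."""
    rows = [[int(cell) for cell in line.split("|")]
            for line in text.strip().splitlines()]
    return [median(col) for col in zip(*rows)]


def per_iteration(text):
    """Normalize the median counters to per-iteration values."""
    med = dict(zip(HEADER, column_medians(text)))
    return {
        "retires_minus_nops": med["retire uop (01)"] / ITERATIONS - NOPS,
        "issues": med["schedule uop (52)"] / ITERATIONS,
        "int_issues": med["schedule int uop (53)"] / ITERATIONS,
        "simd_issues": med["schedule simd uop (54)"] / ITERATIONS,
        "ldst_issues": med["schedule ldst uop (55)"] / ITERATIONS,
    }


if __name__ == "__main__":
    for name, value in per_iteration(ROWS).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the sketch lands within a few thousandths of the listed figures (for example about 3.004 retires against the listed 3.000). The page's exact overhead correction and rounding are not shown here, so small differences of that size are expected.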
Code:
stadd w0, [x6]
add x6, x6, 4
(fused SUBS/B.cc loop)
Result (median cycles for code): 3.0056
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40207 | 30639 | 40382 | 20319 | 20063 | 20201 | 20007 | 115973 | 106407 | 40114 | 20207 | 20007 | 30211 | 40013 | 20010 | 20000 | 20100 |
40204 | 30059 | 40115 | 20110 | 20005 | 20107 | 20004 | 116085 | 106316 | 40108 | 20204 | 20004 | 30206 | 40008 | 20012 | 20000 | 20100 |
40204 | 30059 | 40114 | 20112 | 20002 | 20104 | 20004 | 116066 | 106278 | 40108 | 20204 | 20004 | 30206 | 40008 | 20012 | 20000 | 20100 |
40204 | 30059 | 40114 | 20112 | 20002 | 20104 | 20004 | 116081 | 106308 | 40108 | 20204 | 20004 | 30206 | 40008 | 20012 | 20000 | 20100 |
40204 | 30059 | 40114 | 20112 | 20002 | 20104 | 20004 | 116085 | 106316 | 40108 | 20204 | 20004 | 30206 | 40008 | 20012 | 20000 | 20100 |
40204 | 30059 | 40114 | 20112 | 20002 | 20104 | 20004 | 116080 | 106306 | 40108 | 20204 | 20004 | 30206 | 40008 | 20012 | 20000 | 20100 |
40204 | 30059 | 40114 | 20112 | 20002 | 20104 | 20004 | 116070 | 106286 | 40108 | 20204 | 20004 | 30206 | 40008 | 20013 | 20000 | 20100 |
40204 | 30059 | 40114 | 20112 | 20002 | 20104 | 20004 | 116082 | 106310 | 40108 | 20204 | 20004 | 30206 | 40008 | 20012 | 20000 | 20100 |
40204 | 30059 | 40114 | 20112 | 20002 | 20104 | 20004 | 116074 | 106294 | 40108 | 20204 | 20004 | 30206 | 40008 | 20012 | 20000 | 20100 |
40204 | 30059 | 40114 | 20112 | 20002 | 20104 | 20004 | 116069 | 106284 | 40108 | 20204 | 20004 | 30206 | 40008 | 20012 | 20000 | 20100 |
Result (median cycles for code): 3.0056
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40027 | 30632 | 40254 | 20184 | 20070 | 20104 | 20004 | 115887 | 106318 | 40018 | 20024 | 20004 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30065 | 40018 | 20018 | 20000 | 20010 | 20000 | 115672 | 106049 | 40010 | 20020 | 20000 | 30020 | 40000 | 20009 | 20000 | 20010 |
40024 | 30056 | 40017 | 20017 | 20000 | 20010 | 20000 | 115731 | 106164 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115731 | 106155 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115730 | 106164 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30056 | 40017 | 20017 | 20000 | 20010 | 20000 | 115717 | 106136 | 40010 | 20020 | 20000 | 30077 | 40074 | 20048 | 20000 | 20010 |
40024 | 30063 | 40017 | 20017 | 20000 | 20010 | 20000 | 115737 | 106178 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40017 | 20017 | 20000 | 20010 | 20000 | 115735 | 106170 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115711 | 106126 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115706 | 106112 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
Code:
stadd w0, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 12.9761
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30205 | 130156 | 41368 | 21330 | 20038 | 10131 | 20000 | 2450044 | 2311933 | 30100 | 10200 | 20000 | 20200 | 40000 | 21310 | 20000 | 10100 |
30204 | 129761 | 41410 | 21410 | 20000 | 10100 | 20000 | 2450048 | 2311933 | 30100 | 10200 | 20000 | 20200 | 40000 | 21310 | 20000 | 10100 |
30204 | 129761 | 41410 | 21410 | 20000 | 10100 | 20000 | 2450048 | 2311933 | 30100 | 10200 | 20000 | 20200 | 40000 | 21310 | 20000 | 10100 |
30204 | 129761 | 41410 | 21410 | 20000 | 10100 | 20000 | 2450048 | 2311933 | 30100 | 10200 | 20000 | 20252 | 40094 | 21354 | 20000 | 10100 |
30204 | 129752 | 41406 | 21406 | 20000 | 10100 | 20000 | 2450048 | 2311933 | 30100 | 10200 | 20000 | 20200 | 40000 | 21310 | 20000 | 10100 |
30204 | 129766 | 41410 | 21410 | 20000 | 10100 | 20000 | 2450048 | 2311933 | 30100 | 10200 | 20000 | 20200 | 40000 | 21309 | 20000 | 10100 |
30204 | 129751 | 41359 | 21359 | 20000 | 10100 | 20000 | 2450028 | 2311915 | 30100 | 10200 | 20000 | 20200 | 40000 | 21310 | 20000 | 10100 |
30204 | 129761 | 41410 | 21410 | 20000 | 10100 | 20000 | 2450048 | 2311933 | 30100 | 10200 | 20000 | 20200 | 40000 | 21310 | 20000 | 10100 |
30204 | 129761 | 41410 | 21410 | 20000 | 10100 | 20051 | 2403929 | 2271429 | 30181 | 10230 | 20058 | 20200 | 40000 | 21274 | 20000 | 10100 |
30204 | 129530 | 41374 | 21374 | 20000 | 10100 | 20000 | 2473338 | 2333811 | 30100 | 10200 | 20000 | 20240 | 40080 | 21415 | 20000 | 10100 |
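The result of 12.9761 for this run is consistent with taking the median of the `cycle (02)` column and dividing by the number of code instances executed. The sketch below checks that arithmetic; the instance count of 10,000 is an assumption inferred from the 20000 `ldst retires` at two load/store uops per `stadd`, and the names `CYCLES`, `INSTANCES`, and `median_cycles_per_instance` are hypothetical.

```python
from statistics import median

# "cycle (02)" column of the ten samples in the table above.
CYCLES = [130156, 129761, 129761, 129761, 129752,
          129766, 129751, 129761, 129761, 129530]

# Assumption: 10,000 instances of the code per sample
# (20000 ldst retires / 2 ldst uops per stadd).
INSTANCES = 10_000


def median_cycles_per_instance(cycles, instances):
    """Median cycle count across samples, normalized per code instance."""
    return median(cycles) / instances


if __name__ == "__main__":
    print(f"{median_cycles_per_instance(CYCLES, INSTANCES):.4f}")
```

This relation is only checked against the table directly above. Other result lines may come from runs or corrections that their tables do not show, so it should not be assumed to hold for every table.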
Result (median cycles for code): 12.9738
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30025 | 134804 | 41775 | 21739 | 20036 | 10041 | 20000 | 2467601 | 2323517 | 30010 | 10020 | 20000 | 20020 | 40000 | 21361 | 20000 | 10010 |
30024 | 130350 | 41372 | 21371 | 20001 | 10010 | 20000 | 2456143 | 2312957 | 30010 | 10020 | 20000 | 20020 | 40000 | 21252 | 20000 | 10010 |
30024 | 126476 | 41112 | 21112 | 20000 | 10010 | 20000 | 2388716 | 2253508 | 30010 | 10020 | 20000 | 20020 | 40000 | 21101 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20000 | 2388716 | 2253508 | 30010 | 10020 | 20000 | 20020 | 40000 | 21101 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20000 | 2388716 | 2253508 | 30010 | 10020 | 20000 | 20020 | 40000 | 21101 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20000 | 2388716 | 2253508 | 30010 | 10020 | 20000 | 20020 | 40000 | 21101 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20000 | 2388716 | 2253508 | 30010 | 10020 | 20000 | 20026 | 40009 | 21080 | 20000 | 10010 |
30024 | 126472 | 41111 | 21111 | 20000 | 10010 | 20304 | 2350831 | 2221749 | 30488 | 10194 | 20344 | 20020 | 40000 | 21656 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20050 | 2357645 | 2226511 | 30088 | 10048 | 20055 | 20020 | 40000 | 21101 | 20000 | 10010 |
30024 | 125512 | 40976 | 20930 | 20046 | 10038 | 20048 | 2401128 | 2264818 | 30086 | 10048 | 20055 | 20020 | 40000 | 21148 | 20000 | 10010 |