Apple Microarchitecture Research by Dougall Johnson
Code:
staddh w0, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 3.000
Issues: 3.002
Integer unit issues: 1.003
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
73005 | 35282 | 3019 | 1015 | 2004 | 1002 | 2000 | 7773 | 10527 | 3000 | 1000 | 2000 | 2000 | 4000 | 1004 | 2000 | 0 | 1000 |
73004 | 34154 | 3003 | 1003 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 0 | 1000 |
73004 | 34197 | 3002 | 1002 | 2000 | 1000 | 2000 | 7770 | 10529 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 0 | 1000 |
73004 | 35001 | 3003 | 1003 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 0 | 1000 |
73004 | 34151 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 0 | 1000 |
73004 | 34143 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 0 | 1000 |
73004 | 34159 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 0 | 1000 |
73004 | 34209 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 0 | 1000 |
73004 | 34177 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 0 | 1000 |
73004 | 34201 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 0 | 1000 |
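The summary figures above (Retires, Issues, Load/store unit issues) can be approximately recovered from the raw counter rows. The sketch below assumes each sample row covers 1000 executions of the test sequence; that count is inferred from the counter magnitudes and is not stated on the page. The names `COPIES`, `NOPS` and `per_copy` are hypothetical, introduced only for illustration.

```python
from statistics import median

# Assumption: each sample row covers 1000 executions of the sequence
# (one staddh followed by 70 nops). Inferred from the data, not stated.
COPIES = 1000
NOPS = 70

# Columns copied from the ten sample rows of the table above.
retire_uop = [73005] + [73004] * 9                      # retire uop (01)
schedule_uop = [3019, 3003, 3002, 3003] + [3002] * 6    # schedule uop (52)
schedule_ldst_uop = [2004] + [2000] * 9                 # schedule ldst uop (55)

def per_copy(samples):
    """Median counter value divided by the assumed number of executions."""
    return median(samples) / COPIES

retires = per_copy(retire_uop) - NOPS   # subtract the 70 padding nops
issues = per_copy(schedule_uop)
ldst_issues = per_copy(schedule_ldst_uop)

print(f"retires minus nops: {retires:.3f}")
print(f"issues:             {issues:.3f}")
print(f"load/store issues:  {ldst_issues:.3f}")
```

Under these assumptions the values land within about 0.005 of the reported figures. The page's exact method (for example, how fixed harness overhead is subtracted) is not stated, so small differences are expected.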
Code:
staddh w0, [x6]
add x6, x6, 2
(fused SUBS/B.cc loop)
Result (median cycles for code): 3.0066
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40206 | 30364 | 40258 | 20215 | 20043 | 20169 | 20007 | 115870 | 106072 | 40114 | 20207 | 20007 | 30211 | 40013 | 20011 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115958 | 105929 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40116 | 20111 | 20005 | 20107 | 20004 | 115912 | 105886 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115959 | 105931 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115970 | 105953 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40205 | 30132 | 40186 | 20150 | 20036 | 20138 | 20004 | 115940 | 105942 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115966 | 105945 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115959 | 105933 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115915 | 105888 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
40204 | 30066 | 40111 | 20109 | 20002 | 20104 | 20004 | 115906 | 105874 | 40108 | 20204 | 20004 | 30206 | 40008 | 20009 | 20000 | 20100 |
Result (median cycles for code): 3.0056
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40027 | 30736 | 40270 | 20200 | 20070 | 20104 | 20004 | 115677 | 105891 | 40018 | 20024 | 20004 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30063 | 40017 | 20017 | 20000 | 20010 | 20000 | 115473 | 105650 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115484 | 105670 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115461 | 105626 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115495 | 105683 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115474 | 105650 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115482 | 105669 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40017 | 20017 | 20000 | 20010 | 20000 | 115497 | 105685 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40016 | 20016 | 20000 | 20010 | 20000 | 115522 | 105702 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
40024 | 30056 | 40017 | 20017 | 20000 | 20010 | 20000 | 115477 | 105658 | 40010 | 20020 | 20000 | 30020 | 40000 | 20006 | 20000 | 20010 |
Code:
staddh w0, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 12.6476
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
30206 | 128717 | 41291 | 21210 | 20081 | 10158 | 20000 | 2382309 | 2252220 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21102 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382541 | 2252434 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21102 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382541 | 2252434 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21102 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382541 | 2252434 | 30100 | 10200 | 20000 | 20300 | 40192 | 0 | 20838 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20050 | 2441568 | 2305015 | 30178 | 10228 | 20056 | 20270 | 40137 | 0 | 21311 | 20000 | 0 | 10100 |
30204 | 126449 | 41199 | 21199 | 20000 | 10100 | 20000 | 2382541 | 2252434 | 30100 | 10200 | 20000 | 20262 | 40119 | 0 | 21227 | 20000 | 0 | 10100 |
30204 | 121277 | 40793 | 20793 | 20000 | 10100 | 20000 | 2341329 | 2217081 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 20850 | 20000 | 0 | 10100 |
30204 | 124509 | 40951 | 20951 | 20000 | 10100 | 20000 | 2341329 | 2217081 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 20851 | 20000 | 0 | 10100 |
30204 | 124509 | 40951 | 20951 | 20000 | 10100 | 20000 | 2382541 | 2252434 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21102 | 20000 | 0 | 10100 |
30204 | 126476 | 41202 | 21202 | 20000 | 10100 | 20000 | 2382541 | 2252434 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21102 | 20000 | 0 | 10100 |
Result (median cycles for code): 12.6469
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30025 | 130316 | 41325 | 21272 | 20053 | 10045 | 20008 | 2275090 | 2153967 | 30024 | 10027 | 20015 | 20020 | 40000 | 21102 | 20000 | 10010 |
30024 | 126476 | 41112 | 21112 | 20000 | 10010 | 20000 | 2388821 | 2253634 | 30010 | 10020 | 20000 | 20020 | 40000 | 21102 | 20000 | 10010 |
30025 | 126370 | 41143 | 21097 | 20046 | 10040 | 20000 | 2388438 | 2252873 | 30010 | 10020 | 20000 | 20020 | 40000 | 21102 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20000 | 2388821 | 2253634 | 30010 | 10020 | 20000 | 20020 | 40000 | 21102 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20000 | 2388821 | 2253634 | 30010 | 10020 | 20000 | 20020 | 40000 | 21102 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20000 | 2388821 | 2253634 | 30010 | 10020 | 20000 | 20020 | 40000 | 21102 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20000 | 2388821 | 2253634 | 30010 | 10020 | 20000 | 20076 | 40109 | 21083 | 20000 | 10010 |
30024 | 126476 | 41111 | 21111 | 20000 | 10010 | 20000 | 2388821 | 2253634 | 30010 | 10020 | 20000 | 20020 | 40000 | 21102 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20000 | 2388821 | 2253634 | 30010 | 10020 | 20000 | 20020 | 40000 | 21102 | 20000 | 10010 |
30024 | 126469 | 41111 | 21111 | 20000 | 10010 | 20008 | 2275085 | 2153974 | 30024 | 10027 | 20015 | 20020 | 40000 | 21101 | 20000 | 10010 |
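Each "Result (median cycles for code)" line in this section appears to be the median of the cycle (02) column divided by the number of times the code ran per sample. The sketch below checks that reading against the four loop tables, assuming 10000 executions per sample; that figure is inferred from the counter magnitudes and is not stated on the page. The names `EXECUTIONS` and `cycle_samples` are hypothetical.

```python
from statistics import median

# Assumption: the code under test runs 10000 times per sample
# (inferred from the counters, e.g. "? ldst retires" = 20000 for 2 uops each).
EXECUTIONS = 10000

# cycle (02) columns copied from the four loop tables, keyed by the
# reported result printed above each table.
cycle_samples = {
    3.0066: [30364, 30066, 30066, 30066, 30066,
             30132, 30066, 30066, 30066, 30066],
    3.0056: [30736, 30063, 30056, 30056, 30056,
             30056, 30056, 30056, 30056, 30056],
    12.6476: [128717, 126476, 126476, 126476, 126476,
              126449, 121277, 124509, 124509, 126476],
    12.6469: [130316, 126476, 126370, 126469, 126469,
              126469, 126469, 126476, 126469, 126469],
}

# Median cycles per execution, rounded to the four decimals the page prints.
computed = {
    reported: round(median(samples) / EXECUTIONS, 4)
    for reported, samples in cycle_samples.items()
}

for reported, value in computed.items():
    print(f"reported {reported:.4f}  computed {value:.4f}")
```

Using the median rather than the mean keeps outlier samples, such as the first row of each table, from skewing the per-execution figure.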