Apple Microarchitecture Research by Dougall Johnson
Code:
steorh w0, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 3.000
Issues: 3.001
Integer unit issues: 1.002
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
73006 | 34944 | 3055 | 1033 | 2022 | 1011 | 2000 | 7760 | 10511 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 35013 | 3003 | 1003 | 2000 | 1000 | 2000 | 7760 | 10511 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34194 | 3002 | 1002 | 2000 | 1000 | 2000 | 7760 | 10511 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34184 | 3002 | 1002 | 2000 | 1000 | 2000 | 7760 | 10511 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34169 | 3002 | 1002 | 2000 | 1000 | 2000 | 7760 | 10511 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34147 | 3002 | 1002 | 2000 | 1000 | 2000 | 7760 | 10511 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34147 | 3002 | 1002 | 2000 | 1000 | 2000 | 7760 | 10511 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73005 | 34184 | 3004 | 1002 | 2002 | 1001 | 2000 | 7760 | 10511 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34178 | 3002 | 1002 | 2000 | 1000 | 2000 | 7760 | 10511 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
73004 | 34182 | 3002 | 1002 | 2000 | 1000 | 2000 | 7760 | 10511 | 3000 | 1000 | 2000 | 2000 | 4000 | 1002 | 2000 | 1000 |
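The summary figures above (retires, issues, integer and load/store issues) appear to be per-instruction values derived from the raw counters in this table. The following is a minimal sketch of one way to reproduce them, not the author's actual method. It assumes the code was unrolled 1000 times (`UNROLLS` is an assumption, not stated in this section) and uses the median of the ten samples in each column. Only the first five columns are copied. The results land close to the reported values (3.004 vs. 3.000 retires, 3.002 vs. 3.001 issues), so the page presumably applies some additional overhead correction that this sketch does not model.

```python
from statistics import median

# First five columns of the table above, one tuple per sample row:
# retire uop (01), cycle (02), schedule uop (52),
# schedule int uop (53), schedule ldst uop (55)
ROWS = [
    (73006, 34944, 3055, 1033, 2022),
    (73004, 35013, 3003, 1003, 2000),
    (73004, 34194, 3002, 1002, 2000),
    (73004, 34184, 3002, 1002, 2000),
    (73004, 34169, 3002, 1002, 2000),
    (73004, 34147, 3002, 1002, 2000),
    (73004, 34147, 3002, 1002, 2000),
    (73005, 34184, 3004, 1002, 2002),
    (73004, 34178, 3002, 1002, 2000),
    (73004, 34182, 3002, 1002, 2000),
]

UNROLLS = 1000  # ASSUMPTION: unroll count is not stated in this section
NOPS = 70       # from the "Retires (minus 70 nops)" line

def per_instance(column: int) -> float:
    """Median of one counter column, divided by the assumed unroll count."""
    return median(row[column] for row in ROWS) / UNROLLS

retires = per_instance(0) - NOPS   # uops retired by steorh alone
issues = per_instance(2)           # uops scheduled, all units
int_issues = per_instance(3)       # uops scheduled on integer units
ldst_issues = per_instance(4)      # uops scheduled on load/store units

print(f"retires (minus nops): {retires:.3f}")
print(f"issues:               {issues:.3f}")
print(f"int unit issues:      {int_issues:.3f}")
print(f"ld/st unit issues:    {ldst_issues:.3f}")
```

Under these assumptions the integer and load/store figures (1.002 and 2.000) match the reported ones exactly, which is consistent with steorh splitting into one integer uop and two load/store uops.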
Code:
steorh w0, [x6]
add x6, x6, 2
(fused SUBS/B.cc loop)
Result (median cycles for code): 3.0063
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40206 | 30377 | 40258 | 20215 | 20043 | 20168 | 20007 | 115817 | 106038 | 40114 | 20207 | 20007 | 30211 | 40013 | 20010 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 115854 | 105855 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40115 | 20110 | 20005 | 20107 | 20004 | 115912 | 105930 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 115848 | 105843 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 115837 | 105821 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 115850 | 105847 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 115898 | 105898 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 115820 | 105791 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 115842 | 105831 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
40204 | 30063 | 40110 | 20108 | 20002 | 20104 | 20004 | 115850 | 105847 | 40108 | 20204 | 20004 | 30206 | 40008 | 20008 | 20000 | 20100 |
Result (median cycles for code): 3.0059
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40026 | 30403 | 40168 | 20124 | 20044 | 20075 | 20004 | 115718 | 105896 | 40018 | 20024 | 20004 | 30031 | 40013 | 20010 | 20000 | 20010 |
40024 | 30059 | 40017 | 20017 | 20000 | 20010 | 20000 | 115560 | 105738 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30060 | 40018 | 20018 | 20000 | 20010 | 20000 | 115567 | 105748 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30059 | 40017 | 20017 | 20000 | 20010 | 20000 | 115552 | 105721 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30059 | 40017 | 20017 | 20000 | 20010 | 20000 | 115576 | 105765 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30059 | 40017 | 20017 | 20000 | 20010 | 20000 | 115573 | 105760 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30059 | 40017 | 20017 | 20000 | 20010 | 20000 | 115559 | 105736 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30059 | 40017 | 20017 | 20000 | 20010 | 20000 | 115561 | 105733 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30059 | 40017 | 20017 | 20000 | 20010 | 20000 | 115589 | 105766 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 30059 | 40017 | 20017 | 20000 | 20010 | 20000 | 115566 | 105746 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
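The "Result (median cycles for code)" lines for the two tables above can be checked against the raw cycle (02) column. The sketch below assumes that each measurement executes the code 10,000 times in total (`INSTANCES`, the product of unrolls and loop iterations, is an assumption and is not stated in this section). The sample lists are the cycle columns copied from the two tables.

```python
from statistics import median

INSTANCES = 10_000  # ASSUMPTION: unrolls x iterations, not stated in this section

# cycle (02) column of the first steorh/add table
cycles_first = [30377, 30063, 30063, 30063, 30063,
                30063, 30063, 30063, 30063, 30063]

# cycle (02) column of the second steorh/add table
cycles_second = [30403, 30059, 30060, 30059, 30059,
                 30059, 30059, 30059, 30059, 30059]

def cycles_per_instance(samples: list[int]) -> float:
    """Median cycle count divided by the assumed number of code instances."""
    return median(samples) / INSTANCES

first = cycles_per_instance(cycles_first)
second = cycles_per_instance(cycles_second)

print(f"{first:.4f}")   # 3.0063
print(f"{second:.4f}")  # 3.0059
```

Taking the median rather than the mean discards the warm-up outlier in the first row of each table (30377 and 30403 cycles).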
Code:
steorh w0, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 12.9754
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
30205 | 130154 | 41363 | 21327 | 20036 | 10130 | 20000 | 2450042 | 2311933 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21310 | 20000 | 0 | 10100 |
30204 | 129761 | 41410 | 21410 | 20000 | 10100 | 20000 | 2449943 | 2311807 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21309 | 20000 | 0 | 10100 |
30205 | 125608 | 40993 | 20948 | 20045 | 10129 | 20000 | 2449943 | 2311807 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21309 | 20000 | 0 | 10100 |
30204 | 129754 | 41409 | 21409 | 20000 | 10100 | 20050 | 2389491 | 2258531 | 30180 | 10230 | 20056 | 20200 | 40000 | 0 | 21257 | 20000 | 0 | 10100 |
30204 | 129272 | 41357 | 21357 | 20000 | 10100 | 20000 | 2440080 | 2302974 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21257 | 20000 | 0 | 10100 |
30204 | 129279 | 41381 | 21381 | 20000 | 10100 | 20000 | 2440080 | 2302974 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21257 | 20000 | 0 | 10100 |
30204 | 128793 | 41323 | 21323 | 20000 | 10100 | 20049 | 2423110 | 2288770 | 30178 | 10229 | 20055 | 20200 | 40000 | 0 | 21223 | 20000 | 0 | 10100 |
30204 | 128784 | 41313 | 21313 | 20000 | 10100 | 20000 | 2382064 | 2251984 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21309 | 20000 | 0 | 10100 |
30204 | 129754 | 41409 | 21409 | 20000 | 10100 | 20000 | 2449891 | 2311789 | 30100 | 10200 | 20000 | 20200 | 40000 | 0 | 21309 | 20000 | 0 | 10100 |
30204 | 129754 | 41409 | 21409 | 20000 | 10100 | 20000 | 2449942 | 2311807 | 30100 | 10200 | 20000 | 20256 | 40110 | 0 | 20781 | 20000 | 0 | 10100 |
Result (median cycles for code): 12.7197
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
30025 | 134903 | 41813 | 21761 | 20052 | 10045 | 20000 | 2467601 | 2323517 | 30010 | 10020 | 20000 | 20020 | 40000 | 0 | 21361 | 20000 | 0 | 10010 |
30024 | 130350 | 41372 | 21371 | 20001 | 10010 | 20000 | 2467605 | 2323517 | 30010 | 10020 | 20000 | 20020 | 40000 | 0 | 21273 | 20000 | 0 | 10010 |
30025 | 130623 | 41362 | 21316 | 20046 | 10038 | 20000 | 2499138 | 2351681 | 30010 | 10020 | 20000 | 20020 | 40000 | 0 | 21453 | 20000 | 0 | 10010 |
30024 | 131899 | 41463 | 21462 | 20001 | 10010 | 20000 | 2499138 | 2351681 | 30010 | 10020 | 20000 | 20076 | 40109 | 0 | 21113 | 20000 | 0 | 10010 |
30024 | 129759 | 41275 | 21275 | 20000 | 10010 | 20000 | 2456248 | 2313083 | 30010 | 10020 | 20000 | 20020 | 40000 | 0 | 21273 | 20000 | 0 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456248 | 2313083 | 30010 | 10020 | 20000 | 20020 | 40000 | 0 | 21273 | 20000 | 0 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20049 | 2418537 | 2280214 | 30087 | 10050 | 20058 | 20020 | 40000 | 0 | 21273 | 20000 | 0 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20040 | 2456996 | 2313858 | 30071 | 10041 | 20042 | 20020 | 40000 | 0 | 21273 | 20000 | 0 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456248 | 2313083 | 30010 | 10020 | 20000 | 20020 | 40000 | 0 | 21273 | 20000 | 0 | 10010 |
30024 | 129754 | 41282 | 21282 | 20000 | 10010 | 20000 | 2456248 | 2313083 | 30010 | 10020 | 20000 | 20020 | 40000 | 0 | 21253 | 20000 | 0 | 10010 |