Apple Microarchitecture Research by Dougall Johnson
Code:
steorlb w0, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 3.000
Issues: 3.002
Integer unit issues: 1.003
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
73005 | 34382 | 3033 | 1023 | 0 | 2010 | 1005 | 0 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34122 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34093 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34096 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34101 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34455 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34108 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34112 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34305 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7769 | 10520 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
73004 | 34290 | 3003 | 1003 | 0 | 2000 | 1000 | 0 | 2000 | 7770 | 10521 | 3000 | 1000 | 2000 | 2000 | 4000 | 1003 | 2000 | 1000 |
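The per-instruction figures listed before this table appear to be raw counter values divided by the number of times the code sequence ran. The sketch below is one way to approximate that arithmetic from a typical row. It assumes 1000 repetitions per sample (inferred from the counts, not stated on the page), and the names `row`, `REPS`, and `per_rep` are illustrative only.

```python
# One typical sample row from the table above (counter name -> raw count).
row = {
    "retire uop (01)": 73004,
    "schedule int uop (53)": 1003,
    "schedule simd uop (54)": 0,
    "schedule ldst uop (55)": 2000,
}

REPS = 1000  # assumption: repetitions of the code sequence per sample
NOPS = 70    # padding nops in the measured code sequence


def per_rep(count, reps=REPS):
    """Convert a raw counter value into a per-repetition figure."""
    return count / reps


int_issues = per_rep(row["schedule int uop (53)"])
simd_issues = per_rep(row["schedule simd uop (54)"])
ldst_issues = per_rep(row["schedule ldst uop (55)"])
retires_minus_nops = per_rep(row["retire uop (01)"]) - NOPS

print(f"Integer unit issues:    {int_issues:.3f}")
print(f"Load/store unit issues: {ldst_issues:.3f}")
print(f"SIMD/FP unit issues:    {simd_issues:.3f}")
print(f"Retires (minus nops):   {retires_minus_nops:.3f}")
```

Under these assumptions the three unit figures agree with the reported 1.003, 2.000, and 0.000. The retire figure comes out slightly above the reported 3.000, so the page presumably applies some overhead correction that this sketch does not reproduce.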
Code:
steorlb w0, [x6]
add x6, x6, 2
(fused SUBS/B.cc loop)
Result (median cycles for code): 6.0065
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40207 | 60420 | 40299 | 20229 | 20070 | 20184 | 20002 | 115719 | 95587 | 40104 | 20202 | 20002 | 30203 | 40004 | 20006 | 20000 | 20100 |
40204 | 60065 | 40111 | 20108 | 20003 | 20105 | 20002 | 115717 | 95581 | 40104 | 20202 | 20002 | 30203 | 40004 | 20006 | 20000 | 20100 |
40204 | 60065 | 40106 | 20106 | 20000 | 20102 | 20036 | 99901 | 102713 | 40172 | 20236 | 20036 | 30203 | 40004 | 20006 | 20000 | 20100 |
40205 | 60121 | 40172 | 20144 | 20028 | 20133 | 20002 | 115719 | 95588 | 40104 | 20202 | 20002 | 30203 | 40004 | 20006 | 20000 | 20100 |
40204 | 60065 | 40106 | 20106 | 20000 | 20102 | 20002 | 115725 | 95596 | 40104 | 20202 | 20002 | 30203 | 40004 | 20006 | 20000 | 20100 |
40204 | 60065 | 40106 | 20106 | 20000 | 20102 | 20002 | 115721 | 95589 | 40104 | 20202 | 20002 | 30203 | 40004 | 20006 | 20000 | 20100 |
40204 | 60065 | 40106 | 20106 | 20000 | 20102 | 20002 | 115719 | 95587 | 40104 | 20202 | 20002 | 30203 | 40004 | 20006 | 20000 | 20100 |
40204 | 60065 | 40106 | 20106 | 20000 | 20102 | 20002 | 115717 | 95584 | 40104 | 20202 | 20002 | 30203 | 40004 | 20006 | 20000 | 20100 |
40204 | 60067 | 40106 | 20106 | 20000 | 20102 | 20002 | 115727 | 95599 | 40104 | 20202 | 20002 | 30203 | 40004 | 20006 | 20000 | 20100 |
40204 | 60065 | 40106 | 20106 | 20000 | 20102 | 20002 | 115713 | 95575 | 40104 | 20202 | 20002 | 30203 | 40004 | 20006 | 20000 | 20100 |
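The "Result (median cycles for code)" value for the table above can be reproduced from its cycle (02) column. The sketch below assumes 10000 loop iterations per sample, which is inferred from the counter magnitudes rather than stated on the page. The function name is hypothetical.

```python
from statistics import median

# cycle (02) column from the ten samples in the table above
cycles = [60420, 60065, 60065, 60121, 60065, 60065, 60065, 60065, 60067, 60065]

ITERATIONS = 10000  # assumption: loop iterations per sample


def median_cycles_per_iteration(samples, iterations=ITERATIONS):
    """Median of the raw cycle counts, scaled to one loop iteration."""
    return median(samples) / iterations


result = median_cycles_per_iteration(cycles)
print(f"{result:.4f}")  # prints 6.0065
```

The median discards the warm-up outlier in the first sample (60420), which a mean would not. The same calculation does not exactly reproduce the noisier results further down the page, so those figures may be taken over more samples than the ten rows shown.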
Result (median cycles for code): 6.0055
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40027 | 60324 | 40203 | 20138 | 20065 | 20088 | 20002 | 115422 | 95432 | 40014 | 20022 | 20002 | 30020 | 40000 | 20005 | 20000 | 20010 |
40024 | 60062 | 40015 | 20015 | 20000 | 20010 | 20000 | 115281 | 95291 | 40010 | 20020 | 20000 | 30020 | 40000 | 20004 | 20000 | 20010 |
40024 | 60055 | 40015 | 20015 | 20000 | 20010 | 20000 | 115275 | 95287 | 40010 | 20020 | 20000 | 30020 | 40000 | 20004 | 20000 | 20010 |
40024 | 60055 | 40016 | 20016 | 20000 | 20010 | 20000 | 115299 | 95318 | 40010 | 20020 | 20000 | 30020 | 40000 | 20005 | 20000 | 20010 |
40024 | 60055 | 40014 | 20014 | 20000 | 20010 | 20000 | 115275 | 95288 | 40010 | 20020 | 20000 | 30020 | 40000 | 20004 | 20000 | 20010 |
40024 | 60055 | 40015 | 20015 | 20000 | 20010 | 20000 | 115279 | 95296 | 40010 | 20020 | 20000 | 30020 | 40000 | 20004 | 20000 | 20010 |
40024 | 60055 | 40015 | 20015 | 20000 | 20010 | 20000 | 115274 | 95287 | 40010 | 20020 | 20000 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 60055 | 40015 | 20015 | 20000 | 20010 | 20000 | 115285 | 95299 | 40010 | 20020 | 20000 | 30020 | 40000 | 20004 | 20000 | 20010 |
40024 | 60055 | 40014 | 20014 | 20000 | 20010 | 20000 | 115276 | 95290 | 40010 | 20020 | 20000 | 30068 | 40060 | 20039 | 20000 | 20010 |
40024 | 60055 | 40014 | 20014 | 20000 | 20010 | 20000 | 115282 | 95290 | 40010 | 20020 | 20000 | 30020 | 40000 | 20005 | 20000 | 20010 |
Code:
steorlb w0, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 10.7485
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30205 | 111149 | 43450 | 21038 | 22412 | 12573 | 20759 | 1992686 | 1931308 | 31502 | 10843 | 21274 | 20716 | 40967 | 19659 | 20000 | 10100 |
30204 | 110765 | 43190 | 21170 | 22020 | 12090 | 20337 | 1984510 | 1922733 | 30715 | 10478 | 20507 | 20788 | 41120 | 19362 | 20000 | 10100 |
30204 | 107980 | 41397 | 20281 | 21116 | 11113 | 20930 | 1961774 | 1910894 | 31800 | 10978 | 21477 | 21264 | 42076 | 19698 | 20000 | 10100 |
30204 | 106628 | 39706 | 19406 | 20300 | 10316 | 20320 | 1939629 | 1918942 | 30662 | 10443 | 20460 | 20204 | 40010 | 19275 | 20000 | 10100 |
30204 | 105989 | 39683 | 19322 | 20361 | 10382 | 20327 | 1885593 | 1873715 | 30716 | 10512 | 20536 | 20514 | 40603 | 19287 | 20000 | 10100 |
30204 | 108124 | 40980 | 19934 | 21046 | 11011 | 20598 | 1900403 | 1894788 | 31160 | 10662 | 20910 | 21964 | 43156 | 19814 | 20000 | 10100 |
30204 | 107595 | 40847 | 19991 | 20856 | 10828 | 20396 | 1894939 | 1881580 | 30848 | 10555 | 20704 | 20288 | 40166 | 19513 | 20000 | 10100 |
30204 | 109526 | 40386 | 19798 | 20588 | 10453 | 20758 | 1906519 | 1896273 | 31485 | 10827 | 21209 | 22170 | 43668 | 19929 | 20000 | 10100 |
30205 | 108934 | 42067 | 20504 | 21563 | 11628 | 20582 | 1966212 | 1914562 | 31179 | 10701 | 20975 | 21002 | 41570 | 19416 | 20000 | 10100 |
30204 | 107607 | 40120 | 19681 | 20439 | 10358 | 22064 | 1998461 | 1941082 | 33925 | 11969 | 23303 | 23729 | 46455 | 20587 | 20000 | 10100 |
Result (median cycles for code): 11.4098
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30025 | 114636 | 47058 | 22544 | 24514 | 15174 | 24632 | 2089414 | 2011858 | 38886 | 14271 | 28031 | 28190 | 54831 | 22894 | 20000 | 10010 |
30024 | 115385 | 45665 | 23417 | 22248 | 11919 | 24020 | 2092038 | 2015359 | 37560 | 13551 | 26770 | 28124 | 54551 | 22539 | 20000 | 10010 |
30024 | 114391 | 46533 | 22689 | 23844 | 13903 | 24527 | 2074589 | 2000247 | 38453 | 13939 | 27606 | 27294 | 53239 | 22448 | 20000 | 10010 |
30024 | 114119 | 46567 | 22403 | 24164 | 14078 | 24475 | 2086145 | 2009575 | 38514 | 14050 | 27552 | 28260 | 54817 | 22551 | 20000 | 10010 |
30024 | 114334 | 46401 | 22254 | 24147 | 14257 | 24911 | 2082059 | 2006608 | 39419 | 14521 | 28391 | 28278 | 54554 | 22341 | 20000 | 10010 |
30024 | 114218 | 46631 | 22228 | 24403 | 14289 | 23987 | 2089831 | 2012894 | 37510 | 13534 | 26640 | 28227 | 54779 | 22357 | 20000 | 10010 |
30024 | 113628 | 45840 | 21937 | 23903 | 13900 | 25174 | 2083520 | 2008086 | 39813 | 14654 | 28541 | 29429 | 56349 | 22428 | 20000 | 10010 |
30025 | 114603 | 46177 | 22413 | 23764 | 14028 | 24598 | 2085387 | 2009899 | 38548 | 13961 | 27756 | 29608 | 56542 | 22486 | 20000 | 10010 |
30024 | 113965 | 47174 | 24365 | 22809 | 13344 | 25073 | 2081006 | 2005679 | 39603 | 14541 | 28306 | 28154 | 55024 | 22334 | 20000 | 10010 |
30024 | 114375 | 46826 | 22642 | 24184 | 14222 | 24637 | 2076314 | 2001068 | 38713 | 14090 | 27795 | 28710 | 55674 | 22489 | 20000 | 10010 |