Apple Microarchitecture Research by Dougall Johnson
M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm): Overview | Base Instructions | SIMD and FP Instructions
Code:
steorl w0, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 3.000
Issues: 3.001
Integer unit issues: 1.002
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
73006 | 34581 | 3040 | 1026 | 2014 | 1007 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 0 | 1002 | 2000 | 0 | 1000 |
73004 | 34174 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 0 | 1002 | 2000 | 0 | 1000 |
73004 | 34178 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 0 | 1002 | 2000 | 0 | 1000 |
73004 | 34153 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 0 | 1002 | 2000 | 0 | 1000 |
73004 | 34149 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 0 | 1002 | 2000 | 0 | 1000 |
73004 | 34167 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2002 | 4004 | 0 | 1003 | 2000 | 0 | 1000 |
73004 | 34371 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 0 | 1002 | 2000 | 0 | 1000 |
73004 | 34116 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 0 | 1002 | 2000 | 0 | 1000 |
73004 | 34155 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 0 | 1002 | 2000 | 0 | 1000 |
73004 | 34346 | 3002 | 1002 | 2000 | 1000 | 2000 | 7762 | 10513 | 3000 | 1000 | 2000 | 2000 | 4000 | 0 | 1002 | 2000 | 0 | 1000 |
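The summary lines for this test are per-copy averages of the raw counters. The following rough Python sketch assumes each row counts 1000 copies of the unrolled code. That number is inferred from roughly 73000 retired uops = 1000 × (70 nops + 3 uops); the page does not state it. Under that assumption, column medians reproduce the integer (1.002) and load/store (2.000) figures. The page's own retire and issue summaries (3.000, 3.001) come out slightly lower than the 3.004 and 3.002 computed here, so the harness probably also subtracts a small baseline that this sketch does not model.

```python
from statistics import median

# Hypothetical constants inferred from the page, not taken from the harness:
REPEATS = 1000  # assumed copies of the code per measured row
NOPS = 70       # filler nops per copy ("Retires (minus 70 nops)")

# Subset of columns copied from the first STEORL table
# (retire uop 01, schedule uop 52, schedule int uop 53, schedule ldst uop 55).
# Rows 2-10 are identical in these four columns.
HEADER = "retire uop (01) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) |"
ROWS = ["73006 | 3040 | 1026 | 2014 |"] + ["73004 | 3002 | 1002 | 2000 |"] * 9


def parse_table(header, rows):
    """Split pipe-delimited rows into {counter name: [values]}."""
    names = [c.strip() for c in header.split("|") if c.strip()]
    columns = {name: [] for name in names}
    for row in rows:
        values = [int(v) for v in row.split("|") if v.strip()]
        for name, value in zip(names, values):
            columns[name].append(value)
    return columns


cols = parse_table(HEADER, ROWS)
# Median per counter, divided by the assumed number of copies.
per_copy = {name: median(vals) / REPEATS for name, vals in cols.items()}
retired = per_copy["retire uop (01)"] - NOPS

print(f"retired uops (minus nops): {retired:.3f}")
print(f"int uops issued:           {per_copy['schedule int uop (53)']:.3f}")
print(f"ldst uops issued:          {per_copy['schedule ldst uop (55)']:.3f}")
```

Using the median rather than the mean keeps the slightly inflated first row (a typical warm-up outlier) from skewing the result.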
Code:
steorl w0, [x6]
add x6, x6, 4
(fused SUBS/B.cc loop)
Result (median cycles for code): 6.0055
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40207 | 60399 | 40332 | 20277 | 20055 | 20168 | 20002 | 115780 | 95691 | 40104 | 20202 | 20002 | 30208 | 40009 | 20007 | 20000 | 20100 |
40204 | 60058 | 40105 | 20105 | 20000 | 20102 | 20002 | 115649 | 95563 | 40104 | 20202 | 20002 | 30203 | 40004 | 20005 | 20000 | 20100 |
40204 | 60058 | 40105 | 20105 | 20000 | 20102 | 20002 | 115643 | 95552 | 40104 | 20202 | 20002 | 30203 | 40004 | 20005 | 20000 | 20100 |
40204 | 60058 | 40105 | 20105 | 20000 | 20102 | 20002 | 115653 | 95568 | 40104 | 20202 | 20002 | 30203 | 40004 | 20005 | 20000 | 20100 |
40204 | 60058 | 40105 | 20105 | 20000 | 20102 | 20002 | 115659 | 95582 | 40104 | 20202 | 20002 | 30203 | 40004 | 20005 | 20000 | 20100 |
40205 | 60106 | 40172 | 20140 | 20032 | 20134 | 20002 | 115645 | 95556 | 40104 | 20202 | 20002 | 30203 | 40004 | 20005 | 20000 | 20100 |
40204 | 60058 | 40105 | 20105 | 20000 | 20102 | 20002 | 115665 | 95595 | 40104 | 20202 | 20002 | 30203 | 40004 | 20005 | 20000 | 20100 |
40204 | 60058 | 40105 | 20105 | 20000 | 20102 | 20002 | 115659 | 95581 | 40104 | 20202 | 20002 | 30203 | 40004 | 20005 | 20000 | 20100 |
40204 | 60058 | 40105 | 20105 | 20000 | 20102 | 20002 | 115649 | 95564 | 40104 | 20202 | 20002 | 30203 | 40004 | 20005 | 20000 | 20100 |
40204 | 60058 | 40105 | 20105 | 20000 | 20102 | 20002 | 115657 | 95578 | 40104 | 20202 | 20002 | 30203 | 40004 | 20005 | 20000 | 20100 |
Result (median cycles for code): 6.0058
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
40027 | 60400 | 40210 | 20158 | 20052 | 20074 | 20002 | 115418 | 95460 | 40014 | 20022 | 20002 | 30028 | 40009 | 20007 | 20000 | 20010 |
40024 | 60058 | 40015 | 20015 | 20000 | 20012 | 20002 | 115420 | 95467 | 40014 | 20022 | 20002 | 30023 | 40004 | 20005 | 20000 | 20010 |
40024 | 60058 | 40015 | 20015 | 20000 | 20010 | 20000 | 115410 | 95453 | 40010 | 20020 | 20000 | 30020 | 40000 | 20005 | 20000 | 20010 |
40024 | 60058 | 40015 | 20015 | 20000 | 20010 | 20000 | 115410 | 95455 | 40010 | 20020 | 20000 | 30020 | 40000 | 20005 | 20000 | 20010 |
40024 | 60058 | 40015 | 20015 | 20000 | 20010 | 20000 | 115414 | 95462 | 40010 | 20020 | 20000 | 30020 | 40000 | 20005 | 20000 | 20010 |
40024 | 60058 | 40015 | 20015 | 20000 | 20010 | 20000 | 115416 | 95465 | 40010 | 20020 | 20000 | 30020 | 40000 | 20005 | 20000 | 20010 |
40024 | 60058 | 40015 | 20015 | 20000 | 20010 | 20000 | 115406 | 95443 | 40010 | 20020 | 20000 | 30020 | 40000 | 20005 | 20000 | 20010 |
40024 | 60058 | 40015 | 20015 | 20000 | 20010 | 20000 | 115406 | 95445 | 40010 | 20020 | 20000 | 30020 | 40000 | 20005 | 20000 | 20010 |
40024 | 60058 | 40015 | 20015 | 20000 | 20010 | 20034 | 105682 | 99635 | 40078 | 20054 | 20034 | 30020 | 40000 | 20007 | 20000 | 20010 |
40024 | 60058 | 40015 | 20015 | 20000 | 20010 | 20000 | 115400 | 95435 | 40010 | 20020 | 20000 | 30020 | 40000 | 20005 | 20000 | 20010 |
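For this looped variant, dividing the counters by the iteration count gives per-iteration figures. The page does not print that count. The sketch below assumes 10,000 iterations, which is what the median cycle counter (60058) divided by the reported ~6.006 cycles implies. Under that assumption, both runs come out at about 6 cycles, 2 load/store uops and 2 integer uops per iteration.

```python
from statistics import median

ITERATIONS = 10_000  # assumption: inferred from the cycle counts, not printed on the page

# (cycle 02, schedule int uop 53, schedule ldst uop 55) per row, copied from the
# two result tables for the "steorl w0, [x6]" / "add x6, x6, 4" test.
RUN_A = [
    (60399, 20277, 20055),
    (60058, 20105, 20000),
    (60058, 20105, 20000),
    (60058, 20105, 20000),
    (60058, 20105, 20000),
    (60106, 20140, 20032),
    (60058, 20105, 20000),
    (60058, 20105, 20000),
    (60058, 20105, 20000),
    (60058, 20105, 20000),
]
RUN_B = [(60400, 20158, 20052)] + [(60058, 20015, 20000)] * 9


def per_iteration(rows):
    """Median of each column divided by the assumed iteration count."""
    cycles, int_uops, ldst_uops = zip(*rows)
    return tuple(median(col) / ITERATIONS for col in (cycles, int_uops, ldst_uops))


for name, rows in (("run A", RUN_A), ("run B", RUN_B)):
    cyc, ints, ldst = per_iteration(rows)
    print(f"{name}: {cyc:.4f} cycles, {ints:.4f} int uops, {ldst:.4f} ldst uops per iteration")
```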
Code:
steorl w0, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 10.7438
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30207 | 112774 | 44410 | 21706 | 0 | 22704 | 12752 | 0 | 21274 | 1971891 | 1925969 | 32366 | 11205 | 21898 | 21574 | 42398 | 20006 | 20000 | 10100 |
30204 | 107509 | 40902 | 19980 | 0 | 20922 | 10950 | 0 | 20209 | 1899928 | 1882525 | 30491 | 10401 | 20370 | 23325 | 45624 | 20087 | 20000 | 10100 |
30204 | 108134 | 41018 | 20036 | 0 | 20982 | 11069 | 0 | 21260 | 1967847 | 1912743 | 32466 | 11307 | 22163 | 24346 | 47566 | 20757 | 20000 | 10100 |
30204 | 107821 | 40755 | 19920 | 0 | 20835 | 10921 | 0 | 20426 | 1923555 | 1886831 | 30884 | 10559 | 20698 | 21836 | 43043 | 19691 | 20000 | 10100 |
30204 | 107176 | 40193 | 19585 | 0 | 20608 | 10585 | 0 | 21541 | 1969232 | 1919076 | 32904 | 11498 | 22444 | 21949 | 43275 | 20031 | 20000 | 10100 |
30204 | 106194 | 39683 | 19422 | 0 | 20261 | 10177 | 0 | 20324 | 1949134 | 1905018 | 30705 | 10496 | 20532 | 20474 | 40497 | 19508 | 20000 | 10100 |
30204 | 106460 | 39746 | 19285 | 0 | 20461 | 10285 | 0 | 20682 | 1921210 | 1887436 | 31383 | 10801 | 21175 | 21633 | 42608 | 19713 | 20000 | 10100 |
30204 | 108042 | 40070 | 19540 | 0 | 20530 | 10437 | 0 | 20195 | 1926291 | 1898922 | 30462 | 10367 | 20325 | 22732 | 44341 | 19990 | 20000 | 10100 |
30204 | 108144 | 41044 | 20078 | 0 | 20966 | 11022 | 0 | 20610 | 1977648 | 1918126 | 31233 | 10724 | 21046 | 21240 | 42007 | 19896 | 20000 | 10100 |
30204 | 107649 | 40615 | 19956 | 0 | 20659 | 10720 | 0 | 20144 | 1953879 | 1926410 | 30377 | 10333 | 20254 | 21989 | 43361 | 19952 | 20000 | 10100 |
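The cycle counts for this variant vary much more from row to row than those for the "add x6, x6, 4" variant. One crude way to quantify this is (max minus min) divided by the median of the cycle column. That measure is an arbitrary illustrative choice, not something the page computes. By this measure, the spread is about 0.6% for the earlier test's first run and about 6% for this run.

```python
from statistics import median

# Cycle counter (02) columns copied from the tables above.
# First run of "steorl w0, [x6]" / "add x6, x6, 4":
ADD_X6_RUN1 = [60399, 60058, 60058, 60058, 60058, 60106, 60058, 60058, 60058, 60058]
# First run of "steorl w0, [x6]" / "mov x7, 8":
MOV_X7_RUN1 = [112774, 107509, 108134, 107821, 107176, 106194, 106460, 108042, 108144, 107649]


def relative_spread(samples):
    """(max - min) / median: a rough run-to-run noise measure."""
    return (max(samples) - min(samples)) / median(samples)


for name, samples in (("add x6 variant", ADD_X6_RUN1), ("mov x7 variant", MOV_X7_RUN1)):
    print(f"{name}: median {median(samples)} cycles, spread {relative_spread(samples):.1%}")
```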
Result (median cycles for code): 11.4198
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
30025 | 114873 | 46622 | 22356 | 24266 | 14736 | 24659 | 2072347 | 1998032 | 38615 | 13969 | 27789 | 27381 | 53318 | 22410 | 20000 | 0 | 10010 |
30024 | 114347 | 46479 | 22463 | 24016 | 14081 | 24873 | 2081691 | 2006469 | 39167 | 14306 | 28010 | 27707 | 54176 | 22395 | 20000 | 0 | 10010 |
30024 | 113946 | 46497 | 22238 | 24259 | 14296 | 25060 | 2071495 | 1998250 | 39483 | 14440 | 28343 | 28149 | 54799 | 22516 | 20000 | 0 | 10010 |
30024 | 114364 | 46667 | 22552 | 24115 | 14095 | 24424 | 2079403 | 2004233 | 38330 | 13916 | 27554 | 27874 | 53982 | 22461 | 20000 | 0 | 10010 |
30024 | 113946 | 46561 | 22313 | 24248 | 14491 | 24788 | 2085492 | 2009681 | 38976 | 14204 | 28146 | 28474 | 55299 | 21975 | 20000 | 0 | 10010 |
30024 | 113865 | 46128 | 22293 | 23835 | 13745 | 25233 | 2087803 | 2012101 | 39925 | 14708 | 28526 | 29210 | 56325 | 21988 | 20000 | 0 | 10010 |
30025 | 114410 | 46908 | 22854 | 24054 | 13939 | 25621 | 2087949 | 2012147 | 40611 | 15003 | 29211 | 27936 | 54112 | 22111 | 20000 | 0 | 10010 |
30024 | 113733 | 45796 | 22017 | 23779 | 14006 | 24718 | 2075643 | 2000428 | 39009 | 14301 | 28039 | 27682 | 53754 | 22066 | 20000 | 0 | 10010 |
30024 | 114270 | 46538 | 22459 | 24079 | 13994 | 24974 | 2075091 | 2000107 | 39437 | 14476 | 28396 | 28656 | 55558 | 21724 | 20000 | 0 | 10010 |
30024 | 113981 | 46419 | 22316 | 24103 | 14077 | 24406 | 2075912 | 2001819 | 38292 | 13900 | 27249 | 28044 | 54488 | 22261 | 20000 | 0 | 10010 |
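Counter 59 (int uops in schedulers) reaches values in the millions while only tens of thousands of uops retire. This suggests that it accumulates scheduler occupancy on every cycle. That reading is an assumption; the page does not document it. Under that assumption, dividing counter 59 by the cycle counter gives the average number of integer uops waiting in the schedulers. The sketch applies this to the first run of the "add x6, x6, 4" test and the second run of the "mov x7, 8" test. The results are roughly 1.9 and roughly 18 uops respectively.

```python
from statistics import median

# (cycle 02, int uops in schedulers 59) pairs copied from the tables above.
ADD_X6_RUN1 = [
    (60399, 115780), (60058, 115649), (60058, 115643), (60058, 115653), (60058, 115659),
    (60106, 115645), (60058, 115665), (60058, 115659), (60058, 115649), (60058, 115657),
]
MOV_X7_RUN2 = [
    (114873, 2072347), (114347, 2081691), (113946, 2071495), (114364, 2079403), (113946, 2085492),
    (113865, 2087803), (114410, 2087949), (113733, 2075643), (114270, 2075091), (113981, 2075912),
]


def mean_occupancy(rows):
    """Median over rows of (counter 59 / cycles).

    Assumes counter 59 adds the current integer-scheduler occupancy every cycle.
    """
    return median(occupancy / cycles for cycles, occupancy in rows)


print(f"add x6 variant: ~{mean_occupancy(ADD_X6_RUN1):.2f} int uops waiting per cycle")
print(f"mov x7 variant: ~{mean_occupancy(MOV_X7_RUN2):.2f} int uops waiting per cycle")
```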