Apple Microarchitecture Research by Dougall Johnson
Code:
ldrh w0, [x6], #8

mov x0, 1
mov x1, 2
mov x8, 0
(no loop instructions)
Retires: 2.000
Issues: 2.000
Integer unit issues: 1.001
Load/store unit issues: 1.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
2005 | 1239 | 2040 | 1021 | 1019 | 1042 | 1000 | 20945 | 17520 | 2000 | 1000 | 1000 | 1000 | 1000 | 1001 | 1000 | 1000 |
2004 | 1073 | 2001 | 1001 | 1000 | 1000 | 1000 | 21231 | 17625 | 2000 | 1000 | 1000 | 1000 | 1000 | 1001 | 1000 | 1000 |
2004 | 1082 | 2001 | 1001 | 1000 | 1000 | 1000 | 21290 | 17983 | 2000 | 1000 | 1000 | 1000 | 1000 | 1001 | 1000 | 1000 |
2004 | 1078 | 2001 | 1001 | 1000 | 1000 | 1000 | 21444 | 17625 | 2000 | 1000 | 1000 | 1000 | 1000 | 1001 | 1000 | 1000 |
2004 | 1073 | 2001 | 1001 | 1000 | 1000 | 1000 | 21319 | 17857 | 2000 | 1000 | 1000 | 1000 | 1000 | 1001 | 1000 | 1000 |
2004 | 1079 | 2001 | 1001 | 1000 | 1000 | 1000 | 20935 | 17986 | 2000 | 1000 | 1000 | 1000 | 1000 | 1001 | 1000 | 1000 |
2004 | 1100 | 2001 | 1001 | 1000 | 1000 | 1000 | 21423 | 17626 | 2000 | 1000 | 1000 | 1000 | 1000 | 1001 | 1000 | 1000 |
2004 | 1097 | 2001 | 1001 | 1000 | 1000 | 1000 | 21455 | 17626 | 2000 | 1000 | 1000 | 1000 | 1000 | 1001 | 1000 | 1000 |
2004 | 1082 | 2001 | 1001 | 1000 | 1000 | 1000 | 21421 | 17605 | 2000 | 1000 | 1000 | 1000 | 1000 | 1001 | 1000 | 1000 |
2004 | 1078 | 2001 | 1001 | 1000 | 1000 | 1000 | 20807 | 17828 | 2000 | 1000 | 1000 | 1000 | 1000 | 1001 | 1000 | 1000 |
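The per-instruction figures above can be related to the raw counter rows with a little arithmetic. The following is only an illustrative sketch, not the author's tooling. It assumes the test body holds 1000 copies of the instruction (inferred from the counts clustering near 1000 and 2000), and the pairing of counter columns with summary lines is likewise an assumption. `COPIES` and `per_copy` are hypothetical names.

```python
from statistics import median

COPIES = 1000  # assumption: number of unrolled copies of the instruction

# Selected columns copied from the ten rows above.
dispatch_uop = [2000] * 10               # dispatch uop (78)
schedule_int_uop = [1021] + [1001] * 9   # schedule int uop (53)
schedule_ldst_uop = [1019] + [1000] * 9  # schedule ldst uop (55)


def per_copy(samples, copies=COPIES):
    """Median counter value divided by the assumed number of copies."""
    return median(samples) / copies


print(f"Issues: {per_copy(dispatch_uop):.3f}")
print(f"Integer unit issues: {per_copy(schedule_int_uop):.3f}")
print(f"Load/store unit issues: {per_copy(schedule_ldst_uop):.3f}")
```

Taking the median rather than the mean keeps the noisier first row from skewing the figures.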
Chain cycles: 3
Code:
ldrh w0, [x6], #8
eor x8, x8, x0
eor x8, x8, x0
add x6, x6, x8

mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 3 chain cycles): 4.0109
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
50209 | 71643 | 50161 | 40156 | 10005 | 40247 | 10003 | 1850177 | 534646 | 50109 | 40212 | 10004 | 70221 | 10004 | 40004 | 10000 | 40100 |
50204 | 70104 | 50103 | 40103 | 10000 | 40106 | 10003 | 1849956 | 534620 | 50109 | 40212 | 10004 | 70221 | 10004 | 40003 | 10000 | 40100 |
50204 | 70092 | 50103 | 40103 | 10000 | 40106 | 10003 | 1850010 | 534635 | 50109 | 40212 | 10004 | 70221 | 10004 | 40003 | 10000 | 40100 |
50204 | 70090 | 50103 | 40103 | 10000 | 40106 | 10003 | 1849929 | 534611 | 50109 | 40212 | 10004 | 70221 | 10004 | 40003 | 10000 | 40100 |
50204 | 70094 | 50103 | 40103 | 10000 | 40106 | 10003 | 1849956 | 534620 | 50109 | 40212 | 10004 | 70221 | 10004 | 40003 | 10000 | 40100 |
50204 | 70094 | 50103 | 40103 | 10000 | 40106 | 10003 | 1850037 | 534647 | 50109 | 40212 | 10004 | 70221 | 10004 | 40003 | 10000 | 40100 |
50204 | 70093 | 50103 | 40103 | 10000 | 40106 | 10003 | 1849983 | 534628 | 50109 | 40212 | 10004 | 70221 | 10004 | 40003 | 10000 | 40100 |
50204 | 70091 | 50103 | 40103 | 10000 | 40106 | 10003 | 1849929 | 534611 | 50109 | 40212 | 10004 | 70221 | 10004 | 40003 | 10000 | 40100 |
50204 | 70091 | 50103 | 40103 | 10000 | 40106 | 10003 | 1849983 | 534627 | 50109 | 40212 | 10004 | 70221 | 10004 | 40003 | 10000 | 40100 |
50204 | 70119 | 50103 | 40103 | 10000 | 40106 | 10003 | 1850037 | 534646 | 50109 | 40212 | 10004 | 70221 | 10004 | 40003 | 10000 | 40100 |
Result (median cycles for code, minus 3 chain cycles): 4.0246
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
50029 | 71242 | 50069 | 40064 | 10005 | 40156 | 10003 | 1851887 | 535428 | 50019 | 40032 | 10004 | 70020 | 10000 | 40007 | 10000 | 0 | 40010 |
50024 | 70164 | 50017 | 40017 | 10000 | 40010 | 10000 | 1851929 | 535498 | 50010 | 40020 | 10000 | 70020 | 10000 | 40007 | 10000 | 0 | 40010 |
50024 | 70162 | 50017 | 40017 | 10000 | 40010 | 10012 | 1853872 | 536036 | 50062 | 40071 | 10014 | 70020 | 10000 | 40007 | 10000 | 0 | 40010 |
50025 | 70248 | 50030 | 40028 | 10002 | 40050 | 10012 | 1856027 | 536734 | 50061 | 40071 | 10013 | 70020 | 10000 | 40007 | 10000 | 0 | 40010 |
50024 | 70162 | 50017 | 40017 | 10000 | 40010 | 10000 | 1851929 | 535498 | 50010 | 40020 | 10000 | 70109 | 10013 | 40020 | 10000 | 0 | 40010 |
50024 | 70185 | 50017 | 40017 | 10000 | 40010 | 10000 | 1852577 | 535703 | 50010 | 40020 | 10000 | 70020 | 10000 | 40007 | 10000 | 0 | 40010 |
50024 | 70175 | 50017 | 40017 | 10000 | 40010 | 10000 | 1852199 | 535584 | 50010 | 40020 | 10000 | 70020 | 10000 | 40007 | 10000 | 0 | 40010 |
50024 | 70185 | 50017 | 40017 | 10000 | 40010 | 10000 | 1852496 | 535676 | 50010 | 40020 | 10000 | 70020 | 10000 | 40007 | 10000 | 0 | 40010 |
50024 | 70169 | 50017 | 40017 | 10000 | 40010 | 10000 | 1852280 | 535610 | 50010 | 40020 | 10000 | 70020 | 10000 | 40007 | 10000 | 0 | 40010 |
50024 | 70188 | 50017 | 40017 | 10000 | 40010 | 10000 | 1852550 | 535690 | 50010 | 40020 | 10000 | 70020 | 10000 | 40007 | 10000 | 0 | 40010 |
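The latency figure is the median cycle count per execution of the dependency chain, minus the cycles contributed by the chain instructions. The sketch below applies that arithmetic to the cycle (02) column of the two latency tables. It assumes 10000 executions of the chain per row (inferred from the `? ldst retires (ed)` column), and `EXECUTIONS` and `load_latency` are hypothetical names.

```python
from statistics import median

EXECUTIONS = 10_000  # assumption: chain executions per row, from ldst retires (ed)
CHAIN_CYCLES = 3     # from "Chain cycles: 3" (the eor, eor, add chain)

# cycle (02) column of the first and second latency tables.
cycles_first = [71643, 70104, 70092, 70090, 70094,
                70094, 70093, 70091, 70091, 70119]
cycles_second = [71242, 70164, 70162, 70248, 70162,
                 70185, 70175, 70185, 70169, 70188]


def load_latency(cycles, executions=EXECUTIONS, chain=CHAIN_CYCLES):
    """Median cycles per chain execution, minus the chain's own cycles."""
    return median(cycles) / executions - chain


print(f"{load_latency(cycles_first):.4f}")
print(f"{load_latency(cycles_second):.4f}")
```

On these rows the arithmetic gives roughly 4.009 and 4.018. That is close to the reported 4.0109 and 4.0246 but not identical, so the reported results presumably come from a different measurement run than the counter rows shown here.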
Count: 8
Code:
ldrh w0, [x6], #8
ldrh w0, [x7], #8
ldrh w0, [x8], #8
ldrh w0, [x9], #8
ldrh w0, [x10], #8
ldrh w0, [x11], #8
ldrh w0, [x12], #8
ldrh w0, [x13], #8

mov x7, x6
mov x8, x6
mov x9, x6
mov x10, x6
mov x11, x6
mov x12, x6
mov x13, x6
(fused SUBS/B.cc loop)
Result (median cycles for code divided by count): 0.5402
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
160209 | 44740 | 160433 | 80313 | 80120 | 80316 | 80011 | 240578 | 645823 | 160123 | 80212 | 80012 | 80212 | 80012 | 80009 | 80000 | 80100 |
160204 | 43224 | 160109 | 80109 | 80000 | 80112 | 80012 | 240485 | 639297 | 160124 | 80212 | 80012 | 80212 | 80012 | 80009 | 80000 | 80100 |
160204 | 43215 | 160109 | 80109 | 80000 | 80112 | 80008 | 240485 | 643103 | 160120 | 80212 | 80012 | 80212 | 80012 | 80009 | 80000 | 80100 |
160204 | 43218 | 160109 | 80109 | 80000 | 80112 | 80012 | 240485 | 646959 | 160124 | 80212 | 80012 | 80212 | 80012 | 80009 | 80000 | 80100 |
160204 | 43218 | 160109 | 80109 | 80000 | 80112 | 80012 | 240485 | 643008 | 160124 | 80212 | 80012 | 80254 | 80054 | 80051 | 80000 | 80100 |
160204 | 43225 | 160110 | 80109 | 80001 | 80112 | 80010 | 240485 | 645939 | 160122 | 80212 | 80012 | 80212 | 80012 | 80009 | 80000 | 80100 |
160204 | 43216 | 160109 | 80109 | 80000 | 80112 | 80010 | 240485 | 638801 | 160122 | 80212 | 80012 | 80212 | 80012 | 80009 | 80000 | 80100 |
160204 | 43217 | 160109 | 80109 | 80000 | 80112 | 80012 | 240485 | 636377 | 160124 | 80212 | 80012 | 80212 | 80012 | 80009 | 80000 | 80100 |
160204 | 43217 | 160109 | 80109 | 80000 | 80112 | 80010 | 240485 | 642149 | 160122 | 80212 | 80012 | 80208 | 80008 | 80007 | 80000 | 80100 |
160204 | 43216 | 160109 | 80109 | 80000 | 80112 | 80012 | 240485 | 642848 | 160124 | 80212 | 80012 | 80212 | 80012 | 80009 | 80000 | 80100 |
Result (median cycles for code divided by count): 0.5407
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
160030 | 44494 | 160408 | 80261 | 80147 | 80264 | 80012 | 240516 | 644538 | 160034 | 80032 | 80012 | 80032 | 80012 | 80009 | 80000 | 80010 |
160024 | 43254 | 160011 | 80011 | 80000 | 80010 | 80000 | 240503 | 645315 | 160010 | 80020 | 80000 | 80020 | 80000 | 80001 | 80000 | 80010 |
160024 | 43253 | 160011 | 80011 | 80000 | 80010 | 80000 | 240500 | 641473 | 160010 | 80020 | 80000 | 80020 | 80000 | 80001 | 80000 | 80010 |
160024 | 43253 | 160011 | 80011 | 80000 | 80010 | 80000 | 240469 | 642330 | 160010 | 80020 | 80000 | 80020 | 80000 | 80001 | 80000 | 80010 |
160024 | 43253 | 160011 | 80011 | 80000 | 80010 | 80000 | 240500 | 643611 | 160010 | 80020 | 80000 | 80020 | 80000 | 80001 | 80000 | 80010 |
160024 | 43253 | 160011 | 80011 | 80000 | 80010 | 80000 | 240502 | 646150 | 160010 | 80020 | 80000 | 80020 | 80000 | 80001 | 80000 | 80010 |
160024 | 43253 | 160011 | 80011 | 80000 | 80010 | 80000 | 240495 | 640490 | 160010 | 80020 | 80000 | 80020 | 80000 | 80001 | 80000 | 80010 |
160024 | 43253 | 160011 | 80011 | 80000 | 80010 | 80000 | 240500 | 641567 | 160010 | 80020 | 80000 | 80020 | 80000 | 80001 | 80000 | 80010 |
160024 | 43253 | 160011 | 80011 | 80000 | 80010 | 80000 | 240496 | 643421 | 160010 | 80020 | 80000 | 80020 | 80000 | 80001 | 80000 | 80010 |
160024 | 43253 | 160011 | 80011 | 80000 | 80010 | 80000 | 240480 | 641547 | 160010 | 80020 | 80000 | 80020 | 80000 | 80001 | 80000 | 80010 |
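The throughput results can be reproduced from the cycle (02) column of the two throughput tables: take the median, divide by the number of executions of the code block, then divide by the count of 8 instructions. The sketch below assumes 10000 executions of the block per row (inferred from the `? ldst retires (ed)` value of 80000 for 8 loads). `EXECUTIONS` and `cycles_per_instruction` are hypothetical names.

```python
from statistics import median

EXECUTIONS = 10_000  # assumption: block executions per row (80000 ldst retires / 8)
COUNT = 8            # from "Count: 8"

# cycle (02) column of the first and second throughput tables.
cycles_first = [44740, 43224, 43215, 43218, 43218,
                43225, 43216, 43217, 43217, 43216]
cycles_second = [44494, 43254, 43253, 43253, 43253,
                 43253, 43253, 43253, 43253, 43253]


def cycles_per_instruction(cycles, executions=EXECUTIONS, count=COUNT):
    """Median cycles per block execution, divided by instructions per block."""
    return median(cycles) / executions / count


print(f"{cycles_per_instruction(cycles_first):.4f}")
print(f"{cycles_per_instruction(cycles_second):.4f}")
```

Rounded to four places these come to 0.5402 and 0.5407, matching the two reported results.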