Apple Microarchitecture Research by Dougall Johnson
Code:
str d0, [x6, #0x10]!
(no loop instructions)
Retires: 1.000
Issues: 2.000
Integer unit issues: 1.001
Load/store unit issues: 1.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
1005 | 1766 | 2059 | 1041 | 1018 | 1040 | 1000 | 4637 | 18487 | 2000 | 1000 | 0 | 2000 | 0 | 1001 | 1000 | 0 | 0 |
1004 | 1141 | 2001 | 1001 | 1000 | 1000 | 1000 | 4789 | 18505 | 2000 | 1000 | 0 | 2000 | 0 | 1001 | 1000 | 0 | 0 |
1004 | 1181 | 2001 | 1001 | 1000 | 1000 | 1000 | 4789 | 18415 | 2000 | 1000 | 0 | 2000 | 0 | 1001 | 1000 | 0 | 0 |
1004 | 1160 | 2001 | 1001 | 1000 | 1000 | 1000 | 4789 | 18721 | 2000 | 1000 | 0 | 2000 | 0 | 1001 | 1000 | 0 | 0 |
1004 | 1170 | 2001 | 1001 | 1000 | 1000 | 1000 | 4789 | 19333 | 2000 | 1000 | 0 | 2000 | 0 | 1001 | 1000 | 0 | 0 |
1004 | 1178 | 2001 | 1001 | 1000 | 1000 | 1000 | 4789 | 18900 | 2000 | 1000 | 0 | 2000 | 0 | 1001 | 1000 | 0 | 0 |
1004 | 1168 | 2001 | 1001 | 1000 | 1000 | 1000 | 4725 | 19135 | 2000 | 1000 | 0 | 2000 | 0 | 1001 | 1000 | 0 | 0 |
1004 | 1172 | 2001 | 1001 | 1000 | 1000 | 1000 | 4789 | 19801 | 2000 | 1000 | 0 | 2000 | 0 | 1001 | 1000 | 0 | 0 |
1004 | 1179 | 2001 | 1001 | 1000 | 1000 | 1000 | 4789 | 19277 | 2000 | 1000 | 0 | 2000 | 0 | 1001 | 1000 | 0 | 0 |
1004 | 1125 | 2001 | 1001 | 1000 | 1000 | 1000 | 4793 | 19495 | 2000 | 1000 | 0 | 2000 | 0 | 1001 | 1000 | 0 | 0 |
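The per-instruction summary figures can be related to the raw counters by dividing a steady-state row by the number of instruction copies measured. The sketch below is illustrative only. It assumes 1000 copies per sample (inferred from the counters clustering around 1000, not stated in the source), and the names `COUNTER_NAMES`, `ROW`, `COPIES` and `per_instruction` are hypothetical. Which counter feeds which summary line is also an assumption. The integer and load/store issue figures are consistent with the `schedule int uop (53)` and `schedule ldst uop (55)` columns.

```python
# Counter names and the second data row, copied from the table above.
COUNTER_NAMES = [
    "retire uop (01)", "cycle (02)", "schedule uop (52)",
    "schedule int uop (53)", "schedule ldst uop (55)",
    "dispatch int uop (56)", "dispatch ldst uop (58)",
    "int uops in schedulers (59)", "simd uops in schedulers (5a)",
    "dispatch uop (78)", "map ldst uop (7d)", "map int uop inputs (7f)",
    "map ldst uop inputs (80)", "map simd uop inputs (81)",
    "? int output thing (e9)", "? ldst retires (ed)",
    "? simd retires (ee)", "? int retires (ef)",
]
ROW = [1004, 1141, 2001, 1001, 1000, 1000, 1000, 4789, 18505,
       2000, 1000, 0, 2000, 0, 1001, 1000, 0, 0]

# Assumption: each sample executes 1000 copies of the instruction.
COPIES = 1000

# Normalise every counter to a per-instruction value.
per_instruction = {name: value / COPIES
                   for name, value in zip(COUNTER_NAMES, ROW)}

for name in ("schedule int uop (53)", "schedule ldst uop (55)",
             "dispatch uop (78)"):
    print(f"{name}: {per_instruction[name]:.3f}")
```

Read this way, the store with pre-index writeback splits into one integer uop (the base register update) and one load/store uop, with no SIMD/FP unit issue.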
Code:
str d0, [x6, #0x10]!
(fused SUBS/B.cc loop)
Result (median cycles for code): 1.1376
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
10214 | 14999 | 20703 | 10523 | 10180 | 10524 | 10000 | 43466 | 195977 | 20100 | 200 | 10004 | 200 | 20020 | 10005 | 10000 | 100 |
10204 | 11462 | 20105 | 10105 | 10000 | 10106 | 10001 | 43479 | 197608 | 20105 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100 |
10204 | 11578 | 20104 | 10104 | 10000 | 10104 | 10001 | 43480 | 198401 | 20105 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100 |
10204 | 11542 | 20104 | 10104 | 10000 | 10104 | 10004 | 43478 | 198430 | 20112 | 200 | 10012 | 200 | 20016 | 10003 | 10000 | 100 |
10204 | 11374 | 20101 | 10101 | 10000 | 10104 | 10002 | 43522 | 194786 | 20106 | 200 | 10008 | 200 | 20008 | 10001 | 10000 | 100 |
10204 | 11378 | 20103 | 10103 | 10000 | 10104 | 10002 | 43522 | 194345 | 20106 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100 |
10204 | 11451 | 20104 | 10104 | 10000 | 10104 | 10002 | 43486 | 195344 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100 |
10204 | 11504 | 20104 | 10104 | 10000 | 10104 | 10000 | 43493 | 195680 | 20104 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100 |
10204 | 11346 | 20103 | 10103 | 10000 | 10104 | 10000 | 43487 | 194313 | 20100 | 200 | 10004 | 200 | 20016 | 10004 | 10000 | 100 |
10204 | 11438 | 20104 | 10104 | 10000 | 10104 | 10001 | 43511 | 194432 | 20105 | 200 | 10008 | 200 | 20008 | 10001 | 10000 | 100 |
Result (median cycles for code): 1.1269
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
10034 | 15469 | 20613 | 10433 | 10180 | 10435 | 10002 | 44113 | 191162 | 20016 | 20 | 10008 | 20 | 20000 | 10001 | 10000 | 10 |
10024 | 11295 | 20011 | 10011 | 10000 | 10010 | 10000 | 43090 | 191683 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10 |
10024 | 11229 | 20011 | 10011 | 10000 | 10010 | 10000 | 43097 | 191673 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10 |
10024 | 11280 | 20011 | 10011 | 10000 | 10010 | 10000 | 43096 | 190773 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10 |
10024 | 11340 | 20011 | 10011 | 10000 | 10010 | 10000 | 43093 | 192205 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10 |
10024 | 11274 | 20011 | 10011 | 10000 | 10010 | 10000 | 43097 | 192203 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10 |
10024 | 11329 | 20011 | 10011 | 10000 | 10010 | 10000 | 43091 | 194039 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10 |
10024 | 11467 | 20011 | 10011 | 10000 | 10010 | 10000 | 43073 | 198344 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10 |
10024 | 11449 | 20011 | 10011 | 10000 | 10010 | 10000 | 43084 | 194527 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10 |
10024 | 11294 | 20011 | 10011 | 10000 | 10010 | 10000 | 43097 | 191791 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10 |
Count: 8
Code:
str d0, [x6, #0x10]!
str d0, [x7, #0x10]!
str d0, [x8, #0x10]!
str d0, [x9, #0x10]!
str d0, [x10, #0x10]!
str d0, [x11, #0x10]!
str d0, [x12, #0x10]!
str d0, [x13, #0x10]!

mov x7, x6
mov x8, x6
mov x9, x6
mov x10, x6
mov x11, x6
mov x12, x6
mov x13, x6
(fused SUBS/B.cc loop)
Result (median cycles for code divided by count): 1.0010
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
80214 | 82441 | 160692 | 80512 | 80180 | 80511 | 80002 | 240312 | 1360630 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100 |
80204 | 80085 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360524 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100 |
80204 | 80077 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360573 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100 |
80204 | 80072 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360573 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100 |
80204 | 80077 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360573 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100 |
80204 | 80077 | 160105 | 80105 | 80000 | 80104 | 80035 | 240419 | 1361259 | 160175 | 200 | 80048 | 200 | 160016 | 80005 | 80000 | 100 |
80204 | 80077 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360573 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100 |
80204 | 80077 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360573 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100 |
80204 | 80077 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360573 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100 |
80204 | 80084 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360573 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100 |
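As a rough cross-check, the `cycle (02)` column of the ten rows above can be reduced to a per-instruction figure by taking the median and dividing by the number of stores executed. This is a sketch under assumptions, not the author's stated method. It assumes each sample covers 80,000 stores (Count 8 times 10,000, a multiplier inferred from the retire and ldst counters), and `CYCLES` and `STORES_PER_SAMPLE` are hypothetical names.

```python
import statistics

# "cycle (02)" values copied from the ten rows of the table above.
CYCLES = [82441, 80085, 80077, 80072, 80077,
          80077, 80077, 80077, 80077, 80084]

# Assumption: 8 stores per unrolled group, executed 10,000 times per sample.
STORES_PER_SAMPLE = 8 * 10_000

# The median discards the slow first (warm-up) sample.
median_cycles = statistics.median(CYCLES)
cycles_per_store = median_cycles / STORES_PER_SAMPLE

print(f"{cycles_per_store:.4f}")
```

For this table the value rounds to the reported 1.0010. The same calculation on the other looped tables in this section does not land exactly on their reported results, so those figures are presumably taken from a different or larger set of samples.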
Result (median cycles for code divided by count): 1.0007
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
80034 | 82110 | 160612 | 80432 | 0 | 80180 | 80432 | 0 | 80003 | 240048 | 1360292 | 160019 | 20 | 80010 | 20 | 160016 | 80005 | 80000 | 10 |
80024 | 80066 | 160011 | 80011 | 0 | 80000 | 80010 | 0 | 80000 | 240030 | 1360225 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10 |
80025 | 80086 | 160065 | 80048 | 0 | 80017 | 80052 | 0 | 80000 | 240030 | 1361758 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10 |
80024 | 80137 | 160011 | 80011 | 0 | 80000 | 80010 | 0 | 80000 | 240030 | 1361665 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10 |
80024 | 80137 | 160011 | 80011 | 0 | 80000 | 80010 | 0 | 80000 | 240030 | 1361629 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10 |
80024 | 80137 | 160011 | 80011 | 0 | 80000 | 80010 | 0 | 80000 | 240030 | 1361647 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10 |
80024 | 80137 | 160011 | 80011 | 0 | 80000 | 80010 | 0 | 80000 | 240030 | 1361647 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10 |
80024 | 80137 | 160011 | 80011 | 0 | 80000 | 80010 | 0 | 80000 | 240030 | 1361647 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10 |
80024 | 80141 | 160011 | 80011 | 0 | 80000 | 80010 | 0 | 80000 | 240030 | 1361683 | 160010 | 20 | 80000 | 20 | 160096 | 80037 | 80000 | 10 |
80024 | 80066 | 160011 | 80011 | 0 | 80000 | 80010 | 0 | 80000 | 240030 | 1360225 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10 |