Apple Microarchitecture Research by Dougall Johnson
Code:
tbx v0.8b, { v1.16b, v2.16b, v3.16b }, v4.8b

movi v0.16b, 1
movi v1.16b, 2
movi v2.16b, 3
movi v3.16b, 4
movi v4.16b, 5
(no loop instructions)
Retires: 3.000
Issues: 3.000
Integer unit issues: 0.001
Load/store unit issues: 0.000
SIMD/FP unit issues: 3.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch simd uop (57) | ldst uops in schedulers (5b) | dispatch uop (78) | map simd uop (7e) | map simd uop inputs (81) | ? int output thing (e9) | ? simd retires (ee) |
3004 | 6033 | 3001 | 1 | 3000 | 3000 | 152248 | 3000 | 3000 | 9000 | 1 | 3000 |
3004 | 6033 | 3001 | 1 | 3000 | 3000 | 152248 | 3000 | 3000 | 9000 | 1 | 3000 |
3004 | 6033 | 3001 | 1 | 3000 | 3000 | 152248 | 3000 | 3000 | 9000 | 1 | 3000 |
3004 | 6033 | 3001 | 1 | 3000 | 3000 | 152248 | 3000 | 3000 | 9000 | 1 | 3000 |
3004 | 6033 | 3001 | 1 | 3000 | 3000 | 152248 | 3000 | 3000 | 9000 | 1 | 3000 |
3004 | 6033 | 3001 | 1 | 3000 | 3000 | 152248 | 3000 | 3000 | 9000 | 1 | 3000 |
3004 | 6033 | 3001 | 1 | 3000 | 3000 | 152248 | 3000 | 3000 | 9000 | 1 | 3000 |
3004 | 6033 | 3001 | 1 | 3000 | 3000 | 152248 | 3000 | 3000 | 9000 | 1 | 3000 |
3004 | 6033 | 3001 | 1 | 3000 | 3000 | 152248 | 3000 | 3000 | 9000 | 1 | 3000 |
3004 | 6033 | 3001 | 1 | 3000 | 3000 | 152248 | 3000 | 3000 | 9000 | 1 | 3000 |
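The "Issues: 3.000" and "SIMD/FP unit issues: 3.000" figures can be cross-checked against the raw counter columns. The following is a minimal sketch, not the harness's actual method: it assumes the test body was unrolled 1000 times (an inference from the counter magnitudes, not something stated on this page), and `UNROLLS` and `per_instruction` are hypothetical names introduced for illustration.

```python
from statistics import median

# "dispatch simd uop (57)" column from the ten runs above.
dispatch_simd = [3000] * 10
# "retire uop (01)" column from the same ten runs.
retired = [3004] * 10

# Assumption: 1000 unrolled copies of the TBX, inferred from the counter sizes.
UNROLLS = 1000


def per_instruction(samples, unrolls=UNROLLS):
    """Median counter value divided by the assumed number of TBX copies."""
    return median(samples) / unrolls


simd_uops_per_tbx = per_instruction(dispatch_simd)
retires_per_tbx = per_instruction(retired)

print(f"SIMD uops dispatched per TBX: {simd_uops_per_tbx:.3f}")
print(f"uops retired per TBX (raw):   {retires_per_tbx:.3f}")
```

Under that assumption the three-register TBX dispatches 3 SIMD uops. The raw retire column gives 3.004 rather than the reported 3.000; the difference is presumably a small fixed overhead that the reported figure excludes, and that adjustment is not reproduced here.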
Code:
tbx v0.8b, { v1.16b, v2.16b, v3.16b }, v4.8b

movi v0.16b, 1
movi v1.16b, 2
movi v2.16b, 3
movi v3.16b, 4
movi v4.16b, 5
(fused SUBS/B.cc loop)
Result (median cycles for code): 6.0033
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? simd retires (ee) | ? int retires (ef) |
30204 | 60174 | 30236 | 200 | 30036 | 199 | 30108 | 700 | 1529247 | 30200 | 200 | 30006 | 200 | 90018 | 101 | 30000 | 100 |
30204 | 60033 | 30201 | 201 | 30000 | 200 | 30000 | 700 | 1529248 | 30200 | 200 | 30004 | 200 | 90012 | 101 | 30000 | 100 |
30204 | 60033 | 30201 | 201 | 30000 | 200 | 30000 | 700 | 1529248 | 30200 | 200 | 30004 | 200 | 90012 | 101 | 30000 | 100 |
30204 | 60033 | 30201 | 201 | 30000 | 200 | 30000 | 687 | 1529580 | 30233 | 200 | 30048 | 200 | 90012 | 101 | 30000 | 100 |
30204 | 60033 | 30201 | 201 | 30000 | 200 | 30000 | 700 | 1529247 | 30200 | 200 | 30006 | 200 | 90012 | 101 | 30000 | 100 |
30205 | 60066 | 30210 | 202 | 30008 | 201 | 30034 | 700 | 1529248 | 30200 | 200 | 30004 | 200 | 90012 | 101 | 30000 | 100 |
30204 | 60033 | 30201 | 201 | 30000 | 200 | 30000 | 700 | 1529248 | 30200 | 200 | 30004 | 200 | 90012 | 101 | 30000 | 100 |
30204 | 60033 | 30201 | 201 | 30000 | 200 | 30000 | 700 | 1529248 | 30200 | 200 | 30004 | 200 | 90012 | 101 | 30000 | 100 |
30204 | 60033 | 30201 | 201 | 30000 | 200 | 30000 | 700 | 1529248 | 30200 | 200 | 30004 | 200 | 90012 | 101 | 30000 | 100 |
30204 | 60033 | 30201 | 201 | 30000 | 200 | 30000 | 700 | 1529248 | 30200 | 200 | 30004 | 200 | 90144 | 101 | 30000 | 100 |
Result (median cycles for code): 6.0033
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? simd retires (ee) | ? int retires (ef) |
30024 | 60033 | 30011 | 11 | 30000 | 10 | 30000 | 30 | 1529248 | 30010 | 20 | 30004 | 20 | 90012 | 1 | 30000 | 10 |
30024 | 60033 | 30011 | 11 | 30000 | 10 | 30000 | 30 | 1529248 | 30010 | 20 | 30000 | 20 | 90000 | 1 | 30000 | 10 |
30024 | 60033 | 30011 | 11 | 30000 | 10 | 30000 | 30 | 1529248 | 30010 | 20 | 30000 | 20 | 90000 | 1 | 30000 | 10 |
30024 | 60033 | 30011 | 11 | 30000 | 10 | 30000 | 30 | 1529248 | 30010 | 20 | 30000 | 20 | 90000 | 1 | 30000 | 10 |
30024 | 60033 | 30011 | 11 | 30000 | 10 | 30000 | 30 | 1529248 | 30010 | 20 | 30000 | 20 | 90000 | 1 | 30000 | 10 |
30024 | 60033 | 30011 | 11 | 30000 | 10 | 30000 | 30 | 1529248 | 30010 | 20 | 30000 | 20 | 90144 | 1 | 30000 | 10 |
30024 | 60033 | 30011 | 11 | 30000 | 10 | 30000 | 34 | 1529580 | 30045 | 20 | 30047 | 20 | 90000 | 1 | 30000 | 10 |
30024 | 60033 | 30011 | 11 | 30000 | 10 | 30000 | 30 | 1529248 | 30010 | 20 | 30000 | 20 | 90000 | 1 | 30000 | 10 |
30024 | 60033 | 30011 | 11 | 30000 | 10 | 30000 | 30 | 1529248 | 30010 | 20 | 30000 | 20 | 90000 | 1 | 30000 | 10 |
30024 | 60033 | 30011 | 11 | 30000 | 10 | 30000 | 30 | 1529248 | 30010 | 20 | 30000 | 20 | 90000 | 1 | 30000 | 10 |
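Because TBX leaves out-of-range destination bytes unchanged, v0 is both an input and the output here, so back-to-back copies form a dependency chain through v0 and the cycle count reflects latency. The sketch below shows how the 6.0033 figure follows from the "cycle (02)" column. It assumes 10000 TBX executions per measurement (inferred from 30000 SIMD retires at 3 uops each, not stated on this page); `EXECUTIONS` is a hypothetical name.

```python
from statistics import median

# "cycle (02)" column from the ten runs in the table directly above.
cycles = [60033] * 10

# Assumption: 10000 TBX executions per run (30000 SIMD retires / 3 uops each).
EXECUTIONS = 10_000

latency = median(cycles) / EXECUTIONS
print(f"{latency:.4f}")  # 6.0033
```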
Chain cycles: 2
Code:
movi v0.16b, 0
tbx v0.8b, { v1.16b, v2.16b, v3.16b }, v4.8b
add v1.16b, v0.16b, v0.16b

movi v0.16b, 1
movi v1.16b, 2
movi v2.16b, 3
movi v3.16b, 4
movi v4.16b, 5
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 2 chain cycles): 6.0033
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? simd retires (ee) | ? int retires (ef) |
50204 | 80033 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50205 | 80066 | 40109 | 101 | 40008 | 100 | 40034 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 2039580 | 40134 | 200 | 40044 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
Result (median cycles for code, minus 2 chain cycles): 6.0033
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? simd retires (ee) | ? int retires (ef) |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40003 | 20 | 110009 | 1 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50025 | 80066 | 40020 | 12 | 40008 | 11 | 40034 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
Chain cycles: 2
Code:
movi v0.16b, 0
tbx v0.8b, { v1.16b, v2.16b, v3.16b }, v4.8b
add v2.16b, v0.16b, v0.16b

movi v0.16b, 1
movi v1.16b, 2
movi v2.16b, 3
movi v3.16b, 4
movi v4.16b, 5
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 2 chain cycles): 4.0035
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? simd retires (ee) | ? int retires (ef) |
50204 | 60035 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 1519242 | 40100 | 200 | 40008 | 200 | 110017 | 1 | 50000 | 100 |
50204 | 60035 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 1519254 | 40100 | 200 | 40006 | 200 | 110017 | 1 | 50000 | 100 |
50204 | 60035 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 1519254 | 40100 | 200 | 40006 | 200 | 110017 | 1 | 50000 | 100 |
50204 | 60035 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 1519254 | 40100 | 200 | 40006 | 200 | 110017 | 1 | 50000 | 100 |
50204 | 60035 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 1519254 | 40100 | 200 | 40006 | 200 | 110017 | 1 | 50000 | 100 |
50204 | 60035 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 1519254 | 40100 | 200 | 40006 | 200 | 110017 | 1 | 50000 | 100 |
50204 | 60035 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 1519254 | 40100 | 200 | 40006 | 200 | 110017 | 1 | 50000 | 100 |
50204 | 60035 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 1519254 | 40100 | 200 | 40006 | 200 | 110017 | 1 | 50000 | 100 |
50205 | 60070 | 40121 | 102 | 40019 | 101 | 40045 | 307 | 1519581 | 40147 | 202 | 40059 | 200 | 110017 | 1 | 50000 | 100 |
50204 | 60035 | 40101 | 101 | 40000 | 100 | 40000 | 300 | 1519254 | 40100 | 200 | 40006 | 200 | 110017 | 1 | 50000 | 100 |
Result (median cycles for code, minus 2 chain cycles): 4.0035
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? simd retires (ee) | ? int retires (ef) |
50024 | 60035 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 1519254 | 40010 | 20 | 40006 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 60035 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 1519254 | 40010 | 20 | 40000 | 20 | 110566 | 2 | 50000 | 10 |
50024 | 60035 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 1519254 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 60035 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 1519776 | 40057 | 20 | 40051 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 60035 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 1519254 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 60035 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 1519254 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 60035 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 1519776 | 40057 | 20 | 40051 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 60142 | 40060 | 12 | 40048 | 11 | 40094 | 30 | 1519254 | 40010 | 20 | 40000 | 20 | 110143 | 1 | 50000 | 10 |
50024 | 60297 | 40131 | 11 | 40120 | 10 | 40235 | 30 | 1521741 | 40292 | 20 | 40312 | 20 | 110560 | 2 | 50000 | 10 |
50024 | 60298 | 40131 | 11 | 40120 | 10 | 40235 | 30 | 1520650 | 40198 | 20 | 40203 | 20 | 111000 | 1 | 50000 | 10 |
Chain cycles: 2
Code:
movi v0.16b, 0
tbx v0.8b, { v1.16b, v2.16b, v3.16b }, v4.8b
add v3.16b, v0.16b, v0.16b

movi v0.16b, 1
movi v1.16b, 2
movi v2.16b, 3
movi v3.16b, 4
movi v4.16b, 5
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 2 chain cycles): 2.0037
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
50204 | 40037 | 40101 | 101 | 40000 | 100 | 40001 | 300 | 999232 | 40102 | 200 | 40008 | 200 | 0 | 110017 | 1 | 0 | 50000 | 100 |
50204 | 40037 | 40101 | 101 | 40000 | 100 | 40001 | 300 | 999270 | 40101 | 200 | 40006 | 200 | 0 | 110017 | 1 | 0 | 50000 | 100 |
50204 | 40037 | 40101 | 101 | 40000 | 100 | 40001 | 300 | 999270 | 40101 | 200 | 40006 | 200 | 0 | 110017 | 1 | 0 | 50000 | 100 |
50204 | 40037 | 40101 | 101 | 40000 | 100 | 40001 | 300 | 999270 | 40101 | 200 | 40006 | 200 | 0 | 110017 | 1 | 0 | 50000 | 100 |
50205 | 40072 | 40138 | 101 | 40037 | 100 | 40063 | 300 | 999270 | 40101 | 200 | 40006 | 200 | 0 | 110017 | 1 | 0 | 50000 | 100 |
50204 | 40037 | 40101 | 101 | 40000 | 100 | 40001 | 300 | 999270 | 40101 | 200 | 40006 | 200 | 0 | 110017 | 1 | 0 | 50000 | 100 |
50204 | 40037 | 40101 | 101 | 40000 | 100 | 40001 | 300 | 999270 | 40101 | 200 | 40006 | 200 | 0 | 110210 | 1 | 0 | 50000 | 100 |
50204 | 40037 | 40101 | 101 | 40000 | 100 | 40001 | 300 | 999270 | 40101 | 200 | 40006 | 200 | 0 | 110017 | 1 | 0 | 50000 | 100 |
50204 | 40037 | 40101 | 101 | 40000 | 100 | 40001 | 300 | 999270 | 40101 | 200 | 40006 | 200 | 0 | 110017 | 1 | 0 | 50000 | 100 |
50204 | 40037 | 40101 | 101 | 40000 | 100 | 40001 | 300 | 999270 | 40101 | 200 | 40006 | 200 | 0 | 110017 | 1 | 0 | 50000 | 100 |
Result (median cycles for code, minus 2 chain cycles): 2.0074
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? simd retires (ee) | ? int retires (ef) |
50024 | 40037 | 40011 | 11 | 40000 | 10 | 40002 | 30 | 999229 | 40010 | 20 | 40000 | 20 | 110204 | 1 | 50000 | 10 |
50024 | 40094 | 40056 | 11 | 40045 | 10 | 40069 | 30 | 999235 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 40037 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 999265 | 40010 | 20 | 40000 | 20 | 110573 | 1 | 50000 | 10 |
50024 | 40037 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 999265 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 40037 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 999265 | 40010 | 20 | 40000 | 20 | 110000 | 1 | 50000 | 10 |
50024 | 40346 | 40099 | 11 | 40088 | 10 | 40132 | 30 | 1001043 | 40275 | 20 | 40272 | 20 | 110567 | 1 | 50000 | 10 |
50024 | 40268 | 40189 | 11 | 40178 | 10 | 40266 | 30 | 1000691 | 40210 | 20 | 40208 | 20 | 110570 | 1 | 50000 | 10 |
50025 | 40304 | 40228 | 11 | 40217 | 10 | 40332 | 30 | 1001765 | 40345 | 20 | 40347 | 20 | 110377 | 2 | 50000 | 10 |
50024 | 40355 | 40100 | 11 | 40089 | 10 | 40133 | 30 | 999765 | 40077 | 20 | 40068 | 20 | 110382 | 1 | 50000 | 10 |
50024 | 40322 | 40232 | 11 | 40221 | 10 | 40331 | 37 | 1001146 | 40278 | 22 | 40274 | 20 | 110567 | 1 | 50000 | 10 |
Chain cycles: 2
Code:
movi v0.16b, 0
tbx v0.8b, { v1.16b, v2.16b, v3.16b }, v4.8b
add v4.16b, v0.16b, v0.16b

movi v0.16b, 1
movi v1.16b, 2
movi v2.16b, 3
movi v3.16b, 4
movi v4.16b, 5
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 2 chain cycles): 6.0033
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? simd retires (ee) | ? int retires (ef) |
50204 | 80033 | 40101 | 101 | 40000 | 0 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40004 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 0 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 0 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 0 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110121 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 0 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 0 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 0 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 0 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 0 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
50204 | 80033 | 40101 | 101 | 40000 | 0 | 100 | 40000 | 300 | 2039248 | 40100 | 200 | 40003 | 200 | 110009 | 1 | 50000 | 100 |
Result (median cycles for code, minus 2 chain cycles): 6.0033
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 0 | 110000 | 1 | 0 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 0 | 110000 | 1 | 0 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 0 | 110000 | 1 | 0 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 0 | 110000 | 1 | 0 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 408 | 336 | 110124 | 197 | 118 | 50001 | 227 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039580 | 40044 | 20 | 40045 | 20 | 0 | 110124 | 2 | 0 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 0 | 110119 | 1 | 0 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 0 | 110000 | 1 | 0 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 0 | 110000 | 1 | 0 | 50000 | 10 |
50024 | 80033 | 40011 | 11 | 40000 | 10 | 40000 | 30 | 2039248 | 40010 | 20 | 40000 | 20 | 0 | 110000 | 1 | 0 | 50000 | 10 |
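The four chained tests above differ only in which TBX input the trailing `add` feeds, so together they give a per-operand latency. The sketch below recomputes each "minus 2 chain cycles" result from the "cycle (02)" column of the first table in each test. It assumes 10000 executions of the code per measurement (inferred from the counter magnitudes, not stated on this page); `EXECUTIONS` and the dictionary labels are hypothetical.

```python
from statistics import median

# Assumption: 10000 executions of the test body per run.
EXECUTIONS = 10_000
# "Chain cycles: 2" as stated for each test (the cost of the chaining add).
CHAIN_CYCLES = 2

# "cycle (02)" column from the first table of each chained test, in run order.
cycles_by_operand = {
    "v1 (first table register)": [80033, 80066] + [80033] * 8,
    "v2 (second table register)": [60035] * 8 + [60070, 60035],
    "v3 (third table register)": [40037] * 4 + [40072] + [40037] * 5,
    "v4 (index register)": [80033] * 10,
}

latencies = {
    operand: median(samples) / EXECUTIONS - CHAIN_CYCLES
    for operand, samples in cycles_by_operand.items()
}

for operand, cycles in latencies.items():
    print(f"{operand}: {cycles:.4f}")
```

Read this way, the measured latency is about 6 cycles from the first table register and from the index register, about 4 from the second table register, and about 2 from the third.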
Count: 8
Code:
movi v0.16b, 0
tbx v0.8b, { v8.16b, v9.16b, v10.16b }, v11.8b
movi v1.16b, 0
tbx v1.8b, { v8.16b, v9.16b, v10.16b }, v11.8b
movi v2.16b, 0
tbx v2.8b, { v8.16b, v9.16b, v10.16b }, v11.8b
movi v3.16b, 0
tbx v3.8b, { v8.16b, v9.16b, v10.16b }, v11.8b
movi v4.16b, 0
tbx v4.8b, { v8.16b, v9.16b, v10.16b }, v11.8b
movi v5.16b, 0
tbx v5.8b, { v8.16b, v9.16b, v10.16b }, v11.8b
movi v6.16b, 0
tbx v6.8b, { v8.16b, v9.16b, v10.16b }, v11.8b
movi v7.16b, 0
tbx v7.8b, { v8.16b, v9.16b, v10.16b }, v11.8b

movi v8.16b, 9
movi v9.16b, 10
movi v10.16b, 11
movi v11.16b, 12
(fused SUBS/B.cc loop)
Result (median cycles for code divided by count): 1.5005
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
320204 | 120059 | 240101 | 101 | 240000 | 100 | 240008 | 300 | 2009644 | 240110 | 200 | 240013 | 200 | 0 | 720036 | 1 | 0 | 320000 | 100 |
320204 | 120039 | 240101 | 101 | 240000 | 100 | 240008 | 300 | 1919894 | 240108 | 200 | 240012 | 200 | 0 | 720036 | 1 | 0 | 320000 | 100 |
320204 | 120039 | 240101 | 101 | 240000 | 100 | 240008 | 300 | 1920150 | 240166 | 200 | 240074 | 200 | 0 | 720036 | 1 | 0 | 320000 | 100 |
320204 | 120039 | 240101 | 101 | 240000 | 100 | 240008 | 300 | 1919894 | 240108 | 200 | 240012 | 200 | 0 | 720036 | 1 | 0 | 320000 | 100 |
320204 | 120039 | 240101 | 101 | 240000 | 100 | 240008 | 300 | 1919894 | 240108 | 200 | 240012 | 200 | 0 | 720036 | 1 | 0 | 320000 | 100 |
320204 | 120039 | 240101 | 101 | 240000 | 100 | 240008 | 300 | 1919894 | 240108 | 200 | 240012 | 200 | 0 | 720036 | 1 | 0 | 320000 | 100 |
320204 | 120039 | 240101 | 101 | 240000 | 100 | 240008 | 300 | 1919894 | 240108 | 200 | 240012 | 200 | 0 | 720036 | 1 | 0 | 320000 | 100 |
320204 | 120039 | 240101 | 101 | 240000 | 100 | 240008 | 300 | 1919894 | 240108 | 200 | 240012 | 200 | 0 | 720039 | 1 | 0 | 320000 | 100 |
320204 | 120049 | 240102 | 101 | 240001 | 100 | 240010 | 300 | 1919758 | 240110 | 200 | 240013 | 200 | 0 | 720036 | 1 | 0 | 320000 | 100 |
320204 | 120039 | 240101 | 101 | 240000 | 100 | 240008 | 300 | 1919894 | 240108 | 200 | 240012 | 200 | 0 | 720216 | 1 | 0 | 320000 | 100 |
Result (median cycles for code divided by count): 1.5005
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? simd retires (ee) | ? int retires (ef) |
320024 | 120152 | 240012 | 11 | 240001 | 10 | 240008 | 30 | 1919696 | 240010 | 20 | 240000 | 20 | 720000 | 1 | 320000 | 10 |
320024 | 120039 | 240011 | 11 | 240000 | 10 | 240000 | 30 | 1919848 | 240010 | 20 | 240000 | 20 | 720000 | 1 | 320000 | 10 |
320024 | 120039 | 240011 | 11 | 240000 | 10 | 240000 | 30 | 1919848 | 240010 | 20 | 240000 | 20 | 720000 | 1 | 320000 | 10 |
320024 | 120039 | 240011 | 11 | 240000 | 10 | 240000 | 30 | 1920156 | 240124 | 20 | 240114 | 20 | 720180 | 1 | 320000 | 10 |
320024 | 120254 | 240174 | 11 | 240163 | 10 | 240177 | 30 | 1919848 | 240010 | 20 | 240000 | 20 | 720213 | 1 | 320000 | 10 |
320024 | 120100 | 240011 | 11 | 240000 | 10 | 240000 | 30 | 1919848 | 240010 | 20 | 240000 | 20 | 720000 | 1 | 320000 | 10 |
320024 | 120183 | 240118 | 11 | 240107 | 10 | 240117 | 30 | 1919848 | 240010 | 20 | 240000 | 20 | 720171 | 1 | 320000 | 10 |
320025 | 120146 | 240053 | 11 | 240042 | 10 | 240066 | 30 | 1919755 | 240019 | 20 | 240012 | 20 | 720039 | 1 | 320000 | 10 |
320024 | 120039 | 240011 | 11 | 240000 | 10 | 240000 | 30 | 1919848 | 240010 | 20 | 240000 | 20 | 720000 | 1 | 320000 | 10 |
320024 | 120039 | 240011 | 11 | 240000 | 10 | 240000 | 30 | 1919848 | 240010 | 20 | 240000 | 20 | 720216 | 1 | 320000 | 10 |
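In this test the eight TBX instructions write independent destinations, so the cycle count reflects throughput rather than latency. The sketch below recomputes the "divided by count" result from the "cycle (02)" column of the first table of this test. It assumes 10000 executions of the eight-pair body per measurement (inferred from the counter magnitudes, not stated on this page); `EXECUTIONS` is a hypothetical name.

```python
from statistics import median

# "cycle (02)" column from the first table of the throughput test, in run order.
cycles = [120059, 120039, 120039, 120039, 120039,
          120039, 120039, 120039, 120049, 120039]

# Assumption: 10000 executions of the test body per run.
EXECUTIONS = 10_000
# "Count: 8" as stated: eight independent movi/tbx pairs per body.
COUNT = 8

cycles_per_tbx = median(cycles) / EXECUTIONS / COUNT
print(f"{cycles_per_tbx:.4f}")  # 1.5005
```

At roughly 1.5 cycles per instruction and 3 dispatched SIMD uops per TBX, this corresponds to about 2 TBX uops per cycle under the same assumptions.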