Apple Microarchitecture Research by Dougall Johnson
Code:
ccmn x1, #3, #0, hi

mov x0, 1
mov x1, 2
mov x2, 3
mov x3, 4
mov x4, 5
(no loop instructions)
Retires: 1.000
Issues: 1.000
Integer unit issues: 1.001
Load/store unit issues: 0.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) |
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 2000 | 1001 |
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 2000 | 1001 |
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 2000 | 1001 |
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 2000 | 1001 |
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 2000 | 1001 |
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 2000 | 1001 |
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 2000 | 1001 |
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 2000 | 1001 |
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 2000 | 1001 |
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 2000 | 1001 |
Chain cycles: 1
Code:
ccmn x1, #3, #0, hi
cset x1, cc

mov x0, 1
mov x1, 2
mov x2, 3
mov x3, 4
mov x4, 5
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 1 chain cycle): 1.0030
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
20204 | 20030 | 20101 | 20101 | 20108 | 519339 | 20108 | 20216 | 30224 | 20001 | 10100 |
20204 | 20030 | 20101 | 20101 | 20107 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100 |
20204 | 20030 | 20101 | 20101 | 20107 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100 |
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100 |
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100 |
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100 |
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100 |
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100 |
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30224 | 20001 | 10100 |
20204 | 20030 | 20101 | 20101 | 20108 | 519548 | 20108 | 20216 | 30359 | 20043 | 10100 |
Result (median cycles for code, minus 1 chain cycle): 1.0030
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
20024 | 20030 | 20011 | 20011 | 20018 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010 |
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010 |
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010 |
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010 |
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010 |
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010 |
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010 |
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010 |
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010 |
20024 | 20030 | 20011 | 20011 | 20010 | 519598 | 20010 | 20020 | 30020 | 20001 | 10010 |
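The latency result above can be reproduced from the cycle (02) column with a little arithmetic. The sketch below assumes each run executes the ccmn/cset pair 10000 times; that repetition count is not stated in the listing and is inferred from the retire uop (01) values (roughly 20000 for a two-instruction sequence). The helper name `latency_from_cycles` is hypothetical.

```python
from statistics import median

# cycle (02) column from the ten runs in the table above
cycles = [20030] * 10

# Assumption: the ccmn/cset pair runs 10000 times per measurement
# (inferred from ~20000 retired uops for two instructions).
REPETITIONS = 10_000

# The cset that closes the dependency chain accounts for 1 cycle
# ("Chain cycles: 1" in the listing).
CHAIN_CYCLES = 1


def latency_from_cycles(cycles, repetitions, chain_cycles):
    """Median cycles per repetition, minus the cycles spent in the chain."""
    return median(cycles) / repetitions - chain_cycles


print(f"{latency_from_cycles(cycles, REPETITIONS, CHAIN_CYCLES):.4f}")  # prints 1.0030
```

The same arithmetic without the chain subtraction gives the 1.0030 reported for the non-fused SUB/CBNZ variant, whose cycle column reads 10030.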
Code:
ccmn x0, #3, #0, hi

mov x0, 1
mov x1, 2
mov x2, 3
mov x3, 4
mov x4, 5
(non-fused SUB/CBNZ loop)
Result (median cycles for code): 1.0030
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
10204 | 10030 | 10201 | 10201 | 10212 | 254524 | 10212 | 10214 | 20216 | 10101 | 100 |
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 20216 | 10101 | 100 |
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 20216 | 10101 | 100 |
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 20216 | 10101 | 100 |
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 20216 | 10101 | 100 |
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 20216 | 10101 | 100 |
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 20216 | 10101 | 100 |
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 20216 | 10101 | 100 |
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 20216 | 10101 | 100 |
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 20216 | 10101 | 100 |
Result (median cycles for code): 1.0030
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
10024 | 10030 | 10021 | 10021 | 10029 | 254996 | 10029 | 10032 | 20020 | 10011 | 10 |
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 20020 | 10011 | 10 |
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 20020 | 10011 | 10 |
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 20020 | 10011 | 10 |
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 20020 | 10011 | 10 |
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 20020 | 10011 | 10 |
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 20020 | 10011 | 10 |
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 20020 | 10011 | 10 |
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 20020 | 10011 | 10 |
10024 | 10030 | 10021 | 10021 | 10020 | 255236 | 10029 | 10032 | 20020 | 10011 | 10 |
Count: 8
Code:
ands xzr, xzr, xzr
ccmn x0, #3, #0, hi
ands xzr, xzr, xzr
ccmn x0, #3, #0, hi
ands xzr, xzr, xzr
ccmn x0, #3, #0, hi
ands xzr, xzr, xzr
ccmn x0, #3, #0, hi
ands xzr, xzr, xzr
ccmn x0, #3, #0, hi
ands xzr, xzr, xzr
ccmn x0, #3, #0, hi
ands xzr, xzr, xzr
ccmn x0, #3, #0, hi
ands xzr, xzr, xzr
ccmn x0, #3, #0, hi

mov x0, 1
(fused SUBS/B.cc loop)
Result (median cycles for code divided by count): 0.7890
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
160205 | 63304 | 160149 | 160149 | 160157 | 687542 | 160118 | 160220 | 160220 | 160011 | 100 |
160204 | 63107 | 160119 | 160119 | 160124 | 688661 | 160118 | 160218 | 160224 | 160014 | 100 |
160204 | 63115 | 160112 | 160112 | 160118 | 687376 | 160118 | 160220 | 160218 | 160013 | 100 |
160204 | 63133 | 160115 | 160115 | 160120 | 689396 | 160118 | 160220 | 160220 | 160015 | 100 |
160204 | 63122 | 160113 | 160113 | 160118 | 688887 | 160118 | 160220 | 160220 | 160010 | 100 |
160204 | 63103 | 160114 | 160114 | 160120 | 687260 | 160118 | 160220 | 160220 | 160012 | 100 |
160204 | 63135 | 160114 | 160114 | 160119 | 686593 | 160116 | 160216 | 160220 | 160009 | 100 |
160204 | 63080 | 160115 | 160115 | 160120 | 689787 | 160120 | 160224 | 160220 | 160012 | 100 |
160204 | 63142 | 160114 | 160114 | 160120 | 689502 | 160160 | 160261 | 160220 | 160015 | 100 |
160204 | 63127 | 160112 | 160112 | 160118 | 689285 | 160118 | 160220 | 160224 | 160017 | 100 |
Result (median cycles for code divided by count): 0.7882
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
160024 | 64637 | 160026 | 160026 | 160030 | 699569 | 160030 | 160042 | 160020 | 160001 | 10 |
160024 | 63264 | 160011 | 160011 | 160010 | 696455 | 160010 | 160020 | 160020 | 160001 | 10 |
160024 | 63044 | 160011 | 160011 | 160010 | 697366 | 160010 | 160020 | 160020 | 160001 | 10 |
160024 | 63055 | 160011 | 160011 | 160010 | 698136 | 160010 | 160020 | 160020 | 160001 | 10 |
160024 | 63071 | 160011 | 160011 | 160010 | 697141 | 160010 | 160020 | 160020 | 160001 | 10 |
160024 | 63060 | 160011 | 160011 | 160010 | 700908 | 160010 | 160020 | 160020 | 160001 | 10 |
160024 | 63031 | 160011 | 160011 | 160010 | 700179 | 160010 | 160020 | 160020 | 160001 | 10 |
160024 | 63076 | 160011 | 160011 | 160010 | 698792 | 160010 | 160020 | 160020 | 160001 | 10 |
160024 | 63049 | 160011 | 160011 | 160010 | 702284 | 160010 | 160020 | 160020 | 160001 | 10 |
160024 | 63041 | 160011 | 160011 | 160010 | 698952 | 160010 | 160020 | 160020 | 160001 | 10 |
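The "median cycles for code divided by count" figure can be approximated from the cycle (02) column of the table above. This is a sketch under two assumptions: that the 16-instruction sequence runs 10000 times per measurement (inferred from the roughly 160000 retired uops, not stated in the listing), and that Count (8 here, matching the eight ccmn instructions) is the divisor applied per repetition. The helper name `cycles_per_instruction` is hypothetical. Applying the same arithmetic to other tables in this listing does not always reproduce the reported value to the last digit, so treat it as an approximation of how the figure is derived.

```python
from statistics import median

# cycle (02) column from the ten runs in the table above
cycles = [64637, 63264, 63044, 63055, 63071,
          63060, 63031, 63076, 63049, 63041]

COUNT = 8             # "Count: 8" from the listing
REPETITIONS = 10_000  # assumption, inferred from ~160000 retired uops


def cycles_per_instruction(cycles, count, repetitions):
    """Median cycles for the whole run, per counted instruction."""
    return median(cycles) / (count * repetitions)


print(f"{cycles_per_instruction(cycles, COUNT, REPETITIONS):.4f}")  # prints 0.7882
```

Using the median rather than the mean keeps the warm-up outlier in the first run (64637 cycles) from skewing the result.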
Count: 4
Code:
fcmp s0, s0
ccmn x0, #3, #0, hi
ccmn x0, #3, #0, hi
ccmn x0, #3, #0, hi
ccmn x0, #3, #0, hi

mov x0, 1
(fused SUBS/B.cc loop)
Result (median cycles for code divided by count): 0.5998
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? int retires (ef) |
50204 | 24000 | 50106 | 40103 | 10003 | 40111 | 10003 | 315064 | 40012 | 50112 | 40209 | 10003 | 80234 | 20010 | 40007 | 100 |
50204 | 23991 | 50111 | 40107 | 10004 | 40117 | 10005 | 315220 | 40017 | 50119 | 40216 | 10004 | 80224 | 20008 | 40001 | 100 |
50204 | 23999 | 50106 | 40103 | 10003 | 40109 | 10003 | 315097 | 40012 | 50112 | 40209 | 10003 | 80224 | 20008 | 40001 | 100 |
50204 | 23990 | 50104 | 40101 | 10003 | 40112 | 10004 | 315068 | 40017 | 50116 | 40212 | 10004 | 80224 | 20008 | 40001 | 100 |
50204 | 23997 | 50106 | 40103 | 10003 | 40109 | 10003 | 315300 | 40013 | 50112 | 40209 | 10003 | 80224 | 20008 | 40001 | 100 |
50204 | 23997 | 50106 | 40103 | 10003 | 40109 | 10003 | 315444 | 40017 | 50116 | 40212 | 10004 | 80224 | 20008 | 40001 | 100 |
50204 | 23990 | 50103 | 40101 | 10002 | 40109 | 10003 | 315300 | 40013 | 50112 | 40209 | 10003 | 80224 | 20008 | 40001 | 100 |
50204 | 23987 | 50104 | 40101 | 10003 | 40109 | 10003 | 315334 | 40012 | 50112 | 40209 | 10003 | 80224 | 20008 | 40001 | 100 |
50204 | 24000 | 50103 | 40101 | 10002 | 40112 | 10004 | 315447 | 40012 | 50112 | 40209 | 10003 | 80218 | 20006 | 40003 | 100 |
50204 | 24000 | 50103 | 40101 | 10002 | 40112 | 10004 | 315068 | 40017 | 50116 | 40212 | 10004 | 80224 | 20008 | 40001 | 100 |
Result (median cycles for code divided by count): 0.5995
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? int retires (ef) |
50024 | 24173 | 50019 | 40016 | 10003 | 40024 | 10004 | 316257 | 40000 | 50010 | 40020 | 10000 | 80020 | 20000 | 40001 | 10 |
50024 | 24008 | 50011 | 40011 | 10000 | 40010 | 10000 | 316420 | 40000 | 50010 | 40020 | 10000 | 80020 | 20000 | 40001 | 10 |
50024 | 23956 | 50011 | 40011 | 10000 | 40010 | 10000 | 315937 | 40000 | 50010 | 40020 | 10000 | 80020 | 20000 | 40001 | 10 |
50024 | 23956 | 50011 | 40011 | 10000 | 40010 | 10000 | 316349 | 40000 | 50010 | 40020 | 10000 | 80082 | 20014 | 40021 | 10 |
50024 | 23938 | 50011 | 40011 | 10000 | 40010 | 10000 | 316296 | 40000 | 50010 | 40020 | 10000 | 80020 | 20000 | 40001 | 10 |
50024 | 23993 | 50011 | 40011 | 10000 | 40010 | 10000 | 316341 | 40000 | 50010 | 40020 | 10000 | 80020 | 20000 | 40001 | 10 |
50024 | 23993 | 50011 | 40011 | 10000 | 40010 | 10000 | 316456 | 40000 | 50010 | 40020 | 10000 | 80020 | 20000 | 40001 | 10 |
50024 | 23993 | 50011 | 40011 | 10000 | 40010 | 10000 | 316363 | 40000 | 50010 | 40020 | 10000 | 80020 | 20000 | 40001 | 10 |
50024 | 24009 | 50011 | 40011 | 10000 | 40010 | 10000 | 316811 | 40000 | 50010 | 40020 | 10000 | 80020 | 20000 | 40001 | 10 |
50024 | 24009 | 50011 | 40011 | 10000 | 40010 | 10000 | 315457 | 40000 | 50010 | 40020 | 10000 | 80020 | 20000 | 40001 | 10 |
Count: 7
Code:
ands xzr, xzr, xzr
ccmn x0, #3, #0, hi
ccmn x0, #3, #0, hi
ccmn x0, #3, #0, hi
ccmn x0, #3, #0, hi
ccmn x0, #3, #0, hi
ccmn x0, #3, #0, hi
ccmn x0, #3, #0, hi

mov x0, 1
(fused SUBS/B.cc loop)
Result (median cycles for code divided by count): 0.5568
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
80204 | 38968 | 80104 | 80104 | 80114 | 552073 | 80108 | 80208 | 140214 | 80004 | 100 |
80204 | 39026 | 80102 | 80102 | 80111 | 550563 | 80111 | 80212 | 140224 | 80006 | 100 |
80204 | 38941 | 80106 | 80106 | 80116 | 547262 | 80116 | 80216 | 140290 | 80038 | 100 |
80204 | 38933 | 80106 | 80106 | 80114 | 549055 | 80111 | 80212 | 140220 | 80003 | 100 |
80204 | 38933 | 80106 | 80106 | 80114 | 550632 | 80116 | 80216 | 140214 | 80004 | 100 |
80204 | 38981 | 80104 | 80104 | 80114 | 547003 | 80108 | 80208 | 140228 | 80006 | 100 |
80204 | 39012 | 80106 | 80106 | 80116 | 551366 | 80108 | 80208 | 140228 | 80006 | 100 |
80204 | 38955 | 80104 | 80104 | 80108 | 551366 | 80108 | 80208 | 140228 | 80007 | 100 |
80204 | 38933 | 80106 | 80106 | 80114 | 547003 | 80108 | 80208 | 140214 | 80004 | 100 |
80204 | 38979 | 80107 | 80107 | 80116 | 551366 | 80108 | 80208 | 140214 | 80004 | 100 |
Result (median cycles for code divided by count): 0.5561
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef) |
80024 | 39141 | 80032 | 80032 | 0 | 0 | 80044 | 0 | 0 | 549460 | 0 | 0 | 80041 | 80042 | 0 | 0 | 140020 | 80011 | 10 |
80024 | 38936 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 0 | 549871 | 0 | 0 | 80020 | 80020 | 0 | 0 | 140020 | 80011 | 10 |
80024 | 38972 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 0 | 551106 | 0 | 0 | 80020 | 80020 | 0 | 0 | 140020 | 80011 | 10 |
80024 | 38971 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 0 | 550934 | 0 | 0 | 80020 | 80020 | 0 | 0 | 140020 | 80011 | 10 |
80024 | 38972 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 0 | 551068 | 0 | 0 | 80020 | 80020 | 0 | 0 | 140020 | 80011 | 10 |
80024 | 38885 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 0 | 551106 | 0 | 0 | 80020 | 80020 | 0 | 0 | 140020 | 80011 | 10 |
80024 | 38941 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 0 | 551106 | 0 | 0 | 80020 | 80020 | 0 | 0 | 140020 | 80011 | 10 |
80024 | 38897 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 0 | 551233 | 0 | 0 | 80020 | 80020 | 0 | 0 | 140020 | 80011 | 10 |
80024 | 38912 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 0 | 551233 | 0 | 0 | 80020 | 80020 | 0 | 0 | 140020 | 80011 | 10 |
80024 | 38931 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 0 | 548246 | 0 | 0 | 80020 | 80020 | 0 | 0 | 140020 | 80011 | 10 |