Apple Microarchitecture Research by Dougall Johnson
Code:
ldp q0, q1, [x6], #0x10

mov x0, 1
mov x1, 2
mov x8, 0
(no loop instructions)
Retires: 2.000
Issues: 3.000
Integer unit issues: 1.001
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) |
2005 | 1863 | 3057 | 1027 | 2030 | 1026 | 2000 | 3516 | 25641 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1584 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24867 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1541 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24858 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1543 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24759 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1544 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24912 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1543 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24714 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1536 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24975 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1538 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24867 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1553 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 25182 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1542 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 25362 | 3000 | 2000 | 2000 | 1001 | 2000 |
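The per-instruction summary above (Retires, Issues, and the per-unit issue counts) appears to follow from dividing the raw counter values by the number of unrolled copies of the instruction. Here is a minimal Python sketch of that arithmetic, assuming 1000 unrolled copies. That figure is inferred from the counter magnitudes and is not stated in the listing. The names `UNROLLS`, `row`, and `per_instruction` are illustrative only.

```python
# Assumption: the instruction was unrolled 1000 times with no loop,
# inferred from counters such as "schedule ldst uop (55)" reading 2000.
UNROLLS = 1000

# One representative row of raw counter values from the table above.
row = {
    "retire uop (01)": 2004,
    "schedule uop (52)": 3001,
    "schedule int uop (53)": 1001,
    "schedule ldst uop (55)": 2000,
}

# Divide each raw count by the unroll count to get uops per instruction.
per_instruction = {name: count / UNROLLS for name, count in row.items()}

for name, value in per_instruction.items():
    print(f"{name}: {value:.3f}")
```

The small excess over whole numbers (for example 2.004 retired uops against the reported 2.000) is presumably fixed measurement overhead that the summary figures leave out.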
Chain cycles: 3
Code:
ldp q0, q1, [x6], #0x10
fmov x0, d0
eor x8, x8, x0
eor x8, x8, x0
add x6, x6, x8

mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 3 chain cycles): 7.0186
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60223 | 101943 | 80286 | 50157 | 10091 | 20038 | 40421 | 10097 | 20006 | 2660747 | 1568560 | 789377 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100094 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2660697 | 1568452 | 789334 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100094 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20024 | 2662345 | 1569496 | 789875 | 70171 | 30239 | 20024 | 10013 | 60338 | 20048 | 10023 | 50017 | 20000 | 40100 |
60204 | 100111 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2662070 | 1569442 | 789798 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100130 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20024 | 2662642 | 1569734 | 789975 | 70173 | 30239 | 20028 | 10013 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
65466 | 113925 | 84436 | 52765 | 10053 | 21618 | 42592 | 10045 | 20006 | 2660747 | 1568560 | 789377 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100094 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2660747 | 1568560 | 789377 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100109 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2661341 | 1568956 | 789563 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100102 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2661071 | 1568774 | 789478 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100097 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2661017 | 1568740 | 789464 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
Result (median cycles for code, minus 3 chain cycles): 7.0114
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60043 | 102461 | 80198 | 50066 | 10094 | 20038 | 40330 | 10098 | 20006 | 2661319 | 1570462 | 790289 | 70023 | 30029 | 20008 | 10003 | 60038 | 20008 | 10003 | 50001 | 20000 | 40010 |
60024 | 100106 | 80016 | 50011 | 10005 | 20000 | 40010 | 10000 | 20000 | 2661031 | 1570290 | 790200 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100104 | 80016 | 50011 | 10005 | 20000 | 40010 | 10000 | 20000 | 2661031 | 1570290 | 790200 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100104 | 80016 | 50011 | 10005 | 20000 | 40010 | 10000 | 20000 | 2661031 | 1570290 | 790200 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100106 | 80016 | 50011 | 10005 | 20000 | 40010 | 10000 | 20000 | 2661328 | 1570488 | 790294 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100104 | 80016 | 50011 | 10005 | 20000 | 40010 | 10000 | 20000 | 2661031 | 1570290 | 790200 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100104 | 80016 | 50011 | 10005 | 20000 | 40010 | 10000 | 20000 | 2661031 | 1570290 | 790200 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100104 | 80016 | 50011 | 10005 | 20000 | 40010 | 10000 | 20000 | 2661031 | 1570290 | 790200 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100104 | 80016 | 50011 | 10005 | 20000 | 40010 | 10000 | 20000 | 2661058 | 1570306 | 790208 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100104 | 80016 | 50011 | 10005 | 20000 | 40010 | 10000 | 20000 | 2661031 | 1570290 | 790200 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
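The "median cycles for code, minus 3 chain cycles" figure can be approximated from the "cycle (02)" column. The sketch below is hedged: it assumes the code block executed 10000 times in total, which is inferred from "retire uop (01)" being close to 6 uops times 10000 and is not stated in the listing. The names `EXECUTIONS`, `CHAIN_CYCLES`, and `latency` are illustrative.

```python
from statistics import median

# Assumption (not stated in the listing): the code block ran 10000 times
# in total, inferred from "retire uop (01)" being about 6 uops * 10000.
EXECUTIONS = 10_000
CHAIN_CYCLES = 3  # from "Chain cycles: 3"

# "cycle (02)" column of the second table above.
cycles = [102461, 100106, 100104, 100104, 100106,
          100104, 100104, 100104, 100104, 100104]

# Median cycles per execution of the block, then remove the cycles
# attributed to the instructions that only close the dependency chain.
cycles_per_execution = median(cycles) / EXECUTIONS
latency = cycles_per_execution - CHAIN_CYCLES
print(f"{latency:.4f}")
```

With these samples the sketch gives roughly 7.010 cycles, slightly below the reported 7.0114, so the published figure presumably comes from separate runs or handles loop overhead differently.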
Chain cycles: 3
Code:
ldp q0, q1, [x6], #0x10
fmov x1, d1
eor x8, x8, x1
eor x8, x8, x1
add x6, x6, x8

mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 3 chain cycles): 7.0118
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60223 | 102904 | 80236 | 50112 | 10104 | 20020 | 40385 | 10090 | 20006 | 2661157 | 1568726 | 789467 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100111 | 80107 | 50101 | 10006 | 20000 | 40104 | 10003 | 20006 | 2661368 | 1568956 | 789566 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100107 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2661071 | 1568766 | 789470 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100104 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2661206 | 1568856 | 789511 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60205 | 100179 | 80121 | 50109 | 10010 | 20002 | 40136 | 10013 | 20006 | 2662259 | 1569544 | 789844 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100136 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2661395 | 1568968 | 789572 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100121 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2661638 | 1569130 | 789651 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100111 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2662178 | 1569500 | 789813 | 70113 | 30209 | 20008 | 10003 | 60278 | 20028 | 10013 | 50009 | 20000 | 40100 |
60204 | 100118 | 80107 | 50101 | 10006 | 20000 | 40104 | 10003 | 20006 | 2662070 | 1569432 | 789784 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100133 | 80106 | 50101 | 10005 | 20000 | 40104 | 10003 | 20006 | 2661827 | 1569260 | 789713 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
Result (median cycles for code, minus 3 chain cycles): 7.0122
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60043 | 102807 | 80134 | 50021 | 10093 | 20020 | 40294 | 10088 | 20006 | 2660050 | 1569656 | 789891 | 70023 | 30029 | 20008 | 10003 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100060 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20000 | 2659951 | 1569624 | 789857 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100079 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20024 | 2664487 | 1572590 | 791306 | 70083 | 30059 | 20028 | 10013 | 60098 | 20024 | 10013 | 50009 | 20000 | 40010 |
60025 | 100166 | 80028 | 50019 | 10007 | 20002 | 40046 | 10013 | 20000 | 2660626 | 1570074 | 790070 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100160 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20000 | 2660140 | 1569748 | 789922 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100060 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20000 | 2659897 | 1569588 | 789841 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100073 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20000 | 2659897 | 1569588 | 789841 | 70010 | 30020 | 20000 | 10000 | 60098 | 20028 | 10013 | 50009 | 20000 | 40010 |
62761 | 115343 | 82955 | 51899 | 10016 | 21040 | 41867 | 10027 | 20000 | 2662759 | 1571496 | 790738 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100114 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20000 | 2660086 | 1569712 | 789895 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100060 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20000 | 2659897 | 1569588 | 789841 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
Count: 8
Code:
ldp q0, q1, [x6], #0x10
ldp q0, q1, [x7], #0x10
ldp q0, q1, [x8], #0x10
ldp q0, q1, [x9], #0x10
ldp q0, q1, [x10], #0x10
ldp q0, q1, [x11], #0x10
ldp q0, q1, [x12], #0x10
ldp q0, q1, [x13], #0x10

mov x7, x6
mov x8, x6
mov x9, x6
mov x10, x6
mov x11, x6
mov x12, x6
mov x13, x6
(fused SUBS/B.cc loop)
Result (median cycles for code divided by count): 1.0795
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
160223 | 88114 | 240676 | 80367 | 160309 | 80368 | 160008 | 240318 | 1393612 | 240114 | 200 | 160012 | 200 | 160070 | 80032 | 160000 | 100 |
160204 | 86360 | 240106 | 80106 | 160000 | 80107 | 160009 | 240318 | 1392839 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86344 | 240105 | 80105 | 160000 | 80106 | 160009 | 240318 | 1393391 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86462 | 240159 | 80129 | 160030 | 80130 | 160009 | 240318 | 1393391 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86344 | 240105 | 80105 | 160000 | 80106 | 160009 | 240318 | 1393391 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86692 | 240273 | 80183 | 160090 | 80184 | 160009 | 240318 | 1393391 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86456 | 240161 | 80131 | 160030 | 80132 | 160009 | 240318 | 1393391 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86344 | 240105 | 80105 | 160000 | 80106 | 160057 | 240396 | 1388888 | 240189 | 200 | 160068 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86344 | 240105 | 80105 | 160000 | 80106 | 160009 | 240318 | 1393463 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86350 | 240105 | 80105 | 160000 | 80106 | 160057 | 240396 | 1394279 | 240189 | 200 | 160068 | 200 | 160012 | 80005 | 160000 | 100 |
Result (median cycles for code divided by count): 1.0793
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
160043 | 88555 | 240593 | 80284 | 160309 | 80285 | 160009 | 240048 | 1392955 | 240025 | 20 | 160012 | 20 | 160000 | 80001 | 160000 | 10 |
160024 | 86362 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 1393508 | 240010 | 20 | 160000 | 20 | 160008 | 80003 | 160000 | 10 |
160024 | 86623 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 1394050 | 240010 | 20 | 160000 | 20 | 160000 | 80001 | 160000 | 10 |
160024 | 86349 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 1393364 | 240010 | 20 | 160000 | 20 | 160000 | 80001 | 160000 | 10 |
160024 | 86342 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 1393355 | 240010 | 20 | 160000 | 20 | 160000 | 80001 | 160000 | 10 |
160024 | 86342 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 1393355 | 240010 | 20 | 160000 | 20 | 160000 | 80001 | 160000 | 10 |
160024 | 86342 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 1393414 | 240010 | 20 | 160000 | 20 | 160000 | 80001 | 160000 | 10 |
160024 | 86349 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 1393489 | 240010 | 20 | 160000 | 20 | 160000 | 80001 | 160000 | 10 |
160024 | 86343 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 1393373 | 240010 | 20 | 160000 | 20 | 160000 | 80001 | 160000 | 10 |
160024 | 86342 | 240011 | 80011 | 160000 | 80010 | 160000 | 240030 | 1393454 | 240010 | 20 | 160000 | 20 | 160000 | 80001 | 160000 | 10 |
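The throughput figure ("median cycles for code divided by count") can be sketched the same way from the "cycle (02)" column of the table above. This is an illustration under one assumption: that the eight-instruction block executed 10000 times in total, inferred from "retire uop (01)" being close to 16 uops times 10000. The names `EXECUTIONS`, `COUNT`, and `cycles_per_ldp` are hypothetical.

```python
from statistics import median

# Assumption (not stated in the listing): the 8-instruction block ran
# 10000 times in total, inferred from "retire uop (01)" ~ 16 uops * 10000.
EXECUTIONS = 10_000
COUNT = 8  # from "Count: 8": independent ldp instructions per block

# "cycle (02)" column of the table above.
cycles = [88555, 86362, 86623, 86349, 86342,
          86342, 86342, 86349, 86343, 86342]

# Median cycles per block execution, divided by the instruction count,
# gives the reciprocal throughput in cycles per ldp.
cycles_per_ldp = median(cycles) / EXECUTIONS / COUNT
print(f"{cycles_per_ldp:.4f}")
```

For these samples the result rounds to 1.0793 cycles per `ldp`, in line with the reported value.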