Apple Microarchitecture Research by Dougall Johnson
Code:
ldp q0, q1, [x6, #0x10]!

mov x0, 1
mov x1, 2
mov x8, 0
(no loop instructions)
Retires: 2.000
Issues: 3.000
Integer unit issues: 1.001
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) |
2005 | 1854 | 3057 | 1027 | 2030 | 1026 | 2000 | 3516 | 25254 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1564 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24759 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1510 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 25029 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1522 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24759 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1515 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24543 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1540 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24597 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1518 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24498 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1530 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24588 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1525 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24489 | 3000 | 2000 | 2000 | 1001 | 2000 |
2004 | 1526 | 3001 | 1001 | 2000 | 1000 | 2000 | 3516 | 24786 | 3000 | 2000 | 2000 | 1001 | 2000 |
Chain cycles: 3
Code:
ldp q0, q1, [x6, #0x10]!
fmov x0, d0
eor x8, x8, x0
eor x8, x8, x0
add x6, x6, x8

mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 3 chain cycles): 7.0076
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60223 | 102390 | 80289 | 50157 | 10094 | 20038 | 40421 | 10096 | 20006 | 2660828 | 1568644 | 789393 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100087 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660477 | 1568410 | 789287 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100076 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660261 | 1568270 | 789222 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100076 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660315 | 1568306 | 789238 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100076 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660261 | 1568270 | 789222 | 70113 | 30209 | 20008 | 10003 | 60278 | 20028 | 10013 | 50009 | 20000 | 40100 |
62981 | 115847 | 82836 | 51837 | 10014 | 20985 | 41749 | 10025 | 20042 | 2664302 | 1570702 | 790470 | 70233 | 30269 | 20048 | 10023 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100111 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660450 | 1568396 | 789282 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100076 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660261 | 1568270 | 789222 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100076 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660261 | 1568270 | 789222 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
60204 | 100076 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660261 | 1568270 | 789222 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 40100 |
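The headline latency for this table can be reproduced from its cycle (02) column. The sketch below is an illustration only: the iteration count of 10,000 is an assumption inferred from the counters (20000 ldst retires for a two-uop load pair), not something the page states, and the function and variable names are made up for this example.

```python
from statistics import median

# cycle (02) column from the ten runs in the table above
cycles = [102390, 100087, 100076, 100076, 100076,
          115847, 100111, 100076, 100076, 100076]

ITERATIONS = 10_000  # assumption: inferred from the counters, not stated on the page
CHAIN_CYCLES = 3     # "Chain cycles: 3" reported for this test


def latency_estimate(cycles, iterations, chain_cycles):
    """Median cycles per loop iteration, minus the reported chain cycles."""
    return median(cycles) / iterations - chain_cycles


result = latency_estimate(cycles, ITERATIONS, CHAIN_CYCLES)
print(f"{result:.4f}")  # 7.0076
```

Taking the median keeps outlier runs such as the 115847-cycle one from skewing the estimate. For the other latency tables on this page the displayed cycle column does not reproduce the headline figure to the last digit, so the reported results may have been taken from separate runs.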
Result (median cycles for code, minus 3 chain cycles): 7.0075
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60043 | 102125 | 80195 | 50066 | 10091 | 20038 | 40330 | 10100 | 20006 | 2660513 | 1570036 | 790054 | 70023 | 30029 | 20008 | 10003 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100085 | 80014 | 50011 | 10003 | 20000 | 40010 | 10000 | 20000 | 2661004 | 1570310 | 790185 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100121 | 80014 | 50011 | 10003 | 20000 | 40010 | 10000 | 20000 | 2661355 | 1570544 | 790298 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100111 | 80014 | 50011 | 10003 | 20000 | 40010 | 10000 | 20000 | 2661490 | 1570638 | 790341 | 70010 | 30020 | 20000 | 10000 | 60098 | 20028 | 10013 | 50009 | 20000 | 40010 |
60024 | 100138 | 80020 | 50011 | 10009 | 20000 | 40014 | 10003 | 20000 | 2661378 | 1570388 | 790256 | 70010 | 30020 | 20000 | 10000 | 60080 | 20020 | 10010 | 50012 | 20000 | 40010 |
60024 | 100109 | 80019 | 50011 | 10008 | 20000 | 40010 | 10000 | 20000 | 2661193 | 1570340 | 790223 | 70010 | 30020 | 20000 | 10000 | 60098 | 20028 | 10013 | 50012 | 20000 | 40010 |
60024 | 100307 | 80015 | 50011 | 10004 | 20000 | 40010 | 10000 | 20000 | 2661112 | 1570382 | 790223 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100080 | 80014 | 50011 | 10003 | 20000 | 40010 | 10000 | 20020 | 2661925 | 1570862 | 790458 | 70075 | 30050 | 20020 | 10010 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100073 | 80014 | 50011 | 10003 | 20000 | 40010 | 10000 | 20000 | 2660221 | 1569788 | 789939 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
60024 | 100077 | 80014 | 50011 | 10003 | 20000 | 40010 | 10000 | 20044 | 2663840 | 1570550 | 791558 | 70147 | 30089 | 20046 | 10023 | 60020 | 20000 | 10000 | 50001 | 20000 | 40010 |
Chain cycles: 3
Code:
ldp q0, q1, [x6, #0x10]!
fmov x1, d1
eor x8, x8, x1
eor x8, x8, x1
add x6, x6, x8

mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 3 chain cycles): 7.0080
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
60223 | 102636 | 80228 | 50112 | 10096 | 20020 | 40385 | 10089 | 20006 | 2660045 | 1568126 | 789162 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 0 | 40100 |
60204 | 100068 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2659995 | 1568018 | 789119 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 0 | 40100 |
60204 | 100068 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660099 | 1568162 | 789178 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 0 | 40100 |
60204 | 100104 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660234 | 1568252 | 789223 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 0 | 40100 |
60204 | 100068 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660045 | 1568126 | 789162 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 0 | 40100 |
60204 | 100068 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660045 | 1568126 | 789162 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 0 | 40100 |
60204 | 100068 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660045 | 1568126 | 789162 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 0 | 40100 |
60204 | 100068 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660045 | 1568126 | 789162 | 70113 | 30209 | 20008 | 10003 | 60278 | 20025 | 10013 | 50009 | 20000 | 0 | 40100 |
60204 | 100068 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660045 | 1568126 | 789162 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 0 | 40100 |
60204 | 100070 | 80104 | 50101 | 10003 | 20000 | 40104 | 10003 | 20006 | 2660153 | 1568198 | 789197 | 70113 | 30209 | 20008 | 10003 | 60218 | 20008 | 10003 | 50001 | 20000 | 0 | 40100 |
Result (median cycles for code, minus 3 chain cycles): 7.0075
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
60043 | 102678 | 80136 | 50021 | 10095 | 20020 | 40294 | 10084 | 20006 | 2660455 | 1569924 | 790022 | 70023 | 30029 | 20008 | 10003 | 60020 | 20000 | 10000 | 50001 | 20000 | 0 | 40010 |
60024 | 100082 | 80014 | 50011 | 10003 | 20000 | 40010 | 10000 | 20000 | 2660437 | 1569932 | 790017 | 70010 | 30020 | 20000 | 10000 | 60098 | 20028 | 10013 | 50009 | 20000 | 0 | 40010 |
60024 | 100071 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20000 | 2660518 | 1569986 | 790040 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 0 | 40010 |
60024 | 100151 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20000 | 2662192 | 1571102 | 790567 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 0 | 40010 |
60024 | 100148 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20000 | 2662786 | 1571498 | 790753 | 70010 | 30020 | 20000 | 10000 | 52404 | 23022 | 7434 | 40453 | 18057 | 32 | 34069 |
60024 | 100082 | 80014 | 50011 | 10003 | 20000 | 40010 | 10000 | 20000 | 2660599 | 1570040 | 790065 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 0 | 40010 |
60024 | 100144 | 80020 | 50011 | 10009 | 20000 | 40014 | 10002 | 20000 | 2661517 | 1570530 | 790392 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 0 | 40010 |
60024 | 100114 | 80018 | 50011 | 10007 | 20000 | 40010 | 10000 | 20000 | 2662516 | 1571196 | 790713 | 70010 | 30020 | 20000 | 10000 | 60020 | 20000 | 10000 | 50001 | 20000 | 0 | 40010 |
60024 | 100113 | 80018 | 50011 | 10007 | 20000 | 40010 | 10000 | 20024 | 2667380 | 1574256 | 792291 | 70083 | 30059 | 20028 | 10013 | 60038 | 20008 | 10003 | 50001 | 20000 | 0 | 40010 |
60024 | 100101 | 80013 | 50011 | 10002 | 20000 | 40010 | 10000 | 20024 | 2662624 | 1571306 | 790712 | 70083 | 30059 | 20028 | 10013 | 60020 | 20000 | 10000 | 50001 | 20000 | 0 | 40010 |
Count: 8
Code:
ldp q0, q1, [x6, #0x10]!
ldp q0, q1, [x7, #0x10]!
ldp q0, q1, [x8, #0x10]!
ldp q0, q1, [x9, #0x10]!
ldp q0, q1, [x10, #0x10]!
ldp q0, q1, [x11, #0x10]!
ldp q0, q1, [x12, #0x10]!
ldp q0, q1, [x13, #0x10]!

mov x7, x6
mov x8, x6
mov x9, x6
mov x10, x6
mov x11, x6
mov x12, x6
mov x13, x6
(fused SUBS/B.cc loop)
Result (median cycles for code divided by count): 1.0796
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
160223 | 88186 | 240676 | 80367 | 160309 | 80368 | 160009 | 240318 | 1393915 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86364 | 240105 | 80105 | 160000 | 80106 | 160009 | 240318 | 1393753 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86362 | 240105 | 80105 | 160000 | 80106 | 160009 | 240318 | 1393754 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86368 | 240105 | 80105 | 160000 | 80106 | 160008 | 240318 | 1393829 | 240114 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86430 | 240105 | 80105 | 160000 | 80106 | 160008 | 240318 | 1394063 | 240114 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160205 | 86439 | 240161 | 80131 | 160030 | 80132 | 160008 | 240318 | 1393811 | 240114 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86363 | 240105 | 80105 | 160000 | 80106 | 160009 | 240318 | 1393754 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86362 | 240105 | 80105 | 160000 | 80106 | 160009 | 240318 | 1393754 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86362 | 240105 | 80105 | 160000 | 80106 | 160009 | 240318 | 1393861 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
160204 | 86362 | 240105 | 80105 | 160000 | 80106 | 160009 | 240318 | 1393861 | 240115 | 200 | 160012 | 200 | 160012 | 80005 | 160000 | 100 |
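The throughput figure can be approximated from the cycle (02) column of the table above in the same way. This is a rough sketch, not the page's own method: the 10,000 loop iterations are an assumption inferred from the counters (160000 ldst retires for eight two-uop load pairs), and the names are illustrative.

```python
from statistics import median

# cycle (02) column from the ten runs in the table above
cycles = [88186, 86364, 86362, 86368, 86430,
          86439, 86363, 86362, 86362, 86362]

ITERATIONS = 10_000  # assumption: inferred from the counters, not stated on the page
COUNT = 8            # "Count: 8": independent ldp instructions per loop iteration


def cycles_per_instruction(cycles, iterations, count):
    """Median cycles per loop iteration, divided by the instructions per iteration."""
    return median(cycles) / iterations / count


cpi = cycles_per_instruction(cycles, ITERATIONS, COUNT)
print(f"{cpi:.3f} cycles per ldp, {1 / cpi:.3f} ldp per cycle")
```

With these inputs the estimate comes out at about 1.08 cycles per instruction, close to the reported 1.0796. The last digit can differ from the headline depending on how the median of an even number of runs is taken and on which runs the reported figure was drawn from.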
Result (median cycles for code divided by count): 1.0795
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
160044 | 88484 | 240649 | 80310 | 0 | 160339 | 80311 | 0 | 160009 | 240048 | 1393494 | 240025 | 20 | 160012 | 20 | 160012 | 0 | 80005 | 160000 | 0 | 10 |
160024 | 86374 | 240011 | 80011 | 0 | 160000 | 80010 | 0 | 160000 | 240030 | 1393547 | 240010 | 20 | 160000 | 20 | 160000 | 0 | 80001 | 160000 | 0 | 10 |
160024 | 86355 | 240011 | 80011 | 0 | 160000 | 80010 | 0 | 160000 | 240030 | 1393583 | 240010 | 20 | 160000 | 20 | 160000 | 0 | 80001 | 160000 | 0 | 10 |
160024 | 86350 | 240011 | 80011 | 0 | 160000 | 80010 | 0 | 160000 | 240030 | 1393547 | 240010 | 20 | 160000 | 20 | 160000 | 0 | 80001 | 160000 | 0 | 10 |
160024 | 86350 | 240011 | 80011 | 0 | 160000 | 80010 | 0 | 160000 | 240030 | 1393537 | 240010 | 20 | 160000 | 20 | 160000 | 0 | 80001 | 160000 | 0 | 10 |
160024 | 86349 | 240011 | 80011 | 0 | 160000 | 80010 | 0 | 160000 | 240030 | 1393547 | 240010 | 20 | 160000 | 20 | 160000 | 0 | 80001 | 160000 | 0 | 10 |
160024 | 86350 | 240011 | 80011 | 0 | 160000 | 80010 | 0 | 160000 | 240030 | 1393547 | 240010 | 20 | 160000 | 20 | 160000 | 0 | 80001 | 160000 | 0 | 10 |
160024 | 86350 | 240011 | 80011 | 0 | 160000 | 80010 | 0 | 160000 | 240030 | 1393573 | 240010 | 20 | 160000 | 20 | 160000 | 0 | 80001 | 160000 | 0 | 10 |
160024 | 86350 | 240011 | 80011 | 0 | 160000 | 80010 | 0 | 160000 | 240030 | 1393547 | 240010 | 20 | 160000 | 20 | 160000 | 0 | 80001 | 160000 | 0 | 10 |
160024 | 86351 | 240011 | 80011 | 0 | 160000 | 80010 | 0 | 160000 | 240030 | 1393538 | 240010 | 20 | 160000 | 20 | 160000 | 0 | 80001 | 160000 | 0 | 10 |