Apple Microarchitecture Research by Dougall Johnson
Code:
ldpsw x0, x1, [x6], #8
Init:
mov x0, 1
mov x1, 2
mov x8, 0
(no loop instructions)
Per-instruction averages:
Retires: 3.000
Issues: 2.000
Integer unit issues: 1.001
Load/store unit issues: 1.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
3005 | 1300 | 2036 | 1020 | 1016 | 1028 | 1000 | 13146 | 14485 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1081 | 2001 | 1001 | 1000 | 1000 | 1000 | 13175 | 14543 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1091 | 2001 | 1001 | 1000 | 1000 | 1000 | 13207 | 14342 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1074 | 2001 | 1001 | 1000 | 1000 | 1000 | 13469 | 14634 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1089 | 2001 | 1001 | 1000 | 1000 | 1000 | 13308 | 14554 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1089 | 2001 | 1001 | 1000 | 1000 | 1000 | 13294 | 14736 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1075 | 2001 | 1001 | 1000 | 1000 | 1000 | 13224 | 14550 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1086 | 2001 | 1001 | 1000 | 1000 | 1000 | 13301 | 14496 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1105 | 2001 | 1001 | 1000 | 1000 | 1000 | 13337 | 14505 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
3004 | 1110 | 2001 | 1001 | 1000 | 1000 | 1000 | 13178 | 14457 | 2000 | 1000 | 2000 | 1000 | 2000 | 1001 | 1000 | 2000 |
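The per-instruction averages can be approximately recovered from the counter rows. This is a minimal Python sketch that assumes each row counts 1,000 unrolled LDPSW instances. The source does not state that count; it is inferred from the 1,000-valued load/store columns. The column values are copied from the table, and the helper `per_instruction` is hypothetical.

```python
from statistics import median

# Assumption (not stated in the source): each row counts 1,000 unrolled
# LDPSW instances, inferred from the 1,000-valued load/store columns.
INSTANCES = 1000

# Columns copied from the ten rows of the first counter table.
columns = {
    "retire uop (01)": [3005] + [3004] * 9,
    "schedule uop (52)": [2036] + [2001] * 9,
    "schedule int uop (53)": [1020] + [1001] * 9,
    "schedule ldst uop (55)": [1016] + [1000] * 9,
}

def per_instruction(values):
    """Median over the repeated runs, divided by the assumed instance count."""
    return median(values) / INSTANCES

for name, values in columns.items():
    print(f"{name}: {per_instruction(values):.3f} per LDPSW")
```

The medians come out at 3.004, 2.001, 1.001 and 1.000, all within 0.005 of the summary figures. In every table the first row is noticeably higher, plausibly a warm-up effect, which is one reason to prefer the median over the mean.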
Chain cycles: 3
Code:
ldpsw x0, x1, [x6], #8
eor x8, x8, x0
eor x8, x8, x0
add x6, x6, x8
Init:
mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
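The two EORs use the same operand, so they cancel and x8 stays at its initial 0. The ADD therefore leaves the post-incremented x6 unchanged. The loaded x0 still sits on the path that produces the next x6, so the load latency is exposed on every iteration. The Python sketch below models register values only, not timing. It checks that the address stream matches a plain post-increment. The memory contents, start address, and function names are hypothetical.

```python
import random

MASK64 = (1 << 64) - 1

def sign_extend32(word):
    """LDPSW sign-extends each 32-bit word to a 64-bit register value."""
    word &= 0xFFFFFFFF
    value = word - (1 << 32) if word & 0x80000000 else word
    return value & MASK64  # two's-complement 64-bit register image

def chain_iteration(x6, x8, memory):
    """One pass of the loop body; memory maps byte address -> 32-bit word."""
    x0 = sign_extend32(memory.get(x6, 0))      # ldpsw x0, x1, [x6], #8
    _x1 = sign_extend32(memory.get(x6 + 4, 0)) # second word (unused in the x0 variant)
    x6 = (x6 + 8) & MASK64                     # post-index writeback
    x8 ^= x0                                   # eor x8, x8, x0
    x8 ^= x0                                   # eor x8, x8, x0 (cancels the first)
    x6 = (x6 + x8) & MASK64                    # add x6, x6, x8 (adds 0)
    return x6, x8

random.seed(1)
memory = {addr: random.getrandbits(32) for addr in range(0x1000, 0x1100, 4)}
x6, x8 = 0x1000, 0
for _ in range(16):
    x6, x8 = chain_iteration(x6, x8, memory)
print(hex(x6), x8)
```

Whatever values are loaded, x6 advances by exactly 8 per iteration and x8 returns to 0.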
Result (median cycles for code, minus 3 chain cycles): 4.0106
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60209 | 71243 | 50262 | 40257 | 10005 | 40348 | 10003 | 1850501 | 549039 | 50209 | 40212 | 20006 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70115 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850096 | 548910 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70106 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850230 | 548985 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70106 | 50203 | 40203 | 10000 | 40206 | 10003 | 1853065 | 549920 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70106 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850176 | 548967 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70106 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850176 | 548967 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70106 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850176 | 548967 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70106 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850176 | 548967 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70106 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850176 | 548967 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
60204 | 70106 | 50203 | 40203 | 10000 | 40206 | 10003 | 1850176 | 548967 | 50209 | 40212 | 20008 | 70221 | 20008 | 40103 | 10000 | 50100 |
Result (median cycles for code, minus 3 chain cycles): 4.0140
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60029 | 71384 | 50080 | 40075 | 10005 | 40166 | 10003 | 1850410 | 549784 | 50029 | 40032 | 20008 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70104 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850056 | 549652 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70095 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850056 | 549652 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70100 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850056 | 549652 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70092 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849975 | 549625 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70114 | 50023 | 40023 | 10000 | 40020 | 10013 | 1851764 | 550199 | 50073 | 40072 | 20028 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70090 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849975 | 549625 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70091 | 50023 | 40023 | 10000 | 40020 | 10000 | 1850002 | 549634 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70091 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849975 | 549625 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
60024 | 70092 | 50023 | 40023 | 10000 | 40020 | 10000 | 1849948 | 549616 | 50020 | 40020 | 20000 | 70020 | 20000 | 40013 | 10000 | 50010 |
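The reported result can be approximated from the `cycle (02)` column. Take the median over the repeated runs, divide by the iteration count, and subtract the 3 chain cycles. This Python sketch assumes 10,000 iterations per row. That count is inferred from the retire counts of about 60,200, roughly 6 uops per iteration: 3 for LDPSW plus one each for the two EORs and the ADD. `latency_estimate` is a hypothetical helper.

```python
from statistics import median

# Assumption: 10,000 loop iterations per row (about 60,200 retired uops / ~6 per iteration).
ITERATIONS = 10_000
CHAIN_CYCLES = 3  # eor + eor + add, as stated by "Chain cycles: 3"

# cycle (02) column from the two tables for the x0 variant.
x0_runs = {
    "run 1": [71243, 70115] + [70106] * 8,
    "run 2": [71384, 70104, 70095, 70100, 70092, 70114, 70090, 70091, 70091, 70092],
}
reported = {"run 1": 4.0106, "run 2": 4.0140}

def latency_estimate(cycles):
    """Median cycles per iteration minus the known chain cycles."""
    return median(cycles) / ITERATIONS - CHAIN_CYCLES

for name, cycles in x0_runs.items():
    print(f"{name}: estimate {latency_estimate(cycles):.4f}, reported {reported[name]:.4f}")
```

Run 1 reproduces 4.0106 exactly. Run 2 lands within 0.005 of its reported value, possibly because the reported figure is computed over samples not all shown here. Both runs point to a latency of about 4 cycles from the x6 base to the loaded x0.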
Chain cycles: 3
Code:
ldpsw x0, x1, [x6], #8
eor x8, x8, x1
eor x8, x8, x1
add x6, x6, x8
Init:
mov x0, 1
mov x1, 2
mov x8, 0
(fused SUBS/B.cc loop)
Result (median cycles for code, minus 3 chain cycles): 4.0107
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60209 | 71235 | 50262 | 40257 | 10005 | 40348 | 10003 | 1850043 | 548852 | 50209 | 40212 | 20006 | 70221 | 20008 | 40104 | 10000 | 50100 |
60204 | 70105 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850123 | 548908 | 50209 | 40212 | 20008 | 70221 | 20008 | 40104 | 10000 | 50100 |
60204 | 70105 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850203 | 548965 | 50209 | 40212 | 20008 | 70221 | 20008 | 40104 | 10000 | 50100 |
60204 | 70105 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850203 | 548965 | 50209 | 40212 | 20008 | 70221 | 20008 | 40104 | 10000 | 50100 |
60204 | 70105 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850203 | 548965 | 50209 | 40212 | 20008 | 70221 | 20008 | 40104 | 10000 | 50100 |
60204 | 70105 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850203 | 548965 | 50209 | 40212 | 20008 | 70291 | 20028 | 40115 | 10000 | 50100 |
60204 | 70105 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850203 | 548965 | 50209 | 40212 | 20008 | 70221 | 20008 | 40104 | 10000 | 50100 |
60204 | 70105 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850203 | 548965 | 50209 | 40212 | 20008 | 70221 | 20008 | 40104 | 10000 | 50100 |
60204 | 70105 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850203 | 548965 | 50209 | 40212 | 20008 | 70221 | 20008 | 40104 | 10000 | 50100 |
60204 | 70105 | 50204 | 40204 | 10000 | 40206 | 10003 | 1850500 | 549064 | 50209 | 40212 | 20008 | 70221 | 20008 | 40104 | 10000 | 50100 |
Result (median cycles for code, minus 3 chain cycles): 4.0115
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
60029 | 71243 | 50080 | 40075 | 10005 | 40166 | 10003 | 1850653 | 549869 | 50029 | 40032 | 20008 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70113 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850569 | 549820 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70113 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850569 | 549820 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70115 | 50024 | 40024 | 10000 | 40020 | 10000 | 1852324 | 550388 | 50020 | 40020 | 20000 | 70111 | 20028 | 40025 | 10000 | 50010 |
60024 | 70139 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850569 | 549820 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70113 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850569 | 549820 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70113 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850569 | 549820 | 50020 | 40020 | 20000 | 70111 | 20028 | 40025 | 10000 | 50010 |
60024 | 70115 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850569 | 549820 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70113 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850569 | 549820 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
60024 | 70113 | 50024 | 40024 | 10000 | 40020 | 10000 | 1850569 | 549820 | 50020 | 40020 | 20000 | 70020 | 20000 | 40014 | 10000 | 50010 |
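The same median calculation can be applied to the x1 tables, keeping the 10,000-iteration assumption and using hypothetical helper names. Comparing all four reported results suggests that both destination registers of LDPSW have essentially the same latency along this chain.

```python
from statistics import median

ITERATIONS = 10_000  # assumption, as for the x0 variant
CHAIN_CYCLES = 3

# cycle (02) column from the two tables for the x1 variant.
x1_runs = [
    [71235] + [70105] * 9,
    [71243, 70113, 70113, 70115, 70139, 70113, 70113, 70115, 70113, 70113],
]
# "Result" lines reported for both variants.
reported = {"x0": [4.0106, 4.0140], "x1": [4.0107, 4.0115]}

def latency_estimate(cycles):
    """Median cycles per iteration minus the known chain cycles."""
    return median(cycles) / ITERATIONS - CHAIN_CYCLES

for cycles, result in zip(x1_runs, reported["x1"]):
    print(f"x1 estimate {latency_estimate(cycles):.4f}, reported {result:.4f}")

all_results = reported["x0"] + reported["x1"]
spread = max(all_results) - min(all_results)
print(f"spread across x0/x1 results: {spread:.4f} cycles")
```

The spread of about 0.003 cycles is far below one cycle, so these measurements give no sign that x1 arrives later than x0.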
Count: 8
Code:
ldpsw x0, x1, [x6], #8
ldpsw x0, x1, [x7], #8
ldpsw x0, x1, [x8], #8
ldpsw x0, x1, [x9], #8
ldpsw x0, x1, [x10], #8
ldpsw x0, x1, [x11], #8
ldpsw x0, x1, [x12], #8
ldpsw x0, x1, [x13], #8
Init:
mov x7, x6
mov x8, x6
mov x9, x6
mov x10, x6
mov x11, x6
mov x12, x6
mov x13, x6
(fused SUBS/B.cc loop)
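This throughput test gives each of the eight LDPSWs its own copy of the base pointer: x7 to x13 are initialized from x6. As a result, the only true (read-after-write) dependencies are each register's own post-increment across iterations. The repeated writes to x0 and x1 are write-after-write hazards, which register renaming removes. The Python sketch below checks this with a small RAW-dependency scan. It only understands the post-indexed `ldpsw Xt1, Xt2, [Xn], #imm` form, and its function names are hypothetical.

```python
import re

LDPSW = re.compile(r"ldpsw (x\d+), (x\d+), \[(x\d+)\], #\d+")

def parse_ldpsw(text):
    """Return (sources, destinations) for a post-indexed LDPSW."""
    m = LDPSW.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unsupported instruction: {text!r}")
    t1, t2, base = m.groups()
    # The base is read for the address, then written back (base + imm).
    return {base}, {t1, t2, base}

def raw_dependencies(body):
    """Pairs (producer, consumer) of read-after-write dependencies in a loop
    body, including loop-carried ones from the previous iteration."""
    parsed = [parse_ldpsw(line) for line in body]
    n = len(parsed)
    deps = set()
    for j, (sources, _) in enumerate(parsed):
        for reg in sources:
            # Walk back to the most recent writer, wrapping into the previous iteration.
            for step in range(1, n + 1):
                i = (j - step) % n
                if reg in parsed[i][1]:
                    deps.add((i, j))
                    break
    return deps

eight = [f"ldpsw x0, x1, [x{r}], #8" for r in range(6, 14)]
single = ["ldpsw x0, x1, [x6], #8"]

print(sorted(raw_dependencies(eight)))   # each instruction depends only on itself
print(sorted(raw_dependencies(single)))
```

In the first test, by contrast, every unrolled LDPSW reads the x6 written back by the previous one, which forms a single serial chain. Under the 1,000-instance assumption, that chain is consistent with its rows taking roughly 1.08 cycles per instruction, compared with about 0.75 here.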
Result (median cycles for code divided by count): 0.7520
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
240209 | 61196 | 160330 | 80240 | 80090 | 80241 | 80009 | 240487 | 251334 | 160117 | 80208 | 160018 | 80235 | 160070 | 80033 | 80000 | 160100 |
240204 | 60128 | 160111 | 80106 | 80005 | 80108 | 80008 | 240475 | 251186 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60131 | 160111 | 80106 | 80005 | 80108 | 80008 | 240475 | 251328 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240205 | 60213 | 160165 | 80133 | 80032 | 80135 | 80008 | 240476 | 251449 | 160116 | 80208 | 160016 | 80236 | 160072 | 80034 | 80000 | 160100 |
240204 | 60130 | 160111 | 80106 | 80005 | 80108 | 80061 | 240880 | 254291 | 160222 | 80261 | 160123 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60189 | 160111 | 80106 | 80005 | 80108 | 80008 | 240666 | 251662 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60165 | 160111 | 80106 | 80005 | 80108 | 80008 | 240664 | 251668 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60163 | 160111 | 80106 | 80005 | 80108 | 80008 | 240686 | 251713 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60158 | 160111 | 80106 | 80005 | 80108 | 80008 | 240665 | 251660 | 160116 | 80208 | 160016 | 80208 | 160016 | 80006 | 80000 | 160100 |
240204 | 60164 | 160111 | 80106 | 80005 | 80108 | 80008 | 240664 | 251694 | 160116 | 80208 | 160016 | 80235 | 160073 | 80033 | 80000 | 160100 |
Result (median cycles for code divided by count): 0.7511
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
240029 | 61152 | 160241 | 80149 | 80092 | 80151 | 80008 | 240191 | 251335 | 160026 | 80028 | 160016 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60081 | 160011 | 80011 | 80000 | 80010 | 80000 | 240167 | 251278 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60089 | 160011 | 80011 | 80000 | 80010 | 80000 | 240167 | 251252 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60083 | 160011 | 80011 | 80000 | 80010 | 80000 | 240186 | 251247 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60082 | 160011 | 80011 | 80000 | 80010 | 80000 | 240167 | 251278 | 160010 | 80020 | 160000 | 80055 | 160072 | 80034 | 80000 | 160010 |
240024 | 60081 | 160011 | 80011 | 80000 | 80010 | 80036 | 240521 | 254144 | 160081 | 80055 | 160073 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60082 | 160011 | 80011 | 80000 | 80010 | 80000 | 240165 | 251269 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60084 | 160011 | 80011 | 80000 | 80010 | 80000 | 240165 | 251254 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60080 | 160011 | 80011 | 80000 | 80010 | 80000 | 240167 | 251271 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
240024 | 60083 | 160011 | 80011 | 80000 | 80010 | 80000 | 240167 | 251254 | 160010 | 80020 | 160000 | 80020 | 160000 | 80001 | 80000 | 160010 |
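The 0.75-cycle result is the `cycle (02)` column divided by the total number of LDPSWs executed. This Python sketch assumes 10,000 iterations of the 8-instruction body. That count is inferred from about 240,200 retired uops, i.e. 3 uops x 8 LDPSW x 10,000. The helper name is hypothetical.

```python
from statistics import median

ITERATIONS = 10_000  # assumption: ~240,200 retired uops = 3 uops x 8 LDPSW x 10,000
COUNT = 8            # "Count: 8"

# cycle (02) column from the two throughput tables.
run1 = [61196, 60128, 60131, 60213, 60130, 60189, 60165, 60163, 60158, 60164]
run2 = [61152, 60081, 60089, 60083, 60082, 60081, 60082, 60084, 60080, 60083]

def cycles_per_instruction(cycles):
    """Median cycles divided by the total number of LDPSWs executed."""
    return median(cycles) / (ITERATIONS * COUNT)

for name, cycles in (("run 1", run1), ("run 2", run2)):
    cpi = cycles_per_instruction(cycles)
    print(f"{name}: {cpi:.4f} cycles/LDPSW, {1 / cpi:.2f} LDPSW/cycle")
```

Both runs come out within 0.0001 of the reported 0.7520 and 0.7511, which corresponds to roughly four LDPSWs every three cycles.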