Apple Microarchitecture Research by Dougall Johnson
M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm): Overview | Base Instructions | SIMD and FP Instructions
Code:
swpalb w0, w1, [x6] nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop ; nop
mov x0, 0
(no loop instructions)
Retires (minus 70 nops): 2.000
Issues: 2.000
Integer unit issues: 0.001
Load/store unit issues: 2.000
SIMD/FP unit issues: 0.000
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) |
72005 | 34688 | 2005 | 1 | 2004 | 2000 | 11812 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34080 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34078 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34077 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34080 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34345 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34187 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34077 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34078 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4000 | 1 | 2000 |
72004 | 34077 | 2001 | 1 | 2000 | 2000 | 11788 | 2000 | 2000 | 4001 | 1 | 2000 |
Code:
swpalb w0, w1, [x6]
add x6, x6, 2
(fused SUBS/B.cc loop)
Result (median cycles for code): 6.0061
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30206 | 60323 | 30157 | 10119 | 20038 | 10120 | 20004 | 35252 | 125367 | 30106 | 10202 | 20004 | 10202 | 40008 | 10001 | 20000 | 10100 |
30204 | 60061 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125289 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60061 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125323 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60064 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125288 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60061 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125300 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60061 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125319 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60061 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125369 | 30103 | 10201 | 20003 | 10213 | 40053 | 10013 | 20000 | 10100 |
30204 | 60061 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125311 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60061 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125322 | 30103 | 10201 | 20003 | 10201 | 40005 | 10001 | 20000 | 10100 |
30204 | 60061 | 30101 | 10101 | 20000 | 10101 | 20002 | 35249 | 125338 | 30103 | 10201 | 20003 | 10211 | 40049 | 10011 | 20000 | 10100 |
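The "Result (median cycles for code)" figure can be reproduced from the table above. The sketch below takes the `cycle (02)` column from the ten runs, computes the median, and divides by the number of times the code executes per run. The divisor of 10000 is an assumption inferred from the counters (roughly 20000 load/store uops at two per `swpalb`); it is not stated in this section. `cycles_per_execution` is a hypothetical helper name.

```python
from statistics import median

# cycle (02) column from the ten runs in the table above
cycles = [60323, 60061, 60061, 60064, 60061,
          60061, 60061, 60061, 60061, 60061]

# Assumption: the measured code executes 10000 times per run
# (inferred from ~20000 ldst uops at 2 uops per swpalb).
EXECUTIONS = 10000

def cycles_per_execution(samples, executions=EXECUTIONS):
    """Median cycle count across runs, divided by the execution count."""
    return median(samples) / executions

result = cycles_per_execution(cycles)
print(f"{result:.4f}")  # 6.0061
```

Applying the same calculation to the other tables' cycle columns gives their reported results as well, which suggests (but does not prove) that this is how the figures are derived.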
Result (median cycles for code): 6.0064
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
30026 | 60326 | 30061 | 10027 | 20034 | 10027 | 20002 | 35069 | 125622 | 30013 | 10021 | 20003 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60061 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125555 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60061 | 30011 | 10011 | 20000 | 10010 | 20002 | 35069 | 126103 | 30013 | 10021 | 20003 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125836 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125803 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125801 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125808 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125656 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125650 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
30024 | 60064 | 30011 | 10011 | 20000 | 10010 | 20000 | 35066 | 125636 | 30010 | 10020 | 20000 | 10020 | 40000 | 10001 | 20000 | 10010 |
Code:
swpalb w0, w1, [x6]
mov x7, 8
(fused SUBS/B.cc loop)
Result (median cycles for code): 24.0039
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
20205 | 240385 | 20127 | 101 | 20026 | 100 | 20004 | 300 | 2365820 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240046 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240039 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40048 | 1 | 20000 | 100 |
20204 | 240039 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240039 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240039 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240039 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240041 | 20105 | 101 | 20004 | 100 | 20024 | 300 | 2366004 | 20124 | 200 | 20024 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240039 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
20204 | 240039 | 20105 | 101 | 20004 | 100 | 20004 | 300 | 2365842 | 20104 | 200 | 20004 | 200 | 40008 | 1 | 20000 | 100 |
Result (median cycles for code): 24.0037
retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef) |
20025 | 240150 | 20035 | 11 | 0 | 20024 | 10 | 0 | 20024 | 30 | 2362774 | 20034 | 20 | 20024 | 20 | 40048 | 1 | 20000 | 0 | 10 |
20024 | 240037 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20056 | 30 | 2363612 | 20066 | 20 | 20056 | 20 | 40052 | 1 | 20000 | 0 | 10 |
29768 | 257656 | 28774 | 5109 | 32 | 23633 | 4769 | 30 | 22116 | 52996 | 2399424 | 25473 | 4863 | 22409 | 8924 | 34644 | 4112 | 17790 | 4 | 5411 |
20025 | 240098 | 20036 | 11 | 0 | 20025 | 10 | 0 | 20000 | 30 | 2362616 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20024 | 240037 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362604 | 20010 | 20 | 20000 | 20 | 40040 | 1 | 20000 | 0 | 10 |
20024 | 240037 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362604 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20024 | 240037 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362604 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20025 | 240064 | 20037 | 11 | 0 | 20026 | 10 | 0 | 20000 | 30 | 2362634 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20024 | 240037 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362634 | 20010 | 20 | 20000 | 20 | 40000 | 1 | 20000 | 0 | 10 |
20024 | 240037 | 20011 | 11 | 0 | 20000 | 10 | 0 | 20000 | 30 | 2362604 | 20010 | 20 | 20000 | 20 | 40048 | 1 | 20000 | 0 | 10 |
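The third run in the table above differs from the others in almost every column, and a median-based result is unaffected by a single run like that. As a rough sketch of how such rows might be picked out when reading these tables programmatically, the code below parses a few pipe-delimited rows (first four columns only, for brevity) and flags runs whose cycle count is far from the median. The helper names and the 1% tolerance are hypothetical choices, not anything taken from the page.

```python
from statistics import median

# First four columns of three rows from the table above (pipe-delimited,
# with the trailing pipe the page uses).
header = "retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) |"
rows = [
    "20024 | 240037 | 20011 | 11 |",
    "29768 | 257656 | 28774 | 5109 |",
    "20025 | 240098 | 20036 | 11 |",
]

def split_row(line):
    """Split a pipe-delimited line into stripped, non-empty cells."""
    return [cell.strip() for cell in line.split("|") if cell.strip()]

def parse_table(header_line, row_lines):
    """Return one dict per run, keyed by counter name."""
    names = split_row(header_line)
    return [dict(zip(names, map(int, split_row(r)))) for r in row_lines]

def disturbed_runs(runs, key="cycle (02)", tolerance=0.01):
    """Indices of runs whose `key` differs from the median by more than
    `tolerance` (a hypothetical 1% threshold, not from the page)."""
    mid = median(run[key] for run in runs)
    return [i for i, run in enumerate(runs)
            if abs(run[key] - mid) > tolerance * mid]

runs = parse_table(header, rows)
print(disturbed_runs(runs))  # [1]
```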