Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

DSB (NSHLD)

Test 1: uops

Code:

  dsb nshld

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

Raw counters, one row per run. Column headings are event numbers (hex):

  01  retire uop
  02  cycle
  52  schedule uop
  53  schedule int uop
  55  schedule ldst uop
  58  dispatch ldst uop
  5a  simd uops in schedulers
  78  dispatch uop
  7d  map ldst uop
  7f  map int uop inputs
  80  map ldst uop inputs
  e9  ? int output thing
  ed  ? ldst retires
  ee  ? simd retires
  ef  ? int retires

    01     02    52  53    55    58    5a    78    7d  7f  80  e9    ed  ee  ef
  1004  16033  1001   1  1000  1000  4000  1000  1000   0   0   1   999   0   0
  1004  16033  1001   1  1000  1010  4040  1010  1010  1207181315294473576501
  1004  16028  1001   1  1000  1000  4000  1000  1000   0   0   1  1000   0   0
  1004  16028  1001   1  1000  1000  4000  1000  1000   0   0   1  1000   0   0
  1004  16028  1001   1  1000  1000  4000  1000  1000   0   0   1  1000   0   0
  1004  16028  1001   1  1000  1000  4000  1000  1000   0   0   1  1000   0   0
  1004  16028  1001   1  1000  1000  4000  1000  1000   0   0   1  1000   0   0
  1004  16028  1001   1  1000  1000  4000  1000  1000   0   0   1  1000   0   0
  1004  16028  1001   1  1000  1000  4000  1000  1000   0   0   1  1000   0   0
  1004  16028  1001   1  1000  1000  4000  1000  1000   0   0   1  1000   0   0

In the second run the last six counters (7f through ef) are run together and are shown unsplit.
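The summary figures for this test are per-instruction rates. How the original harness derives them is not stated here. A minimal Python sketch, assuming each unit counter is simply divided by the number of executed instructions (unrolls times iterations), reproduces the integer and load/store issue figures. The mapping of counter 53 to "Integer unit issues" and counter 55 to "Load/store unit issues", and the names `typical_run` and `per_instruction`, are assumptions made for illustration.

```python
# Sketch only: the real harness may compute its summary differently.
UNROLLS = 1000
ITERATIONS = 1

# Values from the most common run of Test 1 (assumed counter-to-label mapping).
typical_run = {
    "schedule int uop (53)": 1,
    "schedule ldst uop (55)": 1000,
}


def per_instruction(count, unrolls=UNROLLS, iterations=ITERATIONS):
    """Divide a raw counter value by the number of test instructions executed."""
    return count / (unrolls * iterations)


rates = {name: per_instruction(value) for name, value in typical_run.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

Under these assumptions the two printed rates are 0.001 and 1.000, matching the integer and load/store issue lines in the summary.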

Test 2: throughput

Code:

  dsb nshld

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 16.0028

Raw counters, one row per run. Column headings are event numbers (hex):

  01  retire uop
  02  cycle
  52  schedule uop
  53  schedule int uop
  55  schedule ldst uop
  56  dispatch int uop
  58  dispatch ldst uop
  59  int uops in schedulers
  5a  simd uops in schedulers
  78  dispatch uop
  7c  map int uop
  7d  map ldst uop
  7f  map int uop inputs
  e9  ? int output thing
  ed  ? ldst retires
  ef  ? int retires

     01      02     52   53     55   56     58   59     5a     78   7c     7d   7f  e9     ed   ef
  10204  160033  10105  101  10004  100  10014  300  40056  10114  200  10014  200   1  10000  100
  10204  160055  10111  101  10010  100  10004  300  40016  10104  200  10004  200   1  10000  100
  10204  160033  10105  101  10004  100  10012  300  40048  10112  200  10012  200   1   9999  100
  10204  160028  10105  101  10004  100  10012  300  40048  10112  200  10012  200   1  10000  100
  10204  160028  10105  101  10004  100  10016  300  40064  10116  200  10016  200   1  10000  100
  10204  160046  10113  101  10012  100  10004  300  40016  10104  200  10004  200   1  10000  100
  10204  160040  10113  101  10012  100  10004  300  40016  10104  200  10004  200   1  10000  100
  10204  160028  10105  101  10004  100  10004  300  40016  10104  200  10004  200   1  10000  100
  10204  160028  10105  101  10004  100  10004  300  40016  10104  200  10004  200   1  10000  100
  10204  160028  10105  101  10004  100  10016  300  40064  10116  200  10016  200   1   9999  100

1000 unrolls and 10 iterations

Result (median cycles for code): 16.0028

Raw counters, one row per run. Column headings are event numbers (hex):

  01  retire uop
  02  cycle
  52  schedule uop
  53  schedule int uop
  55  schedule ldst uop
  56  dispatch int uop
  58  dispatch ldst uop
  59  int uops in schedulers
  5a  simd uops in schedulers
  78  dispatch uop
  7c  map int uop
  7d  map ldst uop
  7f  map int uop inputs
  80  map ldst uop inputs
  81  map simd uop inputs
  e9  ? int output thing
  ed  ? ldst retires
  ee  ? simd retires
  ef  ? int retires

     01      02     52  53     55  56     58  59     5a     78  7c     7d  7f  80  81  e9     ed  ee  ef
  10024  160033  10015  11  10004  10  10004  30  40016  10014  20  10004  20   0   0   1  10000   0  10
  10024  160050  10024  11  10013  10  10000  30  40000  10010  20  10000  20   0   0   1  10000   0  10
  10024  160028  10011  11  10000  10  10000  30  40000  10010  20  10000  20   0   0   1  10000   0  10
  10024  160028  10011  11  10000  10  10008  30  40032  10018  20  10008  20   0   0   1  10000   0  10
  10024  160028  10011  11  10000  10  10000  30  40000  10010  20  10000  20   0   0   1  10000   0  10
  10024  160028  10011  11  10000  10  10000  30  40000  10010  20  10000  20   0   0   1  10000   0  10
  10024  160028  10011  11  10000  10  10000  30  40000  10010  20  10000  20   0   0   1  10000   0  10
  10024  160028  10011  11  10000  10  10000  30  40000  10010  20  10000  20   0   0   1  10000   0  10
  10024  160028  10011  11  10000  10  10000  30  40000  10010  20  10000  20   0   0   1  10000   0  10
  10024  160028  10011  11  10000  10  10000  30  40000  10010  20  10000  20   0   0   1  10000   0  10
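The reported result of 16.0028 cycles appears consistent with taking the median of the cycle counter (02) over the ten runs and dividing by the number of DSB instructions executed (1000 unrolls times 10 iterations). The following Python sketch shows that arithmetic. It is an assumption about how the figure is derived, not the harness's actual code, and the names `cycle_counts` and `cycles_per_instruction` are hypothetical.

```python
from statistics import median

UNROLLS = 1000
ITERATIONS = 10

# Cycle counter (02) for the ten runs of the 1000-unroll, 10-iteration test.
cycle_counts = [
    160033, 160050, 160028, 160028, 160028,
    160028, 160028, 160028, 160028, 160028,
]


def cycles_per_instruction(cycles, unrolls, iterations):
    """Median cycle count divided by the number of test instructions executed."""
    return median(cycles) / (unrolls * iterations)


cpi = cycles_per_instruction(cycle_counts, UNROLLS, ITERATIONS)
print(f"{cpi:.4f}")  # 16.0028
```

The sketch does not subtract any overhead for the SUBS/B.cc loop instructions. Whether the original harness does so is not stated.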