Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

SDIV (slow, 64-bit)

Test 1: uops

Code:

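  sdiv x0, x1, x2

Setup (executed before the measured code; the retire counts show these MOVs are not part of the unrolled body):

  mov x1, #0x7fffffffffffffff
  mov x2, #3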

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 2.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

01     02      52     53     56     59       78     7c     7f     e9     ef
1004   21030   2001   2001   1000   187244   1000   1000   2000   2001   1000
1004   21030   2001   2001   1000   187244   1000   1000   2000   2001   1000
1004   21030   2001   2001   1000   187244   1000   1000   2000   2001   1000
1004   21030   2001   2001   1000   187244   1000   1000   2000   2001   1000
1004   21030   2001   2001   1000   187244   1000   1000   2000   2001   1000
1004   21030   2001   2001   1000   187244   1000   1000   2000   2001   1000
1004   21030   2001   2001   1000   187244   1000   1000   2000   2001   1000
1004   21030   2001   2001   1000   187244   1000   1000   2000   2001   1000
1004   21030   2001   2001   1000   187244   1000   1000   2000   2001   1000
1004   21030   2001   2001   1000   187244   1000   1000   2000   2001   1000
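You can get per-SDIV rates by dividing each raw counter total by the number of SDIVs executed (1000 unrolls x 1 iteration). The minimal sketch below uses the first Test 1 row: retire uop 1004, cycle 21030, schedule uop 2001, schedule int uop 2001. It does plain division only. The summary figures above appear to remove a small fixed harness overhead (1004 retired uops is reported as "Retires: 1.000"), so treat this as an approximation. The names `TEST1_ROW` and `per_sdiv` are illustrative.

```python
# Raw totals from the first Test 1 row, keyed by counter number.
TEST1_ROW = {
    "01": 1004,   # retire uop
    "02": 21030,  # cycle
    "52": 2001,   # schedule uop
    "53": 2001,   # schedule int uop
}
UNROLLS, ITERATIONS = 1000, 1


def per_sdiv(row, unrolls, iterations):
    """Divide each raw total by the number of SDIVs executed.

    Plain division only: no fixed harness overhead is subtracted.
    """
    n = unrolls * iterations
    return {counter: total / n for counter, total in row.items()}


rates = per_sdiv(TEST1_ROW, UNROLLS, ITERATIONS)
print(f"retired uops per SDIV:   {rates['01']:.3f}")  # 1.004
print(f"scheduled uops per SDIV: {rates['52']:.3f}")  # 2.001
print(f"cycles per SDIV:         {rates['02']:.2f}")  # 21.03
```

The result is about two scheduled uops per retired SDIV, which agrees with the "Issues: 2.000" line.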

Test 2: Latency 1->2 (result x0 fed back into the dividend x1)

Chain cycles: 2

Code:

  sdiv x0, x1, x2
  eor x1, x1, x0
  eor x1, x1, x0

Setup (not part of the measured body):

  mov x1, #0x7fffffffffffffff
  mov x2, #3

(fused SUBS/B.cc loop)
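The two EORs make each SDIV depend on the previous result without changing its operands, because (x ^ y) ^ y == x. After both EORs, x1 is back to 0x7fffffffffffffff, so every divide in the chain is the same 0x7fffffffffffffff / 3 case that the MOVs set up. The small Python model below traces that chain. `sdiv64` is an illustrative model of the architectural SDIV result (signed, rounding toward zero); it does not model timing.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x):
    """Interpret the low 64 bits of x as a two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def sdiv64(n, d):
    """Model of AArch64 SDIV on X registers: signed, rounds toward zero,
    returns 0 for division by zero, and wraps INT64_MIN / -1 to INT64_MIN."""
    n, d = to_signed64(n), to_signed64(d)
    if d == 0:
        return 0
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return q & MASK64


# Values set by the two MOVs in the listing.
x1, x2 = 0x7FFFFFFFFFFFFFFF, 3
for _ in range(4):  # a few chained iterations of the Test 2 body
    x0 = sdiv64(x1, x2)  # sdiv x0, x1, x2
    x1 ^= x0             # eor x1, x1, x0
    x1 ^= x0             # eor x1, x1, x0 (undoes the first EOR)
    assert x1 == 0x7FFFFFFFFFFFFFFF  # dividend unchanged every iteration

print(hex(x0))  # 0x2aaaaaaaaaaaaaaa
```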

100 unrolls and 100 iterations

Result (median cycles for code, minus 2 chain cycles): 21.0030

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

01     02      52     53     56     59       78     7c     7f     e9     ef
30204  230030  40201  40201  30203  6177178  30203  30210  60226  40101  30100
30204  230030  40201  40201  30203  6177178  30203  30210  60224  40101  30100
30205  230060  40203  40203  30231  6177206  30203  30212  60224  40101  30100
30204  230030  40201  40201  30203  6177206  30203  30212  60224  40101  30100
30204  230030  40201  40201  30203  6177206  30203  30212  60296  40104  30100
30204  230030  40201  40201  30203  6177206  30203  30212  60224  40101  30100
30204  230030  40201  40201  30203  6177206  30203  30212  60224  40101  30100
30205  230060  40202  40202  30230  6177206  30203  30212  60220  40101  30100
30204  230030  40201  40201  30203  6177178  30203  30210  60224  40101  30100
30204  230030  40201  40201  30203  6177566  30230  30248  60224  40101  30100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 2 chain cycles): 21.0030

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

01     02      52     53     56     59       78     7c     7f     e9     ef
30024  230030  40011  40011  30013  6177465  30010  30020  60020  40001  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60116  40002  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60020  40001  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60020  40001  30010
30024  230030  40011  40011  30010  6177787  30042  30069  60020  40001  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60020  40001  30010
30024  230030  40011  40011  30010  6177835  30040  30068  60020  40001  30010
30026  230090  40016  40016  30070  6177835  30040  30068  60020  40001  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60020  40001  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60114  40002  30010
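The latency result is the median cycle count divided by the number of unrolled bodies executed, minus the chain cycles. In both configurations the median cycle (02) total is 230030 over 10000 bodies, so the result is 23.003 - 2 = 21.003. The sketch below reproduces that arithmetic. The function name is illustrative, and it assumes one SDIV per body, as in the listing.

```python
from statistics import median


def latency_result(cycle_totals, unrolls, iterations, chain_cycles):
    """Median cycles per unrolled body, minus the cycles spent in the chain.

    Assumes one instruction under test per body.
    """
    per_body = median(cycle_totals) / (unrolls * iterations)
    return per_body - chain_cycles


# 'cycle (02)' totals from the two Test 2 tables.
test2_100x100 = [230030, 230030, 230060, 230030, 230030,
                 230030, 230030, 230060, 230030, 230030]
test2_1000x10 = [230030, 230030, 230030, 230030, 230030,
                 230030, 230030, 230090, 230030, 230030]

print(f"{latency_result(test2_100x100, 100, 100, 2):.4f}")  # 21.0030
print(f"{latency_result(test2_1000x10, 1000, 10, 2):.4f}")  # 21.0030
```

Using the median rather than the mean keeps occasional slow runs, such as the 230090-cycle run, from shifting the result.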

Test 3: Latency 1->3 (result x0 fed back into the divisor x2)

Chain cycles: 2

Code:

  sdiv x0, x1, x2
  eor x2, x2, x0
  eor x2, x2, x0

Setup (not part of the measured body):

  mov x1, #0x7fffffffffffffff
  mov x2, #3

(fused SUBS/B.cc loop)
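The Test 2 and Test 3 bodies differ only in which SDIV source the result is XORed back into. The sketch below generates either body with `latency_body`, a hypothetical helper; the page does not show its harness code.

```python
def latency_body(chained_reg, unrolls=1):
    """Return the unrolled assembly text for one SDIV latency test.

    Hypothetical helper, not the page's harness: the result (x0) is
    XORed twice into `chained_reg`, so each SDIV must wait for the
    previous one while `chained_reg` keeps its setup value.
    """
    if chained_reg not in ("x1", "x2"):
        raise ValueError("chain through the dividend (x1) or divisor (x2)")
    body = [
        "sdiv x0, x1, x2",
        f"eor {chained_reg}, {chained_reg}, x0",
        f"eor {chained_reg}, {chained_reg}, x0",
    ]
    return "\n".join(body * unrolls)


print(latency_body("x2"))
# sdiv x0, x1, x2
# eor x2, x2, x0
# eor x2, x2, x0
```

Calling `latency_body("x1", 100)` gives the 100-unroll body of Test 2. Both tests report the same 21.003-cycle result, so the dividend and divisor inputs appear to have the same latency to the result.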

100 unrolls and 100 iterations

Result (median cycles for code, minus 2 chain cycles): 21.0030

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

01     02      52     53     56     59       78     7c     7f     e9     ef
30204  230030  40201  40201  30203  6177206  30203  30212  60220  40101  30100
30205  230107  40205  40205  30258  6177206  30203  30212  60224  40101  30100
30204  230077  40203  40203  30230  6177206  30203  30212  60224  40101  30100
30204  230030  40201  40201  30203  6177705  30232  30252  60296  40103  30100
30204  230030  40201  40201  30203  6177206  30203  30212  60304  40104  30100
30204  230030  40201  40201  30203  6177206  30203  30212  60298  40103  30100
30204  230030  40201  40201  30203  6177533  30231  30250  60224  40101  30100
30204  230077  40204  40204  30232  6177206  30203  30212  60224  40101  30100
30204  230030  40201  40201  30203  6177723  30230  30248  60224  40101  30100
30205  230060  40202  40202  30230  6177711  30232  30248  60224  40101  30100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 2 chain cycles): 21.0030

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

01     02      52     53     56     59       78     7c     7f     e9     ef
30024  230030  40011  40011  30013  6177453  30013  30030  60020  40001  30010
30024  230030  40011  40011  30010  6177835  30040  30068  60020  40001  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60020  40001  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60020  40001  30010
30025  230060  40012  40012  30040  6177465  30010  30020  60020  40001  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60020  40001  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60094  40003  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60020  40001  30010
30024  230030  40011  40011  30010  6177465  30010  30020  60020  40001  30010
30024  230030  40011  40011  30010  6177816  30037  30056  60020  40001  30010

Test 4: throughput

Code:

  sdiv x0, x1, x2

Setup (not part of the measured body):

  mov x1, #0x7fffffffffffffff
  mov x2, #3

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 21.0030

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

01     02      52     53     56     59       78     7c     7f     e9     ef
10204  210030  20101  20101  10100  1879778  10118  10238  20216  20001  10100
10204  210030  20101  20101  10100  1879544  10100  10208  20216  20001  10100
10204  210030  20101  20101  10100  1879544  10100  10208  20216  20001  10100
10204  210030  20101  20101  10100  1879544  10100  10208  20216  20001  10100
10204  210030  20101  20101  10100  1879544  10100  10208  20216  20001  10100
10205  210060  20102  20102  10109  1879544  10100  10208  20216  20001  10100
10204  210030  20101  20101  10100  1879544  10100  10208  20216  20001  10100
10204  210030  20101  20101  10100  1879544  10100  10208  20216  20001  10100
10204  210030  20101  20101  10100  1879544  10100  10208  20216  20001  10100
10204  210030  20101  20101  10100  1879544  10100  10208  20216  20001  10100

1000 unrolls and 10 iterations

Result (median cycles for code): 21.0030

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? simd retires (ee), ? int retires (ef)

01     02      52     53     56     59       78     7c     7f     80     e9     ed     ee     ef
10024  210030  20021  20021  10020  1879304  10020  10020  20056  0      20012  0      0      10010
10024  210030  20021  20021  10020  1879304  10020  10020  20020  0      20011  0      0      10010
10024  210030  20021  20021  10020  1879304  10020  10020  20020  0      20011  0      0      10010
10024  210030  20021  20021  10020  1879421  10029  10044  20020  0      20011  0      0      10010
10024  210030  20021  20021  10020  1879304  10020  10020  20020  0      20011  0      0      10010
10024  210030  20021  20021  10020  1879304  10020  10020  20020  0      20011  0      0      10010
10024  210030  20021  20021  10020  1879421  10029  10044  20020  0      20011  0      0      10010
10024  210030  20021  20021  10020  1879304  10020  10020  20020  0      20011  0      0      10010
10024  210030  20021  20021  10020  1879304  10020  10020  20020  0      20011  0      0      10010
10024  210030  20021  20021  10020  1879421  10029  10044  20020  0      20011  0      0      10010
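An independent stream of SDIVs also costs about 21.003 cycles each, which matches the latency measured in Tests 2 and 3. By Little's law, the average number of SDIVs in flight equals latency divided by reciprocal throughput. The sketch below applies this to the cycle (02) column of both Test 4 tables. The names `cycles_per_sdiv`, `LATENCY` and `runs` are illustrative.

```python
from statistics import median


def cycles_per_sdiv(cycle_totals, unrolls, iterations):
    """Median total cycles divided by the number of SDIVs executed."""
    return median(cycle_totals) / (unrolls * iterations)


LATENCY = 21.003  # result of Tests 2 and 3

# 'cycle (02)' totals from the two Test 4 tables.
runs = {
    "100 unrolls x 100 iterations": ([210030] * 5 + [210060] + [210030] * 4, 100, 100),
    "1000 unrolls x 10 iterations": ([210030] * 10, 1000, 10),
}

for name, (totals, unrolls, iterations) in runs.items():
    recip_tput = cycles_per_sdiv(totals, unrolls, iterations)
    # Little's law: mean SDIVs in flight = latency / reciprocal throughput.
    in_flight = LATENCY / recip_tput
    print(f"{name}: {recip_tput:.4f} cycles/SDIV, {in_flight:.2f} in flight")
# both runs: 21.0030 cycles/SDIV, 1.00 in flight
```

A value of about 1.00 suggests that, for this dividend/divisor pair, successive independent divides do not overlap at all. These measurements alone do not show whether that is because the divider is not pipelined or because of some other limit.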