Apple Microarchitecture Research by Dougall Johnson

SDIV (slow, 64-bit)

Test 1: uops

Code:

  sdiv x0, x1, x2

Initialization (setup, not part of the measured code):

  mov x1, #0x8000000000000000
  mov x2, #3

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 2.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
1004 | 21030 | 2001 | 2001 | 1000 | 187244 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 21030 | 2001 | 2001 | 1000 | 187244 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 21030 | 2001 | 2001 | 1000 | 187244 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 21030 | 2001 | 2001 | 1000 | 187244 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 21030 | 2001 | 2001 | 1000 | 187244 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 21030 | 2001 | 2001 | 1000 | 187244 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 21030 | 2001 | 2001 | 1000 | 187244 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 21030 | 2001 | 2001 | 1000 | 187244 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 21030 | 2001 | 2001 | 1000 | 187244 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 21030 | 2001 | 2001 | 1000 | 187244 | 1000 | 1000 | 2000 | 2001 | 1000
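
The raw counters above can be related to the per-instruction figures by dividing each one by the number of unrolled instructions. The following is a minimal Python sketch, assuming the counter row splits into the eleven columns named in the header; the split and the helper name `per_instruction` are my own reading and are not part of the original page.

```python
# Test 1 runs "1000 unrolls and 1 iteration", so 1000 SDIVs are measured.
UNROLLS = 1000

# One counter row, split by column (assumed split of the raw digits).
row = {
    "retire uop (01)": 1004,
    "cycle (02)": 21030,
    "schedule uop (52)": 2001,
    "schedule int uop (53)": 2001,
    "dispatch int uop (56)": 1000,
    "int uops in schedulers (59)": 187244,
    "dispatch uop (78)": 1000,
    "map int uop (7c)": 1000,
    "map int uop inputs (7f)": 2000,
    "? int output thing (e9)": 2001,
    "? int retires (ef)": 1000,
}


def per_instruction(counters, instructions):
    """Divide every raw counter by the number of measured instructions."""
    return {name: value / instructions for name, value in counters.items()}


ratios = per_instruction(row, UNROLLS)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Under this reading, each SDIV is dispatched once but scheduled about twice (2.001), which agrees with the "Integer unit issues" figure. The raw retire ratio comes out at 1.004 rather than the reported 1.000; the summary presumably excludes a few counts of measurement overhead, which this sketch does not try to model.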

Test 2: Latency 1->2

Chain cycles: 2

Code:

  sdiv x0, x1, x2
  eor x1, x1, x0
  eor x1, x1, x0

Initialization (setup, not part of the measured code):

  mov x1, #0x8000000000000000
  mov x2, #3

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 2 chain cycles): 21.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30204 | 230030 | 40201 | 40201 | 30203 | 6177150 | 30203 | 30210 | 60220 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177528 | 30234 | 30252 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177566 | 30230 | 30244 | 60298 | 40103 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60304 | 40104 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177542 | 30232 | 30248 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100
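
The reported latency for this run appears to follow from the cycle counter by simple arithmetic: take the median cycle count, divide by the number of executed SDIVs (unrolls times iterations), and subtract the chain cycles. The following is a minimal Python sketch under that assumption; the function name `latency_cycles` is hypothetical, and the cycle values are my reading of the "cycle (02)" column in the ten rows above.

```python
from statistics import median

UNROLLS = 100       # "100 unrolls and 100 iterations"
ITERATIONS = 100
CHAIN_CYCLES = 2    # "Chain cycles: 2", presumably the two dependent EORs

# "cycle (02)" for each of the ten runs (assumed column split).
cycles = [230030] * 10


def latency_cycles(cycle_counts, unrolls, iterations, chain_cycles):
    """Median cycles per unrolled block, minus the cycles spent in the chain."""
    per_block = median(cycle_counts) / (unrolls * iterations)
    return per_block - chain_cycles


result = latency_cycles(cycles, UNROLLS, ITERATIONS, CHAIN_CYCLES)
print(f"{result:.4f}")  # 21.0030
```

The two EORs cancel each other (x1 ^ x0 ^ x0 leaves x1 unchanged), so they only serve to make the next SDIV's first operand depend on the previous result.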

1000 unrolls and 10 iterations

Result (median cycles for code, minus 2 chain cycles): 21.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30025 | 230060 | 40014 | 40014 | 30042 | 6177453 | 30013 | 30030 | 60020 | 40001 | 30010
30025 | 230060 | 40013 | 40013 | 30041 | 6177815 | 30042 | 30068 | 60020 | 40001 | 30010
30025 | 230060 | 40014 | 40014 | 30042 | 6177465 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60116 | 40002 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177815 | 30042 | 30068 | 60020 | 40001 | 30010
30026 | 230090 | 40016 | 40016 | 30070 | 6177465 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60020 | 40001 | 30010
30025 | 230060 | 40012 | 40012 | 30037 | 6177465 | 30010 | 30020 | 60020 | 40001 | 30010

Test 3: Latency 1->3

Chain cycles: 2

Code:

  sdiv x0, x1, x2
  eor x2, x2, x0
  eor x2, x2, x0

Initialization (setup, not part of the measured code):

  mov x1, #0x8000000000000000
  mov x2, #3

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 2 chain cycles): 21.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30204 | 230030 | 40201 | 40201 | 30203 | 6177150 | 30203 | 30210 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177566 | 30230 | 30248 | 60224 | 40101 | 30100
30205 | 230060 | 40204 | 40204 | 30232 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60294 | 40102 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177523 | 30230 | 30245 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 230030 | 40201 | 40201 | 30203 | 6177206 | 30203 | 30212 | 60224 | 40101 | 30100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 2 chain cycles): 21.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
30025 | 230060 | 40012 | 40012 | 30040 | 6177484 | 30013 | 30032 | 60020 | 0 | 40001 | 0 | 0 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60020 | 0 | 40001 | 0 | 0 | 30010
30025 | 230060 | 40014 | 40014 | 30042 | 6177465 | 30010 | 30020 | 60114 | 0 | 40003 | 0 | 0 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60108 | 0 | 40002 | 0 | 0 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60122 | 0 | 40003 | 0 | 0 | 30010
30025 | 230060 | 40012 | 40012 | 30040 | 6177484 | 30013 | 30032 | 60020 | 0 | 40001 | 0 | 0 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60020 | 0 | 40001 | 0 | 0 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60116 | 0 | 40003 | 0 | 0 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60020 | 0 | 40001 | 0 | 0 | 30010
30024 | 230030 | 40011 | 40011 | 30010 | 6177465 | 30010 | 30020 | 60020 | 0 | 40001 | 0 | 0 | 30010

Test 4: throughput

Code:

  sdiv x0, x1, x2

Initialization (setup, not part of the measured code):

  mov x1, #0x8000000000000000
  mov x2, #3

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 21.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 210030 | 20101 | 20101 | 10100 | 0 | 1879544 | 0 | 10100 | 10206 | 0 | 20212 | 20001 | 10100
10204 | 210030 | 20101 | 20101 | 10100 | 0 | 1879544 | 0 | 10100 | 10208 | 0 | 20216 | 20001 | 10100
10206 | 210090 | 20103 | 20103 | 10118 | 0 | 1879544 | 0 | 10100 | 10208 | 0 | 20216 | 20001 | 10100
10204 | 210030 | 20101 | 20101 | 10100 | 0 | 1879544 | 0 | 10100 | 10208 | 0 | 20216 | 20001 | 10100
10204 | 210030 | 20101 | 20101 | 10100 | 0 | 1879544 | 0 | 10100 | 10208 | 0 | 20216 | 20001 | 10100
10205 | 210060 | 20102 | 20102 | 10109 | 0 | 1879544 | 0 | 10100 | 10208 | 0 | 20216 | 20001 | 10100
10205 | 210060 | 20104 | 20104 | 10111 | 0 | 1879544 | 0 | 10100 | 10208 | 0 | 20216 | 20001 | 10100
10204 | 210030 | 20101 | 20101 | 10100 | 0 | 1879661 | 0 | 10109 | 10224 | 0 | 20216 | 20001 | 10100
10204 | 210030 | 20101 | 20101 | 10100 | 0 | 1879544 | 0 | 10100 | 10206 | 0 | 20216 | 20001 | 10100
10204 | 210030 | 20101 | 20101 | 10100 | 0 | 1879544 | 0 | 10100 | 10208 | 0 | 20216 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 21.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 210030 | 20021 | 20021 | 0 | 0 | 10020 | 0 | 1879304 | 10020 | 10026 | 20020 | 20011 | 10010
10024 | 210030 | 20021 | 20021 | 0 | 0 | 10020 | 0 | 1879421 | 10029 | 10044 | 20020 | 20011 | 10010
10024 | 210030 | 20021 | 20021 | 0 | 0 | 10020 | 0 | 1879304 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 210030 | 20021 | 20021 | 0 | 0 | 10020 | 0 | 1879304 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 210030 | 20021 | 20021 | 0 | 0 | 10020 | 0 | 1879421 | 10029 | 10042 | 20020 | 20011 | 10010
10024 | 210030 | 20021 | 20021 | 0 | 0 | 10020 | 0 | 1879304 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 210030 | 20021 | 20021 | 0 | 0 | 10020 | 0 | 1879304 | 10020 | 10020 | 20020 | 20011 | 10010
10025 | 210060 | 20022 | 20022 | 0 | 0 | 10029 | 0 | 1879304 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 210030 | 20021 | 20021 | 0 | 0 | 10020 | 0 | 1879304 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 210030 | 20021 | 20021 | 0 | 0 | 10020 | 0 | 1879304 | 10020 | 10020 | 20020 | 20011 | 10010
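
The throughput result can be compared with the latency from Tests 2 and 3 to estimate how many independent SDIVs overlap for this operand pair. The following is a minimal Python sketch; the cycle values are my reading of the "cycle (02)" column in the ten rows above, and the variable names are illustrative only.

```python
from statistics import median

UNROLLS = 1000      # "1000 unrolls and 10 iterations"
ITERATIONS = 10
LATENCY = 21.003    # result reported by the latency tests

# "cycle (02)" for each of the ten runs (assumed column split).
cycles = [210030, 210030, 210030, 210030, 210030,
          210030, 210030, 210060, 210030, 210030]

# Reciprocal throughput: cycles spent per independent SDIV.
cycles_per_sdiv = median(cycles) / (UNROLLS * ITERATIONS)

# Latency divided by reciprocal throughput approximates how many
# divisions are in flight at once; about 1 means they run back to back.
in_flight = LATENCY / cycles_per_sdiv

print(f"cycles per SDIV: {cycles_per_sdiv:.4f}")
print(f"estimated SDIVs in flight: {in_flight:.2f}")
```

A ratio close to 1 suggests that, for these slow operands, independent divisions do not overlap and a new SDIV starts only about once per latency period. This is an interpretation of the numbers, not a statement made by the page itself.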