Apple Microarchitecture Research by Dougall Johnson

SDIV (fast, 64-bit)

Test 1: uops

Code:

  sdiv x0, x1, x2
  mov x1, #0
  mov x2, #0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 2.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
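
As a rough cross-check of the summary figures for Test 1, one counter row can be divided by the number of measured instructions. The sketch below is Python and is not part of the original measurement harness. Reading the row as the 11 values listed in `ROW` is an assumption, and `per_instruction` is a hypothetical helper name used only for illustration.

```python
# Column names as given in the counter table header.
COLUMNS = [
    "retire uop (01)", "cycle (02)", "schedule uop (52)",
    "schedule int uop (53)", "dispatch int uop (56)",
    "int uops in schedulers (59)", "dispatch uop (78)",
    "map int uop (7c)", "map int uop inputs (7f)",
    "? int output thing (e9)", "? int retires (ef)",
]

# One Test 1 row. ASSUMPTION: the row splits into these 11 values.
ROW = [1004, 7030, 2001, 2001, 1000, 61748, 1000, 1000, 2000, 2001, 1000]

UNROLLS = 1000     # "1000 unrolls and 1 iteration"
ITERATIONS = 1


def per_instruction(row, unrolls, iterations):
    """Divide each raw counter by the number of measured instructions."""
    total = unrolls * iterations
    return {name: count / total for name, count in zip(COLUMNS, row)}


normalized = per_instruction(ROW, UNROLLS, ITERATIONS)
for name, value in normalized.items():
    print(f"{name:30s} {value:9.3f}")
```

Under that reading, "schedule int uop (53)" comes out at 2.001 per instruction, in line with the "Integer unit issues: 2.001" line, and "retire uop (01)" at about 1.004, close to "Retires: 1.000". The small excess presumably reflects fixed measurement overhead, although the page does not say so.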

Test 2: Latency 1->2

Chain cycles: 2

Code:

  sdiv x0, x1, x2
  eor x1, x1, x0
  eor x1, x1, x0
  mov x1, #0
  mov x2, #0

(fused SUBS/B.cc loop)
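
The two EOR instructions appear to exist only to feed the SDIV result back into one of its own inputs: x1 ^ x0 ^ x0 equals x1, so the operand value is unchanged while the next SDIV still has to wait for x0. That would account for the "Chain cycles: 2" line, assuming one cycle per EOR. Below is a small Python model of the data flow, offered as a sketch only. `sdiv64` and `unrolled_step` are hypothetical helper names, and treating the trailing `mov` instructions as one-time setup (rather than part of the unrolled body) is an assumption based on the retire counts.

```python
MASK64 = (1 << 64) - 1


def to_signed(value):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def sdiv64(n, d):
    """Model of AArch64 64-bit SDIV: the quotient is truncated toward
    zero, and division by zero returns 0 instead of trapping."""
    n, d = to_signed(n), to_signed(d)
    if d == 0:
        return 0
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return q & MASK64


def unrolled_step(x1, x2):
    """One unrolled copy of the Test 2 body (assumed: sdiv + two eor)."""
    x0 = sdiv64(x1, x2)   # sdiv x0, x1, x2
    x1 ^= x0              # eor x1, x1, x0  (x1 now depends on x0)
    x1 ^= x0              # eor x1, x1, x0  (value of x1 restored)
    return x0, x1


# Setup values from the listing: mov x1, #0 / mov x2, #0
x0, x1 = unrolled_step(0, 0)
print(x0, x1)
```

With both inputs zero, every SDIV is a division by zero that returns 0, which may be what "fast" in the page title refers to. Test 3 uses the same construction on x2 instead of x1.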

100 unrolls and 100 iterations

Result (median cycles for code, minus 2 chain cycles): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30204 | 90030 | 40201 | 40201 | 30203 | 2398634 | 30203 | 30210 | 60220 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30205 | 90060 | 40206 | 40206 | 30233 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60296 | 40106 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2399075 | 30233 | 30252 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 2 chain cycles): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30024 | 90030 | 40011 | 40011 | 30013 | 2398954 | 30013 | 30030 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30025 | 90060 | 40017 | 40017 | 30044 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30025 | 90060 | 40018 | 40018 | 30045 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2400968 | 30137 | 30179 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
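
The "Result" lines for Test 2 can be reproduced from the cycle counter. The sketch below assumes the result is the median of the per-run "cycle (02)" values, divided by unrolls times iterations, minus the stated chain cycles. That formula is inferred from the wording "median cycles for code, minus 2 chain cycles" rather than confirmed by the page. The cycle values are my reading of the counter rows, and `latency_estimate` is a hypothetical helper name.

```python
from statistics import median


def latency_estimate(cycle_counts, unrolls, iterations, chain_cycles):
    """Median cycles per unrolled copy, minus the cycles attributed to
    the dependency chain (the two EORs in this test)."""
    per_copy = median(cycle_counts) / (unrolls * iterations)
    return per_copy - chain_cycles


# ASSUMPTION: "cycle (02)" values as read from the ten runs of each setup.
cycles_100x100 = [90030, 90030, 90060, 90030, 90030,
                  90030, 90030, 90030, 90030, 90030]
cycles_1000x10 = [90030, 90030, 90030, 90030, 90060,
                  90030, 90060, 90030, 90030, 90030]

result_a = latency_estimate(cycles_100x100, unrolls=100, iterations=100,
                            chain_cycles=2)
result_b = latency_estimate(cycles_1000x10, unrolls=1000, iterations=10,
                            chain_cycles=2)
print(f"{result_a:.4f} {result_b:.4f}")
```

Both configurations give about 7.003 under these assumptions, matching the reported 7.0030. Using the median means the occasional run with 90060 cycles does not shift the result.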

Test 3: Latency 1->3

Chain cycles: 2

Code:

  sdiv x0, x1, x2
  eor x2, x2, x0
  eor x2, x2, x0
  mov x1, #0
  mov x2, #0

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 2 chain cycles): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30205 | 90060 | 40209 | 40209 | 30236 | 2398636 | 30203 | 30210 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30213 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 2 chain cycles): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30024 | 90030 | 40011 | 40011 | 30013 | 0 | 2398935 | 0 | 0 | 30010 | 30020 | 0 | 0 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 0 | 2398977 | 0 | 0 | 30010 | 30020 | 0 | 0 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 0 | 2398977 | 0 | 0 | 30010 | 30020 | 0 | 0 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 14162432229247321193329403224416808 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 0 | 2398977 | 0 | 0 | 30010 | 30020 | 0 | 0 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 0 | 2398977 | 0 | 0 | 30010 | 30020 | 0 | 0 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 0 | 2398977 | 0 | 0 | 30010 | 30020 | 0 | 0 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 0 | 2398977 | 0 | 0 | 30010 | 30020 | 0 | 0 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 0 | 2398977 | 0 | 0 | 30010 | 30020 | 0 | 0 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 0 | 2398977 | 0 | 0 | 30010 | 30020 | 0 | 0 | 60130 | 40008 | 30010

Test 4: throughput

Code:

  sdiv x0, x1, x2
  mov x1, #0
  mov x2, #0

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10206 | 20212 | 20001 | 10100
10205 | 70060 | 20105 | 20105 | 10110 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10205 | 70060 | 20105 | 20105 | 10110 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10205 | 70060 | 20105 | 20105 | 10110 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20068 | 20015 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20068 | 20015 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010