Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

SDIV (fast, 32-bit)

Test 1: uops

Code:

  sdiv w0, w1, w2
  mov w1, #0
  mov w2, #0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 2.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
1004 | 7030 | 2001 | 2001 | 1000 | 61748 | 1000 | 1000 | 2000 | 2001 | 1000
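As a rough cross-check, the raw counter totals can be divided by the number of unrolled copies to get per-instruction figures. The sketch below assumes the first counter row splits into 1004, 7030, 2001, 2001, ... in header order; the helper name `per_instruction` is hypothetical. Under that assumption, `schedule int uop (53)` reproduces the 2.001 integer unit issues figure. Which counters back the other summary figures is not stated here, so the remaining lines are only illustrative.

```python
# Raw counter totals from one Test 1 run (1000 unrolls, 1 iteration).
# The split of the digit run into these values is an assumption.
UNROLLS = 1000
ITERATIONS = 1

counters = {
    "schedule int uop (53)": 2001,
    "map int uop (7c)": 1000,
    "map int uop inputs (7f)": 2000,
}


def per_instruction(total, unrolls=UNROLLS, iterations=ITERATIONS):
    """Normalize a raw counter total to one unrolled copy of the test code."""
    return total / (unrolls * iterations)


for name, total in counters.items():
    print(f"{name}: {per_instruction(total):.3f}")
```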

Test 2: Latency 1->2

Chain cycles: 2

Code:

  sdiv w0, w1, w2
  eor x1, x1, x0
  eor x1, x1, x0
  mov w1, #0
  mov w2, #0

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 2 chain cycles): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30204 | 90030 | 40201 | 40201 | 30203 | 2398634 | 30203 | 30210 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2399019 | 30234 | 30248 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90222 | 40242 | 40242 | 30330 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90124 | 40220 | 40220 | 30266 | 2399048 | 30235 | 30252 | 60376 | 40120 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2399710 | 30266 | 30290 | 60224 | 40101 | 30100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 2 chain cycles): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30024 | 90030 | 40011 | 40011 | 30013 | 2398935 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60124 | 40006 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30025 | 90060 | 40016 | 40016 | 30043 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
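The reported latency appears to follow from the cycle counter: total cycles divided by the number of unrolled copies executed, minus the 2 chain cycles taken by the two dependent EORs. A minimal sketch of that arithmetic, assuming the `cycle (02)` value in the typical run is 90030 and using a hypothetical helper name `latency_cycles`:

```python
def latency_cycles(total_cycles, unrolls, iterations, chain_cycles):
    """Cycles per unrolled copy of the test code, minus the helper chain.

    total_cycles is the raw cycle counter for the whole run; chain_cycles
    is the known latency of the extra instructions that close the
    dependency loop (2 here, one cycle per EOR).
    """
    per_copy = total_cycles / (unrolls * iterations)
    return per_copy - chain_cycles


# Cycle totals are assumed readings of the counter rows, not new data.
print(f"{latency_cycles(90030, 100, 100, 2):.4f}")  # 100 unrolls x 100 iterations
print(f"{latency_cycles(90030, 1000, 10, 2):.4f}")  # 1000 unrolls x 10 iterations
```

Both configurations execute 10000 copies of the code, so both lines print 7.0030, matching the results listed for this test.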

Test 3: Latency 1->3

Chain cycles: 2

Code:

  sdiv w0, w1, w2
  eor x2, x2, x0
  eor x2, x2, x0
  mov w1, #0
  mov w2, #0

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 2 chain cycles): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30204 | 90030 | 40201 | 40201 | 30203 | 2398677 | 30203 | 30210 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2399075 | 30233 | 30251 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100
30204 | 90030 | 40201 | 40201 | 30203 | 2398718 | 30203 | 30212 | 60224 | 40101 | 30100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 2 chain cycles): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30024 | 90030 | 40011 | 40011 | 0 | 30013 | 2398935 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 0 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 0 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 0 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 0 | 30010 | 2399338 | 30044 | 30072 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 0 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 0 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 0 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 0 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010
30024 | 90030 | 40011 | 40011 | 0 | 30010 | 2398977 | 30010 | 30020 | 60020 | 40001 | 30010

Test 4: throughput

Code:

  sdiv w0, w1, w2
  mov w1, #0
  mov w2, #0

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10206 | 20212 | 20001 | 10100
10205 | 70060 | 20107 | 20107 | 10112 | 620048 | 10100 | 10206 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10204 | 70030 | 20101 | 20101 | 10100 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100
10205 | 70060 | 20105 | 20105 | 10110 | 620048 | 10100 | 10208 | 20216 | 20001 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 7.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20074 | 20016 | 10010
10024 | 70030 | 20021 | 20021 | 10020 | 619808 | 10020 | 10020 | 20020 | 20011 | 10010
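The throughput result is a median over the ten runs, which keeps the occasional slower run from moving the figure. A small sketch of that step, assuming the `cycle (02)` readings for the 100 unrolls and 100 iterations configuration are the values listed below; the function name `cycles_per_instruction` is hypothetical:

```python
from statistics import median


def cycles_per_instruction(run_cycles, unrolls, iterations):
    """Median total cycles across runs, normalized to one unrolled copy."""
    return median(run_cycles) / (unrolls * iterations)


# Assumed cycle totals for the ten runs (100 unrolls x 100 iterations).
# Two runs took 70060 cycles; the median ignores them.
runs = [70030, 70060, 70030, 70030, 70030,
        70030, 70030, 70030, 70030, 70060]

cpi = cycles_per_instruction(runs, 100, 100)
print(f"{cpi:.4f} cycles per sdiv")   # prints "7.0030 cycles per sdiv"
print(f"{1 / cpi:.4f} sdiv per cycle")
```

With no dependency chain between the divides, this figure reflects how often a new SDIV can start rather than how long one takes to produce its result.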