Apple Microarchitecture Research by Dougall Johnson

EXTR (register, 64-bit)
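
For reference while reading the tests below, here is a minimal Python sketch of what 64-bit EXTR computes architecturally: it takes 64 bits starting at bit `lsb` from the 128-bit concatenation of the two source registers (first source in the high half). The names `extr64` and `MASK64` are hypothetical and chosen for illustration only; the sketch models the result value, not the timing measured on this page.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper constant: 64-bit register mask


def extr64(xn: int, xm: int, lsb: int) -> int:
    """Model EXTR Xd, Xn, Xm, #lsb for 64-bit registers.

    Xn supplies the high 64 bits and Xm the low 64 bits of a 128-bit
    value; the result is the 64-bit field starting at bit `lsb`.
    """
    if not 0 <= lsb <= 63:
        raise ValueError("lsb must be in the range 0..63")
    concat = ((xn & MASK64) << 64) | (xm & MASK64)
    return (concat >> lsb) & MASK64


# Register values 1 and 2 as they appear in the listings, with lsb = 13.
print(hex(extr64(1, 2, 13)))  # 0x8000000000000
```

When both sources are the same register, the operation is a rotate right, which is why ROR (immediate) is an alias of EXTR.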

Test 1: uops

Code:

  extr x0, x0, x1, 13
  mov x0, 1
  mov x1, 2

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 2.000

Issues: 2.000

Integer unit issues: 2.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

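retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
2004 | 1031 | 2001 | 2001 | 2000 | 7000 | 2000 | 2000 | 3000 | 2001 | 2000
2004 | 1031 | 2001 | 2001 | 2000 | 7000 | 2000 | 2000 | 3000 | 2001 | 2000
2004 | 1031 | 2001 | 2001 | 2000 | 7000 | 2000 | 2000 | 3000 | 2001 | 2000
2004 | 1031 | 2001 | 2001 | 2000 | 7000 | 2000 | 2000 | 3000 | 2001 | 2000
2004 | 1031 | 2001 | 2001 | 2000 | 7000 | 2000 | 2000 | 3000 | 2001 | 2000
2004 | 1031 | 2001 | 2001 | 2000 | 7000 | 2000 | 2000 | 3000 | 2001 | 2000
2004 | 1031 | 2001 | 2001 | 2000 | 7000 | 2000 | 2000 | 3000 | 2001 | 2000
2004 | 1031 | 2001 | 2001 | 2000 | 7000 | 2000 | 2000 | 3000 | 2001 | 2000
2004 | 1031 | 2001 | 2001 | 2000 | 7000 | 2000 | 2000 | 3000 | 2001 | 2000
2004 | 1031 | 2001 | 2001 | 2000 | 7000 | 2000 | 2000 | 3000 | 2001 | 2000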

Test 2: Latency 1->2

Code:

  extr x0, x0, x1, 13
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0035
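
The reported figure can be reproduced from the raw cycle counter samples. The sketch below assumes the result is the median of the per-run cycle counts divided by the number of executed copies of the code (unrolls times iterations); that normalisation is an assumption that happens to match the number above, not a documented description of the author's tooling. The function name `cycles_per_code` is hypothetical.

```python
from statistics import median


def cycles_per_code(cycle_samples, unrolls, iterations):
    """Median cycle count divided by the number of executed copies of the code.

    Assumption: this is how "median cycles for code" is normalised.
    """
    return median(cycle_samples) / (unrolls * iterations)


# "cycle (02)" values for the ten runs with 100 unrolls and 100 iterations.
samples = [10057] + [10035] * 9
print(round(cycles_per_code(samples, 100, 100), 4))  # 1.0035
```

Using the median rather than the mean keeps the slower first run (10057 cycles) from skewing the result.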

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 10057 | 20110 | 20110 | 20113 | 70343 | 20113 | 20218 | 30227 | 20010 | 20100
20204 | 10035 | 20110 | 20110 | 20113 | 70343 | 20113 | 20218 | 30224 | 20010 | 20100
20204 | 10035 | 20110 | 20110 | 20113 | 70343 | 20113 | 20216 | 30224 | 20010 | 20100
20204 | 10035 | 20110 | 20110 | 20113 | 70343 | 20113 | 20216 | 30224 | 20010 | 20100
20204 | 10035 | 20110 | 20110 | 20113 | 70343 | 20113 | 20216 | 30224 | 20010 | 20100
20204 | 10035 | 20110 | 20110 | 20113 | 70343 | 20113 | 20216 | 30224 | 20010 | 20100
20204 | 10035 | 20110 | 20110 | 20113 | 70343 | 20113 | 20216 | 30224 | 20010 | 20100
20204 | 10035 | 20110 | 20110 | 20113 | 70343 | 20113 | 20216 | 30224 | 20010 | 20100
20204 | 10035 | 20110 | 20110 | 20113 | 70343 | 20113 | 20216 | 30224 | 20010 | 20100
20204 | 10035 | 20110 | 20110 | 20113 | 70343 | 20113 | 20216 | 30224 | 20010 | 20100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0031

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 10057 | 20020 | 20020 | 20023 | 70073 | 20023 | 20038 | 30020 | 20001 | 20010
20024 | 10031 | 20011 | 20011 | 20010 | 70030 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 10031 | 20011 | 20011 | 20010 | 70030 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 10031 | 20011 | 20011 | 20010 | 70030 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 10031 | 20011 | 20011 | 20010 | 70030 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 10031 | 20011 | 20011 | 20010 | 70030 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 10031 | 20011 | 20011 | 20010 | 70030 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 10031 | 20011 | 20011 | 20010 | 70030 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 10031 | 20011 | 20011 | 20010 | 70030 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 10031 | 20011 | 20011 | 20010 | 70030 | 20010 | 20020 | 30020 | 20001 | 20010

Test 3: Latency 1->3

Code:

  extr x0, x1, x0, 13
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20204 | 20030 | 20101 | 20101 | 20105 | 379918 | 20105 | 20210 | 30215 | 20001 | 20100
20205 | 20060 | 20115 | 20115 | 20138 | 380193 | 20138 | 20248 | 30221 | 20001 | 20100
20204 | 20030 | 20101 | 20101 | 20106 | 379979 | 20105 | 20212 | 30218 | 20001 | 20100
20204 | 20030 | 20101 | 20101 | 20105 | 379979 | 20105 | 20212 | 30218 | 20001 | 20100
20204 | 20030 | 20101 | 20101 | 20105 | 379979 | 20105 | 20212 | 30272 | 20015 | 20100
20204 | 20030 | 20101 | 20101 | 20105 | 379979 | 20105 | 20212 | 30218 | 20001 | 20100
20204 | 20030 | 20101 | 20101 | 20105 | 379979 | 20105 | 20212 | 30218 | 20001 | 20100
20204 | 20030 | 20101 | 20101 | 20105 | 379979 | 20105 | 20212 | 30218 | 20001 | 20100
20204 | 20030 | 20101 | 20101 | 20105 | 379979 | 20105 | 20212 | 30218 | 20001 | 20100
20204 | 20030 | 20101 | 20101 | 20105 | 379979 | 20105 | 20212 | 30218 | 20001 | 20100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
20024 | 20030 | 20011 | 20011 | 20015 | 379799 | 20015 | 20032 | 30020 | 20001 | 20010
20024 | 20030 | 20011 | 20011 | 20010 | 379776 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 20030 | 20011 | 20011 | 20010 | 379776 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 20030 | 20011 | 20011 | 20010 | 379776 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 20030 | 20011 | 20011 | 20010 | 379776 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 20030 | 20011 | 20011 | 20010 | 379776 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 20030 | 20011 | 20011 | 20010 | 379776 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 20030 | 20011 | 20011 | 20010 | 379776 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 20030 | 20011 | 20011 | 20010 | 379776 | 20010 | 20020 | 30020 | 20001 | 20010
20024 | 20030 | 20011 | 20011 | 20010 | 379776 | 20010 | 20020 | 30020 | 20001 | 20010

Test 4: throughput

Count: 8

Code:

  extr x0, x8, x9, 13
  extr x1, x8, x9, 13
  extr x2, x8, x9, 13
  extr x3, x8, x9, 13
  extr x4, x8, x9, 13
  extr x5, x8, x9, 13
  extr x6, x8, x9, 13
  extr x7, x8, x9, 13
  mov x8, 9
  mov x9, 10

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0004
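
For the throughput test the cycle count is additionally divided by the number of EXTR instructions in the code (Count: 8). The sketch below reproduces the figure from the raw counters and also derives retired uops per cycle; it assumes the same median-and-divide normalisation as the result line, and all variable names are hypothetical.

```python
from statistics import median

UNROLLS, ITERATIONS, COUNT = 100, 100, 8

# "cycle (02)" and "retire uop (01)" values for the ten runs of this test.
cycle_samples = [80057, 80035, 80035, 80035, 80035,
                 80066, 80035, 80035, 80035, 80035]
retire_uop_samples = [160204] * 5 + [160205] + [160204] * 4

instructions = UNROLLS * ITERATIONS * COUNT  # EXTR instructions executed per run
cycles_per_instruction = median(cycle_samples) / instructions
uops_per_cycle = median(retire_uop_samples) / median(cycle_samples)

print(round(cycles_per_instruction, 4))  # 1.0004
print(round(uops_per_cycle, 2))          # 2.0
```

Read together with Test 1, where each EXTR accounts for two retired uops, one EXTR per cycle corresponds to roughly two uops per cycle in this test.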

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
160204 | 80057 | 160110 | 160110 | 160113 | 560343 | 160113 | 160218 | 240224 | 160010 | 160100
160204 | 80035 | 160110 | 160110 | 160113 | 560343 | 160113 | 160216 | 240284 | 160040 | 160100
160204 | 80035 | 160110 | 160110 | 160113 | 560343 | 160113 | 160216 | 240224 | 160010 | 160100
160204 | 80035 | 160110 | 160110 | 160113 | 560343 | 160113 | 160216 | 240224 | 160010 | 160100
160204 | 80035 | 160110 | 160110 | 160113 | 560343 | 160113 | 160216 | 240224 | 160010 | 160100
160205 | 80066 | 160140 | 160140 | 160150 | 560343 | 160113 | 160216 | 240224 | 160010 | 160100
160204 | 80035 | 160110 | 160110 | 160113 | 560343 | 160113 | 160216 | 240224 | 160010 | 160100
160204 | 80035 | 160110 | 160110 | 160113 | 560343 | 160113 | 160216 | 240224 | 160010 | 160100
160204 | 80035 | 160110 | 160110 | 160113 | 560343 | 160113 | 160216 | 240224 | 160010 | 160100
160204 | 80035 | 160110 | 160110 | 160113 | 560343 | 160113 | 160216 | 240224 | 160010 | 160100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 1.0004

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
160024 | 80053 | 160020 | 160020 | 160023 | 560073 | 160023 | 160036 | 240047 | 160010 | 160010
160024 | 80031 | 160011 | 160011 | 160010 | 560030 | 160010 | 160020 | 240020 | 160001 | 160010
160025 | 80066 | 160050 | 160050 | 160060 | 560030 | 160010 | 160020 | 240020 | 160001 | 160010
160024 | 80031 | 160011 | 160011 | 160010 | 560030 | 160010 | 160020 | 240020 | 160001 | 160010
160024 | 80031 | 160011 | 160011 | 160010 | 560030 | 160010 | 160020 | 240020 | 160001 | 160010
160024 | 80031 | 160011 | 160011 | 160010 | 560030 | 160010 | 160020 | 240020 | 160001 | 160010
160024 | 80031 | 160011 | 160011 | 160010 | 560193 | 160060 | 160080 | 240020 | 160001 | 160010
160024 | 80031 | 160011 | 160011 | 160010 | 560030 | 160010 | 160020 | 240107 | 160040 | 160010
160024 | 80063 | 160020 | 160020 | 160023 | 560030 | 160010 | 160020 | 240020 | 160001 | 160010
160024 | 80031 | 160011 | 160011 | 160010 | 560030 | 160010 | 160020 | 240020 | 160001 | 160010