Apple Microarchitecture Research by Dougall Johnson

EXTR (register, 32-bit)
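The instruction being timed is the 32-bit EXTR (register). It forms the 64-bit value Wn:Wm, with Wn in the upper half, and writes the 32 bits starting at bit position lsb to Wd. ROR (immediate) is its alias with Wn == Wm. The small C model below is a sketch written from the Arm A64 instruction description, not something measured on this page. The names extr32 and ror32 are hypothetical helpers.

```c
#include <stdint.h>

/* Model of EXTR Wd, Wn, Wm, #lsb (32-bit form):
 * build Wn:Wm (Wn in bits 63..32, Wm in bits 31..0) and return bits
 * lsb+31..lsb. lsb must be 0..31 for the 32-bit form; the mask only keeps
 * this model defined for out-of-range arguments. */
uint32_t extr32(uint32_t wn, uint32_t wm, unsigned lsb)
{
    uint64_t pair = ((uint64_t)wn << 32) | wm;   /* Wn:Wm */
    return (uint32_t)(pair >> (lsb & 31u));
}

/* ROR Wd, Ws, #amount is the alias EXTR Wd, Ws, Ws, #amount. */
uint32_t ror32(uint32_t x, unsigned amount)
{
    return extr32(x, x, amount);
}
```

For instance, extr32(1, 2, 13) returns 0x00080000, the value `extr w0, w0, w1, 13` would produce with w0 = 1 and w1 = 2.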

Test 1: uops

Code:

  extr w0, w0, w1, 13
  mov x0, 1
  mov x1, 2

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 2.000

Issues: 2.000

Integer unit issues: 2.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01      02      52      53      56      59      78      7c      7f      e9      ef
    2004    1031    2001    2001    2000    7000    2000    2000    3000    2001    2000
    2004    1031    2001    2001    2000    7000    2000    2000    3000    2001    2000
    2004    1031    2001    2001    2000    7000    2000    2000    3000    2001    2000
    2004    1031    2001    2001    2000    7000    2000    2000    3000    2001    2000
    2004    1031    2001    2001    2000    7000    2000    2000    3000    2001    2000
    2004    1031    2001    2001    2000    7000    2000    2000    3000    2001    2000
    2004    1031    2001    2001    2000    7000    2000    2000    3000    2001    2000
    2004    1031    2001    2001    2000    7000    2000    2000    3000    2001    2000
    2004    1031    2001    2001    2000    7000    2000    2000    3000    2001    2000
    2004    1031    2001    2001    2000    7000    2000    2000    3000    2001    2000

Test 2: Latency 1->2 (operand 1, Wd, feeding operand 2, Wn)

Code:

  extr w0, w0, w1, 13
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0035

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01      02      52      53      56      59      78      7c      7f      e9      ef
   20204   10035   20110   20110   20113   70343   20113   20218   30227   20010   20100
   20204   10035   20110   20110   20113   70343   20113   20216   30224   20010   20100
   20204   10035   20110   20110   20113   70343   20113   20216   30224   20010   20100
   20204   10035   20110   20110   20113   70343   20113   20216   30224   20010   20100
   20204   10035   20110   20110   20113   70343   20113   20216   30224   20010   20100
   20204   10035   20110   20110   20113   70343   20113   20216   30224   20010   20100
   20204   10035   20110   20110   20113   70343   20113   20216   30224   20010   20100
   20204   10035   20110   20110   20113   70343   20113   20216   30224   20010   20100
   20204   10035   20110   20110   20113   70343   20113   20216   30224   20010   20100
   20204   10035   20110   20110   20113   70343   20113   20216   30224   20010   20100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0031

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01      02      52      53      56      59      78      7c      7f      e9      ef
   20024   10057   20020   20020   20023   70073   20023   20038   30020   20001   20010
   20024   10031   20011   20011   20010   70030   20010   20020   30020   20001   20010
   20024   10031   20011   20011   20010   70030   20010   20020   30020   20001   20010
   20024   10031   20011   20011   20010   70030   20010   20020   30020   20001   20010
   20024   10031   20011   20011   20010   70030   20010   20020   30020   20001   20010
   20024   10031   20011   20011   20010   70030   20010   20020   30020   20001   20010
   20024   10031   20011   20011   20010   70030   20010   20020   30020   20001   20010
   20024   10031   20011   20011   20010   70030   20010   20020   30020   20001   20010
   20024   10031   20011   20011   20010   70030   20010   20020   30020   20001   20010
   20024   10031   20011   20011   20010   70030   20010   20020   30020   20001   20010
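The "median cycles for code" results in this test appear to be the median of the cycle (02) counter across the ten runs, divided by the number of times the code was executed (unrolls × iterations): 10035 / (100 × 100) = 1.0035 and 10031 / (1000 × 10) = 1.0031. The C sketch below reproduces that normalization. The function names are hypothetical. Averaging the two middle samples for an even run count is an assumed convention; here the middle samples are equal, so it makes no difference.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for unsigned long cycle samples. */
static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Median of n >= 1 samples; sorts v in place. For an even n this averages
 * the two middle samples (an assumed convention). */
double median_ulong(unsigned long *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_ulong);
    if (n % 2 != 0)
        return (double)v[n / 2];
    return ((double)v[n / 2 - 1] + (double)v[n / 2]) / 2.0;
}

/* Hypothetical helper: median cycles divided by the number of copies of the
 * test code executed (unrolls * iterations). */
double cycles_per_copy(unsigned long *cycles, size_t runs,
                       unsigned long unrolls, unsigned long iterations)
{
    return median_ulong(cycles, runs) / ((double)unrolls * (double)iterations);
}
```

Taking the median also explains why the slower first run of the 1000-unroll configuration (10057 cycles) does not affect its reported result.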

Test 3: Latency 1->3 (operand 1, Wd, feeding operand 3, Wm)

Code:

  extr w0, w1, w0, 13
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0030

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01      02      52      53      56      59      78      7c      7f      e9      ef
   20204   20030   20101   20101   20106  379828   20105   20210   30215   20001   20100
   20204   20030   20101   20101   20105  379979   20105   20212   30218   20001   20100
   20204   20030   20101   20101   20105  379979   20105   20212   30218   20001   20100
   20204   20030   20101   20101   20105  379979   20105   20212   30218   20001   20100
   20204   20030   20101   20101   20105  379910   20105   20210   30218   20001   20100
   20204   20030   20101   20101   20105  379979   20105   20212   30218   20001   20100
   20204   20030   20101   20101   20105  379979   20105   20212   30218   20001   20100
   20204   20030   20101   20101   20105  379979   20105   20212   30218   20001   20100
   20204   20030   20101   20101   20105  379979   20105   20212   30218   20001   20100
   20204   20030   20101   20101   20105  379979   20105   20212   30218   20001   20100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0030

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01      02      52      53      56      59      78      7c      7f      e9      ef
   20024   20030   20011   20011   20016  379685   20010   20020   30020   20001   20010
   20024   20030   20011   20011   20010  379776   20010   20020   30020   20001   20010
   20024   20030   20011   20011   20010  379776   20010   20020   30020   20001   20010
   20024   20030   20011   20011   20010  379776   20010   20020   30020   20001   20010
   20024   20030   20011   20011   20010  379776   20010   20020   30020   20001   20010
   20024   20030   20011   20011   20010  379776   20010   20020   30020   20001   20010
   20024   20030   20011   20011   20010  379776   20010   20020   30020   20001   20010
   20024   20030   20011   20011   20010  379776   20010   20020   30020   20001   20010
   20024   20030   20011   20011   20010  379776   20010   20020   30020   20001   20010
   20024   20030   20011   20011   20010  379776   20010   20020   30020   20001   20010

Test 4: throughput

Count: 8

Code:

  extr w0, w8, w9, 13
  extr w1, w8, w9, 13
  extr w2, w8, w9, 13
  extr w3, w8, w9, 13
  extr w4, w8, w9, 13
  extr w5, w8, w9, 13
  extr w6, w8, w9, 13
  extr w7, w8, w9, 13
  mov x8, 9
  mov x9, 10

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0004

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01      02      52      53      56      59      78      7c      7f      e9      ef
  160204   80046  160110  160110  160113  560343  160113  160218  240224  160010  160100
  160204   80046  160110  160110  160113  560343  160113  160216  240224  160010  160100
  160204   80035  160110  160110  160113  560343  160113  160216  240224  160010  160100
  160204   80035  160110  160110  160113  560343  160113  160216  240224  160010  160100
  160204   80035  160110  160110  160113  560343  160113  160216  240284  160040  160100
  160204   80035  160110  160110  160113  560343  160113  160216  240224  160010  160100
  160204   80035  160110  160110  160113  560343  160113  160216  240224  160010  160100
  160205   80066  160140  160140  160150  560343  160113  160216  240224  160010  160100
  160204   80035  160110  160110  160113  560343  160113  160216  240224  160010  160100
  160204   80035  160110  160110  160113  560343  160113  160216  240224  160010  160100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 1.0004

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

      01      02      52      53      56      59      78      7c      7f      e9      ef
  160024   80057  160020  160020  160023  560030  160010  160020  240080  160031  160010
  160024   80031  160011  160011  160010  560030  160010  160020  240020  160001  160010
  160024   80031  160011  160011  160010  560030  160010  160020  240020  160001  160010
  160024   80031  160011  160011  160010  560030  160010  160020  240083  160031  160010
  160024   80031  160011  160011  160010  560030  160010  160020  240020  160001  160010
  160024   80031  160011  160011  160010  560030  160010  160020  240020  160001  160010
  160024   80031  160011  160011  160010  560030  160010  160020  240020  160001  160010
  160024   80031  160011  160011  160010  560193  160060  160080  240020  160001  160010
  160024   80031  160011  160011  160010  560030  160010  160020  240020  160001  160010
  160024   80031  160011  160011  160010  560030  160010  160020  240020  160001  160010
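In the throughput test, the result is also divided by the count of EXTR instructions per copy (8), so it reads as cycles per EXTR. In the 100 × 100 configuration the median of the ten cycle (02) samples is 80035, and 80035 / (100 × 100 × 8) ≈ 1.0004. The 1000 × 10 configuration gives 80031 / 80000 ≈ 1.0004 as well. A result this close to 1.0 with eight independent EXTRs per copy corresponds to roughly one 32-bit EXTR per cycle in this loop. The C sketch below shows the calculation. It assumes an even run count is resolved by averaging the two middle samples, and cycles_per_instruction is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for unsigned long cycle samples. */
static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: median cycles / (unrolls * iterations * count),
 * i.e. cycles per instruction under test. Sorts `cycles` in place. */
double cycles_per_instruction(unsigned long *cycles, size_t runs,
                              unsigned long unrolls, unsigned long iterations,
                              unsigned long count)
{
    qsort(cycles, runs, sizeof cycles[0], cmp_ulong);
    double median = (runs % 2 != 0)
        ? (double)cycles[runs / 2]
        : ((double)cycles[runs / 2 - 1] + (double)cycles[runs / 2]) / 2.0;
    return median / ((double)unrolls * (double)iterations * (double)count);
}
```

Passing the ten 100 × 100 cycle samples with unrolls = 100, iterations = 100 and count = 8 yields 80035 / 80000 = 1.0004375. The two outlying runs (80046 and 80066 cycles) do not move the median.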