Apple Microarchitecture Research by Dougall Johnson

AUTDZB

Test 1: uops

Code:

  autdzb x0
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | ? int output thing (e9) | ? int retires (ef)
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52825 | 1011 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000

Test 2: Latency 1->1

Code:

  autdzb x0
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10104 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10014 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10014 | 10010
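
The latency figure can be reproduced from the cycle counter (02) samples. The sketch below is a minimal Python check, not the author's tooling. It assumes the second field of every sample row is the cycle count (60029 in all twenty rows of this test), and it assumes the reported result is the median cycle count divided by unrolls times iterations; the page does not state the formula, and the helper name cycles_per_run is hypothetical.

```python
from statistics import median

# Cycle counter (02) samples for the two Test 2 runs.
# Assumption: the second field of each sample row is the cycle count.
cycles_100x100 = [60029] * 10   # 100 unrolls, 100 iterations
cycles_1000x10 = [60029] * 10   # 1000 unrolls, 10 iterations


def cycles_per_run(cycle_samples, unrolls, iterations):
    """Median cycle count divided by the number of times the code ran.

    Assumed formula; it happens to reproduce the reported result.
    """
    return median(cycle_samples) / (unrolls * iterations)


print(f"{cycles_per_run(cycles_100x100, 100, 100):.4f}")  # 6.0029
print(f"{cycles_per_run(cycles_1000x10, 1000, 10):.4f}")  # 6.0029
```

Both runs execute the code 10000 times in total, so they give the same figure. The single-iteration run in Test 1 is consistent with it: 6029 cycles over 1000 unrolls is about 6.03 cycles each.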

Test 3: throughput

Count: 8

Code:

  autdzb x0
  autdzb x1
  autdzb x2
  autdzb x3
  autdzb x4
  autdzb x5
  autdzb x6
  autdzb x7

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 2.0004

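retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 160030 | 80201 | 80201 | 0 | 0 | 80202 | 0 | 0 | 1360379 | 0 | 80202 | 200 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 0 | 0 | 80202 | 0 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 0 | 0 | 80202 | 0 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 80109 | 80100
80204 | 160030 | 80201 | 80201 | 0 | 0 | 80202 | 0 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 0 | 0 | 80202 | 0 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 0 | 0 | 80202 | 0 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 0 | 0 | 80202 | 0 | 0 | 1360580 | 0 | 80220 | 200 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 0 | 0 | 80202 | 0 | 0 | 1360566 | 0 | 80219 | 200 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 0 | 0 | 80202 | 0 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 0 | 0 | 80202 | 0 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 80101 | 80100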

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 2.0004

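retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
80024 | 160030 | 80021 | 80021 | 80022 | 1359890 | 80022 | 20 | 20 | 0 | 80011 | 0 | 0 | 80010
80025 | 160060 | 80029 | 80029 | 80039 | 1359931 | 80020 | 20 | 20 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 0 | 80021 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 0 | 80021 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80022 | 1359880 | 80020 | 20 | 20 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 0 | 80011 | 0 | 0 | 80010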
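
The throughput figure follows the same arithmetic with one extra division by the instruction count. This is a minimal Python sketch under two assumptions: that the second field of each sample row is the cycle counter (02), and that the result is the median cycle count divided by unrolls times iterations times count. The page does not spell the formula out, and the helper name cycles_per_instruction is hypothetical. The samples are those of the 1000-unroll, 10-iteration run, where one sample reads 160060 instead of 160030.

```python
from statistics import median

# Cycle counter (02) samples for the 1000-unroll, 10-iteration run of Test 3.
# Assumption: the second field of each sample row is the cycle count.
cycles = [160030, 160060, 160030, 160030, 160030,
          160030, 160030, 160030, 160030, 160030]

COUNT = 8  # AUTDZB instructions in the code block


def cycles_per_instruction(cycle_samples, unrolls, iterations, count):
    """Median cycles for the code, divided by count (assumed formula)."""
    per_run = median(cycle_samples) / (unrolls * iterations)
    return per_run / count


recip = cycles_per_instruction(cycles, 1000, 10, COUNT)
print(f"{recip:.4f}")      # 2.0004 cycles per instruction
print(f"{1 / recip:.2f}")  # 0.50 instructions per cycle
```

Taking the median rather than the mean keeps the single 160060 sample from shifting the result.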