Apple Microarchitecture Research by Dougall Johnson

AUTIZB

Test 1: uops

Code:

  autizb x0
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | ? int output thing (e9) | ? int retires (ef)
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
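
The raw counters above can be read per instruction by dividing each one by the unroll count. The following is a minimal Python sketch of that normalization. The helper name `per_instruction` is hypothetical, and the page does not state how its summary figures (Retires, Issues, and so on) are derived, so this shows only the plain ratio with no overhead correction.

```python
# Column names follow the header row of the Test 1 counter table.
COLUMNS = [
    "retire uop (01)", "cycle (02)", "schedule uop (52)",
    "schedule int uop (53)", "dispatch int uop (56)",
    "int uops in schedulers (59)", "dispatch uop (78)",
    "? int output thing (e9)", "? int retires (ef)",
]

# One sample row from the table (all ten rows are identical).
ROW = [1004, 6029, 1001, 1001, 1000, 52725, 1000, 1001, 1000]

UNROLLS = 1000  # "1000 unrolls and 1 iteration"


def per_instruction(row, unrolls):
    """Divide each raw counter by the number of unrolled instructions."""
    return {name: value / unrolls for name, value in zip(COLUMNS, row)}


normalized = per_instruction(ROW, UNROLLS)
for name, value in normalized.items():
    print(f"{name:30s} {value:8.3f}")
```

With these inputs, schedule int uop (53) comes to 1.001 per instruction, which agrees with the "Integer unit issues: 1.001" line, and cycle (02) comes to 6.029 per instruction.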

Test 2: Latency 1->1

Code:

  autizb x0
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10205 | 60058 | 10204 | 10204 | 10211 | 530325 | 10200 | 200 | 200 | 10104 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
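
The reported latency can be reproduced from the cycle (02) column. The sketch below assumes the result is simply the median cycle count divided by unrolls times iterations, which matches the figure above; the function name `cycles_per_instruction` is hypothetical, and loop overhead is not subtracted.

```python
from statistics import median

# cycle (02) values from the ten runs of the 100-unroll, 100-iteration test.
cycles = [60029, 60029, 60029, 60029, 60029,
          60029, 60058, 60029, 60029, 60029]

UNROLLS = 100
ITERATIONS = 100


def cycles_per_instruction(cycle_counts, unrolls, iterations):
    """Median cycle count divided by the number of AUTIZB instructions run."""
    return median(cycle_counts) / (unrolls * iterations)


latency = cycles_per_instruction(cycles, UNROLLS, ITERATIONS)
print(f"{latency:.4f}")  # prints 6.0029
```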

1000 unrolls and 10 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10025 | 60058 | 10024 | 10024 | 10031 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 530085 | 10053 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60196 | 10037 | 10037 | 10068 | 529944 | 10034 | 22 | 20 | 10011 | 10010
10024 | 60071 | 10025 | 10025 | 10032 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60071 | 10025 | 10025 | 10032 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
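
This run has several disturbed samples, which shows why the result is stated as a median. A small Python sketch comparing the median with the mean of the cycle (02) column, assuming the same division by unrolls times iterations as in the reported result:

```python
from statistics import mean, median

# cycle (02) values from the ten runs of the 1000-unroll, 10-iteration test.
cycles = [60058, 60029, 60029, 60029, 60196,
          60071, 60029, 60029, 60071, 60029]

INSTRUCTIONS = 1000 * 10  # unrolls * iterations

median_cpi = median(cycles) / INSTRUCTIONS
mean_cpi = mean(cycles) / INSTRUCTIONS

print(f"median: {median_cpi:.4f}")  # prints median: 6.0029
print(f"mean:   {mean_cpi:.4f}")    # prints mean:   6.0057
```

The median ignores the four noisy samples and matches the 100-unroll run, while the mean is pulled upward by them.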

Test 3: throughput

Count: 8

Code:

  autizb x0
  autizb x1
  autizb x2
  autizb x3
  autizb x4
  autizb x5
  autizb x6
  autizb x7

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 2.0004

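retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360430 | 0 | 80202 | 200 | 0 | 200 | 0 | 0 | 80101 | 0 | 0 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 0 | 0 | 80101 | 0 | 0 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 0 | 0 | 80101 | 0 | 0 | 80100
80205 | 160064 | 80211 | 80211 | 80220 | 134 | 1367427 | 3562 | 80591 | [54716220000: map int uop (7c) through map simd uop inputs (81), digits fused in source] | 80127 | 0 | 0 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360379 | 0 | 80202 | 200 | 0 | 200 | 0 | 0 | 80111 | 0 | 0 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 0 | 0 | 80101 | 0 | 0 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 0 | 0 | 80101 | 0 | 0 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 0 | 0 | 80101 | 0 | 0 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 0 | 0 | 80101 | 0 | 0 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 80202 | 200 | 0 | 200 | 0 | 0 | 80101 | 0 | 0 | 80100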

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 2.0004

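retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 160030 | 80021 | 80021 | 80022 | 0 | 1359890 | 0 | 80022 | 20 | 0 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 0 | 1360016 | 0 | 80037 | 20 | 0 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 0 | 1359931 | 0 | 80020 | 20 | 0 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 0 | 1359931 | 0 | 80020 | 20 | 0 | 20 | 80021 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 0 | 1359931 | 0 | 80020 | 20 | 0 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 0 | 1359931 | 0 | 80020 | 20 | 0 | 20 | 80021 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 0 | 1359931 | 0 | 80020 | 20 | 0 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 0 | 1360007 | 0 | 80040 | 20 | 0 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 0 | 1359931 | 0 | 80020 | 20 | 0 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 0 | 1359931 | 0 | 80020 | 20 | 0 | 20 | 80011 | 80010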
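
The throughput figure for both runs follows from the cycle (02) column in the same way as the latency result, with an extra division by the count of 8. The sketch below assumes that reading of "median cycles for code divided by count"; the names `runs` and `cycles_per_instruction` are hypothetical.

```python
from statistics import median

COUNT = 8  # eight independent AUTIZB instructions per unrolled group

# cycle (02) values for each (unrolls, iterations) configuration.
runs = {
    (100, 100): [160030, 160030, 160030, 160064, 160030,
                 160030, 160030, 160030, 160030, 160030],
    (1000, 10): [160030] * 10,
}


def cycles_per_instruction(cycle_counts, unrolls, iterations, count):
    """Median cycles divided by the total number of instructions executed."""
    return median(cycle_counts) / (unrolls * iterations * count)


results = {
    config: cycles_per_instruction(cycles, config[0], config[1], COUNT)
    for config, cycles in runs.items()
}

for (unrolls, iterations), cpi in results.items():
    # 160030 / 80000 = 2.000375, which the page reports as 2.0004.
    print(f"{unrolls} unrolls x {iterations} iterations: "
          f"{cpi:.6f} cycles per instruction, {1 / cpi:.4f} per cycle")
```

Both configurations give about two cycles per independent AUTIZB, or roughly one instruction every other cycle, against a dependent-chain latency of about six cycles in Test 2.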