Apple Microarchitecture Research by Dougall Johnson

PACIB

Test 1: uops

Code:

  pacib x0, x1
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | ? int output thing (e9) | ? int retires (ef)
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000

Test 2: Latency 1->1

Code:

  pacib x0, x1
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
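The reported latency appears to be the median cycle count divided by the number of executed copies of the test code (unrolls times iterations). The Python sketch below reproduces that arithmetic for the two runs above, using the cycle (02) count of 60029 that each sample row reports. The function name `cycles_per_copy` is hypothetical, and the normalization is an assumption inferred from the numbers, not the author's stated method.

```python
from statistics import median

def cycles_per_copy(cycle_samples, unrolls, iterations):
    """Median cycle count divided by the number of executed copies of the code.

    Assumption: this is how the page normalizes its "Result" line.
    """
    return median(cycle_samples) / (unrolls * iterations)

# cycle (02) values from the ten samples of each run
run_100x100 = [60029] * 10
run_1000x10 = [60029] * 10

print(round(cycles_per_copy(run_100x100, 100, 100), 4))  # 6.0029
print(round(cycles_per_copy(run_1000x10, 1000, 10), 4))  # 6.0029
```

Both configurations execute 10000 copies of the instruction, so identical cycle counts give the same per-instruction figure.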

Test 3: Latency 1->2

Chain cycles: 1

Code:

  add x1, x0, x0
  mov x0, 0
  pacib x0, x1
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30204 | 72636 | 20522 | 20522 | 21566 | 1450132 | 21698 | 11137 | 22057 | 20453 | 30100
30204 | 72627 | 20525 | 20525 | 21566 | 1449810 | 21681 | 11121 | 22052 | 20448 | 30100
30204 | 72580 | 20524 | 20524 | 21547 | 1449858 | 21670 | 11121 | 22066 | 20449 | 30100
30204 | 72679 | 20545 | 20545 | 21607 | 1449419 | 21655 | 11114 | 22109 | 20458 | 30100
30204 | 72685 | 20523 | 20523 | 21585 | 1448383 | 21567 | 11050 | 22086 | 20456 | 30100
30204 | 72627 | 20530 | 20530 | 21572 | 1448177 | 21570 | 11054 | 22090 | 20460 | 30100
30204 | 72680 | 20537 | 20537 | 21596 | 1448342 | 21568 | 11052 | 22063 | 20451 | 30100
30204 | 72583 | 20525 | 20525 | 21549 | 1448260 | 21575 | 11067 | 22052 | 20449 | 30100
30204 | 72630 | 20526 | 20526 | 21569 | 1449270 | 21648 | 11105 | 21120 | 20275 | 30100
30204 | 70029 | 20201 | 20201 | 20202 | 1429568 | 20202 | 10204 | 20208 | 20101 | 30100

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30024 | 70029 | 20021 | 20021 | 20022 | 1429189 | 20022 | 10024 | 20028 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20022 | 1429439 | 20047 | 10041 | 20028 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20022 | 1429208 | 20022 | 10024 | 20028 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20022 | 1429208 | 20022 | 10024 | 20028 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20022 | 1429208 | 20022 | 10024 | 20028 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20022 | 1429208 | 20022 | 10024 | 20028 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20022 | 1429208 | 20022 | 10024 | 20028 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20022 | 1429208 | 20022 | 10024 | 20028 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20022 | 1429208 | 20022 | 10024 | 20028 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20022 | 1429181 | 20020 | 10020 | 20020 | 20011 | 30010
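For the 1->2 test, the dependency chain passes through an extra `add`, so one chain cycle is subtracted from the per-copy figure. The sketch below applies that correction to the 1000-unroll run, whose ten samples all report 70029 in the cycle (02) counter. The helper name `latency_minus_chain` is hypothetical, and the formula is an assumption based on the wording of the "Result" line.

```python
from statistics import median

def latency_minus_chain(cycle_samples, unrolls, iterations, chain_cycles):
    """Per-copy median cycles with the helper chain's latency removed.

    Assumption: the page computes median / (unrolls * iterations) - chain_cycles.
    """
    per_copy = median(cycle_samples) / (unrolls * iterations)
    return per_copy - chain_cycles

# cycle (02) values from the 1000-unroll, 10-iteration run
samples_1000x10 = [70029] * 10

result = latency_minus_chain(samples_1000x10, 1000, 10, chain_cycles=1)
print(round(result, 4))  # 6.0029
```

The result matches the 1->1 latency, which suggests both source operands of `pacib` have the same latency on this core.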

Test 4: throughput

Count: 8

Code:

  pacib x0, x8
  pacib x1, x8
  pacib x2, x8
  pacib x3, x8
  pacib x4, x8
  pacib x5, x8
  pacib x6, x8
  pacib x7, x8

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 2.0004

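retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360566 | 0 | 0 | 80219 | 200 | 0 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 0 | 80202 | 200 | 0 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 0 | 80202 | 200 | 0 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 0 | 80202 | 200 | 0 | 0 | 200 | 80101 | 80100
80205 | 160060 | 80209 | 80209 | 80219 | 0 | 1360430 | 0 | 0 | 80202 | 200 | 0 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 0 | 80202 | 200 | 0 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 0 | 80202 | 200 | 0 | 0 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 0 | 1360481 | 0 | 0 | 80202 | 200 | 0 | 0 | 200 | 80101 | 80100
80205 | 160064 | 80211 | 80211 | 80220 | 0 | 1360481 | 0 | 0 | 80202 | 200 | 0 | 0 | 200 | 80101 | 80100
80205 | 160064 | 80211 | 80211 | 80220 | 0 | 1360481 | 0 | 0 | 80202 | 200 | 0 | 0 | 200 | 80101 | 80100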

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 2.0004

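retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80022 | 0 | 1359880 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1360040 | 80040 | 20 | 20 | 80021 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1360048 | 80040 | 20 | 20 | 80011 | 80010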
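The throughput result divides the per-copy cycle count by the number of independent instructions in the block (Count: 8). The sketch below works through that division for the 1000-unroll run, using the cycle (02) count of 160030 that every sample reports. The function name `reciprocal_throughput` is hypothetical, and the formula is an assumption inferred from the "divided by count" wording.

```python
from statistics import median

def reciprocal_throughput(cycle_samples, unrolls, iterations, count):
    """Median cycles per instruction when `count` independent copies run per block.

    Assumption: the page computes median / (unrolls * iterations * count).
    """
    return median(cycle_samples) / (unrolls * iterations * count)

# cycle (02) values from the 1000-unroll, 10-iteration run
samples_1000x10 = [160030] * 10

rt = reciprocal_throughput(samples_1000x10, 1000, 10, count=8)
print(f"{rt:.6f}")      # 2.000375, which the page reports as 2.0004
print(f"{1 / rt:.3f}")  # 0.500 instructions per cycle
```

A reciprocal throughput of about 2 cycles means roughly one `pacib` can start every other cycle in this test.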