Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

PACGA

Test 1: uops

Code:

  pacga x0, x0, x1
  mov x0, 1
  mov x1, 2

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | ? int output thing (e9) | ? int retires (ef)
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
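The per-instruction summary figures for Test 1 appear to be counter totals divided by the 1,000 PACGA instructions executed (1000 unrolls × 1 iteration). Here is a small sketch of that normalisation, using values from the first counter row. The pairing of summary lines with specific counters is an inference from the numbers and is not documented by the tool.

```python
# Sketch: normalise raw counter totals from the first Test 1 row by the number
# of PACGA instructions executed. Assumption: the summary lines are computed
# this way; which counter feeds which summary line is inferred, not documented.
INSTRUCTIONS = 1000  # 1000 unrolls x 1 iteration, one PACGA per unroll

row = {
    "retire uop (01)": 1004,
    "schedule int uop (53)": 1001,
    "? int retires (ef)": 1000,
}


def per_instruction(total, instructions=INSTRUCTIONS):
    """Counter total divided by the number of tested instructions."""
    return total / instructions


for name, total in row.items():
    print(f"{name}: {per_instruction(total):.3f}")
```

Schedule int uop (53) gives 1.001, which matches "Integer unit issues: 1.001". Int retires (ef) gives 1.000, which matches "Retires: 1.000". Retire uop (01) comes out at 1.004 and presumably includes a few uops of measurement overhead.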

Test 2: Latency 1->2

Code:

  pacga x0, x0, x1
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 10001 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 10001 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 10001 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 10004 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 10001 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 10001 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 10001 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 10001 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 10001 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 10001 | 10100
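The "100 unrolls and 100 iterations" configuration can be pictured as the three-instruction body repeated 100 times inside a loop. That loop is closed by a SUBS/B.cc pair, which the core fuses into a single uop. The sketch below builds such a listing. The counter register (x20), the numeric label and the choice of B.NE for B.cc are hypothetical and are not taken from the original harness. The sketch also checks the arithmetic behind the int retires (ef) total of 10100: 100 × 100 PACGAs plus one fused loop uop per iteration.

```python
# Hypothetical sketch of the "N unrolls, M iterations" shape. The counter
# register (x20), the label and the B.NE condition are illustrative
# assumptions, not taken from the original test harness.
BODY = ["pacga x0, x0, x1", "mov x0, 1", "mov x1, 2"]


def build_loop(body, unrolls, iterations, counter="x20"):
    """Return AArch64 assembly text: `body` repeated `unrolls` times inside a
    SUBS/B.NE loop that executes `iterations` times."""
    lines = [f"  mov {counter}, #{iterations}", "1:"]
    for _ in range(unrolls):
        lines.extend(f"  {insn}" for insn in body)
    # SUBS + B.NE at the loop end: the pair the core fuses into a single uop.
    lines.append(f"  subs {counter}, {counter}, #1")
    lines.append("  b.ne 1b")
    return "\n".join(lines)


def expected_int_retires(unrolls, iterations, pacga_per_body=1):
    """PACGA uops plus one fused SUBS/B.NE uop per iteration. The MOVs are
    left out because the measured int retires totals do not include them."""
    return unrolls * iterations * pacga_per_body + iterations


listing = build_loop(BODY, unrolls=100, iterations=100)
print(listing.count("pacga"))          # 100
print(expected_int_retires(100, 100))  # 10100
print(expected_int_retires(1000, 10))  # 10010
```

With `pacga_per_body=8`, the same formula gives 80100 and 80010, which match the int retires (ef) totals reported for Test 4.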

1000 unrolls and 10 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
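The reported latency result appears to be the median of the cycle (02) column divided by the number of PACGA instructions executed (unrolls × iterations). Here is a minimal sketch of that calculation, using the cycle values from the two Test 2 tables, where all ten runs read 60029 in both configurations.

```python
from statistics import median


def cycles_per_instruction(cycle_runs, unrolls, iterations, count=1):
    """Median of the per-run cycle totals divided by the number of tested
    instructions executed (unrolls x iterations x count)."""
    return median(cycle_runs) / (unrolls * iterations * count)


# cycle (02) column from the two Test 2 tables; every run read 60029.
runs_100x100 = [60029] * 10
runs_1000x10 = [60029] * 10

print(f"{cycles_per_instruction(runs_100x100, 100, 100):.4f}")  # 6.0029
print(f"{cycles_per_instruction(runs_1000x10, 1000, 10):.4f}")  # 6.0029
```

Both configurations execute 10,000 PACGAs, so identical cycle totals give identical per-instruction results.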

Test 3: Latency 1->3

Code:

  pacga x0, x1, x0
  mov x0, 1
  mov x1, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 6.0029

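retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 0 | 10001 | 0 | 0 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 0 | 10001 | 0 | 0 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 0 | 10001 | 0 | 0 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 0 | 10001 | 0 | 0 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 0 | 10001 | 0 | 0 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 0 | 10001 | 0 | 0 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 0 | 10004 | 0 | 0 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 0 | 10001 | 0 | 0 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 0 | 10001 | 0 | 0 | 10100
10204 | 60029 | 10101 | 10101 | 10100 | 530025 | 10100 | 200 | 200 | 0 | 10001 | 0 | 0 | 10100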

1000 unrolls and 10 iterations

Result (median cycles for code): 6.0029

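retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 60029 | 10021 | 10021 | 10020 | 0 | 529785 | 0 | 0 | 10020 | 20 | 0 | 0 | 20 | 10014 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 0 | 529785 | 0 | 0 | 10020 | 20 | 0 | 0 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 0 | 529785 | 0 | 0 | 10020 | 20 | 0 | 0 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 0 | 529785 | 0 | 0 | 10020 | 20 | 0 | 0 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 0 | 529785 | 0 | 0 | 10020 | 20 | 0 | 0 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 0 | 529785 | 0 | 0 | 10020 | 20 | 0 | 0 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 0 | 529785 | 0 | 0 | 10020 | 20 | 0 | 0 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 0 | 529785 | 0 | 0 | 10020 | 20 | 0 | 0 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 0 | 529785 | 0 | 0 | 10020 | 20 | 0 | 0 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 0 | 529785 | 0 | 0 | 10020 | 20 | 0 | 0 | 20 | 10011 | 10010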

Test 4: throughput

Count: 8

Code:

  pacga x0, x8, x9
  pacga x1, x8, x9
  pacga x2, x8, x9
  pacga x3, x8, x9
  pacga x4, x8, x9
  pacga x5, x8, x9
  pacga x6, x8, x9
  pacga x7, x8, x9
  mov x8, 9
  mov x9, 10

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 2.0004

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 160030 | 80101 | 80101 | 80101 | 1360093 | 80101 | 200 | 200 | 80001 | 80100
80204 | 160030 | 80101 | 80101 | 80101 | 1360135 | 80101 | 200 | 200 | 80001 | 80100
80204 | 160030 | 80101 | 80101 | 80101 | 1360177 | 80101 | 200 | 200 | 80001 | 80100
80205 | 160060 | 80109 | 80109 | 80118 | 1360177 | 80101 | 200 | 200 | 80001 | 80100
80204 | 160030 | 80101 | 80101 | 80101 | 1360177 | 80101 | 200 | 200 | 80001 | 80100
80204 | 160030 | 80101 | 80101 | 80101 | 1360262 | 80118 | 200 | 200 | 80001 | 80100
80204 | 160030 | 80101 | 80101 | 80101 | 1360177 | 80101 | 200 | 200 | 80011 | 80100
80204 | 160030 | 80101 | 80101 | 80101 | 1360177 | 80101 | 200 | 200 | 80001 | 80100
80204 | 160030 | 80101 | 80101 | 80101 | 1360177 | 80101 | 200 | 200 | 80001 | 80100
80204 | 160030 | 80101 | 80101 | 80101 | 1360328 | 80120 | 200 | 200 | 80001 | 80100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 2.0004

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 160030 | 80021 | 80021 | 0 | 80022 | 1359880 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 80020 | 1359931 | 80020 | 20 | 20 | 80020 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 80020 | 1360040 | 80040 | 20 | 20 | 80021 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80025 | 160064 | 80031 | 80031 | 0 | 80040 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 80020 | 1360026 | 80039 | 20 | 20 | 80011 | 80010
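With Count: 8, the throughput result appears to divide the median cycle total by unrolls × iterations × 8. Taking the median means the single slower run in each Test 4 table (160060 and 160064 cycles) does not shift the result. Here is a short sketch of the arithmetic, using the cycle (02) values transcribed from the two Test 4 tables.

```python
from statistics import median


def cycles_per_instruction(cycle_runs, unrolls, iterations, count=1):
    """Median cycle total divided by unrolls x iterations x count."""
    return median(cycle_runs) / (unrolls * iterations * count)


# cycle (02) columns from the two Test 4 tables (ten runs each, in order).
runs_100x100 = [160030] * 3 + [160060] + [160030] * 6
runs_1000x10 = [160030] * 8 + [160064, 160030]

for runs, unrolls, iterations in [(runs_100x100, 100, 100),
                                  (runs_1000x10, 1000, 10)]:
    cpi = cycles_per_instruction(runs, unrolls, iterations, count=8)
    print(f"{cpi:.4f} cycles per PACGA, {1 / cpi:.2f} PACGA per cycle")
```

Both configurations print 2.0004 cycles per PACGA (0.50 PACGA per cycle). That is 160030 / 80000, since nine of the ten runs in each table read 160030.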