Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

AXFLAG

Test 1: uops

Code:

  axflag

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001

Test 2: Latency 1->1

Code:

  axflag

(non-fused SUB/CBNZ loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 10030 | 10201 | 10201 | 10207 | 254581 | 10211 | 10214 | 10214 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254549 | 10211 | 10214 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 10030 | 10021 | 10021 | 10029 | 255052 | 10029 | 10030 | 10030 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
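
The "Result" figures on this page are described as the median cycle count for the code, divided by the count in the throughput tests. A minimal Python sketch of that arithmetic follows. It assumes the raw cycle (02) counter is normalised by unrolls times iterations, which is an inference from the numbers (10030 cycles over 1000 x 10 instances gives 1.0030), not something the page states. The function name and parameters are hypothetical, and the published results may be taken from a different sample than the ten rows shown.

```python
from statistics import median

def cycles_per_instruction(cycle_samples, unrolls, iterations, count=1):
    """Median raw cycle count divided by the number of measured instructions.

    Assumption: each sample covers unrolls * iterations copies of the test
    code, and `count` is the per-copy divisor quoted by the throughput tests.
    """
    return median(cycle_samples) / (unrolls * iterations * count)

# cycle (02) column of the "1000 unrolls and 10 iterations" latency table:
# every one of the ten rows reads 10030.
latency_cycles = [10030] * 10

result = cycles_per_instruction(latency_cycles, unrolls=1000, iterations=10)
print(f"{result:.4f}")  # prints 1.0030
```

For the throughput tests the same function would be called with `count` set to the "Count" value of the test. The ten rows shown there give values close to, but not always identical with, the quoted results.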

Test 3: throughput

Count: 8

Code:

  ands xzr, xzr, xzr
  axflag
  ands xzr, xzr, xzr
  axflag
  ands xzr, xzr, xzr
  axflag
  ands xzr, xzr, xzr
  axflag
  ands xzr, xzr, xzr
  axflag
  ands xzr, xzr, xzr
  axflag
  ands xzr, xzr, xzr
  axflag
  ands xzr, xzr, xzr
  axflag

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.7888

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
160204 | 63340 | 160110 | 160110 | 160116 | 689422 | 160121 | 160222 | 80209 | 160013 | 100
160204 | 62598 | 160117 | 160117 | 160122 | 682955 | 160270 | 160372 | 80210 | 160015 | 100
160204 | 63081 | 160114 | 160114 | 160120 | 690787 | 160118 | 160220 | 80209 | 160012 | 100
160205 | 63160 | 160151 | 160151 | 160159 | 689536 | 160118 | 160220 | 80210 | 160016 | 100
160204 | 63125 | 160118 | 160118 | 160123 | 688776 | 160197 | 160297 | 80210 | 160016 | 100
160204 | 63133 | 160115 | 160115 | 160120 | 689081 | 160116 | 160216 | 80210 | 160012 | 100
160204 | 63133 | 160115 | 160115 | 160120 | 688303 | 160118 | 160218 | 80209 | 160013 | 100
160204 | 63154 | 160113 | 160113 | 160117 | 689081 | 160116 | 160216 | 80209 | 160012 | 100
160204 | 63091 | 160112 | 160112 | 160118 | 687220 | 160120 | 160220 | 80210 | 160011 | 100
160204 | 62969 | 160268 | 160268 | 160273 | 687569 | 160120 | 160224 | 80210 | 160012 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.7829

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
160024 | 64564 | 160023 | 160023 | 0 | 0 | 160030 | 0 | 701336 | 160030 | 160042 | 80020 | 160001 | 10
160024 | 63221 | 160011 | 160011 | 0 | 0 | 160010 | 0 | 671956 | 160010 | 160020 | 80020 | 160001 | 10
174462 | 93935 | 175202 | 170105 | 55 | 5042 | 169937 | 46 | 681803 | 160065 | 160077 | 80032 | 160016 | 10
160024 | 62872 | 160011 | 160011 | 0 | 0 | 160010 | 0 | 671671 | 160010 | 160020 | 80020 | 160001 | 10
160025 | 62715 | 160063 | 160063 | 0 | 0 | 160066 | 0 | 670844 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62685 | 160011 | 160011 | 0 | 0 | 160010 | 0 | 671670 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62655 | 160011 | 160011 | 0 | 0 | 160010 | 0 | 670638 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62687 | 160011 | 160011 | 0 | 0 | 160010 | 0 | 671191 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62665 | 160011 | 160011 | 0 | 0 | 160010 | 0 | 669974 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 63075 | 160011 | 160011 | 0 | 0 | 160010 | 0 | 670092 | 160049 | 160059 | 80020 | 160001 | 10

Test 4: throughput

Count: 4

Code:

  fcmp s0, s0
  axflag
  axflag
  axflag
  axflag

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5998

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? int retires (ef)
50204 | 24001 | 50104 | 40101 | 10003 | 40109 | 10003 | 315712 | 40017 | 50118 | 40214 | 10004 | 40212 | 20008 | 40002 | 100
50205 | 24046 | 50145 | 40134 | 10011 | 40148 | 10013 | 315095 | 40012 | 50112 | 40209 | 10003 | 40212 | 20008 | 40002 | 100
50204 | 23999 | 50106 | 40103 | 10003 | 40109 | 10003 | 315001 | 40017 | 50116 | 40212 | 10004 | 40209 | 20006 | 40001 | 100
50204 | 24003 | 50105 | 40102 | 10003 | 40109 | 10003 | 315063 | 40013 | 50112 | 40209 | 10003 | 40212 | 20008 | 40003 | 100
50204 | 23993 | 50103 | 40101 | 10002 | 40109 | 10003 | 315619 | 40017 | 50116 | 40212 | 10004 | 40212 | 20008 | 40001 | 100
50204 | 24000 | 50104 | 40101 | 10003 | 40109 | 10003 | 315306 | 40017 | 50116 | 40212 | 10004 | 40212 | 20008 | 40002 | 100
50204 | 23986 | 50105 | 40102 | 10003 | 40109 | 10003 | 314862 | 40012 | 50112 | 40209 | 10003 | 40216 | 20008 | 40005 | 100
50204 | 23999 | 50105 | 40102 | 10003 | 40112 | 10004 | 314773 | 40012 | 50112 | 40209 | 10003 | 40216 | 20008 | 40005 | 100
50204 | 23992 | 50104 | 40101 | 10003 | 40109 | 10003 | 315271 | 40017 | 50116 | 40212 | 10004 | 40212 | 20008 | 40001 | 100
50204 | 24007 | 50104 | 40101 | 10003 | 40109 | 10003 | 315434 | 40012 | 50112 | 40209 | 10003 | 40212 | 20008 | 40002 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5998

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? int retires (ef)
50024 | 24098 | 50018 | 40015 | 10003 | 40020 | 10003 | 316129 | 40022 | 50032 | 40037 | 10005 | 40020 | 20000 | 40001 | 10
50024 | 24019 | 50011 | 40011 | 10000 | 40010 | 10000 | 316523 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 24003 | 50011 | 40011 | 10000 | 40010 | 10000 | 316456 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23990 | 50011 | 40011 | 10000 | 40010 | 10000 | 315557 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23953 | 50011 | 40011 | 10000 | 40010 | 10000 | 316269 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 24027 | 50011 | 40011 | 10000 | 40010 | 10000 | 316247 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23999 | 50011 | 40011 | 10000 | 40010 | 10000 | 316677 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23957 | 50011 | 40011 | 10000 | 40010 | 10000 | 315751 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23959 | 50011 | 40011 | 10000 | 40010 | 10000 | 316129 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23973 | 50011 | 40011 | 10000 | 40010 | 10000 | 316685 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10

Test 5: throughput

Count: 7

Code:

  ands xzr, xzr, xzr
  axflag
  axflag
  axflag
  axflag
  axflag
  axflag
  axflag

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5567

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 39057 | 80104 | 80104 | 80108 | 550299 | 80120 | 80222 | 70213 | 80007 | 100
80205 | 38944 | 80134 | 80134 | 80148 | 550005 | 80117 | 80218 | 70213 | 80008 | 100
80205 | 38981 | 80139 | 80139 | 80152 | 548826 | 80108 | 80208 | 70214 | 80008 | 100
80204 | 38964 | 80106 | 80106 | 80116 | 550503 | 80116 | 80216 | 70210 | 80005 | 100
80204 | 38989 | 80108 | 80108 | 80119 | 550000 | 80116 | 80216 | 70214 | 80004 | 100
80204 | 38973 | 80103 | 80103 | 80111 | 551999 | 80116 | 80216 | 70214 | 80004 | 100
80204 | 38980 | 80103 | 80103 | 80111 | 550183 | 80116 | 80216 | 70214 | 80004 | 100
80204 | 39009 | 80108 | 80108 | 80116 | 550420 | 80116 | 80216 | 70210 | 80003 | 100
80204 | 38918 | 80106 | 80106 | 80116 | 548462 | 80114 | 80216 | 70214 | 80004 | 100
80204 | 38970 | 80104 | 80104 | 80114 | 549968 | 80111 | 80212 | 70214 | 80007 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5557

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 39134 | 80027 | 80027 | 80036 | 548110 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38885 | 80021 | 80021 | 80020 | 547544 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38878 | 80021 | 80021 | 80020 | 546632 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38886 | 80021 | 80021 | 80020 | 544094 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38886 | 80021 | 80021 | 80020 | 546432 | 80020 | 80020 | 70087 | 80088 | 10
80024 | 38871 | 80021 | 80021 | 80020 | 546691 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38885 | 80021 | 80021 | 80020 | 546822 | 80020 | 80020 | 70055 | 80051 | 10
80024 | 38871 | 80021 | 80021 | 80020 | 546938 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38901 | 80021 | 80021 | 80020 | 546228 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38920 | 80021 | 80021 | 80020 | 546691 | 80020 | 80020 | 70020 | 80011 | 10