Apple Microarchitecture Research by Dougall Johnson


XAFLAG

Test 1: uops

Code:

  xaflag

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001

Test 2: Latency 1->1

Code:

  xaflag

(non-fused SUB/CBNZ loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 10030 | 10201 | 10201 | 10207 | 254583 | 10211 | 10214 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 10030 | 10021 | 10021 | 10029 | 255005 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
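
The reported latency is consistent with dividing the cycle counter by the number of executed copies of the instruction. A minimal Python sketch of that normalization, assuming the second field of each sample row is the cycle count (10030 in both runs above). The function name is hypothetical and is not part of the original test harness.

```python
def cycles_per_instruction(cycles, unrolls, iterations):
    """Cycles per executed copy of the instruction under test."""
    return cycles / (unrolls * iterations)

# 10030 is the cycle field read from the sample rows (assumed field split).
print(round(cycles_per_instruction(10030, 100, 100), 4))   # 1.003
print(round(cycles_per_instruction(10030, 1000, 10), 4))   # 1.003
```

Both unroll/iteration splits execute 10000 copies of xaflag, so they yield the same figure; since each xaflag reads the flags written by the previous one, this is a latency measurement rather than a throughput one.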

Test 3: throughput

Count: 8

Code:

  ands xzr, xzr, xzr
  xaflag
  ands xzr, xzr, xzr
  xaflag
  ands xzr, xzr, xzr
  xaflag
  ands xzr, xzr, xzr
  xaflag
  ands xzr, xzr, xzr
  xaflag
  ands xzr, xzr, xzr
  xaflag
  ands xzr, xzr, xzr
  xaflag
  ands xzr, xzr, xzr
  xaflag

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.7888

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
160204 | 63319 | 160114 | 160114 | 160120 | 688162 | 160118 | 160218 | 80211 | 160012 | 100
160204 | 63118 | 160119 | 160119 | 160124 | 690371 | 160118 | 160218 | 80210 | 160015 | 100
160204 | 63111 | 160111 | 160111 | 160115 | 689285 | 160118 | 160220 | 80210 | 160012 | 100
160204 | 63155 | 160115 | 160115 | 160120 | 687735 | 160118 | 160220 | 80208 | 160010 | 100
160205 | 63125 | 160148 | 160148 | 160158 | 673376 | 160157 | 160259 | 80209 | 160012 | 100
160204 | 63098 | 160112 | 160112 | 160118 | 691935 | 160118 | 160220 | 80210 | 160014 | 100
160204 | 63127 | 160115 | 160115 | 160120 | 690012 | 160118 | 160220 | 80210 | 160012 | 100
160204 | 63103 | 160111 | 160111 | 160115 | 689825 | 160118 | 160218 | 80210 | 160012 | 100
160204 | 63214 | 160115 | 160115 | 160120 | 688959 | 160120 | 160220 | 80211 | 160016 | 100
160204 | 63085 | 160114 | 160114 | 160118 | 686797 | 160118 | 160220 | 80208 | 160011 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.7825

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
160024 | 64444 | 160023 | 160023 | 160030 | 697932 | 160030 | 160042 | 80020 | 0 | 160001 | 0 | 0 | 10
160024 | 63143 | 160011 | 160011 | 160010 | 671716 | 160010 | 160020 | 80020 | 0 | 160001 | 0 | 0 | 10
160024 | 62547 | 160011 | 160011 | 160010 | 672043 | 160010 | 160020 | 80020 | 0 | 160001 | 0 | 0 | 10
160024 | 62551 | 160011 | 160011 | 160010 | 672043 | 160010 | 160020 | 80020 | 0 | 160001 | 0 | 0 | 10
160024 | 62588 | 160011 | 160011 | 160010 | 671907 | 160010 | 160020 | 80020 | 0 | 160001 | 0 | 0 | 10
160024 | 62460 | 160011 | 160011 | 160010 | 671930 | 160010 | 160020 | 80020 | 0 | 160001 | 0 | 0 | 10
160025 | 62625 | 160059 | 160059 | 160064 | 671989 | 160010 | 160020 | 80020 | 0 | 160001 | 0 | 0 | 10
160024 | 62534 | 160011 | 160011 | 160010 | 671974 | 160010 | 160020 | 80020 | 0 | 160001 | 0 | 0 | 10
160024 | 62600 | 160011 | 160011 | 160010 | 673107 | 160010 | 160020 | 80020 | 0 | 160001 | 0 | 0 | 10
160024 | 62533 | 160011 | 160011 | 160010 | 671504 | 160010 | 160020 | 80020 | 0 | 160001 | 0 | 0 | 10
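
The leading "retire uop" field of the sample rows appears to follow a simple pattern: the number of unrolled instructions executed, plus two per loop iteration, plus a fixed four. The Python sketch below checks that reading against the retire counts seen in Tests 1 and 3. The two-per-iteration and fixed-four terms are inferred from the numbers only, not documented by the source, and the function name is hypothetical.

```python
def expected_retires(unrolls, iterations, instrs_per_unroll,
                     loop_instrs=2, fixed_overhead=4):
    """Predicted 'retire uop' count for one measurement.

    loop_instrs and fixed_overhead are assumptions inferred from the data:
    two loop instructions per iteration and four instructions of harness
    overhead per measurement.
    """
    body = unrolls * iterations * instrs_per_unroll
    return body + loop_instrs * iterations + fixed_overhead

# Test 3 unrolls a block of 16 instructions (8 ands + 8 xaflag).
print(expected_retires(100, 100, 16))             # 160204
print(expected_retires(1000, 10, 16))             # 160024
# Test 1 has no loop instructions and a single iteration.
print(expected_retires(1000, 1, 1, loop_instrs=0))  # 1004
```

A row whose retire count differs from the prediction by one (such as 160205 or 160025) would, under this reading, reflect a stray extra retirement during that sample rather than a change in the test body.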

Test 4: throughput

Count: 4

Code:

  fcmp s0, s0
  xaflag
  xaflag
  xaflag
  xaflag

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5997

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? int retires (ef)
50204 | 23999 | 50105 | 40102 | 10003 | 40109 | 10003 | 0 | 314854 | 0 | 40017 | 50118 | 40214 | 0 | 10004 | 40214 | 20008 | 40003 | 100
50204 | 23981 | 50106 | 40103 | 10003 | 40114 | 10004 | 0 | 315548 | 0 | 40017 | 50116 | 40212 | 0 | 10004 | 40209 | 20006 | 40001 | 100
50204 | 24001 | 50105 | 40102 | 10003 | 40109 | 10003 | 0 | 315434 | 0 | 40012 | 50112 | 40209 | 0 | 10003 | 40209 | 20006 | 40001 | 100
50204 | 24013 | 50104 | 40101 | 10003 | 40112 | 10004 | 0 | 315354 | 0 | 40017 | 50116 | 40212 | 0 | 10004 | 40212 | 20008 | 40002 | 100
50204 | 23991 | 50105 | 40102 | 10003 | 40112 | 10004 | 0 | 315511 | 0 | 40018 | 50116 | 40212 | 0 | 10004 | 40209 | 20006 | 40002 | 100
50204 | 23982 | 50109 | 40105 | 10004 | 40116 | 10004 | 0 | 315684 | 0 | 40044 | 50152 | 40241 | 0 | 10011 | 40209 | 20006 | 40002 | 100
50204 | 24004 | 50106 | 40103 | 10003 | 40112 | 10004 | 0 | 315388 | 0 | 40018 | 50116 | 40212 | 0 | 10004 | 40209 | 20006 | 40003 | 100
50204 | 23990 | 50104 | 40101 | 10003 | 40112 | 10004 | 0 | 314865 | 0 | 40012 | 50112 | 40209 | 0 | 10003 | 40209 | 20006 | 40001 | 100
50204 | 23983 | 50106 | 40103 | 10003 | 40112 | 10004 | 0 | 314936 | 0 | 40016 | 50120 | 40216 | 0 | 10004 | 40209 | 20006 | 40003 | 100
50204 | 23993 | 50103 | 40101 | 10002 | 40109 | 10003 | 0 | 315704 | 0 | 40012 | 50112 | 40209 | 0 | 10003 | 40209 | 20006 | 40001 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5997

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? int retires (ef)
50024 | 24058 | 50017 | 40014 | 10003 | 40022 | 10004 | 316490 | 40017 | 50028 | 40034 | 10004 | 40020 | 20000 | 40001 | 10
50024 | 24001 | 50011 | 40011 | 10000 | 40010 | 10000 | 315966 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23991 | 50011 | 40011 | 10000 | 40010 | 10000 | 315065 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23975 | 50011 | 40011 | 10000 | 40010 | 10000 | 315925 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 24002 | 50011 | 40011 | 10000 | 40010 | 10000 | 317033 | 40028 | 50048 | 40051 | 10007 | 40020 | 20000 | 40001 | 10
50024 | 23981 | 50011 | 40011 | 10000 | 40010 | 10000 | 315628 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 24017 | 50011 | 40011 | 10000 | 40010 | 10000 | 316681 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 24015 | 50011 | 40011 | 10000 | 40010 | 10000 | 315457 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50025 | 23983 | 50053 | 40042 | 10011 | 40052 | 10011 | 315747 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23982 | 50011 | 40011 | 10000 | 40010 | 10000 | 316141 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10

Test 5: throughput

Count: 7

Code:

  ands xzr, xzr, xzr
  xaflag
  xaflag
  xaflag
  xaflag
  xaflag
  xaflag
  xaflag

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5567

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 39036 | 80103 | 80103 | 80114 | 549618 | 80114 | 80214 | 70209 | 80005 | 100
80204 | 38935 | 80110 | 80110 | 80115 | 549186 | 80114 | 80216 | 70207 | 80004 | 100
80204 | 38941 | 80104 | 80104 | 80114 | 546244 | 80115 | 80215 | 70214 | 80007 | 100
80204 | 38959 | 80104 | 80104 | 80111 | 550642 | 80114 | 80216 | 70214 | 80007 | 100
80205 | 39007 | 80136 | 80136 | 80151 | 551288 | 80111 | 80212 | 70210 | 80002 | 100
80204 | 38986 | 80104 | 80104 | 80114 | 549324 | 80108 | 80208 | 70207 | 80003 | 100
80204 | 38980 | 80103 | 80103 | 80111 | 550481 | 80111 | 80212 | 70214 | 80004 | 100
80204 | 38907 | 80104 | 80104 | 80108 | 548872 | 80114 | 80216 | 70210 | 80005 | 100
80204 | 38939 | 80102 | 80102 | 80111 | 548844 | 80114 | 80216 | 70214 | 80004 | 100
80204 | 38969 | 80103 | 80103 | 80114 | 548885 | 80114 | 80216 | 70207 | 80003 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5555

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 39138 | 80029 | 80029 | 80039 | 550104 | 80046 | 80046 | 70020 | 80011 | 10
80024 | 38933 | 80021 | 80021 | 80020 | 546646 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38844 | 80021 | 80021 | 80020 | 545914 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38980 | 80021 | 80021 | 80020 | 546911 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38860 | 80021 | 80021 | 80020 | 546182 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38938 | 80021 | 80021 | 80020 | 546355 | 80077 | 80078 | 70020 | 80011 | 10
80024 | 38855 | 80021 | 80021 | 80020 | 546416 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38847 | 80021 | 80021 | 80020 | 545123 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38904 | 80021 | 80021 | 80020 | 547214 | 80020 | 80020 | 70020 | 80011 | 10
80024 | 38894 | 80021 | 80021 | 80020 | 546691 | 80020 | 80020 | 70020 | 80011 | 10
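
In the throughput tests the result is additionally divided by "Count", which matches the number of xaflag instances in each listing (8, 4 and 7) rather than the total number of instructions in the block. A minimal Python sketch of that normalization follows, using one sample per test whose cycle field reproduces the reported figure. The field split, the choice of sample rows, and all names are assumptions for illustration only.

```python
def cycles_per_counted_instruction(cycles, unrolls, iterations, count):
    """Cycles per counted instruction (here: per xaflag)."""
    return cycles / (unrolls * iterations * count)

# (cycles, unrolls, iterations, count); cycle values taken from single sample rows.
samples = {
    "Test 3": (63103, 100, 100, 8),
    "Test 5": (38969, 100, 100, 7),
}

for name, args in samples.items():
    cpi = cycles_per_counted_instruction(*args)
    # The reciprocal expresses the same figure as xaflag instructions per cycle.
    print(f"{name}: {cpi:.4f} cycles per xaflag, {1 / cpi:.2f} xaflag per cycle")
```

Because the flag-setting ands or fcmp instructions share the cycle budget but are excluded from the count, these figures describe xaflag in that particular instruction mix and are not directly comparable across tests.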