Apple Microarchitecture Research by Dougall Johnson


CFINV

Test 1: uops

Code:

  cfinv

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001
1004 | 1030 | 1001 | 1001 | 1000 | 25192 | 1000 | 1000 | 1000 | 1001

Test 2: Latency 1->1

Code:

  cfinv

(non-fused SUB/CBNZ loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 10030 | 10201 | 10201 | 10208 | 254411 | 10214 | 10214 | 10214 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10078 | 10222 | 10222 | 10247 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 255191 | 10247 | 10247 | 10247 | 10122 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254568 | 10211 | 10214 | 10214 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10211 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100
10204 | 10030 | 10201 | 10201 | 10208 | 254709 | 10208 | 10208 | 10208 | 10101 | 100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0030

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 10030 | 10021 | 10021 | 10029 | 255052 | 10029 | 10030 | 10032 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
10024 | 10030 | 10021 | 10021 | 10020 | 255193 | 10020 | 10020 | 10020 | 10011 | 10
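
As a rough cross-check, the latency result for this test can be recomputed from the cycle (02) counter: take the median of the ten samples and divide by the number of cfinv instances executed (unrolls times iterations). The sketch below is Python written for illustration and is not the harness that produced these numbers; the function name is hypothetical, the sample list is copied from the 100 unrolls and 100 iterations run, and loop overhead is not subtracted.

```python
from statistics import median


def cycles_per_instruction(cycle_samples, unrolls, iterations, count=1):
    """Median cycle count divided by the number of measured instructions."""
    return median(cycle_samples) / (unrolls * iterations * count)


# cycle (02) values from the 100 unrolls and 100 iterations run of Test 2
cycles = [10030, 10030, 10078, 10030, 10030, 10030, 10030, 10030, 10030, 10030]

latency = cycles_per_instruction(cycles, unrolls=100, iterations=100)
print(f"{latency:.4f}")  # 1.0030
```

Using the median rather than the mean keeps the single outlier sample (10078 cycles) from shifting the figure.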

Test 3: throughput

Count: 8

Code:

  ands xzr, xzr, xzr
  cfinv
  ands xzr, xzr, xzr
  cfinv
  ands xzr, xzr, xzr
  cfinv
  ands xzr, xzr, xzr
  cfinv
  ands xzr, xzr, xzr
  cfinv
  ands xzr, xzr, xzr
  cfinv
  ands xzr, xzr, xzr
  cfinv
  ands xzr, xzr, xzr
  cfinv

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.7890

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
160204 | 63293 | 160112 | 160112 | 160118 | 685486 | 160117 | 160218 | 80209 | 160015 | 100
160204 | 63143 | 160113 | 160113 | 160118 | 689654 | 160121 | 160222 | 80210 | 160015 | 100
160205 | 63183 | 160153 | 160153 | 160159 | 670982 | 160119 | 160221 | 80209 | 160014 | 100
160204 | 63134 | 160115 | 160115 | 160120 | 689652 | 160118 | 160220 | 80210 | 160012 | 100
160204 | 63144 | 160115 | 160115 | 160120 | 688893 | 160120 | 160220 | 80210 | 160013 | 100
160204 | 63134 | 160115 | 160115 | 160120 | 686737 | 160124 | 160224 | 80210 | 160012 | 100
160204 | 63122 | 160112 | 160112 | 160118 | 687919 | 160118 | 160220 | 80209 | 160017 | 100
160204 | 63119 | 160111 | 160111 | 160115 | 672046 | 160125 | 160227 | 80210 | 160012 | 100
160204 | 63096 | 160112 | 160112 | 160118 | 687376 | 160118 | 160220 | 80209 | 160012 | 100
160204 | 63111 | 160112 | 160112 | 160118 | 687485 | 160118 | 160220 | 80210 | 160012 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.7825

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
160024 | 64413 | 160026 | 160026 | 160033 | 685679 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 63098 | 160011 | 160011 | 160010 | 671432 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62545 | 160011 | 160011 | 160010 | 671957 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62561 | 160011 | 160011 | 160010 | 671698 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62573 | 160011 | 160011 | 160010 | 671001 | 160010 | 160020 | 80020 | 160001 | 10
160025 | 62586 | 160064 | 160064 | 160068 | 671250 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62659 | 160011 | 160011 | 160010 | 670929 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62626 | 160011 | 160011 | 160010 | 669616 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62657 | 160011 | 160011 | 160010 | 670730 | 160010 | 160020 | 80020 | 160001 | 10
160024 | 62653 | 160011 | 160011 | 160010 | 670672 | 160010 | 160020 | 80020 | 160001 | 10

Test 4: throughput

Count: 4

Code:

  fcmp s0, s0
  cfinv
  cfinv
  cfinv
  cfinv

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5998

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? int retires (ef)
50204 | 24019 | 50109 | 40106 | 10003 | 40117 | 10005 | 315580 | 40022 | 50122 | 40217 | 10005 | 40211 | 20006 | 40003 | 100
50204 | 23994 | 50106 | 40103 | 10003 | 40114 | 10004 | 315896 | 40017 | 50116 | 40212 | 10004 | 40212 | 20008 | 40002 | 100
50204 | 24013 | 50104 | 40101 | 10003 | 40112 | 10004 | 315269 | 40018 | 50116 | 40212 | 10004 | 40209 | 20006 | 40001 | 100
50204 | 23979 | 50106 | 40103 | 10003 | 40109 | 10003 | 315289 | 40012 | 50112 | 40209 | 10003 | 40212 | 20008 | 40001 | 100
50204 | 23993 | 50103 | 40101 | 10002 | 40109 | 10003 | 315563 | 40012 | 50112 | 40209 | 10003 | 40209 | 20006 | 40003 | 100
50204 | 23987 | 50104 | 40101 | 10003 | 40112 | 10004 | 315465 | 40017 | 50116 | 40212 | 10004 | 40212 | 20008 | 40002 | 100
50204 | 23994 | 50103 | 40101 | 10002 | 40112 | 10004 | 314862 | 40012 | 50112 | 40209 | 10003 | 40212 | 20008 | 40001 | 100
50204 | 23984 | 50103 | 40101 | 10002 | 40109 | 10003 | 315068 | 40017 | 50116 | 40212 | 10004 | 40209 | 20006 | 40001 | 100
50204 | 23993 | 50103 | 40101 | 10002 | 40109 | 10003 | 315563 | 40012 | 50112 | 40209 | 10003 | 40212 | 20008 | 40002 | 100
50204 | 23992 | 50105 | 40102 | 10003 | 40112 | 10004 | 315283 | 40012 | 50112 | 40209 | 10003 | 40209 | 20006 | 40003 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5998

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map simd uop (7e) | map int uop inputs (7f) | map simd uop inputs (81) | ? int output thing (e9) | ? int retires (ef)
50024 | 24195 | 50013 | 40011 | 10002 | 40019 | 10003 | 316349 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23973 | 50011 | 40011 | 10000 | 40010 | 10000 | 316443 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23999 | 50011 | 40011 | 10000 | 40010 | 10000 | 316463 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23993 | 50011 | 40011 | 10000 | 40010 | 10000 | 315547 | 40000 | 50010 | 40020 | 10000 | 40059 | 20022 | 40025 | 10
50024 | 23954 | 50011 | 40011 | 10000 | 40010 | 10000 | 315067 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23974 | 50011 | 40011 | 10000 | 40010 | 10000 | 317002 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23979 | 50011 | 40011 | 10000 | 40010 | 10000 | 316227 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23971 | 50011 | 40011 | 10000 | 40010 | 10000 | 315455 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23993 | 50011 | 40011 | 10000 | 40010 | 10000 | 315614 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
50024 | 23973 | 50011 | 40011 | 10000 | 40010 | 10000 | 316317 | 40000 | 50010 | 40020 | 10000 | 40020 | 20000 | 40001 | 10
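
The throughput result for this test is the median of the cycle (02) counter divided by unrolls times iterations times Count. The Python sketch below is illustrative only: it uses the ten samples from the 100 unrolls and 100 iterations run, does not subtract loop overhead, and reads Count: 4 as the four cfinv instructions per block with the fcmp excluded, which is an interpretation rather than something the results state. The listed samples are only a subset of what a full measurement may use, so other runs need not reproduce their published result to the last digit.

```python
from statistics import median

# cycle (02) values from the 100 unrolls and 100 iterations run of Test 4
cycles = [24019, 23994, 24013, 23979, 23993, 23987, 23994, 23984, 23993, 23992]

UNROLLS, ITERATIONS = 100, 100
COUNT = 4  # assumed: the cfinv instructions per unrolled block, fcmp excluded

measured = UNROLLS * ITERATIONS * COUNT        # 40000 cfinv executions
cycles_per_cfinv = median(cycles) / measured   # reciprocal throughput
cfinv_per_cycle = 1 / cycles_per_cfinv         # throughput

print(f"{cycles_per_cfinv:.4f} cycles per cfinv")  # 0.5998
print(f"{cfinv_per_cycle:.2f} cfinv per cycle")    # 1.67
```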

Test 5: throughput

Count: 7

Code:

  ands xzr, xzr, xzr
  cfinv
  cfinv
  cfinv
  cfinv
  cfinv
  cfinv
  cfinv

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5568

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 39053 | 80105 | 80105 | 80117 | 550684 | 80114 | 80214 | 70219 | 80011 | 100
80204 | 38926 | 80106 | 80106 | 80115 | 548468 | 80108 | 80208 | 70207 | 80003 | 100
80204 | 38985 | 80104 | 80104 | 80114 | 549055 | 80111 | 80212 | 70214 | 80007 | 100
80204 | 39042 | 80105 | 80105 | 80111 | 547633 | 80115 | 80215 | 70214 | 80004 | 100
80204 | 38944 | 80106 | 80106 | 80116 | 547996 | 80114 | 80216 | 70214 | 80008 | 100
80204 | 38976 | 80103 | 80103 | 80114 | 549785 | 80116 | 80216 | 70207 | 80006 | 100
80204 | 38985 | 80103 | 80103 | 80111 | 546730 | 80115 | 80215 | 70214 | 80004 | 100
80204 | 38918 | 80106 | 80106 | 80116 | 549838 | 80114 | 80216 | 70210 | 80003 | 100
80204 | 38970 | 80104 | 80104 | 80114 | 549968 | 80111 | 80212 | 70214 | 80007 | 100
80204 | 38985 | 80104 | 80104 | 80114 | 549498 | 80116 | 80216 | 70210 | 80005 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5557

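retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 39245 | 80031 | 80031 | 80041 | 0 | 545567 | 0 | 0 | 80020 | 80020 | 0 | 0 | 70020 | 80011 | 10
80024 | 38943 | 80021 | 80021 | 80020 | 0 | 546907 | 0 | 0 | 80020 | 80020 | 0 | 0 | 70020 | 80011 | 10
80024 | 38951 | 80021 | 80021 | 80020 | 0 | 546904 | 0 | 0 | 80020 | 80020 | 0 | 0 | 70020 | 80011 | 10
80024 | 38890 | 80021 | 80021 | 80020 | 0 | 545187 | 0 | 0 | 80020 | 80020 | 0 | 0 | 70020 | 80011 | 10
80024 | 38862 | 80021 | 80021 | 80020 | 0 | 547567 | 0 | 0 | 80020 | 80020 | 0 | 0 | 70020 | 80011 | 10
80024 | 38895 | 80021 | 80021 | 80020 | 0 | 544663 | 0 | 0 | 80020 | 80020 | 0 | 0 | 70020 | 80011 | 10
80024 | 38878 | 80021 | 80021 | 80020 | 0 | 546559 | 0 | 0 | 80020 | 80020 | 0 | 0 | 70020 | 80011 | 10
80025 | 38924 | 80061 | 80061 | 80083 | 0 | 546603 | 0 | 0 | 80020 | 80020 | 0 | 0 | 70020 | 80011 | 10
80024 | 38903 | 80021 | 80021 | 80020 | 0 | 548151 | 0 | 0 | 80020 | 80020 | 0 | 0 | 70020 | 80011 | 10
80024 | 38931 | 80021 | 80021 | 80020 | 0 | 548138 | 0 | 0 | 80020 | 80020 | 0 | 0 | 70020 | 80011 | 10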