Apple Microarchitecture Research by Dougall Johnson

TBZ (not taken)

Test 1: uops

Code:

  tbz x0, #1, .+4

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9)
1004994258925892810065042168222710091
1004339110011001100030001000100010001
1004261410011001100030001000100010001
1004255110011001100030001000100010001
1004275210011001100030001000100089811
1004408112541254132330091003100310001
1004281510011001100030001000100010001
1004262910011001100030001000100010001
1004268310011001100030001000100010001
1004273710011001100030001000100093781
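Summary figures such as "Integer unit issues: 1.001" are per-instruction values. The Python sketch below assumes each one is a raw counter total divided by the number of TBZs executed (1000 unrolls × 1 iteration here), with no baseline subtracted; the real harness may also correct for its own overhead. The function name `per_instruction` and the total of 1001 are illustrative, not taken from the original tooling.

```python
# Minimal sketch: normalise a raw performance-counter total to a
# per-instruction figure. Assumes the total covers exactly
# unrolls * iterations * count executions of the instruction under test
# and that no baseline/overhead is subtracted (the real harness may differ).

def per_instruction(counter_total: int, unrolls: int, iterations: int, count: int = 1) -> float:
    executed = unrolls * iterations * count
    if executed <= 0:
        raise ValueError("need at least one executed instruction")
    return counter_total / executed


# Test 1: 1000 unrolls x 1 iteration of a single TBZ.
# An illustrative integer-scheduler total of 1001 gives ~1.001 per TBZ.
print(f"{per_instruction(1001, unrolls=1000, iterations=1):.3f}")
```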

Test 2: throughput

Count: 8

Code:

  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  tbz x0, #1, .+4
  mov x0, 2

(fused SUBS/B.cc loop)
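TBZ ("test bit and branch if zero") branches only when the selected bit of the register is 0. `mov x0, 2` sets bit 1, so `tbz x0, #1, .+4` falls through. Because the target is `.+4`, a taken branch would land on the same next instruction anyway. The small Python model below illustrates both points; `tbz_taken` and `next_pc` are hypothetical helpers, assuming fixed 4-byte A64 instructions.

```python
def tbz_taken(x: int, bit: int) -> bool:
    """Model of AArch64 TBZ: the branch is taken if bit `bit` of x is zero."""
    return ((x >> bit) & 1) == 0


def next_pc(pc: int, x: int, bit: int, offset: int = 4) -> int:
    """Address of the next instruction after `tbz x, #bit, .+offset` at `pc`."""
    return pc + offset if tbz_taken(x, bit) else pc + 4


# x0 = 2 has bit 1 set, so the branch is not taken.
print(tbz_taken(2, 1))
# With a target of .+4, taken and not-taken paths reach the same address.
print(hex(next_pc(0x1000, 2, 1)), hex(next_pc(0x1000, 0, 1)))
```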

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5836

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204538058392283922855612418098060380822804931100
80204467878016980169801972403728012480228802121100
80204466868010780107801102403308011080212802121100
80204466868010780107801102403308011080212803871100
80204466878010780107801102403308011080212802121100
80204466868010780107801102403308011080212802121100
80204466868010780107801102403308011080212802121100
80204466868010780107801102403308011080212802121100
80204466868010780107801102403308011080212802121100
80204466868010780107801102403308011080212802121100
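The Result line is a median cycle count divided by the number of TBZs. Assuming the divisor is unrolls × iterations × count (100 × 100 × 8 = 80,000 here) and that no baseline is subtracted, the calculation can be sketched in Python as follows. The reciprocal converts the figure into throughput: 0.5836 cycles per TBZ is roughly 1.71 TBZs per cycle. The sample cycle counts below are hypothetical, chosen only to exercise the function.

```python
from statistics import median


def cycles_per_instruction(cycle_samples, unrolls, iterations, count):
    """Median cycles across runs, divided by the number of timed instructions."""
    return median(cycle_samples) / (unrolls * iterations * count)


# Hypothetical per-run cycle counts (not measured values).
samples = [46688, 46700, 46650]
cpi = cycles_per_instruction(samples, unrolls=100, iterations=100, count=8)
print(f"{cpi:.4f} cycles per TBZ, {1 / cpi:.2f} TBZ per cycle")
```

Taking the median rather than the mean keeps one slow run (for example, a cold first pass) from skewing the result.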

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5837

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024121101119493119493136308250281834278438882304110
8002548764812898128981854241734805788073280263110
8002446791800778007780104240171800578007780044110
8002446719800298002980034240066800228003880285110
8002446719800298002980034240066800228003880038110
8002446709800198001980022240066800228003280038110
8002446709800198001980022240066800228003880038110
8002446709800198001980022240066800228003880038110
8002446709800198001980022240066800228003880038110
8002446709800198001980022240066800228003880038110
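Both loop shapes execute the same 80,000 TBZs (100 × 100 × 8 and 1000 × 10 × 8), but the second runs ten times fewer loop iterations. The nearly identical results (0.5836 vs 0.5837 cycles) suggest that the per-iteration SUBS/B.cc overhead has little effect on this measurement. The Python arithmetic below uses only the two reported results; `summarize` is a hypothetical helper.

```python
# Cycles per TBZ reported for the two loop shapes (values from the results above).
RESULTS = {(100, 100): 0.5836, (1000, 10): 0.5837}
COUNT = 8  # TBZ instructions per unrolled copy of the code


def summarize(results, count):
    """Return (unrolls, iterations, total TBZs, cycles/TBZ, TBZ/cycle) rows."""
    rows = []
    for (unrolls, iterations), cpi in results.items():
        rows.append((unrolls, iterations, unrolls * iterations * count, cpi, 1 / cpi))
    return rows


for unrolls, iterations, total, cpi, ipc in summarize(RESULTS, COUNT):
    print(f"{unrolls:>4} unrolls x {iterations:>3} iterations: {total} TBZs, "
          f"{cpi:.4f} cycles each, {ipc:.3f} TBZ/cycle")

a, b = RESULTS.values()
print(f"relative difference: {abs(a - b) / a:.3%}")
```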