Exercise 1 — AI model metadata with LAMBDA

Exercise 1 lives on the AIMODEL sheet of exam_part1.xlsx. It is a table of large-scale AI models with columns for parameter count, FLOP count, training tokens, release date, organisation, domain, and country. The seven subparts walk you up the LAMBDA ladder in a natural order: first a tiny two-argument function that computes a ratio, then a one-argument function that filters and averages, then the same function used inside a dynamic array, then rows-where-a-column-is-max, a trickier clean-missing-values version, a two-condition filter, and finally an interactive version driven by dropdown lists. Every subpart pays off the one before it, so if you're stuck on 1d it is worth making sure 1b really works first. Partial answers earn partial credit here — there is almost always a simpler path that gets you most of the way, and the subpart write-ups below flag those.

The dataset you're working with

AIMODEL is an Excel table; a block of rows with a banner row of column names, marked so that Excel treats the columns as named (AIMODEL[FLOP], AIMODEL[Domain], and so on) instead of by cell address. Each row is one AI model. The columns you'll care about most are Model (name), Domain (Language, Multimodal, Video, Biology, etc.), Date (release date, stored as a real Excel date), Organization (the lab that made it), Parameters (how many internal knobs it has), FLOP (floating point operations used to train it; see the README for the definition), Tokens (how much training data it saw — text for language models, frames for video, etc.), and Country.

Two quirks will bite you if you don't notice them early. First, FLOP and Tokens are missing for some rows — OpenAI and Google rarely publish these, and Excel doesn't always show missing values as blanks. When you divide FLOP by Tokens and one of them is blank, Excel silently treats the blank as 0, so 0 / 0 can become a #DIV/0! error and FLOP / 0 becomes #DIV/0! as well. Sometimes blanks get coerced to 0 and you end up with a legitimate-looking 0 that isn't really zero — it's missing. This breaks the obvious "just find the smallest ratio" approach in 1e, so the README flags it and the hint for 1e nudges you toward ISNUMBER. Second, Date is a real date value, not a year number. YEAR(AIMODEL[Date]) is how you extract the year — you will need this in Exercise 2.

A note on row order. Sample spills throughout this report match the solutions workbook. Yours may look different. Click for why.

Excel Labs, the tool where your LAMBDAs live

Excel Labs is an add-in from Microsoft Research. It opens a side panel called the Advanced Formula Environment where you can write, save, and reuse LAMBDA functions under a chosen name. Once you save a LAMBDA there as MEAN_FLOP, any cell in the workbook can call it as =MEAN_FLOP("Language") just like a built-in function such as SUM. The lecture that introduces Excel Labs is Lecture 4. Open 04lecturefilled.xlsx and look at sheets "6 Excel labs" and "7 LAMBDA"; the worked examples include TWOSUM, THREESUM, POSITIVESUM, COUNT_UNIQUE, and DOTPRODUCT. Every exercise file from Lecture 4 onward keeps its own worked solutions as LAMBDAs named Solutions.* inside Excel Labs; you can open those to study the exact same patterns used in the exam.

If Excel Labs isn't installed on your machine, the fallback is Name Manager (Formulas → Name Manager → New), which is built into Excel. You paste the same LAMBDA(...) expression there, give it a name, and it behaves identically from a cell. The two are interchangeable for these exam questions.

One reading convention throughout this report: when you see

Excel

MEAN_FLOP = LAMBDA(domain, ...)

the NAME = part is for your eyes — you do not type that into a cell. In Excel, you create the name MEAN_FLOP in Excel Labs or Name Manager, and its value is the right-hand side starting from LAMBDA(...). Then you call it from a cell as =MEAN_FLOP("Language").

1a — `COMPUTE_INTENSITY` and the FLOP per token column

The exam asks you to build a function COMPUTE_INTENSITY(flop, tokens) that computes flop / tokens, rounded to two decimals, and then add a new column to the AIMODEL table called FLOP per token filled by this function.

At heart this is a two-input LAMBDA that returns one number. The only thing that lifts it above a literal division is the missing-value handling: some rows have no Tokens, so the division blows up, and you want #N/A in those cells rather than #DIV/0! or a nonsensical number.

The official version:

Excel

COMPUTE_INTENSITY = LAMBDA(flop, tokens,
    IFERROR(
        ROUND(flop / tokens, 2),
        NA()
    )
)

IFERROR(value, alt) — if value evaluates to any error, return alt instead; otherwise return value. Here alt is NA(), an Excel function that produces the #N/A error on purpose; NA() takes no arguments and is how you mark "no answer available".

ROUND(number, digits) rounds number to digits decimal places. ROUND(1.2345, 2) is 1.23.

Save this LAMBDA in Excel Labs as COMPUTE_INTENSITY. Then click inside the AIMODEL table, go to the empty column just to the right of Country, type the header FLOP per token in row 1, and in the first data cell of that column enter =COMPUTE_INTENSITY([@FLOP], [@Tokens]). The [@FLOP] syntax means "the FLOP cell on the current row" — it's how Excel tables refer to same-row values. Because you typed it inside a table, Excel auto-fills the column for every row, and the new column becomes part of the table as AIMODEL[FLOP per token].

One gotcha the solution workbook explicitly flags: when FLOP is blank, Excel coerces it to 0, so COMPUTE_INTENSITY returns 0 rather than #N/A. The solution author manually replaced those zeros with #N/A. A cleaner fix is to test the blanks inside the function, for example:

Excel

COMPUTE_INTENSITY = LAMBDA(flop, tokens,
    IF(OR(flop = 0, tokens = 0, ISBLANK(flop), ISBLANK(tokens)),
       NA(),
       ROUND(flop / tokens, 2)))

Both should give full marks; the examiner note says the hand-fix is acceptable.

Pause.If you dropped COMPUTE_INTENSITY entirely and just wrote =[@FLOP]/[@Tokens] in the new column, would you still get partial credit? (Yes — it loses the rounding and the error handling, but the column itself is there. Get something in before polishing.)

1b — `MEAN_FLOP`, filter then average

For 1b you need a function MEAN_FLOP(domain) that returns the mean FLOP across all AIMODEL rows whose Domain equals the argument.

The core idea: pull the FLOP column, pull the Domain column, keep only the FLOP values where the domain matches, and average what's left. Two steps — filter, then average.

The official version uses LET to hold the two columns in named local variables, then wraps AVERAGE around a FILTER:

Excel

MEAN_FLOP = LAMBDA(domain,
    LET(
        flops,   AIMODEL[FLOP],
        domains, AIMODEL[Domain],
        AVERAGE(FILTER(flops, domains = domain))
    )
)

A few functions to know. LET(name1, value1, name2, value2, ..., final_expression) — LET binds local names to values, and the last argument is the expression that uses those names. Here in MEAN_FLOP, LET is just for readability — flops and domains are each used once. In more complex formulas LET also lets you avoid retyping the same range or expression repeatedly (and lets Excel evaluate it once instead of every time). Inside this LAMBDA, flops means AIMODEL[FLOP] everywhere below, and domains means AIMODEL[Domain].

FILTER(array, keep_condition); the heart of Part I. It takes an array (one or more columns) and a keep_condition (a boolean array the same height as array) and returns only the rows where the condition is TRUE. Example: FILTER({10; 20; 30}, {TRUE; FALSE; TRUE}) returns {10; 30}. Here the condition domains = domain compares every entry of AIMODEL[Domain] with the supplied domain argument — you get a column of TRUE/FALSE the same height as the table.

AVERAGE(range) is the built-in mean. Fed a filtered array, it computes the mean of whatever survived.

Save this as MEAN_FLOP in Excel Labs. Then =MEAN_FLOP("Language") in any cell returns the single number you want. The exam statement adds "include the model name in the table"; the solution puts the result beside a short list; you can just show MEAN_FLOP("Language") next to the label "Language" and you're done.

Try it first.How would you compute the average FLOP only for Language models without writing a LAMBDA? Sketch the steps in plain English.

1c — `DOMAINFLOPS`, one row per domain

1c asks for a dynamic array: first column is every unique domain, second column is MEAN_FLOP applied to that domain. Call the named result DOMAINFLOPS.

A dynamic array is a formula that returns more than one cell — you enter it once, and Excel "spills" the output into the surrounding block. The spill block shows a thin blue or grey border; if any of those cells are already occupied, you get a #SPILL! error until you clear them. You can't manually edit a spilled cell without breaking the formula.

The official version:

Excel

DOMAINFLOPS = HSTACK(
    UNIQUE(AIMODEL[Domain]),
    MAP(UNIQUE(AIMODEL[Domain]),
        LAMBDA(d, MEAN_FLOP(d)))
)

Three new functions:

UNIQUE(range) returns the distinct values of range as a dynamic array. UNIQUE({"A";"B";"A";"C"}) returns {"A";"B";"C"}. In this workbook it gives you the column of nine distinct domains in the order they first appear (Robotics, Language, Multimodal, Biology, Speech, …).

MAP(array, LAMBDA) applies the LAMBDA to every element of array and returns an array of the results. Toy example: MAP({1;2;3}, LAMBDA(x, x*x)) returns {1;4;9}. Here, MAP(UNIQUE(AIMODEL[Domain]), LAMBDA(d, MEAN_FLOP(d))) walks every unique domain and calls MEAN_FLOP on it, yielding a one-column array of mean FLOP values in the same order as the domains.

HSTACK(a, b, ...) stacks arrays horizontally; its output has as many rows as the tallest input and columns equal to the sum of input columns. HSTACK({"A";"B"}, {1;2}) returns the 2x2 block {"A",1; "B",2}. Combined with UNIQUE on the left and MAP on the right, you get a two-column block: domain on the left, mean FLOP on the right.

Because DOMAINFLOPS is stored in Excel Labs, you don't put it in a cell — you refer to it from one. Typing =DOMAINFLOPS in an empty cell spills the nine-by-two block of results. The order is whatever order each domain first appears in the AIMODEL table — UNIQUE keeps first-seen order, not alphabetical. In the starter workbook the spill looks like:

Language            2.42E+24
Multimodal          3.47E+25
Video               1.97E+24
Biology             1.01E+24
Robotics            6.16E+24
Audio               1.40E+23
Image generation    3.96E+23
Vision              1.93E+23
Speech              1.90E+23

Partial-credit alternative: if MAP feels unfamiliar, you can cheat by listing the nine domains by hand in one column and calling MEAN_FLOP on each one in the next column. You lose the "dynamic" in dynamic array — new domains added later won't show up but the numbers are right and the pattern is visible.

A trap to avoid: calling UNIQUE(AIMODEL[Domain]) twice in HSTACK (once directly, once inside MAP) is fine and is what the official solution does. Some students try to stash it in a LET local, which also works, but make sure the local name isn't UNIQUE, since that shadows the built-in.

From class. MAP and HSTACK are introduced in 05lecturefilled.xlsx, sheets "HSTACK and VSTACK" and "The MAP function". The HSTACK(UNIQUE(col), MAP(UNIQUE(col), LAMBDA(x, f(x)))) shape is the standard "summarise a column by category" idiom from Lecture 5 — if the MAP line confused you, open that sheet and stare at the worked examples.

1d — `MAX_ROW_BY` and `MIN_ROW_BY`, returning whole rows

1d steps up: instead of a single number, you need the whole AIMODEL row (every column) where some column hits its maximum or minimum. Do this for Parameters, Tokens, and FLOP. The solution workbook defines two LAMBDAs that handle this:

Excel

MAX_ROW_BY = LAMBDA(col,
    FILTER(AIMODEL[], col = MAX(col))
)

MIN_ROW_BY = LAMBDA(col,
    FILTER(AIMODEL[], col = MIN(col))
)

AIMODEL[] refers to every data row of AIMODEL (every column, every row) as one 2D array. MAX(col) and MIN(col) are the standard Excel aggregates over the column argument. The condition col = MAX(col) evaluates to TRUE only on the row where col attains its max. FILTER(AIMODEL[], col = MAX(col)) returns the matching row (or rows, if several tie).

Save both LAMBDAs. Then to get the largest-FLOP model's full row, you enter =MAX_ROW_BY(AIMODEL[FLOP]) and Excel spills the one-row block across nine columns (the eight base AIMODEL columns plus the FLOP per token column you added in 1a). Similarly =MAX_ROW_BY(AIMODEL[Parameters]), =MIN_ROW_BY(AIMODEL[Tokens]), and so on — six calls total for the six rows the exam wants.

In the solution workbook the output lives on the Exercise 1 sheet at rows 62–69 and reads things like

Tokens    Smallest    PaLI                      Multimodal    14/09/2022    Google    1.69E10    1.69E23    1.6E9    USA    1.06E14
Tokens    Largest     Cosmos-1.0-...            Robotics      07/01/2025    NVIDIA    1.4E10     6.16E24    9E15     USA    683942400
FLOP      Largest     Grok-3                     Multimodal    17/02/2025    xAI       0          4.64E26    0        USA    #N/A
...

Note one detail about MIN over a column with blanks: Excel's MIN ignores blank cells entirely, so MIN(AIMODEL[Tokens]) returns the smallest real value (around 1.6E+09 for PaLI), not 0. The row that comes back from MIN_ROW_BY(AIMODEL[Tokens]) is therefore PaLI. You don't need a FILTER(col, col > 0) guard — MIN already handles blanks correctly. The thing to watch out for is the display of the resulting row: cells that were blank in the source render as 0 in the spilled output (e.g. the Parameters smallest row shows FLOP = 0 and Tokens = 0 because those entries were blank for that model). That's a rendering quirk of the spill, not a bug in your formula.

A cheaper way to get the marks: the solution workbook explicitly says "No points are deducted for using XLOOKUP and returning only one model." That means =XLOOKUP(MAX(AIMODEL[FLOP]), AIMODEL[FLOP], AIMODEL[Model]) is a fine simpler path that returns just the model name (not the whole row). If the FILTER-returns-a-row idea is shaky, this will get the marks.

Pause.In plain English: if three models tied for the largest FLOP value, what would MAX_ROW_BY(AIMODEL[FLOP]) return, and what would the XLOOKUP version return?

1e — smallest FLOP per token, guarding against missing values

1e asks for the model with the smallest FLOP / Tokens, returning the model name and the computed value. The hint says: "Some models may have missing FLOP or token values. You may need to check for valid numbers using ISNUMBER." That hint matters; the naive approach finds one of the spurious zeros you created in 1a.

The official version (stored as OUT_MIN_FLOP_PER_TOKEN — it's a LET, not a LAMBDA, because it doesn't need arguments):

Excel

OUT_MIN_FLOP_PER_TOKEN = LET(
    flop,   AIMODEL[FLOP],
    tokens, AIMODEL[Tokens],
    names,  AIMODEL[Model],
    ratio,  IF(flop / tokens, flop / tokens, NA()),
    minval, MIN(FILTER(ratio, ISNUMBER(ratio))),
    model,  XLOOKUP(minval, ratio, names),
    HSTACK(model, minval)
)

Walk the LET locals top to bottom. flop, tokens, names are just convenient aliases for the three columns. ratio is the whole column of flop / tokens values, with a trick: IF(flop / tokens, flop / tokens, NA()) — if the ratio is a non-zero number, keep it; if it's exactly 0 (the spurious value Excel produces when flop is blank but tokens isn't), the IF treats 0 as falsy and returns NA() instead. The #DIV/0! errors that arise when tokens is 0 (or both are blank) propagate straight through the IF rather than being stamped, but that's fine; the next step filters them out by ISNUMBER anyway. The net effect is that every row in ratio is either a real number or some kind of "no answer" marker.

minval is the smallest non-missing ratio. FILTER(ratio, ISNUMBER(ratio)) drops any non-numeric entries; both the #N/A markers from the IF trick and any #DIV/0! errors that propagated through, and MIN over what survives is the true minimum. ISNUMBER(x) returns TRUE if x is a number (any kind, including zero) and FALSE if x is text, blank, a boolean, or any error (#N/A, #DIV/0!, #VALUE!, …).

XLOOKUP(lookup_value, lookup_array, return_array) is the modern replacement for VLOOKUP. It finds lookup_value inside lookup_array and returns the corresponding entry from return_array. Here, find minval inside ratio and return the matching model name.

The final line HSTACK(model, minval) places the name and the value side by side and spills to a one-row, two-column result.

Store this in Excel Labs as a named expression OUT_MIN_FLOP_PER_TOKEN, and type =OUT_MIN_FLOP_PER_TOKEN into a cell. You should see something like Cosmos-1.0-Diffusion-14B Video2World | 683942400 (that particular model has a huge token count and relatively modest FLOP).

One thing that trips people up: skipping the IF(flop/tokens, flop/tokens, NA()) step. Without it, MIN(ratio) gives a useless answer in one of two ways. If any row has 0 / 0 (both blank) or real / 0 (Tokens blank), the ratio is #DIV/0!, and MIN propagates that error rather than returning a real number; the whole formula fails. If you somehow avoid the error rows (say by inspecting the data and trusting that won't happen), the rows where FLOP is blank but Tokens is real still give a ratio of 0, and MIN happily returns 0 as the smallest. Either way the genuine minimum is never surfaced and XLOOKUP either errors or returns the wrong model. The IF trick is what protects you from both cases.

Simpler version: if you manually stamped #N/A in the FLOP per token column in 1a (which the solution author did), you don't need the IF(..., NA()) trick inside ratio — you can reference AIMODEL[FLOP per token] directly and still use MIN(FILTER(col, ISNUMBER(col))). That's cleaner and earns the same marks.

Lecture parallel. The missing-value idiom comes from Lecture 4, specifically Solutions.IFBLANK inside 04exercises.xlsx (Excel Labs): LAMBDA(value, value_if_blank, IF(ISBLANK(value), value_if_blank, value)). Same shape — test a value, swap it for a fallback, push the number through. The XLOOKUP call is Lecture 3 — 03lecturefilled.xlsx sheet "4 The LOOKUP functions".

1f — `OVERENGINEERED_MODELS`, two-condition filter

1f defines an "overengineered" model as one that uses more than 2e24 FLOP and has a FLOP per token greater than 1e12 (the exam statement had a typo that said 0.1; the solution accepts either threshold).

The official version:

Excel

OVERENGINEERED_MODELS = LET(
    cleaned, FILTER(AIMODEL[],
                    ISNUMBER(AIMODEL[FLOP]) *
                    ISNUMBER(AIMODEL[Tokens]) *
                    (AIMODEL[Tokens] <> 0)),
    flop,   INDEX(cleaned, , XMATCH("FLOP",   AIMODEL[#Headers])),
    tokens, INDEX(cleaned, , XMATCH("Tokens", AIMODEL[#Headers])),
    model,  INDEX(cleaned, , XMATCH("Model",  AIMODEL[#Headers])),
    ratio,  flop / tokens,
    mask,   (flop > 2E+24) * (ratio > 1000000000000),
    FILTER(model, mask)
)

This is denser than anything so far, so let's walk it carefully.

cleaned is the table with all the bad rows stripped out. FILTER(AIMODEL[], boolean_mask) keeps only rows where the mask is TRUE. The mask here is a product of three boolean arrays: the FLOP is a number, the Tokens is a number, and the Tokens aren't zero. This is the first time in the report that boolean multiplication acts as AND, and it's the pattern you'll see again in 1g and in Exercise 3. The rule: TRUE * TRUE = 1, TRUE * FALSE = 0, FALSE * FALSE = 0. Multiplying boolean arrays element-by-element gives you the AND of the conditions as a 1/0 array, which FILTER treats as TRUE/FALSE.

INDEX(array, row_number, column_number) pulls out a slice. With row_number left blank (the ,, in the call), it returns the whole column. XMATCH("FLOP", AIMODEL[#Headers]) finds the position of the header "FLOP" in the AIMODEL headers row which is 6 (Model=1, Domain=2, Date=3, Organization=4, Parameters=5, FLOP=6, Tokens=7, Country=8). So INDEX(cleaned, , 6) is the sixth column of the cleaned table, which is the FLOP column restricted to cleaned rows. Same logic for tokens and model.

ratio is flop / tokens across those cleaned rows — no missing-value dance needed because we stripped them at the top.

mask combines the two exam conditions: FLOP above 2e24 and FLOP per token above 1e12. Boolean multiplication again.

FILTER(model, mask) is the final answer: the list of model names that pass both conditions.

Type =OVERENGINEERED_MODELS into a cell and it spills a single column of thirteen names, starting with Llama 4 Behemoth (preview), Llama 3.1-405B, GPT-4, GLM-4 (0116), Nemotron-4 340B, Doubao-pro, and continuing through models like PaLM 2, Falcon-180B, and PaLM (540B).

Easier path: if INDEX(..., XMATCH(...)) looks terrifying, replace it with a simpler version that wraps the division in IFERROR so blank-token rows don't blow up:

Excel

OVERENGINEERED_MODELS_SIMPLE = FILTER(
    AIMODEL[Model],
    (AIMODEL[FLOP] > 2E+24) *
    (IFERROR(AIMODEL[FLOP] / AIMODEL[Tokens], 0) > 1E12)
)

IFERROR(AIMODEL[FLOP] / AIMODEL[Tokens], 0) swaps every per-row #DIV/0! for a 0, which then doesn't pass the > 1E12 threshold. The other condition, (AIMODEL[FLOP] > 2E+24), independently knocks out blank-FLOP rows (a blank acts as 0, which fails the > 2E+24 test). Between them the mask has no errors and FILTER spills a clean list of names that matches the official answer on all the real models.

Easy to get wrong: using AND instead of multiplication. AND(AIMODEL[FLOP] > 2E24, AIMODEL[FLOP]/AIMODEL[Tokens] > 1E12) collapses the two boolean arrays into a single boolean — it returns one TRUE or FALSE, not an array. FILTER then sees a single value and can't decide row-by-row. You must multiply arrays to get an array of per-row booleans.

From the lectures. The (a > x) * (b > y) AND idiom comes from Lecture 3's FILTER examples (03lecturefilled.xlsx, sheet "6 UNIQUE and FILTER"). For the XMATCH/INDEX combo, see the same lecture's sheet "5 INDEX and MATCH". For a cleaner worked parallel with boolean multiplication, look at Solutions.LARGEFILTERBY inside 04exercises.xlsx — it's FILTER driven by a boolean vector built the same way.

1g — `FILTERED_MODELS` driven by dropdowns

1g pulls everything together: give the user two dropdown lists — one for Country, one for Domain, and show every AIMODEL row that matches both selections.

The LAMBDA that does the filtering is short:

Excel

FILTERED_MODELS = LAMBDA(d, c,
    FILTER(AIMODEL[],
           (AIMODEL[Domain] = d) *
           (AIMODEL[Country] = c))
)

It takes a domain d and a country c, and filters rows where both match. Same boolean-multiplication-for-AND pattern from 1f, now as a two-condition filter on two different columns.

The construction-of-the-dropdowns work is where this subpart is where this gets harder. The steps:

On the Exercise 1 sheet, pick a row out of the way and compute the unique domains: =TRANSPOSE(UNIQUE(AIMODEL[Domain])). TRANSPOSE flips a column into a row so the dropdown list reads horizontally. Do the same for countries: =TRANSPOSE(UNIQUE(AIMODEL[Country])). The solution workbook puts these in rows 109 and 112.
Click the cell where you want the "Select domain" dropdown. Data tab → Data Validation → Allow: "List" → Source: select the range that holds the unique domains you just spilled. Tick "In-cell dropdown". Do the same for country in an adjacent cell, pointing at the countries row.
Say the domain-dropdown cell is B118 and the country-dropdown cell is C118. Then the one formula you need is =FILTERED_MODELS(B118, C118), placed in a cell below a header row you type out yourself (Model | Domain | Date | Organization | ...). It spills all matching rows.

Change the dropdowns, the spill updates.

Try it first.If the user picks Video and United States of America, describe in plain English what rows should appear. Then open the solution workbook and check row 122 onwards.

Where this came from. Data validation and dropdowns are Lecture 3 (03lecturefilled.xlsx, sheet "1 Data tables"). The dynamic-dropdown mechanics — UNIQUE piped into a data-validation source range — aren't covered explicitly in any lecture file by name; the construction follows from combining the data-validation list source from Lecture 3 with the UNIQUE examples in 03lecturefilled.xlsx sheet "6 UNIQUE and FILTER". For the two-condition FILTER pattern, it's Lecture 3 again; see 03lecturefilled.xlsx sheet "6 UNIQUE and FILTER" and the FILTER-with-multiplication examples that run through the AIMODEL-adjacent Sales dataset.

If any of this still feels shaky

If the LAMBDA syntax itself is the block, go back to 04lecturefilled.xlsx sheet "7 LAMBDA" and read the five worked examples (TWOSUM, THREESUM, POSITIVESUM, COUNT_UNIQUE, DOTPRODUCT). If FILTER feels like magic, 03lecturefilled.xlsx sheet "6 UNIQUE and FILTER" has five pages of small worked examples on the Sales data. If MAP and HSTACK in 1c didn't click, sheet "The MAP function" of 05lecturefilled.xlsx is the direct reference. If the whole "filter AIMODEL by something, then pull the max row" shape seems unfamiliar, open 04exercises.xlsx, launch Excel Labs, and read Solutions.LARGEBY, Solutions.SMALLBY, Solutions.LARGEFILTERBY, Solutions.SMALLFILTERBY in order that is the same exam subpart rewritten for a different dataset, with the official pattern fully worked.

Glossary

Advanced Formula Environment: The side panel that opens when you enable the Excel Labs add-in. It lets you write, save, and edit LAMBDAs by name. The saved LAMBDAs can be called from any cell in the workbook as if they were built-in functions.
AIMODEL: The Excel table on the sheet of the same name in exam_part1.xlsx. Rows are AI models; columns are Model, Domain, Date, Organization, Parameters, FLOP, Tokens, Country.
AVERAGE(range): Returns the arithmetic mean of the numeric values in range. Ignores text and blanks.
Boolean multiplication as AND: TRUE * TRUE = 1, everything else = 0. Multiplying two boolean arrays element-by-element gives a per-row AND. Used whenever FILTER needs to combine multiple conditions.
Dynamic array: A formula result that spills into more than one cell automatically. Enter it in one cell, Excel fills a rectangular block. The block shows a thin border; cells inside it can't be manually written to without breaking the spill.
Excel Labs: A Microsoft Research add-in that provides the Advanced Formula Environment. Install it from Insert → Get Add-ins → Excel Labs.
Excel table: A block of cells formally marked as a table (Insert → Table, or Ctrl+T). Its columns are referenced by name — AIMODEL[FLOP] means the FLOP column.
FILTER(array, keep_condition): Returns the rows (or entries) of array where keep_condition is TRUE. keep_condition must be a boolean array the same height as array. Central to every LAMBDA in this exercise.
HSTACK(a, b, ...): Stacks arrays horizontally into one dynamic array. All inputs must have the same number of rows.
IFERROR(value, alt): Returns value unless it evaluates to any error (#N/A, #DIV/0!, #VALUE!, etc.), in which case it returns alt. Used in COMPUTE_INTENSITY to swap missing-data divisions for NA().
INDEX(array, row, column): Returns the cell at (row, column) of array. Leaving row blank returns the whole column; leaving column blank returns the whole row. Used in OVERENGINEERED_MODELS together with XMATCH to pull a column by header name.
ISNUMBER(x): Returns TRUE if x is a number (including zero), FALSE if it's text, blank, a boolean, or any error. Used in 1e and 1f to filter out the spurious #N/A and error-propagated cells before aggregating.
LAMBDA: A reusable function defined with LAMBDA(arg1, arg2, ..., expression) and saved by name in Excel Labs or Name Manager. Once saved, it behaves like a built-in function — =MY_LAMBDA(x, y) from any cell.
LET(name1, val1, ..., final): Binds local names to values, then evaluates final. Doesn't change the result, only readability, and Excel evaluates each value once, so reusing the same expression multiple times via LET is also a small performance win.
MAP(array, LAMBDA): Applies the LAMBDA to every element of array and returns an array of the results. Same shape and order as the input. Used in DOMAINFLOPS to call MEAN_FLOP on each unique domain.
MAX(range) / MIN(range): Largest / smallest numeric value in range. Both ignore blanks and text; both treat 0 as a real value. MIN ignoring blanks is what makes MIN_ROW_BY(AIMODEL[Tokens]) return PaLI rather than a zero-token row.
NA(): Produces the #N/A error deliberately. No arguments. Used to mark "no answer available" so downstream ISNUMBER checks can strip those rows out.
Name Manager: Excel's built-in tool (Formulas → Name Manager) for managing named ranges and named LAMBDAs. Acts as a fallback to Excel Labs; the same LAMBDA(...) expression pasted into Name Manager works identically when called from a cell.
Named expression: A name assigned to any formula; a range reference, a constant, or a LAMBDA. Created in Excel Labs or Name Manager. DOMAINFLOPS and OUT_MIN_FLOP_PER_TOKEN are named expressions; you call them by writing =DOMAINFLOPS or =OUT_MIN_FLOP_PER_TOKEN in a cell.
ROUND(number, digits): Rounds number to digits decimal places. ROUND(1.2345, 2) is 1.23. Used in COMPUTE_INTENSITY to keep the FLOP-per-token column at two decimals.
Spill range: The block of cells a dynamic-array formula fills. The top-left holds the formula; the rest is filled by Excel and can't be manually edited without breaking the spill. The whole block is referenced as C5# if the formula is in C5.
UNIQUE(range): Returns the distinct values of range as a dynamic array, in first-seen order. UNIQUE(AIMODEL[Domain]) gives the column of nine domains used in DOMAINFLOPS.
XLOOKUP(find, in_array, return_array): Modern replacement for VLOOKUP. Finds find inside in_array and returns the corresponding entry from return_array. Returns the first match if there are ties. Used in OUT_MIN_FLOP_PER_TOKEN to map the smallest ratio back to its model name.
XMATCH(find, array): Returns the 1-based position of find inside array. Used in OVERENGINEERED_MODELS with INDEX to look up columns by header name rather than by hardcoded position.

The dataset you're working with

Excel Labs, the tool where your LAMBDAs live

1a — COMPUTE_INTENSITY and the FLOP per token column

1b — MEAN_FLOP, filter then average

1c — DOMAINFLOPS, one row per domain

1d — MAX_ROW_BY and MIN_ROW_BY, returning whole rows

1e — smallest FLOP per token, guarding against missing values

1f — OVERENGINEERED_MODELS, two-condition filter

1g — FILTERED_MODELS driven by dropdowns

If any of this still feels shaky

Glossary

1a — `COMPUTE_INTENSITY` and the FLOP per token column

1b — `MEAN_FLOP`, filter then average

1c — `DOMAINFLOPS`, one row per domain

1d — `MAX_ROW_BY` and `MIN_ROW_BY`, returning whole rows

1f — `OVERENGINEERED_MODELS`, two-condition filter

1g — `FILTERED_MODELS` driven by dropdowns