Exercise 1 lives on the AIMODEL sheet of exam_part1.xlsx. It is a table of large-scale AI models with columns for parameter count, FLOP count, training tokens, release date, organisation, domain, and country. The seven subparts walk you up the LAMBDA ladder in a natural order: first a tiny two-argument function that computes a ratio, then a one-argument function that filters and averages, then the same function used inside a dynamic array, then rows-where-a-column-is-max, a trickier clean-missing-values version, a two-condition filter, and finally an interactive version driven by dropdown lists. Every subpart pays off the one before it, so if you're stuck on 1d it is worth making sure 1b really works first. Partial answers earn partial credit here — there is almost always a simpler path that gets you most of the way, and the subpart write-ups below flag those.
The dataset you're working with
AIMODEL is an Excel table; a block of rows with a banner row of column names, marked so that Excel treats the columns as named (AIMODEL[FLOP], AIMODEL[Domain], and so on) instead of by cell address. Each row is one AI model. The columns you'll care about most are Model (name), Domain (Language, Multimodal, Video, Biology, etc.), Date (release date, stored as a real Excel date), Organization (the lab that made it), Parameters (how many internal knobs it has), FLOP (floating point operations used to train it; see the README for the definition), Tokens (how much training data it saw — text for language models, frames for video, etc.), and Country.
Two quirks will bite you if you don't notice them early. First, FLOP and Tokens are missing for some rows — OpenAI and Google rarely publish these, and Excel doesn't always show missing values as blanks. When you divide FLOP by Tokens and one of them is blank, Excel silently treats the blank as 0, so 0 / 0 can become a #DIV/0! error and FLOP / 0 becomes #DIV/0! as well. Sometimes blanks get coerced to 0 and you end up with a legitimate-looking 0 that isn't really zero — it's missing. This breaks the obvious "just find the smallest ratio" approach in 1e, so the README flags it and the hint for 1e nudges you toward ISNUMBER. Second, Date is a real date value, not a year number. YEAR(AIMODEL[Date]) is how you extract the year — you will need this in Exercise 2.
A note on row order. Sample spills throughout this report match the solutions workbook. Yours may look different. Click for why.
The starter and solutions workbooks contain the same AIMODEL rows but in different sort orders; the starter ships sorted by Date descending, the solutions workbook is sorted by Tokens descending. FILTER and UNIQUE preserve whatever row order the source table is currently in, so re-sorting AIMODEL by clicking a column header changes the order of the spill. Both are correct outputs of the same formula on the same data; only the set of results, not the sequence, is what the formula guarantees.
Excel Labs, the tool where your LAMBDAs live
Excel Labs is an add-in from Microsoft Research. It opens a side panel called the Advanced Formula Environment where you can write, save, and reuse LAMBDA functions under a chosen name. Once you save a LAMBDA there as MEAN_FLOP, any cell in the workbook can call it as =MEAN_FLOP("Language") just like a built-in function such as SUM. The lecture that introduces Excel Labs is Lecture 4. Open 04lecturefilled.xlsx and look at sheets "6 Excel labs" and "7 LAMBDA"; the worked examples include TWOSUM, THREESUM, POSITIVESUM, COUNT_UNIQUE, and DOTPRODUCT. Every exercise file from Lecture 4 onward keeps its own worked solutions as LAMBDAs named Solutions.* inside Excel Labs; you can open those to study the exact same patterns used in the exam.
If Excel Labs isn't installed on your machine, the fallback is Name Manager (Formulas → Name Manager → New), which is built into Excel. You paste the same LAMBDA(...) expression there, give it a name, and it behaves identically from a cell. The two are interchangeable for these exam questions.
One reading convention throughout this report: when you see
MEAN_FLOP = LAMBDA(domain, ...)
the NAME = part is for your eyes — you do not type that into a cell. In Excel, you create the name MEAN_FLOP in Excel Labs or Name Manager, and its value is the right-hand side starting from LAMBDA(...). Then you call it from a cell as =MEAN_FLOP("Language").
1a — COMPUTE_INTENSITY and the FLOP per token column
The exam asks you to build a function COMPUTE_INTENSITY(flop, tokens) that computes flop / tokens, rounded to two decimals, and then add a new column to the AIMODEL table called FLOP per token filled by this function.
At heart this is a two-input LAMBDA that returns one number. The only thing that lifts it above a literal division is the missing-value handling: some rows have no Tokens, so the division blows up, and you want #N/A in those cells rather than #DIV/0! or a nonsensical number.
The official version:
COMPUTE_INTENSITY = LAMBDA(flop, tokens,
IFERROR(
ROUND(flop / tokens, 2),
NA()
)
)
IFERROR(value, alt) — if value evaluates to any error, return alt instead; otherwise return value. Here alt is NA(), an Excel function that produces the #N/A error on purpose; NA() takes no arguments and is how you mark "no answer available".
ROUND(number, digits) rounds number to digits decimal places. ROUND(1.2345, 2) is 1.23.
Save this LAMBDA in Excel Labs as COMPUTE_INTENSITY. Then click inside the AIMODEL table, go to the empty column just to the right of Country, type the header FLOP per token in row 1, and in the first data cell of that column enter =COMPUTE_INTENSITY([@FLOP], [@Tokens]). The [@FLOP] syntax means "the FLOP cell on the current row" — it's how Excel tables refer to same-row values. Because you typed it inside a table, Excel auto-fills the column for every row, and the new column becomes part of the table as AIMODEL[FLOP per token].
One gotcha the solution workbook explicitly flags: when FLOP is blank, Excel coerces it to 0, so COMPUTE_INTENSITY returns 0 rather than #N/A. The solution author manually replaced those zeros with #N/A. A cleaner fix is to test the blanks inside the function, for example:
COMPUTE_INTENSITY = LAMBDA(flop, tokens,
IF(OR(flop = 0, tokens = 0, ISBLANK(flop), ISBLANK(tokens)),
NA(),
ROUND(flop / tokens, 2)))
Both should give full marks; the examiner note says the hand-fix is acceptable.
Pause.If you dropped COMPUTE_INTENSITY entirely and just wrote =[@FLOP]/[@Tokens] in the new column, would you still get partial credit? (Yes — it loses the rounding and the error handling, but the column itself is there. Get something in before polishing.)
Lecture tie-in. Two-argument LAMBDAs like this one are the very first things Lecture 4 shows; see 04lecturefilled.xlsx sheet "7 LAMBDA", where TWOSUM = LAMBDA(a, b, a + b) is the opening example. For the example most like this, open 04exercises.xlsx and look at Solutions.KMPH_TO_MS in Excel Labs: it's a one-line LAMBDA (LAMBDA(kmph, kmph * 10 / 36)) — same skeleton as COMPUTE_INTENSITY (one tiny arithmetic body, optionally wrapped in error-handling), just with one argument instead of two and different arithmetic. For the missing-value idiom, Solutions.IFBLANK in the same file shows the pattern for guarding against blank inputs.
1b — MEAN_FLOP, filter then average
For 1b you need a function MEAN_FLOP(domain) that returns the mean FLOP across all AIMODEL rows whose Domain equals the argument.
The core idea: pull the FLOP column, pull the Domain column, keep only the FLOP values where the domain matches, and average what's left. Two steps — filter, then average.
The official version uses LET to hold the two columns in named local variables, then wraps AVERAGE around a FILTER:
MEAN_FLOP = LAMBDA(domain,
LET(
flops, AIMODEL[FLOP],
domains, AIMODEL[Domain],
AVERAGE(FILTER(flops, domains = domain))
)
)
A few functions to know. LET(name1, value1, name2, value2, ..., final_expression) — LET binds local names to values, and the last argument is the expression that uses those names. Here in MEAN_FLOP, LET is just for readability — flops and domains are each used once. In more complex formulas LET also lets you avoid retyping the same range or expression repeatedly (and lets Excel evaluate it once instead of every time). Inside this LAMBDA, flops means AIMODEL[FLOP] everywhere below, and domains means AIMODEL[Domain].
FILTER(array, keep_condition); the heart of Part I. It takes an array (one or more columns) and a keep_condition (a boolean array the same height as array) and returns only the rows where the condition is TRUE. Example: FILTER({10; 20; 30}, {TRUE; FALSE; TRUE}) returns {10; 30}. Here the condition domains = domain compares every entry of AIMODEL[Domain] with the supplied domain argument — you get a column of TRUE/FALSE the same height as the table.
AVERAGE(range) is the built-in mean. Fed a filtered array, it computes the mean of whatever survived.
Save this as MEAN_FLOP in Excel Labs. Then =MEAN_FLOP("Language") in any cell returns the single number you want. The exam statement adds "include the model name in the table"; the solution puts the result beside a short list; you can just show MEAN_FLOP("Language") next to the label "Language" and you're done.
Try it first.How would you compute the average FLOP only for Language models without writing a LAMBDA? Sketch the steps in plain English.
The simplest unwritten alternative is AVERAGEIF(AIMODEL[Domain], "Language", AIMODEL[FLOP]), which does the filter-then-average in one built-in. It won't earn full LAMBDA marks because the exam explicitly asks for MEAN_FLOP(domain), but it's a sanity check; the number should match.
Common mistake: writing domain = domains instead of domains = domain. In Excel both work for equality, but swapping arguments trips some students into writing FILTER(flops, domain = domains) and then mistakenly thinking they should compare domains to the column of flops. Keep the column on the left, the scalar on the right, and the boolean array makes sense.
In class. The "filter a column by another column, then aggregate" shape is the bread and butter of Lecture 3 (03lecturefilled.xlsx sheet "6 UNIQUE and FILTER") and Lecture 4's LAMBDA sheet. The closest worked parallel is POSITIVESUM in 04lecturefilled.xlsx (Excel Labs) — LAMBDA(array, SUM(FILTER(array, array > 0))) — exactly the same aggregate(FILTER(column, mask)) shape, just with SUM instead of AVERAGE and a constant condition rather than equality to a parameter.
1c — DOMAINFLOPS, one row per domain
1c asks for a dynamic array: first column is every unique domain, second column is MEAN_FLOP applied to that domain. Call the named result DOMAINFLOPS.
A dynamic array is a formula that returns more than one cell — you enter it once, and Excel "spills" the output into the surrounding block. The spill block shows a thin blue or grey border; if any of those cells are already occupied, you get a #SPILL! error until you clear them. You can't manually edit a spilled cell without breaking the formula.
The official version:
DOMAINFLOPS = HSTACK(
UNIQUE(AIMODEL[Domain]),
MAP(UNIQUE(AIMODEL[Domain]),
LAMBDA(d, MEAN_FLOP(d)))
)
Three new functions:
UNIQUE(range) returns the distinct values of range as a dynamic array. UNIQUE({"A";"B";"A";"C"}) returns {"A";"B";"C"}. In this workbook it gives you the column of nine distinct domains in the order they first appear (Robotics, Language, Multimodal, Biology, Speech, …).
MAP(array, LAMBDA) applies the LAMBDA to every element of array and returns an array of the results. Toy example: MAP({1;2;3}, LAMBDA(x, x*x)) returns {1;4;9}. Here, MAP(UNIQUE(AIMODEL[Domain]), LAMBDA(d, MEAN_FLOP(d))) walks every unique domain and calls MEAN_FLOP on it, yielding a one-column array of mean FLOP values in the same order as the domains.
HSTACK(a, b, ...) stacks arrays horizontally; its output has as many rows as the tallest input and columns equal to the sum of input columns. HSTACK({"A";"B"}, {1;2}) returns the 2x2 block {"A",1; "B",2}. Combined with UNIQUE on the left and MAP on the right, you get a two-column block: domain on the left, mean FLOP on the right.
Because DOMAINFLOPS is stored in Excel Labs, you don't put it in a cell — you refer to it from one. Typing =DOMAINFLOPS in an empty cell spills the nine-by-two block of results. The order is whatever order each domain first appears in the AIMODEL table — UNIQUE keeps first-seen order, not alphabetical. In the starter workbook the spill looks like:
Language 2.42E+24
Multimodal 3.47E+25
Video 1.97E+24
Biology 1.01E+24
Robotics 6.16E+24
Audio 1.40E+23
Image generation 3.96E+23
Vision 1.93E+23
Speech 1.90E+23
Partial-credit alternative: if MAP feels unfamiliar, you can cheat by listing the nine domains by hand in one column and calling MEAN_FLOP on each one in the next column. You lose the "dynamic" in dynamic array — new domains added later won't show up but the numbers are right and the pattern is visible.
A trap to avoid: calling UNIQUE(AIMODEL[Domain]) twice in HSTACK (once directly, once inside MAP) is fine and is what the official solution does. Some students try to stash it in a LET local, which also works, but make sure the local name isn't UNIQUE, since that shadows the built-in.
From class. MAP and HSTACK are introduced in 05lecturefilled.xlsx, sheets "HSTACK and VSTACK" and "The MAP function". The HSTACK(UNIQUE(col), MAP(UNIQUE(col), LAMBDA(x, f(x)))) shape is the standard "summarise a column by category" idiom from Lecture 5 — if the MAP line confused you, open that sheet and stare at the worked examples.
1d — MAX_ROW_BY and MIN_ROW_BY, returning whole rows
1d steps up: instead of a single number, you need the whole AIMODEL row (every column) where some column hits its maximum or minimum. Do this for Parameters, Tokens, and FLOP. The solution workbook defines two LAMBDAs that handle this:
MAX_ROW_BY = LAMBDA(col,
FILTER(AIMODEL[], col = MAX(col))
)
MIN_ROW_BY = LAMBDA(col,
FILTER(AIMODEL[], col = MIN(col))
)
AIMODEL[] refers to every data row of AIMODEL (every column, every row) as one 2D array. MAX(col) and MIN(col) are the standard Excel aggregates over the column argument. The condition col = MAX(col) evaluates to TRUE only on the row where col attains its max. FILTER(AIMODEL[], col = MAX(col)) returns the matching row (or rows, if several tie).
Save both LAMBDAs. Then to get the largest-FLOP model's full row, you enter =MAX_ROW_BY(AIMODEL[FLOP]) and Excel spills the one-row block across nine columns (the eight base AIMODEL columns plus the FLOP per token column you added in 1a). Similarly =MAX_ROW_BY(AIMODEL[Parameters]), =MIN_ROW_BY(AIMODEL[Tokens]), and so on — six calls total for the six rows the exam wants.
In the solution workbook the output lives on the Exercise 1 sheet at rows 62–69 and reads things like
Tokens Smallest PaLI Multimodal 14/09/2022 Google 1.69E10 1.69E23 1.6E9 USA 1.06E14
Tokens Largest Cosmos-1.0-... Robotics 07/01/2025 NVIDIA 1.4E10 6.16E24 9E15 USA 683942400
FLOP Largest Grok-3 Multimodal 17/02/2025 xAI 0 4.64E26 0 USA #N/A
...
Note one detail about MIN over a column with blanks: Excel's MIN ignores blank cells entirely, so MIN(AIMODEL[Tokens]) returns the smallest real value (around 1.6E+09 for PaLI), not 0. The row that comes back from MIN_ROW_BY(AIMODEL[Tokens]) is therefore PaLI. You don't need a FILTER(col, col > 0) guard — MIN already handles blanks correctly. The thing to watch out for is the display of the resulting row: cells that were blank in the source render as 0 in the spilled output (e.g. the Parameters smallest row shows FLOP = 0 and Tokens = 0 because those entries were blank for that model). That's a rendering quirk of the spill, not a bug in your formula.
A cheaper way to get the marks: the solution workbook explicitly says "No points are deducted for using XLOOKUP and returning only one model." That means =XLOOKUP(MAX(AIMODEL[FLOP]), AIMODEL[FLOP], AIMODEL[Model]) is a fine simpler path that returns just the model name (not the whole row). If the FILTER-returns-a-row idea is shaky, this will get the marks.
Pause.In plain English: if three models tied for the largest FLOP value, what would MAX_ROW_BY(AIMODEL[FLOP]) return, and what would the XLOOKUP version return?
Answer: MAX_ROW_BY returns all three rows — FILTER keeps every TRUE match. XLOOKUP returns only the first match. In the solution workbook this matters for Parameters: the largest-parameters result includes both Llama 4 Behemoth (preview) and Step-2, because both have 2E+12 parameters. The other requested extremes are single-row results.
Worked example. This is a close parallel to the lectures in the exam. Open 04exercises.xlsx, launch Excel Labs, and read Solutions.LARGEBY and Solutions.SMALLBY side by side. They take a table, a column, and a count n, apply SORTBY followed by TAKE(n), and return the top-n (or bottom-n) rows by that column. With n = 1 you get one largest or smallest row. The exam version uses a slightly different path — FILTER(..., col = MAX(col)) which returns all rows tied at the extreme. That tie behavior is why the largest-parameters output spills two model rows in the solution workbook. Students who already solved the LARGEBY/SMALLBY exercise effectively have the technique already.
1e — smallest FLOP per token, guarding against missing values
1e asks for the model with the smallest FLOP / Tokens, returning the model name and the computed value. The hint says: "Some models may have missing FLOP or token values. You may need to check for valid numbers using ISNUMBER." That hint matters; the naive approach finds one of the spurious zeros you created in 1a.
The official version (stored as OUT_MIN_FLOP_PER_TOKEN — it's a LET, not a LAMBDA, because it doesn't need arguments):
OUT_MIN_FLOP_PER_TOKEN = LET(
flop, AIMODEL[FLOP],
tokens, AIMODEL[Tokens],
names, AIMODEL[Model],
ratio, IF(flop / tokens, flop / tokens, NA()),
minval, MIN(FILTER(ratio, ISNUMBER(ratio))),
model, XLOOKUP(minval, ratio, names),
HSTACK(model, minval)
)
Walk the LET locals top to bottom. flop, tokens, names are just convenient aliases for the three columns. ratio is the whole column of flop / tokens values, with a trick: IF(flop / tokens, flop / tokens, NA()) — if the ratio is a non-zero number, keep it; if it's exactly 0 (the spurious value Excel produces when flop is blank but tokens isn't), the IF treats 0 as falsy and returns NA() instead. The #DIV/0! errors that arise when tokens is 0 (or both are blank) propagate straight through the IF rather than being stamped, but that's fine; the next step filters them out by ISNUMBER anyway. The net effect is that every row in ratio is either a real number or some kind of "no answer" marker.
minval is the smallest non-missing ratio. FILTER(ratio, ISNUMBER(ratio)) drops any non-numeric entries; both the #N/A markers from the IF trick and any #DIV/0! errors that propagated through, and MIN over what survives is the true minimum. ISNUMBER(x) returns TRUE if x is a number (any kind, including zero) and FALSE if x is text, blank, a boolean, or any error (#N/A, #DIV/0!, #VALUE!, …).
XLOOKUP(lookup_value, lookup_array, return_array) is the modern replacement for VLOOKUP. It finds lookup_value inside lookup_array and returns the corresponding entry from return_array. Here, find minval inside ratio and return the matching model name.
The final line HSTACK(model, minval) places the name and the value side by side and spills to a one-row, two-column result.
Store this in Excel Labs as a named expression OUT_MIN_FLOP_PER_TOKEN, and type =OUT_MIN_FLOP_PER_TOKEN into a cell. You should see something like Cosmos-1.0-Diffusion-14B Video2World | 683942400 (that particular model has a huge token count and relatively modest FLOP).
One thing that trips people up: skipping the IF(flop/tokens, flop/tokens, NA()) step. Without it, MIN(ratio) gives a useless answer in one of two ways. If any row has 0 / 0 (both blank) or real / 0 (Tokens blank), the ratio is #DIV/0!, and MIN propagates that error rather than returning a real number; the whole formula fails. If you somehow avoid the error rows (say by inspecting the data and trusting that won't happen), the rows where FLOP is blank but Tokens is real still give a ratio of 0, and MIN happily returns 0 as the smallest. Either way the genuine minimum is never surfaced and XLOOKUP either errors or returns the wrong model. The IF trick is what protects you from both cases.
Simpler version: if you manually stamped #N/A in the FLOP per token column in 1a (which the solution author did), you don't need the IF(..., NA()) trick inside ratio — you can reference AIMODEL[FLOP per token] directly and still use MIN(FILTER(col, ISNUMBER(col))). That's cleaner and earns the same marks.
Lecture parallel. The missing-value idiom comes from Lecture 4, specifically Solutions.IFBLANK inside 04exercises.xlsx (Excel Labs): LAMBDA(value, value_if_blank, IF(ISBLANK(value), value_if_blank, value)). Same shape — test a value, swap it for a fallback, push the number through. The XLOOKUP call is Lecture 3 — 03lecturefilled.xlsx sheet "4 The LOOKUP functions".
1f — OVERENGINEERED_MODELS, two-condition filter
1f defines an "overengineered" model as one that uses more than 2e24 FLOP and has a FLOP per token greater than 1e12 (the exam statement had a typo that said 0.1; the solution accepts either threshold).
The official version:
OVERENGINEERED_MODELS = LET(
cleaned, FILTER(AIMODEL[],
ISNUMBER(AIMODEL[FLOP]) *
ISNUMBER(AIMODEL[Tokens]) *
(AIMODEL[Tokens] <> 0)),
flop, INDEX(cleaned, , XMATCH("FLOP", AIMODEL[#Headers])),
tokens, INDEX(cleaned, , XMATCH("Tokens", AIMODEL[#Headers])),
model, INDEX(cleaned, , XMATCH("Model", AIMODEL[#Headers])),
ratio, flop / tokens,
mask, (flop > 2E+24) * (ratio > 1000000000000),
FILTER(model, mask)
)
This is denser than anything so far, so let's walk it carefully.
cleaned is the table with all the bad rows stripped out. FILTER(AIMODEL[], boolean_mask) keeps only rows where the mask is TRUE. The mask here is a product of three boolean arrays: the FLOP is a number, the Tokens is a number, and the Tokens aren't zero. This is the first time in the report that boolean multiplication acts as AND, and it's the pattern you'll see again in 1g and in Exercise 3. The rule: TRUE * TRUE = 1, TRUE * FALSE = 0, FALSE * FALSE = 0. Multiplying boolean arrays element-by-element gives you the AND of the conditions as a 1/0 array, which FILTER treats as TRUE/FALSE.
INDEX(array, row_number, column_number) pulls out a slice. With row_number left blank (the ,, in the call), it returns the whole column. XMATCH("FLOP", AIMODEL[#Headers]) finds the position of the header "FLOP" in the AIMODEL headers row which is 6 (Model=1, Domain=2, Date=3, Organization=4, Parameters=5, FLOP=6, Tokens=7, Country=8). So INDEX(cleaned, , 6) is the sixth column of the cleaned table, which is the FLOP column restricted to cleaned rows. Same logic for tokens and model.
ratio is flop / tokens across those cleaned rows — no missing-value dance needed because we stripped them at the top.
mask combines the two exam conditions: FLOP above 2e24 and FLOP per token above 1e12. Boolean multiplication again.
FILTER(model, mask) is the final answer: the list of model names that pass both conditions.
Type =OVERENGINEERED_MODELS into a cell and it spills a single column of thirteen names, starting with Llama 4 Behemoth (preview), Llama 3.1-405B, GPT-4, GLM-4 (0116), Nemotron-4 340B, Doubao-pro, and continuing through models like PaLM 2, Falcon-180B, and PaLM (540B).
Easier path: if INDEX(..., XMATCH(...)) looks terrifying, replace it with a simpler version that wraps the division in IFERROR so blank-token rows don't blow up:
OVERENGINEERED_MODELS_SIMPLE = FILTER(
AIMODEL[Model],
(AIMODEL[FLOP] > 2E+24) *
(IFERROR(AIMODEL[FLOP] / AIMODEL[Tokens], 0) > 1E12)
)
IFERROR(AIMODEL[FLOP] / AIMODEL[Tokens], 0) swaps every per-row #DIV/0! for a 0, which then doesn't pass the > 1E12 threshold. The other condition, (AIMODEL[FLOP] > 2E+24), independently knocks out blank-FLOP rows (a blank acts as 0, which fails the > 2E+24 test). Between them the mask has no errors and FILTER spills a clean list of names that matches the official answer on all the real models.
Easy to get wrong: using AND instead of multiplication. AND(AIMODEL[FLOP] > 2E24, AIMODEL[FLOP]/AIMODEL[Tokens] > 1E12) collapses the two boolean arrays into a single boolean — it returns one TRUE or FALSE, not an array. FILTER then sees a single value and can't decide row-by-row. You must multiply arrays to get an array of per-row booleans.
From the lectures. The (a > x) * (b > y) AND idiom comes from Lecture 3's FILTER examples (03lecturefilled.xlsx, sheet "6 UNIQUE and FILTER"). For the XMATCH/INDEX combo, see the same lecture's sheet "5 INDEX and MATCH". For a cleaner worked parallel with boolean multiplication, look at Solutions.LARGEFILTERBY inside 04exercises.xlsx — it's FILTER driven by a boolean vector built the same way.
1g — FILTERED_MODELS driven by dropdowns
1g pulls everything together: give the user two dropdown lists — one for Country, one for Domain, and show every AIMODEL row that matches both selections.
The LAMBDA that does the filtering is short:
FILTERED_MODELS = LAMBDA(d, c,
FILTER(AIMODEL[],
(AIMODEL[Domain] = d) *
(AIMODEL[Country] = c))
)
It takes a domain d and a country c, and filters rows where both match. Same boolean-multiplication-for-AND pattern from 1f, now as a two-condition filter on two different columns.
The construction-of-the-dropdowns work is where this subpart is where this gets harder. The steps:
- On the
Exercise 1sheet, pick a row out of the way and compute the unique domains:=TRANSPOSE(UNIQUE(AIMODEL[Domain])).TRANSPOSEflips a column into a row so the dropdown list reads horizontally. Do the same for countries:=TRANSPOSE(UNIQUE(AIMODEL[Country])). The solution workbook puts these in rows 109 and 112. - Click the cell where you want the "Select domain" dropdown. Data tab → Data Validation → Allow: "List" → Source: select the range that holds the unique domains you just spilled. Tick "In-cell dropdown". Do the same for country in an adjacent cell, pointing at the countries row.
- Say the domain-dropdown cell is
B118and the country-dropdown cell isC118. Then the one formula you need is=FILTERED_MODELS(B118, C118), placed in a cell below a header row you type out yourself (Model | Domain | Date | Organization | ...). It spills all matching rows.
Change the dropdowns, the spill updates.
Try it first.If the user picks Video and United States of America, describe in plain English what rows should appear. Then open the solution workbook and check row 122 onwards.
Watch out: sourcing the dropdown from the raw AIMODEL[Domain] column directly. That shows every row's domain (including duplicates, Language hundreds of times), which is ugly and slow. Always funnel through UNIQUE.
Another easy slip: the dropdown cell's contents end up exact-matching the source list, including trailing spaces. The solution actually includes both "United Kingdom" and "United Kingdom " in its country list because the data has a stray trailing-space row — don't spend time debugging that, it's a data quirk.
If that's too fiddly: if dynamic dropdowns are too fiddly, get most of the marks with hardcoded validation lists. Data Validation → List → Source: "Language,Multimodal,Video,Audio,...". You lose the "read from the data" dynamism, but FILTERED_MODELS(B118, C118) still works.
Where this came from. Data validation and dropdowns are Lecture 3 (03lecturefilled.xlsx, sheet "1 Data tables"). The dynamic-dropdown mechanics — UNIQUE piped into a data-validation source range — aren't covered explicitly in any lecture file by name; the construction follows from combining the data-validation list source from Lecture 3 with the UNIQUE examples in 03lecturefilled.xlsx sheet "6 UNIQUE and FILTER". For the two-condition FILTER pattern, it's Lecture 3 again; see 03lecturefilled.xlsx sheet "6 UNIQUE and FILTER" and the FILTER-with-multiplication examples that run through the AIMODEL-adjacent Sales dataset.
If any of this still feels shaky
If the LAMBDA syntax itself is the block, go back to 04lecturefilled.xlsx sheet "7 LAMBDA" and read the five worked examples (TWOSUM, THREESUM, POSITIVESUM, COUNT_UNIQUE, DOTPRODUCT). If FILTER feels like magic, 03lecturefilled.xlsx sheet "6 UNIQUE and FILTER" has five pages of small worked examples on the Sales data. If MAP and HSTACK in 1c didn't click, sheet "The MAP function" of 05lecturefilled.xlsx is the direct reference. If the whole "filter AIMODEL by something, then pull the max row" shape seems unfamiliar, open 04exercises.xlsx, launch Excel Labs, and read Solutions.LARGEBY, Solutions.SMALLBY, Solutions.LARGEFILTERBY, Solutions.SMALLFILTERBY in order that is the same exam subpart rewritten for a different dataset, with the official pattern fully worked.
Glossary
- Advanced Formula Environment
The side panel that opens when you enable the Excel Labs add-in. It lets you write, save, and edit LAMBDAs by name. The saved LAMBDAs can be called from any cell in the workbook as if they were built-in functions.
- AIMODEL
The Excel table on the sheet of the same name in
exam_part1.xlsx. Rows are AI models; columns areModel,Domain,Date,Organization,Parameters,FLOP,Tokens,Country.AVERAGE(range)Returns the arithmetic mean of the numeric values in
range. Ignores text and blanks.- Boolean multiplication as AND
TRUE * TRUE = 1, everything else= 0. Multiplying two boolean arrays element-by-element gives a per-row AND. Used wheneverFILTERneeds to combine multiple conditions.- Dynamic array
A formula result that spills into more than one cell automatically. Enter it in one cell, Excel fills a rectangular block. The block shows a thin border; cells inside it can't be manually written to without breaking the spill.
- Excel Labs
A Microsoft Research add-in that provides the Advanced Formula Environment. Install it from Insert → Get Add-ins → Excel Labs.
- Excel table
A block of cells formally marked as a table (Insert → Table, or Ctrl+T). Its columns are referenced by name —
AIMODEL[FLOP]means the FLOP column.FILTER(array, keep_condition)Returns the rows (or entries) of
arraywherekeep_conditionisTRUE.keep_conditionmust be a boolean array the same height asarray. Central to every LAMBDA in this exercise.HSTACK(a, b, ...)Stacks arrays horizontally into one dynamic array. All inputs must have the same number of rows.
IFERROR(value, alt)Returns
valueunless it evaluates to any error (#N/A,#DIV/0!,#VALUE!, etc.), in which case it returnsalt. Used inCOMPUTE_INTENSITYto swap missing-data divisions forNA().INDEX(array, row, column)Returns the cell at
(row, column)ofarray. Leavingrowblank returns the whole column; leavingcolumnblank returns the whole row. Used inOVERENGINEERED_MODELStogether withXMATCHto pull a column by header name.ISNUMBER(x)Returns
TRUEifxis a number (including zero),FALSEif it's text, blank, a boolean, or any error. Used in 1e and 1f to filter out the spurious#N/Aand error-propagated cells before aggregating.- LAMBDA
A reusable function defined with
LAMBDA(arg1, arg2, ..., expression)and saved by name in Excel Labs or Name Manager. Once saved, it behaves like a built-in function —=MY_LAMBDA(x, y)from any cell.LET(name1, val1, ..., final)Binds local names to values, then evaluates
final. Doesn't change the result, only readability, and Excel evaluates each value once, so reusing the same expression multiple times viaLETis also a small performance win.MAP(array, LAMBDA)Applies the LAMBDA to every element of
arrayand returns an array of the results. Same shape and order as the input. Used inDOMAINFLOPSto callMEAN_FLOPon each unique domain.MAX(range)/MIN(range)Largest / smallest numeric value in
range. Both ignore blanks and text; both treat0as a real value.MINignoring blanks is what makesMIN_ROW_BY(AIMODEL[Tokens])returnPaLIrather than a zero-token row.NA()Produces the
#N/Aerror deliberately. No arguments. Used to mark "no answer available" so downstreamISNUMBERchecks can strip those rows out.- Name Manager
Excel's built-in tool (Formulas → Name Manager) for managing named ranges and named LAMBDAs. Acts as a fallback to Excel Labs; the same
LAMBDA(...)expression pasted into Name Manager works identically when called from a cell.- Named expression
A name assigned to any formula; a range reference, a constant, or a LAMBDA. Created in Excel Labs or Name Manager.
DOMAINFLOPSandOUT_MIN_FLOP_PER_TOKENare named expressions; you call them by writing=DOMAINFLOPSor=OUT_MIN_FLOP_PER_TOKENin a cell.ROUND(number, digits)Rounds
numbertodigitsdecimal places.ROUND(1.2345, 2)is1.23. Used inCOMPUTE_INTENSITYto keep the FLOP-per-token column at two decimals.- Spill range
The block of cells a dynamic-array formula fills. The top-left holds the formula; the rest is filled by Excel and can't be manually edited without breaking the spill. The whole block is referenced as
C5#if the formula is inC5.UNIQUE(range)Returns the distinct values of
rangeas a dynamic array, in first-seen order.UNIQUE(AIMODEL[Domain])gives the column of nine domains used inDOMAINFLOPS.XLOOKUP(find, in_array, return_array)Modern replacement for
VLOOKUP. Findsfindinsidein_arrayand returns the corresponding entry fromreturn_array. Returns the first match if there are ties. Used inOUT_MIN_FLOP_PER_TOKENto map the smallest ratio back to its model name.XMATCH(find, array)Returns the 1-based position of
findinsidearray. Used inOVERENGINEERED_MODELSwithINDEXto look up columns by header name rather than by hardcoded position.