Task 2 — Monthly averages, two different ways

What this task is really about

You have a long daily table (the BIKE table, 730-odd rows, one row per day across 2011 and 2012) and you need a short monthly table (24 rows, one per month) with two columns of averages: average temperature for the month, and average number of users for the month. The user-type dropdown from Task 1 controls which underlying users column gets averaged, through the Users column you built in Task 1, so Task 2 just has to believe that [Users] is already the right number.

The exam explicitly says either a formula solution or a pivot table counts, and the official solution demonstrates both. This report does the same. The pivot is the shorter route and the one most students will pick; the formula version uses LAMBDA and is a real exercise in reasoning about arrays. Either one gets full marks for Task 2. If you pick the pivot you'll need it again for Task 3 anyway, and you'll want the formula version if your markers are nudging you towards the harder route.

How to read the exam wording

"Per month-year pair in the dataset" means you want one row per distinct combination of year and month — 2011-01, 2011-02, all the way through 2012-12. That's 24 rows. "The selected userType" is the dropdown; because the Users column on the BIKE table already reads the dropdown, all you need to do is average that column and the temp column grouped by MonthYear.

A pivot table is Excel's grouping-and-aggregating tool. You tell it "put this field on the rows, put this field in the values area, aggregate it with this function", and it builds the grouped table for you. Aggregate means reducing many numbers to one — sum, average, count, min, max. Row field is the column whose unique values become the row labels. Data field (also called value field) is the column whose values get aggregated. If your data has 731 rows and you pivot on MonthYear as a row field with Users as a data field summarised by average, you get 24 rows, each one being the mean of all that month's users.

The note in the exam wording — "if you use a PivotTable you will have to refresh the PivotTable when you change the userType dropdown" — matters. Pivot tables cache their data; they don't recalculate automatically when a dependency changes. If the user picks a different dropdown value, the Users column on the BIKE table updates instantly, but the pivot keeps showing the old numbers until you refresh. The official script doesn't solve this — it would need a separate refresh script the user triggers. The formula version doesn't have this problem: Excel formulas are live.

The minimum-viable, click-through version

Manual pivot: select any cell inside the BIKE table → Insert → PivotTable → place it on Questions. Drag MonthYear into Rows, temp and Users both into Values. Click the little arrow next to each value field and choose Summarize by → Average. Turn off column grand totals. Done. That took about forty seconds once you know where the buttons are.

The script version is just those clicks written out as code. The formula version skips the pivot entirely and gets the same numbers a different way. Either way you end up with a 24-row table: date, average temp, average users.

Pivot-table path; the walkthrough

The official pivot code lives in Q2pTable. Four lines set up the pivot shape, one loop switches both value fields to averages, and one line hides the grand totals. Read all six lines slowly. ExcelScript method signatures are easy to mix up.

Start by placing the pivot:

TypeScript

let pTable = Qs.addPivotTable("Q2MonthlyAverages", bike, Qs.getCell(19, 1));

addPivotTable takes three things: a name ("Q2MonthlyAverages" — visible later in the pivot-tables pane), the source (the BIKE table, referenced by the bike variable set up in main), and the destination cell (getCell(19, 1) = row 20, column B, written B20 in Excel's 1-based address). After this line the workbook has an empty pivot sitting on Questions at B20. No rows, no values yet — you've only told Excel where it goes and what feeds it.

(One thing worth knowing: the official script in part2_solutions.txt writes the pivot starting at B20, but the solutions workbook on file actually shows it at B23; the workbook was adjusted after the script was last run. If your output lands at B20 you've followed the code correctly; do not try to "fix" the discrepancy.)

Next, pin the row field and the two data fields:

TypeScript

pTable.addRowHierarchy(pTable.getHierarchy("MonthYear"));
let temp = pTable.addDataHierarchy(pTable.getHierarchy("temp"));
let users = pTable.addDataHierarchy(pTable.getHierarchy("Users"));

A hierarchy, in pivot-table terminology, is what the API calls a column you can drop into the Rows/Columns/Values areas. getHierarchy("MonthYear") grabs the MonthYear column; addRowHierarchy puts it in the Rows area — that's what makes each unique month become a row label. addDataHierarchy puts a column in the Values area. After these three lines the pivot has one row per month and two value columns, both defaulting to SUM.

Default aggregation in Excel is SUM for numeric columns, and that's not what the exam asked for. Switch both to average:

TypeScript

for (let x of [temp, users]) {
    x.setSummarizeBy(ExcelScript.AggregationFunction.average);
}

This is a for…of loop — it just runs the body once per item in the array, binding x to each in turn. The body calls setSummarizeBy with a value from an enum. An enum is a fixed list of named constants; ExcelScript.AggregationFunction.average is the "average" member of the AggregationFunction enum, equivalent to choosing "Average" in the GUI's Summarize-by menu. Other members include sum, count, min, max.

The headers Excel writes for the two value columns will now be "Average of temp" and "Average of Users". That's the default label — you could rename them with temp.setName("Temperature") if you want, but the official solution doesn't bother.

Finally, kill the grand totals row:

TypeScript

pTable.getLayout().setShowColumnGrandTotals(false);
return pTable;

getLayout() reaches the layout object, which holds the settings you'd find under PivotTable → Design → Grand Totals in the GUI. The script turns off column grand totals — for this pivot that hides the bottom "Grand Total" row that would otherwise sum the monthly averages across the whole range, which is not a meaningful number.

After all this, B20:D44 on Questions holds a pivot that looks like:

Row Labels           Average of temp    Average of Users
01/01/2011           8.106              1231.9
02/01/2011           11.584             1722.0
...

Methods used: addPivotTable, getHierarchy, addRowHierarchy, addDataHierarchy, setSummarizeBy, getLayout, setShowColumnGrandTotals; see pivottable.pdf and workbook_worksheet.pdf.

The thing that will go wrong: if the dropdown on B12 changes, the Users column on the BIKE table recomputes immediately, but the pivot cache still holds the old numbers. The pivot will visually not update. Marker-friendly solution: add a button or a separate script that calls pTable.refresh() after the dropdown change. The official script doesn't include this because it's an architectural decision, not a correctness one.

Formula path; the LAMBDA story

The formula solution builds a single dynamic-array formula that produces the same 24-row, 3-column output without a pivot. It's packaged as a named LAMBDA called getQ2Averages, and a single cell calls it with =getQ2Averages(BIKE). When that formula runs it spills a block of values across F23:H47; a whole region filled from one formula.

A dynamic array is a formula that returns more than one value and fills a rectangle of cells starting from the cell it's written in. A spill range is that rectangle. If any of the target cells already contain data, the formula errors with #SPILL!. You can see if a cell is the origin of a spill because it shows the whole block highlighted with a blue outline when selected.

Before diving into the official formula, a pause for the underlying idea.

Pause.You have 24 months of data and the BIKE[MonthYear] column repeats each month-value about 30 times. How would you take the list of 24 unique months and, for each one, average the temp values that match it? Sketch it in words.

The plan is: get the 24 unique months; then for each month, filter the temp column down to just the days in that month and take the average; do the same for users; stack the three columns side-by-side; add a header row on top.

Now the formula. The official script writes it via addNamedItem, escaped with line-continuations. Cleaned up, it reads:

Excel

=LAMBDA(BIKE,
  LET(
    dates,   UNIQUE(BIKE[MonthYear]),
    temp,    BYROW(dates, LAMBDA(row, AVERAGE(FILTER(BIKE[temp],  BIKE[MonthYear] = row)))),
    user,    BYROW(dates, LAMBDA(row, AVERAGE(FILTER(BIKE[Users], BIKE[MonthYear] = row)))),
    headers, {"Date", "Temperature", "Users"},
    body,    HSTACK(dates, temp, user),
    VSTACK(headers, body)
  )
)

Seven Excel functions appear here, all of them covered in the Part 1 LAMBDA lectures: LAMBDA in Lecture 4 (04lecturefilled.xlsx sheet "7 LAMBDA"), LET in Lecture 6 (06lecturefilled.xlsx sheet "Local variables with LET"), UNIQUE and FILTER in Lecture 3 (03lecturefilled.xlsx sheet "6 UNIQUE and FILTER"), BYROW in Lecture 5 (05lecturefilled.xlsx sheet "BYROW, BYCOL, and MAKEARRAY"), and HSTACK/VSTACK in Lecture 5 sheet "HSTACK and VSTACK". Worth refreshing them all here before looking at how they fit together.

LAMBDA(args..., expression) wraps a formula into a reusable function. LAMBDA(x, x * 2) is a function that doubles its argument. You can't call a bare LAMBDA directly in a cell — you either wrap it in addNamedItem so it has a name, or you call it inline: =LAMBDA(x, x * 2)(5) evaluates to 10. Saving it as a named item (what the script does via Qs.addNamedItem("getQ2Averages", ...)) makes =getQ2Averages(BIKE) work like any built-in function.

LET(name1, value1, name2, value2, ..., final_expression) is the formula equivalent of declaring local variables. Each pair is a name followed by the value it should hold; the last argument is the expression that actually returns something. =LET(a, 3, b, 4, a + b) returns 7. LET exists so you can name intermediate steps instead of nesting seven things inside one formula.

UNIQUE(array) returns the distinct values from a one-column (or one-row) array, keeping order. UNIQUE({1; 2; 1; 3; 2}) returns {1; 2; 3}.

FILTER(array, boolean_array) returns only the rows of array where the corresponding entry in boolean_array is TRUE. FILTER({10;20;30;40}, {TRUE;FALSE;TRUE;FALSE}) returns {10; 30}. The boolean array is usually built from a comparison: BIKE[MonthYear] = row returns a column of TRUE/FALSE, one per row in the BIKE table.

BYROW(array, lambda) takes a 2D array and applies the lambda to each row, returning one value per row. If array has 24 rows, you get 24 values back. The lambda receives each row as its argument.

HSTACK(...) glues arrays horizontally (side by side). VSTACK(...) glues them vertically (top to bottom). Both care about matching dimensions — HSTACK needs the same number of rows, VSTACK needs the same number of columns.

With that vocabulary, the LET locals read as a short pipeline.

Walking the locals in dependency order, with a miniature example — imagine the BIKE table has just two months of data, January and February 2011, four rows per month:

dates is UNIQUE(BIKE[MonthYear]). In the mini-example that's a 2-row, 1-column array: {01/01/2011; 01/02/2011}. In the real data it's 24 rows.

temp is where it gets dense. BYROW(dates, LAMBDA(row, ...)) runs the lambda once per entry in dates. For the first entry (01/01/2011), row equals 01/01/2011, so FILTER(BIKE[temp], BIKE[MonthYear] = row) returns only the temperature values where MonthYear equals January 2011 — all 31 of them. AVERAGE(...) collapses those 31 values to one. BYROW then does the same for February, and so on. temp ends up as a 24-row, 1-column array of per-month average temperatures.

user is structurally identical, just swapped for the Users column. 24 rows, 1 column, one average per month.

headers is {"Date", "Temperature", "Users"}. Curly-brace syntax with commas makes a horizontal 1×3 literal array. Semicolons would make it vertical. This is the header row the final output wants.

body is HSTACK(dates, temp, user). Three 24-row, 1-column arrays stacked side by side produce a single 24-row, 3-column array; the data body, without headers.

The final return is VSTACK(headers, body); a 1×3 array stacked on top of a 24×3 array gives a 25×3 result. That's what spills into F23:H47 when the cell formula =getQ2Averages(BIKE) is run.

Methods used: addNamedItem (worksheet-scoped, because Qs.addNamedItem rather than workbook.addNamedItem), setFormula, getAbsoluteResizedRange, getSurroundingRegion, getRowCount, setNumberFormat; see workbook_worksheet.pdf and range.pdf.

The script wraps the formula in a named item first, then writes the call into a single cell, then reformats the date column:

TypeScript

Qs.getCell(19, 5).setFormula("=getQ2Averages(BIKE)");
let cell = Qs.getCell(20, 5);
cell.getAbsoluteResizedRange(cell.getSurroundingRegion().getRowCount() - 1, 1)
    .setNumberFormat("mm/yyyy");

getCell(19, 5) is F20 — where the script writes the formula. getCell(20, 5) is F21, the first data cell under the header. (Same script-vs-workbook caveat as the pivot version: the script places the formula at F20, but the solutions workbook on file actually shows the spill starting at F23; the workbook was adjusted after the script was last run. If your output lands at F20:H44 you've followed the code correctly.) getSurroundingRegion() returns the whole contiguous block that includes that cell (so, the entire spill). getRowCount() - 1 subtracts the header row. getAbsoluteResizedRange(rows, 1) then reshapes the single cell back into a 1-column range of the right height (covering only the date cells under the header). setNumberFormat("mm/yyyy") gives them the same display format as the MonthYear column on the BIKE table.

The thing that goes wrong: if any cell in the spill destination isn't empty, the formula fails with #SPILL!. If you're following the script exactly, clear F20:H44 before running it. If you're matching the adjusted solution workbook layout, clear F23:H47. This includes any whitespace or leftover content you can't easily see.

Which one to pick

If you want the shortest correct answer, the pivot is six lines, uses concepts the lectures covered, and is hard to get wrong. Its one weakness — needing to be refreshed when the dropdown changes — is a known trade-off and easy to patch with a refresh button if you want to be thorough.

If you want the more instructive answer, the LAMBDA version teaches you something about array formulas that will show up again if you ever do real data work in Excel. It's also self-refreshing — change the dropdown and the averages update immediately, because the formula reads BIKE[Users] which is itself a formula.

Practically, a student with limited time should pick the pivot. A student with the LAMBDA version working can submit both, exactly as the official solution does; the top text on the Questions sheet in the solutions workbook reads "ONLY ONE APPROACH IS REQUIRED — HERE I DEMONSTRATE BOTH A FORMULA SOLUTION AND A PIVOTTABLE SOLUTION", and neither is penalised.

Try it first.Build the pivot via the GUI before you script it. Then read the six script lines one by one and match each to a thing you clicked. The script suddenly feels less magical.

An alternative worth mentioning

A third path, not taken by the official solution, is a helper-column approach: on the BIKE table, add columns for per-row month averages using AVERAGEIFS, then take UNIQUE(MonthYear) in the destination and look up the values. It works but it bloats the BIKE table with two calculation columns and re-does the same average for every one of the 731 rows. The LAMBDA approach is tidier — it keeps the calculation in one place and only computes each monthly average once.

If this still feels shaky

01PivotTables.pdf walks through the UI side of pivot tables; the drag-and-drop version of what addRowHierarchy and addDataHierarchy are scripting. pivottable.pdf is the method cheatsheet. For the LAMBDA side, the Part 1 lectures cover every building block: 04lecturefilled.xlsx for LAMBDA, 06lecturefilled.xlsx for LET, 03lecturefilled.xlsx for UNIQUE/FILTER, and 05lecturefilled.xlsx for BYROW and HSTACK/VSTACK. range.pdf explains getSurroundingRegion and getAbsoluteResizedRange, which are the two methods in the formula version that most students skim past without noticing.

Glossary

Aggregation: a function that reduces many numbers to one: SUM, AVERAGE, COUNT, MIN, MAX. setSummarizeBy(ExcelScript.AggregationFunction.average) sets which aggregation a pivot's data field uses.
Argument: a value passed to a function when you call it. In AVERAGE(BIKE[temp]), BIKE[temp] is the argument.
Array formula: a formula that works on (or returns) a whole block of values rather than a single number. Modern Excel handles these through dynamic arrays.
Arrow function: (not used in the official code, but likely seen elsewhere); a compact way to write a function in TypeScript, like (x) => x * 2.
BYROW: Excel function that applies a lambda to each row of an array, returning one value per row.
Column field: a pivot field dropped into the Columns area. Not used in this exam; the pivot here has a row field but no column field.
Data field / value field: a pivot field dropped into the Values area; its values get aggregated.
Dynamic array: a formula that returns multiple values and fills a rectangle starting at the formula cell.
Enum: a named set of constants. ExcelScript.AggregationFunction.average, ExcelScript.ChartType.line are enum members.
FILTER: Excel function that returns the rows of an array where a boolean condition is true.
Function (TypeScript): a reusable block of code that takes arguments and returns a value. main, Q1dataValidation, etc., are functions.
Grand total: a summary row or column a pivot adds by default to show overall totals. Turned off here because averaging the monthly averages doesn't answer a real question.
HSTACK: Excel function that concatenates arrays side by side.
LAMBDA: Excel function that wraps a formula into a reusable, callable thing. Saved with a name via addNamedItem so you can call it like a built-in function.
LET: Excel function that introduces named intermediate values inside a formula, like local variables.
Named LAMBDA: a LAMBDA stored as a named item so formulas can call it by name. getQ2Averages is one.
Pivot table: Excel's grouping-and-aggregating tool. Produces a summary table from a source table.
Return value: what a function gives back when called. Q2pTable returns the pivot; getQ2Averages(BIKE) returns a 25×3 array.
Row field: a pivot field dropped into the Rows area; its unique values become the row labels.
Spill / spill range: the rectangle of cells a dynamic-array formula fills. If any cell in the range isn't empty, the formula errors with #SPILL!.
UNIQUE: Excel function that removes duplicates from a list, keeping the first occurrence of each value.
VSTACK: Excel function that concatenates arrays top to bottom.