Twitter used to be a great place to make connections. Today I dove into my Twitter history to find a useful bit of information and it seems like a good idea to save it here so I can find it more easily in the future. After struggling with Google Sheets back in 2021, I wrote:
Note to self: you want VLOOKUP(). HLOOKUP() is hardly ever what you need. Also, your data is never sorted. Always add “false” as the last parameter.
I used to note that,
But eventually learned
The INDEX(MATCH( )) trick.
But note that MATCH(), too,
Has a dumb “best match” default
For the same gotcha.
Yes, he writes in haiku. (His Twitter handle is HaikusByIsaac.)
So my initial post was just an observation that when setting up a spreadsheets, data is usually arranged with named columns and data added as new rows. It matches the way we usually think about the relational database model invented by Edgar “Ted” Codd. So if you want to join two tables, you’ll want to find a foreign key relationship and match those values across the tables. So is that HLOOKUP or VLOOKUP? 100% of the time I guess wrong because the H stands for “Horizontal” and I have a mental model of looking across from one table to the other. Apparently, I need the “Vertical” version because I should think about scanning down the column until I find the value I’m looking for.
My incorrect mental model also explains the problem I have with the second parameter, which is called
is_sorted. The idea apparently is that you can look up “approximate match” if you have sorted the index column. If VLOOKUP can’t find an exact match, it’ll just return the value that’s the next smallest value it finds. And because it’s lazy, it just assumes the index is sorted from low to high and immediately returns the previous value if it hits an index value larger than the key it’s looking for.
Thankfully you can change the behavior by setting
false. But guess what value is the default. Right. It defaults to the behavior no rational person would want. Madness.
Isaac Moses’ suggestion solves several of my problems with the seemingly basic task of joining two tables. It uses two separate but related functions: MATCH and INDEX. While it’s possible to use these functions for a table that’s been pivoted to have named rows, it’s most natural to use traditional named column setup. It’s also a lot easier to troubleshoot bugs since you can first test MATCH and then implement the INDEX.
To add a column from another table, start typing:
Then select the cell on the current row that you want to find in the other table:
Next select the matching column in the other table:
=match(A2, 'some other tab'!$A:$A,
Finally override the dumb default:
=match(A2, 'some other tab'!$A:$A,0)
As Isaac mentioned in the second haiku, MATCH has an option to assume the index is sorted and that’s the default (1). It also helpfully includes and option for the column to be sorted in descending order (-1). Both of those settings break everything so you have remember to ask for an exact match (0).
Now that you’ve done the MATCH step, you can verify that it returns the row that matches the key with the index. If it looks right, you are ready to add the INDEX function and the column I want to pull values from:
=index('some other tab'!C:C, match(A2, 'some other tab'!$A:$A,0))
That’s pretty much it. I don’t use it very often, so I always find myself needing to remember how MATCH works and then using INDEX. But that’s a lot easier than struggling with H/VLOOKUP.
One more thing: people struggle with SQL join syntax, but I find it much easier. The equivalent SQL would be:
SELECT C FROM this_tab JOIN some_other_tab ON this_tab.A = some_other_tab.A;
What? That’s not easier for you? Weird.