Today it’s almost impossible to find a company that hasn’t embraced data in some form. “We need more data” is a statement perfectly vague enough to allow no opposition.
But what to do with data?
Data can tell you what happened, but it can never tell you why it happened.
Moreover, even when the why is unlocked, a creative response is more often than not still required to move beyond it.
Long story short, data alone is never the answer.
Yet everyone loves to talk about data. Fewer, though, like to talk about measurement. Fewer still about what to measure, or even what the data is trying to uncover.
Maybe part of the challenge is that most of us (me included) don’t really have a shared definition, or understanding, of what is meant by ‘data’.
Most people concerned with running a profitable business just know they want it, and probably need it.
The truth is data is useful. It can help answer a variety of questions:
- How many employees do we have?
- What product do most people buy on our web store?
- How many songs have the Wellington International Ukulele Orchestra released?
And data can get nuanced. Consider an app. Here are a few things you could look at:
- App store views per day
- Conversion rate
- Downloads per day
- Daily users
- Session durations
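To make a couple of those numbers concrete, here is a minimal Python sketch of how conversion rate and average session length might be derived from raw daily counts. The class, its field names, and the figures are all made up for illustration. The definition of conversion rate used here (downloads divided by app store views) is an assumption; your analytics tool may define it differently.

```python
from dataclasses import dataclass


@dataclass
class DailyAppStats:
    """One day of hypothetical app numbers (all field names are illustrative)."""
    store_views: int
    downloads: int
    daily_users: int
    total_session_seconds: float
    sessions: int


def conversion_rate(stats: DailyAppStats) -> float:
    """Share of app store views that turned into downloads (assumed definition)."""
    if stats.store_views == 0:
        return 0.0
    return stats.downloads / stats.store_views


def average_session_seconds(stats: DailyAppStats) -> float:
    """Mean session length in seconds for the day."""
    if stats.sessions == 0:
        return 0.0
    return stats.total_session_seconds / stats.sessions


# Made-up numbers, purely for illustration.
example_day = DailyAppStats(
    store_views=2000,
    downloads=150,
    daily_users=480,
    total_session_seconds=86400.0,
    sessions=600,
)

print(f"Conversion rate: {conversion_rate(example_day):.1%}")
print(f"Average session: {average_session_seconds(example_day):.0f} seconds")
```

Even with these numbers in hand, nothing in the output explains why they are what they are.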
You know as well as I do when it comes to data, there is no shortage of it available to us.
However, no data on its own will tell you what you need to change (or do) to get the outcome you might be looking for.
But maybe we should back up a bit – what is data?
It boils down to two things:
- Quantitative (measuring things)
- Qualitative (categorising the qualities of things)
In her 2019 talk at The Conference in Malmö, Andrea Jones-Rooy, data science professor at New York University, outlined three main ways to collect data:
- Automate it (e.g. steps on your phone)
- Ask people (e.g. surveys)
- Conduct randomised tests (e.g. A/B tests)
Automation is good because, let’s be honest, there’s no human in the mix. Less error.
Asking people is good, but often you get what people say they would do, not what they actually would do.
Randomised tests are great, and they can add speed and objectivity to decision making.
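For the curious, a randomised test often boils down to comparing two rates. The sketch below uses a two-proportion z-test, one common way (not the only one) to check whether the gap between variant A and variant B is bigger than chance alone would explain. The function name and the sample numbers are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are the number of conversions in each variant,
    n_a / n_b the number of people who saw each variant.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("Each variant needs at least one participant")

    p_a = conv_a / n_a
    p_b = conv_b / n_b

    # Pooled rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variation in outcomes, so the test is undefined")

    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: 120 of 1000 converted on A, 150 of 1000 on B.
z, p = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference is unlikely to be noise. It still says nothing about why B did better.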
When it comes to digital product development, we could also add observing people, which is good, but is often performed under conditions that are not completely natural to your average wily human being.
As you probably already know… recommended best practice is to use a combination. And even when using all of the above, there is still work to be done. Data is not the end. It’s not even the beginning. It’s a messy place somewhere in the middle.
The key to it all is to think of data not as an answer, but as a question.
So anytime we see data, we should ask ourselves: “OK, so why is that?” and “What’s driving this?”
If we don’t, then we risk jumping to conclusions, and in turn jumping to potentially ineffective, biased, or even detrimental solutions.
Andrea Jones-Rooy recommends reading up on the scientific method. Behold:
- Isolate your Question
- Test hypothesis
- Draw conclusions
- Report findings
So what does this all mean? It means being curious enough to isolate the question is key. What is it that you want to find out? And why? Now turn that into a question.
It’s also worth pointing out that data is only part of it. On either side, you need to spend time figuring out what to measure, how to measure, how to validate, and how to interpret.
It also means getting multiple inputs, and considering multiple consequences. And you can still be wildly wrong with your conclusions.
This is mainly because data is not truth. It’s a collection of selected facts that we’ve decided to pay attention to, assembled in a way that hopefully sheds light on the question.
Meaning, the inputs to data (even big data) are only ever a sliver of reality. In this way data is inherently flawed, but that doesn’t mean it’s useless. It just means it’s up to us to pinpoint what the data is pointing at.
Andrea Jones-Rooy believes everyone can be good at this:
“We can all be data scientists, there is nothing really stopping us. There’s a lot of resources out there to code better, but what data to even look for in the first place, is an equally important aspect, that doesn’t get as much attention.”
Baseball Hall of Famer Willie Keeler famously said: “Hit ’em where they ain’t”. At a glance this may not sound profound (or even relevant) but it is quite intuitive.
The ‘data’ would only tell you where the players are. What you need to do from that data is to hit ’em where they ain’t.
Hannah Fry, in her New Yorker piece “What Data Can’t Do”, gives many superb examples of data and humans at play, and she warns:
“Numbers can be at their most dangerous when they are used to control things rather than to understand them.”
Interestingly, that’s usually what most people are trying to do. Control the data. But we can’t control it. We can only use it to attempt to understand something, or as input towards a creative solution that hopefully gives us the outcome we want.
Hannah Fry, in that same piece, goes on to show how very complex prediction models are often only slightly better than very simple ones. In short: more data isn’t always better.
She tells the story of Jakob Nielsen, who analysed 83 of his own product studies and found that 85 percent of the problems were observed after just five people. Just. Five. People.
Jake Knapp and team arrived at the same conclusion, and implemented it within the Design Sprint approach.
“Five is the magic number. After five customer interviews, big patterns will emerge.”
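The arithmetic behind the five-person figure can be sketched with a simple model that Nielsen published with Tom Landauer: if each tester uncovers a fixed share L of the problems, the share found after n testers is 1 - (1 - L)^n. The value L = 0.31 below is the average they reported, not a number from this article, and the function name is hypothetical. Treat it as a rough model rather than a law.

```python
def share_of_problems_found(testers: int, per_tester_rate: float = 0.31) -> float:
    """Estimated share of usability problems found after a number of testers.

    Uses the model 1 - (1 - L)^n, where L (per_tester_rate) is the share
    of problems a single tester is assumed to uncover.
    """
    if testers < 0:
        raise ValueError("testers must be zero or more")
    if not 0 <= per_tester_rate <= 1:
        raise ValueError("per_tester_rate must be between 0 and 1")
    return 1 - (1 - per_tester_rate) ** testers


# Show how quickly the curve flattens out.
for n in range(1, 11):
    print(f"{n:2d} testers: {share_of_problems_found(n):.0%} of problems found")
```

Under these assumptions, five testers land at roughly 84 to 85 percent, and each additional tester adds less than the one before.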
Even the great Bill Benter couldn’t control the data. He did, however, build an algorithm that earned him millions of dollars once he isolated the right questions, inputs, and even operational logistics over the course of 10+ years.
Today his document “Computer-Based Horse Race Handicapping and Wagering Systems: A Report” has become a playbook for budding computer science gamblers around the world.
But if we are honest with ourselves, all we really want to know is: why did that work, or why didn’t that work?
But the data by itself won’t tell us that.
It’s also easy to spin data the wrong way. “Repeat customers have a 30% chance of converting” is identical to “Repeat customers have a 70% chance of not converting”. One sounds optimistic, the other less so. Neither illuminates why.
This is where “Ok, so why is that?” comes in. Because that will lead to a more interesting place, towards what might be behind those numbers. (It might even lead to more data.)
Right now it feels like we are at a peak data moment. Especially with large language models and AI running rampant.
Data privacy and GDPR changed things a lot, but still, it’s never been easier to collect data with the wide variety of tools and methods available.
Even so, we need to go past being “data-driven” and elevate the things on either side of it: what we are trying to find out, what we should do with what we now know, and especially what the consequences might be.
From a digital product design point of view, I’ll leave you with my attempt to rework the scientific method into this unscientific method.
Feel free to improve it:
- What is your question?
- How can you interrogate that question? (Data)
- What is the data pointing at?
- What action should you take?
- What might be the consequences of that action?
- Inject creativity to do/make a thing
- (Actually doing/making the thing)
- Release the thing into the wild (test)
- Measure what happened (more data)
- In the data: if something is true, what else might be true?
- Ask more questions
Bonus operational list:
- What are you going to measure?
- How will you collect the data?
- How will you validate what you collect?
- How will you interpret it all?