Inclusive Research at Pace: How AI Helped Us Listen to the Users Who Are Hardest to Reach
Most services work well for the majority of people – the users who meet the criteria, have the right documents and understand the process. Others hit friction points: eligibility questions with no room for nuance, payment steps that fail without explanation and digital journeys that stop without telling the user why or where to go next. These are the ‘unhappy paths’ through a service that was never designed with them in mind.
Unhappy paths are where many services fail. They generate failure demand, complaints and appeals, excluding the people who most need access. And they are almost always under-researched because the users who experience them are harder to recruit, harder to schedule and take longer to understand.
Teams, especially those working in Government, understand users through ongoing research. In practice, time and cost constraints mean most teams research with small numbers of users on a regular basis, which is rational. It is also how marginalised users end up less visible in the evidence base, and how the difficult edges of a service avoid getting designed properly.
The challenge
Working with a government agency on their digital service, we encountered this problem directly. The service processes hundreds of thousands of applications a year, and users need them to meet a legal requirement.
The service had largely been designed around the eligible applicant. Our understanding of unhappy paths was weaker: applications that failed, applicants found to be ineligible and edge cases involving complex personal circumstances. These were the users we needed to understand better. And they were exactly the users who are hardest to recruit.
This organisation was able to assemble a larger-than-usual sample. We researched with over 40 participants, many of whom had declared disabilities, lower literacy levels and complex circumstances including criminal history and right to work ineligibility. The research provided rich, varied data, which was exactly what was needed to understand where the service might break down.
We had a two-week window to analyse this data and generate a suite of GDS-compliant artefacts.
The decision to use AI
Faced with 40 interview transcripts and a two-week deadline, the options were to narrow the sample, simplify the analysis or find a way to do it properly at pace.
Narrowing the sample would have meant losing the very evidence the research was built to capture. Simplifying the analysis would have meant flattening nuance and potentially dismissing difficult accounts as outliers. Neither was going to achieve the outcome we needed.
We brought in AI tools to handle the analytical heavy lifting, which is a well-evidenced use of the technology. A 2024 pilot study on the value of generative AI for qualitative research (Pattyn, 2024)(1) found that it completed tasks with four times less effort and fifteen times faster throughput than human coders. So our question was not whether AI could assist qualitative analysis, but whether it could do so without losing rigour or integrity.
We found that, when the tools were used with discipline, the answer was yes.
We designed multi-layered prompts around assessment expectations from the beginning. They functioned as both an analytical method and a form of design documentation. Themes were surfaced systematically across all 40 transcripts and stress-tested across rounds of research, and outputs were assessment-ready by design.
Researcher accountability was non-negotiable throughout. AI outputs were validated against interview notes and human-led synthesis, then workshopped with the organisation. Researchers who had been present in sessions verified that generated themes reflected what they had actually heard. Accelerating analysis was only made safe by that human check.
What the analysis revealed
Eligibility was a complex problem in the service. A significant proportion of applicants each year invest in training and pay fees even though their circumstances mean they will face refusal. Criminal history, overseas checks and right to work status are not simple yes or no questions for many applicants.
An early assumption was that a separate eligibility checker, aligning with established government patterns, would resolve this, but our research started to show that it wouldn’t. Many applicants could not reliably assess their own eligibility, particularly where the determining factors were complex or hard to accept. A standalone checker, however well designed, would not catch them.
Because analysis was moving fast enough to feed directly into design exploration, the team did not stop at that finding. Structured workshops examined alternatives against policy requirements, business processes, and user needs. Ideas were tested, found insufficient, and replaced. The process identified embedding adaptive eligibility checks within the application itself as a more promising direction, supporting better decisions at the points that mattered most. Reaching that conclusion took time and space to explore, which AI-assisted analysis had created, giving the team room to deliver more robust findings.
Two outcomes worth naming
The first is about inclusion. Government digital teams aim to include marginalised or excluded users in research, but in practice this evidence often comes from a very small number of hard-to-recruit sessions. Here, a larger sample made it possible to see where patterns genuinely held and where they broke down. Users who challenged assumptions were able to shape the direction of the service.
The second is about researcher wellbeing. Much of this research surfaced trauma, distress, and open hostility. Repeated re-immersion in that material is a real cost to researchers. AI-assisted analysis reduced that exposure while still ensuring those realities informed personas, journeys, and design decisions, making inclusive research more sustainable.
What followed at assessment
At assessment, the panel reflected not only on the volume of evidence but on the quality of thinking it demonstrated, particularly where complexity had been worked through in the open. Insights remained traceable to evidence and the difficult cases were visible and worked through.
We found that AI, used well, did not compress research into something less. It made it possible to go broader without losing depth. In this case, it gave the team the capacity to understand users who had previously been less well understood, and to design a service more likely to work for all of them.
The AI tools didn’t do the thinking, but they did make more thinking possible.
(1) Pattyn, F. (2024). The value of generative AI for qualitative research: A pilot study. Journal of Data Science and Intelligent Systems, 3(3), 184–191.