Bias and fairness: when fluent is not fair
AI systems learn the patterns — and the prejudices — in their training data, so a confident, fluent model can still produce systematically unfair outcomes, especially in the consequential decisions regulators now scrutinize.
An AI system learns by finding patterns in the data it is trained on. That is its strength and, for fairness, its central liability: the patterns it absorbs include the prejudices, gaps, and historical inequities baked into that data. A model trained on a decade of hiring decisions learns who was hired, including any quiet preference for one kind of candidate; a model trained on past lending decisions learns who was approved, including the legacy of who was historically shut out. The system has no notion of fair or unfair. It optimizes for reproducing the patterns it saw, and if those patterns were skewed, the output is skewed too. The danger for an executive is that the skew arrives dressed in the language of objectivity. A confident, fluent, fast model that produces a number or a ranking feels neutral, and that feeling is precisely what makes systematic unfairness hard to spot until it has already affected real people at scale.
It helps to be concrete about how bias enters, because the mechanisms are mundane rather than malicious. The most common source is the training data itself: if a group is underrepresented, mislabeled, or appears mostly in negative contexts, the model performs worse for that group. A landmark 2018 study, Gender Shades, tested commercial facial-analysis systems and found error rates under one percent for lighter-skinned men but as high as roughly 35 percent for darker-skinned women, a gap widely attributed to datasets that were overwhelmingly lighter-skinned and male. A second, subtler source is the choice of what the model is asked to predict. In a widely cited 2019 study published in Science, a healthcare algorithm used on millions of US patients predicted future health costs as a stand-in for health need. Because less money had historically been spent on Black patients with the same conditions, the algorithm systematically underestimated their needs, and correcting the flaw would have raised the share of Black patients flagged for extra care from about 18 percent to about 47 percent. No one wrote a biased rule. The proxy was convenient, and convenient proxies inherit the inequities embedded in the thing they stand in for.
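The proxy problem can be shown with a toy example. The sketch below is a hypothetical illustration, not the algorithm from the Science study: the `Patient` class, the two groups, and every dollar figure are invented. It assumes two groups with identical health need, where 35 percent less has historically been spent on group B, and compares who gets flagged when patients are ranked by cost versus by need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    group: str   # hypothetical group label, "A" or "B"
    need: int    # true health need (invented scale, 1 to 10)
    cost: float  # historical spending, the proxy a model would learn


def make_patients():
    """Build two groups with identical need but unequal historical spending."""
    patients = []
    for need in range(1, 11):
        patients.append(Patient("A", need, 1000.0 * need))
        # Assumption for illustration: 35% less was spent on group B
        # at every level of need.
        patients.append(Patient("B", need, 650.0 * need))
    return patients


def flag_top(patients, score, k):
    """Flag the k patients with the highest score for extra care."""
    return sorted(patients, key=score, reverse=True)[:k]


def share_of_group(flagged, group):
    """Fraction of the flagged patients who belong to the given group."""
    return sum(p.group == group for p in flagged) / len(flagged)


patients = make_patients()
flagged_by_cost = flag_top(patients, score=lambda p: p.cost, k=6)
flagged_by_need = flag_top(patients, score=lambda p: p.need, k=6)

print("Group B share when ranking by cost:", share_of_group(flagged_by_cost, "B"))
print("Group B share when ranking by need:", share_of_group(flagged_by_need, "B"))
```

With these invented numbers, ranking by cost flags one group B patient out of six, while ranking by need flags three out of six. Nothing in the code mentions the group when ranking; the skew comes entirely from the choice of target.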
A genuinely uncomfortable finding from the research is that fairness cannot be fully optimized, because common definitions of fairness can contradict one another. The clearest example came from the debate over COMPAS, a US criminal-justice risk-scoring tool. In 2016 the newsroom ProPublica reported that, among people who were not rearrested, Black defendants were nearly twice as likely as white defendants to have been wrongly flagged as high-risk. The vendor responded that its scores were equally accurate across races: a given score meant the same probability of reoffending regardless of group. Both claims were true at once. Researchers then proved mathematically that whenever two groups have different underlying base rates, no imperfect risk score can mean the same thing for both groups and also produce equal false-positive and false-negative rates for both. This is not a flaw waiting for a clever engineer to patch; it is a property of the math. The practical consequence is that fairness is not a setting to switch on but a deliberate choice about which kind of error to minimize and for whom, and that choice is a leadership decision, not a technical one.
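The arithmetic behind that result is short. The sketch below uses the identity from Chouldechova (2017), which ties a tool's false-positive rate to the group's base rate, the tool's positive predictive value (how often a high-risk flag is correct), and its false-negative rate. The function name `implied_fpr` and all numbers are assumptions chosen for illustration; they are not COMPAS figures.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False-positive rate forced by the other three quantities.

    Identity from Chouldechova (2017):
        FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)
    where p is the group's base rate of the outcome.
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical tool: a high-risk flag is right 60% of the time for
# both groups, and it misses 30% of true positives in both groups.
ppv, fnr = 0.6, 0.3

# Hypothetical base rates: the outcome occurs in 30% of one group
# and 50% of the other.
fpr_lower_base_rate = implied_fpr(0.3, ppv, fnr)
fpr_higher_base_rate = implied_fpr(0.5, ppv, fnr)

print(f"FPR, group with 30% base rate: {fpr_lower_base_rate:.3f}")
print(f"FPR, group with 50% base rate: {fpr_higher_base_rate:.3f}")
```

Under these assumptions the false-positive rate comes out near 0.20 for the lower-base-rate group and near 0.47 for the higher one. Holding a flag's meaning and the miss rate equal across groups forces the wrongly-flagged rate apart, which is the shape of the ProPublica and vendor disagreement.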
For your organization, this is increasingly a matter of law as well as ethics. AI systems now sit inside the consequential decisions regulators watch most closely: hiring, lending, insurance, housing, healthcare, and benefits. Long-standing anti-discrimination law generally does not care whether a human or an algorithm produced a disparate outcome, and it often does not require intent to harm. In the United States, a discrimination suit over an AI hiring-screening tool, Mobley v. Workday, was permitted in 2025 to proceed as a nationwide collective action covering applicants aged forty and older, a signal that both the deploying employer and the AI vendor can face exposure even when no one intended to discriminate. In the European Union, the AI Act treats AI used in employment, credit, and access to essential services as high-risk and requires that training and test data be examined for bias that could produce discriminatory outputs. The reputational exposure compounds the legal exposure: a biased outcome in a sensitive domain is the kind of story that travels.
None of this means AI is inevitably unfair, and it is worth resisting both the hype that algorithms are neutral and the doom that they are irredeemably prejudiced. A well-instrumented AI system can be more consistent and more auditable than the scattered human judgments it replaces, because you can test it, measure its outcomes across groups, and document the trade-offs in a way that is rarely possible with a room of individual decision-makers. The same Gender Shades study that exposed the disparities also prompted the named vendors to retrain their systems, and a follow-up audit found that all three substantially narrowed the gaps within about seven months, which shows that measurement creates pressure and pressure produces fixes. The difference between a fair and an unfair deployment is rarely the model and almost always the surrounding discipline: whether anyone checked the training data for representativeness, whether outcomes are monitored across groups over time, whether a chosen proxy actually measures what matters, and whether a human remains accountable for consequential decisions.
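Measuring outcomes across groups does not require special tooling. The following is a minimal sketch, assuming decision records are available as `(group, predicted_positive, actually_positive)` tuples; the function name `rates_by_group`, the group labels, and the sample records are all hypothetical. It reports, per group, the share selected and the two error rates discussed above.

```python
from collections import defaultdict


def rates_by_group(records):
    """Summarize selection and error rates for each group.

    records: iterable of (group, predicted_positive, actually_positive).
    A rate is None when its denominator is zero for that group.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        if predicted and actual:
            counts[group]["tp"] += 1
        elif predicted and not actual:
            counts[group]["fp"] += 1
        elif actual:
            counts[group]["fn"] += 1
        else:
            counts[group]["tn"] += 1

    report = {}
    for group, c in counts.items():
        total = sum(c.values())
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        report[group] = {
            "selection_rate": (c["tp"] + c["fp"]) / total,
            "false_positive_rate": c["fp"] / negatives if negatives else None,
            "false_negative_rate": c["fn"] / positives if positives else None,
        }
    return report


# Invented records for two hypothetical groups.
sample = [
    ("X", True, True), ("X", True, False), ("X", False, False), ("X", False, True),
    ("Y", True, True), ("Y", False, False), ("Y", False, False), ("Y", False, True),
]

for group, rates in sorted(rates_by_group(sample).items()):
    print(group, rates)
```

Running a report like this on a schedule, rather than once at launch, is what turns "outcomes are monitored across groups over time" into a routine. Which of the three rates should be equalized is the leadership choice, not something the code can decide.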
For an operating leader, the useful posture is informed scrutiny rather than either blind trust or blanket prohibition. The questions worth carrying into any AI decision in a sensitive domain are concrete and answerable. What data was this trained on, and which groups are thin or missing in it? What is the system actually predicting, and is that the same as what we care about, or merely a convenient stand-in? Has anyone measured outcomes across the groups our anti-discrimination obligations cover, and how recently? Which definition of fairness did we choose to optimize, since we cannot have them all, and can we explain why? And when the system gets it wrong, who notices, who is accountable, and how does a person appeal? An organization that can answer those questions has not eliminated bias, because no one can, but it has converted an invisible risk into a managed one, which is the realistic goal.
Where this comes from.
- Buolamwini & Gebru, "Gender Shades" (2018, PMLR) — intersectional accuracy disparities in commercial gender classification
- Raji & Buolamwini, "Actionable Auditing" (AIES 2019) — named vendors narrowed the Gender Shades accuracy gaps within ~7 months
- Dastin, "Amazon scraps secret AI recruiting tool that showed bias against women" (Reuters, 2018; via MIT Technology Review)
- Obermeyer et al., "Dissecting racial bias in an algorithm used to manage the health of populations" (Science, 2019)
- Angwin et al., "Machine Bias" (ProPublica COMPAS investigation, 2016)
- Kleinberg, Mullainathan & Raghavan, "Inherent Trade-Offs in the Fair Determination of Risk Scores" (2016): fairness impossibility result; fairness criteria cannot all hold when base rates differ
- Chouldechova, "Fair prediction with disparate impact" (2017) — independent proof of the same impossibility result
- Guilbeault et al., "Age and gender distortion in online media and large language models" (Nature, 2025)
- Federal court allows Mobley v. Workday AI-hiring bias suit as a collective action (Holland & Knight analysis, 2025)
- EU AI Act, Regulation (EU) 2024/1689 (official text)
- NIST AI Risk Management Framework (AI RMF 1.0) — govern/map/measure/manage functions; distinguishes systemic, computational, and human-cognitive bias
- ISO/IEC 42001:2023 — Artificial intelligence management system (AIMS) standard