Discussion (85 Comments) on Hacker News
That seems like a nonsensical way to measure racial discrimination. What could justify it?
It indicates there may be adverse impact to one group. It specifically is not used to resolve racial discrimination.
It's purely a signal for "we should consider asking more questions, because this appears unusual". That's what your quote says too, it "flags" a low recommendation -- it's indicating further study and investigation is likely warranted.
"Adverse impact occurs when there is (i) practically and (ii) statistically significant disparities in the selection rate for the group of interest when compared against the selection rate of the most selected group. Practical significance requires the impact ratio ... to be less than 0.8, which is why the EEOC guidance is colloquially referred to as the 'four-fifths' rule."
The headline numbers reflect the positions for which the 4/5 rule was triggered, not the result of some further investigation: “We discovered that 26% of Black applicants and 15% of Asian applicants applied to positions where the AI system discriminated against their racial group.” Based on the methodology, I think that means that 26% of black applicants applied to positions that were flagged under the 4/5ths rule.
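For readers who want to see the arithmetic, here is a minimal sketch of the four-fifths check described above. The counts and the names `impact_ratios` and `flagged_groups` are made up for illustration and are not from the paper; the sketch covers only the practical-significance half of the quoted definition, not the statistical one.

```python
def impact_ratios(selected, applied):
    """Each group's selection rate divided by the highest group's selection rate."""
    rates = {g: selected[g] / applied[g] for g in applied}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}


def flagged_groups(selected, applied, threshold=0.8):
    """Groups whose impact ratio falls below the four-fifths threshold."""
    ratios = impact_ratios(selected, applied)
    return sorted(g for g, r in ratios.items() if r < threshold)


# Hypothetical counts for a single position (assumed, not real data).
applied = {"group_a": 200, "group_b": 100, "group_c": 50}
selected = {"group_a": 120, "group_b": 45, "group_c": 28}

ratios = impact_ratios(selected, applied)
flags = flagged_groups(selected, applied)
print(ratios, flags)
```

A flag here only says the ratio is below 0.8 for that position; it says nothing about why.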
it sounds like how you'd get that kind of metric at least
This doctrine is the basis for much of employment law. It is a significant reason why employers don't administer IQ tests (or equivalents) to screen candidates since ~the 90s.
A common objection to the doctrine is that it leads to unfalsifiable discrimination claims, which is why it seems nonsensical to you.
If the issue happens upstream of the defendant to a claim - generally an organization being sued by an individual with fewer resources - it incentivizes such entities to push for changes upstream, so that they don't get stuck with the bill.
There is a large body of literature concerning the question "does disparate-impact enforcement cause employers to alter hiring behavior in ways unrelated to actual productivity or discrimination?" and the answer is largely "yes". As you suggested elsewhere in this discussion, Google may be useful.
The assumption that applicants from all races are on average equally qualified for every position. Whole subfields of modern academia are based on that assumption.
Here's some analysis of what it is and why it's useful as a canary in the coal mine: https://www.prevuehr.com/resources/insights/adverse-impact-a...
> Since the 80% test does not involve probability distributions to determine whether the disparity is a “beyond chance” occurrence, it is usually not regarded as a definitive test for adverse impact. Instead, other statistically significance tests, such as the standard deviation analysis, may be used for this purpose.
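A rough sketch of what a follow-up significance test might look like, assuming "standard deviation analysis" refers to a pooled two-proportion z-test (that reading, the counts, and the "about two standard deviations" convention are assumptions here, not something the quoted source spells out):

```python
import math


def two_proportion_z(sel_a, n_a, sel_b, n_b):
    """Difference in selection rates, measured in pooled standard errors."""
    p_a, p_b = sel_a / n_a, sel_b / n_b
    pooled = (sel_a + sel_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical large pool: 120/200 vs 45/100 selected (impact ratio 0.75).
z_large = two_proportion_z(120, 200, 45, 100)
p_large = two_sided_p(z_large)

# Hypothetical small pool: 12/20 vs 4/10 selected (impact ratio about 0.67).
z_small = two_proportion_z(12, 20, 4, 10)
p_small = two_sided_p(z_small)

print(round(z_large, 2), round(p_large, 3), round(z_small, 2), round(p_small, 3))
```

Both pools trip the 80% test, but only the larger one is more than two standard deviations apart, which is the gap the quote is pointing at.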
But then my question recurs: isn’t this a ridiculous way to measure discrimination? It’s assuming that the only thing that differs between the different ethnic applicant pools is their ethnicity, which is essentially never going to be true.
Like. If I am evaluating a developer on lines of code written, I am a bad manager. But if an engineer has 40% fewer lines of code than the team median, it's absolutely ok for me to go, "Interesting. What's the story there? Are they slower or is there some other factor?"
Same idea -- this is purely a fast, first pass metric that can quickly assess if something warrants a deeper evaluation.
If you are trying to say "more data needed, headline misleading" you should say that instead of misrepresenting the 4/5ths rule. Also the word "can" implies uncertainty of conclusion. This isn't ridiculous, the authors point out that this is the first large scale study of this topic. Nothing has been "proven" here, it's showing that this warrants further investigation and attention.
Do you read many academic papers, because you seem to be having a rough go here.
High-risk – AI applications that are expected to pose significant threats to health, safety, or the fundamental rights of persons. Notably, AI systems used in health, education, recruitment, critical infrastructure management, law enforcement or justice. They are subject to quality, transparency, human oversight and safety obligations
That's pretty common-sense legislation to me.
Definitely open to opposing or critical views
The dataset is constructed, deliberately, to hold candidate performance constant and vary the names of candidates to appear to be associated with a specific race.
But they picked 9 family names per group. Which sounds quite low. And combined that with first names to reach 500 first+last names per group.
I wonder how much of the bias we see has to do with the names actually picked versus it being racially motivated (absolutely not denying that this probably is a factor, but might not be the only one).
For example, in France there is the national BAC end-of-high-school exam. If you look at the name × grade distribution and at the highest “very good” bracket, some names are heavily under-represented (less than 5% of, say, “Jordan” get that grade) while some are over-represented (35% of “Josephine” get such a grade). The exam is for the most part anonymous, but some names are definitely heavily correlated with lower or higher income groups. So nothing surprising: Josephines tend to come from richer families, thus on average get better education and support, thus better grades. The same is true of family names to a smaller extent.
So I wonder how much of the bias we see, be it from real persons or the AI, has more to do with class than with race. Again, those are not neatly separate things, but still.
I'm not saying AI is not biased, but this study does not prove that.
[0] https://arxiv.org/pdf/2605.27371
From the paper:
> Fig. 1. The pymetrics process. > Stage 1: Applicants apply to positions. > Stage 2: Applicants are directed to the pymetrics platform to play assessment games. > Stage 3: pymetrics algorithms use applicant gameplay features to recommend 58.2% of applicants per position on average. > Stage 4: Employers decide which applicants to interview or hire, typically rejecting applicants that were not recommended by pymetrics.
https://www.yahoo.com/news/us/articles/california-judge-upho...
"Cards held by African-American sellers sold for approximately 20% ($0.90) less than cards held by Caucasian sellers, and the race effect was more pronounced in sales of minority player cards."
>If the AI had recommended Black and Asian candidates at the same rate as it recommended the most-favored group (typically white applicants), 40,000 more of their applications would have advanced to the next stage of hiring.
I don't think this is the right benchmark here, or at least, it would be very interesting if the actual outcome, offer or rejected, was considered at the end.
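One plausible way to arrive at a counterfactual figure like the one quoted above is sketched below. This is an assumption about the calculation, not the paper's confirmed method: for each position, every focus group is lifted to the rate of that position's most-recommended group and the extra recommendations are summed. All counts and group labels are hypothetical.

```python
def additional_advancing(positions, focus_groups):
    """Extra recommendations if each focus group matched the top rate per position.

    `positions` is a list of dicts mapping group -> (recommended, applied).
    """
    total = 0.0
    for counts in positions:
        rates = {g: rec / n for g, (rec, n) in counts.items() if n > 0}
        top = max(rates.values())  # rate of the most-recommended group here
        for g in focus_groups:
            if g in counts:
                rec, n = counts[g]
                total += top * n - rec  # zero when g is itself the top group
    return total


# Two hypothetical positions (assumed numbers, not from the study).
positions = [
    {"reference": (60, 100), "group_x": (20, 50), "group_y": (27, 50)},
    {"reference": (30, 60), "group_x": (12, 20), "group_y": (5, 20)},
]

extra = additional_advancing(positions, ["group_x", "group_y"])
print(extra)
```

Note that this benchmark compares recommendation rates only; it does not use final offer or rejection outcomes, which is the gap the comment above raises.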
For the AI study real data from "3.4 million people who submit 4 million job applications to 1,700 job postings across 150 employers and 11 industry sectors" was used.
They find "disparate impact" of pymetrics across racial groups, but it doesn't seem like they controlled for anything.
:)
I guess this one just compounds.
[1] https://news.ycombinator.com/item?id=48620142
AI works by learning patterns. So it will become biased just by learning from factors like education history, schools attended, employment history, ZIP codes, or geographic location. Those factors alone are an easy proxy for race.
And if you add names into the equation (if the AI was trained without removing applicant names), the model can become even more biased.
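A toy illustration of the proxy argument above, with entirely made-up data: the rule below never sees the group label, only a ZIP code, yet selection rates still diverge because ZIP code and group are correlated in this hypothetical pool. The `recommend` function is a stand-in for a learned model, not a claim about how any real hiring tool works.

```python
from collections import Counter

# Hypothetical applicants as (group, zip_code); the group is hidden from the rule.
applicants = (
    [("x", "10001")] * 8 + [("x", "20002")] * 2
    + [("y", "10001")] * 3 + [("y", "20002")] * 7
)


def recommend(zip_code):
    """A 'group-blind' rule that looks only at ZIP code."""
    return zip_code == "10001"


applied = Counter(group for group, _ in applicants)
selected = Counter(group for group, zip_code in applicants if recommend(zip_code))

rates = {group: selected[group] / applied[group] for group in applied}
impact_ratio = min(rates.values()) / max(rates.values())
print(rates, impact_ratio)
```

Removing the protected attribute from the inputs does not remove the disparity when another input carries the same information.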
I see nothing that shows any system was making a decision on race. How is the race being presented to the AI?
All this is showing from what I can see, is that certain groups of people were more often denied a next step in the process - but why?
Was the AI going by spelling and grammar? Were there names that were different but the rest of the resume was exactly the same? Were there pictures?
There were mentions that the rate for each group may be more prominent in the data when you split apart different types of jobs instead of looking at all jobs in aggregate. One could read that as implying that more warehouse jobs and fewer admin jobs are offered to one race, but the same would happen if the AI focused more on perfect grammar for one job while grammar was less of a factor for a warehouse job.
Also, if the people applying for the various jobs were self-selecting, this would skew acceptance percentages based upon which jobs were or were not applied to, right?
There are so many ways you could draw conclusions like this from data, however correlation is not causation, yet this seems to say it is.
I feel this is an important thing to watch, but Stanford may not be the place to trust with 'Policy Recommendations' as it's very unclear there is any proof that 'AI Hiring Tools Yield Racial Bias and Systemic Rejection' from this study and paper.
PS - now that I see the HN title did not have the word "can" in it, and the title of the article is actually "Tools Can Yield" - maybe that is less accusing and more noting.
I tried it before, and discrimination is there. I would get one resume rejected quickly, and a few days later the same company would invite another resume for a screening call. I tried this before and after the AI hype; results weren’t that different btw, and that was tested with US and Canadian employers only.
Only 40% self report gender/race
no resume data, no education information, degrees, schools, GPA, major, work experience, skills/certifications
Zero job qualifications
I would be surprised if the results were different.
> 30% of Black applicants apply to at least one position that demonstrates adverse impact against Black applicants.
The whole thing reads like a tautology.
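To make the applicant-level statistic concrete, here is a small sketch of how a figure like "X% of applicants applied to at least one flagged position" could be computed. The function name `exposure_share`, the IDs, and the flagged set are hypothetical; the paper's exact procedure is not given in this thread.

```python
def exposure_share(applications, flagged_positions):
    """Share of applicants with at least one application to a flagged position."""
    applicants = {applicant for applicant, _ in applications}
    exposed = {
        applicant
        for applicant, position in applications
        if position in flagged_positions
    }
    return len(exposed) / len(applicants)


# Hypothetical (applicant_id, position_id) pairs for one group.
applications = [
    ("a1", "p1"), ("a1", "p2"),
    ("a2", "p2"),
    ("a3", "p3"),
    ("a4", "p3"),
    ("a5", "p1"),
]
flagged = {"p1"}  # positions where this group's impact ratio was below 0.8

share = exposure_share(applications, flagged)
print(share)
```

Because one flagged position with many applicants can move this number a lot, it measures exposure to flagged positions rather than the size of the disparity itself.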
Some people just can't help but put their biases on display at every opportunity, even when it comes to the most minute details.
The phrase "most-favored" means, "most recommended by the AI relative to the field".
What did you think this sentence meant?
Hypothetical SAT score: 1060
How does that help you predict the race of an individual applicant? It's been a while since I took the SAT, but I didn't realize one's score provided so much information.
The paper's conclusion, that we need to study this more, is showing the authors likely believe this to be a byproduct of inherent/invisible bias.
(I assume they're just using a big LLM for this. It doesn't say; it just says "AI", and when they say "AI" like that they usually mean an LLM. A custom-trained hiring ML system would be better.)
> There is no such thing as anti-white racism.
If you find yourself wanting to disagree with that then, I'm sorry but you simply don't know what racism is. Racism is pervasive, insidious and systemic.
A good example in the hiring space is what's called the "second syllable name problem". Traditionally African names often stress the second syllable (e.g. Jamal, Lakisha, Malik, Lashonda). Studies have shown that such names have higher rejection rates in job applications [1]. So if you're wondering about the four-fifths rule, it's because it exposes this kind of bias. It's not proof of bias. It simply means further investigation is required.
The problem with AI hiring tools is that the logic is opaque. You have no idea why an AI system is rejecting or selecting candidates, and you may find it's doing something illegal. Some companies want to hide behind this opaqueness, arguing that if no explicit decision was made then there is no bias. But that's not how systemic racism works.
There are many such signals that correlate with race, and if they affect the selection rate, that could be a problem. Did you go to an HBCU? Was your high school in a minority-majority area? What about your previous employers?
This kind of bias doesn't have to be intentional.
[1]: https://www.npr.org/2024/04/11/1243713272/resume-bias-study-...
> If you find yourself wanting to disagree with that then, I'm sorry but you simply don't know what racism is.
You are saying that if you think anti-white racism can exist, you don't know what racism is. That's obviously ludicrous.
Too many of these studies only focus on percentages and the end result is unqualified candidates getting hired from minority groups at the expense of qualified ones.
The authors are saying it's worth doing more research, because in a controlled data set the results appear unbalanced.
Looks like you didn't read the paper. There are no resumes involved. It is about assessment games.
Happy to share some sample reports if anyone is interested!
1. https://www.latentevals.com/