Discussion (160 Comments), originally posted on HackerNews
It’s a census: it just asks questions.
If you start publishing and weaponizing the data against people with various attributes, they’ll just lie or not answer. And then you are left with worse than nothing: bad data people try to act on.
The real push for this now is to form lists of people to disenfranchise.
You think the census is what the government would use to mass identify and imprison people, not the NSA database(s)?
You think homeland security, or the FBI, or any other alphabet agency doesn't already have access to a giant list of people?
Think about what Meta knows about everyone, or Google. You do realize that the US gov has read access to their core databases, right?
"The census" has absolutely no bearing on any of that which you're worried about.
It's just shocking the level of ignorance that gets upvoted in the comments here now.
Cell tower data, credit bureau integration, social media scraping, Palantir, smart home device surveillance, DNA database exploitation, facial recognition networks, tax, payroll, passport, visa, medicare/medicaid, immigrations and customs databases and many more...
The census is a historical relic used to gerrymander congressional seats, and that's about it.
I'm all for keeping all of this data private. But to think it isn't already available is a bit 'head in sand'. Maybe put laws in place for 'general' privacy across all data, before getting too inflamed about Census in particular.
First they came for…
Because making it easy to find all the rich people just seems like a very bad idea given the direction things are going.
When it was broad, the only thing you could do was locate, say, large minority groups. Blacks and latinos for instance. And even that led to problems. I can't imagine what will happen when we can drill down and tease out immigrants from citizens. Gay from straight. Rich from well to do. And so on.
Something about this conversation is fundamentally broken if there's no space to iterate towards optimization and instead it's just swinging between maximalist extremes.
That's what Dutch and French bureaucrats thought until 1940.
You really, really don't want a government who can build a unified profile on you in that way.
The comments that this rather expensive endeavour should just be about getting a head count are also amusing to me. The data collected was such an important baseline of common understanding, and this will not be a good thing for its future quality. I've grown very jaded now seeing all the things taken for granted in this country and lost or degraded recently with a whimper.
*: To be fair, they sent me specifically to places that didn't respond, so I was naturally led to believe that everyone in my region hated the government, ignored bizarrely threatening fliers, or had recently moved and had no knowledge of the inhabitants (if any) during the census period.
"What is your religious affiliation". Seems perfectly innocuous, but turned out to be retroactively fatal if your answer could be attributed to you by a certain foreign occupier in the 1940s .
Boy were the Germans happy to find these.
The American obsession with asking people for their perceived origins (AAPI, AA, Latino, ...) is more than weird: it's downright dangerous. Don't fucking ask these questions, and never, ever write it down, especially not with names.
Thankfully, now they can just buy it from data brokers and let Palantir target, so that makes life easier for them
https://www.census.gov/programs-surveys.html
The American Community Survey is the most well-known, as it replaced the “long form” sampling that had been an extension to the Census.
no person shall be compelled to disclose information relative to his religious beliefs or to membership in a religious body.
https://www.congress.gov/94/statute/STATUTE-90/STATUTE-90-Pg...
Doesn't that mean they can ask that question with an option for "rather not disclose"?
Differential privacy is absolutely necessary, and the social scientists being unable to reconstruct the data at an individual level is intended. A macroscopic description is rather enough for most purposes, and anything more is asking for a surveillance state.
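To make the "macroscopic is enough" point concrete, here is a minimal sketch of the textbook Laplace mechanism applied to counts. It is not the Census Bureau's production system; the function names, the epsilon value, and the two counts are all made up for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
block_count = 7          # hypothetical small census block
county_count = 70_000    # hypothetical large aggregate
epsilon = 0.5            # hypothetical privacy parameter

block_release = noisy_count(block_count, epsilon, rng)
county_release = noisy_count(county_count, epsilon, rng)
```

Both releases get noise of the same typical size (a scale of 2 here). That is a large relative error on a block of 7 people and a negligible one on 70,000, which is roughly why aggregate statistics survive while individual-level reconstruction does not.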
That frankly sounds more like a failure of enforcement, on top of a failure of the construction of the financial system. Here in Germany, it is absolutely not a common thing that mortgages or the banks holding them get sold like a hot potato towards some other sucker, and thus such a letter would cause immediate suspicion.
I think a large amount of the US’s success is the result of good institutions handling granular data. Policies can be adjusted to match outcomes more rapidly than otherwise.
I understand why people decide to diminish all state capacity - they feel that governments are populated by their opponents who will use state capacity against them. But as our relative strength wanes, our ability to overcome these forces of inertia does as well. And then our governments become less capable and eventually life starts getting worse.
We don’t need house-level data immediately (except perhaps in order to place census blocks within their appropriate congressional district etc). But there are aggregation units above which we should be using as good information as we possibly could be.
Intentionally damaging infrastructure is the recurring theme of this administration.
Seems like something that could be abused to achieve political objectives.
I don't know what the political undertones are here, but at some level you need to have actual ground truth, including "this person/household declined".
Publishing raw data though? That seems like shooting yourself in the foot from a national security perspective, not to mention all the other reasons not to do it.
It is introduced in the public data, not the secret data.
Or it's saying that one of these conflicting goals is more valuable than the other, and so shouldn't be sacrificed for it.
> The census bureau decided to adopt differential privacy for the 2020 Census
and:
> The consequences will be dire for utility or for privacy, and possibly both. It's hard to understate this point: future statistical releases will either be useless compared to past ones, or they will be incredibly unsafe
so we took the census for centuries before this point, and it was “ok.” and for the last census only we added some privacy items. but if we remove just one of those filters, we are in “dire” circumstances? but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
this makes it feel like an emotional overblown problem
Privacy issues that weren’t possible before due to cost are now pennies to exploit. Also keep in mind as it points out people were using census data to drive gerrymandering efforts, so these attacks are real and have been going on for a long time.
One notable thing we have today that we didn't have 100 years ago is a computer. Before, you could reasonably assume that recreating individual records wasn't feasible, at least not on a large scale. You can't assume that now. A 4 digit password was safe for hundreds of years, but it would be a security liability today for the same reason.
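The password comparison is easy to check with a toy sketch. The PIN below is hypothetical and `crack_pin` is just an illustrative name; the point is only that a 4-digit keyspace is 10,000 guesses, which a loop exhausts instantly.

```python
import itertools

def pin_keyspace(digits):
    # Number of possible numeric PINs of the given length.
    return 10 ** digits

def crack_pin(check, digits=4):
    """Try every PIN in lexicographic order; return (pin, attempts)."""
    candidates = itertools.product("0123456789", repeat=digits)
    for attempts, combo in enumerate(candidates, start=1):
        pin = "".join(combo)
        if check(pin):
            return pin, attempts
    return None, pin_keyspace(digits)

secret = "7294"  # hypothetical PIN
found, attempts = crack_pin(lambda guess: guess == secret)
```

By hand that is thousands of tries; by machine it is a rounding error, and the same shift in cost applies to reconstructing records from published tables.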
No it is not an overblown problem.
E.g., via some app that instructs respondents to enter a specific answer to a pseudorandomly chosen question.
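That idea resembles classic randomized response. A minimal sketch, assuming a single yes/no question and an app that tells some respondents to give a coin-flip answer instead of the truth; the probabilities and the 30% true rate are invented for the example.

```python
import random

def respond(truth, p_truth, rng):
    """With probability p_truth answer honestly; otherwise give a
    coin-flip answer that is unrelated to the truth."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5

def estimate_rate(answers, p_truth):
    """Invert E[yes] = p_truth * rate + (1 - p_truth) * 0.5."""
    observed = sum(answers) / len(answers)
    return (observed - (1.0 - p_truth) * 0.5) / p_truth

rng = random.Random(1)
true_rate = 0.30   # hypothetical share of "yes" in the population
p_truth = 0.75     # hypothetical chance the app asks for the truth
population = [rng.random() < true_rate for _ in range(100_000)]
answers = [respond(t, p_truth, rng) for t in population]
estimate = estimate_rate(answers, p_truth)
```

No single stored answer can be pinned on a respondent, since any given "yes" may have been dictated by the app, yet the population rate can still be estimated from the aggregate.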
Of course security would be another question.
Do. The American Community Survey (the randomly selected long-form questionnaire) is dangerously overinvasive.
I was a big fan of differential privacy, but now I think it might be doing more harm than good. I haven't seen a single case where it was applied successfully to a problem where it actually mattered. It also contributed strongly to discrediting and preventing a lot of work on other anonymization techniques, because the research community deemed it the only way to preserve privacy; showing up with enhancements to k-anonymity, or any noise mechanism not rooted in differential privacy, was a sure way to get ridiculed and ignored. And it's just not a practical mechanism: even when it works for a single disclosure, you always end up having to blow up the privacy budget to a ridiculous amount in order to keep disclosing statistics, as otherwise, for almost all real-world data, you would run out of budget after a few publications.
So, for me it's a technique that works in the areas where it doesn't really matter (publishing highly aggregated statistics that pose almost zero privacy risk even without differential privacy) and doesn't work in other areas where it would actually matter (publishing fine-grained data about individuals or small groups). There are some niche use cases but in my view the privacy community has really overblown the importance of differential privacy by portraying it as the only way to reliably anonymize data.
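For readers unfamiliar with the budget complaint, here is a rough sketch of basic sequential composition, under which the epsilons of successive releases simply add up. The `PrivacyBudget` class and the numbers are hypothetical, and real deployments often use tighter accounting than plain addition.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def can_spend(self, epsilon):
        # Small tolerance guards against floating-point round-off.
        return self.spent + epsilon <= self.total + 1e-12

    def spend(self, epsilon):
        if not self.can_spend(epsilon):
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)  # hypothetical total budget
per_release = 0.25                         # hypothetical cost per table
releases = 0
while budget.can_spend(per_release):
    budget.spend(per_release)
    releases += 1
```

With these made-up numbers the budget is gone after four tables, so publishing more means either raising the total epsilon or adding more noise to each release.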
BTW the German census bureau has an interesting approach to anonymization which they have used for several decades already, and so far I haven't heard of any cases of successful de-anonymization of the data. Maybe the US bureau should have a look at that for their own needs.
As the article says anytime you want to enforce privacy, the data becomes somewhat less useful, there is just no way around that.
The point of rights is that we have them and that they should not be trampled upon when they become slightly inconvenient to someone in power.
They weren't prepared for data that was obviously noisy. The data has always been inherently inaccurate, and folks just chose to ignore that previously
1: https://www.aeaweb.org/articles?id=10.1257%2Fpandp.20191107&...
2: https://www.science.org/doi/10.1126/sciadv.abk3283?utm_sourc...
3: https://www.nationalacademies.org/read/27150/chapter/14
4: https://hdsr.mitpress.mit.edu/pub/7evz361i/release/2
* I want to accurately report the finances of our company to the best of my ability.
* But that report would allow people to reconstruct private data about the terms of our contracts with various counterparties. I'd really like to avoid that, there's no rule that says we're supposed to release that data. In fact some of those contracts probably came with nondisclosure agreements!
* So here's what I'm going to do. I'm going to calculate our results to the best of my ability, and then I'm going to add random values to them and report only the randomized ones. Any reconstruction people try to do will be wrong because of the randomness.
* If the SEC says "no, you need to report your actual numbers", I will explain to them that there's no such thing as an actual number because all data is noisy.
I can't get behind it.
You can of course disagree about what should actually be part of a transparent public record. (Though I suspect a lot of people post-date what was generally available in a "phone book.")
because the next two years are going to become insanely miserable
https://www.statenews.org/government-politics/2026-06-12/ohi...
Representative Joyce Beatty is from Ohio and was instrumental in stopping Trump from illegally renaming the Kennedy Center.
https://www.theatlantic.com/culture/2026/06/kennedy-center-b...
Fundamentally this is public data. If it's too dangerous to make public, it's too dangerous to collect, and people should be aware of exactly what it is.
There are very few things that the state has data on that should not be made public. Census data is simply not one of those things.
Publishing should be the default for any data, and keeping it unpublished should require substantially good reasons that impact the country as a whole. Frankly, if it isn't detailed national defence plans, I struggle to see any data that should not be public.
The biggest challenge with running a census is getting people to trust you enough to answer your questions.
A lot of census questions are sensitive. The ACS covers topics like citizenship status, disabilities, income, SNAP assistance, languages spoken at home.
If you want accurate information about the people who live in your country you need the census process to feel as safe for people to respond to as possible.
Are you saying the census shouldn't collect any data that people wouldn't be comfortable publishing? Because that's a recipe for a census that is far less useful for helping the country make useful decisions.
I'll say that. The state representatives should provide congress and the president any data needed to inform policy decisions about the people they represent. And as others have pointed out, other departments and agencies (such as the IRS) have most of the rest of the data required to make policy decisions.
Except for gerrymandering purposes, I fail to see why income, party affiliations, etc., is useful for the purpose the census was created for.
https://www.census.gov/topics/public-sector/voting/about/faq...
> the CPS Voting and Registration Supplement does not ask any questions of a partisan nature.
There are laws in place forbidding government agencies from merging together datasets.
The last thing people should support is the creation of profiles of individuals by combining data from different government agencies. This is why the census is so important as a data collection mechanism.
> The actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States, and within every subsequent Term of ten Years, in such Manner as they shall by Law direct.
The key thing you're missing is "in such Manner as they shall by Law direct".
Congress has passed a whole bunch of laws that attach additional responsibilities to the census for the purpose of supporting government decisions.
The Permanent Census Office Act of 1902 for example, which established the census office and tacked on "an annual survey of cotton production, and other economic censuses" https://www.census.gov/about/history/historical-censuses-and...
I don't understand why the census would include SNAP data or income: surely the government already has that information. I have never doubted that the IRS knows my income better than I do. Maybe better use of existing datasets could restrict the census to less invasive questions.
Detailed census records are published 72 years after they were collected; the last release (of 1950 census data) came out in 2022; the next one should be published in 2032.
See: https://www.archives.gov/research/census
https://prologue.blogs.archives.gov/2022/01/20/census-record...
TBH I don't think the people who wrote this knew how much collateral impact it would have.
I don't trust the Census Bureau with my data, so if this is as "dangerous" as the author and some people here seem to think, they shouldn't be collecting it in the first place.
This works by the same principle as how nobody ever drives faster than the speed limit.
2. Without noise injection it's rather simple to do statistical attacks to reverse engineer individual entities.
3. This data is and has already been used in the past to undermine democratic systems by targeting and disenfranchising minorities, as well as gerrymandering the US to hell.
4. "Too dangerous to make public, too dangerous to collect" - this is a false dichotomy. To govern effectively you need sensitive data, but it should be collected and used in a way that's safe for the individuals.
5. Macro level aggregates don't need individual exposure, that's why noise, anonymization and statistical functions are fine.
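A toy illustration of the statistical attack mentioned in point 2, assuming two exact (noise-free) tables are published for the same tiny block. The records are invented; the attack is a plain differencing attack, which is much simpler than the large-scale reconstruction attacks discussed for real census tables.

```python
# Hypothetical residents of one small block.
records = [
    {"age": 34, "income": 40_000},
    {"age": 51, "income": 55_000},
    {"age": 72, "income": 250_000},  # the only resident aged 65+
]

def published_total(rows, predicate):
    """An exact aggregate of the kind a published table might report."""
    return sum(r["income"] for r in rows if predicate(r))

# Two innocuous-looking aggregates over the same block...
everyone = published_total(records, lambda r: True)
under_65 = published_total(records, lambda r: r["age"] < 65)

# ...whose difference is exactly one person's income.
recovered_income = everyone - under_65
```

Neither table names anyone, but subtracting them isolates the lone 65+ resident. Noise on each table turns that exact figure into an uncertain one.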
They do. After a substantial delay. Pretty handy for genealogical research, while protecting privacy for the living.
But the devil is in the details. If we don't want advertisers constructing semi-complete profiles from simple web interactions then why would we publish 330 million census questionnaires for their use?
But we do. A detailed census is essential for making good policy. For example, knowing the age and distribution of children across the country helps local and state governments decide where to put the next school or children's hospital. The federal govt. allocates funds for education and daycare accordingly.
The census is the best and most important measure of govt. policy. Taking it away would leave everyone worse off.