Training our own AI models


tartieret•about 19 hours ago•139 comments•Read article on posthog.com



Discussion (139 Comments) on Hacker News

JimDabell•about 19 hours ago

“Opt-in by default” is an oxymoron. If it’s default then I haven’t opted into anything. It’s been enabled by default.

xnorswap•about 18 hours ago

This frustrates me too, if something is "opt-in", that means by default you're not included and can choose to be included. If something is "opt-out", that means you're included and can choose not to be.

But then it gets used to describe the reverse, and we have to add words to clarify.

I once saw a post here with correctly described opt-in telemetry, and the top comment was attacking them for the reverse, thinking it was including them by default. So there's little winning; it's one of those words that has just come to mean its opposite.

thundergolfer•about 14 hours ago

“Opt-in by default” is just Opt-out. We already have the term.

FeteCommuniste•about 16 hours ago

They are opting you in by default. Very cool.

burnte•about 16 hours ago

Never miss opting in to newsletters again with NewsAI! NewsAI automatically opts you in to thousands of newsletters and marketing campaigns around the world in an instant!

bachmeier•about 16 hours ago

But that's not the standard definition of "opt-in". See for instance MW: "to choose to do or be involved in something".

dolebirchwood•about 15 hours ago

Yes, we know. It was sarcasm.

spl757•about 8 hours ago

CEOs and the like speak Weasel and only use Weasel Words.

deflator•about 19 hours ago

Very true. I was considering PostHog, but this sours them in my eyes. Very deceptive wording.

skrebbel•about 18 hours ago

They could’ve done a lot worse, and most companies would’ve.

deflator•about 18 hours ago

There's a helpful response. It could always be worse!

mannanj•about 18 hours ago

Isn't it kind of like a mandatory tip? If you haven't given it voluntarily, i.e. it's automatically opted in and you maybe can't even not give it, it's the same.

abustamam•about 18 hours ago

Many restaurants have the audacity to add a 20% (or whatever percentage) "service fee" that isn't considered tip. It even says something like "we use this to pay our staff competitive wages and health insurance." You can't opt out. It's just part of the bill. Then they have the gall to ask for a tip on top of that.

I've taken to a) leaving a negative Google or yelp review for such establishments and b) never coming back. This is a practice that needs to die.

rectang•about 18 hours ago

Do you leave a negative review if they add the service charge but don't ask for a tip?

croes•about 18 hours ago

Opt-in by default means it is either mandatory (if you can't disable it) or it's opt-out (if you can). Opt-in by default is BS to make it sound less invasive.

abustamam•about 18 hours ago

Imagine if they said paying taxes were opt-in by default. No, it's mandatory! Sure you can technically not pay taxes but you won't have a good time.

irishcoffee•about 18 hours ago

You were given the option to option in, by default. Clearly it makes sense, optioned out by default only happens when someone loses money on the option in default instead.

The internet really stinks. The 1999 teenager in me somewhere is really bummed.

Waterluvian•about 18 hours ago

PostHog was a system we set up once, generally don't think about, and review from time to time, providing some occasional value. It was mostly harmless to leave around.

But it's apparently yet one more thing we have to be actively suspicious of as it defaults towards an intolerable state. So it's easier to just rip it out of the system and move on.

rafael-lua•about 12 hours ago

PostHog was wonderful as an analytics solution, with its developer-first approach, good tools, and nice pricing.

At this point, I lost count of how many times in the last two decades I fell for it, although I'm more used to it now. Companies that grow into success and change. With the AI frenzy, PostHog also started going all-in, and not only that, but they seem to be exploring no-coding tools and whatnot. Supabase is another one that was cool, but now it is in the AI abyss.

Indeed, at this point, I'm the constant. Maybe I'm the problem here, and perhaps I should learn to accept the new AI overlords, give up and go full AI.

sixtyj•about 19 hours ago

Most companies would bury this change in a deceptively boring T&Cs update, but we value transparency, so here's what you need to know in an internet-friendly numbered list:

1. Users on our EU cloud instance are opted out by default

2. So too users with agreements that prevent training (e.g. BAA, MSA, or similar)

3. All other users on our US cloud instance are opted in by default

4. We will anonymize all data before it's used for training

5. We will only use data that already exists in your PostHog instance

6. We will do all the model training ourselves, which means...

7. We won't sell or send your data to third-party model providers

8. You can opt out at any time via your org settings in PostHog (admin access required)

9. Training won't start until June 29, so there's plenty of time to decide

teraflop•about 18 hours ago

> All other users on our US cloud instance are opted in by default

Cool, cool. Glad to see that you are the arbiter of what your users have "opted" to do, and their input isn't required.

While we're at it, I'm going to "volunteer" your time to rebuild my patio this weekend. You don't need to worry about volunteering, I've done it for you.

trollbridge•about 14 hours ago

If you send a postcard by Friday 5:00 PM, you can opt out of helping me with my patio. This is a one time offer; if you don’t, you have to help me every weekend for forever.

mark242•about 19 hours ago

If "we will opt everyone in because otherwise we won't get enough data because we know users won't opt in" is your business model, maybe it's time for a rethink.

micromacrofoot•about 18 hours ago

this is the business model of all companies training AI, if they had to get permission we wouldn't have frontier LLMs at all

abustamam•about 18 hours ago

So it's OK to do stuff without permission as long as we get something that makes a lot of people a lot of money?

48terry•about 16 hours ago

> if they had to get permission we wouldn't have frontier LLMs at all

Don't threaten me with a good time.

bigstrat2003•about 16 hours ago

That would be a good thing TBH.

freshnode•about 18 hours ago

It's frustrating as we literally just moved to it. Back to Mixpanel?

abustamam•about 18 hours ago

I'd recommend at least doing a short spike to see if you can build your own in some way. We did that for the purpose of experimentation and now we've built our own metrics platform that we completely own.

rottencupcakes•about 19 hours ago

Defaults matter.

Opt-in vs opt-out organ donorship has a large impact.

Most people on any web app won’t stray from the defaults.

hyperbovine•about 18 hours ago

I sincerely hope this never comes to pass, but you or your loved ones may someday find themselves in the position of wishing more people were opted in for organ donation.

The same cannot be said for some random corporation training AI models off your data to make a buck or two.

mmh0000•about 18 hours ago

Which we probably need to consider changing now that some truly bizarre and evil shit is being done on donor organs:

https://news.ycombinator.com/item?id=48212992

tredre3•about 14 hours ago

Organ donation saves lives at no cost to you (because at this point you're dead).

It's unclear if my private data being used for training has a cost to me, but I'm giving posthog money already. They shouldn't double dip.

philipwhiuk•about 18 hours ago

Again, this is because it's uninformed.

Consent matters.

cjonas•about 18 hours ago

yea except one is a "dark pattern" to exploit customers for corporate profit while the other is to benefit society.

hilariously•about 18 hours ago

There is no such thing as opt in by default - and burning that amount of customer goodwill because you want something instead of say, giving a discount to people who are willing to do it is a choice for people who have a lot more market share and their customers would have more trouble leaving.

infecto•about 18 hours ago

> Most companies would bury this change in a deceptively boring T&Cs update, but we value transparency, so here's what you need to know in an internet-friendly numbered list:

This feels like a really bad defense. It’s great you provide transparency but I don’t want my analytics system writing my code. There are already so many other first movers that are better that I would rather connect to your analytics.

buzer•about 18 hours ago

> We will anonymize all data before it's used for training

Anonymize by what definition? GDPR? Do note that this is a very high bar.

> All other users on our US cloud instance are opted in by default

Including end users in the EU? You should remember that you are obtaining the personal data directly from the data subject, meaning Article 13 obligations apply. Article 13 omissions cannot be cured retroactively. Can you show all of your customers have provided sufficient Article 13 notice to cover this processing?

And do note that you are almost definitely within the scope of 3(2)(b).

ryanmcbride•about 18 hours ago

Hey man, respectfully, opt-in by default is not opt-in. That's opt-out, and it's scummy.

I feel like you either know that already, or should, but either way I won't be using your product anymore. Just pulled it out of the projects I'm personally in charge of and in the future I'm going to recommend against using it both internally and for clients.

Legitimately disappointed.

kelsey98765431•about 18 hours ago

Can't wait to see PostHog crash and burn, I have hated their service for years now.

rafael-lua•about 12 hours ago

Calm down, GA4/Amplitude/Mixpanel.

vovavili•about 18 hours ago

Bizarre take.

johnsillings•about 18 hours ago

why?

_heimdall•about 17 hours ago

> We will do all the model training ourselves

That's actually an interesting note. So you all will be managing the training runs on hardware you own or rent and manage?

tartieret•about 17 hours ago

There is definitely some confusion on the EU part. I am a European citizen, but some of my activity data on some of the sites I host is logged in US Posthog, which means Posthog is subject to the GDPR, even if the data is US hosted!

osigurdson•about 15 hours ago

I think it would be better if there was an EU browser that provided a warning if accessing sites outside of the EU. If an origin site is in the EU then it could be subject to EU regulations.

GuinansEyebrows•about 18 hours ago

> All other users on our US cloud instance are opted in by default

This is slimy.

xnorswap•about 18 hours ago

It's slimy because your government allows it, this doesn't have to be the case.

1. Lobby your representatives to improve your data protection laws, even if you think it's pointless to do so

2. Stop attacking EU data protection laws, even if they inconvenience you

As can be seen from this announcement, data protection laws do make a difference.

ryanmcbride•about 18 hours ago

I don't want to support a company that's going to do everything they can possibly legally get away with, I want to support companies that do the right thing where they can.

osigurdson•about 18 hours ago

What concrete difference is being made here? If a site is hosted in the US but accessible by EU citizens AND using a PostHog US server isn't the data still being used for training?

Legitimate question, I am not trying to prove a point.

mrweasel•about 18 hours ago

Not really, it's slimy because it should be obvious that it's the morally wrong thing to do. There's no tangible benefit to the users, only risk.

The fact that they only opt-out EU users, because regulation forces them, tells you all you need to know about the moral compass of PostHog.

This shouldn't even require regulation, but apparently expecting companies to act morally is a bloody pipe dream. Profit over morals and concerns for your customers, apparently.

GuinansEyebrows•about 18 hours ago

yes, of course!

shafyy•about 14 hours ago

> Users on our EU cloud instance are opted out by default

> All other users on our US cloud instance are opted in by default

Perfectly demonstrating that we need regulation to protect the general public, most companies will not do the right thing but the thing that benefits them first and foremost.

sammy0910•about 17 hours ago

as a user i dont like it, and am disappointed. it will take a bit of time to transition our systems off of posthog, but we will need to.

if you are looking at your metrics, I want to be clear that this transition will not happen overnight, but it _will_ happen for this reason, so just be aware that your short-term metrics won't tell the full story

frankest•about 18 hours ago

What a great reminder to build my own analytics and self host. PostHog just lost a customer. They could easily send a email to each customer asking if we want this. The assumption means they have no product intuition about their own customers, let alone the customers of their customers. Bye.

xrd•about 18 hours ago

Not trying to be snarky but why not just opt out instead of vibe coding your own analytics platform? I'm uncomfortable with people using my data to train AI, but those concerns revolve around where my data goes, and whether I'm notified/aware. Posthog is giving me good answers to those questions here.

frankest•about 18 hours ago

It has to do with the priorities of the company and its leadership. Either they lack the basic awareness to know that training on your business customers data will likely leak their sensitive information to their competitors, or they just intend to sell that data. We are not paying to have our data stolen.

xrd•about 18 hours ago

Very fair point!

infecto•about 18 hours ago

Thanks for posting. I had been on the fence for the past few months about switching. The new AI products combined with the weird UIs had been irking me for a while. This is the final nail in the coffin. Opt-in is a terrible business model imo.

thecatapps•about 18 hours ago

Agreed. While I don't entirely care enough to rip it out of any existing products, I certainly won't be adding it to any new ones.

I remember people cheering about their "OS" web redesign, which was the most confusing and unnecessary UX complication when I needed to go track down a session replay to debug something (They've since added navigation to the top right.)

tines•about 19 hours ago

“Opt-in by default” = opt-out?

Tsarp•about 19 hours ago

Guess it's "Opted-in" by default

natch•about 18 hours ago

Then it’s not opted. It’s just in.

patates•about 18 hours ago

"Possible to somewhat disable", I call it "PTSD".

mrits•about 18 hours ago

Opt means to make a choice or select an alternative. They are either incompetent or lying on purpose.

thecatapps•about 18 hours ago

It's probably very obvious by now, but there's something to be said about companies with the "SF Quirky" vibes:

- The OS Redesign

- "Sexy Legal Documents"

- Emails with "<relevant hedgehog meme goes here>" as the subject line

- Having a merch shop with action figures of your CEO

It works both ways. When you're looking for adoption and making very pro-user moves, I guess it can be a benefit. However, when you're now looking to grow revenue and making very anti-user moves, it's insult to injury.

I'm the last person to say that tech "shouldn't be fun" or something overly-broad like that, but if your messaging doesn't match the decisions of leadership, you're gonna have a bad time.

rafael-lua•about 12 hours ago

It started well, though. It was an analytics tool with a developer-oriented mind; it was refreshing compared to the competition. But all good things do seem to come to an end, especially when it is a company. They went full weirdos in the last 2 years. AI just made everything worse.

Back to scanning open source projects, I guess.

48terry•about 16 hours ago

> Why this is opt out, not opt in

> Put simply, because otherwise we will not have enough data to train a model that's actually useful.

Hmm, when asked to opt-in to giving their data away for yet another AI non-service, people don't want to. That's strange! The only way to get their data is to assume you can take it and force them to tell you to stop. Wonder what that could mean? Oh well, it's a mystery no one will be able to solve.

aabhay•about 15 hours ago

This should be a lesson in bad communication. Not being clear about what's being trained on is a huge mistake. And this announcement really puts into focus the drawbacks to PostHog's cringe-forward brand ethos.

brauhaus•about 18 hours ago

Every day I'm more glad about EU legislation, that's all I have to say for now

gobdovan•about 18 hours ago

Yeah, the legislation is morally defensible on its own terms. But when you look at the full system, something funny happens: EU legislation is blocking data extraction and platform lock-in tactics that Big Tech already used to become monopolies.

And since the big platforms don't have to unwind their advantages or pay back for the methods that are now restricted and considered illegal, they can peacefully extract rents from their entrenched positions for even longer, while everyone else is prevented from using the same ladder they climbed.

vovavili•about 18 hours ago

...until you learn the rates of economic development between Europe and the US since 2008.

tredre3•about 14 hours ago

Yes the amount of fiscal growth in America is very impressive. Yet quality of life is higher in Europe. I wonder why that is?

Laurel1234•about 18 hours ago

Every last single cent of that "economic development" is in the hands of billionaires, at least people in Europe have rights and their government isn't a couple of monopolies in a trenchcoat.

vovavili•about 15 hours ago

That is a doubly naive pair of statements.

freshnode•about 18 hours ago

Why won't companies explain what anonymisation means for them?

Posthog has unfettered logged in access to some sensitive stuff. What steps are they actually taking to scrub sensitive data from my replay before being used to train a model?

tartieret•about 18 hours ago

This is what triggered my post. The announcement pretends that it's not a big deal because of "anonymization", but that's easier said than done. You can send custom events and logs that contain confidential information even if they don't contain personal identifiers.

abustamam•about 18 hours ago

> Why this is opt out, not opt in

> Put simply, because otherwise we will not have enough data to train a model that's actually useful.

AKA we won't be able to make as much money if we required you to give us permission to use your data.

Dave_Rosenthal•about 17 hours ago

They say, "our goal here is to improve PostHog as a product for our customers, not to expose or sell models trained on your data" but then don't actually list that as a limitation in the bulleted points.

AFAICT this now gives them default permission to train an LLM on your code (as Posthog telemetry data is inextricably tied to your code) use it, and even sell it if they wanted to (as it's not your data anymore, it's their model). Yikes.

rafael-lua•about 12 hours ago

Time to scan the current open source community projects for something to migrate to. Any recommendations are most welcome.

the__alchemist•about 18 hours ago

How much are they paying the users?

rafael-lua•about 12 hours ago

They are probably adding more trinkets to their shop, so you can buy them.

stevoski•about 17 hours ago

I’ve been evaluating PostHog for our company.

I’ve now made our decision. We won’t be using them.

If they are going to position themselves as the non-slimy no-BS guys, they can’t pull this nonsense.

throwatdem12311•about 8 hours ago

We were considering Posthog and now they’re on our shitlist.

Good job guys.

rad_val•about 17 hours ago

All of them do if you don't do something about it (e.g. migrate to self-hosted solutions). Trusting a ToS in 2026 is as naive as it gets.

ASinclair•about 18 hours ago

Mostly unrelated but the name of this company makes me think it's a Dick-Pics-as-a-Service provider.

lljk_kennedy•about 18 hours ago

netdix.com

Analemma_•about 15 hours ago

This is totally intentional and they play it up for all it's worth on their bus ads in SF. Personally I find it just pathetic, but what can you do.

mrcwinn•about 18 hours ago

Gross.

They’ll use your product and your data to later sell a product back to you.

xp84•about 17 hours ago

Even if there were no AI, that's not any different than any SaaS where your data gets stored. Picking at random, Optimizely certainly has a ton of interaction data available and they build new features and products that leverage your data (without which the features would be impossible). Could be reporting tools, funnel analysis, etc.

jen20•about 18 hours ago

Perhaps if they hopped on a quick call for five minutes with some customers, they'd realize quite how little appetite there is for putting up with being opted into things automatically in the US but not in the EU.

As an aside, this also means the EU rules are working.

freshnode•about 18 hours ago

+1 this made me glad we opted for the EU region

bigstrat2003•about 18 hours ago

This is the fastest way possible to ensure I will never do business with you, or stop doing business with you if I already am.

tartieret•about 19 hours ago

I initially used Posthog as an alternative to Google Analytics with more privacy. Now they want to use the data for a business purpose. Working hard towards enshittification?

rvz•about 19 hours ago

> I initially used Posthog as an alternative to Google Analytics with more privacy.

This does not make any sense.

> Now they want to use the data for a business purpose.

They raised VC money and they want a return so this was predictable.

mrits•about 18 hours ago

It makes perfect sense actually

Henchman21•about 18 hours ago

You can’t “opt-in” to something that is the default. The choice is made for you — and when the choice is made for you? You haven’t opted in or out?

scosman•about 18 hours ago

I would have guessed that was just a bad title here but no, article states it as "opted in by default".

tartieret•about 18 hours ago

I fixed the title, sorry for the typo!

scosman•about 17 hours ago

not your fault, the article uses that language!

datagreed•about 13 hours ago

Posthog still cannot fix bugs that have existed for years and still cannot fix their identification process for merging users, but who cares, right? Let's slop some AI on top of a non-working project. Good thing.

staticautomatic•about 16 hours ago

Friendly reminder that you don't have to enable PostHog replays at all. I have a site lightly instrumented server-side with the slim bundle, and I'm still gonna double check my account settings but I'm pretty sure it's not even capable of doing the replay telemetry.

dabeeeenster•about 12 hours ago

"Training won't start until June 29, so there's plenty of time to decide"

...except, we have made the decision for all the US customers already. Nice.

hattermat•about 14 hours ago

posthog have lost the plot

mikkelam•about 18 hours ago

The enshittification has begun. Time to move on!

gyoridavid•about 18 hours ago

I feel that the US should step up their legislation game and make sure these companies can't retroactively make rules to steal their users data. I know it's trendy to hate the EU but their legislation actually protects the users, and not the companies interests.

TZubiri•about 18 hours ago

Today I was thinking, if I start a company in the LLM tooling space, I would put in the company mission in the incorporation documents that client data will not be used to train.

The temptation and the value is too great, and the opt-in opt-out consent thing ends up being a fuckery where the company tries to trick the user into allowing them to take a look into the data, presumably because they are selling the product at a loss and need an alternative revenue model.

Just make it impossible from the get-go, the fine print would be that the data can be shared off-band explicitly, in an email, or if explicitly copy pasted in a support chatbox, but there would be no mechanism for us to read the data from the databases much less from the client.

I don't mean it would be an air-tight mechanism like Signal or ProtonMail, if a court order would ask us to produce client info, we would still reserve the right to produce the data, but exceptionally, and definitely not for training models.

suttontom•about 9 hours ago

Not to be cynical but do you think this would matter at all? Are you saying that companies would hold themselves to their missions or even something that's legally binding?

> "Google is not a conventional company. We do not intend to become one."

> OpenAI being founded as a nonprofit and becoming for profit.

> Didn't Anthropic literally say they wouldn't train on your data or keep it for longer than 30 days unless legally required, and then decided to opt people in to having their conversations used for training?

TZubiri•about 6 hours ago

if it's in the charter/articles of incorporation/ articles of organization, it's binding. If I break the mission and a.

> OpenAI being founded as a nonprofit and becoming for profit.

I think this is a common misconception, or a disregard for nuances. The NFP was not and cannot be converted to a Corp, that's kind of the idea of an NFP. However there exist satellite companies.

Sam Altman does not own shares of Open AI because there are no shares.

OpenAI has a for profit company (capped, Public benefit corporation), which Sama I don't think has shares in. It's an instrument for investments.

But every transaction needs to be fair and in kind, there can be no gifts at any point in a way that would magically negate the purpose of an NFP, Sama cannot cede the IP of ChatGPT to himself or one of his companies, that's not what's going on.

Again, saying it, putting in terms of contracts (that can be retracted with notice), and putting it in the charter are all different.

OkayPhysicist•about 18 hours ago

More companies need to make, for lack of a better term, "oaths" of what they won't do as a company. My pitch on it is to tie it to financial penalties the company agrees to pay, somewhere in the "enough to incentivize a significant portion of our user base to sue us" territory, such that it would be financial suicide to violate them.

TZubiri•about 18 hours ago

Contracts and incorporations are designed for this. The issue is that the incumbent legal strategy is to use template documents and to reduce potential disputes to $1 in private arbitration; essentially, legal's job is to make legal go away.

Another term I would incorporate is a Seppuku term, if we get hacked, I resign, the company goes bankrupt. Anything else is the wrong attitude to computer security for companies that want to scale to Global reach.

dzonga•about 18 hours ago

Another would-be excellent product company destroyed, or being destroyed slowly, by VCs and the endless chase for 'growth'.

slopinthebag•about 18 hours ago

PostHog better transition to an AI company soon because they are one of the SAAS's which are absolutely cooked by vibe coding. What it does is extremely amenable to LLMs and it's also non-critical for a business, making it an excellent candidate for replacement by in-house solutions. And if it means never having to use their website again that's even better.

I wonder if they regret going open source, considering people will be using LLMs to replace them which have surely trained off of their code.

calmbonsai•about 18 hours ago

LOL. You stay classy PostHog.