
Discussion (54 Comments)
We need to stop this treadmill of trying to "build reputation" and stop focusing on "symbolic capital" and "clout" and whatever else bloggers are going after. You're not going to get it, and even if you do, you're not going to be able to "monetize" it.
If you have a need to write, write. Maybe a handful of actual people will read it, maybe not. But I wouldn't try to do it for a living. The reward will have to be the cathartic process of writing itself, not how much attention it gets, how much it "blows up", or how viral it goes.
What I need is for my writing to spread enough that I can receive opportunities to have my programming ability evaluated.
The reason I write about programming is that, in the past, some readers found my programming essays interesting, and that led to chances for me to be tested. I had to leave graduate school because of financial problems, and I did not graduate from a prestigious university.
So this is not simply about monetizing writing. It is a struggle to receive opportunities. Those are fundamentally different things.
Some people may be happy writing things that nobody reads. But many people are happier when they can share their writing and let their values collide with those of others.
The dead internet theory, made manifest.
It has been virtually impossible to find real information in some areas unless you personally know which websites are reliable. That's why we devs used to go to StackOverflow, and why people use site:reddit.com when searching Google. LLMs just exacerbated all of that, but it was already happening.
I'm a copywriter and I used to get hired to write posts on behalf of founders on LinkedIn or for their company blog.
Now, the last three jobs I had were all focused on sending cold email.
It's going to be a serious problem, and I've already seen sites that are down 90% in traffic simply because AI is scraping them, answering the questions itself, and never providing a linkback.
Why do I care if I shave off 200ms from a crawler's request, instead of a human's?
Based on that, I think it's more about requests from bots/scrapers having the greatest possible chance of hitting a cache before reaching the blog's origin/real host. Bots will hit some layer of Cloudflare first, then Fastly, and only if the content isn't in Fastly's cache will they hit the Ghost blog's server.
To me, this makes a lot of sense if it's self-hosted, but I also thought it was already standard practice to shove your self-hosted blog behind a reverse proxy and cache as much as possible.
And I'm not a professional web developer, but all the extra caching layers for a static personal blog seem a bit overkill.
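A minimal sketch of how you could verify that layering from the outside, assuming a Cloudflare-then-Fastly stack like the one described above. The header names are the ones Cloudflare (cf-cache-status) and Fastly (x-cache, x-served-by) are documented to add; the URL is a placeholder.

```python
# Hedged sketch: print the cache-identifying headers each layer adds,
# to see which layer actually answered a request. Standard library only.
import urllib.request

def cache_trace(url: str) -> None:
    req = urllib.request.Request(url, headers={"User-Agent": "cache-probe/0.1"})
    with urllib.request.urlopen(req) as resp:
        for name in ("cf-cache-status", "x-cache", "x-served-by", "age"):
            value = resp.headers.get(name)
            if value:
                print(f"{name}: {value}")

# A HIT in cf-cache-status means Cloudflare answered from its edge; a
# Fastly x-cache HIT means the request fell through Cloudflare but never
# reached the blog's origin.
cache_trace("https://example.com/some-post/")
```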
Aside from the graphic, the article is a lot of words about engaging with an LLM to get a full understanding of how caching works for their blog hosting and how it enabled them to change their setup for the better.
It's kind of hard to follow because it never says what they actually did, or why what they did was better.
> If you care about how your content moves through the world now, including through AI systems, you have to care about caching. Not as a performance optimisation for human browsers, but as infrastructure for machine readership.
The screenshot says 3k req/day. That's about 2 requests per minute, amortized (3,000 ÷ 1,440 minutes ≈ 2.1). At that rate, you can serve it with CGI and Perl.
A cache is only relevant if you have a lot of traffic AND dynamic pages, or if you care about latency (which only matters for humans).
For the sort of thing you’re doing, it should be as simple as “throw it behind Cloudflare/Fastly/Bunny/whichever private CDN you like” and that’s it.
Also the diagram near the end is pretty much incoherent. GenAI, I presume.
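For what it's worth, a minimal sketch of what "throw it behind a CDN and that's it" amounts to, using only the Python standard library: the origin just has to send a Cache-Control header the CDN will honor. The port and max-age are illustrative, not from the article.

```python
# Sketch: a static-file origin that marks everything publicly cacheable,
# so a CDN in front (Cloudflare/Fastly/Bunny/...) absorbs nearly all traffic.
from http.server import HTTPServer, SimpleHTTPRequestHandler

class CachedHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # One hour of public cacheability: at ~2 req/min overall, the CDN
        # serves almost everything and the origin sees a trickle.
        self.send_header("Cache-Control", "public, max-age=3600")
        super().end_headers()

if __name__ == "__main__":
    # Serve the current directory; point the CDN's origin pull at this host.
    HTTPServer(("0.0.0.0", 8000), CachedHandler).serve_forever()
```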
Yes, the architecture diagram was generated by ChatGPT, but it says what it needs to say.
Maybe we're being trolled and the OP is crowdsourcing solutions by posting something ridiculous and getting us to put solutions up.
Now suddenly I have 10k visitors a month hammering my APIs and causing massive egress and CPU usage, so I had to get them behind Cloudflare and build everything statically. That cut the costs back down from 90+ CPU hours to about 0.2 CPU hours a month (roughly a 450x reduction).
Crazy times.
(Also, all done with Claude Code's help, or it would have taken me a week to figure out.)
A $4 Hetzner VPS can serve tons of requests if you put Cloudflare in front of it.
I host my own runners for CI and artifact building on a Hetzner VPS (spun up on demand).
People are easily lured by pay-as-you-go plans on serverless and other cheap-to-get-started managed services and end up racking up huge bills.
This is the same reason I don't use Stackdriver or Cloud Monitoring and prefer a Grafana + Loki + Prometheus setup.
My setup cannot be misconfigured in a way that racks up a huge bill.
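As a sketch of what that self-hosted stack looks like from the application side: the app exposes a /metrics endpoint via the real prometheus_client library, a local Prometheus scrapes it, and Grafana reads from Prometheus. The metric name and port below are made up for illustration; the point is that the only cost is the fixed VPS, never per-datapoint billing.

```python
# Minimal sketch of self-hosted metrics, assuming prometheus_client is
# installed (pip install prometheus_client).
import time
from prometheus_client import Counter, start_http_server

# Hypothetical metric; Prometheus pulls http://<vps>:9100/metrics on its
# own schedule, so nothing meters or bills the data volume.
REQUESTS = Counter("blog_requests_total", "Requests served by the blog")

if __name__ == "__main__":
    start_http_server(9100)   # exposes the /metrics endpoint
    while True:
        REQUESTS.inc()        # stand-in for real request handling
        time.sleep(1)
```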
Ideally I'd make the content available to crawlers for training open models, but that seems to be nearly impossible. It would be possible if other AI companies behaved.
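In principle that selective policy is expressible in robots.txt, e.g. allowing Common Crawl (whose dumps feed many open models) while opting out of proprietary training crawlers. A sketch using the publicly documented user-agent tokens; as another comment below notes, compliance is entirely voluntary, which is exactly the problem.

```
# Illustrative robots.txt: allow Common Crawl's CCBot, opt out the
# proprietary training crawlers. Honoring these rules is voluntary.
User-agent: CCBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: *
Allow: /
```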
That can’t block Grok, can it?
(You might have a fake iPhone or something visit your site if you ask Grok to retrieve information from it)
This is bad because there are fitness guides on my domain
https://macrocodex.app/guides which newbies often put into ChatGPT and ask it to simplify.
I enabled crawling for LLMs. There is a lot of misinformation in the fitness field, so it's better if LLMs get their content from people who at least have experience in the field.
I also used Claude to help me drill into what's going on. Bizarrely, about 80% of my traffic comes from Singapore, which the author mentioned too. I don't know why. A lot of the traffic looks real; it stays for a while and clicks different links in different orders. But as far as I can tell, no one in Singapore has ever read a thing I've written on my site.
I thought Cloudflare would help protect my site from bots, but it utterly fails. I'm not sure if the bots are too sophisticated or if people overestimate how well CF works for these things. I paid for advanced features for a while and reverted to the free plan once I realized it made no difference. It's a great platform in general, but it hasn't been great for letting me see how many humans actually read my content.
I know some do because they email me occasionally. If I had to guess, of the ~200 visits per week reported in analytics, around 15 are real.
From what I understand, Cloudflare is trying to create a way for agents to consume content in a more structured manner that allows for attribution to the author, and potentially payment along with it.
I don't want to be paid but I'd love to see how often context from my writing winds up in a session a human is actively using.
- My blog is static content and it costs me ~nothing to serve the requests.
- The bots were ignoring robots.txt anyway.
- If there's ultimately a human driving the bot (e.g. someone asking "summarise this article"), I don't mind.
- It's like trying to block search engines. Just as I want my blog to turn up in search results, I want agents etc. to know it exists, too.
My original motivation for denylisting, years ago, was that LLMs were simply not very good, so training-set scrapers seemed like all downside with no upside.