
Discussion (13 Comments)
I found a mention on some user agent trackers but no official documentation. Does anyone know if it’s documented? Asking because I am seeing decent traffic (30GB/week) from this.
> Crawling behavior [...] Crawler identification: Identifies itself with user-agent string "aws-quick-on-behalf-of-<UUID>" in request headers.
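For anyone who wants to pick this traffic out of access logs, here is a minimal sketch of matching the quoted user-agent format. It assumes the `<UUID>` placeholder is a canonical 8-4-4-4-12 hex UUID, which the quoted text does not actually confirm; `is_aws_quick` is a made-up helper name.

```python
import re

# Assumption: <UUID> is a canonical 8-4-4-4-12 hexadecimal UUID.
# The quoted documentation only shows the placeholder, not the exact format.
AWS_QUICK_UA = re.compile(
    r"aws-quick-on-behalf-of-"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def is_aws_quick(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the quoted prefix plus a UUID."""
    return AWS_QUICK_UA.search(user_agent) is not None
```

A user-agent match only tells you what the client claims to be; it is not proof the request came from Amazon.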
Maybe people found a way of using it as a loophole for something or Amazon Quick is just picking up in usage, and your website is popular amongst whoever uses that sort of stuff.
It has AI agents included so I guess this can just come from it searching the web based on user requests.
this bit made me laugh. was the email drafted in Outlook? was it sent to some sort of forwarding mailbox, or did they just BCC every customer in?
Did end up just adding them to our WAF blocklist, which is weirdly ironic - hosting on their infra & using their services to block their AI scraper...
> Amazonbot is used to improve our products and services. This helps us provide more accurate information to customers and may be used to train Amazon AI models.
They've been getting some heat on it lately, but I find it hard to believe they're going to give up entirely? And if so, what's to stop someone from just flouting their rules on pricing, and then doing the robots.txt thing to prevent issues?
The traffic isn't a problem. I've got Cloudflare in front and the machine itself is relatively overpowered, and downtime isn't critical. But I'd just like the thing to be able to spider me properly. Someone did point out to me that maybe I wasn't receiving actual Amazonbot but some other spider: https://news.ycombinator.com/item?id=46352723
Cloudflare had a nice technique to address the bot problem (if you use their name servers). It'll respect and use the robots.txt while sending the remaining bots to a deep black hole.
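As a rough way to sanity-check what a given robots.txt tells a crawler that honors it, here is a small sketch using Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT` and the `is_allowed` helper are illustrative assumptions, not anyone's actual configuration, and this says nothing about whether a particular bot obeys the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block Amazonbot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network fetch is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling `is_allowed("Amazonbot", "https://example.com/page")` with these sample rules reports the page as disallowed, while any other agent name falls through to the `*` entry.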
That said, one of the biggest websites in the world not respecting it is definitely a noteworthy story. Hopefully another one of the biggest websites in the world (formerly known as Twitter) eventually respects it as well instead of not even disclosing itself via a user agent and pretending to be Safari running on iOS.
You're talking about one (yes, the biggest), but the millions of other bots that don't follow it must be a bigger story.