
Discussion (160 Comments)
The top-down approach to encouraging (mandating?) AI usage strikes me as infantilizing to the workers, who are perfectly capable of choosing which tools they use and when.
In the early nineties, it was common for experienced electrical engineers to keep using schematic-entry digital design and look down on RTL and synthesis tools, despite the fact that the latter were already far more productive. At some point, management had to put its foot down and force everyone to switch to synthesis.
It's not unreasonable to assume that many people are set in their ways and unwilling to change their behavior without a bit of a push.
"Research" isn't part of my job title. If you don't know what's possible then why are you deploying it? You should be telling _me_ what's possible. I mean, you _paid_ for it, how can you possibly not know what you were getting?
> in the expectation that you might learn something useful that will be more valuable in the long run.
"I'll take `what even are profits?' for $200, Alex."
An overly generous steelman in my opinion as well. Have 10% of your employees focus on finding ways to properly leverage the new technology - don't pressure 100% of your employees with bullshit metrics.
It's that simple.
(Never mind that these bloggers are just writing ad copy for cloud providers.)
It's quite possible they aren't trying to measure performance but are literally just trying to increase token consumption to feed the bubble and hype.
Plus, pressured employees may find new, unique use cases for AI.
It's like if your goal is inflation: you give out tons of money, and as long as it's spent, you achieve your goal.
It makes for pretty charts, extrapolations, and projections.
It doesn’t matter if the numbers are not particularly correct. As long as the data-gathering step can be justified, it’ll do. Bonus points, though, if making the number bigger is a good thing (vs. tracking something like the number of sev 1 issues).
> Here is what happens when the McNamara discipline is applied too literally: The first step is to measure whatever can be easily measured. This is okay as far as it goes. The second step is to disregard that which can't be easily measured or give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can't be measured easily really isn't very important. This is blindness. The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide.
— Daniel Yankelovich, "The New Odds"
This is why AWS has been bleeding good engineers for years. What is left is starting to look like Boeing post-McDonnell merger...
They took a quarter of their documentation pages' limited real estate for AI doc shorts that nobody asked for, nobody needs, and nobody can disable.
Senior management let our localisation staff go. Now they want us to use AI to translate. They still want manual review.
We use GitHub Copilot at work; we get a measly 300 requests, with the budget to go over if necessary. Opus 4.7 or GPT 5.5 would eat all of those up in a day. Are we supposed to be using more than the allotted amount? Does management see that as a good thing? Or is it best to stick within the allocated amount? Who knows? Management are playing games everywhere, it seems.
One of the weirder things about all this is how arbitrary and non-objective the billing structure seems. It's one of the reasons I'm happy to use it at work but won't ever personally subscribe. It's so opaque.
Maybe they’re right. But it’s really hard to see how.
I set up entire virtual teams (dev, QA, product, reviewers, etc., with the initiating model just acting as the agent manager to keep its context minimal) to one-shot some stuff, and it kept churning and making progress.
Those days are just about over with the change to token pricing but for a time....
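A minimal sketch of that "agent manager" pattern, purely illustrative; every name here (runAgent, manager) is a hypothetical placeholder, not a real SDK call:

    type Role = "product" | "dev" | "reviewer" | "qa";

    // Stand-in for spawning a fresh model session seeded only with the
    // role's system prompt plus a short brief: no shared transcript.
    async function runAgent(role: Role, brief: string): Promise<string> {
      return `[${role}] summary for: ${brief.slice(0, 60)}`;
    }

    // The manager only routes hand-offs and only ever sees short summaries,
    // which is what keeps its own context minimal across the pipeline.
    async function manager(task: string): Promise<string> {
      const spec = await runAgent("product", task);
      const code = await runAgent("dev", spec);
      const review = await runAgent("reviewer", code);
      return runAgent("qa", review);
    }

    manager("one-shot: add CSV export to the reports page").then(console.log);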
"You spent $23, over the $20 food limit. Be more careful next time. You spent $600 on tokens, $200 more than the average. Congratulations!"
> whoever spent $600 on Anthropic last night, great job leveraging AI! But to the person who spent $23 on Uber Eats please remember our limit for food is $20 per meal
I can't say that this isn't happening, but at least the parts of the company I get visibility into, what the article describes isn't my experience. There is a lot of interest in using GenAI, but people are mostly getting kudos around creative uses for GenAI, not just for raw amount of tokens. For most scaled GenAI efforts, there is a lot of focus on output metrics (metrics like accuracy, number of findings, number of things fixed, and so on).
I'm surprised how few comments are written with the prior that Amazon managers aren't stupid or uninformed about how incentives work.
My guess would be that someone created the leaderboard without a lot of consultation with managers, and that some employees feel a competitive urge to try to "win" the leaderboard by burning tokens.
LOL, I'd imagine even Amazon HR would show little restraint in showering such praise.
What we can verify is how Amazon already treats workers: they will surveil anyone within their systems regardless of the futility of said surveillance. Why are we supposed to believe they wouldn't use LLM systems as a means to further control their expensive employees, keeping them from unionizing or seeking out solidarity with fellow workers? All LLMs do is give tyrannical managers more power to hold over other workers, and those workers are forced into self-alienation for fear of losing their jobs, or forced to do meaningless work because that is what's being tracked (and what LLMs excel at producing).
Hardly a good proposition for any worker.
I'm sorry, but I fully do not believe you. This is a company that fires workers for taking too long a bathroom break, where said workers piss in bottles for fear of getting fired, and you're going "hey guys, it's not too bad. Only some workers get whipped, others don't!"
> That’s my latest joke — that we’ll have to pretend like we used the tools so they can feel validated they’ve spent all this money on hyped up technology. So, yes, it’s em-dashes and “it’s not just this, it’s that …” so they can hopefully leave us alone
One of my favorite heuristics/quotes applies here: "no matter how good the strategy, occasionally consider the result." ― Charlie Munger
Want to know if AI is working for your org? Ask yourself/your employees to "show me the result." That requires judgment and taste (is the result something of value, or just the appearance of work having been done?), but it will also save you a ton of stress and disappointment later.
However, I see tons of people on LinkedIn sharing ways of backing up context, not wanting to lose context, etc.
This seems like another way the system is being misused. Higher context usage also uses more tokens. I suspect you also get worse (and slower) output than with a dense, detailed context.
a) you find a particular context that executes well and want to preserve parts of it or not have to repeat explanations
b) you want to continue a session so you don't have to rebuild the context from scratch
I think A is something where it's totally reasonable to preserve pieces as part of like a prompt library or equivalent, or directory-specific agent files, that kind of thing.
I think B is much more likely to lead to problems if you do it over a long time, but it can be pretty useful for getting the last drop of juice out of the metaphorical orange.
I think the antipattern (that I've done myself, admittedly) is swapping between different restored contexts for different tasks or roles - at that point you should be either converting it to more durable documentation if warranted, or curating it more specifically than "restore the entire context" even if it's just one-off.
Ideally that replaces the back-and-forth cycle of "it's this, no it's that, it's that for reasons XYZ" with a single ingestible blob that gets the agent up to speed.
Sometimes it's better to dump context incrementally, reinitialize the agent with a subset of the context, or manually prime it, then ask it to write documentation as a focused task.
If every exchange is treated as an independent query/response then it's much easier to see how cutting out the fluff using a combination of its summaries and your own helps stay focused.
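A hedged sketch of pattern (a), assuming a generic chat-style message format; loadPreamble, buildSession, and the file paths are all hypothetical, not any vendor's actual API:

    import { readFileSync } from "node:fs";

    type Message = { role: "system" | "user" | "assistant"; content: string };

    // Durable, hand-curated context: the distilled pieces of a session that
    // executed well, saved as a plain file (a "prompt library" entry).
    function loadPreamble(path: string): Message[] {
      return [{ role: "system", content: readFileSync(path, "utf8") }];
    }

    // Each new task starts from the small preamble plus only the new request,
    // instead of replaying a restored transcript thousands of tokens long.
    function buildSession(preamblePath: string, task: string): Message[] {
      return [...loadPreamble(preamblePath), { role: "user", content: task }];
    }

    const session = buildSession(
      "prompts/backend-reviewer.md",
      "Review review.patch for concurrency issues."
    );
    console.log(`${session.length} messages, none of them stale`);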
At my company (big name, AI beneficiary), middle management seems to be mostly concerned with shuffling deck chairs on the Titanic while they wait for their stock to fully vest. There is very little interest in improving anything, just an obsession with risk avoidance and performative sideshows whenever upper management wonders why execution is so poor.
Where? What industry, what kind of projects? The only one where I can imagine it to be true is vulnerability research, and I imagine all the low-hanging fruit to be picked soon
It will spin up a boilerplate uboot or BSP config no problem. I still go in and manually check and add peripherals, but opus 4.7 is terrifyingly smart.
Need to modify or add a new peripheral, it's there no problem. Or in a bare metal project, I can point it at an STM32 cubemx starter repo and ask for a feature (set up the ADC on pins 4 and 7, ask me for parameters) and it's just done. I do in a day what would probably take me 2.
It doesn't help me with reviewing others' work, or planning (I maintain that these are manual tasks). So yeah, I agree with the 40-60%. The parts of my job it helps, it really helps.
My experience is that it will attempt to read from the wrong memory block, resulting in garbage. But that was a while ago, so maybe LLMs have gotten better.
We started working on a new product a few months ago and it's really dangerous up front on an empty code base. It can quickly write more code than you can comfortably understand. The more serious danger is when three people are all doing that at once. I had to bring this up at meetings and try to get a better review culture going.
Now that we're a few months in and changes are more targeted additions to an existing system we're happy with, it's _huge_ (which has been my experience on our existing product). I can drop a brief paragraph I speech-to-texted into my agent, give it a general starting place (where I imagine the issue/feature extension point is), and then tell it to do some research and propose a change. I'd guess it's about 50% of the time that I have to update its implementation plan. Then I let it run (my favorite is setting this up before a meeting) and come back. Then we have to review the code and go from there.
Definitely a 50%+ speedup in some cases, but not all. It's also great for problems I've been procrastinating on, as it reduces friction so much.
People churning out slop is slowing me down and the full effects of it won't be felt for a while.
Codex was pretty sure something was wrong with the response object being returned by the endpoint in question. It turned out there was a conversion method applied to the endpoint response, which mutated its input. This method had been running w/o problems for a while, until the dev put it in a useEffect. At this point, React dev mode's policy of rendering everything twice kicked in, which caused the second pass through the conversion method to fail on the now-mutated input object.
Codex never even hinted that the conversion method mutating the input could be a problem, nor anything about React dev mode rendering everything twice (specifically to catch problems like this). Apparently, neither of those came up much in its training data.
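For readers who haven't hit this failure mode, here is a minimal reconstruction of the kind of bug described; the names and data shapes are hypothetical, not from the actual codebase:

    import { useEffect, useState } from "react";

    type Resp = { amounts: Array<string | number> };

    // BUG: "converts" the response in place, so it is not idempotent. A second
    // call on the same object finds numbers where it expects strings and throws.
    function convert(resp: Resp): number {
      resp.amounts = resp.amounts.map((a) => {
        if (typeof a !== "string") throw new Error(`already converted: ${a}`);
        return parseFloat(a);
      });
      return (resp.amounts as number[]).reduce((sum, n) => sum + n, 0);
    }

    function Totals({ resp }: { resp: Resp }) {
      const [total, setTotal] = useState("…");
      useEffect(() => {
        try {
          setTotal(convert(resp).toFixed(2));
        } catch {
          setTotal("something went wrong"); // swallows the real stack trace
        }
      }, [resp]); // in dev, <StrictMode> re-runs this effect: the second run
                  // gets the already-mutated object and lands in the catch
      return <p>{total}</p>;
    }

The fix is to make the conversion pure (return a new object instead of mutating its input), so a double-invoked dev render has nothing to observe.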
My point is that this dev seems to have lost, in a few short months of writing everything with Codex, the ability to trace an error from its source (the error trace was being swallowed in a Codex-written catch block that spit out a generic error message). He was completely stuck and just kept doubling down on trying to get Codex to solve the problem, even checking with Copilot as a backup. I'm not optimistic about where this is headed.
In my view you should 1) use AI as a tool to help you learn and 2) have it write boilerplate you could have easily written yourself. Getting it to think for you is counterproductive (at least until it replaces us entirely).
Everyone I talk to nowadays has KPIs tied to AI usage in their performance evaluation.
It's astonishing how society forgets.
That said, I’m kind of having a blast using CC in corporate with all the connectors available at our disposal, and I’m baffled at how little some of my coworkers know about what’s available and what the capabilities are. So it’s clear that perhaps some encouragement is prudent for those who are slower to embrace new technologies, but I’m not sure token-counting and token-maxing are the answer.
If I do all of this, do I get a promotion?
I have an FT subscription, and they keep moving toward this kind of narrative-first reporting to get clicks. It’s no longer a believable paper.
Filing JIRA tickets, updates. Opening PRs, having AI review PRs. This will all use tokens.
No need to tokenmaxx; you will end up burning tokens with just regular AI usage.
There should be an anti-leaderboard that highlights people under a threshold. Not trying to learn how to use AI while working at a company like Amazon is almost certainly a bad thing, and cause for looking into why.
Most people watch sea changes come and go. They all have a story of how they "could have bought Bitcoin when it was $100" or whatever. In an org, you don't want to be the one with the story of "we could have done that when nobody else had," so you incentivize adoption of the tool as hard as possible and hope that dipping feet in the water makes people want to swim. If you don't already have a culture of early adoption (and no large company can), then you have to use blunt incentives. I don't think anyone has demonstrated otherwise.
This measuring of tokenmaxxing as a proxy for something beneficial to the company has got to be the single dumbest thing I have ever heard of in my entire software career.
It would be like some company in the dot com era measuring employee's internet download traffic as a proxy for productivity or internet-pilledness.
Why not just reward employees based on who submits the largest expense claims? That might have some correlation to work too, right?!
Hell, I'm in the bowels of Google as an IC and it's hard to understand what adjacent teams are doing. Even harder for management that never gets their hands on anything.
So while you know engineers are probably bullshitting you with fake work, you can at least turn around and tell your supervisor the numbers. It's all a game of plausible deniability.
...except each keystroke has an associated cost, the sum of which may equal or exceed my salary.
mass hysteria perhaps?
There was a time when people died from dancing too much (from my understanding, though hey, I can be wrong, I usually am): https://en.wikipedia.org/wiki/Dancing_plague_of_1518
I think that although we wish to consider ourselves smart and really intelligent, we run on biological machines and clocks that evolutionarily haven't changed much since 1518, or even since the times when we hunted and foraged, for that matter.
People use AI differently and they can be equally productive with a variety of token usage quantities.
Also, different kinds of work are differently amenable to using AI.
Using it to grade people is, err, rather unwise.
Hell, throw a Tarot reading in the middle of the loop so the agent has non-deterministic behavior too.
https://github.com/trailofbits/skills/tree/main/plugins/let-...
Amazon management wants to play five-dimensional chess? Play Balatro instead.
It does not get any better than that
Jensen, Sam, Dario: https://i.imgur.com/AI7rtCY.jpeg
Is that in the contract to use AI tools? If not, then what are they on about?
Very very few jobs in the US give you a contract.
That said, if you can't figure out how to use AI in a software job you should look into it. Not using AI at this point is a lot like not using CAD as an architect.
They also use a bunch of dumb metrics like total PRs submitted, total comments made on PRs, etc. To the point that there are multiple heavily used internal tools to game these metrics, e.g., auto-commenting LGTM on any approved PR. Thus making the metrics even worse than they would have been prior.
> Managers are discouraged from using token use to measure performance, according to a person familiar with the matter.
Like CAD and architects, if you're not using LLMs while coding it's an issue, but Amazon is very clear that this isn't an official metric. I would believe managers know how many tokens you're using, but it sounds like they just interviewed a disgruntled employee who didn't like AI and published it.
You're replying to an Amazon employee who says these metrics are being used in performance reviews, in a comment thread on an article where two other Amazon employees say that their token usage is being tracked and they feel pressure to maximize it.
Do you have first hand knowledge to refute these 3 people with first hand knowledge?
The CAD thing is incredibly weird. I've never known an architect who had their CAD usage minutes tracked.
Btw, I'm at a big tech company and I know many people who are "token maxing". It's very common.
Does CAD software regularly generate an incorrect design that results in a catastrophic failure of the building?
AI is genuinely useful for many tasks. But 2x or greater business value from engineering orgs isn’t it. And even if it were, businesses are terrible at measuring value added on an individual basis.
What they can measure though is token use. I’ve heard the same thing from other large companies my friends work for.
It’s bad enough that I’ve moved a significant amount of money out of US large-cap stocks.
You should have asked AI to come up with a better analogy.
No thanks I’ll just watch y’all slip down the slope.
"Wow, look at how fast employee # 2 is setting money on fire! Let's promote him!"
When LLMs are capable of actually doing a good job, then it might be like that. We are not there yet, and we may never be.
Heh. No need to be ashamed, I used to believe them when they lied to me like this too!