Discussion (7 Comments) | Read Original on HackerNews

BiraIgnacio • about 3 hours ago
> The contest was initially inspired by Daniel Horn's Obfuscated V contest in the fall of 2004 (note: the original page is long gone, and this link goes to a snapshot from archive.org). The object of that contest was to write a simple program to count votes, that somehow miscounts the votes on election day. I was greatly impressed to see how even a short program to simply count characters in a text file can be made to fail, and fail only on one specific day, so that the bug isn't noticed in testing.

https://underhanded-c.org/_page_id_7.html

silisili • about 2 hours ago
The original page actually loads fine, maybe was restored later?

I looked through a few entries, trying not to read the short descriptions, and missed a lot of simple things. Really makes you think...

https://graphics.stanford.edu/~danielh/vote/vote.html

AmazingEveryDay • about 5 hours ago
(2015). RIP.
gwern • about 5 hours ago
TZubiri • about 4 hours ago
2026 calls for an Underhanded prompt contest
theteapot • about 3 hours ago
Or better, sleeper agents. Anthropic released a study on this in 2024 "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" -- https://www.anthropic.com/research/sleeper-agents-training-d..., https://www.youtube.com/watch?v=_y9j2BoHg2c
pseudohadamard • 28 minutes ago
Interesting that the case they were using was the Nuclear Threat Initiative and FP uncertainties. I've audited some, ah, nuclear-physics-related code that had an issue due to FP uncertainties...