Discussion (2 Comments) | Original on HackerNews

dan_l2 about 4 hours ago
Authors here. Existing security-AI benchmarks measure knowledge, offensive capability, or workflow completion. SIR-Bench measures whether an agent discovers novel evidence during an investigation, as opposed to reaching the right conclusion by restating the alert. It has 794 test cases derived from 129 real incident patterns, replayed in live cloud environments. Happy to answer questions on methodology, scoring, or the failure modes we saw across frontier models.
huangjac about 4 hours ago
woah this is cool