Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

kkordlessagain · about 4 hours ago · 1 comment · Article on deep-reinforce.com


Discussion (1 comment), original thread on Hacker News

SwellJoe · about 2 hours ago

I added this to a benchmark I've been running on how well agents find security bugs, specifically bugs originally found by Mythos. With only read/grep/ls tools it performs poorly, but in a follow-up test with a full shell and Python it doubled its findings. That's still a poor showing, but it does at least suggest it is doing what it says on the tin: making tools to help it solve problems. It also did worse than Qwen AgentWorld, another recent post-train of Qwen 3.6 MoE intended for agentic use.

https://swelljoe.com/post/will-it-mythos/