Back to News
tthmsmxwll, about 18 hours ago · Read article on springboards.ai


Frontier LLMs have very little output diversity, even for open-ended queries. We built Flint to see if we could reverse this. It's a finetuned Qwen3 30B model specifically trained to produce higher-entropy outputs when asked open-ended questions.
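The diversity goal above can be made concrete with a simple metric: sample a model several times on the same prompt and measure the Shannon entropy of the resulting answer distribution. This is a minimal sketch of that idea, not the training objective or evaluation code actually used for Flint; the function name and toy data are illustrative.

```python
import math
from collections import Counter

def output_entropy(outputs):
    """Shannon entropy (bits) of the empirical distribution over sampled outputs.

    Higher entropy means the model produced a more varied set of answers
    for the same prompt; repeating one answer every time gives entropy 0.
    """
    counts = Counter(outputs)
    total = len(outputs)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy illustration (hypothetical samples): a low-diversity model repeats
# one answer, while a higher-diversity model spreads mass over several.
low_diversity = ["Paris"] * 8
high_diversity = ["Paris", "Lyon", "Marseille", "Nice",
                  "Paris", "Toulouse", "Lille", "Bordeaux"]

print(output_entropy(low_diversity))   # 0.0
print(output_entropy(high_diversity))  # 2.75
```

A divergence-tuned model like Flint would, in these terms, be optimized to push the second number up on open-ended prompts without hurting accuracy-style benchmarks.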

Flint substantially increases the NoveltyBench score over the base model without meaningfully reducing scores on non-creative benchmarks like MMLU-STEM.

This shows that divergence tuning doesn't have to be a tax on base capabilities.

Flint scores 7.47/10 on NoveltyBench while most frontier models score between 1.8 and 3.2.


Discussion (0 comments) · Read original on HackerNews
