Discussion (7 Comments) | Original thread on HackerNews

loeg • about 1 hour ago
The question is, how close can OpenZL come to this? (It is from the same people who develop zstd, but aimed at structured data in a generic way.)
Scaevolus • about 2 hours ago
I see you have ALP, but have you tried Chimp128 or Arrow's byte stream split?
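For readers unfamiliar with the second technique mentioned here: the BYTE_STREAM_SPLIT encoding (defined in the Parquet format and implemented in Arrow's Parquet reader/writer) scatters byte k of every value into its own stream, so the slowly varying sign/exponent bytes end up adjacent and a general-purpose coder can compress them afterwards. The sketch below is a minimal illustration for doubles; the function names `byte_stream_split` and `byte_stream_merge` are hypothetical, and it performs only the transform, not any compression.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Split n doubles into 8 byte streams of n bytes each:
   out[k*n + i] holds byte k (little-endian order) of value i.
   out must have room for 8*n bytes. */
static void byte_stream_split(const double *in, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &in[i], sizeof bits);
        for (int k = 0; k < 8; k++)
            out[(size_t)k * n + i] = (uint8_t)(bits >> (8 * k));
    }
}

/* Inverse transform: gather byte k of value i back from stream k. */
static void byte_stream_merge(const uint8_t *in, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint64_t bits = 0;
        for (int k = 0; k < 8; k++)
            bits |= (uint64_t)in[(size_t)k * n + i] << (8 * k);
        memcpy(&out[i], &bits, sizeof bits);
    }
}
```

For {1.0, 2.0}, the first six streams contain only zero bytes; all of the information sits in the last two streams (exponent and high mantissa bits), which is the kind of layout that a coder like zstd handles well.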
KerrickStaley • about 2 hours ago
Another library in this space is pcodec; I'd appreciate a comparison of the two.
enduku • 3 days ago
I built "fc", a C library for compressing streams of 64-bit floating-point values without quantization.

It is not trying to replace zstd or lz4. The idea is narrower: take blocks of doubles, try a set of float-specific predictors/transforms/coders, and emit whichever representation is smallest for that block.
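The per-block "try everything, keep the smallest" loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not fc's code: the three candidate modes (raw 8-byte values, a single repeated value, and a lossless downcast to 32-bit float) and the names `block_mode`, `pick_mode`, and the `est_*` estimators are all hypothetical. Sizes are only estimated here; nothing is actually emitted.

```c
#include <float.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block modes for this sketch only; not fc's on-disk format. */
typedef enum { MODE_RAW, MODE_CONST, MODE_F32, MODE_COUNT } block_mode;

/* Reinterpret a double's IEEE-754 bits so comparisons are exact
   (distinguishes -0.0 from 0.0 and compares NaN payloads). */
static uint64_t bits_of(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Each estimator returns the encoded size in bytes, or SIZE_MAX when the
   mode cannot represent the block losslessly. */
static size_t est_raw(const double *v, size_t n) {
    (void)v;
    return n * sizeof(double);
}

static size_t est_const(const double *v, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (bits_of(v[i]) != bits_of(v[0]))
            return SIZE_MAX;
    return sizeof(double); /* one stored value; the count would live in a block header */
}

static size_t est_f32(const double *v, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* Conservatively reject values outside float range (including infinities). */
        if (v[i] > FLT_MAX || v[i] < -FLT_MAX)
            return SIZE_MAX;
        /* Accept only if double -> float -> double reproduces the exact bits. */
        if (bits_of((double)(float)v[i]) != bits_of(v[i]))
            return SIZE_MAX;
    }
    return n * sizeof(float);
}

typedef size_t (*size_estimator)(const double *, size_t);
static const size_estimator estimators[MODE_COUNT] = { est_raw, est_const, est_f32 };

/* Try every candidate mode and keep whichever gives the smallest block. */
static block_mode pick_mode(const double *v, size_t n, size_t *out_size) {
    block_mode best = MODE_RAW;
    size_t best_size = SIZE_MAX;
    for (int m = 0; m < MODE_COUNT; m++) {
        size_t s = estimators[m](v, n);
        if (s < best_size) {
            best_size = s;
            best = (block_mode)m;
        }
    }
    *out_size = best_size;
    return best;
}
```

With this sketch, a block of four 3.25s picks MODE_CONST (8 bytes), {0.5, 1.5, 2.5, 3.5} picks MODE_F32 (16 bytes), and {0.1, 0.2, 0.3, 0.4} falls back to MODE_RAW (32 bytes) because 0.1 is not exactly representable as a float. Comparing bit patterns instead of using `==` is what keeps every choice lossless for -0.0 and NaN.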

It is aimed at time-series, scientific, simulation, and analytics data where the numbers often have structure: smooth curves, repeated values, fixed increments, periodic signals, predictable deltas, or low-entropy mantissas.
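As one concrete example of how this kind of structure becomes exploitable, the XOR-with-previous transform from Facebook's Gorilla time-series format (a well-known technique swapped in for illustration, not necessarily one of fc's modes) turns repeated values into zero residuals and nearby values into residuals with long runs of zero bits. The helper names below are hypothetical, and the code only counts the bits a coder would have to keep; it does not produce an encoded stream.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Leading zero bits in a 64-bit word (64 for zero); portable, no intrinsics. */
static int clz64(uint64_t x) {
    int n = 0;
    for (uint64_t mask = UINT64_C(1) << 63; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Trailing zero bits in a 64-bit word (64 for zero). */
static int ctz64(uint64_t x) {
    int n = 0;
    for (uint64_t mask = 1; mask != 0 && (x & mask) == 0; mask <<= 1)
        n++;
    return n;
}

/* Width of the span between the highest and lowest set bit of a residual,
   or 0 if the residual is zero: the part a Gorilla-style coder stores verbatim. */
static int meaningful_bits(uint64_t r) {
    return r == 0 ? 0 : 64 - clz64(r) - ctz64(r);
}

/* Sum of meaningful bits over the XOR-with-previous residuals of v[1..n-1].
   Repeats cost 0 bits; values that share sign, exponent, and high mantissa
   bits with their predecessor produce short residuals. */
static size_t xor_residual_bits(const double *v, size_t n) {
    size_t total = 0;
    for (size_t i = 1; i < n; i++)
        total += (size_t)meaningful_bits(bits_of(v[i]) ^ bits_of(v[i - 1]));
    return total;
}
```

Ignoring the control bits a real coder adds, {1.0, 1.0, 1.0} costs 0 bits after the first value, {1.0, 1.5} costs 1 bit (a single mantissa bit differs), and {1.0, 2.0} costs 11 bits because the exponent field changes.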

The API is intentionally small: "fc_enc", "fc_dec", a config struct, and a few counters to inspect which modes won. Decode is parallel and meant to be fast; encode spends more CPU searching for a better representation.
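The thread does not describe fc's block format, so the following is only one plausible reading of "decode is parallel": if each block is self-contained (its predictor state resets at the block boundary), different blocks can be decoded on different threads. The sketch assumes POSIX threads and a deliberately trivial hypothetical block encoding (raw XOR-with-previous residuals, no entropy coding); `encode_blocks`, `decode_parallel`, `block_job`, and `MAX_BLOCKS` are illustrative names, not fc's API.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 64 /* one thread per block keeps the sketch short; a real decoder would use a pool */

typedef struct {
    const uint64_t *residuals; /* this block's residuals */
    double *out;               /* destination for this block's values */
    size_t n;                  /* number of values in this block */
} block_job;

/* Encoder: XOR each value with the previous one, resetting at every block
   boundary so that no block depends on its neighbours. */
static void encode_blocks(const double *in, uint64_t *residuals, size_t total, size_t block_len) {
    if (block_len == 0)
        return;
    uint64_t prev = 0;
    for (size_t i = 0; i < total; i++) {
        uint64_t bits;
        if (i % block_len == 0)
            prev = 0;
        memcpy(&bits, &in[i], sizeof bits);
        residuals[i] = bits ^ prev;
        prev = bits;
    }
}

/* Thread body: undo XOR-with-previous for a single block. */
static void *decode_block(void *arg) {
    block_job *job = arg;
    uint64_t prev = 0;
    for (size_t i = 0; i < job->n; i++) {
        prev ^= job->residuals[i];
        memcpy(&job->out[i], &prev, sizeof prev);
    }
    return NULL;
}

/* Decode all blocks concurrently. Returns 0 on success, -1 on bad input
   or thread-creation failure. */
static int decode_parallel(const uint64_t *residuals, double *out, size_t total, size_t block_len) {
    pthread_t tids[MAX_BLOCKS];
    block_job jobs[MAX_BLOCKS];
    if (block_len == 0)
        return -1;
    size_t nblocks = (total + block_len - 1) / block_len;
    if (nblocks > MAX_BLOCKS)
        return -1;
    size_t started = 0;
    int rc = 0;
    for (size_t b = 0; b < nblocks; b++) {
        size_t start = b * block_len;
        size_t len = total - start < block_len ? total - start : block_len;
        jobs[b] = (block_job){ residuals + start, out + start, len };
        if (pthread_create(&tids[b], NULL, decode_block, &jobs[b]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (size_t b = 0; b < started; b++)
        pthread_join(tids[b], NULL);
    return rc;
}
```

The design trade-off is that resetting the predictor at each boundary costs a little compression (the first value of every block is stored raw) in exchange for blocks that can be decoded in any order and on any core.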

Current caveats: x86-64 only for now, tuned for IEEE-754 doubles, research-grade rather than production-hardened.

Repo: https://github.com/xtellect/fc

snissn • about 1 hour ago
What do you mean by "decode is parallel"?
gus_massa • 1 day ago
Does it assume the floats come from photos or sound or something?
enduku • about 7 hours ago
It is intended to be mainly source-agnostic (I will try to add custom source predictors too). The idea is to treat the input as an ordered stream of doubles and look for numeric structure like repeats, smooth deltas, fixed increments, or low-entropy bits. The present target is scientific/time-series/simulation/analytics data, not photos or sound.
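Of the structures listed above, fixed increments are the easiest to show in isolation. A minimal sketch, assuming a block qualifies only when every value is bit-identical to `start + i * step` (so the transform stays lossless); `detect_fixed_increment` and `ramp_value` are hypothetical names, not part of fc's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* The single expression that encoder and decoder must both evaluate
   identically (build without FMA contraction, e.g. -ffp-contract=off,
   so the rounding is the same on both sides). */
static double ramp_value(double start, double step, size_t i) {
    return start + (double)i * step;
}

/* Returns 1 and stores (start, step) if every v[i] is bit-identical to
   ramp_value(start, step, i); the block could then be stored as two doubles
   plus a count. Returns 0 otherwise so the caller can try another mode. */
static int detect_fixed_increment(const double *v, size_t n, double *start, double *step) {
    if (n < 2)
        return 0;
    double s = v[0];
    double d = v[1] - v[0];
    for (size_t i = 0; i < n; i++)
        if (bits_of(ramp_value(s, d, i)) != bits_of(v[i]))
            return 0;
    *start = s;
    *step = d;
    return 1;
}
```

For example, {0.0, 0.25, 0.5, 0.75, 1.0} is detected with step 0.25, while {0.0, 1.0, 3.0} is rejected. A series produced by repeated accumulation (`x += 0.1`) may not match `start + i * step` bit-for-bit because of rounding, in which case this check rejects it and the block has to fall back to a residual-based predictor.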