Discussion (5 comments), originally posted on Hacker News
Instead, I'd recommend exploring CPU-specific AI optimizations. For instance, using AVX512_BF16 instructions could cut inference time by 2x to 3x compared with the results in the article. OpenVINO supports this well on Intel CPUs, and converting an ONNX model to OpenVINO is straightforward.
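Before counting on the BF16 path, it helps to confirm the CPU actually exposes it. Here is a minimal sketch, assuming a Linux x86 machine, where the kernel lists CPU features on the `flags` line of `/proc/cpuinfo` and reports this extension as `avx512_bf16`. The helper names `cpu_flags` and `has_avx512_bf16` are hypothetical, and the check does not touch OpenVINO at all. It only tells you whether the hardware feature the comment relies on is present.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        # On x86 Linux the line looks like: "flags\t\t: fpu vme ... avx512_bf16 ..."
        if line.startswith("flags"):
            _, _, values = line.partition(":")
            return set(values.split())
    return set()


def has_avx512_bf16(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """Return True if the kernel reports the avx512_bf16 flag (Linux/x86 only)."""
    path = Path(cpuinfo_path)
    if not path.exists():
        # Not Linux, or /proc is unavailable: we cannot tell, so report False.
        return False
    return "avx512_bf16" in cpu_flags(path.read_text())


if __name__ == "__main__":
    print("AVX512_BF16 supported:", has_avx512_bf16())
```

Returning `False` when `/proc/cpuinfo` is missing is a deliberately conservative choice: on macOS, Windows, or ARM the sketch simply reports "unknown" as "not available" rather than guessing.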
You can technically apply Q4 quantization to larger embedding models, but I'm not sure whether that plays nicely with ONNX.
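For context, "Q4" means storing weights as 4-bit integers plus a scale factor. The sketch below shows a generic symmetric, per-block, round-to-nearest scheme; the block size of 32 and the function names `quantize_q4` and `dequantize_q4` are assumptions for illustration, and this is not the exact storage layout of ONNX, ONNX Runtime, or any other particular runtime.

```python
def quantize_q4(values: list[float], block_size: int = 32) -> list[tuple[float, list[int]]]:
    """Symmetric 4-bit round-to-nearest quantization with one scale per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Map the largest magnitude to 7 so every code fits the signed 4-bit range [-8, 7].
        scale = max_abs / 7 if max_abs > 0 else 1.0
        codes = [max(-8, min(7, round(v / scale))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_q4(blocks: list[tuple[float, list[int]]]) -> list[float]:
    """Reconstruct approximate floats by multiplying each 4-bit code by its block scale."""
    return [scale * q for scale, codes in blocks for q in codes]


weights = [0.0, 0.7, -0.7, 0.2, -0.05]
approx = dequantize_q4(quantize_q4(weights))
print(approx)
```

Each reconstructed value lands within half a scale step of the original, which is the precision loss the commenter is weighing against the memory savings on larger models.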
What we really need is something like auto-round for ONNX.