autofit2 is an integrated pipeline for lightweight multilingual text classification, covering preprocessing, training, and evaluation. It implements SetFit, a few-shot learning technique that works well in low-data regimes (down to a few dozen examples), and offers high throughput on CPUs because it is built on Sentence Transformers. Dependencies are kept lean, though PyTorch itself isn't exactly small.
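For readers new to SetFit: its first stage turns a handful of labeled examples into many sentence pairs, which are then used to fine-tune the sentence encoder contrastively before a small classification head is trained on the resulting embeddings. The sketch below illustrates only that pair-building idea in plain Python. It is not autofit2's code; the function name `make_contrastive_pairs` is made up for illustration, and it enumerates every pair, whereas SetFit itself samples a limited number of pairs.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples from a few labeled examples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration: it enumerates all pairs,
    while SetFit samples a bounded number of them.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Four labeled examples already yield six training pairs.
texts = ["great product", "love it", "terrible", "waste of money"]
labels = ["pos", "pos", "neg", "neg"]
pairs = make_contrastive_pairs(texts, labels)
```

Because the number of pairs grows quadratically with the number of examples, a few dozen labeled texts can supply enough signal to adapt the encoder, which is why the approach suits low-data settings.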
autofit2 takes a base model and a JSON config as input, and outputs a TorchServe model archive plus a model card. The model card includes any benchmarks you have for your task, self-consistency tests, estimated CO2 emissions of the fine-tuning run, and an entropy-based bias analysis. For the bias evaluation, small test corpora for 50 languages are included. It works best with my EAR (Entropy-based Attention Regularization) fork of Sentence Transformers.
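The post does not say how the entropy-based analysis is computed, so the following is only a sketch of the quantity that EAR is named after: the Shannon entropy of the attention distribution over tokens. Low entropy means attention is concentrated on a few tokens, which is the behavior EAR penalizes. The function name `attention_entropy` and the example weights are assumptions for illustration, not autofit2's API or data.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of one row of attention weights.

    The weights are normalized to sum to 1 first. Hypothetical helper
    for illustration; not taken from autofit2 or the EAR fork.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention weights must have a positive sum")
    probs = [w / total for w in weights]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Attention spread evenly over four tokens: maximal entropy, log(4).
uniform = attention_entropy([0.25, 0.25, 0.25, 0.25])

# Attention focused almost entirely on one token: much lower entropy.
peaked = attention_entropy([0.97, 0.01, 0.01, 0.01])
```

A token that consistently receives sharply peaked attention is one the model may be over-relying on, which is the kind of signal an entropy-based bias check could surface.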
Feedback is welcome.
