Discussion (11 Comments) · Read original on Hacker News
This is backwards. The model is the easy part. Getting good data is 99% of the job, and nearly any clown can make a good model once you hand them a good dataset.
If you hand me a clean, well-labeled, representative dataset, I can make the model do a respectable little dance by lunch.
If you hand me a Kaggle CSV with duplicated rows, target leakage, mislabeled outcomes, and columns named final_final_v2_REAL, suddenly I'm not doing ML anymore. I'm doing archaeology with a red nose on.
The model is the balloon animal. The dataset is the elephant you had to drag into the tent.
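For anyone wondering what the "archaeology" looks like in practice, here is a rough sketch of two of the checks named above: counting duplicated rows and flagging a possible target leak. Everything in it is made up for illustration (the function names, the `churned` target, the `refund_issued` column), and the leakage test is only a crude heuristic, not a standard method: it flags any column whose repeated values each map to a single target value, which can also catch perfectly legitimate strong predictors.

```python
from collections import Counter, defaultdict


def count_duplicate_rows(rows):
    """Return how many rows are exact repeats of an earlier row."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())


def suspicious_columns(rows, target):
    """Flag columns that perfectly determine the target (crude leakage hint).

    Assumption: a column is suspicious when every one of its values maps to
    exactly one target value AND at least one value repeats. The second
    condition keeps unique ID columns from being flagged trivially.
    """
    flagged = []
    for col in (c for c in rows[0] if c != target):
        targets_by_value = defaultdict(set)
        for row in rows:
            targets_by_value[row[col]].add(row[target])
        has_repeats = len(targets_by_value) < len(rows)
        is_deterministic = all(len(t) == 1 for t in targets_by_value.values())
        if has_repeats and is_deterministic:
            flagged.append(col)
    return flagged


# Hypothetical toy data: "refund_issued" is recorded after the outcome,
# so it gives the answer away.
rows = [
    {"age": 34, "refund_issued": 1, "churned": 1},
    {"age": 34, "refund_issued": 1, "churned": 1},  # exact duplicate
    {"age": 51, "refund_issued": 0, "churned": 0},
    {"age": 34, "refund_issued": 0, "churned": 0},
    {"age": 51, "refund_issued": 1, "churned": 1},
]

print(count_duplicate_rows(rows))
print(suspicious_columns(rows, "churned"))
```

Mislabeled outcomes are the hard part and don't reduce to a one-liner like this; that usually takes domain knowledge and manual review.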
https://ianreppel.org/how-to-spot-a-rogue-data-scientist/