Discussion (11 Comments) on Hacker News
This is backwards. The model is the easy part. Getting good data is 99% of the job, and nearly any clown can make a good model once you hand them a good dataset.
If you hand me a clean, well-labeled, representative dataset, I can make the model do a respectable little dance by lunch.
If you hand me a Kaggle CSV with duplicated rows, target leakage, mislabeled outcomes, and columns named final_final_v2_REAL, suddenly I’m not doing ML anymore. I’m doing archaeology with a red nose on.
The model is the balloon animal. The dataset is the elephant you had to drag into the tent.
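For anyone who wants to see what that archaeology looks like in practice, here is a rough sketch of two of the checks mentioned above: duplicated rows and a crude hint of target leakage. It uses only the standard library. The column names (`age`, `refund_issued`, `churned`) and both helper functions are made up for illustration, and the leakage check is a heuristic: it will also flag harmless columns such as unique IDs, and it does nothing about mislabeled outcomes.

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset; "churned" is the assumed target column.
rows = [
    {"age": 34, "refund_issued": 1, "churned": 1},
    {"age": 51, "refund_issued": 0, "churned": 0},
    {"age": 34, "refund_issued": 1, "churned": 1},  # exact duplicate of row 0
    {"age": 51, "refund_issued": 1, "churned": 1},
    {"age": 29, "refund_issued": 0, "churned": 0},
]


def duplicate_rows(rows):
    """Return a dict of rows that appear more than once, keyed by content."""
    # Sorting the items makes the key independent of dict insertion order.
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}


def leakage_suspects(rows, target):
    """List columns where every value maps to exactly one target value.

    A perfect mapping is only a hint that the column may encode the
    outcome (for example, something recorded after the fact). It is not
    proof, and ID-like columns will be flagged as well.
    """
    suspects = []
    for col in (c for c in rows[0] if c != target):
        targets_by_value = defaultdict(set)
        for r in rows:
            targets_by_value[r[col]].add(r[target])
        if all(len(t) == 1 for t in targets_by_value.values()):
            suspects.append(col)
    return suspects


dupes = duplicate_rows(rows)
suspects = leakage_suspects(rows, target="churned")
```

On this toy data, `dupes` holds the one repeated row and `suspects` contains only `refund_issued`, because `age` 51 shows up with both outcomes. On a real dataset you would still have to look at each flagged column by hand.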
https://ianreppel.org/how-to-spot-a-rogue-data-scientist/