Discussion (10 Comments) from HackerNews
The publishers need to rethink their entire take on how the Internet works, or any "victory" they earn is going to be extremely Pyrrhic.
It's absurd to say "you can't read this book to a friend or robot".
Nobody seems to actually reproduce the copyrighted materials.
High-dimensional eigendecompositions, which underpin AI similarity, are some of the most literally derivative materials of texts that you can imagine.
(My point being that it would be different if the product Common Crawl provides were trained models, but that is not the case: its product is unlawful reproductions of copyrighted data for commercial use.)
Common Crawl is not a business and is not selling anything.