
The point of a dataset is to serve as a representative sample of the full spectrum of possible input data. Datasets need to be properly classified (i.e. correctly labeled), with the different elements / 'intricacies' well distributed across the samples rather than clustered in a few of them.

Think of it as using Hacker News posts to illustrate which posts are on topic. You could give someone a better impression by showing them 10 posts instead of just 3, but if all 10 posts were on the same subject, the extra 7 wouldn't add any utility. And if you accidentally include an off-topic post, that user is going to come away with a mistaken impression of what counts as on topic.
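To make "well distributed" a bit more concrete, here's a rough sketch of how you might check whether one category dominates a sample. The function names, the topic labels, and the 0.5 threshold are all made up for illustration; a real dataset would need its own notion of category and its own cutoff.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of the sample that each label accounts for."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def dominant_labels(labels, max_share=0.5):
    """List labels whose share of the sample exceeds max_share.

    max_share is an arbitrary illustrative threshold, not a standard value.
    """
    shares = class_shares(labels)
    return sorted(label for label, share in shares.items() if share > max_share)


# Hypothetical example: 10 posts that are all about the same subject.
same_subject = ["rust"] * 10
# Hypothetical example: 10 posts spread over several subjects.
mixed = ["rust", "rust", "ml", "ml", "ml", "security",
         "security", "hardware", "hardware", "databases"]

print(dominant_labels(same_subject))  # the single subject is flagged
print(dominant_labels(mixed))         # nothing is flagged
```

Note this only looks at the distribution. It can't tell you whether a post was mislabeled in the first place, which is the other half of the problem.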

As for video... that's a long story.


