Hi, Farmify! This is a great question. I designed it with medical data (among other challenging verticals) in mind too. In fact, one of the medical companies we're working with has exactly the same problem as yours: sparse data. High costs and privacy constraints are keeping them from getting to better ML models quickly. For them, we identify the anomalies that generalise and scale those anomalies up against a background dataset, which helps us work around the limitations of the datasets they already have.
There's no minimum dataset requirement for now, but yeah, the more the merrier.
Hi, I'm Sumit Srivastava, founder of Projell. We built this after dealing with data hell ourselves: low data availability, high data procurement costs, a huge time sink for data collection, and privacy concerns around user data.
This prompted me to build an easy way to generate synthetic data for machine learning models.
This primarily uses GANs, but we pick whichever techniques are most efficient for each specific use case.
Areas where we've found it useful are biomedical, drone imagery, satellite imagery, retail, and autonomous mobility.
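To give a rough idea of the GAN approach, here's a minimal sketch in PyTorch of training a generator against a discriminator and then sampling synthetic data from it. This is just the textbook setup under assumed toy dimensions and a random stand-in dataset, not Projell's actual architecture or training pipeline.

    import torch
    import torch.nn as nn

    latent_dim, data_dim = 16, 8  # placeholder sizes, not Projell's real settings

    # Generator: maps random noise to synthetic samples.
    G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
    # Discriminator: scores how "real" a sample looks.
    D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    loss_fn = nn.BCEWithLogitsLoss()

    real_data = torch.randn(512, data_dim)  # stand-in for the scarce real dataset

    for step in range(1000):
        real = real_data[torch.randint(0, len(real_data), (64,))]
        fake = G(torch.randn(64, latent_dim))

        # Train the discriminator to separate real from generated samples.
        d_loss = loss_fn(D(real), torch.ones(64, 1)) + \
                 loss_fn(D(fake.detach()), torch.zeros(64, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Train the generator to fool the discriminator.
        g_loss = loss_fn(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # Once trained, you can sample as much synthetic data as you need.
    synthetic = G(torch.randn(10_000, latent_dim)).detach()

In practice the generator and discriminator would be domain-specific (e.g. convolutional for imagery), but the train-then-sample loop above is the core idea behind using GANs to expand a small dataset.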
As the ImageNet challenge has already shown, state-of-the-art approaches use synthetic data to reach higher accuracy.
Google, for their autonomous vehicles, used millions of miles of real driving data and billions of miles of synthetic data. It's clear where the world is moving.
I'd be happy to share the tools with everyone, since dealing with data is something we struggled with and I don't want anyone else to have to struggle with it.
I've gotten a lot better at life by learning from other people's experiences, and PG has a knack for picking exactly the ones I'd value most. That's why I like PG's essays.