I am working on a system at present where the data scientist has done the calculations in an R script. We agreed upon an input data.frame and an output CSV as our 'interface'.
I added the SQL query to the top of the R script to generate the input data.frame and my Python code reads the output CSV to do subsequent processing and storage into Django models.
I run the script with a subprocess that invokes Rscript.
It's not elegant but it is simple. This part of the system only has to run daily so efficiency isn't a big deal.
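Roughly, the handoff looks like this (a minimal sketch; the script name, CSV path, and `process_row` handler are all made up, not the real code):

```python
import subprocess
import csv

R_SCRIPT = "calculations.R"   # hypothetical script name
OUTPUT_CSV = "results.csv"    # hypothetical output path

# Run the R script; check=True raises if Rscript exits non-zero,
# so a failed daily run doesn't silently leave stale output behind.
subprocess.run(["Rscript", R_SCRIPT], check=True)

# Read the output CSV and hand each row to the Django side.
with open(OUTPUT_CSV, newline="") as f:
    for row in csv.DictReader(f):
        process_row(row)  # hypothetical: update_or_create on Django models
```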
CSV seems to be a natural and easy fit. What advantage could parquet bring that would outweigh the disadvantage of adding two new dependencies (one in Python and one in R)?
Not the OP, but I started using parquet instead of CSV because the column types are preserved. At one point I was caching data to CSV, but whenever I loaded the CSV back, the types of certain columns, like datetimes, had to be set again.
I guess you'll need to decide whether this is a big enough issue to warrant the new dependencies.
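A quick way to see the difference, assuming pandas with pyarrow installed (pyarrow being exactly the kind of extra Python dependency in question):

```python
import pandas as pd

df = pd.DataFrame({
    "when": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "value": [1.5, 2.5],
})

# CSV round trip: the datetime column comes back as plain strings
# (object dtype) unless you re-parse it with parse_dates.
df.to_csv("cache.csv", index=False)
print(pd.read_csv("cache.csv").dtypes)          # when: object

# Parquet round trip: dtypes survive as written.
df.to_parquet("cache.parquet")
print(pd.read_parquet("cache.parquet").dtypes)  # when: datetime64[ns]
```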
Many of the reasons CSV is bad come down to not controlling both the reader and the writer. Here, if you're two people who collaborate well, CSV should be fine.