Hardwood, the project Gunnar Morling kick-started for handling Parquet files in Java, has reached version 1. Its multi-threaded approach and zero mandatory external dependencies promise a simpler, more ...
Don't use .collect() on large DataFrames: it brings all data to the driver and causes OOM errors.
Don't chain multiple .count() calls: each one triggers a full scan; cache the DataFrame if needed.
Don't ignore skew: ...
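
The first two points can be sketched in PySpark roughly as follows. This is a minimal illustration, not a prescribed pattern: the helper names `preview_rows` and `count_total_and_matching` are hypothetical, and the functions assume they receive a `pyspark.sql.DataFrame` created elsewhere by the caller.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # used for type hints only; the caller supplies a real DataFrame
    from pyspark.sql import DataFrame


def preview_rows(df: DataFrame, n: int = 20) -> list[Any]:
    """Bring at most n rows to the driver instead of collecting everything."""
    # limit() bounds the result before collect() moves it to the driver.
    return df.limit(n).collect()


def count_total_and_matching(df: DataFrame, condition: str) -> tuple[int, int]:
    """Run two counts against one cached DataFrame (hypothetical helper)."""
    cached = df.cache()  # marks the DataFrame for caching; the first action fills it
    try:
        total = cached.count()                       # first action materializes the cache
        matching = cached.filter(condition).count()  # can reuse the cached data
        return total, matching
    finally:
        cached.unpersist()  # release the cached blocks once both counts are done
```

With an active SparkSession, a call might look like `count_total_and_matching(df, "amount > 100")`, where the `amount` column is only an example. Whether caching pays off depends on how expensive the DataFrame is to recompute and on how much executor memory is available.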