An Experimental Evaluation of Garbage Collectors on Big Data Applications

Authors: Lijie Xu, Tian Guo, Wensheng Dou, Wei Wang, Jun Wei

The 45th International Conference on Very Large Data Bases (VLDB'19)


Abstract:

Popular big data frameworks, ranging from Hadoop MapReduce to Spark, rely on garbage-collected languages such as Java and Scala. Big data applications are especially sensitive to the effectiveness of garbage collection (GC), because they usually process a large volume of data objects that lead to heavy GC overhead. A lack of in-depth understanding of GC performance has impeded performance improvement in big data applications. In this paper, we conduct the first comprehensive evaluation of three popular garbage collectors, i.e., Parallel, CMS, and G1, using four representative Spark applications. By thoroughly investigating the correlation between these big data applications’ memory usage patterns and the collectors’ GC patterns, we obtain many findings about GC inefficiencies. We further propose empirical guidelines for application developers, and insightful optimization strategies for designing big-data-friendly garbage collectors.
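To make "GC overhead" concrete, the following is a minimal sketch (not the authors' measurement methodology) that reports which collectors a JVM is running and how much collection time they accumulate while the program allocates many short-lived objects, loosely mimicking record-at-a-time data processing. It uses only the standard java.lang.management API; the class name GcStats and its helper methods are hypothetical, chosen for illustration. The collector is selected at launch with a HotSpot flag such as -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC (CMS is only available in JDK 8 through 13; it was removed in JDK 14), or -XX:+UseG1GC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: reports active collectors and their accumulated
 * collection time. Run with e.g. `java -XX:+UseG1GC GcStats`.
 */
public class GcStats {

    /** Names of the active collectors (typically one young-gen and one old-gen collector). */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Sum of approximate accumulated collection time in ms; undefined (-1) values are skipped. */
    public static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Builds one temporary String per "record"; each becomes garbage after its length is read. */
    public static long allocateGarbage(int records) {
        long checksum = 0;
        for (int i = 0; i < records; i++) {
            String record = "key-" + i + ",value-" + (i * 31);
            checksum += record.length();
        }
        return checksum;
    }

    public static void main(String[] args) {
        long gcBefore = totalCollectionTimeMs();
        long start = System.nanoTime();
        long checksum = allocateGarbage(5_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalCollectionTimeMs() - gcBefore;
        System.out.println("Collectors: " + collectorNames());
        System.out.printf("Elapsed %d ms, GC %d ms, checksum %d%n", elapsedMs, gcMs, checksum);
    }
}
```

Running the same program under each flag shows different collector names (for example, the Parallel collector typically reports "PS Scavenge" and "PS MarkSweep") and different GC times. The exact numbers depend on heap size, JDK version, and JIT behavior, so this is only a rough probe, not a benchmark.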

BibTeX

@article{Xu:2019:EEG:3303753.3316445,
 author = {Xu, Lijie and Guo, Tian and Dou, Wensheng and Wang, Wei and Wei, Jun},
 title = {An Experimental Evaluation of Garbage Collectors on Big Data Applications},
 journal = {Proc. VLDB Endow.},
 issue_date = {January 2019},
 volume = {12},
 number = {5},
 month = jan,
 year = {2019},
 issn = {2150-8097},
 pages = {570--583},
 numpages = {14},
 url = {https://doi.org/10.14778/3303753.3303762},
 doi = {10.14778/3303753.3303762},
 acmid = {3316445},
 publisher = {VLDB Endowment},
}