Publications from Cake Lab
-
Speeding up Deep Learning with Transient Servers
Authors: Shijian Li , Robert J. Walls , Lijie Xu , Tian Guo
arXiv:1903.00045
Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable—e.g., for rapidly evaluating new model designs—they often come with significantly higher monetary costs due to sublinear scalability. In this paper, we investigate the feasibility of using training clusters composed of cheaper transient GPU servers to get the benefits of distributed training without the high costs.
We conduct the first large-scale empirical analysis, launching more than a thousand GPU servers of various capacities, aimed at understanding the characteristics of transient GPU servers and their impact on distributed training performance. Our study demonstrates the potential of transient servers with a speedup of 7.7X with more than 62.9% monetary savings for some cluster configurations. We also identify a number of important challenges and opportunities for redesigning distributed training frameworks to be transient-aware. For example, the dynamic cost and availability characteristics of transient servers suggest the need for frameworks to dynamically change cluster configurations to best take advantage of current conditions.
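The trade-off at the heart of the study is simple arithmetic: a transient cluster trades a lower hourly price and a wall-clock speedup against sublinear scaling and revocation overhead. The sketch below makes that calculation concrete; all prices, cluster sizes, speedups, and training times are hypothetical placeholders, not the measurements reported in the paper.

```python
# Illustrative back-of-the-envelope comparison of distributed training on
# transient (spot/preemptible) GPU servers versus a single on-demand server.
# All numbers below are hypothetical placeholders, not the paper's results.

def training_cost(hourly_price, training_hours, num_servers):
    """Total monetary cost of a training run on a homogeneous cluster."""
    return hourly_price * training_hours * num_servers

# Hypothetical baseline: one on-demand GPU server.
on_demand_price = 3.06      # $/hour (placeholder)
on_demand_hours = 24.0      # time to reach target accuracy (placeholder)

# Hypothetical transient cluster: 8 spot GPU servers at a steep discount,
# with sublinear scaling (speedup < 8x) due to coordination and revocations.
spot_price = 0.90           # $/hour per server (placeholder)
cluster_size = 8
speedup = 6.5               # observed wall-clock speedup (placeholder)
cluster_hours = on_demand_hours / speedup

baseline = training_cost(on_demand_price, on_demand_hours, 1)
transient = training_cost(spot_price, cluster_hours, cluster_size)

print(f"speedup: {speedup:.1f}x")
print(f"cost savings: {100 * (1 - transient / baseline):.1f}%")
```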
BibTeX
@article{DBLP:journals/corr/Li19arxiv, author = {Shijian Li and Robert J. Walls and Lijie Xu and Tian Guo}, title = {Speeding up Deep Learning with Transient Servers}, journal = {CoRR}, volume = {abs/1903.00045}, year = {2019}, url = {https://arxiv.org/abs/1903.00045}, archivePrefix = {arXiv}, eprint = {1903.00045}, }
-
Cost optimization through unifying multi-cloud resources
Authors: Benjamin C. Nickerson , Synella Gonzales , Elsa M. Luthi , Tian Guo
The Major Qualifying Project (MQP'18)
The goal of this project is to create a multi-cloud web interface that provides users with the cheapest resource provisioning options from Amazon Web Services and Google Cloud Platform. The user can choose between predefined allocations based on workloads or specify a custom amount of resources needed. In addition, our application handles deployments to the respective cloud providers. By handling the end-to-end process of finding cloud resources and managing deployments, the application lets users optimize costs across multiple providers.
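The core selection step the interface performs, picking the cheapest provider offering that satisfies a resource request, can be sketched as follows. The price table, instance names, and prices are illustrative placeholders, and the selection logic is an assumption about how such a tool might work rather than the project's exact implementation; the real application would pull current prices from the AWS and GCP pricing APIs.

```python
# Minimal sketch: pick the cheapest provider offering that satisfies a
# requested amount of vCPUs and memory. Prices below are placeholders.

OFFERINGS = [
    {"provider": "AWS", "instance": "m5.xlarge",     "vcpus": 4, "mem_gb": 16, "usd_per_hour": 0.192},
    {"provider": "GCP", "instance": "n1-standard-4", "vcpus": 4, "mem_gb": 15, "usd_per_hour": 0.190},
    {"provider": "AWS", "instance": "m5.2xlarge",    "vcpus": 8, "mem_gb": 32, "usd_per_hour": 0.384},
    {"provider": "GCP", "instance": "n1-standard-8", "vcpus": 8, "mem_gb": 30, "usd_per_hour": 0.380},
]

def cheapest_offering(vcpus_needed, mem_gb_needed):
    """Return the lowest-priced offering that meets the resource request."""
    candidates = [o for o in OFFERINGS
                  if o["vcpus"] >= vcpus_needed and o["mem_gb"] >= mem_gb_needed]
    return min(candidates, key=lambda o: o["usd_per_hour"]) if candidates else None

# Example: a custom request for 4 vCPUs and 8 GB of memory.
print(cheapest_offering(vcpus_needed=4, mem_gb_needed=8))
```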
-
An Experimental Evaluation of Garbage Collectors on Big Data Applications
Authors: Lijie Xu , Tian Guo , Wensheng Dou , Wei Wang , Jun Wei
The 45th International Conference on Very Large Data Bases (VLDB'19)
Popular big data frameworks, ranging from Hadoop MapReduce to Spark, rely on garbage-collected languages, such as Java and Scala. Big data applications are especially sensitive to the effectiveness of garbage collection (i.e., GC), because they usually process a large volume of data objects that lead to heavy GC overhead. The lack of in-depth understanding of GC performance has impeded performance improvement in big data applications. In this paper, we conduct the first comprehensive evaluation on three popular garbage collectors, i.e., Parallel, CMS, and G1, using four representative Spark applications. By thoroughly investigating the correlation between these big data applications' memory usage patterns and the collectors' GC patterns, we obtain many findings about GC inefficiencies. We further propose empirical guidelines for application developers, and insightful optimization strategies for designing big-data-friendly garbage collectors.
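For readers who want to reproduce this kind of comparison, the three collectors can be selected for Spark executors through standard JVM flags passed via spark.executor.extraJavaOptions. The PySpark sketch below shows one way to do this; the memory setting and the placeholder workload are illustrative and are not the paper's experimental setup or its four evaluated applications.

```python
# Minimal sketch: selecting the garbage collector used by Spark executors.
# JVM flags for the three collectors studied in the paper:
#   Parallel: -XX:+UseParallelGC
#   CMS:      -XX:+UseConcMarkSweepGC
#   G1:       -XX:+UseG1GC

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gc-evaluation-sketch")
    .config("spark.executor.memory", "8g")  # placeholder heap size
    # Swap the flag here to switch collectors; -XX:+PrintGCDetails captures
    # GC logs for offline analysis.
    .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC -XX:+PrintGCDetails")
    .getOrCreate()
)

# Placeholder workload: a simple aggregation that allocates many short-lived objects.
df = spark.range(0, 10_000_000)
print(df.selectExpr("sum(id)").collect())

spark.stop()
```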
BibTeX
@article{Xu:2019:EEG:3303753.3316445, author = {Xu, Lijie and Guo, Tian and Dou, Wensheng and Wang, Wei and Wei, Jun}, title = {An Experimental Evaluation of Garbage Collectors on Big Data Applications}, journal = {Proc. VLDB Endow.}, issue_date = {January 2019}, volume = {12}, number = {5}, month = jan, year = {2019}, issn = {2150-8097}, pages = {570--583}, numpages = {14}, url = {https://doi.org/10.14778/3303753.3303762}, doi = {10.14778/3303753.3303762}, acmid = {3316445}, publisher = {VLDB Endowment}, }
-
MODI: Mobile Deep Inference Made Efficient by Edge Computing
Authors: Samuel S. Ogden , Tian Guo
The USENIX Workshop on Hot Topics in Edge Computing (HotEdge '18)
In this paper, we propose a novel mobile deep inference platform, MODI, that delivers good inference performance. MODI improves the performance of deep learning-powered mobile applications through optimizations in three complementary aspects. First, MODI provides a number of models and dynamically selects the best one at runtime. Second, MODI extends the set of models each mobile application can use by storing high-quality models at edge servers. Third, MODI manages a centralized model repository and periodically updates models at edge locations, ensuring up-to-date models for mobile applications without incurring high network latency. Our evaluation demonstrates the feasibility of trading off inference accuracy for improved inference speed, as well as the acceptable performance of edge-based inference.
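As an illustration of the first aspect, the sketch below picks the most accurate model whose expected end-to-end latency fits the application's budget, charging edge-hosted models an extra network round trip. The model table, latency numbers, and scoring rule are hypothetical, not MODI's actual catalog or selection policy.

```python
# Hypothetical sketch of runtime model selection: among models available on
# the device or at a nearby edge server, pick the most accurate one whose
# expected end-to-end latency fits the latency budget. All numbers are
# illustrative placeholders.

MODELS = [
    # (name, top-1 accuracy, expected inference latency in ms, location)
    ("mobilenet_v1_quantized", 0.70,  40, "device"),
    ("mobilenet_v2",           0.72,  60, "device"),
    ("inception_v3",           0.78, 120, "edge"),
    ("resnet_152",             0.79, 200, "edge"),
]

def select_model(latency_budget_ms, edge_rtt_ms):
    """Pick the most accurate model that meets the latency budget,
    charging edge-hosted models the extra network round trip."""
    feasible = []
    for name, accuracy, latency_ms, location in MODELS:
        total_ms = latency_ms + (edge_rtt_ms if location == "edge" else 0)
        if total_ms <= latency_budget_ms:
            feasible.append((accuracy, name, total_ms))
    return max(feasible) if feasible else None

# Example: a 150 ms interactive budget with a 30 ms round trip to the edge.
print(select_model(latency_budget_ms=150, edge_rtt_ms=30))
```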
BibTeX
@inproceedings {216771, author = {Samuel S. Ogden and Tian Guo}, title = {MODI: Mobile Deep Inference Made Efficient by Edge Computing}, booktitle = {USENIX Workshop on Hot Topics in Edge Computing (HotEdge 18)}, year = {2018}, address = {Boston, MA}, url = {https://www.usenix.org/conference/hotedge18/presentation/ogden}, publisher = {USENIX Association}, }
-
Cloud-based or On-device: An Empirical Study of Mobile Deep Inference
Authors: Tian Guo
2018 IEEE International Conference on Cloud Engineering (IC2E'18)
Modern mobile applications benefit significantly from advances in deep learning, e.g., implementing real-time image recognition and conversational systems. Given a trained deep learning model, applications usually need to perform a series of matrix operations on the input data in order to infer possible output values. Because of computational complexity and size constraints, these trained models are often hosted in the cloud. When utilizing cloud-based models, mobile apps have to send input data over the network. While cloud-based deep learning can provide reasonable response time for mobile apps, it also restricts the use case scenarios, e.g., mobile apps need network access. With mobile-specific deep learning optimizations, it is now possible to employ on-device inference. However, because mobile hardware, e.g., GPU and memory size, can be very limited compared to desktop counterparts, it is important to understand the feasibility of this new on-device deep learning inference architecture. In this paper, we empirically evaluate the inference efficiency of three Convolutional Neural Networks using a benchmark Android application we developed. Our measurement and analysis suggest that on-device inference can cost up to two orders of magnitude more in response time and energy than cloud-based inference, and that loading the model and computing probabilities are the two performance bottlenecks for on-device deep inference.
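The comparison in the paper boils down to timing two paths: sending input data to a cloud-hosted model versus loading a model and running the forward pass locally. The sketch below shows a minimal timing harness for both paths; the endpoint, model loader, and inference callables are hypothetical placeholders, and the actual measurements were taken inside an Android benchmark app rather than in Python.

```python
# Hypothetical timing harness contrasting the two inference modes compared in
# the paper. The endpoint URL and the load_model/run_model callables are
# placeholders supplied by the caller.

import time
import urllib.request

def time_cloud_inference(endpoint_url, image_bytes):
    """End-to-end latency (ms) of cloud-based inference: upload input, wait for result."""
    start = time.perf_counter()
    req = urllib.request.Request(endpoint_url, data=image_bytes,
                                 headers={"Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000

def time_on_device_inference(load_model, run_model, image_bytes):
    """On-device latency (ms), split into the two bottlenecks the paper identifies:
    loading the model and computing the output probabilities."""
    start = time.perf_counter()
    model = load_model()                      # e.g., read weights from local storage
    load_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    run_model(model, image_bytes)             # forward pass over the input
    compute_ms = (time.perf_counter() - start) * 1000
    return load_ms, compute_ms
```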
BibTeX
@article{Guo2018CloudBasedOO, title={Cloud-Based or On-Device: An Empirical Study of Mobile Deep Inference}, author={Tian Guo}, journal={2018 IEEE International Conference on Cloud Engineering (IC2E)}, year={2018}, pages={184-190} }