Deep neural network (DNN) models, especially popular convolutional neural network (CNN) models, are often executed out of GPU memory. GPU memory is a limited resource, particularly relative to the number of models a serving system must host. The RIPCORD project focuses on improving DNN serving through better GPU memory management, intelligent model selection, and request routing.
Challenges and Opportunities of DNN Model Execution Caching. Guin R. Gilman, Samuel S. Ogden, Robert J. Walls, Tian Guo. Third Workshop on Distributed Infrastructures for Deep Learning (DIDL) 2019. (Stay tuned for the camera-ready)
Poster: EdgeServe: Efficient Deep Learning Model Caching at the Edge. Tian Guo, Robert J. Walls, Samuel S. Ogden. ACM/IEEE Symposium on Edge Computing (SEC 2019). (Paper)