Publications

Cloud GPU servers have become the de facto way for deep learning practitioners to train complex models on large-scale datasets. However, it is challenging to determine the appropriate cluster configuration (e.g., server type and number) for different training workloads while balancing the trade-offs in training time, cost, and model accuracy. Adding to the complexity is the potential to reduce the monetary cost by using cheaper, but revocable, transient GPU servers.
In this work, we analyze distributed training performance under diverse cluster configurations using CM-DARE, a cloud-based measurement and training framework. Our empirical datasets include measurements from three GPU types, six geographic regions, twenty convolutional neural networks, and thousands of Google Cloud servers. We also demonstrate the feasibility of predicting training speed and overhead using regression-based models. Finally, we discuss potential use cases of our performance modeling such as detecting and mitigating performance bottlenecks.

DistStream: An Order-Aware Distributed Framework for Online-Offline Stream Clustering Algorithms

Lijie Xu, Xingtong Ye, Kai Kang, Tian Guo, Wensheng Dou, Wei Wang and Jun Wei

40th IEEE International Conference on Distributed Computing Systems (ICDCS'20)

Stream clustering is an important data mining technique to capture the evolving patterns in real-time data streams. Today’s data streams, e.g., IoT events and Web clicks, are usually high-speed and contain dynamically-changing patterns. Existing stream clustering algorithms usually follow an online-offline paradigm with a one-record-at-a-time update model, which was designed for running on a single machine. These stream clustering algorithms, with this sequential update model, cannot be efficiently parallelized and fail to deliver the required high throughput for stream clustering.
In this paper, we present DistStream, a distributed framework that can effectively scale out online-offline stream clustering algorithms. To parallelize these algorithms for high throughput, we develop a mini-batch update model with several efficient parallelization approaches. To maintain high clustering quality, DistStream’s mini-batch update model preserves the update order in all the steps during parallel execution, which can reflect the recent changes for dynamically-changing streaming data. We implement DistStream atop Spark Streaming, as well as four representative stream clustering algorithms based on DistStream. Our evaluation on three real-world datasets shows that DistStream-based stream clustering algorithms can achieve sublinear throughput gain and comparable (99%) clustering quality with their single-machine counterparts.

Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels

Guin Gilman, Samuel S. Ogden, Tian Guo, Robert J. Walls

38th International Symposium on Computer Performance, Modeling, Measurements and Evaluation 2020 (Performance'20)

The Naked Sun: Malicious Cooperation Between Benign-Looking Processes

F. De Gaspari, D. Hitaj, G. Pagnotta, L. De Carli, and L. V. Mancini

Applied Cryptography and Network Security (ACNS) 2020

PointAR: Efficient Lighting Estimation for Mobile Augmented Reality

Yiqin Zhao, Tian Guo

16th European Conference On Computer Vision (ECCV'20)

Recurrent Networks for Guided Multi-Attention Classification

Xin Dai, Xiangnan Kong, Tian Guo, John Lee, Xinyue Liu, Constance Moore

ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'20)

Silhouette: Efficient Protected Shadow Stacks for Embedded Systems

Jie Zhou, Yufei Du, Lele Ma, Zhuojia Shen, John Criswell, Robert J. Walls

USENIX Security Symposium 2020

QuRate: Power-Efficient Mobile Immersive Video Streaming

Nan Jiang, Yao Liu, Tian Guo, Wenyao Xu, Viswanathan Swaminathan, Lisong Xu, and Sheng Wei

ACM Multimedia Systems Conference 2020 (MMSys'20)

Commodity smartphones have recently become a popular platform for deploying computation-intensive virtual reality (VR) applications. Among the variety of VR applications, immersive video streaming (a.k.a. 360-degree video streaming) is one of the first commercial use cases deployed at scale. One specific challenge involving the smartphone-based head-mounted display (HMD) is to reduce the potentially huge power consumption caused by the immersive video and minimize its mismatch with the constrained battery capacity. To address this challenge, we first conduct an empirical power measurement study on a typical smartphone immersive streaming system, which identifies the major power consumption sources. Then, based on the insights drawn from the measurement study, we propose and develop QuRate, a quality-aware and user-centric frame rate adaptation mechanism to tackle the power consumption issue in immersive video streaming on smartphones. QuRate optimizes the immersive video power consumption by modeling the correlation between the perceivable video quality and the user behavior. Specifically, QuRate builds on top of the user's reduced level of concentration on the video frames during view switching and dynamically adjusts the frame rate without impacting the perceivable video quality. We evaluate QuRate with an Institutional Review Board (IRB)-approved subjective user study to validate its minimal video quality impact. Also, we conduct a comprehensive set of power evaluations involving 5 smartphones, 21 users, and 6 immersive videos with empirical user head movement traces from a publicly available dataset. Our experimental results demonstrate significant power savings and, in particular, QuRate is capable of extending the smartphone battery life by up to 1.24X while maintaining the perceivable video quality during immersive video streaming.

DRAB-LOCUS: An Area-Efficient AES Architecture for Hardware Accelerator Co-Location on FPGAs

Jacob T. Grycel and Robert J. Walls

IEEE International Symposium on Circuits and Systems (ISCAS'20)

Advanced Encryption Standard (AES) implementations on Field Programmable Gate Arrays (FPGA) commonly focus on maximizing throughput at the cost of utilizing high volumes of FPGA slice logic. High resource usage limits systems' abilities to implement other functions (such as video processing or machine learning) that may want to share the same FPGA resources. In this paper, we address the shared resource challenge by proposing and evaluating a low-area, but high-throughput, AES architecture. In contrast to existing work, our DSP/RAM-Based Low-CLB Usage (DRAB-LOCUS) architecture leverages block RAM tiles and Digital Signal Processing (DSP) slices to implement the AES Sub Bytes, Mix Columns, and Add Round Key sub-round transformations, reducing resource usage by a factor of 3 over traditional approaches. To achieve area-efficiency, we built an inner-pipelined architecture using the internal registers of block RAM tiles and DSP slices. Our DRAB-LOCUS architecture features a 12-stage pipeline capable of producing 7.055 Gbps of interleaved encrypted or decrypted data, and only uses 909 Look Up tables, 593 Flip Flops, 16 block RAMs, and 18 DSP slices in the target device.

Enabling IoT Residential Security Stewardship for the Aging Population (Extended Abstract)

L. De Carli, I. Ray, and E. Solovey

Conference on Human Factors in Computing Systems (CHI) Workshop on Designing Interactions for the Aging Populations

MDInference: Balancing Inference Accuracy and Latency for Mobile Applications

Samuel S. Ogden and Tian Guo

2020 IEEE International Conference on Cloud Engineering (IC2E'20), Invited paper

Deep Neural Networks (DNNs) are allowing mobile devices to incorporate a wide range of features into user applications. However, the computational complexity of these models makes it difficult to run them efficiently on resource-constrained mobile devices. Prior work has started to address aspects of supporting deep learning in mobile applications either by decreasing execution latency or resorting to powerful cloud servers. As prior approaches focus only on single aspects of mobile inference, they often fall short in delivering the desired performance.
In this work, we introduce a holistic approach to designing mobile deep inference frameworks. We first identify the key goals of accuracy and latency for mobile deep inference, and the conditions that must be met to achieve them. We demonstrate our holistic approach through the design of a hypothetical framework called MDInference. This framework leverages two complementary techniques: a model selection algorithm that chooses from a set of cloud-based deep learning models and an on-device request duplication mechanism.
Through empirically-driven simulations, we show that MDInference achieves an increase in accuracy without impacting its ability to satisfy Service Level Agreements (SLAs). Specifically, we show that MDInference improves aggregate accuracy over static approaches by 40% without incurring SLA violations. Additionally, we show that with SLA = 250ms, MDInference can increase the aggregate accuracy in 99.74% of cases on faster university networks and 96.84% of cases on residential networks.

Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models

Matthew LeMay, Shijian Li, Tian Guo

2020 IEEE International Conference on Cloud Engineering (IC2E'20)

Deep learning models are increasingly used for end-user applications, supporting both novel features, such as facial recognition, and traditional features, such as web search. To accommodate high inference throughput, it is common to host a single pre-trained Convolutional Neural Network (CNN) in dedicated cloud-based servers with hardware accelerators such as Graphics Processing Units (GPUs). However, GPUs can be orders of magnitude more expensive than traditional Central Processing Unit (CPU) servers. Under-utilized server resources brought about by dynamic workloads can influence provisioning decisions, which may result in inflated serving costs. One potential way to alleviate this problem is by allowing hosted models to share the underlying resources, which we refer to as multi-tenant inference serving. One of the key challenges is maximizing the resource efficiency for multi-tenant serving given hardware with diverse characteristics, models with unique response time Service Level Agreement (SLA), and dynamic inference workloads. In this paper, we present Perseus, a measurement framework that provides the basis for understanding the performance and cost trade-offs of multi-tenant model serving. We implemented Perseus in Python atop a popular cloud inference server called Nvidia TensorRT Inference Server. Leveraging Perseus, we evaluated the inference throughput and cost for various serving deployments and demonstrated that multi-tenant model serving can lead to up to 12% cost reduction.

Community Cleanup: Incentivizing Network Hygiene via Distributed Attack Reporting

Yu Liu and Craig A. Shue

IEEE/IFIP Network Operations and Management Symposium (NOMS)

Residential networks are difficult to secure due to resource constraints and lack of local security expertise. These networks primarily use consumer-grade routers that lack meaningful security mechanisms, providing a safe-haven for adversaries to launch attacks, including damaging distributed denial-of-service (DDoS) attacks. Prior efforts have suggested outsourcing residential network security to experts, but motivating user adoption has been a challenge. This work explores combining residential SDN techniques with prior work on collaborative DDoS reporting to identify residential network compromises. This combination provides incentives for end-users to deploy the technique, including rapid notification of compromises on their own devices and reduced upstream bandwidth consumption, while incurring minimal performance overheads.

PointAR: Efficient Lighting Estimation for Mobile Augmented Reality

Yiqin Zhao, Tian Guo

arXiv:2004.00006

We propose an efficient lighting estimation pipeline that is suitable to run on modern mobile devices, with comparable resource complexities to state-of-the-art on-device deep learning models. Our pipeline, referred to as PointAR, takes a single RGB-D image captured from the mobile camera and a 2D location in that image, and estimates 2nd-order spherical harmonics coefficients, which can be directly utilized by rendering engines for indoor lighting in the context of augmented reality. Our key insight is to formulate the lighting estimation as a learning problem directly from point clouds, which is in part inspired by the Monte Carlo integration leveraged by real-time spherical harmonics lighting. While existing approaches estimate lighting information with complex deep learning pipelines, our method focuses on reducing the computational complexity. Through both quantitative and qualitative experiments, we demonstrate that PointAR achieves lower lighting estimation errors compared to state-of-the-art methods. Further, our method requires an order of magnitude fewer resources, comparable to that of mobile-specific DNNs.

Poster: PointAR: Efficient Lighting Estimation for Mobile Augmented Reality

Yiqin Zhao and Tian Guo

The 21st International Workshop on Mobile Computing Systems and Applications (HotMobile'20)

In this poster, we describe the problem of lighting estimation in the context of mobile augmented reality (AR) applications and our proposed solution. Lighting estimation refers to recovering scene lighting with limited scene observation and is critical for realistic 3D rendering. As a long-standing challenge in the fields of both computer vision and computer graphics, the difficulty of lighting estimation is exacerbated for mobile AR scenarios. When interacting with mobile AR applications, users would trigger the placement of virtual 3D objects into any position or orientation in their surrounding environments. In order to present a more realistic effect, such objects need to be rendered with appropriate lighting information. However, lighting, especially in the indoor scenes, can vary both spatially and temporally.

2019

Challenges and Opportunities of DNN Model Execution Caching

Guin R. Gilman, Samuel S. Ogden, Robert J. Walls, Tian Guo

Workshop on Distributed Infrastructures for Deep Learning

We explore the opportunities and challenges of model execution caching, a nascent research area that promises to improve the performance of cloud-based deep inference serving. Broadly, model execution caching relies on servers that are geographically close to the end-device to service inference requests, resembling a traditional content delivery network (CDN). However, unlike a CDN, such schemes cache execution rather than static objects. We identify the key challenges inherent to this problem domain and describe the similarities and differences with existing caching techniques. We further introduce several emergent concepts unique to this domain, such as memory-adaptive models and multi-model hosting, which allow us to make dynamic adjustments to the memory requirements of model execution.

Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models

Matthew LeMay, Shijian Li, Tian Guo

arXiv:1912.02322

ERHARD-RNG: A Random Number Generator Built from Repurposed Hardware in Embedded Systems

Jacob T. Grycel and Robert J. Walls

arXiv:1903.09365

Quality randomness is fundamental to cryptographic operations, but on embedded systems good sources are (seemingly) hard to find. Rather than use expensive custom hardware, our ERHARD-RNG Pseudo-Random Number Generator (PRNG) utilizes entropy sources that are already common in a range of low-cost embedded platforms. We empirically evaluate the entropy provided by three sources (SRAM startup state, oscillator jitter, and device temperature) and integrate those sources into a full Pseudo-Random Number Generator implementation based on Fortuna. Our system addresses a number of fundamental challenges affecting random number generation on embedded systems. For instance, we propose SRAM startup state as a means to efficiently generate the initial seed, even for systems that do not have writeable storage. Further, the system's use of oscillator jitter allows for the continuous collection of entropy-generating events, even for systems that do not have the user-generated events that are commonly used in general-purpose systems for entropy, e.g., key presses or network events.

Account Lockouts: Characterizing and Preventing Account Denial-of-Service Attacks

Yu Liu, Matthew R. Squires, Curtis R. Taylor, Robert J. Walls, Craig A. Shue

Conference on Security and Privacy in Communication Networks (SecureComm)

To stymie password guessing attacks, many systems lock an account after a given number of failed authentication attempts, preventing access even if proper credentials are later provided. Combined with the proliferation of single sign-on providers, adversaries can use relatively few resources to launch large-scale application-level denial-of-service attacks against targeted user accounts by deliberately providing incorrect credentials across multiple authentication attempts. In this paper, we measure the extent to which this vulnerability exists in production systems. We focus on Microsoft services, which are used in many organizations, to identify exposed authentication points. We measured 2,066 organizations and found that between 58% and 77% of organizations expose authentication portals that are vulnerable to account lockout attacks. Such attacks can be completely successful with only 13 KBytes/second of attack traffic. We then propose and evaluate a set of lockout bypass mechanisms for legitimate users. Our performance and security evaluation shows these solutions are effective while introducing little overhead to the network and systems.
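
The lockout behavior described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the class name `LockoutAuthenticator`, the threshold of three failures, and the credentials are hypothetical, and a real system would also hash passwords and expire lockouts.

```python
class LockoutAuthenticator:
    """Toy login service that locks an account after repeated failures.

    Hypothetical illustration only; names and thresholds are assumptions.
    """

    def __init__(self, passwords, max_failures=3):
        self._passwords = dict(passwords)   # user -> password (plain text for brevity)
        self._max_failures = max_failures
        self._failures = {}                 # user -> consecutive failed attempts

    def login(self, user, password):
        # Once the threshold is reached, every attempt is rejected,
        # including attempts that carry the correct password.
        if self._failures.get(user, 0) >= self._max_failures:
            return "locked"
        if self._passwords.get(user) == password:
            self._failures[user] = 0
            return "ok"
        self._failures[user] = self._failures.get(user, 0) + 1
        return "denied"


auth = LockoutAuthenticator({"alice": "correct horse"}, max_failures=3)
first_login = auth.login("alice", "correct horse")
# An adversary who knows only the user name submits wrong guesses on purpose.
attack_results = [auth.login("alice", "guess") for _ in range(3)]
# The legitimate user is now shut out despite presenting valid credentials.
after_attack = auth.login("alice", "correct horse")
```

In this sketch, three deliberately wrong guesses are enough to deny service to the account owner, which is the kind of low-cost attack the abstract measures.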

Detecting Root-Level Endpoint Sensor Compromises with Correlated Activity

Yunsen Lei and Craig A. Shue

Conference on Security and Privacy in Communication Networks (SecureComm)

Endpoint sensors play an important role in an organization's network defense. However, endpoint sensors may be disabled or sabotaged if an adversary gains root-level access to the endpoint running the sensor. While traditional sensors cannot reliably defend against such compromises, this work explores an approach to detect these compromises in applications where multiple sensors can be correlated. We focus on the OpenFlow protocol and show that endpoint sensor data can be corroborated using a remote endpoint's sensor data or that of in-network sensors, like an OpenFlow switch. The approach allows end-to-end round trips of less than 20ms for around 90% of flows, which includes all flow elevation and processing overheads. In addition, the approach can detect flows from compromised nodes if there is a single uncompromised sensor on the network path. This approach allows defenders to quickly identify and quarantine nodes with compromised endpoint sensors.

Characterizing the Deep Neural Networks Inference Performance of Mobile Applications

Samuel Ogden, Tian Guo

arXiv:1909.04783

Today's mobile applications are increasingly leveraging deep neural networks to provide novel features, such as image and speech recognition. To use a pre-trained deep neural network, mobile developers can either host it in a cloud server, referred to as cloud-based inference, or ship it with their mobile application, referred to as on-device inference. In this work, we investigate the inference performance of these two common approaches on both mobile devices and public clouds, using popular convolutional neural networks. Our measurement study suggests the need for both on-device and cloud-based inferences for supporting mobile applications. In particular, newer mobile devices are able to run mobile-optimized CNN models in a reasonable time. However, for older mobile devices or to use more complex CNN models, mobile applications should opt for cloud-based inference. We further demonstrate that variable network conditions can lead to poor cloud-based inference end-to-end time. To support efficient cloud-based inference, we propose a CNN model selection algorithm called CNNSelect that dynamically selects the most appropriate CNN model for each inference request, and adapts its selection to match different SLAs and execution time budgets that are caused by variable mobile environments. The key idea of CNNSelect is to make inference speed and accuracy trade-offs at runtime using a set of CNN models. We demonstrated that CNNSelect smoothly improves inference accuracy while maintaining SLA attainment in 88.5% more cases than a greedy baseline.

ModiPick: SLA-aware Accuracy Optimization For Mobile Deep Inference

Samuel S. Ogden, Tian Guo

arXiv:1909.02053

Mobile applications are increasingly leveraging complex deep learning models to deliver features, e.g., image recognition, that require high prediction accuracy. Such models can be both computation and memory-intensive, even for newer mobile devices, and are therefore commonly hosted in powerful remote servers. However, current cloud-based inference services employ a static model selection approach that can be suboptimal for satisfying application SLAs (service level agreements), as they fail to account for the inherently dynamic mobile environment. We introduce a cloud-based technique called ModiPick that dynamically selects the most appropriate model for each inference request, and adapts its selection to match different SLAs and execution time budgets that are caused by variable mobile environments. The key idea of ModiPick is to make inference speed and accuracy trade-offs at runtime with a pool of managed deep learning models. As such, ModiPick masks unpredictable inference time budgets and therefore meets SLA targets, while improving accuracy within mobile network constraints. We evaluate ModiPick through experiments based on prototype systems and through simulations. We show that ModiPick achieves comparable inference accuracy to a greedy approach while improving SLA adherence by up to 88.5%.
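
The speed and accuracy trade-off described in this abstract can be illustrated with a simplified sketch. It is not ModiPick's actual algorithm: it assumes each model has a single known expected latency and deterministically picks the most accurate model that fits the remaining time budget. The `Model` type, the `pick_model` function, and all accuracy and latency numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    accuracy: float     # assumed top-1 accuracy in [0, 1]
    latency_ms: float   # assumed expected server-side inference time


def pick_model(models: Sequence[Model], sla_ms: float,
               network_ms: float) -> Optional[Model]:
    """Pick the most accurate model that fits the time left after networking."""
    budget_ms = sla_ms - network_ms
    feasible = [m for m in models if m.latency_ms <= budget_ms]
    if feasible:
        return max(feasible, key=lambda m: m.accuracy)
    # No model fits the budget: fall back to the fastest one (None if the pool is empty).
    return min(models, key=lambda m: m.latency_ms, default=None)


# Hypothetical model pool; the numbers are made up for illustration.
POOL = [
    Model("small", accuracy=0.70, latency_ms=20.0),
    Model("medium", accuracy=0.76, latency_ms=60.0),
    Model("large", accuracy=0.80, latency_ms=150.0),
]

fast_network = pick_model(POOL, sla_ms=250.0, network_ms=50.0)    # 200 ms budget
slow_network = pick_model(POOL, sla_ms=250.0, network_ms=180.0)   # 70 ms budget
no_budget = pick_model(POOL, sla_ms=250.0, network_ms=240.0)      # 10 ms budget
```

With the same SLA, a slower network leaves a smaller budget, so the selection shifts from the large model to the medium one, and to the fastest model when nothing fits.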

Confidential Deep Learning: Executing Proprietary Models on Untrusted Devices

Peter M. VanNostrand, Ioannis Kyriazis, Michelle Cheng, Tian Guo, Robert J. Walls

arXiv:1908.10730v1

Performing deep learning on end-user devices provides fast offline inference results and can help protect the user’s privacy. However, running models on untrusted client devices reveals model information which may be proprietary, i.e., the operating system or other applications on end-user devices may be manipulated to copy and redistribute this information, infringing on the model provider’s intellectual property. We propose the use of ARM TrustZone, a hardware-based security feature present in most phones, to confidentially run a proprietary model on an untrusted end-user device. We explore the limitations and design challenges of using TrustZone and examine potential approaches for confidential deep learning within this environment. Of particular interest is providing robust protection of proprietary model information while minimizing total performance overhead.

Poster: EdgeServe: Efficient Deep Learning Model Caching at the Edge

Tian Guo, Robert J. Walls, Samuel S. Ogden

ACM/IEEE Symposium on Edge Computing (SEC 2019)

In this work, we look at how to effectively manage and utilize these deep learning models at each edge location, to provide performance guarantees to inference requests. We identify challenges in using these deep learning models at resource-constrained edge locations, and propose to adapt existing cache algorithms to effectively manage these deep learning models.

Poster: Virtual Reality Streaming at the Edge: A Power Perspective

Zichen Zhu, Nan Jiang, Tian Guo, and Sheng Wei

ACM/IEEE Symposium on Edge Computing (SEC 2019)

This poster focuses on addressing the power consumption issues in 360-degree immersive video streaming on smartphones, an emerging virtual reality (VR) application in the consumer video market. We first conducted a power measurement study that indicates VR view generation as the major power consumption source. Then, we developed an edge-based immersive streaming system called EdgeVR that offloads the power-consuming view generation operation from the smartphone to the edge. Through our preliminary evaluations using EdgeVR, we identified the challenge of Motion-to-Photon latency associated with offloading. To reduce such delay, we propose a viewport prediction-based pre-rendering mechanism at the edge, thus ensuring the quality of experience in the VR application.

Presentation: Confidential Deep Learning: Executing Proprietary Models on Untrusted Devices

Peter M. VanNostrand, Ioannis Kyriazis, Michelle Cheng, Tian Guo, Robert J. Walls

Great Lakes Security Day 2019

Performing machine learning on client devices is desirable as it provides fast, offline inference results and can protect the user's privacy. However, running models on untrusted client devices reveals information about the model, such as structure and neuron weights, which may be proprietary. As users have full access to the hardware and software of their devices, the client operating system or other applications may be manipulated to copy and redistribute this information, infringing on the model provider's intellectual property. We propose the use of ARM TrustZone, a hardware security module present in most phones, to provide a trusted environment for the execution of machine learning models. Outside the trusted execution environment, all model information would be kept encrypted to ensure model confidentiality. We explore the limitations and design challenges of using ARM TrustZone and examine potential approaches for confidentially performing deep learning within this environment. Of particular interest is providing robust protection of proprietary model information while minimizing total performance overhead.

Presentation: Silhouette: Efficient Intra-Address Space Isolation for Protected Shadow Stacks on Embedded Systems

Jie Zhou, Yufei Du, Lele Ma, Zhuojia Shen, John Criswell, Robert J. Walls

Great Lakes Security Day 2019

CloudCoaster: Transient-aware Bursty Datacenter Workload Scheduling

Samuel S. Ogden, Tian Guo

arXiv:1907.02162

Today's clusters often have to divide resources among a diverse set of jobs. These jobs are heterogeneous both in execution time and in their rate of arrival. Execution time heterogeneity has led to the development of hybrid schedulers that can schedule both short and long jobs to ensure good task placement. However, arrival rate heterogeneity, or burstiness, remains a problem in existing schedulers. These hybrid schedulers manage resources on a statically provisioned cluster, which can quickly be overwhelmed by bursts in the number of arriving jobs.
In this paper, we propose CloudCoaster, a hybrid scheduler that dynamically resizes the cluster by leveraging cheap transient servers. CloudCoaster schedules jobs in an intelligent way that increases job performance while reducing overall resource cost. We evaluate the effectiveness of CloudCoaster through simulations on real-world traces and compare it against a state-of-the-art hybrid scheduler. CloudCoaster improves the average queueing delay time of short jobs by 4.8X while maintaining long job performance. In addition, CloudCoaster reduces the short partition budget by over 29.5%.

Control-Flow Integrity for Real-Time Embedded Systems

Robert J. Walls, Nicholas F. Brown, Thomas Le Baron, Craig A. Shue, Hamed Okhravi, Bryan C. Ward

31st Euromicro Conference on Real-Time Systems (ECRTS 2019)

Attacks on real-time embedded systems can endanger lives and critical infrastructure. Despite this, techniques for securing embedded systems software have not been widely studied. Many existing security techniques for general-purpose computers rely on assumptions that do not hold in the embedded case. This paper focuses on one such technique, control-flow integrity (CFI), that has been vetted as an effective countermeasure against control-flow hijacking attacks on general-purpose computing systems. Without the process isolation and fine-grained memory protections provided by a general-purpose computer with a rich operating system, CFI cannot provide any security guarantees. This work proposes RECFISH, a system for providing CFI guarantees on ARM Cortex-R devices running minimal real-time operating systems. We provide techniques for protecting runtime structures, isolating processes, and instrumenting compiled ARM binaries with CFI protection. We empirically evaluate RECFISH and its performance implications for real-time systems. Our results suggest RECFISH can be directly applied to binaries without compromising real-time performance; in a test of over six million realistic task systems running FreeRTOS, 85% were still schedulable after adding RECFISH.

A Coq formalization of Boolean unification

Daniel J. Dougherty

33rd International Workshop on Unification

We report on a verified implementation of two (well-known) algorithms for unification modulo the theory of Boolean rings: Lowenheim's method and the method of Successive Variable Elimination. The implementations and proofs of correctness were done in the Coq proof assistant; we view this contribution as an early step in a larger project of developing a suite of verified implementations of equational unification algorithms.

Presentation: A Random Number Generator Built from Repurposed Hardware in Embedded Systems

Jacob T. Grycel and Robert J. Walls

New England Security Day 2019

Speeding up Deep Learning with Transient Servers

Shijian Li, Robert J. Walls, Lijie Xu, Tian Guo

The 16th IEEE International Conference on Autonomic Computing, arXiv:1903.00045

Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable—e.g., for rapidly evaluating new model designs—they often come with significantly higher monetary costs due to sublinear scalability. In this paper, we investigate the feasibility of using training clusters composed of cheaper transient GPU servers to get the benefits of distributed training without the high costs.
We conduct the first large-scale empirical analysis, launching more than a thousand GPU servers of various capacities, aimed at understanding the characteristics of transient GPU servers and their impact on distributed training performance. Our study demonstrates the potential of transient servers with a speedup of 7.7X with more than 62.9% monetary savings for some cluster configurations. We also identify a number of important challenges and opportunities for redesigning distributed training frameworks to be transient-aware. For example, the dynamic cost and availability characteristics of transient servers suggest the need for frameworks to dynamically change cluster configurations to best take advantage of current conditions.

Authenticating Endpoints and Vetting Connections in Residential Networks

Yu Liu, Curtis R. Taylor, Craig A. Shue

IEEE ICNC Workshop on Computing, Networking and Communications (CNC)

The security of residential networks can vary greatly. These networks are often administered by end-users who may lack security expertise or the resources to adequately defend their networks. Insecure residential networks provide attackers with opportunities to infiltrate systems and create a platform for launching powerful attacks. To address these issues, we introduce a new approach that uses software-defined networking (SDN) to allow home users to outsource their security maintenance to a cloud-based service provider. Using this architecture, we show how a novel network-based two-factor authentication approach can be used to protect Internet of Things devices. Our approach works without requiring modifications to end-devices. We further show how security modules can enforce protocol messages to limit the attack surface in vulnerable devices. Our analysis shows that the system is effective and adds less than 50 milliseconds of delay to the start of a connection with less than 100 microseconds of delay for subsequent packets.

2018

Security protocol analysis in context: Computing minimal executions using SMT and CPSA

Daniel J. Dougherty, Joshua D. Guttman, and John D. Ramsdell

Integrated Formal Methods, Springer International Publishing

Cryptographic protocols are used in different environments, but existing methods for protocol analysis focus only on the protocols, without being sensitive to assumptions about their environments. LPA is a tool which analyzes protocols in context. LPA uses two programs, cooperating with each other: CPSA, a well-known system for protocol analysis, and Razor, a model-finder based on SMT technology. Our analysis follows the enrich-by-need paradigm, in which models of protocol execution are generated and examined. The choice of which models to generate is important, and we motivate and evaluate LPA's strategy of building minimal models. “Minimality” can be defined with respect to either of two preorders, namely the homomorphism preorder and the embedding preorder (i.e. the preorder of injective homomorphisms); we discuss the merits of each. Our main technical contributions are algorithms for building homomorphism-minimal models and for generating a set-of-support for the models of a theory, in each case by scripting interactions with an SMT solver.

Homomorphisms and Minimality for Enrich-by-Need Security Analysis

Daniel J. Dougherty, J. D. Guttman, and J. D. Ramsdell

ArXiv e-prints

Cryptographic protocols are used in different environments, but existing methods for protocol analysis focus only on the protocols, without being sensitive to assumptions about their environments. LPA is a tool which analyzes protocols in context. LPA uses two programs, cooperating with each other: CPSA, a well-known system for protocol analysis, and Razor, a model-finder based on SMT technology. Our analysis follows the enrich-by-need paradigm, in which models of protocol execution are generated and examined. The choice of which models to generate is important, and we motivate and evaluate LPA's strategy of building minimal models. "Minimality" can be defined with respect to either of two preorders, namely the homomorphism preorder and the embedding preorder (i.e. the preorder of injective homomorphisms); we discuss the merits of each. Our main technical contributions are algorithms for building homomorphism-minimal models and for generating a set-of-support for the models of a theory, in each case by scripting interactions with an SMT solver.

An Experimental Evaluation of Garbage Collectors on Big Data Applications

Lijie Xu, Tian Guo, Wensheng Dou, Wei Wang, and Jun Wei

The 45th International Conference on Very Large Data Bases (VLDB'19)

Popular big data frameworks, ranging from Hadoop MapReduce to Spark, all rely on garbage-collected languages, such as Java and Scala. Big data applications are especially sensitive to the effectiveness of garbage collection (GC), because they usually process a large number of data objects that lead to heavy GC overhead. A lack of in-depth understanding of GC performance has impeded performance improvement in big data applications. In this paper, we conduct a comprehensive evaluation of three popular garbage collectors, i.e., Parallel, CMS, and G1, using four representative Spark applications. By thoroughly investigating the correlation between these big data applications’ memory usage patterns and the collectors’ GC patterns, we obtain many findings about GC inefficiencies. We further propose empirical guidelines for application developers, and insightful optimization strategies for designing big-data-friendly garbage collectors.

MODI: Mobile Deep Inference Made Efficient by Edge Computing

Samuel S. Ogden, Tian Guo

The USENIX Workshop on Hot Topics in Edge Computing (HotEdge '18)

In this paper, we propose a novel mobile deep inference platform, MODI, that delivers good inference performance. MODI improves the performance of deep-learning-powered mobile applications with optimizations in three complementary aspects. First, MODI provides a number of models and dynamically selects the best one during runtime. Second, MODI extends the set of models each mobile application can use by storing high-quality models at the edge servers. Third, MODI manages a centralized model repository and periodically updates models at edge locations, ensuring up-to-date models for mobile applications without incurring high network latency. Our evaluation demonstrates the feasibility of trading off inference accuracy for improved inference speed, as well as the acceptable performance of edge-based inference.

Cloud-based or On-device: An Empirical Study of Mobile Deep Inference

Tian Guo

2018 IEEE International Conference on Cloud Engineering (IC2E'18)

Modern mobile applications benefit significantly from the advancement in deep learning, e.g., implementing real-time image recognition and conversational systems. Given a trained deep learning model, applications usually need to perform a series of matrix operations based on the input data, in order to infer possible output values. Because of computational complexity and size constraints, these trained models are often hosted in the cloud. When utilizing these cloud-based models, mobile apps have to send input data over the network. While cloud-based deep learning can provide reasonable response time for mobile apps, it also restricts the use case scenarios, e.g., mobile apps need to have network access. With mobile-specific deep learning optimizations, it is now possible to employ on-device inference. However, because mobile hardware, e.g., GPU and memory size, can be very limited when compared to its desktop counterpart, it is important to understand the feasibility of this new on-device deep learning inference architecture. In this paper, we empirically evaluate the inference efficiency of three Convolutional Neural Networks using a benchmark Android application we developed. Our measurement and analysis suggest that on-device inference can cost up to two orders of magnitude greater response time and energy when compared to cloud-based inference, and that loading the model and computing probability are two performance bottlenecks for on-device deep inference.

2017

DeepContext: An OpenFlow-Compatible, Host-Based SDN for Enterprise Networks

Mohamed E. Najd and Craig A. Shue

IEEE Conference on Local Computer Networks (LCN)

The software-defined networking (SDN) paradigm promises greater control and understanding of enterprise network activities, particularly for management applications that need awareness of network-wide behavior. However, the current focus on switch-based SDNs raises concerns about data-plane scalability, especially when using fine-grained flows. Further, these switch-centric approaches lack visibility into end-host and application behaviors, which are valuable when making access control decisions.
In recent work, we proposed a host-based SDN in which we installed software on the end-hosts and used a centralized network controller to manage the flows. This improved scalability and provided application information for use in network policy. However, that approach was not compatible with OpenFlow and provided only conservative estimates of possible network performance.
In this work, we create a high performance host-based SDN that is compatible with the OpenFlow protocol. Our approach, DeepContext, provides details about the application context to the network controller, allowing enhanced decision-making. We evaluate the performance of DeepContext, comparing it to traditional networks and Open vSwitch deployments. We further characterize the completeness of the data provided by the system and the resulting benefits.

The power of 'why' and 'why not': enriching scenario exploration with provenance

Tim Nelson, Natasha Danas, Daniel J. Dougherty, and Shriram Krishnamurthi

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, Paderborn, Germany, September 4-8, 2017, pages 106-116, 2017

Scenario-finding tools like the Alloy Analyzer are widely used in numerous concrete domains like security, network analysis, UML analysis, and so on. They can help to verify properties and, more generally, aid in exploring a system's behavior. While scenario finders are valuable for their ability to produce concrete examples, individual scenarios only give insight into what is possible, leaving the user to make their own conclusions about what might be necessary. This paper enriches scenario finding by allowing users to ask "why?" and "why not?" questions about the examples they are given. We show how to distinguish parts of an example that cannot be consistently removed (or changed) from those that merely reflect underconstraint in the specification. In the former case we show how to determine which elements of the specification and which other components of the example together explain the presence of such facts. This paper formalizes the act of computing provenance in scenario-finding. We present Amalgam, an extension of the popular Alloy scenario-finder, which implements these foundations and provides interactive exploration of examples. We also evaluate Amalgam's algorithmics on a variety of both textbook and real-world examples.

User studies of principled model finder output

Natasha Danas, Tim Nelson, Lane Harrison, Shriram Krishnamurthi, and Daniel J. Dougherty

Software Engineering and Formal Methods - 15th International Conference, SEFM 2017, Trento, Italy, September 4-8, 2017, Proceedings, pages 168-184, 2017

Model-finders such as SAT-solvers are attractive for producing concrete models, either as sample instances or as counterexamples when properties fail. However, the generated model is arbitrary. To address this, several research efforts have proposed principled forms of output from model-finders. These include minimal and maximal models, unsat cores, and proof-based provenance of facts. While these methods enjoy elegant mathematical foundations, they have not been subjected to rigorous evaluation on users to assess their utility. This paper presents user studies of these three forms of output performed on advanced students. We find that most of the output forms fail to be effective, and in some cases even actively mislead users. To make such studies feasible to run frequently and at scale, we also show how we can pose such studies on the crowdsourcing site Mechanical Turk.

Towards Efficient Deep Inference for Mobile Applications

Tian Guo

arXiv:1707.04610

Modern mobile applications are benefiting significantly from the advancement in deep learning, e.g., implementing real-time image recognition and conversational systems. Given a trained deep learning model, applications usually need to perform a series of matrix operations based on the input data, in order to infer possible output values. Because of computational complexity and size constraints, these trained models are often hosted in the cloud. To utilize these cloud-based models, mobile apps have to send input data over the network. While cloud-based deep learning can provide reasonable response time for mobile apps, it restricts the use case scenarios, e.g., mobile apps need to have network access. With mobile-specific deep learning optimizations, it is now possible to employ on-device inference. However, because mobile hardware, such as GPU and memory size, can be very limited when compared to its desktop counterpart, it is important to understand the feasibility of this new on-device deep learning inference architecture. In this paper, we empirically evaluate the inference performance of three Convolutional Neural Networks (CNNs) using a benchmark Android application we developed. Our measurement and analysis suggest that on-device inference can cost up to two orders of magnitude greater response time and energy when compared to cloud-based inference, and that loading the model and computing probability are two performance bottlenecks for on-device deep inference.

Botnet Protocol Inference in the Presence of Encrypted Traffic

L. De Carli, R. Torres, G. Modelo-Howard, A. Tongaonkar, and S. Jha

IEEE INFOCOM 2017

Network protocol reverse engineering of botnet command and control (C&C) is a challenging task, which requires various manual steps and a significant amount of domain knowledge. Furthermore, most of today’s C&C protocols are encrypted, which prevents any analysis of the traffic without first discovering the encryption algorithm and key. To address these challenges, we present an end-to-end system for automatically discovering the encryption algorithm and keys, generating a protocol specification for the C&C traffic, and crafting effective network signatures. In order to infer the encryption algorithm and key, we enhance state-of-the-art techniques to extract this information using lightweight binary analysis. In order to generate protocol specifications, we infer field types purely by analyzing network traffic. We evaluate our approach on three prominent malware families: Sality, ZeroAccess and Ramnit. Our results are encouraging: the approach decrypts all three protocols, detects 97% of fields whose semantics are supported, and infers specifications that correctly align with real protocol specifications.

CAESAR: context-aware event stream analytics for urban transportation services

Olga Poppe, Chuan Lei, Elke A. Rundensteiner, Daniel J. Dougherty, Goutham Deva, Nicholas Fajardo, James Owens, Thomas Schweich, MaryAnn Van Valkenburg, Sarun Paisarnsrisomsuk, Pitchaya Wiratchotisatian, George Gettel, Robert Hollinger, Devin Roberts, and Daniel Tocco

Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21-24, 2017, pages 590-593, 2017

We demonstrate the first full-fledged context-aware event processing solution, called CAESAR, that supports application contexts as first-class citizens. CAESAR offers human-readable specification of context-aware application semantics composed of context derivation and context processing. Both classes of queries are only relevant during their respective contexts. They are suspended otherwise to save resources and to speed up the system's responsiveness to the current situation. Furthermore, we demonstrate the context-driven optimization techniques including context window push-down and query workload sharing among overlapping context windows. We illustrate the usability and performance gain of our CAESAR system by a use case scenario for urban transportation services using real data sets.

Cimplifier: Automatically Debloating Containers

V. Rastogi, D. Davidson, L. De Carli, S. Jha, and P. McDaniel

Joint Meeting on Foundations of Software Engineering (ESEC/FSE) 2017, Paderborn, Germany

Application containers, such as those provided by Docker, have recently gained popularity as a solution for agile and seamless software deployment. These light-weight virtualization environments run applications that are packed together with their resources and configuration information, and thus can be deployed across various software platforms. Unfortunately, the ease with which containers can be created is oftentimes a double-edged sword, encouraging the packaging of logically distinct applications, and the inclusion of a significant number of unnecessary components, within a single container. These practices needlessly increase the container size, sometimes by orders of magnitude. They also decrease the overall security, as each included component, necessary or not, may bring in security issues of its own, and there is no isolation between multiple applications packaged within the same container image. We propose algorithms and a tool called Cimplifier, which address these concerns: given a container and simple user-defined constraints, our tool partitions it into simpler containers, which (i) are isolated from each other, only communicating as necessary, and (ii) only include enough resources to perform their functionality. Our evaluation on real-world containers demonstrates that Cimplifier preserves the original functionality, leads to a reduction in image size of up to 95%, and processes even large containers in under thirty seconds.

Providing Geo-Elasticity in Geographically Distributed Clouds

Tian Guo, Prashant Shenoy

ACM Transactions on Internet Technology (TOIT'17)

Geographically distributed cloud platforms are well suited for serving a geographically diverse user base. However, traditional cloud provisioning mechanisms that make local scaling decisions are not adequate for delivering the best possible performance for modern web applications that observe both temporal and spatial workload fluctuations. In this paper, we propose GeoScale, a system that provides geo-elasticity by combining model-driven proactive and agile reactive provisioning approaches. GeoScale can dynamically provision server capacity at any location based on workload dynamics. We conduct a detailed evaluation of GeoScale on Amazon’s geo-distributed cloud, and show up to 40% improvement in the 95th percentile response time when compared to traditional elasticity techniques.

On the Feasibility of Cloud-Based SDN Controllers for Residential Networks

Curtis R. Taylor, Tian Guo, Craig A. Shue, and Mohamed E. Najd

2017 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN'17)

Residential networks are home to increasingly diverse devices, including embedded devices that are part of the Internet of Things phenomenon, leading to new management and security challenges. However, current residential solutions that rely on customer premises equipment (CPE), which often remains deployed in homes for years without updates or maintenance, are not evolving to keep up with these emerging demands. Recently, researchers have proposed to outsource the tasks of managing and securing residential networks to cloud-based security services by leveraging software-defined networking (SDN). However, the use of cloud-based infrastructure may have performance implications. In this paper, we measure the performance impact and perception of a residential SDN using a cloud-based controller through two measurement studies. First, we recruit 270 residential users located across the United States to measure residential latency to cloud providers. Our measurements suggest the cloud controller architecture provides 90% of end-users with acceptable performance with judiciously selected public cloud locations. Second, when evaluating web page loading times of popular domains, which are particularly latency-sensitive, we found an increase of a few seconds at the median. However, optimizations could reduce this overhead for top websites in practice.

Performance and Cost Considerations for Providing Geo-Elasticity in Database Clouds

Tian Guo and Prashant Shenoy

Transactions on Autonomous and Adaptive Systems (TAAS'17)

Online applications that serve a global workload have become the norm, and those applications are experiencing not only temporal but also spatial workload variations. In addition, more applications are hosting their backend tiers separately for benefits such as ease of management. To provision for such applications, traditional elasticity approaches that only consider temporal workload dynamics and assume well-provisioned backends are insufficient. Instead, in this paper, we propose a new type of provisioning mechanism, geo-elasticity, that utilizes distributed clouds with different locations. Centered on this idea, we build a system called DBScale that tracks geographic variations in the workload to dynamically provision database replicas at different cloud locations across the globe. Our geo-elastic provisioning approach comprises a regression-based model that infers database query workload from spatially distributed front-end workload, a two-node open queueing network model that estimates the capacity of databases serving both CPU and I/O-intensive query workloads, and greedy algorithms for selecting the best cloud locations based on latency and cost. We implement a prototype of our DBScale system on Amazon EC2’s distributed cloud. Our experiments with our prototype show up to a 66% improvement in response time when compared to local elasticity approaches.

Latency-aware Virtual Desktops Optimization in Distributed Clouds

Tian Guo, Prashant Shenoy, K. K. Ramakrishnan, and Vijay Gopalakrishnan

Multimedia Systems (MMSJ'17)

Distributed clouds offer a choice of data center locations for providers to host their applications. In this paper, we consider distributed clouds that host virtual desktops which are then accessed by users through remote desktop protocols. Virtual desktops have different levels of latency-sensitivity, primarily determined by the actual applications running and affected by the end users’ locations. In the scenario of mobile users, even switching between 3G and WiFi networks affects the latency sensitivity. We design VMShadow, a system to automatically optimize the location and performance of latency-sensitive VMs in the cloud. VMShadow performs black-box fingerprinting of a VM’s network traffic to infer the latency-sensitivity and employs both ILP and greedy heuristic based algorithms to move highly latency-sensitive VMs to cloud sites that are closer to their end users. VMShadow employs a WAN-based live migration and a new network connection migration protocol to ensure that the VM migration and subsequent changes to the VM’s network address are transparent to end-users. We implement a prototype of VMShadow in a nested hypervisor and demonstrate its effectiveness for optimizing the performance of VM-based desktops in the cloud. Our experiments on a private as well as the public EC2 cloud show that VMShadow is able to discriminate between latency-sensitive and insensitive desktop VMs and judiciously moves only those that will benefit the most from the migration. For desktop VMs with video activity, VMShadow improves VNC’s refresh rate by 90% by migrating the virtual desktop to a closer location. Transcontinental remote desktop migrations only take about 4 minutes and our connection migration proxy imposes 13µs overhead per packet.

Managing Risk in a Derivative IaaS Cloud

Prateek Sharma, Stephen Lee, Tian Guo, David Irwin, and Prashant Shenoy

IEEE Transactions on Parallel and Distributed Systems (TPDS'17)

Infrastructure-as-a-Service (IaaS) cloud platforms rent computing resources with different cost and availability tradeoffs. For example, users may acquire virtual machines (VMs) in the spot market that are cheap, but can be unilaterally terminated by the cloud operator. Because of this revocation risk, spot servers have been conventionally used for delay- and risk-tolerant batch jobs. In this paper, we develop risk mitigation policies which allow even interactive applications to run on spot servers. Our system, SpotCheck, is a derivative cloud platform that provides the illusion of an IaaS platform offering always-available VMs on demand for a cost near that of spot servers, and supports unmodified applications. SpotCheck’s design combines virtualization-based mechanisms for fault-tolerance, and bidding and server selection policies for managing the risk and cost. We implement SpotCheck on EC2 and show that it i) provides nested VMs with 99.9989% availability, ii) achieves nearly 5× cost savings compared to using on-demand VMs, and iii) eliminates any risk of losing VM state.

2016

A realizability interpretation for intersection and union types

Daniel J. Dougherty, Ugo de'Liguoro, Luigi Liquori, and Claude Stolze

Programming Languages and Systems - 14th Asian Symposium, APLAS 2016, Hanoi, Vietnam, November 21-23, 2016, Proceedings, pages 187-205, 2016

Proof-functional logical connectives allow reasoning about the structure of logical proofs, in this way giving to the latter the status of first-class objects. This is in contrast to classical truth-functional connectives where the meaning of a compound formula is dependent only on the truth value of its subformulas. In this paper we present a typed lambda calculus, enriched with products, coproducts, and a related proof-functional logic. This calculus, directly derived from a typed calculus previously defined by two of the current authors, has been proved isomorphic to the well-known Barbanera-Dezani-Ciancaglini-de'Liguoro type assignment system. We present a logic L featuring two proof-functional connectives, namely strong conjunction and strong disjunction. We prove the typed calculus to be isomorphic to the logic L and we give a realizability semantics using Mints' realizers and a completeness theorem. A prototype implementation is also described.

What ad blockers are (and are not) doing

Craig E. Wills and Doruk C. Uzunoglu

Proceedings of the IEEE Workshop on Hot Topics in Web Systems and Technologies

The Web has many types of third-party domains and has a variety of available ad blockers. This work evaluates ad blocking tools for their effectiveness in blocking the retrieval of different categories of third-party content during download of popular websites. The results of this work demonstrate that there is much variation in the effectiveness of current ad blocking tools to prevent requests to different types of third-party domains. Non-configurable tools such as Blur and Disconnect provide only modest blockage of third-party domains in most categories. The tool uBlock generally demonstrates the best default configuration performance. By default, Ghostery provides no protection while Adblock Plus and Adguard provide minimal protection. They must be manually configured to obtain effective protection. The behavior of Adblock Plus is particularly notable as usage data indicates it has an 85% share of the ad blocking tool market. Other results based on network traces suggest that approximately 80% of these Adblock Plus users employ its default configuration. Construction of a “composite” ad blocker reflecting current usage of ad blockers and their configurations shows this composite ad blocker provides only a modest reduction of 13-34% in the set of third-party domains retrieved in each category relative to not employing any ad blocker.

Validating Security Protocols with Cloud-Based Middleboxes

Curtis R. Taylor and Craig A. Shue

IEEE Conference on Communications and Network Security (CNS)

Residential networks pose a unique challenge for security since they are operated by end-users that may not have security expertise. Residential networks are also home to devices that may have lackluster security protections, such as Internet of Things (IoT) devices, which may introduce vulnerabilities. In this work, we introduce TLSDeputy, a middlebox-based system to protect residential networks from connections to inauthentic TLS servers. By combining the approach with OpenFlow, a popular software-defined networking protocol, we show that we can effectively provide residential network-wide protections across diverse devices with minimal performance overheads.

Elastic Resource Management in Distributed Clouds

Tian Guo

Ph.D. thesis, University of Massachusetts Amherst.

The ubiquitous nature of computing devices and their increasing reliance on remote resources have driven and shaped public cloud platforms into unprecedented large-scale, distributed data centers. Concurrently, a plethora of cloud-based applications are experiencing multi-dimensional workload dynamics: workload volumes that vary along both time and space axes and with higher frequency. The interplay of diverse workload characteristics and distributed clouds raises several key challenges for efficiently and dynamically managing server resources. First, current cloud platforms impose certain restrictions that might hinder some resource management tasks. Second, an application-agnostic approach might not entail appropriate performance goals and therefore requires numerous specific methods. Third, provisioning resources outside the LAN boundary might incur a huge delay which would impact the desired agility. In this dissertation, I investigate the above challenges and present the design of automated systems that manage resources for various applications in distributed clouds. The intermediate goal of these automated systems is to fully exploit potential benefits such as reduced network latency offered by increasingly distributed server resources. The ultimate goal is to improve end-to-end user response time with novel resource management approaches, within a certain cost budget. Centered around these two goals, I first investigate how to optimize the location and performance of virtual machines in distributed clouds. I use virtual desktops, mostly serving a single user, as an example use case for developing a black-box approach that ranks virtual machines based on their dynamic latency requirements. Those with high latency sensitivities have a higher priority of being placed or migrated to a cloud location closest to their users.
Next, I relax the assumption of well-provisioned virtual machines and look at how to provision enough resources for applications that exhibit both temporal and spatial workload fluctuations. I propose an application-agnostic queueing model that captures the resource utilization and server response time. Building upon this model, I present a geo-elastic provisioning approach, referred to as geo-elasticity, for replicable multi-tier applications that can spin up an appropriate amount of server resources at any cloud location. Last, I explore the benefits of providing geo-elasticity for database clouds, a popular platform for hosting application backends. Performing geo-elastic provisioning for backend database servers entails several challenges that are specific to database workload, and therefore requires tailored solutions. In addition, cloud platforms offer resources at various prices for different locations. Towards this end, I propose a cost-aware geo-elasticity that combines a regression-based workload model and a queueing network capacity model for database clouds. In summary, hosting a diverse set of applications in an increasingly distributed cloud makes it interesting and necessary to develop new, efficient and dynamic resource management approaches.

Whole Home Proxies: Bringing Enterprise-Grade Security to Residential Networks

Curtis R. Taylor, Craig A. Shue, and Mohamed E. Najd

IEEE ICC Communication and Information Systems Security Symposium

While enterprise networks follow best practices and security measures, residential networks often lack these protections. Home networks have constrained resources and lack a dedicated IT staff that can secure and manage the network and systems. At the same time, homes must tackle the same challenges of securing heterogeneous devices when communicating with the Internet. In this work, we explore combining software-defined networking and proxies with commodity residential Internet routers. We evaluate a "whole home" proxy solution for the Skype video conferencing application to determine the viability of the approach in practice. We find that we are able to automatically detect when a device is about to use Skype and dynamically intercept all of the Skype communication and route it through a proxy while not disturbing unrelated network flows. Our approach works across multiple operating systems, form factors, and versions of Skype.

Domain-Z: 28 Registrations Later

Chaz Lever, Robert J. Walls, Yacin Nadji, David Dagon, Patrick McDaniel, and Manos Antonakakis

IEEE Symposium on Security and Privacy

Any individual who re-registers an expired domain implicitly inherits the residual trust associated with the domain’s prior use. We find that adversaries can, and do, use malicious re-registration to exploit domain ownership changes, undermining the security of both users and systems. In fact, we find that many seemingly disparate security problems share a root cause in residual domain trust abuse. With this study we shed light on the seemingly unnoticed problem of residual domain trust by measuring the scope and growth of this abuse over the past six years. During this time, we identified 27,758 domains from public blacklists and 238,279 domains resolved by malware that expired and then were maliciously re-registered. To help address this problem, we propose a technical remedy and discuss several policy remedies. For the former, we develop Alembic, a lightweight algorithm that uses only passive observations from the Domain Name System (DNS) to flag potential domain ownership changes. We identify several instances of residual trust abuse using this algorithm, including an expired APT domain that could be used to revive existing infections.

Contextual, Flow-Based Access Control with Scalable Host-based SDN Techniques

Curtis R. Taylor, Douglas C. MacFarland, Doran R. Smestad, and Craig A. Shue

IEEE INFOCOM Conference

Network operators can better understand their networks when armed with a detailed understanding of the network traffic and host activities. Software-defined networking (SDN) techniques have the potential to improve enterprise security, but the current techniques have well-known data plane scalability concerns and limited visibility into the host's operating context.
In this work, we provide both detailed host-based context and fine-grained control of network flows by shifting the SDN agent functionality from the network infrastructure into the end-hosts. We allow network operators to write detailed network policy that can discriminate based on user and program information associated with network flows. In doing so, we find our approach scales far beyond the capabilities of OpenFlow switching hardware, allowing each host to create over 25 new flows per second with no practical bound on the number of established flows in the network.

Context-aware event stream analytics

Olga Poppe, Chuan Lei, Elke A. Rundensteiner, and Daniel J. Dougherty

Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, France, March 15-16, 2016, pages 413-424, 2016

Complex event processing is a popular technology for continuously monitoring high-volume event streams from health care to traffic management to detect complex compositions of events. These event compositions signify critical application contexts from hygiene violations to traffic accidents. Certain event queries are only appropriate in particular contexts. Yet state-of-the-art streaming engines tend to execute all event queries continuously regardless of the current application context. This wastes tremendous processing resources and thus leads to delayed reactions to critical situations. We have developed the first context-aware event processing solution, called CAESAR, which features the following key innovations. (1) The CAESAR model supports application contexts as first-class citizens and associates appropriate event queries with them. (2) The CAESAR optimizer employs context-aware optimization strategies including a context window push-down strategy and query workload sharing among overlapping contexts. (3) The CAESAR infrastructure allows for lightweight event query suspension and activation driven by context windows. Our experimental study utilizing both the Linear Road stream benchmark as well as real-world data sets demonstrates that the context-aware event stream analytics consistently outperforms the state-of-the-art strategies by a factor of 8 on average.

Placement Strategies for Virtualized Network Functions in a NFaaS Cloud

Xin He, Tian Guo, Erich Nahum and Prashant Shenoy

Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb'16)

Enterprises that host services in the cloud need to protect their cloud resources using network services such as firewalls and deep packet inspection systems. While middleboxes have typically been used to implement such network functions in traditional enterprise networks, their use in cloud environments by cloud tenants is problematic due to the boundary between cloud providers and cloud tenants. Instead, we argue that network function virtualization is a natural fit in cloud environments, where the cloud provider can implement Network Functions as a Service using virtualized network functions running on cloud servers, and enterprise cloud tenants can employ these services to implement security and performance optimizations for their cloud resources. In this paper, we focus on placement issues in the design of a NFaaS cloud and present two placement strategies, tenant-centric and service-centric, for deploying virtualized network services in multi-tenant settings. We discuss several trade-offs of these two strategies. We implement a prototype NFaaS testbed and conduct a series of experiments to quantify the benefits and drawbacks of our two strategies. Our results suggest that the tenant-centric placement provides lower latencies while the service-centric approach is more flexible for reconfiguration and capacity scaling.

Flint: Batch-Interactive Data-Intensive Processing on Transient Servers

Prateek Sharma, Tian Guo, Xin He, David Irwin, Prashant Shenoy

Proceedings of the Eleventh European Conference on Computer Systems (EuroSys'16)

Cloud providers now offer transient servers, which they may revoke at any time, for significantly lower prices than on-demand servers, which they cannot revoke. Transient servers’ low price is particularly attractive for executing an emerging class of workload, which we call Batch-Interactive Data-Intensive (BIDI), that is becoming increasingly important for data analytics. BIDI workloads require large sets of servers to cache massive datasets in memory to enable low latency operation. In this paper, we illustrate the challenges of executing BIDI workloads on transient servers, where revocations (akin to failures) are the common case. To address these challenges, we design Flint, which is based on Spark and includes automated checkpointing and server selection policies that i) support batch and interactive applications and ii) dynamically adapt to application characteristics. We evaluate a prototype of Flint using EC2 spot instances, and show that it yields cost savings of up to 90% compared to using on-demand servers, while increasing running time by < 2%.

GeoScale: Providing Geo-Elasticity in Distributed Clouds

Tian Guo, Prashant Shenoy, Hakan Hacigumus

Proceedings of 2016 IEEE International Conference on Cloud Engineering (IC2E'16)

Distributed cloud platforms are well suited for serving a geographically diverse user base. However, traditional cloud provisioning mechanisms that make local scaling decisions are not well suited for temporal and spatial workload fluctuations seen by modern web applications. In this paper, we argue for the need for geo-elasticity and present GeoScale, a system to provide geo-elasticity in distributed clouds. We describe GeoScale’s model-driven proactive provisioning approach and conduct an initial evaluation of GeoScale on Amazon’s distributed EC2 cloud. Our results show up to 31% improvement in the 95th percentile response time when compared to traditional elasticity techniques.

Analyzing the Efficiency of a Green University Data Center

Patrick Pegus II, Benoy Varghese, Tian Guo, David Irwin, Prashant Shenoy, Anirban Mahanti, James Culbert, John Goodhue, Chris Hill

Proceedings of 2016 ACM International Conference on Performance Engineering (ICPE'16)

Data centers are an indispensable part of today’s IT infrastructure. To keep pace with modern computing needs, data centers continue to grow in scale and consume increasing amounts of power. While prior work on data centers has led to significant improvements in their energy-efficiency, detailed measurements from these facilities’ operations are not widely available, as data center design is often considered part of a company’s competitive advantage. However, such detailed measurements are critical to the research community in motivating and evaluating new energy-efficiency optimizations. In this paper, we present a detailed analysis of a state-of-the-art 15MW green multi-tenant data center that incorporates many of the technological advances used in commercial data centers. We analyze the data center’s computing load and its impact on power, water, and carbon usage using standard effectiveness metrics, including PUE, WUE, and CUE. Our results reveal the benefits of optimizations, such as free cooling, and provide insights into how the various effectiveness metrics change with the seasons and increasing capacity usage. More broadly, our PUE, WUE, and CUE analysis validates the green design of this LEED Platinum data center.

2015

Measuring the Impact and Perception of Acceptable Advertisements

Robert J. Walls, Eric D. Kilmer, Nathaniel Lageman, and Patrick D. McDaniel

Proceedings of the ACM 2015 Internet Measurement Conference (IMC)

In 2011, Adblock Plus, the most widely used ad blocking software, began to permit some advertisements as part of its Acceptable Ads program. Under this program, some ad networks and content providers pay to have their advertisements shown to users. Such practices have been controversial among both users and publishers. In a step towards informing the discussion about these practices, we present the first comprehensive study of the Acceptable Ads program. Specifically, we characterize which advertisements are allowed and how the whitelisting has changed since its introduction in 2011. We show that the list of filters used to whitelist acceptable advertisements has been updated on average every 1.5 days and grew from 9 filters in 2011 to over 5,900 in the Spring of 2015. More broadly, the current whitelist triggers filters on 59% of the top 5,000 websites. Our measurements also show that the program allows advertisements on 2.6 million parked domains. Lastly, we take the lessons learned from our analysis and suggest ways to improve the transparency of the whitelisting process.

The SDN Shuffle: Creating a Moving-Target Defense using Host-based Software-Defined Networking

Douglas C. MacFarland and Craig A. Shue

ACM CCS Workshop on Moving Target Defense (MTD)

Moving target systems can help defenders limit the utility of reconnaissance for adversaries, hindering the effectiveness of attacks. While moving target systems are a topic of robust research, we find that prior work in network-based moving target defenses has limitations in either scalability or the ability to protect public servers accessible to unmodified clients. In this work, we present a new moving target defense using software-defined networking (SDN) that can service unmodified clients while avoiding scalability limitations. We then evaluate this approach according to seven moving-target properties and evaluate its performance. We find that the approach achieves its security goals while introducing low overheads.

Characterizing Network-Based Moving Target Defenses

Marc Green, Douglas C. MacFarland, Doran R. Smestad, and Craig A. Shue

ACM CCS Workshop on Moving Target Defense (MTD)

The moving target defense (MTD) strategy allows defenders to limit the effectiveness of attacker reconnaissance and exploitation. Many academic works have created MTDs in different deployment environments. However, network-based MTDs (NMTDs) share key components and properties that determine their effectiveness. In this work, we identify and define seven properties common to NMTDs which are key to ensuring the effectiveness of the approach. We then evaluate four NMTD systems using these properties and find two or more key concerns for each of the systems. This analysis shows that these properties may help developers of new NMTD systems by guiding the evaluation of these systems, and that others can use them as a rubric to assess the strengths and limitations of each NMTD approach.

Exploring theories with a model-finding assistant

Salman Saghafi, Ryan Danas, and Daniel J. Dougherty

Amy P. Felty and Aart Middeldorp, editors, Automated Deduction - CADE-25 - 25th International Conference on Automated Deduction, Berlin, Germany, August 1-7, 2015, Proceedings, volume 9195 of Lecture Notes in Computer Science, pages 434-449. Springer, 2015

We present an approach to understanding first-order theories by exploring their models. A typical use case is the analysis of artifacts such as policies, protocols, configurations, and software designs. For the analyses we offer, users are not required to frame formal properties or construct derivations. Rather, they can explore examples of their designs, confirming the expected instances and perhaps recognizing bugs inherent in surprising instances. Key foundational ideas include: the information preorder on models given by homomorphism, an inductively-defined refinement of the Herbrand base of a theory, and a notion of provenance for elements and facts in models. The implementation makes use of SMT-solving and an algorithm for minimization with respect to the information preorder on models. Our approach is embodied in a tool, Razor, that is complete for finite satisfiability and provides a read-eval-print loop used to navigate the set of finite models of a theory and to display provenance.

Characterizing Optimal DNS Amplification Attacks and Effective Mitigation

Douglas C. MacFarland, Craig A. Shue, and Andrew J. Kalafut

Passive and Active Measurement Conference

Attackers have used DNS amplification in over 34% of high-volume DDoS attacks, with some floods exceeding 300Gbps. The best current practices do not help victims during an attack; they are preventative measures that third-party organizations must employ in advance. Unfortunately, there are no incentives for these third parties to follow the recommendations. While practitioners have focused on reducing the number of open DNS resolvers, these efforts do not address the threat posed by authoritative DNS servers.
In this work, we measure and characterize the attack potential associated with DNS amplification, along with the adoption of countermeasures. We then propose and measure a mitigation strategy that organizations can employ. With the help of an upstream ISP, our strategy will allow even poorly provisioned organizations to mitigate massive DNS amplification attacks with only minor performance overheads.

SpotOn: A Batch Computing Service for the Spot Market

Supreeth Subramanya, Tian Guo, Prateek Sharma, David Irwin, and Prashant Shenoy

Proceedings of the 6th Annual Symposium on Cloud Computing (SoCC'15)

Cloud spot markets enable users to bid for compute resources, such that the cloud platform may revoke them if the market price rises too high. Due to their increased risk, revocable resources in the spot market are often significantly cheaper (by as much as 10X) than the equivalent non-revocable on-demand resources. One way to mitigate spot market risk is to use various fault-tolerance mechanisms, such as checkpointing or replication, to limit the work lost on revocation. However, the additional performance overhead and cost for a particular fault-tolerance mechanism is a complex function of both an application’s resource usage and the magnitude and volatility of spot market prices. We present the design of a batch computing service for the spot market, called SpotOn, that automatically selects a spot market and fault-tolerance mechanism to mitigate the impact of spot revocations without requiring application modification. SpotOn’s goal is to execute jobs with the performance of on-demand resources, but at a cost near that of the spot market. We implement and evaluate SpotOn in simulation and using a prototype on Amazon’s EC2 that packages jobs in Linux Containers. Our simulation results using a job trace from a Google cluster indicate that SpotOn lowers costs by 91.9% compared to using on-demand resources with little impact on performance.

Model-driven Geo-Elasticity In Database Clouds

Tian Guo and Prashant Shenoy

International Conference on Autonomic Computing and Communications (ICAC'15)

Motivated by the emergence of distributed clouds, we argue for the need for geo-elastic provisioning of application replicas to effectively handle temporal and spatial workload fluctuations seen by such applications. We present DBScale, a system that tracks geographic variations in the workload to dynamically provision database replicas at different cloud locations across the globe. Our geo-elastic provisioning approach comprises a regression-based model to infer the database query workload from observations of the spatially distributed frontend workload and a two-node open queueing network model to provision databases with both CPU and I/O-intensive query workloads. We implement a prototype of our DBScale system on Amazon EC2’s distributed cloud. Our experiments with our prototype show up to a 66% improvement in response time when compared to local elasticity approaches.

SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market

Prateek Sharma, Stephen Lee, Tian Guo, David Irwin, and Prashant Shenoy

Proceedings of the Tenth European Conference on Computer Systems (EuroSys'15)

Infrastructure-as-a-Service (IaaS) cloud platforms rent resources, in the form of virtual machines (VMs), under a variety of contract terms that offer different levels of risk and cost. For example, users may acquire VMs in the spot market that are often cheap but entail significant risk, since their price varies over time based on market supply and demand and they may terminate at any time if the price rises too high. Currently, users must manage all the risks associated with using spot servers. As a result, conventional wisdom holds that spot servers are only appropriate for delay-tolerant batch applications. In this paper, we propose a derivative cloud platform, called SpotCheck, that transparently manages the risks associated with using spot servers for users. SpotCheck provides the illusion of an IaaS platform that offers always-available VMs on demand for a cost near that of spot servers, and supports all types of applications, including interactive ones. SpotCheck’s design combines the use of nested VMs with live bounded-time migration and novel server pool management policies to maximize availability, while balancing risk and cost. We implement SpotCheck on Amazon’s EC2 and show that it i) provides nested VMs to users that are 99.9989% available, ii) achieves nearly 5X cost savings compared to using equivalent types of on-demand VMs, and iii) eliminates any risk of losing VM state.

Publications before 2015 can be found on individual websites.

2026

Can Foundation Models Revolutionize Mobile AR Sparse Sensing?

Yiqin Zhao, Tian Guo

International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things 2026 (FMSys'26)

See, Record, Do: Automated Generation of UI Workflows from Tutorial Videos

Adam Beauchaine, Craig A. Shue

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), March 2026

A High-Fidelity Robotic Manipulator Teleoperation Framework for Human-Centered Augmented Reality Evaluation

Harsh Chhajed, Tian Guo

ACM Multimedia Systems Conference 2026 (MMSys'26)

AR as an Evaluation Playground: Bridging Metrics and Visual Perception of Computer Vision Models

Ashkan Ganj, Yiqin Zhao, Tian Guo

ACM Multimedia Systems Conference 2026 (MMSys'26)

2025

'We just did not have that on the embedded system': Insights and Challenges for Securing Microcontroller Systems from the Embedded CTF Competitions.

Zheyuan Ma, Gaoxiang Liu, Alex Eastman, Kai Kaufman, Md Armanuzzaman, Xi Tan, Katherine Jesse, Robert J Walls, and Ziming Zhao

Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security.

Functional Control: Leveraging Function-as-a-Service Platforms for Software-Defined Networking Controllers

Shuwen Liu, Craig A. Shue

ACM MobiHoc (International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing), October 2025

Carbon-Efficient Internet Video Streaming

Zichen Zhu, Tian Guo, Sheng Wei

IEEE International Workshop on Multimedia Signal Processing (MMSP'25)

CleAR: Robust Context-Guided Generative Lighting Estimation for Mobile Augmented Reality

Yiqin Zhao, Mounika Dasari, Tian Guo

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Computing (IMWUT'25)

REVDECODE: Enhancing binary function matching with context-aware graph representations and relevance decoding.

Tongwei Ren, Ronghan Che, Guin Gilman, Lorenzo De Carli, and Robert J Walls.

34th USENIX Security Symposium, USENIX Security '25

ReFINE: A reactive and fine-grained scheduling framework for concurrency on general purpose GPUs.

Guin Gilman and Robert J Walls.

Proceedings of the 37th ACM Symposium on Parallelism in Algorithms and Architectures

Making (Only) the Right Calls: Preventing Remote Code Execution Attacks in PHP Applications with Contextual, State-Sensitive System Call Filtering

Yunsen Lei, Craig A. Shue

Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA) conference, July 2025

BridgeGC: An Efficient Cross-Level Garbage Collector for Big Data Frameworks

Yicheng Wang, Lijie Xu, Tian Guo, Wensheng Dou, Hongbin Zeng, Wei Wang, Jun Wei, Tao Huang

ACM Transactions on Architecture and Code Optimization (TACO'25)

CarbonDIS: Carbon-Aware DNN Inference Scheduling on Heterogeneous GPUs

Morteza Nabavinejad, Weiwei Jia, Shubbhi Taneja, Tian Guo

IEEE 25th International Symposium on Cluster, Cloud and Internet Computing (CCGrid'25)

HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors

Ashkan Ganj, Hang Su, Tian Guo

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV'25)

2024

CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework

Yiyang Zhao, Yunzhuo Liu, Bo Jiang, Tian Guo

Advances in Neural Information Processing Systems (NeurIPS'24)

Towards In-context Environment Sensing for Mobile Augmented Reality

Yiqin Zhao, Ashkan Ganj, Tian Guo

Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (ImmerCom'24)

Toward Robust Depth Fusion for Mobile AR With Depth from Focus and Single-Image Priors

Ashkan Ganj, Hang Su, Tian Guo

IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct'24)

Multi-Camera Lighting Estimation for Mobile Augmented Reality

Yiqin Zhao, Sean Fanello, Tian Guo

GetMobile: Mobile Computing and Communications (GetMobile'24)

Scoping Sustainable Collaborative Mixed Reality

Yasra Chandio, Noman Bashir, Tian Guo, Elsa Olivetti, Fatima Anwar

IEEE International Symposium on Emerging Metaverse (ISEMV'24)

FairCIM: Fair Interference Mitigation by DNN Switching for Latency-Sensitive Inference Jobs

Morteza Nabavinejad, Sherief Reda, Tian Guo

IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS'24)

MediatorDNN: Contention Mitigation for Co-Located DNN Inference Jobs

Morteza Nabavinejad, Sherief Reda, Tian Guo

IEEE 17th International Conference on Cloud Computing (CLOUD'24)

Multi-Objective Neural Architecture Search by Learning Space Partitions

Yiyang Zhao, Linnan Wang, Tian Guo

Journal of Machine Learning Research (JMLR'24)

ARFlow: A Framework for Simplifying AR Experimentation Workflow

Yiqin Zhao, Tian Guo

25th International Workshop on Mobile Computing Systems and Applications (HotMobile'24)

Mobile AR Depth Estimation: Challenges & Prospects

Ashkan Ganj, Yiqin Zhao, Hang Su, Tian Guo

25th International Workshop on Mobile Computing Systems and Applications (HotMobile'24)

2023

Portrait Expression Editing With Mobile Photo Sequence

Yiqin Zhao, Rohit Pandey, Yinda Zhang, Ruofei Du, Feitong Tan, Chetan Ramaiah, Tian Guo, Sean Fanello

SIGGRAPH Asia 2023 Technical Communications (SIGGRAPH Asia'23)

Toward a (Secure) Path of Least Resistance: An Examination of Usability Challenges in Secure Sandbox Systems