Research
Groundbreaking work and published results in peer reviewed journals across disciplines.
Title
Topic
-
‘Synthesis of Distributed Protocols by Enumeration Modulo Isomorphisms’
“Synthesis of distributed protocols is a hard, often undecidable, problem. Completion techniques provide partial remedy by turning the problem into a search problem. However, the space of candidate completions is still massive. In this paper, we propose optimization techniques to reduce the size of the search space by a factorial factor by exploiting symmetries (isomorphisms) in functionally equivalent solutions. We present both a theoretical analysis of this optimization as well as empirical results that demonstrate its effectiveness in synthesizing both the Alternating Bit Protocol and Two Phase Commit.” Find the paper and the full list of authors at ArXiv.
-
‘”Who is the Right Homeless Client?”: Values in Algorithmic Homelessness Service Provision and Machine Learning Research’
“Homelessness presents a long-standing problem worldwide. Like other welfare services, homeless services have gained increased traction in Machine Learning (ML) research. Unhoused persons are vulnerable and using their data in the ML pipeline raises serious concerns about the unintended harms and consequences of prioritizing different ML values. … Unhoused persons were lost (i.e., humans were deprioritized) at multi-level ML abstraction of predictors, categories and algorithms. Our findings illuminate potential pathways forward … by situating humans at the center to support this vulnerable community.” Find the paper and full list of authors in the Conference on Human Factors in Computing Systems,…
-
‘Why, When and From Whom: Considerations for Collecting and Reporting Race and Ethnicity Data in HCI’
“Engaging diverse participants in HCI research is critical for creating safe, inclusive, and equitable technology. However, there is a lack of guidelines on when, why, and how HCI researchers collect study participants’ race and ethnicity. Our paper aims to take the first step toward such guidelines by providing a systematic review and discussion of the status quo of race and ethnicity data collection in HCI.” Find the paper and full list of authors in the Conference on Human Factors in Computing Systems, 2023, proceedings.
-
‘How to Combine Membership-Inference Attacks on Multiple Updated Machine Learning Models’
“A large body of research has shown that machine learning models are vulnerable to membership inference (MI) attacks that violate the privacy of the participants in the training data. Most MI research focuses on the case of a single standalone model, while production machine-learning platforms often update models over time, on data that often shifts in distribution, giving the attacker more information. This paper proposes new attacks that take advantage of one or more model updates to improve MI.” Find the paper and the full list of authors in the Proceedings on Privacy Enhancing Technologies Symposium.
-
‘TMI! Finetuned Models Leak Private Information from their Pretraining Data’
“Transfer learning has become an increasingly popular technique in machine learning as a way to leverage a pretrained model … to assist with building a finetuned model. … There are reasons to believe that the data used for pretraining is still sensitive, making it essential to understand how much information the finetuned model leaks about the pretraining data. In this work we propose a new membership-inference threat model where the adversary only has access to the finetuned model and would like to infer the membership of the pretraining data.” Find the paper and full list of authors at ArXiv.
-
‘Differentially Private Medians and Interior Points for Non-Pathological Data’
“We construct differentially private estimators with low sample complexity that estimate the median of an arbitrary distribution over ℝ satisfying very mild moment conditions. Our result stands in contrast to the surprising negative result of Bun et al. (FOCS 2015) that showed there is no differentially private estimator with any finite sample complexity that returns any non-trivial approximation to the median of an arbitrary distribution.” Find the paper and the full list of authors at ArXiv.
-
‘DIALITE: Discover, Align and Integrate Open Data Tables’
“We demonstrate a novel table discovery pipeline called DIALITE that allows users to discover, integrate and analyze open data tables. DIALITE has three main stages. First, it allows users to discover tables from open data platforms using state-of-the-art table discovery techniques. Second, DIALITE integrates the discovered tables to produce an integrated table. Finally, it allows users to analyze the integration result by applying different downstreaming tasks over it. Our pipeline is flexible such that the user can easily add and compare additional discovery and integration algorithms.” Find the paper and the full list of authors at ArXiv.
-
‘A Statistical Approach for Finding Property-Access Errors’
“We study the problem of finding incorrect property accesses in JavaScript where objects do not have a fixed layout, and properties (including methods) can be added, overwritten, and deleted freely throughout the lifetime of an object. Since referencing a non-existent property is not an error in JavaScript, accidental accesses to non-existent properties … can go undetected without thorough testing, and may manifest far from the source of the problem. We propose a two-phase approach for detecting property access errors based on the observation that, in practice, most property accesses will be correct.” Find the paper and full list of authors…
-
‘Safe Environmental Envelopes of Discrete Systems’
“A safety verification task involves verifying a system against a desired safety property under certain assumptions about the environment. However, these environmental assumptions may occasionally be violated due to modeling errors or faults. Ideally, the system guarantees its critical properties even under some of these violations, i.e., the system is robust against environmental deviations. This paper proposes a notion of robustness as an explicit, first-class property of a transition system that captures how robust it is against possible deviations in the environment.” Find the paper and the full list of authors at ArXiv.
-
Hoitash and Pei win 2023 Ronald Copeland Best Paper Awards
“Udi Hoitash, Lilian L. and Harry A. Cowan Professor of Accounting, and Amy Pei, Assistant Professor, Marketing, have authored papers awarded the 2023 Copeland Best Paper Awards. Each Group Research Committee nominated a paper published in 2022 by a faculty member in the Group to be considered for a Ronald Copeland Best Paper Award. Two papers were selected as this year’s winners as recommended by the DMSB Research Committee and confirmed by Dean Emery Trahan. Each D’Amore-McKim author will receive a $1,000 stipend.”
-
‘Mississippi River Low-Flows: Context, Causes and Future Projections’
“The Mississippi River represents a major commercial waterway, and periods of anomalously low river levels disrupt riverine transport. These low-flow events occur periodically, with a recent event in the fall of 2022 slowing barge traffic and generating sharp increases in riverine transportation costs. Here we … evaluate historical trends and future projections. … Model simulations from the LENS2 dataset show that, under a moderate-high emissions scenario (SSP3-7.0), the severity and duration of low-flow events is projected to decrease through to the end of the 21st century. ” Find the paper and the full list of authors in Environmental Research: Climate.
-
‘Just-in-Time for Supply Chains in Turbulent Times’
“The Covid-19 pandemic and other recent disruptions in the early 2020s led to sections in the business press blaming just-in-time (JIT) practices for operational failings. … Some scholars argue that JIT is not resilient, while others maintain that JIT can continue providing superior performance even with disruptions. Motivated by this debate, we discuss various misconceptions about JIT that underlie this debate … and argue that companies can improve their supply chain performance if JIT supply chain segments are chosen fittingly—even more so—during disruptions.” Find the paper and the full list of authors at Production and Operations Management.
-
‘Why Not Yet: Fixing a Top-k Ranking That is Not Fair to Individuals’
“This work considers why-not questions in the context of top-k queries and score-based ranking functions. Following the popular linear scalarization approach for multi-objective optimization, we study rankings based on the weighted sum of multiple scores. A given weight choice may be controversial or perceived as unfair. … We introduce various notions of such why-not-yet queries and formally define them as satisfiability or optimization problems, whose goal is to propose alternative ranking functions that address the placement of the entities of interest.” Find the paper and the full list of authors at the Association for Computing Machinery.
-
‘Fast Optimal Locally Private Mean Estimation via Random Projections’
“We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball. Existing algorithms for this problem either incur sub-optimal error or have high communication and/or run-time complexity. We propose a new algorithmic framework, ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity, and incur optimal error up to a 1+o(1)-factor. Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an optimal algorithm.” Find the paper and the full list of authors at ArXiv.
-
‘Online and Streaming Algorithms for Constrained k-Submodular Maximization’
“Constrained k-submodular maximization is a general framework that captures many discrete optimization problems such as ad allocation, influence maximization, personalized recommendation, and many others. In many of these applications, datasets are large or decisions need to be made in an online manner, which motivates the development of efficient streaming and online algorithms. In this work, we develop single-pass streaming and online algorithms for constrained k-submodular maximization with both monotone and general (possibly non-monotone) objectives subject to cardinality and knapsack constraints.” Find the paper and the full list of authors at ArXiv.
-
‘Poisoning Network Flow Classifiers’
“As machine learning (ML) classifiers increasingly oversee the automated monitoring of network traffic, studying their resilience against adversarial attacks becomes critical. This paper focuses on poisoning attacks, specifically backdoor attacks, against network traffic flow classifiers. We investigate the challenging scenario of clean-label poisoning where the adversary’s capabilities are constrained to tampering only with the training data—without the ability to arbitrarily modify the training labels or any other component of the training process.” Find the paper and the full list of authors at ArXiv.
-
‘Unleashing the Power of Randomization in Auditing Differentially Private ML’
“We present a rigorous methodology for auditing differentially private machine learning algorithms by adding multiple carefully designed examples called canaries. … First, we introduce Lifted Differential Privacy (LiDP) that expands the definition of differential privacy to handle randomized datasets. … Second, we audit LiDP by trying to distinguish between the model trained with K canaries versus K−1 canaries in the dataset, leaving one canary out. … Third, we introduce novel confidence intervals that take advantage of the multiple test statistics by adapting to the empirical higher-order correlations. Together, this new recipe demonstrates significant improvements in sample complexity.” Find the paper and the full list…
-
‘Freaky Leaky SMS: Extracting User Locations by Analyzing SMS Timings’
“Short Message Service (SMS) remains one of the most popular communication channels since its introduction. … In this paper, we demonstrate that merely receiving silent SMS messages regularly opens a stealthy side-channel that allows other regular network users to infer the whereabouts of the SMS recipient. The core idea is that receiving an SMS inevitably generates Delivery Reports whose reception bestows a timing attack vector at the sender. We conducted experiments across various countries, operators, and devices to show that an attacker can deduce the location of an SMS recipient.” Find the paper and the full list of authors at ArXiv.
-
‘How Creative Versus Technical Constraints Affect Individual Learning in an Online Innovation Community’
“Online innovation communities allow for a search for novel solutions within a design space bounded by constraints. Past research has focused on the effect of creative constraints on individual projects, but less is known about how constraints affect learning from repeated design submissions and the effect of the technical constraints that are integral to online platforms. How do creative versus technical constraints affect individual learning in exploring a design space in online communities? … We find that creative constraints lead to high rates of learning only if technical constraints are sufficiently relaxed.” Find the paper and full list of authors…
-
‘Homo in Machina: Improving Fuzz Testing Coverage via Compartment Analysis’
“Fuzz testing is often automated, but also frequently augmented by experts who insert themselves into the workflow in a greedy search for bugs. In this paper, we propose Homo in Machina, or HM-fuzzing, in which analyses guide the manual efforts, maximizing benefit. As one example … we introduce compartment analysis. Compartment analysis uses a whole-program dominator analysis to estimate the utility of reaching new code, and combines this with a dynamic analysis indicating drastically under-covered edges guarding that code.” Find the paper and full list of authors in the proceedings of the 2023 IEEE Conference on Software Testing, Verification and…
-
‘Incremental Non-Gaussian Inference for SLAM Using Normalizing Flows’
“This article presents normalizing flows for incremental smoothing and mapping (NF-iSAM), a novel algorithm for inferring the full posterior distribution in SLAM problems with nonlinear measurement models and non-Gaussian factors. NF-iSAM exploits the expressive power of neural networks, and trains normalizing flows to model and sample the full posterior. By leveraging the Bayes tree, NF-iSAM enables efficient incremental updates similar to iSAM2, albeit in the more challenging non-Gaussian setting.” Find the paper and the full list of authors in the IEEE Transactions on Robotics.
-
‘A Variational Perspective on Generative Flow Networks’
“Generative flow networks (GFNs) are a class of probabilistic models for sequential sampling of composite objects, proportional to a target distribution that is defined in terms of an energy function or a reward. GFNs are typically trained using a flow matching or trajectory balance objective, which matches forward and backward transition models over trajectories. … We introduce a variational objective for training GFNs, which is a convex combination of the reverse- and forward KL divergences, and compare it to the trajectory balance objective when sampling from the forward- and backward model, respectively.” Find the paper and full list of authors…
-
‘String Diagrams with Factorized Densities’
“A growing body of research on probabilistic programs and causal models has highlighted the need to reason compositionally about model classes that extend directed graphical models. Both probabilistic programs and causal models define a joint probability density over a set of random variables, and exhibit sparse structure that can be used to reason about causation and conditional independence. This work builds on recent work on Markov categories of probabilistic mappings to define a category whose morphisms combine a joint density … with a deterministic mapping from samples to return values.” Find the paper and the full list of authors at ArXiv.
-
‘Efficient Computation of Quantiles over Joins’
“We present efficient algorithms for Quantile Join Queries, abbreviated as %JQ. … Our goal is to avoid materializing the set of all join answers, and to achieve quasilinear time in the size of the database, regardless of the total number of answers. Even for basic ranking functions beyond sum, such as min or max over different attributes, so far it is not known whether there is any nontrivial tractable %JQ. In this work, we develop a new approach to solving %JQ.” Find the paper and the full list of authors at ArXiv.
-
‘Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries’
“We study the question of when we can provide direct access to the k-th answer to a Conjunctive Query (CQ) according to a specified order over the answers in time logarithmic in the size of the database, following a preprocessing step that constructs a data structure in time quasilinear in database size. Specifically, we embark on the challenge of identifying the tractable answer orderings, that is, those orders that allow for such complexity guarantees.” Find the paper and the full list of authors in the ACM Transactions on Database Systems.
-
‘FibeRed: Fiberwise Dimensionality Reduction of Topologically Complex Data with Vector Bundles’
“Datasets with non-trivial large scale topology can be hard to embed in low-dimensional Euclidean space with existing dimensionality reduction algorithms. We propose to model topologically complex datasets using vector bundles, in such a way that the base space accounts for the large scale topology, while the fibers account for the local geometry. This allows one to reduce the dimensionality of the fibers, while preserving the large scale topology. We formalize this point of view and … describe a dimensionality reduction algorithm based on topological inference for vector bundles.” Find the paper and full list of authors at Dagstuhl Research Online.
-
‘Toroidal Coordinates: Decorrelating Circular Coordinates with Lattice Reduction’
“The circular coordinates algorithm of de Silva, Morozov, and Vejdemo-Johansson takes as input a dataset together with a cohomology class representing a 1-dimensional hole in the data; the output is a map from the data into the circle that captures this hole, and that is of minimum energy in a suitable sense. However, when applied to several cohomology classes, the output circle-valued maps can be ‘geometrically correlated’ even if the chosen cohomology classes are linearly independent. … We identify a formal notion of geometric correlation between circle-valued maps.” Find the paper and full list of authors at Dagstuhl Research Online.
-
‘Topological Data Analysis of Electroencephalogram Signals for Pediatric Obstructive Sleep Apnea’
“Topological data analysis (TDA) is an emerging technique for biological signal processing. TDA leverages the invariant topological features of signals in a metric space for robust analysis of signals even in the presence of noise. In this paper, we leverage TDA on brain connectivity networks derived from electroencephalogram (EEG) signals to identify statistical differences between pediatric patients with obstructive sleep apnea (OSA) and pediatric patients without OSA … and show that TDA enables us to see a statistical difference between the brain dynamics of the two groups.” Find the paper and the full list of authors at ArXiv.
-
‘A Video-Based End-to-End Pipeline for Non-Nutritive Sucking Action Recognition and Segmentation in Young Infants’
“We present an end-to-end computer vision pipeline to detect non-nutritive sucking (NNS)—an infant sucking pattern with no nutrition delivered—as a potential biomarker for developmental delays, using off-the-shelf baby monitor video footage. One barrier to clinical (or algorithmic) assessment of NNS stems from its sparsity, requiring experts to wade through hours of footage to find minutes of relevant activity.” Find the paper and the full list of authors at ArXiv.