Research
Groundbreaking work and published results in peer-reviewed journals across disciplines.
-
‘Balancing Biases and Preserving Privacy on Balanced Faces in the Wild’
“There are demographic biases present in current facial recognition (FR) models. To measure these biases across different ethnic and gender subgroups, we introduce our Balanced Faces in the Wild (BFW) dataset. … We found that relying on a single score threshold to differentiate between genuine and imposters sample pairs leads to suboptimal results. Additionally, performance within subgroups often varies significantly from the global average. … To mitigate imbalanced performances, we propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.” Find the paper and full list of authors at IEEE Transactions on Image Processing.
-
‘Q: How To Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!’
“Finetuning a large vision language model (VLM) on a target dataset after large scale pretraining is a dominant paradigm in visual question answering (VQA). Datasets for specialized tasks such as knowledge-based VQA or VQA in non natural-image domains are orders of magnitude smaller than those for general-purpose VQA. While collecting additional labels for specialized tasks or domains can be challenging, unlabeled images are often available. We introduce SelTDA (Self-Taught Data Augmentation), a strategy for finetuning large VLMs on small-scale VQA datasets.” Find the paper and full list of authors at ArXiv.
-
‘SnapFusion: Text-to-Image Diffusion Model on Mobile Devices Within Two Seconds’
“Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers. However, these models are large, with complex network architectures and tens of denoising iterations, making them computationally expensive and slow to run. … This is costly and has privacy implications, especially when user data is sent to a third party. To overcome these challenges, we present a generic approach that, for the first time, unlocks running text-to-image diffusion models on mobile devices in less than 2 seconds.” Find the paper and full list of authors at ArXiv.
-
‘Hybrid Pixel-Unshuffled Network for Lightweight Image Super-resolution’
“Convolutional neural network (CNN) has achieved great success on image super-resolution (SR). However, most deep CNN-based SR models take massive computations to obtain high performance. Downsampling features for multi-resolution fusion is an efficient and effective way to improve the performance of visual recognition. Still, it is counter-intuitive in the SR task, which needs to project a low-resolution input to high-resolution. In this paper, we propose a novel Hybrid Pixel-Unshuffled Network (HPUN) by introducing an efficient and effective downsampling module into the SR task.” Find the paper and full list of authors in the AAAI Conference on Artificial Intelligence proceedings.
-
‘Generative Benchmark Creation for Table Union Search’
“Data management has traditionally relied on synthetic data generators to generate structured benchmarks … where we can control important parameters like data size and its distribution precisely. … Our current methods for creating benchmarks involve the manual curation and labeling of real data. These methods are not robust or scalable and … it is not clear how robust the created benchmarks are. We propose to use generative AI models to create structured data benchmarks for table union search. We present a novel method for using generative models to create tables with specified properties.” Find the paper and full list of…
-
‘Generative Multi-Label Correlation Learning’
“In real-world applications, … multi-label learning methods emerged in recent years. It is a more challenging problem for many reasons. … In general, overcoming these challenges and bettering learning performance could be achieved by utilizing more training samples and including label correlations. However, these solutions are expensive and inflexible. Large-scale, well-labeled datasets are difficult to obtain, and building label correlation maps requires task-specific semantic information as prior knowledge. To address these limitations, we propose a general and compact Multi-Label Correlation Learning (MUCO) framework.” Find the paper and full list of authors at ACM Transactions on Knowledge Discovery from Data.
-
‘Multitask Learning via Shared Features: Algorithms and Hardness’
“We investigate the computational efficiency of multitask learning of Boolean functions over the 𝑑-dimensional hypercube, that are related by means of a feature representation of size 𝑘≪𝑑 shared across all tasks. We present a polynomial time multitask learning algorithm for the concept class of halfspaces with margin 𝛾, which is based on a simultaneous boosting technique and requires only poly(𝑘/𝛾) samples-per-task and poly(𝑘log(𝑑)/𝛾) samples in total.” Find the paper and full list of authors in the Proceedings of Machine Learning Research.
-
‘SNAP: Efficient Extraction of Private Properties with Poisoning’
“Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. … Several existing approaches for property inference attacks against deep neural networks have been proposed, but they all rely on the attacker training a large number of shadow models. … We consider the setting of property inference attacks in which the attacker can poison a subset of the training dataset and query the trained target model.” Find the paper and full list of authors in the IEEE Symposium on Security and Privacy proceedings.
-
‘Smooth Lower Bounds for Differentially Private Algorithms via Padding-and-Permuting Fingerprinting Codes’
“Fingerprinting arguments … are the most widely used method for establishing lower bounds on the sample complexity or error of approximately differentially private (DP) algorithms. Still, there are many problems in differential privacy for which we don’t know suitable lower bounds, and even for problems that we do, the lower bounds are not smooth, and usually become vacuous when the error is larger than some threshold. In this work, we present a simple method to generate hard instances by applying a padding-and-permuting transformation to a fingerprinting code.” Find the paper and full list of authors at ArXiv.
-
‘Layout Representation Learning With Spatial and Structural Hierarchies’
“We present a novel hierarchical modeling method for layout representation learning, the core of design documents (e.g., user interface, poster, template). Existing works on layout representation often ignore element hierarchies, which is an important facet of layouts, and mainly rely on the spatial bounding boxes for feature extraction. This paper proposes a Spatial-Structural Hierarchical Auto-Encoder (SSH-AE) that learns hierarchical representation by treating a hierarchically annotated layout as a tree format.” Find the paper and full list of authors in the Proceedings of the AAAI Conference on Artificial Intelligence.
-
‘A Large Scale Analysis of Semantic Versioning in NPM’
“The NPM package repository contains over two million packages and serves tens of billions of downloads per-week. Nearly every single JavaScript application uses the NPM package manager to install packages from the NPM repository. NPM relies on a ‘semantic versioning’ (‘semver’) scheme to maintain a healthy ecosystem, where bug-fixes are reliably delivered to downstream packages as quickly as possible. … In order to understand how developers use semver, we build a dataset containing every version of every package on NPM and analyze the flow of updates throughout the ecosystem.” Find the paper and full list of authors at ArXiv.
-
‘Improving Cross-Domain Detection with Self-Supervised Learning’
“Cross-Domain Detection (XDD) aims to train a domain-adaptive object detector using unlabeled images from a target domain and labeled images from a source domain. Existing approaches achieve this either by transferring the style of source images to that of target images, or by aligning the features of images from the two domains. In this paper, rather than proposing another method following the existing lines, we introduce a new framework complementary to existing methods.” Find the paper and full list of authors in the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
-
‘Trainability Preserving Neural Pruning’
“Many recent works have shown trainability plays a central role in neural network pruning — unattended broken trainability can lead to severe under-performance and unintentionally amplify the effect of retraining learning rate, resulting in biased (or even misinterpreted) benchmark results. This paper introduces trainability preserving pruning (TPP), a scalable method to preserve network trainability against pruning, aiming for improved pruning performance and being more robust to retraining hyper-parameters (e.g., learning rate).” Find the paper and full list of authors at Open Review. Published at ICLR 2023.
-
‘Sharing Speaker Heart Rate With the Audience Elicits Empathy and Increases Persuasion’
“Persuasion is a primary goal of public speaking, and eliciting audience empathy increases persuasion. In this research, we explore sharing a speaker’s heart rate as a social cue, to elicit empathy and increase persuasion in the audience. In particular, we developed two interfaces embedding the speaker’s heart rate over a recorded presentation video. … We observed that heart rate sharing significantly increased persuasion for participants with normal baseline empathy levels and increased empathic accuracy for all participants.” Find the paper and full list of authors in the proceedings of the International Conference on Persuasive Technology.
-
‘Sturgeon-GRAPH: Constrained Graph Generation From Examples’
“Procedural level generation techniques that learn local neighborhoods from example levels (such as WaveFunctionCollapse) have risen in popularity. Usually the neighborhood structure (such as a regular grid) onto which a level is generated is fixed in advance and not generated. In this work, we present a constraint-based approach for graph generation that learns local neighborhood patterns (in the form of labeled nodes and edges) from example graphs. This allows the approach to generate graphs with varying structures that are still locally similar to the examples.”
-
‘LRPRNet: Lightweight Deep Network by Low-Rank Pointwise Residual Convolution’
“Deep learning has become popular in recent years primarily due to powerful computing devices such as graphics processing units (GPUs). However, it is challenging to deploy these deep models to multimedia devices, smartphones, or embedded systems with limited resources. To reduce the computation and memory costs, we propose a novel lightweight deep learning module by low-rank pointwise residual (LRPR) convolution, called LRPRNet.” Find the paper and full list of authors at IEEE Transactions on Neural Networks and Learning Systems.
-
‘Principles and Guidelines for Evaluating Social Robot Navigation Algorithms’
“A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard. … In contrast, clear, repeatable, and accessible benchmarks have accelerated progress in fields like computer vision, natural language processing and traditional robot navigation. … In this paper, we pave the road towards common, widely accessible, and repeatable benchmarking criteria to evaluate social robot navigation.” Find the paper and full list of authors at ArXiv.
-
‘Using Overlapping Methods To Counter Adversaries in Community Detection’
“When dealing with large graphs, community detection is a useful data triage tool that can identify subsets of the network that a data analyst should investigate. In an adversarial scenario, the graph may be manipulated to avoid scrutiny of certain nodes by the analyst. Robustness to such behavior is an important consideration for data analysts in high-stakes scenarios such as cyber defense and counterterrorism. In this paper, we evaluate the use of overlapping community detection methods in the presence of adversarial attacks aimed at lowering the priority of a specific vertex.” Find the paper and full list of authors at…
-
‘Fast and Simple Solutions of Blotto Games’
“The Colonel Blotto game is commonly used for analyzing a wide range of applications from the U.S. Presidential election to innovative technology competitions to advertising, sports, and politics. There are persistent efforts to find the optimal strategies for the Colonel Blotto game. However, the first polynomial-time algorithm for that has very recently been provided by Ahmadinejad, [et al.]. … In this paper, we provide the first polynomial-size LP formulation of the optimal strategies for the Colonel Blotto game using linear extension techniques.” Find the paper and full list of authors at Operations Research.
-
‘The Pseudorandom Oracle Model and Ideal Obfuscation’
“We introduce a new idealized model of hash functions, which we refer to as the pseudorandom oracle (PrO) model. Intuitively, it allows us to model cryptosystems that use the code of an ideal hash function in a non-black-box way. Formally, we model hash functions via a combination of a pseudorandom function (PRF) family and an ideal oracle. A user can initialize the hash function by choosing a PRF key k and mapping it to a public handle h using the oracle.” Find the paper and full list of authors in Advances in Cryptology.
-
‘Experimental Security Analysis of DNN-Based Adaptive Cruise Control Under Context-Aware Perception Attacks’
“Adaptive Cruise Control (ACC) is a widely used driver assistance feature for maintaining desired speed and safe distance to the leading vehicles. This paper evaluates the security of the deep neural network (DNN) based ACC systems under stealthy perception attacks that strategically inject perturbations into camera data to cause forward collisions. We present a combined knowledge-and-data-driven approach to design a context-aware strategy for the selection of the most critical times for triggering the attacks and a novel optimization-based method for the adaptive generation of image perturbations at run-time.” Find the paper and full list of authors at ArXiv.
-
‘Calculational Proofs in ACL2s’
“Teaching college students how to write rigorous proofs is a critical objective in courses that introduce formal reasoning. Over the course of several years, we have developed a mechanically-checkable style of calculational reasoning … to teach over a thousand freshman-level undergraduate students how to reason about computation in our ‘Logic and Computation’ class at Northeastern University. … Our calculational proof checker is integrated into ACL2s and is available as an Eclipse IDE plugin, via a Web interface and as a stand-alone tool. It automatically checks proofs for correctness and provides useful feedback.” Find the paper and full list of authors…
-
‘Diagnosing Human-Object Interaction Detectors’
“In this paper, we introduce a diagnosis toolbox for analyzing the error sources of the existing [human-object interaction] HOI detection models. We first conduct holistic investigations in the pipeline of HOI detection. … We define a set of errors and the oracles to fix each of them. By measuring the [mean Average Precision] mAP improvement obtained from fixing an error using its oracle, we can have a detailed analysis of the significance of different errors. We then delve into the human-object detection and interaction classification, respectively, and check the model’s behavior.” Find the paper and full list of authors at…
-
‘Scaling Integer Arithmetic in Probabilistic Programs’
“Distributions on integers are ubiquitous in probabilistic modeling but remain challenging for many of today’s probabilistic programming languages (PPLs). … Our insight is that there is structure in arithmetic that these approaches are not using. We present a binary encoding strategy for discrete distributions that exploits the rich logical structure of integer operations like summation and comparison. We leverage this structured encoding with knowledge compilation to perform exact probabilistic inference, and show that this approach scales to much larger integer distributions with arithmetic.” Find the paper and full list of authors at ArXiv.
-
‘Designing for Playfulness in Human-AI Authoring Tools’
“Many human-AI authoring tools are used in a playful way, while being primarily designed for task-achievement—not playfulness. We argue that playfulness is an important yet overlooked factor of user behaviour and experience when interacting with such tools. … In this paper, we motivate the importance of playfulness as user experience in human-AI authoring tools, and propose concrete strategies to design for playfulness in the human user through UI design, in the AI through algorithms or through interventions to their dialog.” Find the paper and full list of authors in the 18th International Conference on the Foundations of Digital Games proceedings.
-
‘Knowledge Transfer From High-Resource to Low-Resource Programming Languages for Code LLMs’
“Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as a building block for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming languages. … This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM.” Find the paper and list of authors…
-
‘Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning’
“Contrastive vision-language models (e.g. CLIP) are typically created by updating all the parameters of a vision model and language model through contrastive training. Can such models be created by a small number of parameter updates to an already-trained language model and vision model? … We explore the feasibility and benefits of parameter-efficient contrastive vision-language alignment through transfer learning: creating a model such as CLIP by minimally updating an already-trained vision and language model. We find that a minimal set of parameter updates (<7%) can achieve the same performance as full-model training.” Find the paper and full list of authors at…
-
‘Frame Flexible Network’
“Existing video recognition algorithms always conduct different training pipelines for inputs with different frame numbers, which requires repetitive training operations and multiplying storage costs. If we evaluate the model using other frames which are not used in training, we observe the performance will drop significantly. … To fix this issue, we propose a general framework, named Frame Flexible Network (FFN), which not only enables the model to be evaluated at different frames to adjust its computation, but also reduces the memory costs of storing multiple models significantly.” Find the paper and full list of authors at ArXiv.