Trustworthy Generation
Data is key to technological innovations. We develop theoretical and algorithmic frameworks for generative AI to synthesize realistic, diverse, and targeted data. Our methods facilitate data augmentation for trustworthy machine learning and accelerate novel designs for drug and material discovery, and beyond.
Our work
Tools + code
TabFormer
PyTorch source code and data for tabular transformers that model multivariate time series, demonstrated on the synthetic generation and analysis of card transaction data.
CLaSS: Controlled Latent attribute Space Sampling
Code for an efficient computational method for attribute-controlled generation of molecules, which leverages guidance from classifiers trained on an informative latent space of molecules modeled using a deep generative autoencoder.
Fair Mixup
Code for training fair classifiers across different modalities such as tabular, language and image data, using fair mixup augmentation as a regularizer.
Unbalanced Sobolev Descent
Code for Unbalanced Sobolev Descent, which generates unbalanced data using birth and death processes.
Fold2Seq
Code for designing protein sequences conditioned on a specific target 3D fold using a novel transformer-based generative framework.
Sobolev Independence Criterion
Code for non-linear feature selection with provable false discovery rate control, using generative models and holdout randomization tests.
Publications
Yue Cao, Payel Das, et al. 2021. ICML 2021.
Inkit Padhi, Yair Schiff, et al. 2021. ICASSP 2021.
Ching-Yao Chuang, Youssef Mroueh. 2021. ICLR 2021.
Youssef Mroueh, Truyen V. Nguyen. 2021. AISTATS 2021.
Payel Das, Tom Sercu, et al. 2021. Nature Biomedical Engineering.
Nishtha Madaan, Inkit Padhi, et al. 2021. AAAI 2021.