Divyansh Jhunjhunwala
About Me
Hi! I'm Divyansh, a fifth-year PhD candidate in the Electrical and Computer Engineering department
at Carnegie Mellon University, advised by Dr. Gauri Joshi.
My research focuses on efficiently fine-tuning and training ML models on data distributed across individual users (e.g., mobile devices) via on-device training. I aim to
develop algorithms that are both theoretically grounded and practical, speeding up convergence and improving model accuracy while addressing the communication and computation
constraints inherent to such settings.
Lately, I've also been deeply interested in knowledge transfer: understanding when and why pre-training works,
and developing better methods to merge the knowledge of multiple fine-tuned models into one or more models (check out some recent work
[1, 2] in this area).
During my PhD, I have had the opportunity to intern at IBM Research (summers 2022 and 2023) and Bosch AI Research (summer 2024), working on problems related to accelerating model training and efficient LLM inference.
Prior to CMU, I completed my Bachelor of Technology (B.Tech) in Electronics and Electrical Communication Engineering at IIT Kharagpur, where I
received the Institute Silver Medal for graduating with the highest CGPA in my department.
I am currently on the industry job market. Please feel free to reach out to discuss potential roles!
Email / Google Scholar
Initialization Matters: Unraveling the Impact of Pre-training on Federated Learning
Divyansh Jhunjhunwala, Pranay Sharma, Zheng Xu, Gauri Joshi
Under submission
Provide the first theoretical explanation for why pre-training significantly boosts the performance of FedAvg by introducing the notion of
misaligned filters at initialization and showing that (a) data heterogeneity only affects misaligned filters and (b) pre-training can
reduce the number of misaligned filters at initialization.
Spanning the Accuracy-Size Trade-Off with Flexible Model Merging
Akash Dhasade, Divyansh Jhunjhunwala, Gauri Joshi, Anne-Marie Kermarrec, Milos Vujasinovic
Under submission
Propose FlexMerge, a model-merging approach that offers the flexibility to fuse fine-tuned foundation models into one or more models, balancing task accuracy, model size, and inference latency.
Erasure Coded Neural Network Inference via Fisher Averaging
Divyansh Jhunjhunwala*, Neharika Jali*, Shiqiang Wang, Gauri Joshi
IEEE International Symposium on Information Theory (ISIT) 2024
Develop COIN, a model-fusion framework to approximate the sum of outputs of multiple neural
networks with a single neural network for handling demand uncertainty in multi-model inference.
FedFisher: Leveraging Fisher Information for One-Shot Federated Learning
Divyansh Jhunjhunwala, Shiqiang Wang, Gauri Joshi
International Conference on Artificial Intelligence and Statistics (AISTATS) 2024
Propose FedFisher, an algorithm that learns the global model in federated learning using just one round of communication, with novel theoretical guarantees for two-layer overparameterized ReLU networks.
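To give a flavor of the approach, below is a minimal sketch of diagonal Fisher-weighted averaging of client models; this is an illustrative simplification and not the full FedFisher algorithm, and the function and variable names are my own.

```python
import numpy as np

def fisher_weighted_average(params, fishers, eps=1e-8):
    """Combine client models in one shot by weighting each parameter by its
    diagonal Fisher information, so that parameters which matter more for a
    client's local data dominate the merged model (illustrative sketch)."""
    params = np.stack(params)    # (num_clients, dim) flattened parameters
    fishers = np.stack(fishers)  # (num_clients, dim) diagonal Fisher estimates
    return (fishers * params).sum(axis=0) / (fishers.sum(axis=0) + eps)

# Toy example: two clients whose models are confident about different coordinates.
w1, f1 = np.array([1.0, 0.0]), np.array([10.0, 0.1])
w2, f2 = np.array([0.0, 1.0]), np.array([0.1, 10.0])
print(fisher_weighted_average([w1, w2], [f1, f2]))  # close to [1, 1]
```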
FedExP: Speeding up Federated Averaging via Extrapolation
Divyansh Jhunjhunwala, Shiqiang Wang, Gauri Joshi
International Conference on Learning Representations (ICLR), 2023 (Spotlight, top 25% of accepted papers)
Develop FedExP, a method that adaptively determines the server step size in federated learning based on the dynamically varying pseudo-gradients throughout training.
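As a rough illustration of the extrapolation idea, here is a minimal sketch of how a server might set its step size from the client pseudo-gradients; the constants follow my reading of the FedExP rule and should be checked against the paper.

```python
import numpy as np

def extrapolated_step_size(client_updates, eps=1e-3):
    """Sketch of an extrapolated server step size: when averaging shrinks the
    update (heterogeneous client directions), the server takes a larger step;
    when client updates already agree, it falls back to a step size of 1."""
    num_clients = len(client_updates)
    avg_update = np.mean(client_updates, axis=0)
    sum_sq_norms = sum(np.dot(d, d) for d in client_updates)
    return max(1.0, sum_sq_norms / (2 * num_clients * (np.dot(avg_update, avg_update) + eps)))

# Three clients pulling in orthogonal directions: averaging shrinks the update,
# so the extrapolated step size exceeds 1.
deltas = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]
eta = extrapolated_step_size(deltas)
print(eta, eta * np.mean(deltas, axis=0))  # server applies eta times the averaged update
```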
Maximizing Global Model Appeal in Federated Learning
Yae Jee Cho, Divyansh Jhunjhunwala, Tian Li, Virginia Smith, Gauri Joshi
Transactions on Machine Learning Research (TMLR), 2024
Propose MaxFL, an algorithm that explicitly maximizes the fraction of clients that are incentivized to use the global model in federated learning.
FedVARP: Tackling the Variance Due to Partial Client Participation in Federated Learning
Divyansh Jhunjhunwala, Pranay Sharma, Aushim Nagarkatti, Gauri Joshi
Uncertainty in Artificial Intelligence (UAI), 2022
Propose FedVARP, an algorithm that tackles the variance caused by only a subset of clients participating in each round of federated training.
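The sketch below illustrates the kind of server-side correction involved: the server remembers the most recent update from every client and uses those stale updates to debias the average over the clients that happen to participate. This is my own minimal reading of the idea, not the paper's implementation.

```python
import numpy as np

class VarianceReducedAggregator:
    """Keeps one stale update per client and corrects the average over the
    currently participating clients with this memory, so the aggregated
    direction depends less on which clients showed up this round (sketch)."""

    def __init__(self, num_clients, dim):
        self.memory = np.zeros((num_clients, dim))  # last update seen per client

    def aggregate(self, client_ids, updates):
        correction = np.mean([u - self.memory[i] for i, u in zip(client_ids, updates)], axis=0)
        aggregated = self.memory.mean(axis=0) + correction
        for i, u in zip(client_ids, updates):
            self.memory[i] = u  # refresh memory for participating clients
        return aggregated

agg = VarianceReducedAggregator(num_clients=4, dim=2)
print(agg.aggregate([0, 2], [np.array([1.0, 0.0]), np.array([0.0, 1.0])]))
```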
Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation
Divyansh Jhunjhunwala, Ankur Mallick, Advait Gadhikar, Swanand Kadhe, Gauri Joshi
Advances in Neural Information Processing Systems (NeurIPS), 2021
Introduce notions of spatial and temporal correlation and show how they can be used to efficiently estimate the mean of a set of vectors in a communication-limited setting.
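As one toy illustration of using temporal correlation, a client could send only a few random coordinates each round and let the server fill in the rest from the previous round's estimate; this is a deliberately simplified sketch, not the estimator analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_k_with_memory(x, prev_estimate, k):
    """Communicate only k random coordinates of x; the receiver reuses last
    round's estimate for the remaining coordinates, exploiting the temporal
    correlation between consecutive vectors (illustrative sketch only)."""
    idx = rng.choice(x.size, size=k, replace=False)
    estimate = prev_estimate.copy()
    estimate[idx] = x[idx]  # received coordinates overwrite the memory
    return estimate

x_prev = np.array([1.0, 2.0, 3.0, 4.0])
x_now = x_prev + 0.1  # vectors change slowly across rounds
print(rand_k_with_memory(x_now, x_prev, k=2))
```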
Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning
Divyansh Jhunjhunwala, Advait Gadhikar, Gauri Joshi, Yonina C. Eldar
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
Propose an adaptive quantization strategy that achieves communication efficiency as well as a low error floor by changing the number of quantization levels over the course of federated training.
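For context, the sketch below shows standard unbiased stochastic quantization with an adjustable number of levels, plus a purely illustrative schedule that refines the quantizer across rounds; the paper's actual rule for choosing the number of levels differs.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_quantize(x, num_levels):
    """Unbiased stochastic uniform quantization of x onto num_levels levels per
    coordinate (scaled by ||x||); more levels mean lower error but more bits."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return x
    scaled = np.abs(x) / norm * (num_levels - 1)
    lower = np.floor(scaled)
    levels = lower + (rng.random(x.shape) < scaled - lower)  # round up with prob = fractional part
    return np.sign(x) * levels * norm / (num_levels - 1)

# Illustrative schedule only: coarser quantization early in training, finer later.
for round_idx, levels in enumerate([2, 4, 8, 16]):
    update = rng.standard_normal(5)  # stand-in for a client's model update
    print(round_idx, stochastic_quantize(update, levels))
```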