
I am a research intern at the Robotics and Embodied AI Lab (REAL) at Université de Montréal and Mila. I work on representation learning for robotic systems, supervised by professors Liam Paull (UdeM) and Florian Shkurti (UofT). This fall, I will join the Learning and Intelligent Systems (LIS) Group at MIT CSAIL as a PhD student in EECS. Supervised by professors Leslie Pack Kaelbling and Tomas Lozano-Perez, I will work at the intersection of perception and task & motion planning for robotic systems, with the overarching goal of building general-purpose, autonomous robots that integrate seamlessly with humans.
I completed my MS by Research at IIIT Hyderabad, supervised by professors C V Jawahar and Vinay Namboodiri in the computer vision lab (CVIT) and by Prof. Madhava Krishna in the robotics lab (RRC). My work spanned 3D shape completion, video understanding, implicit representations, robotic manipulation, and talking-face generation.
Previously, I was a Software Engineer at Microsoft India on the People Also Ask (PAA) team, where I applied deep learning and NLP techniques to show a block of related questions and answers for user queries on Bing's search page.
I completed my Bachelor's in Computer Science at PES University (formerly PESIT), Bangalore, where I worked on sound event detection and localization. I also spent a summer as a MITACS research intern at the University of Calgary, supervised by Prof. Mike Smith, localizing an audio noise nuisance called the Ranchlands Hum, and a year as an intern at Microsoft Research India, working on blended learning and AI in healthcare.
Research Interests: My research interests lie broadly at the intersection of computer vision and robotics. My goal is to integrate representation learning with task & motion planning to achieve general-purpose robot autonomy.
Publications
HyP-NeRF: Learning Improved NeRF Priors using a HyperNetwork
Aditya Agarwal*
We propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. We use hypernetworks to estimate both the weights and the multi-resolution hash encodings, resulting in significant quality gains. To further improve quality, we incorporate a denoise-and-finetune strategy: we denoise images rendered from the NeRFs estimated by the hypernetwork and finetune the NeRF on them while retaining multiview consistency.
Disentangling Planning and Control for Non-prehensile Tabletop Manipulation
Aditya Agarwal
Paper (Coming Soon) / Video (Coming Soon)
We propose a framework that disentangles planning and control for tabletop manipulation in unknown scenes. It uses a pushing-by-striking method (without tactile feedback) and explicitly models the object dynamics. Our method consists of two components: an A* planner for path planning and a low-level RL controller that models object dynamics.
SCARP: 3D Shape Completion in ARbitrary Poses for Improved Grasping
Aditya Agarwal*
Paper / Project Page / Short Video / Code / Poster / Long Video
We propose a mechanism for completing partial 3D shapes in arbitrary poses by learning a disentangled feature representation of pose and shape. We learn rotationally equivariant pose features and geometric shape features by training with a multi-task objective. SCARP improves shape completion performance by 45% and grasp proposals by 71.2% over existing baselines.
FaceOff: A Video-to-Video Face Swapping System
Aditya Agarwal*
Paper / Project Page / Video / Poster / Code / Supplementary
We propose a novel direction of video-to-video (V2V) face swapping that tackles a pressing challenge in the moviemaking industry: swapping an actor's face and expressions onto the face of their body double. Existing face-swapping methods preserve only the identity of the source face and do not transfer its expressions. FaceOff swaps the source's facial expressions along with its identity while retaining the target's background and pose.
Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale
Aditya Agarwal*
Paper / Project Page / Video / Poster / Supplementary
Hard-of-hearing people rely on lipreading the speaker's mouth movements to understand spoken content. In this work, we developed computer vision techniques and built upon existing AI models, such as text-to-speech (TTS) and talking-face generation, to generate synthetic lipreading training content in any language.
INR-V: A Continuous Representation Space for Video-based Generative Tasks
Aditya Agarwal*
Paper / OpenReview / Project Page / Video / Code
Inspired by recent works that parameterize 3D shapes and scenes as Implicit Neural Representations (INRs), we encode videos as INRs. We train a hypernetwork to learn a prior over these INR functions. To stabilize hypernetwork training, we propose two techniques: (i) Progressive Training and (ii) Video-CLIP Regularization. INR-V performs strongly on several video generative tasks across multiple benchmark datasets.
Approaches and Challenges in Robotic Perception for Table-top Rearrangement and Planning
Aditya Agarwal*
3rd place in the ICRA 2022 Open Cloud Table Organization Challenge
Paper / Competition / Video / Slides / Code / News1 / News2
In this challenge, we proposed an end-to-end ROS pipeline that incorporates the perception and planning stacks. The pipeline uses a two-finger manipulator to move objects in a tabletop scene from their initial configuration to a desired target configuration. It involves five steps: (1) 3D scene registration, (2) object pose estimation, (3) grasp generation, (4) task planning, and (5) motion planning.
Personalized One-Shot Lipreading for an ALS Patient
Aditya Agarwal*
We tackled the challenge of lipreading medical patients in a one-shot setting. Training existing lipreading models posed two primary issues: (i) lipreading datasets contained only speakers without disabilities, and (ii) they lacked medical vocabulary. We devised a variational-encoder-based domain adaptation technique that adapts models trained on large amounts of synthetic data, enabling lipreading from one-shot real examples.
REED: An Approach Towards Quickly Bootstrapping Multilingual Acoustic Models
Aditya Agarwal*
We tackled the problem of building a multilingual acoustic model in a low-resource setting. We proposed a mechanism to bootstrap and validate the compatibility of multiple languages using CNNs that operate directly on raw speech signals. Compared with RNN-based baseline systems, our method makes training 4x faster and inference 7.4x faster, with comparable word error rates (WERs).
An Approach Towards Action Recognition using Part Based Hierarchical Fusion
Aditya Agarwal*
The human body can be represented as an articulation of rigid and hinged joints, which can be grouped into body parts. In this work, we model a human action as the collective action of these parts. We propose a Hierarchical BiLSTM network that models the spatio-temporal dependencies of the motion by fusing pose-based joint trajectories in a part-based hierarchical fashion.
Minimally Supervised Sound Event Detection using a Neural Network
Aditya Agarwal
We address polyphonic sound event detection by training on a minimally annotated dataset of single sounds. A neural network is trained on single sounds represented as MFCC features. Polyphonic sounds are preprocessed using PCA and NMF, and the learned network then infers the resulting source-separated sounds. Our system achieves reasonable source-separation and detection accuracy with minimal data.
News & Announcements
- [Jul '23] Served as a reviewer for SIGGRAPH 2023.
- [May '23] I'll be starting as a research intern at Mila - Quebec Artificial Intelligence Institute, Montreal, with professors Liam Paull and Florian Shkurti. I will work on learning representations for 3D robotic manipulation.
- [Apr '23] I'll be joining MIT CSAIL as a PhD student this Fall. I will be a part of the Learning and Intelligent Systems (LIS) Group with professors Leslie Pack Kaelbling and Tomas Lozano-Perez.
- [Apr '23] Serving as a reviewer for IROS 2023.
- [Apr '23] Full-page abstract on "Uncovering Biases Against Indian Artists" accepted at ICMPC17-APSCOM7 for a spoken presentation. Awarded a travel grant of ¥30,000 to attend the conference in Tokyo, Japan.
- [Mar '23] Awarded a travel grant of $2,250 by the ICRA 2023 IEEE RAS Travel Grant Committee to attend the premier robotics conference in London, UK, from 29 May to 2 June.
- [Mar '23] Invited to give a talk at Columbia University (slide deck here). The talk was organized as part of my graduate visit days to Brown and Columbia.
- [Jan '23] Five works on Implicit Video Parameterization, V2V Face-Swapping, MOOCs for Lipreading, 3D Shape Completion, and Synergistic Tabletop Manipulation presented at IIIT Hyderabad's R&D Showcase.
- [Jan '23] One paper on 3D Shape Completion in Arbitrary Poses accepted at ICRA 2023. Featured as the "Publication of the Week" in "Weekly Robotics".
- [Jan '23] Attending Google Research Week in Bangalore from 29 to 31 Jan.
- [Dec '22] Serving as a reviewer for the Neural Fields Workshop at ICLR 2023 (NF2023).
- [Nov '22] Served as a reviewer for ICRA 2023.
- [Oct '22] Journal paper on a novel representation space for video-based generative tasks accepted at TMLR 2022.
- [Aug '22] Two papers on video face swapping and talking-face generation accepted at WACV 2023 round 1 (acceptance rate 21.6%).
- [Aug '22] Gave a talk on the challenges in tabletop rearrangement and planning at the CVIT Summer School 2022.
- [Aug '22] Coordinator for the 6th CVIT Summer School on AI.
- [May '22] We were in the news for winning 3rd place at the ICRA 2022 international robotics competition on tabletop rearrangement and planning. Awarded a grant of $1,000.
- [Feb '22] I will be conducting month-long tutorial sessions in machine learning for faculty members from universities across India as part of the CSEDU-ML program, conducted jointly by IIIT-H, IIT-H, and IIT-D.
- [Aug '21] Coordinator for the 5th CVIT Summer School on AI and conducted tutorial sessions on self-supervised learning and multimodal learning.
- [Oct '21] One paper on lipreading in a one-shot setting using domain adaptation accepted at BMVC.
- [Mar '21] I will be joining IIIT Hyderabad as an MS by Research student.
- [Nov '20] One paper on building multilingual acoustic models for low-resource languages accepted at SLT.
- [Mar '20] My video shoot conducted by Microsoft for its campus hiring program is available on YouTube.
- [Mar '18] My work helped scale the Microsoft Community Learning platform to its first 100K users. The work was covered by several media outlets ([1], [2], [3], [4]). I was awarded the "Delight your Customer" Award by Microsoft for my outstanding work.
- [Sep '17] Completed my Bachelor's degree in Computer Science at PES University. Received the Academic Distinction Award for exceptional academic performance.
- [Aug '17] Won the VMware Global Relay Opensource Borathon among all participating teams across VMware.
- [Feb '17] My work on building Microsoft Research India's flagship project, Massively Empowered Classroom, was deployed by the Mauritius Institute of Education. It was inaugurated by Dr. Sriram Rajamani, MD MSR India, and Hon. Mrs. Leela Devi (Minister of Tertiary Education, Mauritius), and was covered in the press ([1], [2], [3]).
- [Jan '16] I will be interning at the University of Calgary in Summer 2016, fully funded through the MITACS Globalink Research Award.
Forked and modified from Viraj Prabhu's adaptation of the Pixyll theme.