I’m working as an Applied Scientist-2 at Amazon with the International Machine Learning team. I’ve helped build and launch a conversational shopping assistant for Amazon (India), in both the pre-LLM and post-LLM eras. I’ve built multiple NLU components (Intent/Planner, NER, Query Reformulation), a Safeguard service, a Next Question Suggestion service, and Clarification and Product QnA tools for the assistant. I also worked on personalization and forecasting problems in my first ~8 months at Amazon.
Previously, I worked as a Machine Learning Scientist at Jio Haptik on fundamental Conversational-AI problems. I built the Intent Detection System for Haptik’s NLU Engine, which was 25% more accurate than their previous system, and owned it from research to production.
I have authored research papers that have been accepted at top-tier venues such as ACL (Findings), the EMNLP NLP-OSS workshop, the EMNLP Insights workshop, the EACL LT-EDI workshop and FIRE. I’ve also published 5 papers (3 orals (<10% acceptance rate) and 2 posters (<30% acceptance rate)) at Amazon’s internal ML conference (AMLC). See my resume for brief details on all external and internal papers.
I am also the creator of the open-source iNLTK library, which provides out-of-the-box support for various NLP tasks in 13 low-resource Indic languages. The library has 175,000+ downloads, 800+ stars and 100+ forks on GitHub.
Prior to Jio Haptik, I worked at Goldman Sachs as a Software Engineer with the User Experience and Productivity team on Analytics for Desktop Assistant, a productivity tool used firm-wide.
I have a Master’s in Computer Science with a specialization in ML from Georgia Tech and a Bachelor’s in Computer Science from PEC University of Technology.
I am interested in applying Machine Learning to problems that will impact millions, and I keep making my small open-source contributions toward that goal.
Accepted at AMLC 2024 (Oral), Under Review @ EMNLP 2024 |
Towards Abstractive Knowledge Representations in LLMs
Gaurav Arora, Shreya Jain, Vaibhav Saxena, Srujana Merugu |
Under Review @ EMNLP 2024 (Industry Track) |
Intent Detection in the Age of LLMs
Gaurav Arora, Shreya Jain, Srujana Merugu |
Accepted at AMLC 2024 (Poster) |
MuST: A Multi Stage Targeting Framework for Promotional Campaigns
Apoorva Singh, Prince Jain, Gaurav Arora, Vivek Sembium |
Accepted at ACL 2023 (Findings), AMLC 2022 (Oral) |
CoMix: Guide transformers to code-mix using POS structure and phonetics
Gaurav Arora, Srujana Merugu, Vivek Sembium [Paper] |
Accepted at AMLC 2023 (Poster) |
Conversational Shopping Assistant: Bridging the gap between online and offline purchase experiences
Gaurav Arora, Pankaj Kumar, et al. |
Accepted at AMLC 2022 (Oral) |
DARE: Deep Affordability Recommendation Engine for ART events
Prince Jain, Gaurav Arora, Mohammed Abdulla, Vivek Sembium |
Accepted at EMNLP-2020 NLP-OSS workshop |
iNLTK: Natural Language Toolkit for Indic Languages
Gaurav Arora [Paper] [GitHub] |
Accepted at EMNLP-2020 Insights workshop |
HINT3: Raising the bar for Intent Detection in the Wild
Gaurav Arora, Chirag Jain, Manas Chaturvedi, Krupal Modi [Paper] [GitHub] |
Accepted at Dravidian Codemix HASOC @ FIRE-2020 |
Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection
Gaurav Arora [Paper] [GitHub] |
Accepted at LT-EDI @ EACL-2021 |
Spartans@ LT-EDI-EACL2021: Inclusive Speech Detection using Pretrained Language Models
Gaurav Arora*, Megha Sharma* [Paper] [GitHub] |
2020 - 2022 |
Masters in Computer Science (specialization in ML)
Georgia Institute of Technology (Georgia Tech) |
2014 - 2018 |
B.Tech in Computer Science
PEC University of Technology |
2012 - 2014 | GMSSS-16, Chandigarh |
Apr 2021 - Present | Amazon, Applied Scientist |
July 2019 - Apr 2021 | Jio Haptik, Machine Learning Scientist |
June 2018 - July 2019 | Goldman Sachs, Software Development Engineer |
Jan 2017 - July 2017 | Goldman Sachs, Software Development Engineer (Intern) |
Nov 2016 - Mar 2018 | Researchshala, Co-Founder and CTO |
Natural Language Toolkit for Indic Languages (iNLTK)
• iNLTK aims to provide out-of-the-box support for various NLP tasks that an application developer might need for Indic languages
• iNLTK provides Data Augmentation, Sentence Similarity, Sentence Encoding, Word Embedding, Tokenization and Text Generation utilities for 13 low-resource Indic languages
• The library is backed by ULMFiT Language Models that I trained using the Fastai and PyTorch libraries, producing SOTA LM perplexity and classification accuracy in 13 Indic languages
Appreciation for iNLTK
• By Jeremy Howard and Sebastian Ruder on Twitter
• Widely shared by the community on LinkedIn
• iNLTK has 100,000+ downloads on PyPI
• A Data Augmentation post about iNLTK was trending on LinkedIn
• iNLTK was trending on GitHub in May 2019
• Shared on Reddit, Facebook, Quora, etc. by the community |
|
Code with AI
• A tool that predicts which techniques one should use to solve a competitive programming problem correctly
• Demo video on YouTube
Appreciation for Code with AI
• By Jeremy Howard on Twitter
• By the community on Codeforces
• The tool has been used by 3000+ users |
|
NLP for Hindi
• Contains SOTA Language models and Classifier for Hindi
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
|
NLP for Sanskrit
• Contains SOTA Language models and Classifier for Sanskrit
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
|
NLP for Nepali
• Contains SOTA Language models and Classifier for Nepali
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
|
NLP for Tamil
• Contains SOTA Language models and Classifier for Tamil
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
|
NLP for Bengali
• Contains SOTA Language models and Classifier for Bengali
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
|
NLP for Punjabi
• Contains SOTA Language models and Classifier for Punjabi
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
|
NLP for Malayalam
• Contains SOTA Language models and Classifier for Malayalam
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
|
NLP for Odia
• Contains SOTA Language models and Classifier for Odia
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
|
NLP for Gujarati
• Contains SOTA Language models and Classifier for Gujarati
• Pretrained Models available for download: TransformerXL, ULMFiT
[ Code ] [ Results ] [ Dataset ] [ Embeddings projection ] |
Mar 2021 | Indian Achievers Award 2020 from the Indian Achiever’s Forum (IAF) in the Young Achievers Category for contribution to nation building through iNLTK |
Mar 2019 | Fast.ai International Fellow for contributions to Fast.ai forums |
Dec 2018 | Top-17% rank in Human Protein Atlas Image Classification on Kaggle for developing a Deep Learning model that classified mixed patterns of proteins in microscope images. The competition had 2172 teams; I participated individually, so the 366th-placed solution was entirely my own work |
Oct 2017 | 1st Prize in IEEE-Hackathon for developing a chatbot to help people with emotional decisions in life |
Feb 2016 | Top-100 among 500,000 students in IT-Olympiad, 2016 |
Oct 2016 | 2nd Prize in IEEE-Hackathon for developing an augmented reality application to help teachers |
Mar 2016 | All India Rank-6 in IEEE Programming League, among over 1200 undergraduate students |
Mar 2016 | 2nd Rank, CodeWars, a competitive-programming event hosted by IEEE, PEC on CodeChef |
Nov 2016 - Mar 2018 | Research Scholarship of 10k per month for Personal Emotional Doctor - Bot |
May 2014 | All India Rank-885 in JEE-Mains, among 1.4 million candidates |
Aug 2014 | 1st Rank-Opener, PEC for best JEE-Mains rank among 600 students of the session 2014-2018 |
Dec 2014 | 1 Lakh Scholarship from CBSE for 96.4% marks in 12th Boards and 10 CGPA in 10th |
Dec 2014 | Letter of Appreciation from HRD Ministry, Govt. of India for 96.4% in CBSE-12th exams |
June 2011 | Catch Them Young: was among the top 40 students selected from the Tricity by Infosys for a 2-week programming basics training on their campus |
Mathematics |
Discrete Structures for Computer Science, Vector Calculus, Fourier Series and Laplace Transform, Operations Research, Bayesian Statistics |
Computer Science |
Introduction to Graduate Algorithms, Data Structures and Algorithms, Computer Architecture and Organization, OOP, Microprocessor, DBMS, Operating Systems, Computer Networks, Theory of Computation, Artificial Intelligence, Computer Graphics, Mobile Computing, Machine Learning, Reinforcement Learning, Deep Learning, Computer Vision, Big Data for Healthcare, Knowledge Based AI |
Programming & Web |
C, C++, Python, JavaScript, TypeScript, ECMAScript 6, AngularJS, ReactJS, Angular 4, Webpack, Django with Python |
Frameworks |
PyTorch, pandas, NumPy, scikit-learn, SciPy, Fastai, Transformers library |
Last updated on 2021-10-03