Darpan Jain

Lead Machine Learning Engineer with over 7 years of demonstrated experience spearheading initiatives in Applied NLP & Large Language Models (LLMs).
Currently building the AI Platform at
Sony Pictures - Crunchyroll.

Read on!

Work Experience

Crunchyroll (Sony Pictures)

Senior Machine Learning Engineer


  • As the organization’s first Machine Learning Engineer, leading several initiatives to build Crunchyroll’s AI Platform.
  • Building the Recommendations & Search Platform to deliver personalized experiences to anime fans worldwide on the streaming platform.
  • Leading efforts to standardize all MLOps workflows (retrain job scheduling, failure recovery, model registries, automated feature store updates, continuous evaluation, drift detection, and more) to ensure reliable inference delivery for several stakeholders across the organization.

(Jan 2024 - Present)

Pixstory

Lead Machine Learning Engineer


  • Spearheaded Pixstory’s RAG-based Conversational Search, providing precise user query responses grounded in verified knowledge bases. Deployed in production within 5 months, serving queries for over 72K monthly users.
  • Implemented systems to mitigate hallucination and misinformation in generations from self-hosted Large Language Models (LLMs) through rigorous evaluation and re-ranking of trusted sources for reliable information retrieval.

(May 2023 - Feb 2024)

Information Sciences Institute (USC-ISI)

Applied Researcher


  • Led development of conflict resolution dialogue agents for DARPA’s Civil Sanctuary Program, as an Applied Researcher with the Natural Language group (CUTE LAB NAME) at USC-ISI.
  • Designed and deployed a scalable chatbot for mitigating toxic behavior on social media through non-violent communication strategies, combining the generation capabilities of pre-trained and fine-tuned LLMs with prompted generations from GPT-3 (the transformer-based model family that drives ChatGPT!).
  • Moderated over 15k multilingual Reddit posts with 85% true positive rate in French and German on popular subreddits.
  • You can find the published research paper here, which will be presented at the NAACL 2024 Main Conference later this year.

(January 2022 - July 2023)

Lumin.ai

Software Development Intern


  • Contributed to the development of Lumin.ai’s conversational AI, which serves as a Smart Scheduler & Sales Accelerator, to enhance customer interactions and optimize the sales funnel for 5+ product and franchise owners.
  • Advocated for efficient documentation using Markdown and managed customer interactions for multiple clients.

(May 2022 - August 2022)

Warner Bros Discovery

Software Development Engineer - Machine Learning


  • Boosted user engagement on Discovery Plus (D+) as part of the “Personalization and Recommendations” team.
  • Employed Apache Spark and Airflow to build efficient ETL pipelines in a distributed environment, and executed A/B tests to validate and refine new recommendation features for the D+ platform.
  • Led the development of server-side ad-insertion SDKs for Android and Web, driving increased ad revenue for D+ sports content in Europe.

(September 2020 - August 2021)

AdSparx Inc

(now Warner Bros. Discovery)

Machine Learning Engineer


  • Spearheaded the research, design, and development of MiDAS, a micro-service-based system serving personalized ads to millions of users on OTT platforms.
  • Designed and trained Computer Vision models with TensorFlow, deployed on Kubernetes to enable dynamic server-side ad-insertion for 40+ channels, boosting revenue for U.S. publishers.
  • Developed a tool that automates dataset annotation by leveraging SCTE-35 markers in HLS and DASH streams.
  • Leveraged multi-step recurrent networks on time series data for ad break forecasting. This resulted in a 30% reduction in resource consumption and network calls to the SSAI backend.
  • Member of the senior development team at AdSparx that was acquired by Warner Bros Discovery in September 2020.

(March 2019 - September 2020)

IoTIoT

Machine Learning Engineer


  • Led and mentored a team of 30 building Machine Learning applications with on-Edge capabilities for Embedded Linux platforms.
  • Developed a crowd flow analysis product using face and voice recognition for event registration, optimized for on-chip GPUs to showcase AI-on-Edge computing capabilities.
  • Worked on pose estimation to detect and track human body pose, creating an immersive experience for users of a video gaming platform.

(May 2018 - January 2019)

Consulting Projects

USC Dornsife

Lead Research Engineer


  • Leading the development of an end-to-end system for evaluating teacher fidelity in the Pathways-to-Success program at USC Mind and Society Center, applying Identity-Based Motivation principles in classrooms across the United States.
  • Constructed a pipeline using self-hosted inference models for audio extraction and transcription, and LLM-based fidelity evaluation models to score instructors based on custom metrics.
  • Ran experiments and evaluations of the system's performance on 100+ hours of teacher sessions, with final results within 5% of human evals.

(Jan 2024 - Present)

Defeat COVID-19

Machine Learning Lead


  • In light of the COVID-19 pandemic, co-founded DefeatCovid, a not-for-profit organization, which leveraged technology to increase public awareness and aid in curbing the spread of the disease.
  • Led a team to build a tool providing early-stage diagnosis of symptoms and risk assessment for COVID-19, using crowdsourced data and Deep Learning techniques.
  • The tool included a Question-Answer Chatbot based on BERT to guide users through the website, provide information from government-verified sources, and address misinformation about COVID-19.
  • It also included a real-time dashboard and interactive heatmap interface to identify at-risk localities.

(March 2020 - May 2020)

Prasaurus Sports Analytics

Artificial Intelligence Consultant


  • Led a team to design a framework for real-time action recognition and perform statistical analysis, providing deep insights for player performance tracking and improvement, for an academy training national badminton players.
  • The models captured spatio-temporal features across frames using two-stream networks and ActionVLAD for action recognition, and tracked the shuttle and players using a custom-trained MobileNet, to provide metrics such as 'Dominant areas'.

(June 2019 - September 2019)

InProspect Technologies

Machine Learning and Strategic Consultant


  • Managed a team of engineers to design and develop a product to propel undergraduates towards career goals by providing curated career guidance. The platform provided analytics for educational institutions via the 'Enterprise Resource Planning' system.
  • Created a personalization system for recommendations to help achieve user-specified goals by aiding in coursework selection and extra-curricular activities. This ensured high success rates in the individuals' field of interest.
  • Took part in strategic planning of product changes to drive user engagement.

(March 2019 - May 2019)

Ubuntoo LLC

Emerging Technology Consultant


  • Ubuntoo is a global environmental solutions platform promoting sustainable economic growth.
  • Researched and created technical reports on companies and organizations innovating in the use of Machine Learning and Computer Vision.
  • The solutions included high-accuracy waste sorting systems and autonomous waste collection bots, centered on using technology to tackle environmental issues and promote sustainable development.

(September 2019 - December 2019)

Occipital Tech

(now AgroGrade)

Computer Vision Engineer


  • A startup offering Agrotech products that classify and grade agricultural produce using Computer Vision, reducing wastage and maximizing profits.
  • Worked on the pipeline for preprocessing real-time image data via background segmentation, using methods such as Gaussian Mixture Models and the Watershed algorithm to tackle camera jitter and dynamic lighting conditions.
  • The effort led to a ~20% reduction in preprocessing time and improved image noise reduction by 14-15%.

(February 2019 - March 2019)

Publications

Can Language Model Moderators Improve the Health of Online Discourse? 


H Cho, S Liu, T Shi, Darpan Jain, B Rizk, Y Huang, Z Lu, N Wen, J Gratch, E Ferrara, J May

North American Chapter of the Association for Computational Linguistics (NAACL) 2024

Curriculum

Education

University of Southern California (USC), Los Angeles, California

Master of Science in Electrical & Computer Engineering
Specialization in Machine Learning & Data Science
GPA 3.92/4.0

Coursework Highlights

  • Advanced Natural Language Processing - A PhD course that develops expertise in reproducing state-of-the-art NLP research and in technical publication writing.
  • Advanced Computer Vision - A PhD course providing an overview of traditional and Deep Learning techniques for Computer Vision.
  • Information Retrieval - Examines key aspects of information retrieval as they apply to search engines; web crawling, indexing, querying and quality of results are studied.
  • Machine Learning for Medical Data - A course that gives insight into using Graph Neural Networks and strategies for handling highly imbalanced data and skewed data distributions.



University of Pune, India

Bachelor of Engineering in Electrical Engineering
First Class with Distinction

Leadership

Entrepreneurship Development Cell
  • Advisory Board Member - August 2017 to July 2018
  • Head of Sponsorship - August 2016 to July 2017
  • Documentation Officer - August 2015 to July 2016

IEEE Student Committee
  • Vice Chairman - July 2016 to September 2017

Pune Model United Nations (PMUN)
  • Organizing Committee Member - May 2012 to January 2013

Technical Skills

Programming Languages


Python / Java / C / C++ / JavaScript / Go / Scala / R / MATLAB / Julia / Embedded C

Machine Learning


TensorFlow / PyTorch / MXNet / PySpark / OpenCV / CUDA / Librosa / NLTK / spaCy / OpenNLP / TensorRT / TFLite

Databases


SQL / NoSQL / Hadoop / Redis / MongoDB / InfluxDB / Elasticsearch / Kafka / Splunk / Datadog

Frameworks


Docker / Kubernetes / Airflow / Tableau / Django / Flask

Cloud Platforms


Amazon Web Services (AWS) / Google Cloud Platform (GCP) / Microsoft Azure

Frontend


HTML / CSS / JavaScript / ReactJS / NodeJS / Bootstrap / VueJS
