Dheeraj Dhillon

Hi there, from one Machine Learning enthusiast to another!

TL;DR I am Dheeraj, currently pursuing an MS in Computer Science at UW-Madison, graduating in May 2027. I have two years of industry experience as a data scientist at two organizations: Gartner, and HiLabs, a startup focused on healthcare data interoperability. I completed my undergrad in ECE at IIT Roorkee. Here is my resume.

I am looking for opportunities that challenge me to solve meaningful problems using methods from my areas of study: Machine Learning, Natural Language Processing, and Reinforcement Learning. The applications that fascinate me most are locomotion, robotics, language models, and recommendation systems. I am always excited to be part of teams that require me to punch above my weight, which is why I am even more excited to join and contribute to a startup.

I am actively looking for internship opportunities for Summer and Fall 2026 as an ML researcher, applied scientist, or data scientist. Ping me by email or on LinkedIn; I would be happy to connect!

Project work

See my projects page here, which goes into detail on each project; implementations are available on my GitHub. Brief summaries of select projects:

TRPO/PPO. I studied policy optimization in RL, where the recurring idea of constraining policy changes gave rise to powerful methods, from mixture-update methods to TRPO and PPO. I implemented TRPO, PPO, and NPG in PyTorch and benchmarked them on several MuJoCo locomotion and Atari environments. See the code and project report. I have also written about my enthusiasm for RL here.

RLHF using PPO. Motivated by its practical applications, I applied PPO to align an LLM with human preferences using Reinforcement Learning from Human Feedback (RLHF). Using the HelpSteer3 dataset, I performed supervised fine-tuning (SFT) on the Qwen2.5-0.5B-Instruct LLM and then trained a reward model. Finally, I used the reward model to optimize the fine-tuned LLM with PPO at the token level, using LoRA.

StanLyric. I implemented an information retrieval (IR) system that identifies songs from a lyrics corpus given a query of a few lyrical sentences. It uses the Okapi BM25 ranking method, and I deployed static inverted indices for 44,480 songs online to run a live lyric search engine. The app also makes its ranked results interpretable by showing the quantified contribution of each matched keyword, which varies for each candidate song with its underlying term frequencies (TFs).
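
As a rough illustration of how per-keyword contributions can be read off a BM25 score, here is a minimal standard-library sketch. It is not the StanLyric code: the whitespace tokenizer, the function names, the `k1` and `b` values, and the non-negative `log(1 + ...)` IDF variant are all assumptions made for the example.

```python
import math
from collections import Counter

def build_index(docs):
    """Tokenise each document and collect the statistics BM25 needs."""
    tokenised = [doc.lower().split() for doc in docs]
    term_freqs = [Counter(tokens) for tokens in tokenised]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tf in term_freqs for term in tf)
    avg_len = sum(len(tokens) for tokens in tokenised) / len(tokenised)
    return term_freqs, doc_freq, avg_len

def bm25_contributions(query, doc_id, term_freqs, doc_freq, avg_len, k1=1.5, b=0.75):
    """Return {term: score contribution} for the query terms found in one document."""
    n_docs = len(term_freqs)
    tf = term_freqs[doc_id]
    doc_len = sum(tf.values())
    contributions = {}
    for term in set(query.lower().split()):
        if term not in tf:
            continue
        # Non-negative IDF variant (an assumption; other variants exist).
        idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        # Term-frequency saturation with document-length normalisation.
        norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * doc_len / avg_len))
        contributions[term] = idf * norm_tf
    return contributions

def search(query, docs):
    """Rank all documents; each result is (total score, doc id, per-term contributions)."""
    index = build_index(docs)
    results = []
    for doc_id in range(len(docs)):
        contrib = bm25_contributions(query, doc_id, *index)
        results.append((sum(contrib.values()), doc_id, contrib))
    return sorted(results, key=lambda item: item[0], reverse=True)
```

Calling `search("darkness my friend", lyrics)` on a list of lyric strings returns the ranked songs together with a dictionary showing how much each matched keyword added to the score, which is the kind of breakdown an interpretable result page could display.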

WordPlay. This project uses character-level language models to play the Hangman Challenge. I trained forward and reverse n-gram models with padding at word start and end. During training, I also used smoothing techniques such as add-k smoothing and Kneser-Ney smoothing to mitigate sparsity and encourage exploration. For prediction during game simulation, I incorporated backoff and interpolation. The trained probability distributions can also be used to generate plausible new English words. I also designed an interactive simulator for game playing and word generation.
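
To make the add-k idea concrete, here is a small sketch of a forward character n-gram model with start and end padding. It is an illustration only, not the WordPlay implementation: the class name, the `^` and `$` padding symbols, the lowercase ASCII alphabet, and the default `n` and `k` are assumptions.

```python
import string
from collections import Counter, defaultdict

class AddKCharNgram:
    """Forward character n-gram model with add-k smoothing (illustrative sketch)."""

    def __init__(self, n=3, k=0.5, alphabet=string.ascii_lowercase):
        self.n = n
        self.k = k
        # Symbols that can be predicted: letters plus the end marker "$".
        self.vocab = list(alphabet) + ["$"]
        self.counts = defaultdict(Counter)

    def fit(self, words):
        for word in words:
            # "^" pads the start so the first letters have a full context.
            padded = "^" * (self.n - 1) + word.lower() + "$"
            for i in range(self.n - 1, len(padded)):
                context = padded[i - self.n + 1:i]
                self.counts[context][padded[i]] += 1
        return self

    def prob(self, context, char):
        """P(char | last n-1 characters of context) with add-k smoothing."""
        context = context[-(self.n - 1):] if self.n > 1 else ""
        ctx_counts = self.counts.get(context, Counter())
        total = sum(ctx_counts.values())
        # Every symbol gets k pseudo-counts, so unseen characters keep some mass.
        return (ctx_counts[char] + self.k) / (total + self.k * len(self.vocab))
```

A reverse model can be obtained under the same assumptions by fitting on reversed words, for example `AddKCharNgram().fit(w[::-1] for w in words)`. Words containing characters outside the assumed alphabet would break the normalisation, so a real corpus would need a matching alphabet.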

Warfarin. I investigated contextual multi-armed bandits for personalized Warfarin dosage selection in an online learning setting with sequential feedback. I evaluated LinUCB and its regularized ridge and lasso variants, as well as the Linear Thompson sampling bandit, and compared their performance with that of the clinically recommended pharmacogenetic formula.
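
For readers unfamiliar with LinUCB, the sketch below shows the core of the disjoint-arm version in plain Python. It is a generic illustration, not the project's code: the class and function names, the identity initialisation (ridge parameter 1), the `alpha` value, and the use of a Sherman-Morrison update to maintain the inverse matrix are assumptions for the example.

```python
import math

class LinUCBArm:
    """One arm of disjoint LinUCB, keeping A^{-1} and b for a ridge estimate."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        # A starts as the identity, so A^{-1} is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(len(vec))) for row in self.a_inv]

    def ucb(self, x):
        """Estimated reward plus an exploration bonus for context x."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + self.alpha * width

    def update(self, x, reward):
        """Rank-one update of A^{-1} (Sherman-Morrison) and of b."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def choose_arm(arms, x):
    """Pick the index of the arm with the highest upper confidence bound."""
    return max(range(len(arms)), key=lambda a: arms[a].ucb(x))
```

In a dosing setting, each arm could stand for a dose bucket and `x` for a patient feature vector; after choosing an arm with `choose_arm`, only that arm is updated with the observed reward.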

Coursework

Machine Learning, Reinforcement Learning, Game Theory, Operating Systems, Linear Optimization, Nonlinear Optimization

I have written about my passion for these courses in my blogs.

Research work

InfraNet: An Ensemble Approach for Real-time Wildlife Detection using Infrared Thermal Imaging, IEEE AVSS August 2025.
Read the published paper (PDF) or find it on IEEE Xplore. The source code is available in the infrared repo on the pipeline branch. See an overview in my work blogs.

Industry experiences

At Gartner, I worked as an associate data scientist in the client retention analytics team. Before that, I worked at HiLabs in Bangalore as a data scientist in the Roster Automation team. See the details of my work in my blog posts.

Education and work

University of Wisconsin-Madison, August 2025 - Present
Master of Science, Computer Science
Gartner Inc., March 2024 - August 2025
Data Scientist
Indian Institute of Technology Roorkee, July 2019 - May 2023
Bachelor of Technology, Electronics and Communication Engineering

Emails and Socials

dhillondheeraj84@gmail.com
ddhillon@wisc.edu
dheeraj_d@ec.iitr.ac.in

LinkedIn, Twitter
Google Scholar, ORCID, IEEE Xplore

My experiences with music

I am happiest when listening to music, which has led me to create a lot of playlists on Spotify. I am also working on a project to improve playlists by surfacing songs that you may not have already added to your own Spotify playlists. My Spotify insights dashboard can be found here.

Timeline

Mar 11, 2024	Joined Gartner as an Associate Data Scientist.
Jul 17, 2023	Joined HiLabs Inc as a Data Scientist.
May 25, 2023	Graduated from IIT Roorkee.
Jul 15, 2022	Applied and Data Scientist Intern at Microsoft India R&D.
Jul 20, 2019	Started at IIT Roorkee.