SAMEER SHUKLA

DATA SCIENTIST  |  AI ENTHUSIAST  |  DATA ANALYST

Data Scientist & Open-Source Contributor building practical AI and automation tools using Python.

5+ PROJECTS
2+ INTERNSHIPS
5+ CERTIFICATIONS
CURIOSITY

WHO AM I

I'm Sameer Shukla, a BS Life Science graduate from the University of Delhi with a Computer Science minor, working at the intersection of Data Science, Machine Learning, and open-source software.

My journey began with science and curiosity, which led me to Python, Power BI, machine learning, and AI agents. I love building tools that solve real problems and sharing them with the community.

I believe the most impactful software is open, collaborative, and built in public. I actively use, study, and contribute to open-source projects — from large ecosystems like webpack to my own public repositories.

As an open-source contributor, I focus on tools I genuinely use — submitting fixes, improving documentation, and engaging with maintainers. Every contribution, however small, is a step toward making the ecosystem better for everyone.

// LANGUAGES & DATA
Python · SQL · Statistics · NumPy · Pandas
// ML & AI
Machine Learning · AI Agents · Predictive Modeling · OpenCV · face_recognition
// TOOLS & INFRA
Power BI · Docker · Git / GitHub · pytest · React · Node.js
// SOFT SKILLS
Analytical Thinking · Research · Prompt Engineering · Automation

WHAT I'VE BUILT

01 /
AI AGENTS & AUTOMATION
Workflow Automation · Intelligent Tools

Exploring AI-powered agents for workflow automation, intelligent responses, and data-driven decision support. Integrating Python tools with modern AI frameworks and APIs to build practical applications that boost productivity.

Python · AI Frameworks · APIs · Automation · Prompt Engineering
VIEW ON GITHUB →
02 /
ROBOSPEAKER
Smart Text-to-Speech Application

A cross-platform Python text-to-speech (TTS) app using pyttsx3, with automatic fallback to native system commands (Windows PowerShell / macOS say). Supports an interactive CLI mode and a one-shot argument mode, and ships with a full pytest test suite.

Python · pyttsx3 · CLI · pytest · Cross-platform
VIEW ON GITHUB →
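The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the RoboSpeaker source: the helper names (`native_tts_command`, `speak`, `main`) are hypothetical, and it assumes the Windows fallback drives the built-in `System.Speech` synthesizer through PowerShell while macOS uses `say`.

```python
import argparse
import platform
import subprocess
from typing import List, Optional


def native_tts_command(text: str, system: Optional[str] = None) -> List[str]:
    """Build the OS-native speech command (hypothetical helper) used when pyttsx3 fails."""
    system = system or platform.system()
    if system == "Darwin":
        # macOS ships the `say` command-line utility.
        return ["say", text]
    if system == "Windows":
        # Double single quotes so the text is safe inside a PowerShell '...' literal.
        escaped = text.replace("'", "''")
        script = (
            "Add-Type -AssemblyName System.Speech; "
            "(New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak('" + escaped + "')"
        )
        return ["powershell", "-NoProfile", "-Command", script]
    raise RuntimeError(f"no native TTS fallback for {system!r}")


def speak(text: str) -> str:
    """Speak with pyttsx3, falling back to the native command; return the backend used."""
    try:
        import pyttsx3  # optional third-party dependency

        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()
        return "pyttsx3"
    except Exception:
        # Missing package, missing driver, or runtime error: use the OS tool instead.
        subprocess.run(native_tts_command(text), check=True)
        return "native"


def main(argv: Optional[List[str]] = None) -> None:
    """One-shot mode when arguments are given, otherwise an interactive prompt."""
    parser = argparse.ArgumentParser(description="Minimal TTS CLI sketch")
    parser.add_argument("text", nargs="*", help="text to speak once; omit for interactive mode")
    args = parser.parse_args(argv)
    if args.text:
        speak(" ".join(args.text))
        return
    # Interactive mode: speak each line until 'q'/'quit'/'exit' or end of input.
    while True:
        try:
            line = input("> ").strip()
        except EOFError:
            break
        if line.lower() in {"q", "quit", "exit"}:
            break
        if line:
            speak(line)
```

Keeping command construction in a pure function means the platform branches can be unit-tested with pytest without producing any audio; `main()` would be called from a console entry point or an `if __name__ == "__main__":` guard.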
03 /
FACE ATTENDANCE SYSTEM
Computer Vision · Real-time Recognition

Automates attendance tracking via real-time webcam facial recognition. Matches detected faces against stored encodings and logs each person's name and timestamp to a CSV file, skipping duplicate entries. Demonstrates ML-based face encoding and real-time image processing.

Python · OpenCV · face_recognition · NumPy · CSV
VIEW ON GITHUB →
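The matching and duplicate-preventing logging steps can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: `match_face` and `mark_attendance` are hypothetical names, and it assumes face encodings have already been computed (for example by `face_recognition.face_encodings`) and that one CSV file holds one session's attendance. The nearest-match rule mirrors the Euclidean-distance comparison `face_recognition.compare_faces` uses, with its default tolerance of 0.6.

```python
import csv
import math
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional, Sequence


def match_face(
    encoding: Sequence[float],
    known: Dict[str, Sequence[float]],
    tolerance: float = 0.6,
) -> Optional[str]:
    """Return the closest known name if its Euclidean distance is within tolerance, else None."""
    best_name, best_dist = None, float("inf")
    for name, known_encoding in known.items():
        dist = math.dist(encoding, known_encoding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= tolerance else None


def mark_attendance(name: str, csv_path: Path, now: Optional[datetime] = None) -> bool:
    """Append (name, timestamp) unless the name is already logged; return True if a row was written."""
    already_logged = set()
    if csv_path.exists():
        with csv_path.open(newline="") as f:
            already_logged = {row[0] for row in csv.reader(f) if row}
    if name in already_logged:
        return False  # duplicate prevention: one entry per person per file
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    with csv_path.open("a", newline="") as f:
        csv.writer(f).writerow([name, stamp])
    return True
```

In a webcam loop, each detected face's encoding would go through `match_face`, and any non-`None` result would be passed to `mark_attendance`; re-reading the CSV on every call keeps the sketch simple, while a long-running session could cache the logged names in memory instead.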
04 /
NETFLIX DATASET ANALYSIS
Data Analysis · Visualization · EDA

In-depth exploratory data analysis (EDA) of the Netflix content library. Uncovers trends across genres, countries, release years, and content types, combining statistical analysis with rich visualizations to surface actionable insights.

Python · Pandas · Matplotlib · Seaborn · EDA
VIEW ON GITHUB →
05 /
US STORE SALES FORECAST
Predictive Modeling · Time Series

Sales forecasting model for US retail stores using time-series analysis and predictive modeling. Identifies seasonal trends and demand patterns to help businesses optimize inventory and strategy decisions.

Python · Scikit-learn · Time Series · Forecasting · Pandas
VIEW ON GITHUB →
06 /
TASK MANAGER APP (IN DEVELOPMENT)
Full-Stack · Productivity Tool

A productivity app that schedules and tracks tasks with completion status. Explores enforcing focus by restricting device usage when tasks aren't completed on time — combining productivity management with behavioral accountability.

React · Node.js · Database · Full-stack
FOLLOW PROGRESS →

OPEN-SOURCE CONTRIBUTIONS

I contribute to the open-source tools I rely on every day — from build systems to data libraries. Here's where I've engaged with the community beyond my own projects.

📦
webpack
webpack / webpack

One of the most widely used JavaScript module bundlers, powering millions of projects worldwide, including build toolchains for React, Vue, and Angular apps.

JavaScript · Bundler · Build Tools
// MY CONTRIBUTION

Contributed to the webpack core repository. Engaged with the maintainer community on one of the most critical tools in the modern JS ecosystem.

🐍
Python Ecosystem
NumPy · Pandas · Scikit-learn

The foundational data science stack — NumPy for numerical computing, Pandas for data manipulation, and Scikit-learn for machine learning. Tools I use in every project.

Python · Data Science · ML
// MY ENGAGEMENT

Active user of these libraries across all my data science projects. Engaged with community discussions, issue tracking, and documentation improvements.

👁️
OpenCV
opencv / opencv-python

One of the most widely used open-source computer vision libraries. Used in my Face Attendance System for real-time webcam capture, image preprocessing, and frame analysis.

Computer Vision · Python · C++
// MY USAGE

Built a real-time face recognition attendance system on top of OpenCV. Engaged with the community around Python bindings and image processing pipelines.

🧪
pytest
pytest-dev / pytest

The go-to Python testing framework. I use pytest in my projects to write clean, maintainable test suites — including a full suite for RoboSpeaker.

Testing · Python · TDD
// MY USAGE

Wrote comprehensive pytest suites for my CLI and automation projects, following the conventions in pytest's documentation and community best practices.

⚙️
n8n
n8n-io / n8n

A powerful fair-code (source-available) workflow automation tool. I use n8n to build automated pipelines integrating Telegram bots, Google Sheets, Gmail, and AI APIs.

Automation · No-code · Workflows
// MY USAGE

Built production automation workflows for inventory management and order tracking systems. Actively engaged with the n8n community for workflow design patterns.

🐳
Docker
docker / docker-ce

The container platform that enables consistent, reproducible development environments. An essential tool in my infrastructure and deployment workflow.

Containers · DevOps · Infrastructure
// MY USAGE

Used Docker to containerize Python applications and ensure consistent environments across development and deployment pipelines.

WORK HISTORY

DEC 2025 — MAR 2026
Research & Operations Intern
Satvify
  • Supported data-driven decision-making and workflow optimization across research and operations.
  • Developed automation workflows integrated with Google Sheets to streamline order tracking and reporting.
  • Conducted operational research to identify process inefficiencies and proposed improvements to enhance productivity.
  • Analyzed operational data and generated structured reports summarizing performance metrics for internal review.
  • Collaborated with cross-functional teams to automate repetitive tasks and improve overall workflow efficiency.
OCT 2025 — NOV 2025
Power BI / Healthcare Data Analyst Intern
National Skill Development Corporation (NSDC)
  • Analyzed and transformed healthcare datasets to create interactive dashboards supporting evidence-based decision-making.
  • Performed data cleaning, preprocessing, and reporting to improve the quality of healthcare analytics.
  • Collaborated with cross-functional teams to identify system inefficiencies and contributed to process improvements.
  • Utilized Power BI to visualize operational insights, enabling better monitoring of healthcare performance metrics.
  • Contributed technical insights that supported a measurable reduction in system failures.

CERTIFICATIONS

🤖
Intro to AI Agents & Agentic AI
365 DATA SCIENCE
NOV 2025 · ID: CC-EFFED0597A
🧮
Math Foundation for ML
365 DATA SCIENCE
NOV 2025 · ID: CC-96178A8B41
🧠
GenAI Job Simulation
BCG × FORAGE
JUN 2025 · ID: Twycj8fLaEddCQwtS
📊
Data Analytics Job Simulation
DELOITTE × FORAGE
JUN 2025 · ID: hzMXmmNeyxY7ecmAv
🏆
Hansraj Case Challenge 3.0
HANSRAJ COLLEGE × HK UNIVERSITY
APR 2023 · CERTIFICATE OF EXCELLENCE

GET IN TOUCH

Open to open-source collaborations, internships, and data science opportunities.