Kanishk Kumar
Data Scientist | Python Associate | ML Analyst
"Code Smart. Think Python."
kanishkk202@gmail.com | +91-9990465116/7042693420 | Noida, Uttar Pradesh
About Me
Data Scientist with an MSc in Data Science from HSE University and 2+ years of hands-on project experience in Machine Learning, Python, and NLP. Proficient in Python, model deployment (Flask, Docker), and deep learning techniques including Transformers and LSTMs. Strong foundation in statistics and big data (Spark), with a keen interest in MLOps and Generative AI. Seeking an entry-level role to apply data-driven solutions to real-world problems.
IT Skills
- Languages & Libraries: Python (Advanced), SQL (Intermediate), Pandas, NumPy, Scikit-learn, PyTorch, TensorFlow/Keras, XGBoost.
- Machine Learning & AI: Natural Language Processing (BERT, GPT, Transformers), Time-Series Analysis (LSTM), Computer Vision, Statistical Modeling, A/B Testing.
- MLOps & Deployment: Docker, Flask, FastAPI, Celery, MLflow, Git.
- Big Data & Cloud: Apache Spark (RDD, DataFrames), Hadoop, Azure
Projects
Real-Time Stock Price Prediction Project | Python, LSTM, Data Visualization (Mar, 2026 - Present)
- Automated data ingestion (yfinance), scaling, and sliding-window (60-day) preprocessing for LSTM modelling.
- Designed a 2-layer LSTM with dropout and early stopping, trained on 80% of the data to predict next-day closing prices.
- Visualised results interactively using Plotly (actual vs. predicted, error bands, future forecast).
GitHub Repository
Automated Sales Performance & Customer Analytics Dashboard Project | Python, SQL, Power Query (Mar, 2026 - Present)
- Designed an automated analytics workflow to clean and visualize transactional data, surfacing key revenue drivers.
- Identified that Premium customers generate 68% of total revenue, while overall 3-month retention declines to 40%.
Football Match Outcome Prediction with Deep Learning Project | Python, LSTM, Scikit-learn
Higher School of Economics - Moscow, Russian Federation
- Engineered sequential time-series features from historical match data to predict game outcomes.
- Compared the performance of LSTM networks against Logistic Regression and Random Forest models, focusing on AUC-ROC optimization and log-loss reduction.
YouTube Demo
Multilingual Health Assistant Chatbot Project | Python, NLP, Dialog Management
Based on Google Data Project sources, built with Python
- Built an NLP-driven chatbot capable of processing health-related queries in three languages.
- Implemented custom intent classification and entity recognition to navigate over 50 distinct conversational paths.
YouTube Demo
BoolQ Question Answering with Fine-Tuned BERT Project | PyTorch, Transformers, NLP
NLP Chatbot using BERT, word2vec, Stopwords | Tools: Python, Sklearn, Pandas
- Fine-tuned a BERT-base model on the BoolQ dataset, achieving 87% accuracy in binary question answering.
- Engineered comparison pipelines using word2vec and fastText embedding features with an SVM classifier to benchmark the transformer's performance.
GitHub Repository
End-to-End MLOps Pipeline: Iris Classification Web App Project | Flask, Docker, Celery, MLflow
SVM, LR, DT, K-NN, LSML, Data Visualization | Tools: Python, Flask, Docker, Celery, MLflow
- Containerized a Flask inference API using Docker to ensure environment consistency across development and production.
- Implemented Celery for asynchronous prediction task management, improving response times and user experience.
GitHub Repository
IMDB Exploratory Data Analysis & Visualization with Machine Learning
Logistic Regression, K-Nearest Neighbors, Decision Tree Classifier Models | Tools: Python, Pandas, NumPy, Matplotlib, Seaborn, Plotly & Transformers
GitHub Repository
Click Prediction Analysis with Spark RDD & DataFrame APIs
Advanced Hadoop, RDD, Python, DataFrame, LSML | Tools: Python, Spark, HDFS, RDD
GitHub Repository
Other Experiences
- IT Coordinator - AISATS (Aug, 2024 - Nov, 2024)
AIR INDIA - Gurugram, Haryana
- Troubleshot and resolved hardware and network issues.
- Collaborated with IT teams to plan and execute system upgrades.
- Associate Analyst, Content Engineering - [Hitachi Group Company] (Nov, 2022 - Apr, 2023)
GlobalLogic Technologies Pvt Ltd - Gurugram, Haryana
- Worked on Google Data Project sources using Python and ML.
- Used data analytics tools for image detection and recognition tasks.
Certifications
- Udemy - The Complete Machine Learning Course with Python (2024 - 2025)
Statistics & Linear Algebra, EDA, Supervised, Unsupervised, Deep Learning, Model Selection, NLP
GitHub ML Python
- Udemy - Python 3: Project-based Python, Algorithms, Data Structures (2024 - 2025)
Data Structures, Sorting, Graph & Search Algorithms, Recursion & Backtracking, Dynamic Programming
GitHub Data Structure & Algorithm
- IBM Python 101 for Data Science
Education & Internships
- MS Data Science - HSE, Moscow (2021-2023)
- Relevant coursework: Advanced Machine Learning, Deep Learning (CNNs, RNNs, Transformers), Natural Language Processing, Big Data Analytics (Spark), Statistical Inference, Bayesian Methods.
- BTech Food Tech - HBTI Kanpur, India (2012-2017)
kanishkk202@gmail.com
WhatsApp
Mobile: +91-9990465116
Available for remote/office work, freelance projects, mentoring & collaborations
Languages Known