Portfolio — 2026

Soumyadeep Roy

Data Analyst · Data Engineer · MSc Computer Science (Merit), University of Leeds. Building insight from data — from end-to-end ML pipelines to cloud-based ETL workflows.

Location
Siliguri, West Bengal
Availability
15 days or less
95% NPTEL Score — IIT Roorkee
Open to relocate · Data · ML · Cloud · BFSI
Python · SQL · Machine Learning · AWS · ETL Pipelines · Data Engineering · Predictive Modelling · scikit-learn · PostgreSQL
01

About me

I'm a data analytics professional who enjoys turning messy, real-world datasets into clean, actionable insight. With an MSc (Merit) in Computer Science from the University of Leeds and a BTech from SRM, I've worked across the full data stack, from training ML models that predict dengue fever spread in San Juan (Puerto Rico) and Iquitos (Peru) to building cloud-based ETL pipelines on AWS.

At Sky UK, I worked in a high-volume service environment where I analysed operational datasets and system logs to identify recurring technical failure patterns — translating investigative findings into process improvements used by cross-functional teams.

I am now back in India and actively seeking entry-level roles as a Data Analyst, Business Analyst, or Data Engineer at MNCs, BFSI institutions, or consulting firms. I bring strong Python and SQL skills, hands-on cloud experience, and genuine curiosity for data-driven problems.

Location: Siliguri, West Bengal, India
Open to Relocate: Across India
Focus Area: Data · ML · Cloud
Languages: English · Hindi · Bengali
Education: MSc CS, University of Leeds (Merit)
Contact: +91 98322 05144
02

Work Experience

Jun 2024 – Jul 2025
Data · Operations
Sales Advisor — Technical Support & Data Operations
Sky UK · Leeds, United Kingdom · Full-time
  • Analysed customer service records and operational datasets to identify recurring technical issues and service disruption patterns, enabling proactive resolution strategies
  • Investigated system logs and backend service records to determine root causes of technical incidents, supporting faster resolution times across service teams
  • Maintained structured documentation of analytical findings within internal CRM and reporting systems, ensuring data accuracy and auditability
  • Conducted requirement analysis by gathering and interpreting customer-reported technical data for cross-functional resolution teams
  • Communicated data-backed investigation updates to technical and operations teams, contributing to process improvement initiatives
May – Aug 2022
Cloud · AWS
Cloud & Data Engineering Intern
Kritter Software Technology · Bengaluru, Karnataka · Internship
  • Assisted in developing cloud-based data workflows and automated ETL pipelines using AWS S3 and AWS Lambda
  • Performed data validation, cleaning, and transformation of structured datasets used in backend data processing
  • Supported monitoring and debugging of automated data pipelines, ensuring data integrity and reliability across production systems
  • Identified and documented data inconsistencies within backend systems, contributing to improved data quality and pipeline performance
Dec 2019 – Jan 2020
Python · Backend
Backend Developer Intern
Workex · Bengaluru, Karnataka · Internship
  • Assisted with backend development tasks in Python, supporting application module development and debugging
  • Contributed to unit testing, QA validation, and bug reporting for software components
  • Maintained a version-controlled codebase using Git and documented technical observations for senior developers
03

Selected Projects

001
MSc Research · University of Leeds
DengAI — Dengue Spread Prediction

End-to-end ML research project predicting dengue fever spread across San Juan and Iquitos using epidemiological and environmental time-series data. Benchmarked against the DrivenData competition dataset. Evaluated Random Forest and Gradient Boosting models against regression baselines for weekly case count prediction.

Python · Pandas · scikit-learn · Random Forest · Gradient Boosting · Matplotlib · Seaborn
View on GitHub ↗
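
The baseline comparison described for DengAI can be illustrated with a minimal sketch. It assumes mean absolute error as the metric and uses made-up weekly counts; the function names and numbers are hypothetical and are not taken from the project. It holds out the most recent weeks and scores two naive baselines that a trained model would need to beat.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted weekly counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def time_ordered_split(series, train_fraction=0.8):
    """Split without shuffling so the test weeks always follow the training weeks."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Made-up weekly case counts, for illustration only.
weekly_cases = [4, 5, 4, 3, 6, 2, 4, 5, 10, 6, 8, 2, 6, 17, 23, 13, 21, 28, 24, 20]
train, test = time_ordered_split(weekly_cases)

# Baseline 1: predict the training mean for every held-out week.
train_mean = sum(train) / len(train)
mean_mae = mean_absolute_error(test, [train_mean] * len(test))

# Baseline 2: persistence, i.e. predict the previous week's observed count.
persistence_mae = mean_absolute_error(test, train[-1:] + test[:-1])

print(f"mean baseline MAE: {mean_mae:.3f}")
print(f"persistence baseline MAE: {persistence_mae:.3f}")
```

A Random Forest or Gradient Boosting model would be scored on the same held-out weeks with the same metric, so the comparison stays like for like.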
002
Independent Project
Credit Card Fraud Detection System

Full-pipeline data science project detecting fraudulent financial transactions on real-world imbalanced datasets. Applied SMOTE and undersampling to handle class imbalance. Built and compared Logistic Regression and Random Forest classifiers, achieving strong precision-recall performance. Documented end-to-end pipeline from data ingestion to model evaluation.

Python · Pandas · scikit-learn · SMOTE · Logistic Regression · Matplotlib · Seaborn
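
As a rough illustration of the undersampling step mentioned above, the sketch below balances a toy dataset using only the Python standard library. The helper name `random_undersample` and the data are hypothetical and this is not the project's code. SMOTE, which synthesises new minority-class rows instead, is not shown.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Sample without replacement so no row is duplicated.
        balanced.extend((row, label) for row in rng.sample(group, target))
    rng.shuffle(balanced)
    return [row for row, _ in balanced], [label for _, label in balanced]


# Toy data: 95 legitimate transactions (label 0) and 5 fraudulent ones (label 1).
rows = [[float(i)] for i in range(100)]
labels = [0] * 95 + [1] * 5

X_bal, y_bal = random_undersample(rows, labels)
print(Counter(y_bal))  # five rows of each class remain
```

Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class balance.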
003
Internship Project · Workex
Recruitment Chatbot Workflow

Collaborated on designing and implementing a conversational chatbot system for a job recruitment platform at Workex. The bot automated candidate screening through 15+ conditional decision nodes, evaluating education, language fluency, experience, and salary expectations — with automated SMS and email scheduling upon qualification.

Python · Chatbot Design · JSON Workflow · Automation
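
A screening flow of this kind could be expressed as a small JSON workflow walked by a generic function. The sketch below is only an illustration: the node names, fields, thresholds, and operators are all hypothetical and do not reproduce the Workex implementation.

```python
import json

# Hypothetical workflow: each node tests one candidate field and names the next node.
WORKFLOW_JSON = """
{
  "start": "education",
  "nodes": {
    "education":  {"field": "education", "op": "in", "value": ["graduate", "postgraduate"],
                   "pass": "language", "fail": "reject"},
    "language":   {"field": "english_fluency", "op": "gte", "value": 3,
                   "pass": "experience", "fail": "reject"},
    "experience": {"field": "years_experience", "op": "gte", "value": 1,
                   "pass": "salary", "fail": "reject"},
    "salary":     {"field": "expected_salary", "op": "lte", "value": 30000,
                   "pass": "qualified", "fail": "reject"}
  }
}
"""

OPERATORS = {
    "in": lambda actual, expected: actual in expected,
    "gte": lambda actual, expected: actual >= expected,
    "lte": lambda actual, expected: actual <= expected,
}


def screen_candidate(candidate, workflow):
    """Follow pass/fail edges until a terminal name; return (outcome, visited nodes)."""
    node_name = workflow["start"]
    path = []
    while node_name in workflow["nodes"]:
        node = workflow["nodes"][node_name]
        path.append(node_name)
        passed = OPERATORS[node["op"]](candidate[node["field"]], node["value"])
        node_name = node["pass"] if passed else node["fail"]
    return node_name, path


workflow = json.loads(WORKFLOW_JSON)
candidate = {"education": "graduate", "english_fluency": 4,
             "years_experience": 2, "expected_salary": 25000}
outcome, path = screen_candidate(candidate, workflow)
if outcome == "qualified":
    # Placeholder for the SMS and email scheduling step.
    print("schedule follow-up for candidate; nodes visited:", path)
```

Keeping the rules in data means a new screening question is added by editing the JSON, with no change to the traversal code.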
04

Skills & Tools

Languages
  • Python (Pandas, NumPy, scikit-learn)
  • SQL
  • C++
Data & Analytics
  • Exploratory Data Analysis
  • Data Cleaning & Feature Engineering
  • Statistical & Predictive Modelling
Machine Learning
  • Classification & Regression
  • Random Forest, Gradient Boosting
  • Class Imbalance Handling (SMOTE)
Databases
  • MySQL
  • PostgreSQL
  • Microsoft Excel (Advanced)
Cloud & DevOps
  • AWS S3
  • AWS Lambda
  • Git & GitHub
Visualisation
  • Matplotlib
  • Seaborn
  • Python data visualisation libraries
05

Education

Master of Science
Computer Science
University of Leeds, United Kingdom
Sep 2022 – Sep 2023 · Merit
Relevant modules: Machine Learning, Data Mining, Cloud Computing, Software Engineering. Research focus: Epidemiological data analysis and predictive modelling using Python.
Bachelor of Technology
Computer Science Engineering
SRM Institute of Science & Technology, Kattankulathur
Jan 2017 – Jan 2021 · 76%
Core coursework: Data Structures, Algorithms, Database Systems, Object-Oriented Programming.
06

Certifications

Data Analytics with Python · 95%
NPTEL · IIT Roorkee (SWAYAM)
Jan – Apr 2020
Machine Learning
Stanford University · Coursera
Oct 2019
Building Web Applications in PHP
University of Michigan · Coursera
Apr 2020
Blockchain & Decentralised AI Workshop
Microsoft Student Partners Club
Oct 2019

Let's work together.

Open to Data Analyst, Business Analyst, and Data Engineer roles at MNCs, BFSI institutions, and consulting firms. Available within 15 days.