Data Science Intern Resume



  • Big Data Engineer with 3+ years of experience in Hadoop, Spark, Java, SQL & NoSQL database systems, ETL, and message queues
  • Data Science enthusiast with excellent knowledge of machine learning and statistical modelling using Python and R


Languages & Software: Python, Java, Scala, SQL Server, Cassandra, HBase, Linux, Microservices, REST API, Node.js, Celery

Big Data Tools: Hadoop (HDFS, MapReduce), Hive, Sqoop, Spark, Kafka, AWS EMR, Lambda, API Gateway, S3, Redshift

Data Science Tools: NumPy, Pandas, SciPy, Scikit-learn, Keras, TensorFlow, Seaborn, Matplotlib, dplyr, ggplot2, caret, Tableau, D3

Machine Learning: Classification, Regression, Clustering, Feature Engineering, Time Series, Hypothesis Testing


Data Science Intern

Confidential, Chicago

  • Core member of the data and analytics cloud platform team, building an ETL framework and analytics engine in Python
  • Implemented automatic ingestion of various file formats and HL7 messages in PySpark using both heuristic and machine learning methods
  • Developed a Hadoop profiler engine in Python that generates dynamic Hive queries to compute data quality metrics and support data cleansing
  • Developed and integrated microservices with message queues for scheduling various data processing jobs/services
  • Implemented automatic mapping of disparate medical codes (morphology) to their standard target classes using K-means clustering and KNN algorithms

Data Engineer

Confidential, Chicago


  • Developed a data transfer utility in Python using Sqoop to incrementally update Hive tables in a Cloudera cluster from SQL Server
  • Identified patterns causing delays in research completion at UIC using Hive queries and Tableau visualizations

Project Intern

Confidential, Chicago


  • Designed revenue forecasting for sales data using ARIMA models in R, which helped adjust the new target for the sales team
  • Implemented a survival analysis model in Python to classify sales opportunities, and communicated the results to executive stakeholders using Seaborn visualizations

Software Development Engineer-II



  • Acquired ratings and review data from Cassandra servers and built a recommendation engine using collaborative filtering for the Video on Demand (VOD) service in Confidential Set-Top Box Service, which resulted in increased VOD usage
  • Built a deployment-ready regression model to determine project burndown using data acquired from multiple sources such as Jira and Rally, and created a dashboard using D3 visualizations that helped project stakeholders allocate resources based on trends
  • Developed a Java-based web application for network provisioning, discovery, and assurance of L3VPN and SONET services

Software Engineer



  • Implemented Jenkins log parsers using regex to identify error and warning patterns, which helped resolve build failures quickly
  • Introduced TDD and optimized the Apache Ant build scripts by managing dependencies, which reduced Jenkins build time by 50%

