Data Science Intern Resume
Chicago
SUMMARY
- Big Data Engineer with 3+ years of experience in Hadoop, Spark, Java, SQL & NoSQL database systems, ETL, and message queues
- Data Science Enthusiast with excellent knowledge in Machine Learning and Statistical Modelling using Python and R
TECHNICAL SKILLS
Languages & Software: Python, Java, Scala, SQL Server, Cassandra, HBase, Linux, Microservices, REST API, Node.js, Celery
Big Data Tools: Hadoop (HDFS, MapReduce), Hive, Sqoop, Spark, Kafka, AWS EMR, Lambda, API Gateway, S3, Redshift
Data Science Tools: NumPy, Pandas, SciPy, Scikit-learn, Keras, TensorFlow, Seaborn, Matplotlib, dplyr, ggplot2, caret, Tableau, D3
Machine Learning: Classification, Regression, Clustering, Feature Engineering, Time Series, Hypothesis Testing
PROFESSIONAL EXPERIENCE
Data Science Intern
Confidential, Chicago
Responsibilities:
- Core member of the data and analytics cloud platform team, building an ETL framework and analytics engine in Python
- Implemented auto ingestion of various file formats and HL7 messages in PySpark using both heuristic & machine learning methods
- Developed a Hadoop profiler engine in Python that generates dynamic Hive queries for data quality metrics and cleansing
- Developed and integrated microservices with message queues for scheduling various data processing jobs/services
- Implemented auto mapping of disparate medical codes (morphology) to their standard target classes using K-means clustering and KNN algorithms
Data Engineer
Confidential, Chicago
Responsibilities:
- Developed a data transfer utility in Python using Sqoop to incrementally update Hive tables in a Cloudera cluster from SQL Server
- Identified patterns causing delays in research completion at UIC using Hive queries and Tableau visualizations
Project Intern
Confidential, Chicago
Responsibilities:
- Designed revenue forecasting for sales data using ARIMA models in R, which helped adjust the new target for the sales team
- Implemented a survival analysis model in Python to classify sales opportunities, and communicated these solutions to executive stakeholders using Seaborn visualizations
Software Development Engineer-II
Confidential
Responsibilities:
- Acquired ratings and review data from Cassandra servers and built a recommendations engine using collaborative filtering for the Video on Demand service in Confidential Set-Top Box Service, which resulted in increased VOD usage
- Built a deployment-ready regression model to determine project burndown using data acquired from multiple sources such as Jira and Rally, and created a dashboard using D3 visualizations that helped project stakeholders allocate resources based on trends
- Developed a Java-based web application for network provisioning, discovery, and assurance of L3VPN and SONET services
Software Engineer
Confidential
Responsibilities:
- Implemented Jenkins log parsers using regex to identify error/warning patterns, which helped resolve build failures quickly
- Introduced TDD and optimized the Apache Ant build scripts by managing dependencies, which reduced Jenkins build time by 50%