Data Scientist Resume

SUMMARY

  • A dedicated and innovative IT professional with 12 years of experience in the IT industry. Hands-on experience across the end-to-end project lifecycle, including project design, test-driven development, performance tuning, and predictive analytics. Strong critical-thinking, problem-solving, and application-development skills.
  • Java developer with 12 years of end-to-end project experience using Agile methodologies.
  • 3+ years of hands-on experience in Python for application development and machine learning.
  • Hands-on experience in working with Apache PySpark (Spark Core, Spark SQL) and Apache Hive.
  • Hands-on experience in Spark Streaming (Structured Streaming API).
  • Hands-on experience integrating Eclipse with AWS Elastic Beanstalk and deploying applications to the cloud.
  • Experience integrating SAP and non-SAP systems using Datahub: data extraction, transformation, and loading into a single datastore.
  • Hands-on experience with machine learning algorithms including Decision Trees, Support Vector Machines, Random Forests, KNN, K-Means, and Hierarchical Clustering.
  • Hands-on experience in text wrangling using the NLTK and Gensim libraries for Natural Language Processing.
  • Hands-on experience in SQL.
  • Data Visualization using Tableau and Python libraries like Matplotlib and Seaborn.
  • Hands-on experience across full project pipelines: data extraction, data cleaning, data processing, and modelling.
  • Hands-on experience in data engineering using Pandas, NumPy, PySpark, and R.
  • Good applied statistics skills, including inferential statistics and regression analysis.
  • Experience handling clients and third-party stakeholders.
  • Team-lead experience providing technical support to the team and project management.
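The supervised-learning experience listed above can be illustrated with a minimal scikit-learn sketch. This is a generic example of one of the named algorithms (a Random Forest classifier); the data and parameters are synthetic assumptions, not from any project on this resume.

```python
# Illustrative sketch: training one of the listed algorithms (Random Forest)
# on a small synthetic dataset. All data here is made up for demonstration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))            # 200 samples, 4 numeric features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # simple, learnable target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)     # held-out accuracy
```

The same `fit`/`score` pattern applies to the other scikit-learn estimators listed (SVM, KNN, K-Means, etc.).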

TECHNICAL SKILLS

Programming Languages: Java 1.8, Python 3.6

Framework/Platform: Spring, Spring Integration, REST Webservices, Datahub

Big Data Ecosystem: HDFS, Apache Hive, Apache PySpark (Spark Core, Spark SQL, Spark Streaming)

Machine Learning: Python NumPy, Pandas, Scikit Learn, NLTK, Regression, Statistical Analysis

Databases: MySQL, SQL Server, NoSQL (MongoDB)

BI Tool: Tableau

Cloud: AWS (IAM, S3, EMR, Lambda, Glue), GCP (Google Colab, DataProc)

PROFESSIONAL EXPERIENCE

Data Scientist

Confidential

Responsibilities:

  • Built a Recommendation engine using the concepts of Collaborative Filtering.
  • Implemented incremental data loads into Google Cloud Storage using Kafka and a Spark Streaming job, fed from an existing web application that captures user reviews.
  • Built an ALS model on the existing dataset and optimized performance by tuning hyperparameters.
  • Built a Spark job that reads these user ratings from Google Cloud Storage and, after cleaning and transformation, uses the pre-built ALS model to generate item-based recommendations, which are stored in MongoDB.
  • A Spring Boot REST application consumes the recommendations and returns them to the user.
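The ALS (alternating least squares) factorization behind this engine can be sketched compactly in NumPy. The project itself used Spark's ALS implementation; the toy rating matrix and parameters below are illustrative assumptions only.

```python
# Illustrative NumPy sketch of alternating least squares (ALS) for
# collaborative filtering. The rating matrix is a made-up toy example.
import numpy as np

R = np.array([            # rows = users, columns = items, 0 = unrated
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 0.0, 5.0, 4.0],
])
mask = R > 0              # which entries were actually observed
k, lam, n_iters = 2, 0.1, 20   # latent factors, regularization, sweeps

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))   # item factors

for _ in range(n_iters):
    # Fix item factors and solve a ridge regression per user,
    # then do the symmetric step for each item.
    for u in range(R.shape[0]):
        obs = mask[u]
        A = V[obs].T @ V[obs] + lam * np.eye(k)
        U[u] = np.linalg.solve(A, V[obs].T @ R[u, obs])
    for i in range(R.shape[1]):
        obs = mask[:, i]
        A = U[obs].T @ U[obs] + lam * np.eye(k)
        V[i] = np.linalg.solve(A, U[obs].T @ R[obs, i])

predictions = U @ V.T     # filled-in rating matrix; zeros are now predictions
```

In the Spark version, `pyspark.ml.recommendation.ALS` performs this factorization in a distributed fashion, with `rank`, `regParam`, and `maxIter` playing the roles of `k`, `lam`, and `n_iters` here.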

Loan Recommendation system/Big Data

Confidential

Responsibilities:

  • Built a Spark application to read prospective customers' data from AWS S3 for a loan approval application, then cleanse, transform, and load it into an Apache Hive table.
  • Submitted the Spark job using AWS EMR.
  • Built a web application providing a user interface to query data from AWS RDS (MySQL/Redshift) and save the results in PDF format; the application was deployed using Elastic Beanstalk.

Technology Lead/Data Analyst

Confidential

Responsibilities:

  • Responsibilities included requirement analysis, solution design, project management, client demos, and technical support to the team.
  • Core developer building a robust PIM (Product Information Management) system for Hershey's USA that aggregates product data from multiple SAP/non-SAP systems into a single data store using Hybris Datahub/Spring Integration.
  • Worked as part of the Confidential core product team to develop an ecommerce platform designed to handle apparel and telco websites. The development environment used Core Java, Spring (Core, MVC), Hybris, Apache Tomcat, and Oracle 10g (10.2).
  • Evaluated and implemented the Hadoop ecosystem (Apache Spark/Hive) as an alternative to Oracle-based processing for big-data ingestion, transformation, and storage for a retail client.