
Data Scientist Resume


Youngstown, OH

SUMMARY:

  • 10 years of experience analyzing data and solving real-world problems in domains such as Banking/Finance, Healthcare and E-Commerce.
  • Working knowledge of managing the entire data science life cycle with large datasets in an agile environment.
  • Skilled in analytical and statistical programming languages such as Python, R and SQL.
  • Experience in performing data scraping using Python (Beautiful Soup and Scrapy).
  • Performed Exploratory Data Analysis and Feature Engineering using NumPy, pandas and scikit-learn in Python, and dplyr, reshape2 and tidyr in R.
  • Experienced in translating business requirements to data science and technical requirements as well as providing analytical feedback.
  • Created data models using machine learning algorithms like K-Means, KNN, Linear and Logistic Regression, SVM, Decision Trees, Random Forest and XGBoost.
  • Strong skills in statistical methodologies such as A/B testing, hypothesis testing and ANOVA.
  • Performed model evaluations using Confusion Matrix, ROC Curve, RMSE and other metrics.
  • Hands-on experience in querying relational databases including SQL Server, MySQL and Oracle 11g.
  • Working knowledge in Big Data Ecosystem including Hadoop 2.x (HDFS, MapReduce), Hive and Spark.
  • Knowledge on NoSQL databases such as Cassandra, MongoDB and HBase.
  • Adept in designing visualizations using Tableau, JMP Pro, D3.js, Python (matplotlib, Seaborn, Bokeh) and R (ggplot2, shiny, plotly) for publishing and presenting Dashboards and Automated Reports.
  • Worked on PyCharm, Jupyter Notebook and RStudio IDEs.
  • Worked with applications on AWS and Hortonworks cloud systems.
  • Used Git repository for version control.
  • Quick learner with strong interpersonal skills; has worked both individually and as a team member while collaborating with others in distributed environments.
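As a minimal illustration of the A/B and hypothesis testing listed above, the following sketch compares two synthetic site variants with Welch's t-test; all data, group sizes and thresholds below are illustrative, not from any engagement.

```python
# Hypothetical A/B test: compare a conversion-style metric across two variants.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=0.10, scale=0.02, size=500)    # variant A (synthetic)
treatment = rng.normal(loc=0.11, scale=0.02, size=500)  # variant B (synthetic)

# Welch's two-sample t-test for a difference in means (unequal variances).
t_stat, p_value = stats.ttest_ind(control, treatment, equal_var=False)
significant = p_value < 0.05  # conventional significance threshold
```

With the effect size baked into the synthetic data, the test comfortably rejects the null hypothesis of equal means.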

TECHNICAL SKILLS:

Languages: Python 3.0, R 3.3, Java, PL/SQL

Data Analysis/Visualization: Tableau, JMP Pro, D3.js

Databases: SQL Server, MySQL, Oracle 11g; NoSQL - Cassandra, MongoDB, HBase

Big Data Tools: Hadoop 2.x, MapReduce, Hive, Spark

Packages: Python - pandas, NumPy, scikit-learn, statsmodels, SciPy, Scrapy, Beautiful Soup, Seaborn, Matplotlib; R - rpart, e1071, dplyr, tidyr, reshape2, stats, caret, ggplot2, shiny

Machine Learning Algorithms: Naïve Bayes, Linear and Logistic Regression, Decision Trees, SVM, Random Forest, K-Means, LDA, Bagging, Gradient Boosting, XGBoost, Time Series

Cloud: Amazon S3, Hortonworks

Tools: Anaconda 3.0 (Jupyter Notebook); Continuous Integration - Jira, Jenkins; ETL - OBIEE; Version Control - Git

PROFESSIONAL EXPERIENCE:

Confidential, Youngstown, OH

Data Scientist

Responsibilities:

  • Retrieved millions of records of housing data from relational databases as well as external sources using PL/SQL and HiveQL.
  • Aggregated and manipulated batch data collected by a scheduled job that regularly scraped the required websites.
  • Worked on data cleaning, ensuring data quality and consistency, using pandas.
  • Performed outlier identification with box plots and studentized residuals.
  • Explored the data through univariate and multivariate analysis to identify any underlying patterns and associations between the variables with SciPy and Seaborn.
  • Performed feature engineering such as feature generation, feature normalization and label encoding using SciKit-learn.
  • Created data models such as Decision Trees, Random Forest and XGBoost using SciKit-learn.
  • Assisted in parameter tuning of the XGBoost model to improve performance and efficiency.
  • Used F-Score, RMSE, and Confusion Matrix for evaluating model performances.
  • Generated data visualizations using Tableau to report to the management regularly.
  • Coordinated with business analysts and subject matter experts for requirements gathering.
  • Conducted comprehensive analysis and evaluations of business needs and influenced decisions for different models.
  • Collaborated with dev-ops teams for production deployment.
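The feature-engineering and modeling steps above could be sketched roughly as follows. The dataset, column names and model settings are all hypothetical, and a Random Forest stands in for the full set of models (the project also used Decision Trees and XGBoost):

```python
# Minimal sketch of label encoding, feature normalization, model fitting and
# evaluation with scikit-learn on a synthetic housing-style dataset.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sqft": rng.normal(1500, 400, 1000),                       # hypothetical feature
    "neighborhood": rng.choice(["north", "south", "east"], 1000),
})
# Toy label: whether a listing is above the median size.
df["over_median"] = (df["sqft"] > df["sqft"].median()).astype(int)

# Feature engineering: label-encode the categorical, scale the numeric feature.
df["neighborhood"] = LabelEncoder().fit_transform(df["neighborhood"])
df["sqft"] = StandardScaler().fit_transform(df[["sqft"]]).ravel()

X_train, X_test, y_train, y_test = train_test_split(
    df[["sqft", "neighborhood"]], df["over_median"], random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

# Evaluation with a confusion matrix and F-score, as described above.
cm = confusion_matrix(y_test, pred)
f1 = f1_score(y_test, pred)
```

Because the toy label is a deterministic function of one feature, the model scores highly here; real housing data would of course be far noisier.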

Environment: Python 3.0, Amazon S3, Oracle, HDFS, Spark, Hive, HBase, Sqoop, Tableau 9.0, D3.js, Jira, Git.

Confidential, Westfield, OH

Data Analyst

Responsibilities:

  • Performed Predictive and Statistical Modeling along with the other aspects of data analytics techniques to collect, explore, and extract insights from data.
  • Coordinated with data engineers in setting up HDFS by mining data from various internal sources.
  • Merged and queried the data as a part of data extraction using SQL and HiveQL.
  • Performed data aggregation and cleansing using NumPy and pandas.
  • Analyzed the various distributions and correlations using matplotlib.
  • Extracted valuable features through Feature Engineering using SciKit-learn.
  • Built the machine learning models such as Linear Regression and XGBoost using SciKit-learn.
  • Performed time-series analysis using ARIMA model to improve prediction accuracy and precision using statsmodels and SciPy.
  • Evaluated the model performance using Cross Validation, Confusion Matrix and RMSE.
  • Generated weekly and monthly reports while maintaining and manipulating data using JMP Pro 12.2, MS Excel 2015, SSRS and SSIS.
  • Reported findings to the management regularly and coordinated with business analysts for evaluations.
  • Used Git extensively throughout the course of the project.
  • Continuously coordinated with the development team for model deployment and maintenance.

Environment: Python 2.7, Hortonworks, Hadoop, Hive, SQL Server 2012, Git, JMP Pro 12.2, MS Excel 2015, SSRS, SSIS.

Confidential

Business Intelligence Analyst

Responsibilities:

  • Analyzed the technical requirements with the business users and developers.
  • Extracted data from flat files and relational databases to load them in the central database after applying business logic.
  • Created jobs, workflows and data flows according to specifications and implemented their business logic.
  • Extensively used stored procedures, triggers and functions for implementing business rules and transformations.
  • Scheduled ETL jobs using data services management console.
  • Regularly carried out performance tuning of existing jobs.
  • Involved in monitoring the daily run of the scripts as well as troubleshooting in the event of any errors in the entire process.

Environment: OBIEE 11.1.1.9, Oracle 11g, MS Office Suite.

Confidential

Data Analyst

Responsibilities:

  • Analyzed online user behavior through multi-channel attribution and funnel analysis.
  • Worked on segmentation analysis to explore different site variations.
  • Developed predictive models implementing Linear Regression and Naïve Bayes using R's e1071 and stats packages.
  • Generated graphs and reports using ggplot2 and plotly in RStudio for analyzing models.
  • Generated dashboards in MS Excel to report to the management regularly.
  • Used available data sources to deep dive and troubleshoot campaign performance issues.
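The original models here were built in R (e1071's Naïve Bayes and stats' linear models); for consistency with the Python examples elsewhere, an equivalent Gaussian Naïve Bayes sketch in scikit-learn on toy, synthetic data:

```python
# Gaussian Naive Bayes on two well-separated synthetic classes, standing in
# for the user-behavior features described above (all data is made up).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # class 0 samples
               rng.normal(3, 1, (100, 2))])  # class 1 samples
y = np.array([0] * 100 + [1] * 100)

clf = GaussianNB().fit(X, y)
accuracy = clf.score(X, y)
```

With the class centers three standard deviations apart, the classifier separates the groups almost perfectly.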

Environment: R 3, SQL Server 2012, MS Office Suite.

Confidential

PL/SQL Developer

Responsibilities:

  • Worked on requirements gathering, analysis, design, change management and deployment.
  • Created database objects like Tables, Procedures and Integrity Constraints.
  • Wrote queries using joins, subqueries and correlated subqueries to retrieve data from the database.
  • Created indexes on the tables for faster retrieval of the data to enhance database performance.
  • Performed data cleaning, imputed missing values and made datasets ready for analysis.
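The table, index and join work described above followed a standard pattern; this sketch reproduces it with Python's stdlib sqlite3 in place of Oracle PL/SQL (schema, names and rows are hypothetical):

```python
# Create tables with an integrity constraint, add an index on the join key,
# and retrieve data with a join plus aggregate.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
            "customer_id INTEGER REFERENCES customers(id), total REAL)")
# Index on the join key for faster retrieval.
cur.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)])

# Join with an aggregate: total spend per customer.
rows = cur.execute(
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
```

The same join/aggregate shape applies in Oracle; only the DDL dialect and client library differ.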

Environment: Oracle 11g, PL/SQL, MS Office Suite.
