Data Scientist Resume
Youngstown, OH
SUMMARY:
- 10 years of experience in analyzing data and solving real-world problems in domains such as Banking/Finance, Healthcare, and E-Commerce.
- Working knowledge of managing the entire data science life cycle with large datasets in an agile environment.
- Skilled in analytical and statistical programming languages such as Python, R and SQL.
- Experience in performing data scraping using Python (Beautiful Soup and Scrapy).
- Performed Exploratory Data Analysis and Feature Engineering using NumPy, Pandas, and scikit-learn in Python, and dplyr, reshape2, and tidyr in R.
- Experienced in translating business requirements to data science and technical requirements as well as providing analytical feedback.
- Created data models using machine learning algorithms like K-Means, KNN, Linear and Logistic Regression, SVM, Decision Trees, Random Forest and XGBoost.
- Strong skills in statistical methodologies such as A/B testing, hypothesis testing, and ANOVA.
- Performed model evaluations using Confusion Matrix, ROC Curve, RMSE and other metrics.
- Hands-on experience in querying relational databases including SQL Server, MySQL, and Oracle 11g.
- Working knowledge in Big Data Ecosystem including Hadoop 2.x (HDFS, MapReduce), Hive and Spark.
- Knowledge on NoSQL databases such as Cassandra, MongoDB and HBase.
- Adept at designing visualizations using Tableau, JMP Pro, D3.js, Python (Matplotlib, Seaborn, Bokeh) and R (ggplot2, shiny, plotly) for publishing and presenting dashboards and automated reports.
- Worked on PyCharm, Jupyter Notebook and RStudio IDEs.
- Worked with applications on AWS and Hortonworks cloud systems.
- Used Git repository for version control.
- Quick learner with strong interpersonal skills; has worked both individually and as a team member, collaborating with others in distributed environments.
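The web-scraping experience noted above can be illustrated with a minimal Beautiful Soup sketch. The HTML snippet, table id, and class names below are invented for illustration, and parsing is done on an in-memory string so no network access is needed:

```python
# Minimal Beautiful Soup sketch: extract listing prices from an HTML table.
# The markup, id, and class names are hypothetical.
from bs4 import BeautifulSoup

html = """
<table id="listings">
  <tr><td class="price">250000</td><td class="beds">3</td></tr>
  <tr><td class="price">310000</td><td class="beds">4</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
prices = [int(td.text) for td in soup.select("td.price")]
print(prices)  # [250000, 310000]
```

In practice the HTML would come from an HTTP response rather than a literal string, but the parsing and CSS-selector extraction are the same.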
TECHNICAL SKILLS:
Languages: Python 3.0, R 3.3, Java, PL/SQL
Data Analysis/Visualization: Tableau, JMP Pro, D3.js
Databases: SQL Server, MySQL, Oracle 11g; NoSQL - Cassandra, MongoDB, HBase
Big Data Tools: Hadoop 2.x, MapReduce, Hive, Spark
Packages: Python - Pandas, NumPy, scikit-learn, statsmodels, SciPy, Scrapy, Beautiful Soup, Seaborn, Matplotlib; R - rpart, e1071, dplyr, tidyr, reshape2, stats, caret, ggplot2, shiny
Machine Learning Algorithms: Naïve Bayes, Linear and Logistic Regression, Decision Trees, SVM, Random Forest, k-means, LDA, Bagging, Gradient Boosting, XGBoost, Time Series
Cloud Tools: Amazon S3, Hortonworks
Tools: Anaconda 3.0 (Jupyter Notebook); Continuous Integration - Jira, Jenkins; ETL - OBIEE; Version Control - Git
PROFESSIONAL EXPERIENCE:
Confidential, Youngstown, OH
Data Scientist
Responsibilities:
- Retrieved millions of records of housing data from relational databases as well as external sources using PL/SQL and HiveQL.
- Aggregated and manipulated the batch data collected by a scheduled job to scrape data from necessary websites regularly.
- Performed data cleaning, ensuring data quality and consistency, using Pandas and NumPy.
- Performed outlier identification with box plots, studentized residuals, and K-Means clustering.
- Explored the data through univariate and multivariate analysis to identify any underlying patterns and associations between the variables with SciPy and Seaborn.
- Performed feature engineering such as feature generation, feature normalization and label encoding using SciKit-learn.
- Created data models such as Decision Trees, Random Forest and XGBoost using SciKit-learn.
- Assisted in hyperparameter tuning of the XGBoost model to improve performance and efficiency.
- Used F-Score, RMSE, and Confusion Matrix for evaluating model performances.
- Generated data visualizations using Tableau to report to the management regularly.
- Coordinated with business analysts and subject matter experts for requirements gathering.
- Conducted comprehensive analysis and evaluations of business needs and influenced decisions for different models.
- Collaborated with DevOps teams for production deployment.
Environment: Python 3.0, Amazon S3, Oracle, HDFS, Spark, Hive, HBase, Sqoop, Tableau 9.0, D3.js, Jira, GIT.
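The modeling workflow above (label encoding, a gradient-boosted classifier, confusion-matrix and F-score evaluation) can be sketched with scikit-learn on synthetic data. scikit-learn's GradientBoostingClassifier stands in for XGBoost here, and every feature name and value below is invented:

```python
# Sketch of the modeling pipeline on synthetic data: label-encode a
# categorical feature, fit a gradient-boosted classifier (standing in for
# XGBoost), and evaluate with a confusion matrix and F-score.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

rng = np.random.default_rng(0)
n = 500
X_num = rng.normal(size=(n, 3))                      # numeric features
region = rng.choice(["north", "south", "east"], n)   # hypothetical categorical feature
y = (X_num[:, 0] + X_num[:, 1] > 0).astype(int)      # synthetic binary target

# Label-encode the categorical column and append it to the numeric features
X = np.column_stack([X_num, LabelEncoder().fit_transform(region)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
cm = confusion_matrix(y_te, pred)
f1 = f1_score(y_te, pred)
print(cm)
print("F1:", round(f1, 3))
```

A real pipeline would tune the model's hyperparameters (e.g. via grid search) rather than use defaults, and would use one-hot rather than label encoding for nominal categories with tree-unfriendly models.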
Confidential, Westfield, OH
Data Analyst
Responsibilities:
- Performed Predictive and Statistical Modeling along with the other aspects of data analytics techniques to collect, explore, and extract insights from data.
- Coordinated with data engineers in setting up HDFS by mining data from various internal sources.
- Merged and queried the data as a part of data extraction using SQL and HiveQL.
- Performed data aggregation and cleansing using Numpy and Pandas.
- Analyzed the various distributions and correlations using matplotlib.
- Extracted valuable features through Feature Engineering using SciKit-learn.
- Built the machine learning models such as Linear Regression and XGBoost using SciKit-learn.
- Performed time-series analysis using an ARIMA model, with statsmodels and SciPy, to improve prediction accuracy and precision.
- Evaluated the model performance using Cross Validation, Confusion Matrix and RMSE.
- Generated weekly and monthly reports while maintaining and manipulating data using JMP Pro 12.2, MS Excel 2015, SSRS and SSIS.
- Reported findings to the management regularly and coordinated with business analysts for evaluations.
- Used Git extensively throughout the course of the project.
- Continuously coordinated with the development team for model deployment and maintenance.
Environment: Python 2.7, Hortonworks, Hadoop, Hive, SQL Server 2012, Git, JMP Pro 12.2, MS Excel 2015, SSRS, SSIS.
Confidential
Business Intelligence Analyst
Responsibilities:
- Analyzed the technical requirements with the business users and developers.
- Extracted data from flat files and relational databases to load them in the central database after applying business logic.
- Created jobs, workflows and data flows according to specifications and implemented their business logic.
- Extensively used stored procedures, triggers and functions for implementing business rules and transformations.
- Scheduled ETL jobs using data services management console.
- Regularly carried out performance tuning of existing jobs.
- Involved in monitoring the daily run of the scripts as well as troubleshooting in the event of any errors in the entire process.
Environment: OBIEE 11.1.1.9, Oracle 11g, MS Office Suite.
Confidential
Data Analyst
Responsibilities:
- Analyzed online user behavior through multi-channel attribution and funnel analysis.
- Worked on segmentation analysis to explore different site variations.
- Developed predictive models implementing Linear Regression and Naïve Bayes using R's e1071 and stats packages.
- Generated graphs and reports using ggplot and plotly in RStudio for analyzing models.
- Generated dashboards in MS Excel to report to the management regularly.
- Used available data sources to deep dive and troubleshoot campaign performance issues.
Environment: R 3, SQL Server 2012, MS Office Suite.
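The Naïve Bayes modeling above was done in R with e1071; a Python analogue using scikit-learn's GaussianNB looks like the sketch below. The session features, their distributions, and the conversion rule are all invented:

```python
# Python analogue of the R Naive Bayes workflow: predict conversion from
# hypothetical session features with scikit-learn's GaussianNB.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
n = 400
pages_viewed = rng.poisson(5, n)          # invented feature
time_on_site = rng.exponential(120, n)    # invented feature, in seconds
# Invented ground truth: longer, deeper sessions convert
converted = ((pages_viewed > 4) & (time_on_site > 60)).astype(int)

X = np.column_stack([pages_viewed, time_on_site])
model = GaussianNB().fit(X, converted)
acc = model.score(X, converted)
print("training accuracy:", round(acc, 2))
```

GaussianNB assumes each feature is Gaussian within a class, which is only roughly true here; it is a sketch of the technique, not of the original project's data.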
Confidential
PL/SQL Developer
Responsibilities:
- Worked on requirements gathering, analysis, design, change management and deployment.
- Created database objects like Tables, Procedures and Integrity Constraints.
- Wrote queries using joins, subqueries, and correlated subqueries to retrieve data from the database.
- Created indexes on the tables for faster retrieval of the data to enhance database performance.
- Performed data cleaning, imputed missing values and made datasets ready for analysis.
Environment: Oracle 11g, PL/SQL, MS Office Suite.