Data Scientist Resume
Youngstown, OH
SUMMARY:
- 10 years of experience in analyzing data and solving real-world problems in domains such as Banking/Finance, Healthcare, and E-Commerce.
- Working knowledge of managing the entire data science life cycle with large datasets in an agile environment.
- Skilled in analytical and statistical programming languages such as Python, R and SQL.
- Experience in performing data scraping using Python (Beautiful Soup and Scrapy).
- Performed Exploratory Data Analysis and Feature Engineering using NumPy, Pandas, and scikit-learn in Python, and dplyr, reshape2, and tidyr in R.
- Experienced in translating business requirements to data science and technical requirements as well as providing analytical feedback.
- Created data models using machine learning algorithms like K-Means, KNN, Linear and Logistic Regression, SVM, Decision Trees, Random Forest and XGBoost.
- Strong skills in statistical methodologies such as A/B testing, hypothesis testing, and ANOVA.
- Performed model evaluations using Confusion Matrix, ROC Curve, RMSE and other metrics.
- Hands-on experience in querying relational databases including SQL Server, MySQL, and Oracle 11g.
- Working knowledge in Big Data Ecosystem including Hadoop 2.x (HDFS, MapReduce), Hive and Spark.
- Knowledge on NoSQL databases such as Cassandra, MongoDB and HBase.
- Adept at designing visualizations using Tableau, JMP Pro, D3.js, Python (Matplotlib, Seaborn, Bokeh) and R (ggplot2, shiny, plotly) for publishing and presenting dashboards and automated reports.
- Worked on PyCharm, Jupyter Notebook and RStudio IDEs.
- Worked with applications on AWS and Hortonworks cloud systems.
- Used Git repository for version control.
- Quick learner with strong interpersonal skills; has worked both individually and as a team member, collaborating with others in distributed environments.
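The web-scraping experience noted above can be illustrated with a minimal Beautiful Soup sketch. The HTML snippet, table id, and class names below are invented for illustration, and parsing is done on an in-memory string so no network access is needed:

```python
# Minimal Beautiful Soup sketch: extract listing prices from an HTML table.
# The markup, id, and class names are hypothetical.
from bs4 import BeautifulSoup

html = """
<table id="listings">
  <tr><td class="price">250000</td><td class="beds">3</td></tr>
  <tr><td class="price">310000</td><td class="beds">4</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
prices = [int(td.text) for td in soup.select("td.price")]
print(prices)  # [250000, 310000]
```

In practice the HTML would come from an HTTP response rather than a literal string, but the parsing and CSS-selector extraction are the same.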
TECHNICAL SKILLS:
Languages: Python 3.0, R 3.3, Java, PL/SQL
Data Analysis/Visualization: Tableau, JMP Pro, D3.js
Databases: SQL Server, MySQL, Oracle 11g; NoSQL - Cassandra, MongoDB, HBase
Big Data Tools: Hadoop 2.x, MapReduce, Hive, Spark
Packages: Python - Pandas, NumPy, scikit-learn, statsmodels, SciPy, Scrapy, Beautiful Soup, Seaborn, Matplotlib; R - rpart, e1071, dplyr, tidyr, reshape2, stats, caret, ggplot2, shiny
Machine Learning Algorithms: Naïve Bayes, Linear and Logistic Regression, Decision Trees, SVM, Random Forest, k-means, LDA, Bagging, Gradient Boosting, XGBoost, Time Series
Cloud Tools: Amazon S3, Hortonworks
Tools: Anaconda 3.0 (Jupyter Notebook); Continuous Integration - Jira, Jenkins; ETL - OBIEE; Version Control - Git
PROFESSIONAL EXPERIENCE:
Confidential, Youngstown, OH
Data Scientist
Responsibilities:
- Retrieved millions of records of housing data from relational databases as well as external sources using PL/SQL and HiveQL.
- Aggregated and manipulated the batch data collected by a scheduled job to scrape data from necessary websites regularly.
- Performed data cleaning, ensuring data quality and consistency, using Pandas and NumPy.
- Performed outlier identification with box plots, studentized residuals, and K-Means clustering.
- Explored the data through univariate and multivariate analysis to identify any underlying patterns and associations between the variables with SciPy and Seaborn.
- Performed feature engineering such as feature generation, feature normalization and label encoding using SciKit-learn.
- Created data models such as Decision Trees, Random Forest and XGBoost using SciKit-learn.
- Assisted in hyperparameter tuning of the XGBoost model to improve performance and efficiency.
- Used F-Score, RMSE, and Confusion Matrix for evaluating model performances.
- Generated data visualizations using Tableau to report to the management regularly.
- Coordinated with business analysts and subject matter experts for requirements gathering.
- Conducted comprehensive analysis and evaluations of business needs and influenced decisions for different models.
- Collaborated with DevOps teams for production deployment.
Environment: Python 3.0, Amazon S3, Oracle, HDFS, Spark, Hive, HBase, Sqoop, Tableau 9.0, D3.js, Jira, GIT.
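The modeling workflow above (label encoding, a gradient-boosted classifier, confusion-matrix and F-score evaluation) can be sketched with scikit-learn on synthetic data. scikit-learn's GradientBoostingClassifier stands in for XGBoost here, and every feature name and value below is invented:

```python
# Sketch of the modeling pipeline on synthetic data: label-encode a
# categorical feature, fit a gradient-boosted classifier (standing in for
# XGBoost), and evaluate with a confusion matrix and F-score.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

rng = np.random.default_rng(0)
n = 500
X_num = rng.normal(size=(n, 3))                      # numeric features
region = rng.choice(["north", "south", "east"], n)   # hypothetical categorical feature
y = (X_num[:, 0] + X_num[:, 1] > 0).astype(int)      # synthetic binary target

# Label-encode the categorical column and append it to the numeric features
X = np.column_stack([X_num, LabelEncoder().fit_transform(region)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
cm = confusion_matrix(y_te, pred)
f1 = f1_score(y_te, pred)
print(cm)
print("F1:", round(f1, 3))
```

A real pipeline would tune the model's hyperparameters (e.g. via grid search) rather than use defaults, and would use one-hot rather than label encoding for nominal categories with tree-unfriendly models.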
Confidential, Westfield, OH
Data Analyst
Responsibilities:
- Performed Predictive and Statistical Modeling along with the other aspects of data analytics techniques to collect, explore, and extract insights from data.
- Coordinated with data engineers in setting up HDFS by mining data from various internal sources.
- Merged and queried the data as a part of data extraction using SQL and HiveQL.
- Performed data aggregation and cleansing using Numpy and Pandas.
- Analyzed the various distributions and correlations using matplotlib.
- Extracted valuable features through Feature Engineering using SciKit-learn.
- Built the machine learning models such as Linear Regression and XGBoost using SciKit-learn.
- Performed time-series analysis using an ARIMA model, with statsmodels and SciPy, to improve prediction accuracy and precision.
- Evaluated the model performance using Cross Validation, Confusion Matrix and RMSE.
- Generated weekly and monthly reports while maintaining and manipulating data using JMP Pro 12.2, MS Excel 2015, SSRS and SSIS.
- Reported findings to the management regularly and coordinated with business analysts for evaluations.
- Used Git extensively throughout the course of the project.
- Continuously coordinated with the development team for model deployment and maintenance.
Environment: Python 2.7, Hortonworks, Hadoop, Hive, SQL Server 2012, Git, JMP Pro 12.2, MS Excel 2015, SSRS, SSIS.
Confidential
Business Intelligence Analyst
Responsibilities:
- Analyzed the technical requirements with the business users and developers.
- Extracted data from flat files and relational databases to load them in the central database after applying business logic.
- Created jobs, workflows and data flows according to specifications and implemented their business logic.
- Extensively used stored procedures, triggers and functions for implementing business rules and transformations.
- Scheduled ETL jobs using data services management console.
- Regularly carried out performance tuning of existing jobs.
- Involved in monitoring the daily run of the scripts as well as troubleshooting in the event of any errors in the entire process.
Environment: OBIEE 11.1.1.9, Oracle 11g, MS Office Suite.
Confidential
Data Analyst
Responsibilities:
- Analyzed online user behavior through multi-channel attribution and funnel analysis.
- Worked on segmentation analysis to explore different site variations.
- Developed predictive models implementing Linear Regression and Naïve Bayes using R's e1071 and stats packages.
- Generated graphs and reports using ggplot and plotly in RStudio for analyzing models.
- Generated dashboards in MS Excel to report to the management regularly.
- Used available data sources to deep dive and troubleshoot campaign performance issues.
Environment: R 3, SQL Server 2012, MS Office Suite.
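The Naïve Bayes modeling above was done in R with e1071; a Python analogue using scikit-learn's GaussianNB looks like the sketch below. The session features, their distributions, and the conversion rule are all invented:

```python
# Python analogue of the R Naive Bayes workflow: predict conversion from
# hypothetical session features with scikit-learn's GaussianNB.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
n = 400
pages_viewed = rng.poisson(5, n)          # invented feature
time_on_site = rng.exponential(120, n)    # invented feature, in seconds
# Invented ground truth: longer, deeper sessions convert
converted = ((pages_viewed > 4) & (time_on_site > 60)).astype(int)

X = np.column_stack([pages_viewed, time_on_site])
model = GaussianNB().fit(X, converted)
acc = model.score(X, converted)
print("training accuracy:", round(acc, 2))
```

GaussianNB assumes each feature is Gaussian within a class, which is only roughly true here; it is a sketch of the technique, not of the original project's data.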
Confidential
PL/SQL Developer
Responsibilities:
- Worked on requirements gathering, analysis, design, change management and deployment.
- Created database objects like Tables, Procedures and Integrity Constraints.
- Wrote queries using joins, subqueries, and correlated subqueries to retrieve data from the database.
- Created indexes on the tables for faster retrieval of the data to enhance database performance.
- Performed data cleaning, imputed missing values and made datasets ready for analysis.
Environment: Oracle 11g, PL/SQL, MS Office Suite.