Data Scientist Resume

St. Louis, MO

PROFESSIONAL SUMMARY:

  • Data Scientist with 8+ years of experience across the healthcare and marketing industries, holding a master’s degree in Information Science and Technology with a specialization in Data Science.
  • Highly skilled in machine learning, data analysis, data visualization and data science methodology.
  • Experienced in using technology to work efficiently with datasets, including scripting, data-cleaning tools, and statistical software packages.
  • Extensive working experience in data science projects, machine learning algorithms, and tools such as R, Python, SQL, SAS, Hadoop, Tableau, and Power BI.
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, and k-means.
  • Proficient in developing storyboards and advanced visualizations using Tableau, Python, and Power BI.
  • Practical knowledge of the end-to-end data analysis process in Python: importing datasets, data wrangling, exploratory data analysis, model development, and model evaluation (a minimal sketch follows this summary).
  • Experienced in agile methodology, with the ability to manage all SDLC phases of a data science project, from requirements analysis and design through development, testing, and deployment.
  • Analyzed a pre-existing predictive model, built by the Advanced Analytics team to predict the rate at which customers convert from retail to mail, and rebuilt it using machine learning algorithms that incorporated factors with a stronger influence on conversion; the resulting increase in conversion rate benefited both customers and the company.
  • Skilled in implementing natural language processing and neural networks through libraries such as PyTorch and Keras.
  • Improved the accuracy of predictive models from 65% to 86% using a support vector machine.
  • Proficient in R and Python scripting; skilled in data extraction, data cleaning, data loading, and data transformation, predictive modeling in R and Python (scikit-learn, Pandas, NumPy), and data visualization in Tableau.
  • Worked extensively with unions, table joins, and multiple data connections using blending; also worked on data extracts and query management.
  • Skilled in implementing machine learning techniques such as regression, classification, clustering, and recommender systems, including random forest, decision trees, support vector machines, and k-means clustering, using Python packages and RStudio.
  • Strong experience in ETL (Informatica) data warehousing and in implementing all SDLC phases, including requirements gap analysis, design, data warehouse implementation, development, testing, deployment, and production support maintenance. Practical understanding of data modeling concepts (dimensional and relational) such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables.
  • Highly qualified in using Informatica PowerCenter Designer and Workflow Manager tools.
  • Proficient with RDBMS platforms such as Oracle 10g, Toad for Oracle, and SQL Developer.
  • Extensive working experience in developing mappings in Informatica, tuning them for optimal performance, and migrating objects across all environments, including DEV, QA, and PROD.
  • Hands-on experience creating UNIX shell scripts to control ETL flow and implement complex ETL logic.
  • Strong background in the data mining process using SAS E-Miner and in extracting data from Twitter using the twitteR and OAuth packages in RStudio.
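
A minimal sketch of the data analysis workflow described above (importing, wrangling, exploratory analysis, model development, evaluation); the file name and column names are hypothetical placeholders, not the actual project data:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Importing: load a dataset (file and column names are placeholders)
    df = pd.read_csv("customers.csv")

    # Data wrangling: drop duplicates and impute missing numeric values
    df = df.drop_duplicates()
    df = df.fillna(df.median(numeric_only=True))

    # Exploratory data analysis: quick summary statistics
    print(df.describe())

    # Model development: fit a logistic regression classifier
    X = df.drop(columns=["converted"])
    y = df["converted"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Model evaluation: accuracy on the held-out test set
    print(accuracy_score(y_test, model.predict(X_test)))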

SKILLS:

Programming Languages: R, Python

RDBMS: Teradata, Oracle 11g, SQL*Plus, MS Access, SQL Developer

Frameworks: Hadoop Ecosystem

Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines, Random Forest, K-Means Clustering, Bayes Models, Ensemble Methods (Ensemble SVM, Majority Voting), Neural Networks (RNN, CNN), Linear Models, Classification, Regression, Clustering, Kernel Methods, Dimensionality Reduction

Tools: RStudio, Jupyter Notebooks, Informatica PowerCenter, Teradata, SQL, Toad for Oracle, Power BI, Tableau, UNIX, Shell scripting, SAS E-Miner, Microsoft Azure ML, MicroStrategy, HP Quality Center, MS Visio, Erwin Data Modeler

R Packages: forecast, gdata, ggplot2, ggmap, MASS, randomForest, stats, spatial, dplyr, jsonlite, plyr, curl

WORK EXPERIENCE:

Confidential, St. Louis, MO

Data Scientist

Responsibilities:

  • Analyzed the pre-existing predictive model developed by the advanced analytics team and the factors considered during its development.
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization.
  • Analyzed metadata and processed data to gain better insight into the data.
  • Created initial data visualizations in Tableau to provide project stakeholders with basic insights into the data.
  • Worked as an individual contributor, delivering solutions with minimal supervision, while also collaborating with other data scientists and analysts to design, implement, and execute AI projects.
  • Applied various machine learning algorithms and statistical models, including decision trees, regression models, clustering, and SVM, to identify volume using the scikit-learn package in Python.
  • Communicated regularly with leaders of other teams to develop a deeper understanding of the data.
  • Analyzed a dataset of 14M records and reduced it to 1.3M by filtering out rows with duplicate customer IDs and removing outliers using boxplots and univariate methods (see the sketch at the end of this section).
  • Performed extensive exploratory data analysis using Teradata to improve the quality of the dataset, developed machine learning algorithms in Python to predict model quality, and created data visualizations in Tableau.
  • Developed visualizations using R packages such as ggplot2 and choroplethr to identify patterns and trends in the preprocessed data.
  • Performed data wrangling, profiling, and preparation for analytics and data science.
  • Built machine learning models and pipelines, handled deployment, and delivered data visualization and storytelling presentations to partners and stakeholders.
  • Built a propensity model using advanced machine learning algorithms in RStudio with the caret package.
  • Performed parameter tuning procedures to achieve optimal model performance.
  • Worked with machine learning algorithms including logistic regression, decision trees, support vector machines, and random forest to achieve the best accuracy for the propensity model.
  • Used RStudio packages and Python libraries such as scikit-learn to improve model accuracy from 65% to 86%.
  • Strong practical experience with Python libraries such as Pandas and NumPy (one- and two-dimensional arrays).
  • Strong experience using the PyTorch library to implement natural language processing.
  • Developed data visualizations in Tableau to display the day-to-day accuracy of the model against newly incoming data.
  • Maintained a point of view on the strengths and limitations of statistical models and analyses in various business contexts, and evaluated and effectively communicated the uncertainty in the results.
  • Used the Keras library to build and train deep learning models, achieving good results.
  • Published Power BI reports to the required organizations and made Power BI dashboards available in web clients.
  • Developed a propensity model that delivered greater ROI than competing models, achieving $0.95 million ROI per cycle, with each cycle lasting one quarter.
  • Identified factors to be considered for phase 2 development of the project and documented those findings with clear explanations.
  • Implemented a complete data science project involving data acquisition, data wrangling, exploratory data analysis, model development, and model evaluation.
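
A minimal sketch of the record-filtering and parameter-tuning steps referenced above; the file name, column names, and grid values are hypothetical assumptions:

    import pandas as pd
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    # Deduplicate on customer ID, then trim boxplot (IQR) outliers
    df = pd.read_csv("members.csv").drop_duplicates(subset="customer_id")
    q1, q3 = df["annual_cost"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["annual_cost"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    # Tune an SVM classifier with cross-validated grid search
    X, y = df[["annual_cost", "age"]], df["converted"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5)
    grid.fit(X_train, y_train)
    print(grid.best_params_, grid.score(X_test, y_test))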

Environment: Teradata, Advanced SQL, RStudio (ggplot2, choroplethr, dplyr, caret), Python (Pandas, NumPy), Machine Learning (Logistic Regression, Decision trees, SVM, Random forest), PyTorch, Keras, Tableau, Excel

Confidential, Minneapolis, MN

Data Scientist

Responsibilities:

  • Collaborated with business leaders on data initiatives, focusing on the use of data to optimize business KPIs such as revenue and circulation, alongside a team of data professionals specializing in analytics and insight, data engineering, and data science.
  • Designed and developed Power BI graphical and visualization solutions based on business requirement documents, and planned the creation of interactive dashboards.
  • Gathered usage reports for Microsoft applications (Word, Excel, SharePoint Online, Teams) across all employees from the Microsoft Office portal in the form of Excel sheets.
  • Analyzed the usage report data one month at a time, using R packages such as ggplot2 to identify usage patterns and trends.
  • Extensively used PyTorch and Keras to build and train deep learning models.
  • Worked with the data science team to build and deploy machine learning models to predict customer churn and optimize customer acquisition, using Teradata, Oracle, SQL, BTEQ, and UNIX (see the sketch at the end of this section).
  • Created storyboards in Tableau and Power BI for each application usage report, categorized by country, region, and state.
  • Created macros to generate reports on daily and monthly bases and to move files from Test to Production.
  • Analyzed employee feedback on the use of Microsoft applications in day-to-day tasks and built predictive models using machine learning algorithms to identify the main issues hindering adoption of these apps.
  • Documented the results and supplied digital fluency reports for each team to its respective team lead.
  • Suggested better practices to individual teams for using these apps to improve their overall efficiency.
  • Created quick-start guides and designed product pages for Microsoft applications in the company portal.
  • Maintained and managed Yammer groups for these applications and helped employees get started with new Microsoft applications.
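
A minimal sketch of the churn-prediction model mentioned above; in the project the data lived in Teradata/Oracle, but a CSV with placeholder column names stands in here:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    # Load hypothetical subscriber data with placeholder feature names
    df = pd.read_csv("subscribers.csv")
    X = df[["tenure_months", "monthly_spend", "logins_last_30d"]]
    y = df["churned"]

    # Fit a random forest and evaluate with AUC on held-out data
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
    print(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))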

Environment: SQL*Plus, Hadoop ecosystem, Spark, RStudio (dplyr, caret), Python (NumPy, Pandas), Machine learning algorithms (Logistic Regression, Decision Trees, SVM), SharePoint Online, PyTorch, Keras, Tableau, Power BI, Excel, SQL, JIRA, Confluence, MS Office

Confidential

Data Analyst / Engineer

Responsibilities:

  • Responsible for reporting findings, using gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Implemented and managed several ETL projects using Informatica PowerCenter, loading data from a variety of sources, such as flat files, JSON, and XML files, into Oracle for reporting; source and target data were synced using Informatica, and the transformed data was stored in staging tables.
  • Performed multinomial logistic regression, random forest, decision tree, and SVM modeling to classify whether a package would be delivered on time on a new route.
  • Used Principal Component Analysis and Factor Analysis in feature engineering to analyze high-dimensional data in MATLAB (see the sketch at the end of this section). Experienced in A/B testing.
  • Created reporting tables for comparing source and target data and reporting discrepancies (mismatched or missing records) found in the data. Experienced in customer journey mapping and journey analytics.
  • Performed validations not covered in the customer's requirements document and learned the SQL queries that helped in attending defect triage calls.
  • Displayed results from report mappings using MicroStrategy, which offered a better user interface.
  • Implemented a rule-based expertise system from the results of exploratory analysis and information gathered from people across different departments.
  • Created test plans for conducting unit testing of developed code.
  • Created deployment groups in the QA environment and deployed workflows from DEV to QA.
  • Performed debugging of the code per inputs from the IST (Integrated System Testing) team and deployed code into the PROD environment after receiving IST approval.
  • Created and maintained complete project documentation from beginning to end.
  • Performed internal enhancements of jobs running in the PROD environment.
  • Successfully maintained and managed all jobs running in the production environment by providing production support.
  • Extensive hands-on experience with the HP Quality Center tool used for production support activities.
  • Designed models and algorithms, getting hands-on with coding in complex areas.
  • Strong communication skills for explaining modeling approaches and reasoning to others; able to understand and vet business requirements and constraints and iterate models accordingly; proficient in regression, classification, and cluster analysis, as well as in Python and SQL.
  • Hands-on deep learning model-building experience; expert understanding of statistical inference, statistical methods, and experiment design; expertise with scikit-learn, TensorFlow, and Cloud ML; familiarity with NLP packages (e.g., NLTK, Gensim).
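
A minimal sketch of PCA-based dimensionality reduction as used in the feature-engineering work above; the original analysis was done in MATLAB, so this Python/scikit-learn version is an illustrative stand-in with synthetic data:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in for a high-dimensional feature matrix
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 40))  # 500 samples, 40 raw features

    # PCA assumes centered (and usually scaled) inputs
    X_scaled = StandardScaler().fit_transform(X)

    # Keep the components that explain 95% of the variance
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X_scaled)
    print(X_reduced.shape, pca.explained_variance_ratio_.sum())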

Environment: Informatica PowerCenter 9.1 (Repository Manager, Designer, Workflow Monitor, Workflow Manager), Oracle 11g, Toad for Oracle, SQL, UNIX, Shell scripting, SQL*Plus, MS Visio, Erwin Data Modeler, MicroStrategy.

Confidential

Data Analyst

Responsibilities:

  • Communicated and coordinated with other departments to gather business requirements.
  • Gathered all required data from multiple data sources and created the datasets used in analysis. Experienced in translating and mapping relational data models into database schemas.
  • Capable of facilitating data discovery sessions involving business subject matter experts.
  • Ability to create and maintain conceptual/business, logical and physical data models.
  • Performed exploratory data analysis and data visualization using R and Python.
  • In the preprocessing phase, used Pandas and scikit-learn to remove or impute missing values, detect outliers, and scale features, and applied feature selection (filtering) to eliminate irrelevant features (see the sketch at the end of this section).
  • Conducted exploratory data analysis using Python's Matplotlib and Seaborn to identify underlying patterns and correlations between features.
  • Used Python (NumPy, SciPy, Pandas, scikit-learn, Seaborn) to develop a variety of models and algorithms for analytic purposes.
  • Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
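
A minimal sketch of the preprocessing steps above (imputation, outlier detection, scaling, and filter-based feature selection), assuming a hypothetical CSV file:

    import pandas as pd
    from sklearn.feature_selection import VarianceThreshold
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    # Load hypothetical raw data and keep the numeric columns
    df = pd.read_csv("raw_data.csv")
    num = df.select_dtypes(include="number")

    # Impute missing values with each column's median
    num = pd.DataFrame(SimpleImputer(strategy="median").fit_transform(num),
                       columns=num.columns)

    # Drop rows more than 3 standard deviations from the mean (outliers)
    z = (num - num.mean()) / num.std()
    num = num[(z.abs() < 3).all(axis=1)]

    # Filter near-constant (irrelevant) features, then scale the rest
    filtered = VarianceThreshold(threshold=0.01).fit_transform(num)
    scaled = StandardScaler().fit_transform(filtered)
    print(scaled.shape)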
