
Data Scientist Resume


Bend, OR

SUMMARY:

  • To work pragmatically in an organization where I can apply my talent and enhance my skills to meet company goals and objectives with full integrity and zest. Utilizing statistics and machine learning, I have been able to identify, describe, propose, and implement solutions that resolve problems impacting the business. I prioritize learning new techniques and tools, as the exciting field of data science is ever evolving.
  • 7+ years of professional experience working as a Data Scientist executing data-driven solutions to increase the efficiency, accuracy, and utility of internal data processing. Experienced in creating predictive models using various algorithms and applying data mining algorithms to deliver insights and implement action-oriented solutions to complex business problems.
  • Experience with data gathering, data quality, system architecture, and coding best practices.
  • Responsible for understanding, preparing, processing, and analyzing data to make it valuable and useful for decision-making purposes.
  • Experience in preparing and analyzing data, including locating, extracting, profiling, cleansing, mapping, importing, transforming, validating, and modeling.
  • Good experience using various Python libraries (Scikit-Learn, NumPy, SciPy, Matplotlib, Pandas, Seaborn, and MySQLdb for database connectivity).
  • Hands on experience with deploying Machine Learning Projects on Google Cloud Platform.
  • Knowledge of Big Data tools including Hadoop, MapReduce, PySpark, and the Big Data ecosystem.
  • Experience using statistical computer languages (R, Python, SQL, etc.) to manipulate data and draw insights from large data sets.
  • Hands on advanced SQL experience summarizing, transforming, segmenting, joining datasets
  • Knowledge of a variety of machine learning techniques (clustering, Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors) applied in forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, PCA, and ensembles, along with their real-world advantages/drawbacks.
  • Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and proper usage, etc.) and experience with applications.
  • Strong analytical skills and detail-oriented, with an excellent analytical thought process: the ability to understand the question and devise an analytical approach that reaches actionable answers.
  • Interprets insights and patterns from the data, results of analyses, identifies trends and issues, and develops recommendations to support business objectives.
  • Experience in communicating complex information through visualizations that are easy for business teams to understand.
  • Ability to innovate, investigate and create solutions to business problems.

TECHNICAL SKILLS:

Big Data Technologies: Hive, PySpark, HDFS, Cassandra, Spark MLlib

Supervised Learning: Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Random Forests, Support Vector Machines, Gradient Boosting, XGBoost, Deep Neural Networks, Bayesian Learning

Unsupervised Learning: Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low Rank Matrix Factorization

Time Series: ARIMA, Exponential smoothing

R: caret, glmnet, forecast, xgboost, rpart, survival, arules, sqldf, dplyr, nloptr, lpSolve, ggplot

Python: Pandas, Numpy, Scikit-Learn, Scipy, Statsmodels, Matplotlib

SAS: SAS Procedures and Data Steps

Visualization: Tableau, ggplot2 and RShiny

Database: Teradata, SQL Server, Postgres and Hadoop (MapReduce)

Cloud Technologies: AWS, GCP

Big Data Tools: Hadoop, Hive, Big Data ecosystem

PROFESSIONAL EXPERIENCE:

Confidential, Bend, OR

Data Scientist

Responsibilities:

  • Structured the data collected by representatives at worksite locations. Fixed numerous irregularities in the customer data, such as vehicle make, model, and year. Used the FuzzyWuzzy fuzzy string-matching technique to match the data provided by customers with the DMV data.
  • Built the customer churn model. Defined churn and prepared the dataset by extracting data from the database, using attributes such as unsubscribe data, vehicle data, visit and cost data, and other internal features. Used classification models such as Logistic Regression and Random Forests. This helped the marketing team identify customers on the verge of churn and act accordingly. Used PySpark for feature engineering and transformations.
  • Responsible for the design and development of advanced Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Performed Data Profiling to learn about customer behavior and merge data from multiple data sources.
  • Frequently met with the business team and conveyed insights that informed changes to the customer approach strategy.
  • Involved in the testing of Dataiku, an autonomous Machine Learning tool.
  • Created various POC models in Dataiku and verified them against the models built manually in Python.
  • Developed complex SQL and Hive Queries to extract the required data and created various dashboards for the business teams in Tableau.
  • Provide technical & requirement guidance to the team members.
  • Participated in Business meetings to understand the business needs & requirements.
  • Utilized Apache Spark with Python to develop and execute Big Data Analytics and Machine Learning applications; executed Machine Learning use cases under Spark ML and MLlib.
  • Designed several high-performance prediction models using various Python packages, including Pandas, Numpy, Seaborn, SciPy, Matplotlib, Scikit-learn, pandas-datareader, and Statsmodels.
  • Performed a POC on sentiment analysis of customer feedback using the Natural Language Toolkit (NLTK).

Environment: Python, Spark, NLTK, Tableau, Dataiku, Hive, SQL, Hadoop, MS Office Suite
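The fuzzy-matching step described above (reconciling customer-entered vehicle data with DMV records) can be illustrated with a minimal sketch. The original FuzzyWuzzy code isn't included in the resume, so this hypothetical example uses Python's stdlib difflib, which exposes a similar ratio-based matching idea; the reference makes and misspellings are made up for illustration.

```python
# Sketch: fuzzy-matching customer-entered vehicle makes against a reference
# list using the stdlib's difflib (FuzzyWuzzy offers a comparable
# extractOne/ratio API built on edit distance).
from difflib import get_close_matches, SequenceMatcher

# Hypothetical reference data standing in for the DMV records.
dmv_makes = ["Chevrolet", "Toyota", "Volkswagen", "Mercedes-Benz"]

def match_make(customer_entry, choices, cutoff=0.6):
    """Return the closest reference make, or None if nothing is close enough."""
    matches = get_close_matches(customer_entry, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def similarity(a, b):
    """0..1 similarity score, comparable to FuzzyWuzzy's ratio / 100."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(match_make("Cheverolet", dmv_makes))  # common misspelling resolves to "Chevrolet"
print(round(similarity("Volkswagon", "Volkswagen"), 2))
```

In practice, matching would run over make, model, and year jointly, with a similarity cutoff tuned against manually verified pairs.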

Confidential, Dallas, TX

Data Scientist

Responsibilities:

  • Participate in requirement gathering and analysis phase of the project in documenting the business requirements by conducting workshops/meetings with various business users
  • Worked on Python OpenStack APIs; used Python scripts to update content in the database and manipulate files.
  • Implemented machine learning schemes using Python libraries Scikit-learn and SciPy.
  • Worked on several Python packages, including Matplotlib, Pillow, Numpy, and sockets.
  • Worked on data transformation, data sourcing and mapping, conversion, and loading.
  • Used the Pandas API to structure data in time-series and tabular form, enabling timestamp-based manipulation and retrieval.
  • Used machine learning techniques such as unsupervised classification, optimization, and prediction.

Environment: Python, Spark, Tableau, SQL Server, Postgres, Hadoop, MS Office Suite
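The time-series bullet above corresponds to a resample-style aggregation. Since the original pandas code isn't shown, here is a minimal stdlib sketch of the same idea: bucketing timestamped readings into daily means, roughly what a pandas `DatetimeIndex` with `resample("D").mean()` would produce. The readings are invented for illustration.

```python
# Sketch: grouping (timestamp, value) rows by calendar date and averaging
# each day -- the kind of operation pandas handles via resample("D").mean().
from datetime import datetime
from collections import defaultdict

# Made-up illustrative readings.
readings = [
    (datetime(2020, 1, 1, 9, 30), 10.0),
    (datetime(2020, 1, 1, 17, 0), 14.0),
    (datetime(2020, 1, 2, 8, 15), 8.0),
]

def daily_mean(rows):
    """Group rows by date and return {date: mean of that day's values}."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[ts.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}

for day, avg in daily_mean(readings).items():
    print(day, avg)
```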

Confidential, Irvine, CA

Data Scientist

Responsibilities:

  • Communicated and coordinated with end client for collecting data and performed ETL to define the uniform standard format. Queried and retrieved data from Oracle database servers to get the dataset.
  • In the preprocessing phase, used Pandas to remove or replace all missing data and balanced the dataset by over-sampling the minority label class and under-sampling the majority label class.
  • Used PCA along with feature engineering, feature scaling, and Scikit-learn preprocessing techniques to reduce high-dimensional data drawn from the entire patient visit history, proprietary comorbidity flags, and comorbidity scoring across over 12 million EMR and claims records.
  • In the data exploration stage, used correlation analysis and graphical techniques in Matplotlib and Seaborn to gain insights into the patient admission and discharge data.
  • Experimented with predictive models including Logistic Regression, Support Vector Machine (SVM), Gradient Boosting and Random Forest using Python Scikit-learn to predict whether a patient might be readmitted.
  • Designed and implemented Cross-validation and statistical tests including ANOVA, Chi-square test to verify the models’ significance.
  • Implemented, tuned and tested the model on AWS with the best performing algorithm and parameters.
  • Set up data preprocessing pipeline to guarantee the consistency between the training data and new coming data.
  • Deployed the model on AWS. Collected feedback after deployment, retrained the model, and tweaked the parameters to improve performance. Designed, developed, and maintained daily and monthly summary, trending, and benchmark reports in Tableau Desktop.

Environment: Python, Spark, Tableau, SQL Server, Postgres, Hadoop, MS Office Suite.
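The class-balancing step above (over-sampling the minority label, under-sampling the majority) can be sketched in a few lines. The production work used Pandas on patient records; this hypothetical example uses only the stdlib and toy data to show the mechanic.

```python
# Sketch: balancing a binary-labeled dataset by over-sampling the minority
# class (with replacement) and under-sampling the majority class (without).
import random

def balance(rows, target_per_class, seed=0):
    """rows: list of (features, label). Returns a dataset with exactly
    target_per_class examples of each label."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    balanced = []
    for label, group in sorted(by_label.items()):
        if len(group) >= target_per_class:
            # Under-sample: draw without replacement.
            balanced.extend(rng.sample(group, target_per_class))
        else:
            # Over-sample: draw with replacement to reach the target count.
            balanced.extend(rng.choices(group, k=target_per_class))
    return balanced

# Toy imbalanced data: 95 negatives, 5 positives.
data = [((i,), 0) for i in range(95)] + [((i,), 1) for i in range(5)]
balanced = balance(data, target_per_class=20)
print(sum(1 for _, y in balanced if y == 1))  # 20 of each class after balancing
```

For model evaluation, resampling should only ever be applied to the training split, never the held-out test set.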

Confidential

Data Analyst

Responsibilities:

  • Analyzed market data and reports to provide valuable insights that helped the marketing team achieve their goals.
  • Used advanced Excel features like Pivot tables and Charts for generating Graphs.
  • Development of high-level data dictionary of database objects, data elements, data mappings and transformations for reference purpose.
  • Wrote complex SQL queries to extract and validate the data from the database.
  • Analyzed the data in the source system to map into the correct field and attribute in the target storage
  • Created jobs to schedule data refreshes for the Tableau reports and dashboards from local and cloud data sources on a periodic basis, so the dashboards always display the most current data.
  • Extracted, summarized, and compiled data for required external reporting, including weekly data-integrity reports.
  • Delivered reports and ad-hoc analyses focused on client behavior and profiling using SQL and Excel.
  • Maintain and update documentation on all processes and projects
  • Assist on new or existing projects as needed and ensure deadlines are met
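The extract-and-validate SQL pattern described above isn't shown in the resume; a minimal sketch follows, using Python's built-in sqlite3 as a stand-in for the production database. The table, columns, and values are hypothetical.

```python
# Sketch: extracting aggregates and validating data integrity with SQL,
# against an in-memory SQLite database (stand-in for the real warehouse).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, client TEXT, amount REAL);
    INSERT INTO orders (client, amount) VALUES
        ('acme', 120.0), ('acme', 80.0), ('globex', NULL);
""")

# Extraction: per-client totals, excluding rows that fail validation (NULL amounts).
totals = conn.execute("""
    SELECT client, SUM(amount) AS total
    FROM orders
    WHERE amount IS NOT NULL
    GROUP BY client
    ORDER BY client
""").fetchall()

# Validation: count rows with missing amounts for a data-integrity report.
bad_rows = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
).fetchone()[0]

print(totals)    # [('acme', 200.0)]
print(bad_rows)  # 1
```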

Confidential

Data Analyst/BI Developer

Responsibilities:

  • Prepare weekly, monthly and quarterly Excel reports as directed by management.
  • Developed SQL queries to obtain complex data from tables in remote databases.
  • Analyzed and visualized marketing cash-flow distribution by city across the country; generated share maps, infographics, and reports on a weekly basis.
  • Responsible for creating ETL design specification document to load data from operational data store to data warehouse
  • Prevented miscommunication and highlighted relevant trends, findings, and opportunities by creating daily hand-off reports for cross-functional teams.
  • Reviewed data, raised appropriate questions, and highlighted significant findings.
  • Created various SSIS packages to populate the data from flat files, Excel and Access into SQL Server. Performed full loads for current data and incremental uploads for historical data (transaction-based data).
  • Created and configured SSIS package configurations using SQL server table and environment variables and loaded data from XML, Flat files and SQL Server incrementally.
  • Developed various types of reports drill down, drill through, matrix and sub reports using SSRS.
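The incremental-load pattern behind the SSIS packages above (full load for current data, incremental upload for history) can be sketched in SQL terms. The original packages targeted SQL Server; this hypothetical example uses sqlite3 and a high-water mark on a load timestamp, with made-up tables and rows.

```python
# Sketch: an incremental load keyed on a high-water mark -- copy only
# staging rows newer than the latest row already in the warehouse.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (txn_id INTEGER, loaded_at TEXT);
    CREATE TABLE warehouse (txn_id INTEGER PRIMARY KEY, loaded_at TEXT);
    INSERT INTO staging VALUES (1, '2020-01-01'), (2, '2020-01-02'), (3, '2020-01-03');
    INSERT INTO warehouse VALUES (1, '2020-01-01');
""")

def incremental_load(conn):
    """Insert only staging rows newer than the warehouse's high-water mark."""
    conn.execute("""
        INSERT INTO warehouse (txn_id, loaded_at)
        SELECT s.txn_id, s.loaded_at
        FROM staging s
        WHERE s.loaded_at > (SELECT COALESCE(MAX(loaded_at), '') FROM warehouse)
    """)

incremental_load(conn)
count = conn.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0]
print(count)  # 3: row 1 was already present, rows 2 and 3 were loaded
```

Re-running the load is harmless: the high-water mark has advanced, so no rows qualify a second time.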
