Data Scientist Resume
Palo Alto, CA
SUMMARY:
- 9+ years of overall IT experience, including 3+ years in Data Science and Analytics covering Machine Learning, Data Mining, and Statistical Analysis, and 5+ years spanning Product Management, Product Development, Product Data Analytics, and Business Analysis
- Hands-on experience communicating business insights through dashboarding in Tableau. Developed automated Tableau dashboards that helped evaluate and evolve existing user data strategies, including user metrics, measurement frameworks, and measurement methods. Also developed and deployed dashboards in Tableau and RShiny to identify trends and opportunities, surface actionable insights, and help teams set goals, forecasts, and priorities for initiatives
- Experience in architecting and building comprehensive analytical solutions in Marketing, Sales and Operations functions across Technology, Retail and Banking industries. Worked closely with functional team leaders (in Product, Operations, Marketing, etc.) to explain analysis, findings, and recommendations
- Experience in acquiring, merging, cleaning, analyzing, and mining structured, semi-structured, and unstructured data sets for analysis
- Strong track record of contributing to successful end-to-end analytic solutions (clarifying business objectives and hypotheses, communicating project deliverables and timelines, and informing action based on findings)
- Expertise writing production-quality code in SQL, R, Python, and Spark. Hands-on experience building regression and classification models as well as unsupervised learning algorithms with large datasets in distributed systems and resource-constrained environments (a brief modeling sketch follows this summary)
- Expert knowledge of supervised and unsupervised learning algorithms such as Ensemble Methods (Random Forests), Logistic Regression, Regularized Linear Regression, SVMs, Deep Neural Networks, Extreme Gradient Boosting, Decision Trees, K-Means, Gaussian Mixture Models, Hierarchical models, and time series models (ARIMA, ARCH/GARCH, etc.)
- Conducted independent research and experimentation with new methodologies to discover insights and improvements for business problems. Delivered findings and actionable results to the management team through data visualizations, presentations, and training sessions. Proactively involved in roadmap discussions, data science initiatives, and choosing the optimal approach for applying the underlying algorithms
- Experience building interpretable machine learning models and end-to-end data pipelines that extract, transform, and combine all incoming data with the goal of discovering hidden insights, with an eye toward improving business processes, addressing business problems, or generating cost savings
- Experience working with large data and metadata sources; interpreted and communicated insights and findings from analyses and experiments to both technical and non-technical audiences across advertising, service, and business functions.
- Experience in statistical programming languages such as R and Python, as well as Big Data technologies including Hadoop 2, Hive, HDFS, MapReduce, and Spark; also experienced with Spark 2.1, Spark SQL, and PySpark.
- Skilled in using dplyr in R and pandas in Python for exploratory data analysis, with experience in data modeling tools such as Erwin, PowerDesigner, and ER/Studio.
- Experience with data analytics, data reporting, ad-hoc reporting, graphs, scales, PivotTables, and OLAP reporting.
- Experience in Data Modeling covering RDBMS concepts, Logical and Physical Data Modeling up to Third Normal Form (3NF), and Multidimensional Data Modeling schemas (star schema, snowflake modeling, facts and dimensions). Hands-on experience optimizing SQL queries and tuning database performance in Oracle, SQL Server, and Teradata databases
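A minimal sketch of the supervised modeling workflow referenced in this summary, using synthetic scikit-learn data; the dataset, features, and hyperparameters are illustrative assumptions rather than any specific project's setup:

    # Compare a regularized logistic regression and a random forest on
    # held-out AUC; the data here is synthetic and for illustration only.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=10_000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    for model in (LogisticRegression(C=1.0, max_iter=1000),
                  RandomForestClassifier(n_estimators=200, random_state=42)):
        model.fit(X_train, y_train)
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        print(f"{type(model).__name__}: AUC = {auc:.3f}")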
TECHNICAL SKILLS:
Exploratory Data Analysis: Univariate/Multivariate Outlier detection, Missing value imputation, Histograms/Density estimation, EDA in Tableau
Supervised Learning: Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Random Forests, Support Vector Machines, Gradient Boosting, XGBoost, Deep Neural Networks, Bayesian Learning
Unsupervised Learning: Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low Rank Matrix Factorization
Feature Engineering: Stepwise, Recursive Feature Elimination, Relative Importance, Filter Methods, Wrapper Methods and Embedded Methods
Statistical Tests: T-Test, Chi-Square tests, Stationarity tests, Autocorrelation tests, Normality tests, Residual diagnostics, Partial dependence plots and ANOVA
Sampling Methods: Bootstrap sampling methods and Stratified sampling
Model Tuning/Selection: Cross Validation, AUC, Precision/Recall, Walk Forward Estimation, AIC/BIC criteria, Grid Search and Regularization (see the grid-search sketch following this skills list)
Time Series: ARIMA, Holt-Winters, Exponential Smoothing, Bayesian Structural Time Series
R: caret, glmnet, forecast, xgboost, rpart, survival, arules, sqldf, dplyr, nloptr, lpSolve, ggplot
Python: pandas, numpy, scikit-learn, scipy, statsmodels, matplotlib, tensorflow
SAS: Forecast server, SAS Procedures and Data Steps
Spark: MLlib, GraphX
SQL: Subqueries, joins, DDL/DML statements
Databases/ETL/Query: Teradata, Netezza, SQL Server, Postgres and Hadoop (MapReduce); SQL, Hive, Pig and Alteryx.
Visualization: Tableau, ggplot2 and RShiny
Prototyping: PowerPoint, RShiny and Tableau
Operating System: Windows, Linux, Unix, macOS, Red Hat.
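As an illustration of the Model Tuning/Selection methods above, here is a minimal cross-validated grid search sketch in scikit-learn; the estimator, parameter grid, and synthetic data are assumptions for demonstration:

    # Cross-validated grid search over an illustrative parameter grid,
    # scored by AUC; the synthetic data stands in for a project dataset.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=2_000, n_features=10, random_state=0)
    param_grid = {"n_estimators": [100, 200],
                  "max_depth": [2, 3],
                  "learning_rate": [0.05, 0.1]}
    search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                          param_grid, cv=5, scoring="roc_auc")
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))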
WORK EXPERIENCE:
Confidential, Palo Alto, CA
Data Scientist
Responsibilities:
- Developed various Data Lakes (Oracle, Hadoop, Hive, and Vertica) and applied a range of predictive models, forecasting methods, and algorithms.
- Worked on SaaS (Software as a Service) offerings, Big Data implementation, and business logic.
- Applied data mining, text mining, and a recommendation system to the Litmos product (a learning management system).
- Developed a test framework for data validation at each stage of the Data Lake.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming (a related PySpark sketch follows this list).
- Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
- Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
- Spearheaded chatbot development initiative to improve customer interaction with application.
- Developed the chatbot using api.ai.
- Automated the CSV-to-chatbot-friendly JSON transformation by writing NLP scripts, reducing development time by 20% (see the sketch following this list).
- Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build solutions that optimize data quality and performance.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
- Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models, enhancing them by leveraging best-in-class modeling techniques.
- Worked on database design, relational integrity constraints, OLAP, OLTP, cubes, normalization (3NF), and de-normalization of the database.
- Worked on customer segmentation using an unsupervised learning technique (clustering).
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
- Developed Linux shell scripts using the NZSQL/NZLOAD utilities to load data from flat files into a Netezza database.
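The distributed random forest above was implemented via Python streaming; shown here instead is a minimal sketch of the analogous PySpark MLlib route, where the S3 path and the feature/label column names are hypothetical:

    # Train a distributed random forest with PySpark MLlib; the input
    # path and the "f1"/"f2"/"f3"/"label" columns are assumptions.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.appName("rf-sketch").getOrCreate()
    df = spark.read.parquet("s3://bucket/features.parquet")  # assumed input

    # Assemble the assumed numeric columns into the vector MLlib expects.
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"],
                                outputCol="features")
    train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

    rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                                numTrees=100)
    model = rf.fit(train)
    model.transform(test).select("label", "prediction").show(5)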
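A minimal sketch of the CSV-to-JSON chatbot transformation described above; the column names ("question", "answer") and the intent-style output schema are assumptions, not api.ai's actual import format:

    # Convert Q/A rows from a CSV export into a JSON intents payload.
    import csv
    import json

    def csv_to_intents(csv_path, json_path):
        intents = []
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                intents.append({
                    "userSays": row["question"].strip(),
                    "responses": [row["answer"].strip()],
                })
        with open(json_path, "w", encoding="utf-8") as f:
            json.dump({"intents": intents}, f, indent=2)

    csv_to_intents("faq.csv", "intents.json")  # hypothetical file names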
Confidential, Dallas, TX
Data Scientist
Responsibilities:
- Performed data extraction, validation, and summarization of insights for analytical projects from the marketing database.
- Involved in assessing analytical projects to determine the required data elements to be utilized for the desired result.
- Worked with external database host vendor as needed to complete analytical projects.
- Used SAS procedures such as PROC FREQ, PROC MEANS, PROC SORT, PROC PRINT, PROC TABULATE, and PROC REPORT to prepare summary data (a pandas analogue is sketched after this list).
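A hedged pandas sketch of the same kind of summary steps, swapped in for the SAS procedures named above; the file "marketing.csv" and its columns are assumptions:

    # Rough pandas analogues of the SAS summary procedures:
    # value_counts ~ PROC FREQ, groupby/agg ~ PROC MEANS,
    # sort_values + head ~ PROC SORT + PROC PRINT.
    import pandas as pd

    df = pd.read_csv("marketing.csv")  # hypothetical extract
    print(df["segment"].value_counts())
    print(df.groupby("segment")["spend"].agg(["mean", "sum", "count"]))
    print(df.sort_values("spend", ascending=False).head())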
Confidential, Columbus, OH
Data Analyst
Responsibilities:
- Performed data profiling to learn about user behavior and merged data from multiple data sources.
- Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
- Designed and developed various machine learning frameworks using Python, R, and MATLAB.
- Integrated R into MicroStrategy to expose metrics determined by more sophisticated and detailed models than those natively available in the tool.
- Independently coded new programs and designed tables to load and effectively test the programs for the given POCs using Big Data/Hadoop.
- Architected Big Data solutions for projects and proposals using Hadoop, Spark, the ELK Stack, Kafka, and TensorFlow.
- Corrected minor data errors that prevented loading of EDI files.
- Worked on clustering and classification of data using machine learning algorithms. Used TensorFlow to build sentiment and time series analyses.
- Developed documents and dashboards of predictions in MicroStrategy and presented them to the Business Intelligence team.
- Used the Cloud Vision API to integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.
- Implemented text mining to transform words and phrases in unstructured data into numerical values.
- Developed various QlikView data models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data.
- As architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports, and built a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes.
- Gained knowledge of OpenCV and applied it to identify red-colored objects with the drone's camera.
- Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Collaborated with data engineers to implement ETL processes, writing and optimizing SQL queries to extract data from the cloud and merge it from Oracle 12c.
- Collected unstructured data from MongoDB 3.3 and completed data aggregation.
- Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R 3.4.0.
- Worked with freight carriers to correct EDI issues as they arose.
- Conducted analyses assessing customers' consumption behaviors and discovered the value of customers with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering (see the sketch following this list).
- Worked on outlier identification with box plots and K-Means clustering using pandas and NumPy (an IQR sketch also follows this list).
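A minimal sketch of RFM-based customer segmentation with K-Means as described above; "orders.csv" and its column names are assumptions:

    # Compute Recency, Frequency, Monetary value per customer, then
    # cluster standardized RFM features into segments with K-Means.
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    orders = pd.read_csv("orders.csv", parse_dates=["order_date"])  # hypothetical
    snapshot = orders["order_date"].max()

    rfm = orders.groupby("customer_id").agg(
        recency=("order_date", lambda d: (snapshot - d.max()).days),
        frequency=("order_id", "nunique"),
        monetary=("amount", "sum"),
    )

    X = StandardScaler().fit_transform(rfm)
    rfm["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
    print(rfm.groupby("segment").mean())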
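And a short sketch of the box-plot (IQR) outlier identification mentioned above, run on an assumed numeric column (synthetic here):

    # Flag values outside the standard 1.5*IQR box-plot whiskers.
    import numpy as np
    import pandas as pd

    values = pd.Series(np.random.default_rng(0).normal(100, 15, 1_000))
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < lower) | (values > upper)]
    print(f"{len(outliers)} outliers outside [{lower:.1f}, {upper:.1f}]")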
Confidential, Auburn Hills, MI
Data Modeler/Data Analyst
Responsibilities:
- Analyzed large datasets to provide strategic direction to the company. Performed quantitative analysis of ad sales trends to recommend pricing decisions.
- Conducted cost and benefit analysis on new ideas. Scrutinized and tracked customer behavior to identify trends and unmet needs.
- Developed statistical models to forecast inventory and procurement cycles (a forecasting sketch follows this list). Assisted in developing internal tools for data analysis.
- Designed scalable processes to collect, manipulate, present, and analyze large datasets in a production-ready environment, using Akamai's big data platform
- Achieved a broad spectrum of end results by finding and interpreting rich data sources, merging data sources together, ensuring consistency of data sets, creating visualizations to aid understanding, building mathematical models on the data, and presenting and communicating the insights and findings to specialists and scientists on the team
- Implemented the full lifecycle as Data Modeler/Data Analyst for data warehouses and data marts with star schemas, snowflake schemas, SCDs, and dimensional modeling in Erwin. Performed data mining using very complex SQL queries, discovered patterns, and used extensive SQL for data profiling/analysis to provide guidance in building the data model.
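A minimal sketch of the kind of inventory forecasting model described above, using Holt-Winters exponential smoothing from statsmodels on synthetic monthly demand data (the series and seasonality are assumptions):

    # Fit additive trend + yearly seasonality, forecast six months ahead.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    idx = pd.date_range("2015-01-01", periods=48, freq="MS")
    rng = np.random.default_rng(1)
    demand = pd.Series(100 + np.arange(48)
                       + 10 * np.sin(np.arange(48) * 2 * np.pi / 12)
                       + rng.normal(0, 3, 48), index=idx)

    model = ExponentialSmoothing(demand, trend="add", seasonal="add",
                                 seasonal_periods=12).fit()
    print(model.forecast(6))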