
Data Scientist Resume


TX

SUMMARY

  • Overall 6+ years of working experience as a Data Scientist, Data Analyst, and Business Intelligence (BI) professional, with high proficiency in Big Data Analytics, Predictive Modeling, Text Mining, and Machine Learning.
  • Experience in Analytics, Visualization, Data Modeling, Data Mining, Text Mining, Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, Data Export and Reporting.
  • Experienced Data Scientist with hands on experience in developing Machine Learning models.
  • Strong focus on R and Python Statistical Analysis with ML techniques in challenging environments.
  • Experienced in data analysis, designing experiments, interpreting data behavior, business decision support.
  • Expertise in quantitative analysis, data mining, and the presentation of data to see beyond the numbers and understand how users interact with core/business products.
  • Experience with a diverse set of methods including, but not limited to, machine learning, deep learning, Bayesian algorithms, regression, cluster analysis, decision trees, time series analysis, resampling and regularization, and NLP.
  • Strong working experience with various Python libraries such as NumPy and SciPy for mathematical calculations, Pandas for data preprocessing/wrangling, Matplotlib and Seaborn for data visualization, Scikit-learn for machine learning, Theano, TensorFlow, and Keras for deep learning, and NLTK for NLP.
  • Experienced in writing and optimizing SQL queries in Oracle, SQL Server, MongoDB, PostgreSQL, and Teradata.
  • Experience in using Tableau, creating dashboards, and quality storytelling.
  • Ability to provide wing-to-wing analytic support including pulling data, preparing analyses, interpreting data, making strategic recommendations, and presenting to client/product teams.
  • Excellent at visually representing data and communicating analyses with Python and Tableau to all levels of business users within the organization; automated analyses and built analytics data pipelines via SQL- and Python-based ETL frameworks.
  • Experience with common Machine Learning Algorithms like Regression, Classification and Ensemble models.
  • Experience in analytics, visualization, client meetings on business priorities, managing SLAs, modeling, reporting, and providing actionable insights to managers and C-level executives.
  • Good knowledge of establishing classification and forecast models, automating processes, text mining, sentiment analysis, statistical models, risk analysis, platform integrations, optimization models, models to improve user experience, and A/B testing using R, Python, Tableau, etc.
  • Strong familiarity with building data models by extracting and stitching data from various sources and integrating systems with R to enable efficient data analysis.
  • Experience using machine learning models such as random forest, KNN, SVM, and logistic regression, with packages such as ggplot2, dplyr, lm, e1071, rpart, randomForest, nnet, tree, and PROC (pca, dtree, corr, princomp, gplot, logistic, cluster) in R, and NumPy, scikit-learn, and pandas in Python.
  • Expertise in Marketing & Customer Analytics focused on Market basket analysis, Campaign measurement, Private brand strategy, Sales forecasting, Customer segmentation and lifetime value analyses, SKU rationalization and Marketing mix modeling.
  • Experience developing complex database objects like Stored Procedures, Functions, Packages, and Triggers using SQL and PL/SQL.
  • Experience in installing, configuring and maintaining the databases like PostgreSQL, Oracle, Big Data HDFS systems.
  • Expertise in Model Development, Data Mining, Predictive Modeling, Descriptive Modeling, Data Visualization, Data Cleansing and Management, and Database Management.
  • Expertise in applying data mining techniques and optimization techniques and proficient in Machine Learning, Data/Text Mining, Statistical Analysis and Predictive Modeling.
  • Good communication, problem solving and interpersonal skills, versatile team player as well as independent contributor with adaptability and understanding of business processes.
  • Highly motivated team player with analytical, organizational and technical skills, unique ability to adapt quickly to challenges and changing environment.
  • Excellent interpersonal skills, proven team player with an analytical bent to problem solving and delivering under high stress environment.

TECHNICAL SKILLS

Programming languages: Python, R

Scripting languages: Python (Numpy, SciPy, Pandas, Matplotlib), R (ggplot, Caret, Weka)

Python, R libraries: NumPy, SciPy, Pandas, Scikit-learn, Matplotlib, Seaborn, ggplot2, caret, dplyr, tidyr, NLP, plyr

BI/Reporting Tools: Tableau - Tableau Desktop, Tableau Server

Databases: Oracle, PostgreSQL, MS SQL Server, MongoDB

NLP/Machine Learning/Deep Learning: LDA (Latent Dirichlet Allocation), NLTK, Apache OpenNLP, Sentiment Analysis

Operating Systems: Windows, Linux

Data Science Algorithms: Supervised Learning - Linear regression, Logistic regression, K-nearest neighbors, Bayesian inference, Neural networks; Unsupervised Learning - K-means, Principal components, Factor analysis; Survival models, Customer lifetime value analysis

BI Tools: Tableau, Power BI

PROFESSIONAL EXPERIENCE

Confidential, TX

Data Scientist

Responsibilities:

  • Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift.
  • Explored and analyzed the customer specific features by using Spark SQL.
  • Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
  • Performed data imputation using Scikit-learn package in Python.
  • Participated in feature engineering such as feature interaction generation, feature normalization, and label encoding with Scikit-learn preprocessing.
  • Used Python 3.X (numpy, scipy, pandas, scikit-learn, seaborn) and Spark 2.0 (PySpark, MLlib) to develop variety of models and algorithms for analytic purposes.
  • Developed and implemented predictive models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA and regularization for data analysis.
  • Conducted analysis assessing customer consumption behavior and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
  • Built regression models including Lasso, Ridge, SVR, and XGBoost to predict Customer Lifetime Value.
  • Built classification models including Logistic Regression, SVM, Decision Tree, and Random Forest to predict Customer Churn Rate.
  • Used F-score, AUC/ROC, Confusion Matrix, MAE, and RMSE to evaluate the performance of different models.
  • Designed and implemented recommender systems which utilized Collaborative filtering techniques to recommend course for different customers and deployed to AWS EMR cluster.
  • Utilized natural language processing (NLP) techniques to optimize customer satisfaction.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.

Environment: AWS Redshift, EC2, EMR, Hadoop Framework, S3, HDFS, Spark (PySpark, MLlib, Spark SQL), Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas/NLTK/Matplotlib/Seaborn), Tableau Desktop (9.x/10.x), Tableau Server (9.x/10.x), Machine Learning (Regression, KNN, SVM, Decision Tree, Random Forest, XGBoost, LightGBM, Collaborative Filtering, Ensemble), NLP, Teradata, Git 2.x, Agile/SCRUM
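The preprocessing and evaluation workflow described in the bullets above (imputation and label encoding with Scikit-learn, a tree-based classifier, and F-score/AUC/confusion-matrix evaluation) can be sketched roughly as follows. This is a minimal illustration on synthetic data; the churn labels and feature values are invented for the example, not drawn from any real project.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix

# Synthetic stand-in for customer data (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y_raw = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, "churn", "stay")  # hypothetical label
X[rng.random(X.shape) < 0.05] = np.nan  # inject missing values to impute

# Imputation, scaling, and label encoding with Scikit-learn preprocessing
X = SimpleImputer(strategy="mean").fit_transform(X)
X = StandardScaler().fit_transform(X)
y = LabelEncoder().fit_transform(y_raw)

# Fit a Random Forest classifier and evaluate with F-score, AUC, confusion matrix
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]

f1 = f1_score(y_te, pred)
auc = roc_auc_score(y_te, proba)
cm = confusion_matrix(y_te, pred)
```

The same evaluation calls extend directly to the other classifiers mentioned (Logistic Regression, SVM, Decision Tree) by swapping the estimator.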

Confidential, New Jersey

Data Scientist

Responsibilities:

  • Documented logical, physical, relational, and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
  • Predicted results using a linear regression model with an error of 1%, using statistical analytical tools and algorithms, and helped integrate the results into sales and operations tools.
  • Built Tableau dashboards that tracked changes in customer behavior before and after campaign launch.
  • Performed modeling and exponential smoothing for multivariate time series data.
  • Developed segments using K-means, Gaussian mixture techniques.
  • Developed a machine learning system that predicted purchase probability for a particular offer based on a customer's real-time location data and past purchase behavior.
  • Wrote scripts in Python and R for a regular expression (regex) project in a Windows environment. Used the K-Means clustering technique to identify outliers and classify unlabeled data.
  • Used various python libraries like Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn to develop the machine learning (ML) algorithms.
  • Implemented Normalization and Standardization preprocessing techniques for models like regression and KNN to reduce loss.
  • Explored and visualized the data to get descriptive and inferential statistics for a better understanding of the dataset.
  • Performed grid search for a better choice of hyperparameters.
  • Performed analyses of structured and unstructured data to solve multiple and/or complex business problems utilizing advanced statistical techniques.
  • Built complex calculations using advanced functions (ATTR, DATEDIFF, STR, IFs, nested IFs, OR, NOT, AND, SUMIF, COUNTIF, LOOKUPS, and QUICK TABLE calculations).
  • Wrote SQL and UNIX shell scripts and used PuTTY to automate the execution of SQL scripts and manage them on the server.
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries; defined source fields and their definitions.
  • Performed source system analysis, database design, and data modeling for the warehouse layer using MLDM concepts and for the package layer using dimensional modeling.
  • Created tables, sequences, synonyms, joins, functions, and operators in various databases.
  • Performed data analysis and data profiling and worked on data transformations and data quality rules.
Environment: Windows, Linux, Naive Bayes, Random Forests, K-means, Big Data Hadoop, HDFS, Pig, Hive, R, Python (Pandas, NumPy, SciPy, Matplotlib, Scikit-learn), MapReduce, Time series analysis, SQL Server, DB2, Tableau.
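The standardization and grid-search steps described in this role can be sketched as below, assuming a scikit-learn pipeline with a distance-based KNN classifier; the dataset and parameter grid are illustrative assumptions, not values from the actual project.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic classification data standing in for the real dataset
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Standardize before KNN, since KNN's distances are scale-sensitive
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Grid search over the number of neighbors with 5-fold cross-validation
grid = GridSearchCV(pipe, {"knn__n_neighbors": [3, 5, 7, 9]}, cv=5)
grid.fit(X, y)
best_k = grid.best_params_["knn__n_neighbors"]
```

Putting the scaler inside the pipeline ensures it is refit on each cross-validation fold, avoiding leakage from the held-out portion into the standardization statistics.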

Confidential, OH

Data Scientist

Responsibilities:

  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; performed Gap analysis.
  • Developed Python modules for machine learning & predictive analytics on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Conducted studies and rapid plots, using advanced data mining and statistical modeling techniques to build solutions that optimize the quality and performance of data.
  • Demonstrated experience in the design and implementation of statistical models, predictive and descriptive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
  • Analyzed large data sets, applied machine learning techniques, and developed predictive and statistical models, enhancing them by leveraging best-in-class modeling techniques.
  • Created SQL queries using complex joins, group by, having, sub queries, functions, and stored procedures using temporary tables, cursors, table type variables.
  • Wrote complex stored procedure and SQL queries. Created reports in Tableau on geo-based parameters.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of database.
  • Utilized Matplotlib and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc.
  • Coded new tables, views, and modifications as well as PL/pgSQL stored procedures, data types, triggers, and constraints.
  • Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
Environment: Windows, Linux, Big Data Hadoop, MapReduce, Hive, Pig, Python, Scala, PostgreSQL, Tableau, SQL, PL/SQL, OLAP, OLTP, Spark, HBase, and Kafka.
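A K-means segmentation of the kind used across these analyses might look like the following minimal sketch; the synthetic "customer" features are purely illustrative, and the cluster count would in practice be chosen by inspection (e.g. an elbow plot) rather than fixed up front.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Two synthetic, well-separated groups standing in for customer segments
rng = np.random.default_rng(1)
features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 3)),  # hypothetical segment A
    rng.normal(loc=5.0, scale=1.0, size=(100, 3)),  # hypothetical segment B
])

# Scale first so no single feature dominates the Euclidean distances
scaled = StandardScaler().fit_transform(features)

# Fit K-means and recover a segment label per customer row
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
labels = km.labels_
```

Each row's label assigns it to the nearest centroid; points unusually far from every centroid can also be flagged as outliers, as described in the earlier role.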

Confidential

Data Scientist

Responsibilities:

  • Used Star Schema methodologies in building and designing the logical data model into Dimensional Models extensively.
  • Developed Star and Snowflake schema-based dimensional models to develop the data warehouse.
  • Designed Context Flow Diagrams, Structure Charts, and ER-diagrams.
  • Worked on database features and objects such as partitioning, change data capture, indexes, views, and indexed views to develop an optimal physical data model.
  • Tested complex ETL mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Worked with SQL Server Integration Services in extracting data from several source systems, transforming the data, and loading it into the ODS.
  • Worked with SMEs and other stakeholders to determine the requirements to identify entities and attributes to build Conceptual, Logical, and Physical data models.
  • Worked with SQL, SQL*Plus, Oracle PL/SQL stored procedures, triggers, SQL queries, and loading data into the Data Warehouse/Data Marts.
  • Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy SQL Server database systems.
  • Created Logical and Physical data models with Star and Snowflake schema techniques using Erwin in the Data Warehouse as well as in the Data Mart.
  • Performed data analysis and data profiling using complex SQL on various source systems including Oracle.
  • Involved in Data Analysis, Data Validation, Data Cleansing, Data Verification, and identifying data mismatches.
  • Designed data models and analyzed data for online transactional processing (OLTP) and Online Analytical Processing (OLAP) systems.
  • Wrote and executed customized SQL code for ad-hoc reporting duties and other routine tasks.
  • Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
  • Customized reports using PROC REPORT, PROC TABULATE and PROC.
Environment: Windows, Linux, Python, Tableau, Erwin, SQL Server 2005, PL/SQL, SQL, ETL, OLAP, OLTP, Oracle, DQ Analyzer.
