
Data Scientist Resume

Dallas, TX

SUMMARY:

  • Machine Learning and Big Data professional with over four years of experience in all phases of diverse technology projects, combining strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions. Specializes in Data Science & Analytics, Data Quality, Azure Machine Learning, Tableau, and RPA (UiPath certified).
  • Team builder with excellent communication, time and resource management, and continuous client relationship development skills. Strong interest in Computer Vision and the latest neural network architectures, such as Mask R-CNN and YOLOv3, using TensorFlow.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Extensive experience across the data science workflow, including collecting data, cleaning data, performing exploratory data analysis, developing predictive models with machine learning algorithms, and creating visualizations to support decision making.
  • Developed predictive models using Neural Networks, NLP, Random Forest, XGBoost, SVM, and Naive Bayes.
  • Built intelligent decision models using decision trees, segmentation, regression, and clustering to analyze customer response behaviors, interaction patterns, and propensity.
  • Detected patterns with unsupervised learning techniques such as K-Means clustering. Worked with several R packages, including knitr, dplyr, SparkR, and CausalInfer.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Gathered all required data from multiple data sources and created the datasets used in analysis.
  • Served on an advanced analytics team with responsibilities such as assisting clients with designing and implementing data analytics strategies for large claims and clinical data sets using predictive modeling, data mining, econometrics, and statistical methods. Expert at drawing insights out of complex, often unstructured data.
  • Assessed information from a range of data stored in disparate systems, integrating and mining the data to answer specific business questions and to identify unknown trends and relationships.
  • Interpreted data, analyzed results using statistical techniques, and provided ongoing reports.
  • Developed category segmentation using SQL, providing a customizable view of market share and decreasing labor cost by 50%. Worked with Data Governance, Data Quality, Data Lineage, and Data Architecture teams to design various models and processes.
  • Facilitated the design and implementation of the Azure chatbot framework.
  • Proficient in the R and Python scripting languages and their libraries, particularly TensorFlow and PyTorch, as well as data extraction, data cleaning, data loading, data transformation, predictive modeling, and data visualization using R.

DATA SCIENCE EXPERTISE:

Data and Quantitative Analysis * Decision Analytics * Predictive Modeling * Data-Driven Personalization * Big Data Queries and Interpretation * Data Mining and Visualization Tools * Convolutional Neural Network * Recurrent Neural Network * TensorFlow * PyTorch * Machine Learning Algorithms * Research, Reports and Forecasts * Computer Vision * Deep Learning.

TECHNICAL SKILLS:

Programming: Python, R, SAS, C++, Java, PL/SQL, Git, Linux.

Libraries: NumPy, pandas, D3.js, jQuery, SciPy, Matplotlib, scikit-learn, TensorFlow, Keras, PyTorch, NLTK, OpenCV.

Tools: Tableau, Oracle, SSIS, SSMS, Azure Machine Learning, Power BI, Jupyter Notebooks, Excel, Talend.

Big Data platforms: Cloudera Hadoop, MapReduce, Scala, Kafka, Spark, AWS EC2, Hive, NoSQL, PostgreSQL, MySQL, Microsoft SQL Server, Neo4j, TigerGraph, Cassandra, SolrCloud, HBase.

WORK EXPERIENCE:

Confidential, Dallas, TX

Data Scientist

Responsibilities:

  • Built models for clustering, classification, sentiment analysis, and time series; validated and back-tested them and interpreted confusion matrices and ROC curves.
  • Employed Natural Language Processing techniques such as Information Retrieval (IR) and Named Entity Recognition (NER) to extract Non-Public Information such as SSN, TIN, and customer name from unstructured text fields, improving the overall efficiency of the data pipeline by 200%.
  • Built an XGBoost model to predict potential customer dissatisfaction and delivered it in full production. Created testable hypotheses and tested them to arrive at new models and strategies for business problems, and implemented A/B testing to measure the performance of the new models and strategies after implementation. Performed ongoing model/segmentation validation tests assessing the strength, stability, and accuracy of the models.
  • Completed a tariff code multi-class classification model that takes a description of goods and outputs a code label. Conducted data preprocessing (addressed class imbalance, imputed missing values, etc.) and trained the model with AWS Sockeye's convolutional neural network encoder-decoder architecture.
  • Compared and contrasted a variety of deep learning architectures designed for Intent Recognition. Developed an approach to adapt spaCy's default named entity recognition system to accommodate banking entities.
  • Developed intricate algorithms based on deep-dive statistical analysis and predictive data modeling that were used to reduce risk, limit exposure and improve profitability.
  • Analyzed and processed complex data sets using advanced querying in Hive and Teradata, visualization in Tableau, and analytics tools such as Python and SAS.
  • Performed spending and competitor analysis on transaction data using text mining to effectively engage customers in the offer generation process.
  • Created NLP chatbot models using classic techniques such as rule-based models (regex) and BOW (TF-IDF, n-grams, LDA); state-of-the-art techniques such as word2vec, sent2vec, doc2vec, and GloVe word embeddings; and CNN- and RNN-based (LSTM and GRU) architectures for user intent classification, named entity recognition, etc. (a TF-IDF intent-classification sketch follows this list).
  • Developed Spark programs using the Python API (PySpark) and Spark SQL to import data from S3 into Spark DataFrames and perform transformations and actions on data in various file formats, including JSON, Parquet, and CSV (see the PySpark sketch after this list).
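Below is a minimal sketch of the BOW (TF-IDF) intent-classification baseline mentioned above, using scikit-learn; the banking utterances, intent labels, and classifier choice are invented for illustration, not the production chatbot:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical banking utterances and intent labels
    texts = ["check my balance", "block my card",
             "what is my balance", "report a lost card"]
    intents = ["balance", "card", "balance", "card"]

    # TF-IDF unigrams/bigrams feeding a simple linear classifier
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(texts, intents)
    print(clf.predict(["show my balance please"]))  # -> ['balance']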
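And a hedged PySpark sketch of the S3 ingestion pattern from the last bullet; the bucket, paths, and column names are placeholders, not the actual pipeline:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-ingest").getOrCreate()

    # Read the three file formats from S3 into DataFrames (paths are hypothetical)
    events = spark.read.json("s3a://example-bucket/events/")
    orders = spark.read.parquet("s3a://example-bucket/orders/")
    rates = spark.read.csv("s3a://example-bucket/rates/", header=True, inferSchema=True)

    # Transformation via Spark SQL, then an action that writes the result back out
    orders.createOrReplaceTempView("orders")
    daily = spark.sql("SELECT order_date, SUM(amount) AS daily_amount "
                      "FROM orders GROUP BY order_date")
    daily.write.mode("overwrite").parquet("s3a://example-bucket/output/daily/")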

Confidential, Jersey, NJ

Data Scientist

Responsibilities:

  • Involved in the design, modeling, validation, and testing of multiple Machine Learning models against various data sets, including behavioral data, and deployed the models in the backend.
  • Worked closely with the marketing team to deliver actionable insights from huge volumes of data coming from different marketing campaigns and customer interaction metrics such as web portal usage, email campaign responses, public site interaction, and other customer-specific parameters.
  • Implemented a Mask R-CNN instance segmentation model to gauge high-level sentiment in economic trends (such as identifying a retail firm's actual traffic levels), as well as human emotion.
  • Implemented a deblurring algorithm for single images using a Deep Multi-scale Convolutional Neural Network, and created automated tooling for training the CNN-based object detectors and classifiers used for all production CNNs.
  • Created a web-based R Shiny dashboard to visualize and evaluate sensor performance, allowing the client to detect outliers in various performance indicators and accurately identify particular sensors with anomalous readings.
  • Integrated Machine Learning (regression) models to predict capacity from historical data, which made capacity planning easy for the infrastructure teams.
  • Created sequence models (Bayesian Hidden Markov Models using Distributed TensorFlow / TensorFlow Probability, and LSTM models using Keras) to predict stock prices (a Keras LSTM sketch follows this list).
  • Performed sentiment analysis on client companies for loan approvals using NLP, embedding layers, LSTMs, and Recurrent Neural Networks.
  • Strong experience in the Software Development Life Cycle (SDLC) and SAS, and in supervising internationally distributed teams of domain-specific experts to meet product specifications and benchmarks within the given deadlines.
  • Used R (dplyr) to perform root cause analysis to identify process breakdowns within departments, and communicated solutions for breakdowns in client investments through various data visualizations (R Shiny).
  • Worked on data infrastructure to perform ETL and requirements analysis, building data sets for operational analysis.
  • Used Hadoop, MapReduce, and HQL to access big data sources in the cloud for the purpose of cleaning, organizing, validating, and transforming raw data in preparation for data mining/ exploratory analysis.
  • Trained our proprietary NLP software to extract sentiment from text sources under different models.
  • Created tools in Python and R to perform web scraping, Data Mining, Sentiment Analysis, Machine Learning, Document Comparison, Data Compression, and other NLP analyses.
  • Tested and developed Convolutional Neural Networks for document segmentation of non-imagery data.
  • Developed NLP models that can classify customer survey responses. Models that were developed were 85% accurate in classifying 160 categories (33% more accurate than previous efforts by other teams).
  • Implemented an AI trading strategy (Machine Learning factor tracking, Neural Network security selection) that supports data mining of investor behavior and markets.
  • Led the development of a deep learning recurrent neural network (RNN) / long short-term memory (LSTM) time series trading algorithm.
  • Used Python pandas for sentiment signal analysis. Developed personalized product recommendations with machine learning algorithms, including Collaborative Filtering and Gradient Boosted Trees, to better meet the needs of existing customers and acquire new ones.
  • Demonstrated and created NLP models and solutions for all kinds of text data problems (machine translation, sentiment, document classification, NER, word2vec, LDA, etc.).
  • Automated CSV to chatbot-friendly JSON transformation by writing NLP scripts, reducing development time by 20%.
  • Predicted stock price severity using XGBoost and demonstrated insight into better ways to predict the client company's stock prices.
  • Modeled a supervised algorithm (XGBoost) after applying Principal Component Analysis (PCA) to the anonymized data (see the PCA-plus-XGBoost sketch after this list).
  • Utilized a trained Decision Tree to sort each customer into one of five potential buying groups, helping the business maximize profits.
  • Tested the data with multiple machine learning algorithms before settling on the Decision Tree (a Random Forest had the same accuracy at higher cost).
  • Evaluated and compared the predictive accuracy of logistic regression, decision trees, random forests, SVMs, principal component regression, and both stochastic and extreme gradient boosting models.
  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC).
  • Performed data ETL by collecting, exporting, merging, and massaging data from multiple sources and platforms, including SSIS (SQL Server Integration Services) in SQL Server. Designed different types of reports, such as drill-down, drill-through, sub-reports, parameterized reports, and cascading reports, in SSRS and Power BI.
  • Created R Shiny web apps leveraging data from various sources to deliver statistical insights, with reactive Shiny input widgets, slider inputs, and interactive data visualizations using rbokeh, lattice, trelliscope, etc.
  • Used Python to perform an ANOVA test to analyze the differences among hotel clusters (a SciPy ANOVA sketch follows this list).
  • Documented the entire workflow and processes involved in the Financial Reporting Power BI solution.
  • Applied various machine learning algorithms and statistical models, such as Decision Tree, Naive Bayes, Logistic Regression, and Linear Regression, using Python to determine the accuracy of each model.
  • Worked on the UCI breast cancer data set to predict whether a sample is malignant or benign using TensorFlow (graphs and eager execution) and a CNN, achieving an accuracy of 90%.
  • Participated in the Humpback Whale Identification Kaggle competition, achieved an accuracy of 77.83%, and improved on the ResNeXt architecture using TensorFlow.
  • Worked with cross-functional teams (including the data engineering team) to extract data rapidly from MongoDB through the MongoDB Connector for Hadoop.
  • Performed data cleaning and feature selection using the MLlib package in PySpark.
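A minimal Keras sketch of the LSTM sequence model mentioned in the bullets above for next-step price prediction; the window size, layer width, and synthetic data are illustrative assumptions, not the production setup:

    import numpy as np
    from tensorflow import keras

    window = 30
    X = np.random.rand(100, window, 1).astype("float32")  # 100 sliding windows of 30 prices
    y = np.random.rand(100, 1).astype("float32")          # next-step price for each window

    model = keras.Sequential([
        keras.layers.LSTM(32, input_shape=(window, 1)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=2, batch_size=16, verbose=0)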
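Similarly, the PCA-then-XGBoost pipeline can be sketched on synthetic data; the component count and estimator settings here are assumptions:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from xgboost import XGBClassifier

    # Synthetic stand-in for the anonymized data set
    X, y = make_classification(n_samples=500, n_features=40, random_state=0)

    # Reduce dimensionality with PCA, then fit a supervised XGBoost classifier
    model = make_pipeline(PCA(n_components=10), XGBClassifier(n_estimators=100))
    model.fit(X, y)
    print(model.score(X, y))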
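And a minimal SciPy version of the ANOVA step; the per-cluster samples are invented:

    from scipy import stats

    # Hypothetical average daily rates for three hotel clusters
    cluster_a = [210.0, 195.5, 230.1, 221.4]
    cluster_b = [180.2, 175.9, 190.4, 171.3]
    cluster_c = [250.7, 240.3, 260.8, 255.1]

    # One-way ANOVA: do the cluster means differ significantly?
    f_stat, p_value = stats.f_oneway(cluster_a, cluster_b, cluster_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")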

Environment: R, Machine Learning, Teradata 14, Hadoop MapReduce, PySpark, Spark, Spark MLlib, Tableau, Informatica, SQL, Excel, Erwin, SAS, Scala NLP, Cassandra, Oracle, MongoDB, Cognos, SQL Server 2012, DB2, T-SQL, PL/SQL, Flat Files, and XML.

Confidential

Data Scientist

Responsibilities:

  • Involved in the application of statistical prediction modeling, data models, machine learning classification techniques, and/or econometric forecasting techniques.
  • Cleaned and organized big data sources using R.
  • Executed entire Data Science Life Cycle and actively involved in all the phases of project life cycle including data acquisition, data cleaning, data engineering, feature scaling, feature engineering and statistical modeling.
  • Applied statistical techniques, such as statistical sampling, hypothesis testing, regression, etc. to a variety of real-world business problems for clients using R and Python (depending on the client/project).
  • Implemented diagnostic reporting with R for benchmarking performance across models and final model selection.
  • Collected past revenue data and conducted time series analysis using an ARIMA model in R (a Python statsmodels analogue follows this list).
  • Created multiple workbooks, dashboards, and charts using calculated fields, quick table calculations, Custom hierarchies, sets & parameters to meet business needs using Tableau.
  • Delivered Interactive Visualizations/dashboards using ggplot and Tableau to present analysis outcomes in terms of patterns, anomalies and predictions.
  • Used SparkR data frames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib libraries.
  • Performed grid search and cross-validation over different kernel, degree, and gamma values for the SVM model.
  • Used eight-fold cross-validation to choose the parameters of the SVM model, and implemented a cross-validation approach to tune the parameters of the KNN and SVM classifiers (see the grid-search sketch after this list).
  • Created MDM, OLAP data architecture, analytical data marts, and cubes optimized for reporting.
  • Developed several advanced MapReduce programs in Java as part of functional requirements for Big Data.
  • Worked with different sources such as Oracle, Teradata, SQL Server 2012, Excel, flat files, complex flat files, Cassandra, MongoDB, HBase, and COBOL files.
  • Performed K-means clustering, multivariate analysis, and Support Vector Machines in R.
  • Used Python, R, SQL to create Statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random forest models, Decision trees, Support Vector Machine for estimating the risks of welfare dependency.
  • Led teams performing Data Mining, Data Modeling, Data/Business Analytics, Data Visualization, Data Governance & Operations, and Business Intelligence (BI) Analysis, and communicated insights and results to the stakeholders.
  • Developed a Multivariate Gaussian Anomaly Detection algorithm in Python to identify suspicious patterns in network traffic (a minimal sketch follows this list).
  • Applied decision tree analysis using R to predict whether an email is spam. Implemented the workflow of coding, testing, debugging, and documenting all project activities related to the development and maintenance of the R Shiny interface.
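A Python (statsmodels) analogue of the ARIMA revenue analysis noted above, which was done in R; the revenue series and the ARIMA order are invented:

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Hypothetical monthly revenue series
    revenue = pd.Series([120, 132, 129, 141, 150, 158, 166, 172],
                        index=pd.period_range("2017-01", periods=8, freq="M"))

    fit = ARIMA(revenue, order=(1, 1, 1)).fit()  # ARIMA(p, d, q) chosen for illustration
    print(fit.forecast(steps=3))                 # next three months of revenue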
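The kernel/degree/gamma grid search with eight-fold cross-validation, sketched with scikit-learn on synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=0)

    param_grid = {
        "kernel": ["rbf", "poly"],
        "degree": [2, 3],                # used by the poly kernel
        "gamma": ["scale", 0.1, 1.0],
    }
    search = GridSearchCV(SVC(), param_grid, cv=8)  # eight-fold CV, as above
    search.fit(X, y)
    print(search.best_params_, search.best_score_)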
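And a minimal sketch of the multivariate Gaussian anomaly detector; the features, threshold, and data are invented stand-ins for the network-traffic case:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Fit mean and covariance on (synthetic) normal traffic features
    normal = np.random.randn(1000, 3)
    mu = normal.mean(axis=0)
    cov = np.cov(normal, rowvar=False)
    density = multivariate_normal(mean=mu, cov=cov)

    # Flag points whose density falls below a threshold tuned on validation data
    new_points = np.array([[0.1, -0.2, 0.3], [8.0, 9.0, -7.0]])
    epsilon = 1e-4
    print(density.pdf(new_points) < epsilon)  # [False  True]: second point is anomalous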

Environment: Python, PySpark, Tableau, MongoDB, Hadoop, SQL Server, SDLC, ETL, SSIS, Recommendation systems, Machine Learning Algorithms, text-mining process, A/B test.

Confidential

Data Scientist

Responsibilities:

  • Prepared comprehensive documented observations, analyses and interpretations of results including technical reports, summaries, protocols and quantitative analyses.
  • Contributed to Finance and Risk management, Operations management, and Marketing to maximize ROI using Data Analytics.
  • Analyzed 5 million records using SQL to create reference tables to score incoming claims.
  • Created association rules using R to find common procedure groups accounting for 15% of total claims.
  • Designed, modeled, validated, and tested statistical algorithms against various real-world data sets, including behavioral data, and deployed the models in the backend.
  • Performed data transformation methods for normalizing variables.
  • Applied Business Objects best practices during development with a strong focus on reusability and better performance.
  • Delivered analyses that improved decision making and generated financial reports using the latest analytics tools, such as R, R Shiny, and visualization tools.
  • Coordinated with various business users, stakeholders, and SMEs to obtain functional expertise, review designs and business test scenarios, participate in UAT, and validate financial data.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Created SSIS packages using different data transformations, such as Derived Column, Lookup, Data Conversion, Conditional Split, Pivot, Union All, and Execute SQL Task, to load data into the database (T-SQL).
  • Designed and deployed reports for end-user requests using the web interface and SSRS.
  • Used MLlib, Spark's Machine learning library to build and evaluate different models.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Experienced in Spark Streaming using Scala: consumed streaming data from a real-time queue, created RDDs in Spark using the Spark context, and used Scala APIs to read multiple data formats.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python (see the pandas sketch after this list). Built code for real-time data ingestion using Java, MapR Streams (Kafka), and Storm.
  • Automated Sqoop jobs to extract data from sources such as MySQL and push the result sets to the Hadoop Distributed File System.
  • Developed MapReduce pipeline for feature extraction using Hive.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
  • Communicated the results to the operations team to support the best decisions.
  • Collected data needs and requirements by interacting with the other departments.
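A minimal pandas/NumPy sketch of the cleaning and feature-scaling step mentioned above; the columns and values are hypothetical:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"amount": [10.0, np.nan, 250.0, 40.0],
                       "clicks": [1, 4, np.nan, 2]})

    df = df.fillna(df.median(numeric_only=True))  # impute missing values with the median
    scaled = (df - df.mean()) / df.std()          # z-score feature scaling
    print(scaled)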

Environment: Python 3.x, CDH5, HDFS, SAS, Hadoop, Hive, Impala, Linux, Spark, Tableau Desktop, SQL Server 2012, Microsoft Excel, Matlab, Spark SQL, PySpark.
