Sr. Data Scientist/ Data Engineer Resume

San Antonio, TX


  • Around 13 years of IT industry experience, and more than 9 years of experience in Data Science, Data Analysis, Machine Learning, Deep Learning, Data mining with large data sets of Structured and Unstructured Data, Data Acquisition, Data Validation, Predictive modeling, Data Visualization.
  • Leverage a wide range of data analysis, machine learning, and statistical modeling algorithms and methods to solve business problems.
  • Have expertise in Insurance, Banking, and Healthcare domains.
  • Experience working in Agile Scrum Software Development.
  • Good experience in Python, R programming on different platforms of Data Science and Machine Learning.
  • Experience in implementing Naive Bayes and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Knowledge in Cloud services such as Microsoft Azure and Amazon AWS.
  • Experience in transforming natural language data into useful features using NLP techniques to feed classification algorithms.
  • Good knowledge in Azure cloud services, Azure Storage to manage and configure the data.
  • Experience in Big Data Tools like Apache Spark, MapReduce, Hadoop, and HBase.
  • Good working experience implementing dynamic programming for reinforcement learning in machine learning.
  • Hands-on experience in Regression, Classification, Clustering, and Segmentation use cases.
  • Professional working experience in Machine learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering, and Association Rules.
  • Experience with multiple ML and Deep Learning frameworks: TensorFlow, Keras, PyTorch.
  • Deep expertise with Statistical Analysis, Data mining, and Machine Learning Skills using Python and SQL.
  • Experience in Machine learning using NLP text classification using Python.
  • Proficient in managing the entire Data Science project life cycle and actively involved in all the phases of the project.
  • Experience in using various packages in Python and R like ggplot2, caret, dplyr, gmodels, tm, C50, twitter, NLP, Reshape2, rjson, plyr, pandas, NumPy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, Rpy2
  • Extensive experience in Text Analytics, generating data visualizations using Python and R, and creating dashboards using tools like Tableau.
  • Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, Kafka, and Storm.
  • Expertise in writing Hadoop Jobs to analyze data using MapReduce, Hive, Pig and Solr.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice-versa.
  • In-depth understanding of Hadoop Architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, MRv1, and MRv2 (YARN).
  • Experience in analyzing data using Hive QL, Pig Latin, and custom MapReduce programs in Java.
  • Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Strong understanding of deep learning algorithms such as CNN, LSTM, CTC, and workflows, in particular working with large-scale images and documents.
  • Strong working knowledge in all the phases of Project Life Cycle including Data Acquisition (sampling methods: SRS/ stratified/ cluster/ systematic/ multistage), Power Analysis, A/B testing, Hypothesis Testing, Data Cleaning, Data Imputation (outlier detection via chi square detection, residual analysis, PCA analysis, multivariate outlier detection), Data Transformation, Features Scaling, Features Engineering, Statistical Modeling.
  • Extensive experience with AWS services like S3, ELB, EBS, Auto-Scaling, Route53, Storefront, IAM, CloudWatch, RDS, etc.
  • Strong experience using the Microsoft Azure BI Stack (ADFv2, Azure SQL DB, Azure SQL Data Warehouse, Azure Data Lake, Azure Databricks).
  • Expertise in linear and nonlinear models (logistic, linear, Naïve Bayes, Decision Trees, Random Forest, Neural Networks, SVM, Clustering, KNN), dimensionality reduction using Principal Component Analysis (PCA) and Factor Analysis, testing and validation using ROC plots, K-fold cross validation, statistical significance testing, and data visualization.
  • Extensively worked on Spark Streaming application integration with Kafka.
  • Proficient in statistical programming languages like R and Python including Big Data technologies like Hadoop, Spark, and Hive.
  • Experience working with Big Data toolkits like Spark ML.
  • Experienced in Data Modeling techniques employing Data warehousing modeling concepts like star/snowflake schema and Extended Star.
  • Designed web crawlers using Beautiful soup.
  • Strong skills in Machine Learning algorithms such as Naive Bayes, Support Vector Machine, K-Nearest-Neighbours, Neural networks, Ensemble Methods.
  • Knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export with multiple ETL tools such as Informatica Power Center.
  • Hands-on experience with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Expertise in Natural Language Processing (NLP) for response modeling and fraud detection.
  • Strong working experience in supporting client by developing Machine Learning Algorithms using Python programming and big data using PySpark to analyze transaction fraud, Cluster Analysis etc.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib, to create visually powerful and actionable interactive reports and dashboards. Excellent Tableau Developer, expertise in building, publishing customized interactive reports and dashboards with customized parameters and user-filters using Tableau.
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
  • Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2 and SQL Server databases.
  • Hands-on experience in Data Governance, Data Mining, Data Analysis, Data Validation, Predictive modeling, Data Lineage, and Data Visualization in all the phases of the Data Science Life Cycle.
  • Used version control tools like Git.
  • Experience working with large, complex datasets that include structured, semi-structured, and unstructured data to discover meaningful business insights.
  • Strong Data Analysis skills using business intelligence, SQL, and MS Office Tools.


Operating Systems: Windows, Linux

Data Integration Tools: Informatica PowerCenter

Databases: DB2, Oracle, SQL Server, Cassandra, MongoDB

Machine learning algorithms: Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering, and Association Rules.

Languages: Python, SQL, COBOL, Pyspark

Web Technologies: HTML

Cloud Technologies: Amazon Web Services, MS Azure

Amazon Web Services: S3, ELB, EBS, Auto-Scaling, Route53, Storefront, IAM, CloudWatch, RDS

MS Azure Services: ADFv2, Azure SQL DB, Azure SQL Data Warehouse, Azure Data Lake, Azure Databricks

Libraries (Python/ R): ggplot2, caret, dplyr, gmodels, tm, C50, twitter, NLP, Reshape2, rjson, plyr, pandas, NumPy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, Rpy2

Big Data/ Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Kafka, Flume, Spark, Oozie, Zookeeper

BI & Visualization Tools: Tableau, Excel

IBM: SPSS, Microsoft Stats & Solver, Palisade software Tools

Modeling Tools: Erwin

Version Control Tools: SVN, GIT

DWH Modeling: Star Schema, snowflake schema

Collaboration Tools: Confluence, Jira

Project Execution Models: SDLC, Waterfall, Agile (Scrum)

Service Management Tools: ServiceNow


Confidential, San Antonio, TX

Sr. Data Scientist/ Data Engineer


  • Liaised with product owners, journey experts, business analysts, and users to finalize requirements in the form of epics and user stories and managed expectations.
  • Participated in all phases of data mining - data collection, data cleaning, data manipulation, developing models, validation, visualization and performed gap analysis.
  • Implemented solutions using Statistical Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining & Natural Language Processing (NLP), Business Intelligence (BI), Analytics Models like Decision Trees, Linear & Logistic Regression, R, Python, Spark, MS Excel, and SQL.
  • Developed unsupervised machine learning models in the Hadoop/Hive environment on AWS EC2 instance.
  • Used Python to run experiments and campaigns on the PGC tool with A/B testing, and collected data from varied data sources.
  • Developed MapReduce/Spark Python modules for predictive analytics & machine learning in Hadoop/Big Data.
  • Used predictive analytics and machine learning algorithms to forecast key metrics, presented as designed dashboards on murals and the Adobe Target/Analytics platforms.
  • Extracted patterns in the structured and unstructured data set and displayed them with interactive charts using ggplot2 and ggiraph packages in R.
  • Worked on deployment, maintenance, and troubleshooting of applications on Microsoft Azure cloud infrastructure; also automated, configured, and deployed instances in Azure environments and data centres.
  • Developed PySpark code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Performed Data Cleaning, features scaling, features engineering using pandas and NumPy packages in Python.
  • Responsible for developing a data pipeline to extract data from AWS S3 and store it in HDFS, and for deploying all implemented machine learning models.
  • Involved in Deep learning and Artificial Neural Networks for NLP such as Recursive Neural Networks and Recurrent Neural Networks.
  • Documented logical, physical, relational and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
  • Implemented predictive analytics and machine learning algorithms to forecast key metrics, presented as designed dashboards on AWS (S3/EC2) and the Python-Django platform.
  • Extracted data from Azure SQL data warehouse in MS Azure Machine Learning designer.
  • Participated effectively in both sprint planning and sprint retrospectives. Extracted data from several homogeneous and heterogeneous sources and standardized it using the ETL framework.
  • Performed sentiment analysis on customer email feedback and reviews using a Natural Language Processing (NLP) model built on Long Short-Term Memory (LSTM) cells in Recurrent Neural Networks (RNN) to determine the emotional tone behind the words and capture the attitudes and emotions expressed.
  • Developed and implemented predictive models using machine-learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, and KNN.
  • Executed complex SQL queries to support ad-hoc requirements.
  • Trained algorithms and used them to test word distribution and correlation values by passing the data through the NLP (speech recognition and conversion) algorithm to detect possible fraudulent activity.
  • Collected, organized and analyzed customer data with attention to detail using SQL queries and Python.
  • Detected source data anomalies using SQL queries and improved the overall data load operations by 23%.
  • Worked with AWS Lambda for serverless data processing, along with other services such as AWS Glue and AWS Kinesis.
  • Performed exploratory descriptive analysis of the customer data using Python.
  • Used Python to develop supervised and unsupervised machine learning models such as linear regression, multivariate regression, Naive Bayes, Random Forests, and K-means.
  • Leveraged imputation techniques to fill in missing data and removed outliers.
  • Analyzed data and performed data preparation by applying a historical model to the data set in Azure ML.
  • Built a logistic regression model to predict the probability of a customer opening marketing emails.
  • Built other models such as XGBoost, Random Forest, and Naïve Bayes to predict the probability.
  • Eliminated the predictor variables using the stepwise regression process in feature selection.
  • Developed various Tableau data models by extracting and using data from various source files, DB2, Excel, flat files, and big data.
  • Worked with the ETL/ Informatica to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.
  • Evaluated the models using metrics such as accuracy, KS statistics, capture rate, and ROC curve.
  • Reduced the marketing cost from $15 million to $11 million.
  • Maximized marketing profitability from 13% to 23%.
  • Developed metrics and created dashboards to present actionable insights and trends against KPI using Advanced Excel and Tableau.
  • Developed data solutions to support strategic initiatives, improve internal processes, and assist with strategic decision-making, and designed SWOT analyses.

Environment: R, Python, AWS, Azure, NLP, Machine Learning, Deep Learning, Hadoop, Spark, Big Data, SQL, S3, EC2, MapReduce, ggplot2, ggiraph, NumPy, SciPy, Pandas, Matplotlib, Django, Flask, Tableau, K-Means, RNN, KNN, OLTP, AWS Glue, AWS Kinesis, Azure ML, ETL, Informatica, HDFS, Hive, Sqoop, HBase, Kafka, Star Schema, Snowflake Schema.

Confidential, San Francisco, CA

Data Scientist/ Data Engineer


  • Functioned as a data analyst responsible for delivering innovative solutions that provide the business with essential information and analytics to make business decisions that enhance.
  • Liaised with business analysts and customers to gather requirements and managed expectations.
  • Extracted the data from different sources using SQL queries.
  • Performed descriptive statistics and visualization techniques to understand the data and discovered initial insights about the data.
  • Performed exploratory descriptive analysis using Python.
  • Responsible for designing and implementing End to End data pipeline using Big Data tools including HDFS, Hive, Sqoop, HBase, Kafka & Spark.
  • Leveraged imputation techniques to fill in missing data and removed outliers.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on Amazon Web Services (AWS).
  • Implemented a Python-based distributed random forest via Python streaming.
  • Worked on various machine learning algorithms and statistical modeling like decision trees, text analytics, natural language processing (NLP), supervised and unsupervised, regression models, neural networks, deep learning.
  • Used packages like dplyr, tidyr & ggplot2 in RStudio for data visualization and generated scatter plots and high-low graphs to identify relationships between different variables.
  • Set up storage and data analysis tools in the Amazon Web Services (AWS) cloud computing infrastructure.
  • Used Python to build machine learning algorithms with the scikit-learn, SciPy, NumPy, and Pandas modules to analyze terabytes of data.
  • Professional working experience in Machine learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, Support vector machines, K-Nearest Neighbors, K-Means Clustering, Ensemble methods, Neural Networks, and NLP text classification using Python.
  • Created data flow pipelines in Azure ML for automation.
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Visualized results in Python using the Matplotlib and Seaborn libraries, and used Tableau to create interactive dashboards to present results.
  • Involved in designing star schema and snowflake schema for Data Warehouse and ODS architecture.
  • Eliminated the predictor variables using the stepwise regression process in feature selection.
  • Used the analytics tools to provide insights and recommendations to help business benefit for the members and financials.
  • Implemented the end-to-end process, exporting the final outcome to Azure Blob Storage.
  • Interpreted data, analyzed results using statistical techniques, and provided ongoing reports monthly.
  • Implemented the project modules with an agile framework.
  • Performed exploratory data analysis that includes univariate and multivariate analysis to identify patterns and associations among the variables.
  • Increased customer satisfaction by 10% by reducing the time to process the settlement of claims.
  • Experimented and built predictive models including ensemble models using machine learning algorithms such as Logistic regression, Decision trees, Random Forests, and KNN to predict fraud claims and implemented using Python.
  • Designed rich data visualizations to model data into human-readable form through Tableau and Matplotlib module from Python.

Environment: Windows, Linux, Python - NumPy, SciPy, Matplotlib, Scikit-learn, Beautiful Soup, R - ggplot2, caret, dplyr, gmodels, tm, C50, twitter, deep learning - CNN, LSTM, CTC, AWS - S3, ELB, EBS, Auto-Scaling, Route53, Storefront, IAM, CloudWatch, RDS, Microsoft Azure - ADFv2, Azure SQL DB, Azure SQL Data Warehouse, Azure Data Lake, Azure Databricks, Apache Spark, MapReduce, Hadoop, HBase, RStudio, Informatica, NLP, HDFS, Hive, Sqoop, Kafka.

Confidential, Chicago, IL

Data Scientist


  • Worked as a Data Scientist extracting and preparing data according to business requirements.
  • Implemented complete data science project involving data acquisition, data wrangling, exploratory data analysis, model development and model evaluation.
  • Worked on data discovery, handling structured and unstructured data, cleaning and performing descriptive analysis, and preparing data sets.
  • Worked with data mining tools such as R and Python depending on job requirements.
  • Created dynamic linear models to perform trend analysis on customer transactional data in Python.
  • Used K-Means cluster analysis to identify the opportunity for upgrade and lower the churn rate.
  • Performed K-means clustering, Multivariate analysis and Support Vector Machines in Python.
  • Created various types of data visualizations using Python libraries and Tableau.
  • Worked on different data formats such as JSON, XML and performed machine-learning algorithms in Python.
  • Performed Statistical analysis and leveraged appropriate Data Visualization techniques to deliver meaningful insights of the data.
  • Used Python, MATLAB to develop a variety of models and algorithms for analytic purposes.
  • Performed data profiling to learn about behavior across features such as data pattern, location, date, and time.
  • Developed NLP methods that ingest large unstructured data sets, separate signal from noise, and provide personalized insights at the patient level that directly improve our analytics platform.
  • Applied Logistic Regression, Random Forest, Decision Tree, and SVM models to classify whether a package will be delivered on time on a new route.
  • Used SAS for pre-processing data, SQL queries, data analysis, generating reports, graphics, and statistical analyses.
  • Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
  • Designed & developed scalable ML pipeline using Spark and HDFS on AWS EMR.
  • Developed MapReduce pipeline for feature extraction using Hive.
  • Created quality scripts using SQL and Hive to validate successful data load and quality of the data.
  • Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions in SQL Server Management Studio.
  • Actively involved in Analysis, Development and Unit testing of the data and delivery assurance of the user story in Agile Environment.

Environment: Machine Learning, Deep Learning, Python, Spark, K-Learn, R, Tableau, MATLAB, SAS, JSON, XML, SQL, Hive, Agile, NLP, HDFS, Hadoop, Windows.

Confidential, Chicago, IL

Data Analyst


  • Executed design, development, and maintenance of ongoing metrics and prepared reports & dashboards (data visualization) to drive key business decisions and communicate key concepts to higher management.
  • Collaborated with project managers and business owners to understand their organizational processes and help design the necessary reports.
  • Developed various machine learning models such as Linear Regression, Logistic Regression, KNN, ensemble methods such as bagging and boosting algorithms, and deep learning neural networks with Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, Theano, TensorFlow, and Keras in Python.
  • Analyzed data and automated the root-cause analysis of critical business problems, thereby increasing productivity by 20%.
  • Collaborated with business and technology teams to build insightful analytical solutions.
  • Investigated data discrepancies, resolved, and articulated them to support the team’s ability to do effective data analysis.
  • Provided statistical analysis and recommendations through deep-insights based on membership data.
  • Presented findings and proposed solutions to customers, thereby increasing customer satisfaction by 15%.
  • Identified and resolved data processing issues and put countermeasures in place to improve the quality and validity of data.
  • Performed effective data integration (using ETL) across different data sources to feed data into marts.


Data Integration Analyst


  • Extensively worked on ETL design & development, Unit Testing, System Testing, and User Acceptance Testing.
  • Performed effective integration across different data sources to feed data into marts.
  • Generated reports with metrics from data marts that help business in decision-making.
  • Analyzed the problem specifications and identified the areas of enhancements using impact analysis.
  • Liaised with business and functional experts and supported ad-hoc business queries.
  • Performed unit testing, system testing, system integration testing, and user acceptance testing.
  • Performed peer reviews and debugged the technical specifications.
  • Improved the performance of existing and new data loads and report refreshes, ultimately saving around a quarter of a million pounds across 12 data warehouses/data marts.
  • Increased application performance by 35% by partitioning sessions and database tables.
  • Improved response to ad-hoc business user requests by 49% with the help of a knowledge base.
  • Increased business users’ attention by 45% in solving requests raised with complete root cause analysis.
  • Automated business processes, resulting in a 10% improvement in response time, and implemented lean methodology effectively, resulting in a 44% increase in team productivity.

Environment: Windows, ETL, Informatica, Tableau, Linux, UAT, Report.


Module Leader


  • Analyzed and prepared the technical specifications based on the existing functionality and requirements.
  • Designed and developed the business specifications with added flexibility and deployed with zero defects.
  • Increased customer satisfaction by 20%.
  • Reduced expenses by $1.1 billion yearly.
