Data Analyst Resume
St Louis, MO
SUMMARY
- Over 8 years of experience in building Data Science solutions using Machine Learning, Statistical Modeling, Data Mining, Natural Language Processing (NLP), and Data Visualization.
- Theoretical foundations and practical hands-on projects related to (i) supervised learning (linear and logistic regression, boosted decision trees, random forests, Support Vector Machines, neural networks, NLP), (ii) unsupervised learning (clustering with k-means, DBSCAN, and Expectation Maximization; dimensionality reduction; recommender systems), (iii) probability & statistics (experiment analysis, principal component and factor analysis, confidence intervals, A/B testing), and (iv) algorithms and data structures.
- Experience in building various machine learning models using algorithms such as Gradient Descent and KNN, and ensemble methods such as Random Forest, AdaBoost, and Gradient Boosting Trees.
- Experience in Natural Language Processing (NLP) and Time Series Analysis and Forecasting using the ARIMA model in Python and R (see the ARIMA sketch after this list).
- Expert skills in using state-of-the-art pre-trained models such as ResNet, VGG16, and MobileNet for Transfer Learning (see the transfer-learning sketch after this list), and in designing and building feature detectors, object trackers, image classification, image processing, and computer vision techniques.
- Expert skills in image enhancement using OpenCV techniques and algorithms such as thresholding, Kalman filtering, watershed filtering, image smoothing, and blur detection.
- Experience using Spark and Amazon Machine Learning (AML) to build ML models.
- Expert in producing the best-performing models through training and tuning techniques using the PyTorch, Keras, and TensorFlow frameworks.
- Expert in working within enterprise data warehouse platforms and distributed computing platforms such as Hadoop.
- Demonstrated ability to apply relevant techniques to drive business impact and help with optimization, causal inference, and choice modeling.
- Implement statistical and machine learning models, large-scale, cloud-based data processing pipelines, and off-the-shelf solutions for test and evaluation; interpret data to assess algorithm performance.
- Network with business stakeholders to develop a pipeline of data science projects aligned with business strategies. Translate complex and ambiguous business problems into project charters identifying technical risks and project scope.
- Experienced in data visualization using Tableau and Power BI.
- Extensive hands-on experience in navigating complex relational datasets in both structured and semi-structured formats.
- Experienced in agile/iterative development processes to drive timely and impactful data science deliverables.
- Experience in the software development environment, Agile, and code management/versioning (e.g. Git).
- Design, train, and apply statistical and mathematical models and machine learning techniques to create scalable solutions for predictive learning, forecasting, and optimization.
- Develop high-quality, secure code implementing models and algorithms as application programming interfaces or other service-oriented software implementations.
- Experience working with engineers in designing scalable data science flows and implementing them into production.
- Excellent communication and presentation skills and ability to explain technical concepts in simple terms to business stakeholders.
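A minimal sketch of the ARIMA forecasting workflow mentioned above, using statsmodels in Python; the series values, dates, and (1, 1, 1) order are illustrative placeholders, not figures from an actual engagement:

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Illustrative monthly series; any univariate pandas Series with a
    # DatetimeIndex works the same way.
    sales = pd.Series(
        [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118],
        index=pd.date_range("2020-01-01", periods=12, freq="MS"),
    )

    # Fit an ARIMA(p, d, q) model; the (1, 1, 1) order is a placeholder that
    # would normally be chosen from ACF/PACF plots or information criteria.
    fitted = ARIMA(sales, order=(1, 1, 1)).fit()

    # Forecast the next 3 periods.
    print(fitted.forecast(steps=3))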
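A minimal transfer-learning sketch in Keras along the lines described above, assuming TensorFlow is installed; the input shape and 3-class head are illustrative assumptions:

    from tensorflow.keras.applications import MobileNet
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
    from tensorflow.keras.models import Model

    # Load MobileNet pre-trained on ImageNet, without its classification head.
    base = MobileNet(weights="imagenet", include_top=False,
                     input_shape=(224, 224, 3))
    base.trainable = False  # freeze the pre-trained feature extractor

    # Attach a new head for the downstream task (3 classes here, illustrative).
    x = GlobalAveragePooling2D()(base.output)
    outputs = Dense(3, activation="softmax")(x)
    model = Model(inputs=base.input, outputs=outputs)

    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(...) would then be called on task-specific images and labels.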
TECHNICAL SKILLS
Languages: Python 3, R, JavaScript, C, Bash
Packages: Hadoop, pandas, NumPy, Keras, TensorFlow, TensorBoard, PyTorch, Seaborn, SciPy, TextBlob, Matplotlib, scikit-learn, Scrapy, spaCy, OpenCV, NLTK, Jupyter Notebook
Web Technologies: HTML5, DHTML, XML, CSS3, Web Services
Big Data Technologies: SQL, PySpark, Hive, Sqoop
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Power BI
Operating System: Windows, Linux
IDE: Visual Studio, Eclipse, PyCharm
PROFESSIONAL EXPERIENCE
Confidential, St. Louis, MO
Data Analyst
Responsibilities:
- Implement different ML/DL algorithms to train and test models; the end goal of several of these projects is to drive revenue growth and performance improvement for B2C clients.
- Increase the accuracy of pre-built models with proper optimization.
- Developed NLP models for Topic Extraction and Sentiment Analysis; used the NLTK, spaCy, and Keras libraries for NLP data processing and pattern discovery (see the sentiment-analysis sketch after this list).
- Use big data tools such as Spark (PySpark) and Hadoop for batch and real-time processing; good exposure to the Kafka streaming system.
- Identify key value-creation opportunities, such as customer-facing activities and expanding the client's portfolio of offerings.
- This involved extensive research on company profiles, acquiring domain knowledge, and analyzing existing data to discover different “values”.
- Train client engineers and other employees at organizations with a minimal data-driven culture to collect and record important data.
- Extracted sensor data from IoT sensor devices (distance, acceleration), performed data cleaning on different formats of data for exploratory data analysis, and created ML models to predict product life and other functions.
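A minimal sketch of the sentiment-analysis step referenced above, using NLTK's VADER analyzer; the sample comments are invented for illustration:

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    # Illustrative comments; real input would come from scraped or exported text.
    comments = ["Great product, works perfectly.", "Terrible support, very slow."]
    for text in comments:
        score = sia.polarity_scores(text)["compound"]  # -1 (negative) to +1 (positive)
        label = "positive" if score >= 0 else "negative"
        print(f"{label:8} {score:+.3f}  {text}")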
Confidential, Waukegan, IL
Data Analyst
Responsibilities:
- Designed, developed, and implemented new data processes to extract, clean, integrate, and conform disparate pipeline data sources from across the organization for loading into dimensional data models.
- Developed models to predict metal loss on pipes and pipe rupture probability using different ML/DL algorithms, and communicated with the Integrity team on process improvements to prevent unexpected interruptions in the oil & gas transportation service.
- Analyzed and presented complex statistical relationships between different pipeline features to stakeholders using advanced querying, visualization, and analytics tools.
- Implement Data Exploration to analyze patterns and to select features using Spark SQL and other PySpark libraries.
- Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python and MATLAB.
- Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks such as Caffe, Neon, etc.
- Conducted a hybrid of Hierarchical and K-means Cluster Analysis using IBM SPSS and identified meaningful segments of customers through a discovery approach.
- Creating Informatica mappings, mapplets, and reusable transformations.
- Work with the NLTK library for NLP data processing and pattern discovery; categorized comments from different social networking sites into positive and negative clusters using Sentiment Analysis and Text Analytics.
- Analyze traffic patterns by calculating autocorrelation with different time lags.
- Use Principal Component Analysis in feature engineering to analyze high-dimensional data.
- Perform Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM modeling to classify whether a package will be delivered on time for a new route (see the PCA-plus-classification sketch after this list).
- Implemented different models like Logistic Regression, Random Forest, and Gradient-Boosted Trees to predict whether a given die will pass or fail the test.
- Perform data analysis using Hive to retrieve data from the Hadoop cluster, SQL to retrieve data from the Oracle database, and ETL for data transformation.
- Use MLlib, Spark's machine learning library, to build and evaluate different models.
- Perform data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Develop MapReduce pipelines for feature extraction using Hive.
- Collect data needs and requirements by interacting with other departments.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
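A minimal scikit-learn sketch of the PCA-plus-classification approach described above; the synthetic dataset, component count, and model settings are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic high-dimensional data standing in for the real feature set.
    X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                               n_classes=3, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Scale, reduce dimensionality with PCA, then fit a multinomial classifier.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=10),
                         LogisticRegression(max_iter=1000))
    pipe.fit(X_train, y_train)
    print("test accuracy:", pipe.score(X_test, y_test))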
Confidential, Bridgewater, NJ
Data Analyst
Responsibilities:
- Developed a model to predict lead time for product manufacturing; performed detailed feature engineering and optimization to improve the model's prediction accuracy from 85% to 96%.
- Implement various statistical methods such as resampling, multicollinearity analysis, and dimensionality reduction to improve the model (see the VIF sketch after this section).
- Implemented different machine learning algorithms for classification, regression, and clustering using Python with the scikit-learn, Statsmodels, NumPy, pandas, SciPy, TensorFlow, and Keras libraries.
- Work on data requirements analysis to transform data according to business requirements; worked on data cleaning and reshaping and generated segmented subsets using NumPy and pandas in Python.
- Perform exploratory data analysis (EDA) using seaborn and matplotlib on datasets covering product delivery, new and returning customers, sales, etc., and discuss strategy with managers to improve sales revenue.
- Used NLP techniques to create a predictive model from unstructured data sources such as emails and customer satisfaction forms.
- Adopted common software engineering practices such as GitHub for version control and CI/CD for continuous code integration and deployment.
- Worked with large amounts of structured and unstructured data; generated a cost-benefit analysis to quantify the impact of the model implementation compared with the former situation.
- Designed and Developed Scripts, Data Import/Export, Data Conversions and Data Cleansing.
- Worked on different data formats such as flat files, SQL files, databases, XML schemas, and CSV files.
- Involved in dimensional modeling, identifying the facts and dimensions. Developed SQL scripts for creating tables, Sequences, Triggers, views and materialized views.
- Write Python scripts for file transfer and file manipulation.
- Conduct data analyses of the relationship between customers and clients, achieving 15% more accurate performance predictions than in previous years using PySpark.
Environment: Python, PySpark, HDFS, Hive, spaCy, Big Data, Git, NLP & machine learning algorithms
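A minimal sketch of the multicollinearity analysis mentioned above, using statsmodels' variance inflation factor; the toy DataFrame is constructed purely for illustration:

    import numpy as np
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Toy feature matrix; x3 is nearly a linear combination of x1 and x2,
    # so it should show an inflated VIF.
    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = rng.normal(size=200)
    X = pd.DataFrame({"x1": x1, "x2": x2,
                      "x3": x1 + x2 + rng.normal(scale=0.05, size=200)})

    # A VIF above ~10 is a common rule of thumb for problematic collinearity.
    for i, col in enumerate(X.columns):
        print(col, variance_inflation_factor(X.values, i))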
Confidential
Data Analyst
Responsibilities:
- Built the Decision System by creating the algorithm based on business data.
- Performed data ETL by collecting, exporting, merging, and massaging data from multiple sources.
- Used the Simulated Annealing optimization technique and Decision Tree ML concepts; statistical concepts were widely used, including the Central Limit Theorem, probability, and probability distributions.
- Built Decision Trees in Python to represent segmentation of data and identify key variables to be used in predictive modeling.
- Evaluated models using Cross Validation and Confusion Matrix analysis (see the sketch after this list).
- Addressed overfitting by implementing cross-validation methods like K-Fold.
- Assisted the business with claims analysis by building Instant Decision Rules with the help of A/B Testing in the Claims Adjudication System.
- Managed project planning and deliverables for several projects focused on Advanced Analytics, Big Data, and Digital Analytics streams.
- Provided support in solution development for data science, advanced analytics, and digital analytics projects.
- Implemented machine learning techniques and interpreted statistical results into ready-to-consume formats for senior management and clients.
- Provided pre-sales support to the team for RFPs, RFIs, and client presentations.
- Played a vital role in claims analysis, assisting the business in decision-making for Claims Processing.
- Generated reports on Declined Claims using Power BI.
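A minimal sketch of the K-Fold cross-validation and confusion-matrix evaluation referenced above, using scikit-learn with a stand-in public dataset in place of the claims data:

    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)  # stand-in for claims data
    clf = DecisionTreeClassifier(random_state=0)

    # 5-fold cross-validation guards against overfitting to one train/test split.
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    print("fold accuracies:", cross_val_score(clf, X, y, cv=cv))

    # Out-of-fold predictions give one confusion matrix over the whole dataset.
    y_pred = cross_val_predict(clf, X, y, cv=cv)
    print(confusion_matrix(y, y_pred))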
Confidential
Data Analyst
Responsibilities:
- Responsible for collecting data from different sources and cleaning it.
- Developed project workflows and executed them using Agile methodologies.
- Used Alteryx to profile and analyze source system data to identify edge cases and quality issues, and communicated findings to the required audience.
- Actively participated in requirements gathering and built data products to solve real-time business problems.
- Used Python logic along with Alteryx to analyze the data and build business-decision-oriented reports.
- Developed batch ETL workflows using SQL queries and Alteryx logic, targeting the refined data to a specific schema.
- Built advanced charts in Tableau Desktop, such as Box and Whisker plots, Bullet charts, and Waterfall charts, to measure sales growth across different departments within the company.
- Implemented different data structure logic and transformations for loading the data into SQL.
- Applied different axis configurations in Tableau to visualize the results from multiple angles and report them.
- Created and scripted SQL code to create and maintain custom database schemas for multiple applications that utilized multiple product tables.
- Applied reporting best practices, performance tuning, and sound development approaches.
- Actively worked on resolving performance issues, limiting queries from workbooks connected to a live database by using the data extract option in Power BI.
- Implemented appropriate load balancing using AWS.
- Engaged regularly with various audit stakeholders and managed project timelines and milestones.
Environment: Alteryx, SQL, ORACLE PL/SQL, Power BI, Excel, MS Office