Results-oriented professional with more than 9 years of experience in Data Analytics, including data scrubbing and mining, Data Acquisition, Data Engineering to extract features using Statistical Techniques, and Exploratory Data Analysis. Builds diverse Machine Learning models for predictive use cases and designs visualizations that support the growth of Business Profitability. Highly organized, motivated and skilled professional who is fascinated by the value created by analytics and looks forward to a career of constant learning and intellectual challenges.
- Strong expertise in Data Extraction, Data Cleaning, Data Loading, Statistical Data Analysis, Exploratory Data Analysis, Data Wrangling and Predictive Modeling using R and Python.
- Good knowledge of exploratory data analysis using Tableau and Power BI.
- Skilled in applying feature engineering and feature selection techniques to complex datasets so that predictive models represent features well and achieve better accuracy.
- Highly accomplished in identifying and applying the appropriate machine learning algorithms for a given problem statement.
- Professional working experience with Machine Learning algorithms such as Linear Regression, Logistic Regression, Random Forests, XGBoost, Decision Trees, K-Means Clustering, Hierarchical Clustering, PCA and NLP, and good knowledge of Recommendation Systems.
- Experience with a variety of NLP methods for information extraction, topic modelling, parsing, and relationship extraction.
- Deep knowledge of Time Series analysis, Time Series forecasting, stationarity and autoregressive integrated moving average (ARIMA) models.
- Experience in hyperparameter optimization of machine learning algorithms.
- Skilled in using Apache Spark to extract and summarize data across the infrastructure, and proficient in HiveQL, SparkSQL and PySpark.
- Good knowledge of the Apache Spark machine learning library MLlib.
- Good experience in extracting, transforming and loading data from heterogeneous data sources using data engineering tools such as Informatica & Apache Kafka.
- Hands-on experience in provisioning virtual clusters on the Amazon Web Services (AWS) cloud, including services like Elastic Compute Cloud (EC2) & S3.
- Experience writing complex SQL queries, procedures and triggers to obtain filtered data from various RDBMS such as PostgreSQL, Teradata, Oracle and SQL Server.
- Proficient in data mining algorithms to identify trends or patterns in complex data sets and report findings and make recommendations to company leadership.
- Strong experience in building end-to-end analytic solutions, including data pipelines, data models and self-service prediction models.
- Experience in developing reports, dashboards & KPIs for stakeholders in multiple areas of the business.
- Good experience working with different visualization tools such as OBIEE, Cognos, Tableau and Power BI.
- Experienced with JIRA and other internal issue trackers for project development, and with SDLC, Git, Agile methodology and the Scrum process.
Technologies: Data Warehousing (DW), Business Intelligence (BI), Big Data, SQL, Data Modeling, Statistical Analysis, Machine Learning, Natural Language Processing (NLP), Deep Learning, Unix Shell Scripting
Machine Learning & Deep Learning: Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Means Clustering, K-Nearest Neighbors (KNN), Gradient Boosting Trees, AdaBoost, LDA, Natural Language Processing, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN)
Languages: R, Python, SQL
Database: Oracle, Teradata, PostgreSQL
Data Modeling Tools: Erwin DM
Data Engineering Tools: Informatica, Apache Spark, Apache Kafka, SSIS, PL/SQL
Visualization Tools: OBIEE, Cognos, Tableau, Power BI, R Shiny
Cloud Technology: Amazon Web Services (AWS)
Version Control Tools: SVN, GitHub
Work Methodology: Agile, Waterfall
Confidential, Phoenix, AZ
Sr. Data Analyst/Data Scientist
- Participated in all phases of Data Mining, Data Collection, Data Cleaning, Model Development, Validation and Visualization, and performed Gap Analysis.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib & Scikit-learn in Python for developing various Machine Learning Algorithms.
- Implemented various data pre-processing techniques to handle structured and unstructured data, data imputation and data imbalance issues.
- Participated in feature engineering such as feature intersection generation, feature normalization and label encoding with Scikit-learn pre-processing.
- Developed and implemented predictive models using machine learning algorithms such as Linear Regression, Classification, Multivariate Regression, Naive Bayes, Random Forest, K-means Clustering, KNN, PCA and Regularization.
- Architected and built a highly scalable NLP system for processing large-scale unstructured data.
- Used Text Mining and NLP techniques to determine sentiment about the organization, and classified it with high accuracy using an RNN.
- Developed a data pipeline using Spark Streaming from Kafka topics.
- Integrated Statistical Reports created in Python/R with Tableau to create dashboards and generate varied reports according to situational demands.
- Developed financial performance and revenue analysis dashboards for clients having 50k+ borrowers.
- Owned and led the development of the self-reporting feature which assisted in acquiring 2 new clients.
- Documented training and dashboard design best practices and trained clients on the reporting feature.
- Created ad-hoc dashboards effectively using data blending, joins, actions, filters, calculations, sets, parameters, graphs, charts and maps.
- Worked with Advanced Analysis, Trend Lines, Statistics and Dual Axis to generate insights from data and presented findings to stakeholders and management to support data-driven decisions.
- Built complex SQL queries on large datasets for data manipulation and used pandas, NumPy and matplotlib to modify and maintain legacy reports.
- Worked with the NLTK library for NLP data processing and pattern discovery.
- Applied the Word2Vec model to build word embeddings for better performance and accuracy.
- Utilized Spark, Spark Streaming, MLlib and Python with a broad variety of machine learning methods including Classification, Regression, Clustering and Dimensionality Reduction.
- Used Spark DataFrames, Spark SQL and Spark MLlib extensively, and designed and developed POCs using Python, Spark SQL and MLlib libraries.
- Developed Kafka producers and Spark Streaming consumers to read the stream of events as per business models.
- Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
- Expert in Business Intelligence and Data Visualization tool: Tableau.
- Extensively used AWS services like EC2 and S3 to implement a cloud-hosted solution for the client.
- Interacted with Business Analysts, SMEs and other Data Architects to understand Business needs and functionality for various project solutions.
Technologies: Python 3.x (Scikit-learn, SciPy, NumPy, Pandas, NLTK, Matplotlib, Seaborn), Machine Learning (Logistic Regression / Random Forests / KNN / K-Means Clustering / Hierarchical Clustering / Ensemble methods), Deep Learning (CNN, RNN, LSTM), NLP, Big Data, Apache Spark/PySpark/MLlib/Spark-SQL, Kafka, Zookeeper, AWS, PostgreSQL, Tableau, Agile
Confidential, Reston, VA
Sr. Data Analyst/Data Scientist
- Built an end-to-end Data Science pipeline covering data extraction, data cleansing, data manipulation, model building and visualization in Python.
- Worked with stakeholders throughout the organization to identify opportunities for leveraging company data to drive business solutions.
- Worked with different document formats such as DOC, DOCX, TXT, JPG, PNG and PDF; read them using Python and stored the data in data frames using packages like Pandas, PyPDF2, python-docx and Pytesseract.
- Set up storage and data analysis tools on Amazon Web Services cloud computing instances (AWS EC2) and set up storage buckets in the AWS S3 service.
- Utilized Python packages & Spark SQL for initial data visualizations and exploratory data analysis.
- Applied data cleaning techniques extensively before model building.
- Used predictive modeling to increase and optimize customer experiences, revenue generation, ad targeting and other business outcomes.
- Implemented various machine learning algorithms and statistical models, including SVM, K-means clustering, K-nearest neighbors, Decision Tree, Random Forest, Naive Bayes, Hierarchical classification, Logistic Regression and Linear Regression, using Python, and compared the accuracy of each model.
- Used RMSE score, Confusion matrix, ROC, Cross validation and A/B testing to evaluate model performance in both simulated and real-world environments.
- Assessed the effectiveness and accuracy of new data sources and data gathering techniques.
- Developed processes and tools to monitor and analyze model performance and data accuracy.
- Worked extensively with the Erwin Data Modeler tool to design Data Models.
- Built data visualization dashboards using Tableau & Power BI to showcase the outcomes of the predictive ML models.
- Collaborated with data engineers and the operations team to implement the ETL process, and wrote and optimized SQL queries to extract data that fit the analytical requirements.
- Applied Agile Methodology for building the project timelines.
Technologies: R (ggplot2, dplyr, lubridate, knitr, shiny, caret, keras, mlr, fpp2), Python (Scikit-learn, NumPy, Pandas, Matplotlib, Seaborn), Machine Learning (Supervised & Unsupervised Algorithms), Big Data, Apache Spark (MLlib, SQL), Teradata, Tableau, Erwin, Agile
Sr. Data Analyst / Data Scientist
- Collaborated with a wide array of business groups to help drive insights creation and develop use case hypotheses.
- Collaborated with key business and technical stakeholders in defining the architecture, strategic roadmaps and business technical solutions, and aligned them with Enterprise architecture goals.
- Provided architectural solutions and services for data intake, integration and enrichment, covering metadata, data quality and data security.
- Leveraged knowledge in analytic and statistical algorithms to help customers explore methods to improve their business.
- Performed preliminary data analysis and handled anomalies such as missing values, duplicates, outliers and irrelevant data, using imputation where appropriate.
- Used statistical analysis, simulations and predictive modeling to analyze information and develop practical solutions to business problems.
- Drew inferences and conclusions, created Dashboards and Visualizations of processed data, and identified trends and anomalies.
- Performed Exploratory Data Analysis using R, and generated various graphs and charts for analyzing the data using Python libraries.
- Applied Linear & Logistic Regression, Classification Modeling, Decision Trees, Principal Component Analysis (PCA), and Cluster and Segmentation analyses.
- Designed, implemented and maintained medium to large custom application data models.
- Supported cross-functional projects and made recommendations grounded in data research and rigorous modeling.
- Led analytic projects through all phases including concept formulation, data manipulation, research evaluation and final research report design.
- Performed ad hoc data mining and statistical analyses on complex problems.
- Researched and worked with technical teams to implement new and emerging technologies that will facilitate better data integrity, reliability, and enrichment for quantitative solutions.
- Designed and built end-to-end self-service analytical solutions using visualization tools such as OBIEE & Cognos.
- Developed ad-hoc dashboards for year-over-year, quarter-over-quarter, YTD, QTD and MTD sales analysis.
- Used Waterfall, Agile methodology and the Scrum process for project development.
Technologies: R, Python, Machine Learning, OBIEE, Cognos, Teradata, Waterfall, Agile.
- Assembled large and complex data sets from various heterogeneous data sources into consumable formats that meet business requirements.
- Created complex mappings for fact and dimension tables using various transformations that involved implementation of Business Logic to load data from multiple Sources into Operational Data Store (Staging Area) and then load into Data Warehouse.
- Orchestrated a smooth and effective flow of data from the staging layer to the reporting layer.
- Monitored business requirements, validated designs, scheduled ETL processes and prepared documentation for data flow diagrams.
- Maintained and re-engineered existing ETLs to increase data accuracy, data stability, and pipeline performance.
- Designed and implemented functionality, participated in team code reviews, and provided feedback on performance, logic, standard methodologies and maintenance issues to ensure code-level consistency.
- Performed root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Worked with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Created Unix shell scripts to automate ETL pipeline jobs.
- Built SSAS OLAP Dimensions and Cubes and automated the data refresh using SSIS packages so the data was ready for reporting.
- Proactively tracked key metrics, identified trends that warranted deeper analysis, and advised decision makers of the business implications.
- Worked independently on end-to-end solutions, from data investigation to visual dashboards.
- Worked on a couple of data migration projects and performed gap analysis to ensure 100% completion.
- Helped reduce billing discrepancies to 1% in one initiative by converting all legacy orders to the new platform through data profiling and data cleaning, attaining an ROI of $1.5M.
Technologies: Oracle, Informatica, PL/SQL, SSIS, SSAS, SSRS, Waterfall.