- Data Scientist/Analyst with 7+ years of experience in Data Extraction, Data Modeling, Data Wrangling, Statistical Modeling, Data Mining, Machine Learning and Data Visualization.
- Hands-on experience with Machine Learning, Regression Analysis, Clustering, Boosting, Classification, Principal Component Analysis and Data Visualization Tools.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks and Principal Component Analysis.
- Hands-on experience with NLP and mining of structured, semi-structured, and unstructured data.
- Experienced in working with large, real-world data that is inconsistent, incomplete, and full of errors.
- Proficient in deep learning with neural networks using TensorFlow and Keras, and in text processing with NLTK.
- Adept at Big Data technologies, including Hadoop, Hive, Pig, Sqoop, Cassandra, MapReduce, MongoDB, HBase and Cloudera Manager, for the design of business intelligence applications.
- Hands-on experience with data mining and big data tools such as Python, R, MapReduce, YARN, Pig, Hive, Scala, HBase, HDFS, Sqoop and Spark.
- Strong programming skills with in-depth knowledge in a variety of languages such as Python, R, SAS and SQL for data cleaning, data visualization, risk analysis and predictive analytics.
- Worked with data formats such as JSON, XML, CSV, TXT and XLS, and implemented machine learning algorithms in Python using libraries such as pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn and NLTK, and in R using packages such as ggplot2, dplyr, randomForest, gbm, mlr and jsonlite.
- Extensive experience in Data Visualization, including producing tables, graphs and storytelling listings using tools such as Tableau, MS Excel, Power BI and Google Analytics.
- Expertise in Excel Macros, Pivot Tables and other advanced functions. Extensive experience with ETL and reporting tools such as SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).
Programming Languages: Python, R, SQL, T-SQL, PL/SQL, PyTorch, SAS
Machine Learning and Statistical Analysis Tools: Regression Models, Classification Models, Clustering, Visualization, Time Series, Market Basket Analysis, Dimensionality Reduction, Bootstrapping, Recommender Systems, Neural Networks using TensorFlow, Keras, OpenCV, NLTK - LSA, LDA
Databases: MySQL 4.x/5.x, MongoDB, AWS S3, Microsoft SQL Server 2008/2012/2014/2016, Oracle 10g/11g/12c, Teradata, HBase, Cassandra
Business Intelligence Tools: Tableau, Tableau Server, Amazon Redshift, Microsoft Power BI
Cloud: Google Cloud, AWS, IBM Cloud
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Google Analytics
IDE and OS: Spyder, RStudio, Jupyter, JupyterLab, Anaconda, Windows, Linux, macOS
Confidential, Chicago, IL
- Conducted statistical analysis to determine the key factors for planning and conducting experiments to prove total fraud loss using prescriptive and predictive analytics.
- Worked on fraudulent transaction detection using classification algorithms on historical data.
- Implemented supervised algorithms like Logistic Regression, Random Forest, Naïve Bayes, Linear SVM, KNN.
- Designed NLP algorithm to understand the behavior of the customers and various merchants.
- Implemented Ridge and LASSO regression models fitted via gradient descent, using cross-validation to select the regularization parameter.
- Implemented GridSearchCV and Pipelining to find the best regularization parameter and model.
- Implemented several credit-risk models, including Neural Networks, Random Forests and Gradient Boosting, for fraud risk management and fraud risk protection.
- Developed NLP algorithms for text mining to improve search relevance and support real-time decision-making algorithms.
- Performed data transformation from various sources, data organization, feature extraction, feature engineering and feature preprocessing.
- Utilized analytical tools such as Python, Scala, R, Hadoop, Tableau, and Snowflake DB.
- Built self-learning AI that continually retrains to detect outlier transactions and behaviors using unsupervised learning techniques.
- Conducted exploratory data analysis with descriptive and inferential statistics in Python and R, and performed data imputation using scikit-learn and NLTK.
- Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
- Performed data analysis using Hive to retrieve data from the Hadoop cluster, loaded data into HDFS, and used SQL to retrieve data from the Teradata database.
- Performed exploratory data analysis using NumPy, Pandas, and SciPy in Python.
- Visualized data using matplotlib and seaborn in Python and ggplot2 in R, and created dashboards using Tableau.
Environment: Python, R, SQL, Tableau Desktop, Spark, Hive, Jupyter Notebook, MS Excel, MapReduce, Unix, HDFS, Teradata, recommendation systems, Python libraries (pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, NLTK), AWS, Redshift, S3, Snowflake DB, GitHub, Machine Learning Algorithms.
Confidential, Minneapolis, MN
- Developed deep understanding of the guest website experience, purchase behavior, platform and functionality usage of the customers.
- Developed MapReduce/Spark Python modules for predictive analytics & machine learning in Hadoop on AWS.
- Designed and developed large-scale models using Logistic Regression, Random Forest, Naïve Bayes, Time Series models and NLP models.
- Implemented Natural Language Processing algorithms (LDA, LSA) for text analytics and search relevance.
- Performed data modeling and semantic analysis of keywords using Python and R.
- Performed data extraction from Oracle database using PL/SQL and PySpark.
- Performed scripting in Python, R, SQL, Scala, Spark and SAS for Statistical Data Analysis.
- Wrote Scala scripts to run on the Spark cluster.
- Worked with various data formats including JSON, XML, CSV from different data sources including Oracle, Teradata, and DB2 databases.
- Managed data in relational databases such as Oracle, Teradata, and DB2.
- Loaded the data from Oracle to HBase where necessary using Sqoop.
- Developed Hive queries for Analysis across different banners.
- Performed social media analytics by extracting data from Twitter using Python and the Twitter API. Parsed JSON-formatted Twitter data, uploaded it to the database and conducted sentiment analysis to understand customer behavior.
- Utilized Google Analytics to understand user traffic on the Confidential website and prepared reports.
- Utilized recommender systems and collaborative filtering techniques to drive Confidential business priorities.
- Monitored and analyzed session data to understand customer behavior and identified site issues that were adversely impacting conversion on the digital platform.
- Performed ad-hoc analysis to gain insight into differences among guest segments.
- Created, maintained and customized events, hit attributes, dimensions, reports and dashboards in Tableau.
Environment: Python, R, PL/SQL, Oracle, Teradata, DB2, MS Excel, Hive, Hadoop, MapReduce, PySpark, Scala, Java, AWS, HBase, Sqoop, NLP (LDA, LSA), Logistic Regression, Naïve Bayes, Time Series.
Confidential, Phoenix, AZ
Sr. Data Analyst
- Gathered, analyzed, documented and translated application requirements into data models and supported standardization of documentation.
- Delivered and communicated research results, recommendations, opportunities to the managerial and executive teams, and implemented the techniques for priority projects.
- Designed Regression models to determine and forecast the Air Quality Index based on the historical data.
- Developed different prediction models such as Linear Regression, Decision Tree and Support Vector Machine, choosing the best model based on the trade-off between accuracy and interpretability.
- Validated data models using measures such as RMSE, R-squared and adjusted R-squared.
- Extracted historical and real-time data using Sqoop, Pig, Flume, Hive, MapReduce and HDFS.
- Wrote user defined functions (UDFs) in Hive to manipulate strings, dates and other data.
- Developed Oracle 11g stored packages, functions, procedures and database triggers using PL/SQL for ETL processing, data handling, logging and archiving.
- Performed data extraction from the NoSQL database MongoDB.
- Handled importing data from various data sources, performed transformations using MapReduce, Hive and loaded data into HDFS.
- Designed and documented Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object Oriented Design) using UML and Visio.
- Collaborated with Data Warehouse team on development and maintenance using Oracle SQL, SQL Loader, PL/SQL and Informatica Power Center.
- Designed, developed and maintained daily and monthly summary, trending, benchmark reports, user stories and dashboards in Tableau Desktop.
- Published workbooks and extract data sources to Tableau Server, implemented row-level security and scheduled automatic extract refreshes.
Environment: Machine learning (Regressions, Random Forest, SVM), Linux, Python (Scikit-Learn/SciPy/NumPy/Pandas), R, Tableau, Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Oracle, Visio.
Confidential, Albuquerque, NM
IT Data Analyst Consultant
- Responsible for gathering requirements from business analysts and operational analysts and identifying the data sources required for the reports.
- Managed complex healthcare projects with diverse teams and developed data-driven, outcome-based strategies to improve business decision making and operational efficiency.
- Performed data analysis using Python, R, SQL, SAS and MS Excel to identify reporting and analytics improvement opportunities and provide proactive, consultative strategic solutions.
- Developed predictive modeling algorithms (linear regression models) for the Ambulatory Quality Division to predict quality standards and compared the results with national benchmarks.
- Maintained database dictionaries, overall monitoring of standards and procedures, and integration of systems through database design.
- Utilized Big Data tools such as Hadoop, Spark and the NoSQL database Cassandra for analytical studies.
- Utilized Business Intelligence tools (Business Objects/BO, Cognos) for real-time metric tracking and reporting.
- Understanding of operations in the Health Care industry and a strong acumen of business processes, including operations, delivery models and revenue models.
- Used Python scripts to automate combining large datasets and data files (SAS, CSV, Excel, JSON) and converting them for data analysis.
- Developed Python programs to manipulate data read from Teradata data sources and convert it to CSV files.
- Worked on numerous ad-hoc data pulls for business analysis and monitoring.
- Designed and developed various monthly and quarterly business monitoring Excel reports by writing SQL queries in SSMS and using MS Excel VLOOKUP, pivot tables and macros.
- Performed verification and validation for the accuracy of data in the monthly/quarterly reports.
- Created multiset tables and volatile tables from existing tables and collected statistics on the tables to improve performance.
- Extracted up-to-date account data from the SQL Server database using SSMS.
- Developed interfaces in SQL for data calculations and data manipulation.
- Used MS Excel, SSMS and SSRS for data pulls and ad-hoc reports for business analysis.
- Performed in-depth data analysis and prepared weekly, biweekly and monthly reports using R, SQL, MS Excel, and SSRS.
- Wrote automation scripts using shell, Python, R, and SAS.
- Designed visualization dashboards in Tableau Desktop and published them to Tableau Server and Tableau Reader.
- Developed programs for manipulating arrays using libraries such as NumPy and Pandas in Python.
Environment: SQL Server, SSMS, SSRS, Python (NumPy, Pandas), R, SQL, SAS, MS Excel, MS Office, Agile, Windows, UNIX, Tableau Desktop.
Confidential, Dallas, TX
- Conducted thorough research into the airline industry, specifically including airports within a precise area of the route map, to verify and analyze trends in consumer air traffic, fares, network, and travel behavior.
- Designed and automated multiple statistical models using Python and R combining up to 7 data sources to perform data analyses and reports.
- Prepared and delivered daily, weekly, monthly, and annual statistical reports and analyses using SQL, Python, R, Alteryx, Excel Pivot Tables, VLOOKUPs, Tableau, and graphical representations.
- Analyzed various statistical reports to develop qualitative and quantitative analyses and found different patterns to save resources on various aspects.
- Created database tables, views, functions, procedures and packages, as well as database sequences, triggers and database links.
- Wrote SQL queries to retrieve the required data from the Oracle database. Tested and debugged PL/SQL packages.
- Developed complex SQL queries using various joins, grouping and sorting to fetch aircraft maintenance data, later used for weekly and monthly quality audit reporting and analysis across multiple teams.
- Handled reporting, dashboard creation and digitization of reports using SQL, Python, MS Excel, and Tableau.
- Automated reports to pull data from the database directly into Excel and apply formatting and other changes by creating macros.
- Created Tableau dashboards on trends and forecasts for personal injury, aircraft damage, safety, and operational data.
- Interacted with business customers to define, analyze, and deliver customer requirements.
- Designed and developed standard sales reports for APJ regions and areas across client locations.
- Involved in the automation of various reports utilizing Python and R.
- Worked on VBA projects integrating Excel with PowerPoint, Access and other applications.
- Used charts, pivot tables, VLOOKUPs, macros and other complex Excel functions to automate work and improve the productivity of the team.
Environment: Adhoc, VBA, SQL Server, Oracle 9i, MS Office, MS Excel, Tableau, Alteryx, R, Python, SQL Macros.
Data Reporting Analyst
- Responsible for gathering requirements from Business Analysts and maintaining financial and material control reports for management.
- Created, updated, and ensured accuracy and distribution of monthly management reports for various divisions of the organization.
- Created and maintained regular material control reports for management using SSRS.
- Performed statistical analysis and generated reports for deviations and changes in trends compared to historical data and forecast down to a profit center level using R.
- Imported and exported large amounts of data between files and the Oracle database.
- Maintained the database and created Stored Procedures for data extraction.
- Used Python programs to automate combining large SAS datasets and data files and converting them for data analysis.
- Developed reports with Custom SQL and views to support business requirements.
- Worked on Set, Multiset, Derived and Volatile Temporary tables.
- Extracted data from existing data source and performed ad-hoc queries to support regional and corporate management.
- Performance tuned and optimized various complex SQL queries for faster data manipulation.
- Automated reports and helped others within the finance and accounting group retrieve and manipulate data efficiently using MS Excel macros and Python.
- Made extensive use of MS Excel for creating and managing reports using macros, pivot tables and VLOOKUP.
- Used Oracle and spreadsheets as data sources for designing Tableau Reports and Dashboards.
- Created dashboards and data visualizations in Tableau using action filters, calculated fields, sets, groups and parameters.
- Used Alteryx for data mining, basic ETL operations and reporting.
- Designed Physical and Logical Data Models using Visio.
- Designed and developed ad-hoc, weekly and monthly Tableau reports in response to data requests from business analysts, operations analysts, and project managers.
Environment: Python, R, MS Excel, MS Word, Tableau, Visio, Oracle, SSRS