- About 8 years of experience as a Data Analyst, including in a senior capacity
- Built predictive models using Machine Learning and Deep Learning algorithms
- Formulated Pattern Recognition algorithms and performed data mining on both transactional data and big data (NoSQL)
- Assisted in architecting a data warehouse, creating data pipelines, and drafting artifacts such as XML, data dictionaries, and data tagging workbooks
- Developed multiple predictive models using Bayesian, Naïve Bayes, SVM, Regression, Correlation, Decision Tree, PCA, MCA, Neural Networks, Random Forest, Ensemble Models, K-means clustering and others
- Created dashboards using Tableau, Qlik tools, Confidential Power BI and MS Office
- Performed data mining on structured and unstructured data and conducted data quality assurance
- Worked closely with clients to understand their needs and translate high-level requirements into detailed business requirements
- Used multiple analytical and visualization tools to extract and clean data, build predictive models, and present visualizations to clients
- Used RStudio, a Python IDE (Spyder), Google Analytics, Pentaho, and SAS tools for analytics
- Extracted and manipulated data from both transactional databases (Oracle and other SQL databases) and Hadoop-based and cloud storage systems (such as HDFS, Amazon S3, Azure, Redshift)
- Translated business resources and tasks into regularized data and analytical models, and designed algorithms, data mining approaches, and solutions for massive structured and unstructured datasets
- Involved in the full data analysis life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modeling, Optimization, Testing and Deployment
- Solid experience in Deep Learning with Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), normalization techniques and others
- Excellent proficiency in model validation and optimization, including model selection, parameter tuning and K-fold cross-validation
- Used non-relational databases such as MongoDB
- Designed and developed PL/SQL queries, ETL packages and business reports on data from SQL Server and Oracle
- Used tracking systems (JIRA/Confluence, RTC, Power BI) and version control tools (GitHub)
Computer Skills: R/RStudio, Python, Java, SQL, SAS Base, basic proficiency in Scala, MATLAB, C++, C, Hadoop, MapReduce
Tools: MS Azure ML, Cloudera, Spark, SAS EM, Eclipse, Erwin, IPython, SQL Server, Spring Framework, MySQL, Oracle, Redshift, Tableau, MS Excel, QlikView, MS Power BI, MS PowerPoint, Qlik tools, SAP, SPSS 24.0, SAS, Confidential Power BI and Business Objects
Environment: Hadoop, Spark, OLAP, DB2, Metadata, Scala, Python, Amazon S3, Kafka, CoreML, Automated Logistic regression models, Informatica 9.0, MongoDB, Redshift
Senior Data Analyst
- Built sentiment analysis models to score all customer interactions with the organization, and a topic model that automated the classification of text conversations.
- Implemented data mining and machine learning solutions to various business problems
- Applied LSTM networks to predict the cost of healthcare services for individual customers
- Used Tableau to design dashboards and perform high-level BI reporting.
- Analyzed and extracted relevant information from large amounts of data to help automate self-monitoring, self-diagnosing, and self-correcting solutions and to optimize key processes.
- Developed a Logical Data Architecture in adherence with the Enterprise Architecture.
- Built machine learning models using Regression, Boosting, CRFs, MRFs, deep learning, Decision Tree, Random Forest, Naïve Bayes, Correlation and Cluster Analysis.
- Coded in R and Python. Served as an Agile champion
- Performed data mining on various data sources, including Oracle, MS SQL Server, DB2, ODS, and Hadoop-based data storage systems
- Performed data mining, Factor Analysis, ANOVA, hypothesis testing, normal distribution analysis and other advanced statistical techniques
- Excellent knowledge of Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for improving database design and performance in OLTP, OLAP and Data Warehouse environments
Confidential, San Mateo, CA
Senior Data Analyst
- Analyzed use cases on sales promotions and potential-customer heatmaps to build predictive models, explore business expansion opportunities, and prevent fraud.
- Strong subject matter expertise in Risk Analysis, Portfolio and Budgeting Analysis, Value Based Management and others
- Worked extensively with Confidential Excel (macros, VLOOKUPs, and pivot tables), RStudio and Python scripting to code predictive models and wrangle data
- Used Tableau and Confidential Power BI to visualize analyzed data
- Assisted in implementing enterprise data warehousing on Amazon Redshift and architected a Master Data Management (MDM) system. Wrote shell scripts (Linux)
- Wrote SQL scripts to pull data from Redshift and Oracle databases
- Performed data quality assessment using custom-built assessment metrics
- Conducted Market Research, Feasibility Studies, Data Analyses, Data Mapping, Gap Analyses, Risk Identification, Risk Assessment, Risk Analyses, and Risk Management.
- Used Crystal Reports, Oracle Discoverer, Query Analyzer and SQL Navigator.
- Performed User Acceptance Testing (UAT) and documented test cases.
- Ability to manage multiple project tasks with changing priorities and tight deadlines. Ability to work well in a team and as a self-starter
- Performed advanced SAS programming using PROC SQL (JOIN/UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
- Used Multivariate Logistic Regression, Decision Tree (C4.5), Bayesian Network, Feed-Forward Neural Network (FFNN) and Random Forest to design and apply custom-made predictive tools
- Assisted other teams in data analysis, data mining and data profiling as needed.
- Participated in the ETL migration from ETI to DataStage and in the DBMS migration
- Assisted in creating schemas based on analytics needs
- Assisted in normalization and denormalization of data
- Reviewed data models and reporting requirements for Cognos reports with the Data Warehouse/ETL and Reporting teams
- Wrote SQL and HQL queries (Hive on Spark) to clean data and build tables
- Performed ETL using Informatica
- Assisted in configuring Flume to automatically stream data into Hadoop-based storage systems
- Worked in both Waterfall and Agile environments. Also served as Scrum Master.
Confidential, Fremont, CA
Senior Data Analyst
- Extracted, transformed and cleaned data from various data sources such as SQL Server 2008 and 2012, MongoDB and a Hadoop storage system (a pilot for semi-structured and unstructured datasets such as .txt, JSON and .pdf files)
- Utilized Natural Language Processing (NLP) to analyze customer feedback
- Applied deep learning with Recurrent Neural Networks (LSTM RNNs)
- Predictions included KPI (CPI/SPI) forecasting, sales projections, engineering cost estimation and others
- Assisted in migrating data from warehouses and streaming it into Kafka to feed the Spark engine
- Performed transformations on loaded datasets using Python on the Spark engine, for both batch data (SQL Loader) and streaming data (Hive as a pilot implementation).
- Designed new Machine Learning techniques to replace existing processes and increase efficiency
- Developed a Machine Learning test bed with 24 different model-learning and feature-learning algorithms, including Regression, Correlation, Bayesian Network, k-NN, and Decision Trees
- Through systematic search, demonstrated performance surpassing state-of-the-art approaches
- Formulated models that achieved higher accuracy than existing models
- Used SQL Loader to pull data into the Hadoop storage system