- Over 8 years of experience in IT, with comprehensive industry knowledge of Machine Learning, Artificial Intelligence, Statistical Modeling, Data Analysis, Predictive Analysis, Data Manipulation, Data Mining, Data Visualization, and Business Intelligence.
- Experience in transforming business requirements into actionable data models across a variety of industries, including Banking/Financial, Healthcare, Pharmaceutical, and Insurance.
- Experience in performing Feature Selection and applying Linear Regression, Logistic Regression, k-Means Clustering, Classification, Decision Tree, Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbors (KNN), Random Forest, Gradient Descent, and Neural Network algorithms to train and test on large data sets.
- Adept in statistical programming languages such as Python, R, and SAS, as well as Big Data technologies such as Hadoop, Hive, HDFS, MapReduce, and NoSQL databases.
- Expert in Python data extraction and data manipulation, using widely used Python libraries such as NumPy, Pandas, and Matplotlib for data analysis.
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis, data extraction, and data summarization; proficient in HiveQL, Spark SQL, and PySpark. In-depth knowledge of the Spark machine learning library MLlib.
- Hands-on experience in provisioning virtual clusters on the Amazon Web Services (AWS) cloud, including services such as Elastic Compute Cloud (EC2), S3, and EMR.
- Proficient in designing and creating Tableau data visualization dashboards, worksheets, and analytical reports that help users identify critical KPIs and facilitate strategic planning, tailored to end-user requirements.
- Strong familiarity with statistical concepts such as Hypothesis Testing, t-Test, Chi-Square Test, ANOVA, Statistical Process Control, Control Charts, Descriptive Statistics, and Correlation Techniques.
- Worked extensively with Python libraries such as Seaborn, Scikit-learn, and SciPy for visualization and machine learning; familiar with TensorFlow for deep learning and NLTK for natural language processing.
- Experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Python.
- Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
- Exposure to manipulating large data sets using R packages such as tidyr, tidyverse, dplyr, reshape, lubridate, and caret, and to data visualization using ggplot2.
- Experience in working with Big Data technologies such as Hadoop, MapReduce, HDFS, Apache Spark, Hive, Pig, Sqoop, Flume, and Kafka; familiar with Scala programming.
- Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport.
- Knowledge of and experience working in Waterfall and Agile environments, including the Scrum process; used project management tools such as ProjectLibre and Jira/Confluence and version control tools such as Git/GitHub.
- Exposure to Azure Data Lake and Azure Storage.
- Experienced in Data Integration Validation and Data Quality controls for ETL processes and Data Warehousing using MS Visual Studio, SSAS, SSIS, and SSRS.
- Quick learner with strong business domain knowledge who can communicate business data insights clearly to technical and nontechnical clients.
Languages: Python, R
Packages/libraries: Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, BeautifulSoup, MLlib, ggplot2, rpy2, caret, dplyr, RWeka, gmodels, NLP, reshape2, plyr.
Machine Learning: Linear Regression, Logistic Regression, Decision Trees, Random Forest, Association Rule Mining (Market Basket Analysis), Clustering (K-Means, Hierarchical), Gradient Descent, SVM (Support Vector Machines), Deep Learning (CNN, RNN, ANN) using TensorFlow (Keras).
Statistical Tools: Time Series, Regression models, splines, confidence intervals, principal component analysis, Dimensionality Reduction, bootstrapping
Big Data: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Flume, Oozie, Spark
BI Tools: Tableau, Amazon Redshift
Data Modeling Tools: Erwin r, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner
Databases: MySQL, SQL Server, Oracle, Hadoop/HBase, Cassandra, DynamoDB, Azure Table Storage, Teradata.
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, SSRS, IBM Cognos 7.0/6.0.
Version Control Tools: SVN, GitHub
Operating Systems: Windows, Linux, Ubuntu
Jr. Data Scientist
- Developing, monitoring, and maintaining custom risk scorecards using advanced machine learning and statistical methods. Recommending and implementing model changes with the credit risk management team to improve the performance of credit functions.
- Cleaning data using exploratory data analysis (EDA) and Python libraries (NumPy, Pandas), replacing missing values through imputation techniques.
- Training and testing models using various Machine Learning algorithms such as Linear & Logistic Regression, Naïve Bayes, Decision Trees, Random Forests, Clustering, SVM, Neural Networks, Principal Component Analysis, and Bayesian methods.
- Working with Pandas, NumPy, SciPy, Matplotlib, Scikit-learn, and TensorFlow to develop various machine learning algorithms.
- Working on AWS with Amazon Kinesis, Amazon Simple Storage Service (Amazon S3), Spark Streaming, PySpark, and Spark SQL on top of an Amazon EMR cluster.
- Running MapReduce jobs in Hadoop and implementing Spark analysis using Python for machine learning and predictive analytics on the AWS platform.
- Creating DataFrames in Hadoop and Spark using PySpark, then converting Hive/SQL queries into Spark transformations using Spark RDDs and Python libraries.
- Extensively implementing statistical models, predictive models, enterprise data models, metadata solutions, and data life cycle management in both RDBMS and NoSQL databases such as MongoDB.
- Performing data preprocessing such as cleaning (outlier and missing-value analysis, etc.) and data visualization (Scatter Plots, Box Plots, Histograms, etc.) using Matplotlib.
- Worked on large-scale data sets and extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
- Utilized PySpark, Spark Streaming, and MLlib in the Spark ecosystem with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
- Evaluated models using cross-validation, log loss, and ROC curves; used AUC for feature selection; and worked with elastic technologies such as Elasticsearch (EC2).
Environment: Python 3.x, Cloudera, Hadoop, Apache Spark, Hive, NumPy, NLTK, Pandas, SciPy, MapReduce, Tableau, Sqoop, HBase, Oozie, HDFS, PySpark, NoSQL, MongoDB, Teradata, SQL Server.
- Identified problems with customer data and developed cost-effective models through root cause analysis.
- Worked closely with health services researchers and business analysts to draw insight and intelligence from large administrative claims datasets, electronic medical records and various healthcare registry datasets.
- Developed and tested hypotheses in support of research and product offerings, and communicated findings to stakeholders in a clear, precise, and actionable manner.
- Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
- Worked with various data formats such as JSON and XML, and applied machine learning techniques using Python and R.
- Involved in collecting and analyzing the internal and external data, data entry error correction, and defined criteria for missing values.
- Developed MapReduce jobs written in Java, using Hive for data cleaning and preprocessing.
- Exported the required data to an RDBMS using Sqoop, making it available to the claims processing team to assist in processing claims.
Environment: Python 2.x, Scikit-learn, RStudio, ggplot2, XML, HDFS, Hadoop 2.3, Hive, Impala, Linux, Spark, SQL Server 2014, Microsoft Excel, MATLAB, Spark SQL, PySpark.
- Interacted with Business Analysts and Data Modelers to define mapping and design documents for various data sources.
- Created visual analytics on large sales and marketing data sets using Tableau to assure data integrity and identify the root causes of data inconsistencies.
- Worked on dashboard reports of Key Performance Indicators (KPIs) for top management.
- Worked extensively on creating Aggregates, Hierarchies, Charts, Histograms, Filters, Quick Filters, Cascading Filters, Table Calculations, Trend Lines, calculated measures, LOD expressions, Sets, Groups, Actions, Parameters, etc.
- Worked with various data sources such as Excel, MySQL, Teradata, and Oracle databases by blending data into a single sheet.
- Analyzed various reports, dashboards, and scorecards in MicroStrategy and recreated them using Tableau Desktop and Tableau Server.
- Provided customer support to Tableau users and wrote custom SQL to support business requirements.
- Worked closely with the reporting team to deploy Tableau reports and publish them on the Tableau and SharePoint servers.
- Worked closely with the ETL team to troubleshoot various issues.
Environment: Tableau Desktop 9.x/10.0, Tableau Server 9.x/10.0, Tableau Repository, SQL Server, MySQL, Crystal Reports, MS Excel, Teradata, SharePoint, Agile.