- Over 8 years of experience as a Senior Data Scientist/Data Analyst, leveraging a wide range of data analysis, machine learning and statistical modeling algorithms and methods to solve business problems.
- Deep expertise in statistical analysis, data mining and machine learning using R, Python and SQL.
- Proficient in managing the entire data science project life cycle, with active involvement in all phases of a project.
- Hands-on experience with Hadoop, deep learning, text analytics and IBM Data Science Workbench tools.
- Experience working in Agile Scrum Software Development.
- Knowledge of cloud services such as Microsoft Azure and Amazon Web Services (AWS).
- Knowledge of big data toolkits such as Mahout, Spark ML and H2O.
- Professional working experience with machine learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering and Association Rules.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks and Principal Component Analysis, with good knowledge of Recommender Systems.
- Hands on experience in Data Governance, Data Mining, Data Analysis, Data Validation, Predictive modeling, Data Lineage and Data Visualization in all the phases of the Data Science Life Cycle.
- Extensive data analysis experience using RStudio, SQL, Tableau and other BI tools.
- Experience working with large, complex datasets that include structured, semi-structured and unstructured data to discover meaningful business insights.
- Knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
- Highly skilled in statistical analysis using R, SPSS, MATLAB and Excel.
- Experience working with SAS Language for validating data and generating reports.
- Experience working with web technologies such as HTML, CSS and R Shiny.
- Strong Data Analysis skills using business intelligence, SQL and MS Office Tools.
- Strong SQL Server programming skills, with experience in working with functions, packages and triggers.
- Experience in all phases of data warehouse development, from requirements, analysis, design, development and testing through post-production support.
- Experience with data science and machine learning libraries such as scikit-learn, OpenCV, NumPy, SciPy, matplotlib and pandas, along with JSON, SQL and Scala.
- Hands-on experience with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering and dimensionality reduction.
- Strong skills in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, Support Vector Machine, K-Nearest-Neighbors, K-means Clustering, Neural networks, Ensemble Methods.
- Working experience implementing machine learning algorithms using MLlib and Mahout in the Hadoop ecosystem and Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL and PySpark.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis, data extraction and data summarization.
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
Data Science tool: R 3.5.0, Python 3.6.5, MATLAB
Database: Oracle 12c, MS Access 2016, SQL Server 2017, Sybase and DB2, Teradata r15, Hive 2.3
Machine Learning: Linear regression, Logistic regression, Decision tree, Random Forest, K-nearest neighbors, K-means, Avro, MLbase
Data Modeling Tools: Erwin 9.7, ER/Studio, Star-Schema Modeling, Snowflake Schema Modeling, FACT and dimension tables, Pivot Tables.
BI Tools: Tableau, Tableau server, Tableau Reader, SAP Business Objects, Crystal Reports
Languages: SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, Python 3.6.5, C, C++, Java, HTML5, UNIX shell scripting, Perl
Applications: Toad for Oracle, Oracle SQL Developer, MS Word 2017, MS Excel 2016, MS PowerPoint 2017, Teradata r15.
Big Data: Hadoop 3.0, Spark 2.3, Hive 2.3, Cassandra 3.11, MongoDB 3.6, MapReduce, Sqoop.
Operating Systems: Microsoft Windows 9x/NT/2000/XP/Vista/7/8/10, Unix and Sun Solaris.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Ralph Kimball and Bill Inmon, Waterfall Model.
Confidential - Phoenix, AZ
Sr. Data Scientist
- Worked as a Data Scientist and developed predictive models, forecasts and analyses to turn data into actionable solutions.
- Led the full machine learning system implementation process: data collection, model design, feature selection, system implementation and evaluation.
- Implemented predictive analytics and machine learning algorithms to forecast key metrics, delivered as dashboards on AWS (S3/EC2) and the Django platform for the company's core business.
- Worked with AWS S3 buckets and secure intra-cluster file transfer between PNDA and S3.
- Reduced total medical cost and out-of-network utilization by identifying opportunities in surgery.
- Mined patterns of procedures that lead to surgery using association rules.
- Improved prediction of the likelihood that patients with congestive heart failure will be re-hospitalized within 30 days, using logistic regression, to support patient outreach and reduce the cost of care.
- Developed unsupervised machine learning models in the Hadoop/Hive environment on an AWS EC2 instance.
- Implemented public segmentation with unsupervised machine learning, using the k-means algorithm in PySpark.
- Performed K-means clustering, Multivariate analysis and Support Vector Machines.
- Worked on natural language processing with the NLTK module to develop an application for automated customer response.
- Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, & KNN for data analysis.
- Performed advanced text analytics using deep learning techniques such as convolutional neural networks to determine the sentiment of texts.
- Supported the client by developing machine learning algorithms on big data using PySpark for transaction fraud analysis, cluster analysis, etc.
- Used Spark with MLlib for test data analytics and analyzed performance to identify bottlenecks.
- Converted the unstructured data into structured data using Apache Avro.
- Designed predictive models using the H2O machine learning platform and its Flow UI.
- Used the Agile Scrum methodology across the phases of the software development life cycle.
- Built multi-layer neural networks for deep learning using TensorFlow and Keras.
- Worked with Machine learning algorithms like Regressions (linear, logistic), SVMs and Decision trees.
- Used text mining and NLP techniques to find the sentiment about the organization.
- Developed MapReduce/Spark modules for machine learning & predictive analytics in Hadoop on AWS.
- Utilized various new supervised and unsupervised machine learning algorithms/software to perform NLP tasks and compare their performance.
- Worked with SAS for extracting, manipulating and validating data and generating reports.
- Performed data visualization using matplotlib library functions for histograms, pie charts, bar charts, scatter plots, etc.
- Used various PROC and DATA statements like MEANS, UNIVARIATE, PRINT, LABEL, FORMAT and loops in SAS to read and write data.
- Responsible for creating Git repositories for new user stories.
- Verified that the proper files were uploaded to the right Git repositories.
- Created reports with Crystal Reports and scheduled them to run daily.
- Accomplished multiple tasks from collecting data to organizing data and interpreting statistical information.
- Evaluated the performance of various algorithms/models/strategies on real-world data sets.
- Worked with different data science teams and provided the data they required on an ad-hoc request basis.
- Assisted both application engineering and data scientist teams in mutual agreements/provisions of data.
Environment: Spark 2.0.2, MLbase, PySpark, AWS, Agile, SAS, ODS, MapReduce, regression, logistic regression, random forest, neural networks, Avro, NLTK, XML, MLlib, Git and JSON.
Confidential - St. Louis, MO
Sr. Data Scientist
- Responsible for retrieving data from the database using SQL/Hive queries and performing analysis enhancements.
- Used R, SAS and SQL to manipulate data, and develop and validate quantitative models.
- Worked as an RLC (Regulatory and Legal Compliance) team member and undertook user stories (tasks) with critical deadlines in an Agile environment.
- Read data from various file formats, including HTML, CSV and sas7bdat, using SAS/Python.
- Analyzed system failures, identified root causes and recommended courses of action.
- Coded, tested, debugged, implemented and documented data using R.
- Researched multi-layer classification algorithms and built a natural language processing model using ensemble methods.
- Cleaned data by analyzing and eliminating duplicate and inaccurate data using R.
- Retrieved unstructured data from different sites in formats such as HTML and XML.
- Performed exploratory data analysis on the data provided by the client.
- Worked with the pandas library in Python to store the retrieved data in files.
- Conducted root cause analysis, both independently and collaboratively, to resolve multiple issues.
- Worked with data frames and other data interfaces in R for retrieving and storing data.
- Responsible for ensuring that the data was accurate and free of outliers.
- Applied various machine learning algorithms such as Decision Trees, K-Means, Random Forests and Regression in R with the required packages installed.
- Applied the K-Means algorithm to determine the position of an agent based on the data collected.
- Applied regression to estimate the probability of an agent's location with respect to the insurance policies sold.
- Used advanced Microsoft Excel functions such as Pivot tables and VLOOKUP in order to analyze the data and prepare programs.
- Performed various statistical tests to give the client a clear understanding of the results.
- Actively involved in analysis, development and unit testing of the data, and in delivery assurance of user stories in an Agile environment.
- Worked with Quality Control Teams to develop Test Plan and Test Cases.
Environment: R, Decision Trees, K-Means, Random Forests, Microsoft Excel, Agile, SAS, SQL, NLP
Confidential - San Francisco, CA
Sr. Data Analyst/Data Scientist
- Worked closely with data scientists to assist with feature engineering, model training frameworks and model deployments, while maintaining documentation discipline.
- Involved in data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
- Performed decision tree analysis and random forests for strategic planning and forecasting, and manipulated and cleaned data using the dplyr and tidyr packages in R.
- Involved in data analysis and creating data mapping documents to capture source to target transformation rules.
- Created or modified T-SQL queries per business requirements and created role-playing dimensions, factless fact tables, and snowflake and star schemas.
- Wrote, executed and performance-tuned SQL queries for data analysis and profiling, including complex queries using joins, subqueries and correlated subqueries.
- Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
- Developed mappings to load fact and dimension tables, SCD Type 1 and SCD Type 2 dimensions, and incremental loads, and unit tested the mappings.
- Wrote test cases, developed Test scripts using SQL and PL/SQL for UAT.
- Worked with the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
- Performed data testing, tested ETL mappings (Transformation logic), tested stored procedures, and tested the XML messages.
- Created use cases, activity reports and logical components to extract business process flows and workflows involved in the project using Rational Rose, UML and Microsoft Visio.
- Wrote complex SQL queries for validating the data against different kinds of reports generated by BusinessObjects XI R2.
- Developed data mapping, transformation and cleansing rules for the Master Data Management architecture involving OLTP, ODS and OLAP.
- Extensively used SQL, T-SQL and PL/SQL to write stored procedures, functions, packages and triggers.
- Prepared data analysis reports weekly, biweekly and monthly using MS Excel, SQL and Unix.
- Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, logistic regression and gradient boosting machines, to build predictive models using the scikit-learn package in Python.
Environment: Python, T-SQL, SSIS, SSRS, SQL, PL/SQL, OLTP, Oracle, MS Access, MS Excel, XML, Microsoft Visio, UML, OLAP, Unix
- Gathered and translated business requirements into detailed, production-level technical specifications, new features, and enhancements to existing technical business functionality.
- Used the Waterfall methodology across the phases of the software development life cycle.
- Participated in all phases of data mining, including data collection, data cleaning, model development, validation and visualization, and performed gap analysis.
- Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data life cycle management.
- Performed data analysis and reporting using MS PowerPoint, MS Access and SQL Assistant.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services.
- Developed SQL scripts for creating tables, Sequences, Triggers, views and materialized views.
- Compiled data from various public and private databases to perform complex analysis and data manipulation for actionable results.
- Used and maintained an MS SQL Server database to extract data inputs from internal systems.
- Interacted with the Client and documented the Business Reporting needs to analyze the data.
- Used SAS for pre-processing data, SQL queries, data analysis, generating reports, graphics, and statistical analyses.
- Migrated databases from legacy systems, including SQL Server, to Oracle.
- Performed data analysis and statistical analysis, and generated reports, listings and graphs using SAS tools: Base SAS, SAS/Macros, SAS/GRAPH, SAS/SQL, SAS/CONNECT and SAS/ACCESS.
- Developed data mapping documentation to establish relationships between source and target tables including transformation processes using SQL.
- Performed extensive data cleansing and analysis in Excel using pivot tables, formulas (VLOOKUP and others), data validation, conditional formatting, and graph and chart manipulation.
- Created pivot tables and charts using worksheet data and external resources, modified pivot tables, sorted items, grouped data, and refreshed and formatted pivot tables.
- Used advanced Microsoft Excel to create pivot tables, used VLOOKUP and other Excel functions.
- Worked with CSV files to obtain input data from the MySQL database.
- Created functions, triggers, views and stored procedures using MySQL.
- Worked on database testing, wrote complex SQL queries to verify the transactions and business logic.
- Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
Environment: Erwin 9.0, SDLC, MS PowerPoint, MS Access, MS SQL Server 2008, SAS, Oracle 11g, Microsoft Excel