Data Scientist Resume
Englewood Cliffs, NJ
SUMMARY
- 8+ years of experience in the IT industry as a Data Scientist and Data Analyst.
- Proficient in managing the entire data science project life cycle and actively involved in all phases of a project.
- Leverage a wide range of data analysis, machine learning and statistical modeling algorithms and methods to solve business problems.
- Experience working in Agile Scrum Software Development.
- Knowledge of cloud services such as Microsoft Azure and Amazon Web Services (AWS).
- Deep expertise in statistical analysis, data mining, and machine learning using R, Python, and SQL.
- Professional experience with machine learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering, and Association Rules.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
- Knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Highly skilled in statistical analysis using R, SPSS, MATLAB, and Excel.
- Hands-on experience with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Strong Data Analysis skills using business intelligence, SQL and MS Office Tools.
- Strong SQL Server programming skills, with experience writing functions, packages, and triggers.
- Hands-on experience in Data Governance, Data Mining, Data Analysis, Data Validation, and Predictive Modeling.
- Highly skilled in visualization tools such as Tableau, ggplot2, and D3.js for creating dashboards.
TECHNICAL SKILLS
Machine Learning: Linear regression, Logistic regression, Decision trees, Random Forest, K-nearest neighbors, K-means, Avro, MLbase
Data Science tool: R 3.5.0, Python 3.6.5, MATLAB
Big Data: Hadoop 3.0, Spark 2.3, Hive 2.3, MapReduce
NoSQL DB: Cassandra 3.11, MongoDB 3.6
Languages: SQL, PL/SQL, UNIX shell scripting
Operating System: Windows, Unix, Sun Solaris
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Ralph Kimball and Bill Inmon data warehousing approaches, Waterfall Model
PROFESSIONAL EXPERIENCE
Confidential - Englewood Cliffs, NJ
Data Scientist
Responsibilities:
- Working as a Data Scientist, developing predictive models, forecasts, and analyses to turn data into actionable solutions.
- Coordinated with the stakeholders and project key personnel to gather functional and non-functional requirements during JAD sessions.
- Communicate with team members, leadership, and stakeholders on findings to ensure models are well understood and incorporated into business processes.
- Effectively communicate with the product team and support the engineering team during the Agile development cycle.
- Used an Agile methodology to come up with test scenarios and test cases.
- Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
- Developed unsupervised machine learning models in the Hadoop/Hive environment on AWS EC2 instance.
- Led data discovery, handling structured and unstructured data, cleaning and performing descriptive analysis, and storing as normalized tables for dashboards.
- Built predictive models including linear regression and Random Forest Regression by using python scikit-learn.
- Explored and analyzed the customer specific features by using Matplotlib and ggplot2.
- Automate solutions to manual processes with big data tools (Python, AWS).
- Developed data models (Logistic Regression, KNN, Random Forests and Deep Neural Networks) in Python.
- Implemented classification using supervised algorithms like Logistic Regression, Decision trees, KNN, and Naive Bayes.
- Utilized various supervised and unsupervised machine learning algorithms and tools to perform NLP tasks and compare their performance.
- Supported the client by developing machine learning algorithms on Big Data using PySpark for transaction-fraud analysis, cluster analysis, and related tasks.
- Analyze large business datasets to provide strategic direction to the company using data analytics.
- Implemented public segmentation with unsupervised machine learning by applying the k-means algorithm in PySpark.
- Performed K-means clustering, Multivariate analysis and Support Vector Machines.
- Responsible for creating repositories in Git for new user stories.
- Verified that the proper files were uploaded to the right Git repositories.
- Accomplished multiple tasks from collecting data to organizing data and interpreting statistical information.
- Evaluate the performance of various algorithms, models, and strategies on real-world data sets.
- Regularly use JIRA and other internal issue trackers for project development.
- Involved in developing Ad-hoc reporting for various sales operations for different customers using Tableau dashboards.
- Assisted both application engineering and data scientist teams in mutual agreements/provisions of data.
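As an illustration of the scikit-learn regression modeling described in the bullets above, the following is a minimal, hypothetical sketch; the synthetic data, feature count, and hyperparameters are placeholders, not details from any actual project:

```python
# Hypothetical sketch: fitting and comparing a linear and a Random Forest
# regressor with scikit-learn, on synthetic (made-up) data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                       # synthetic features
y = X[:, 0] * 2.0 - X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LinearRegression(),
              RandomForestRegressor(n_estimators=100, random_state=0)):
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    print(f"{type(model).__name__}: R^2 = {score:.3f}")
```

Held-out R^2 is one reasonable way to compare the two model families before picking one for a dashboard or forecast.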
Confidential - Phoenix, AZ
Sr. Data Scientist
Responsibilities:
- Worked as Data Scientist and developed predictive models, forecasts and analyses to turn data into actionable solutions.
- Led the full machine learning system implementation process: collecting data, model design, feature selection, system implementation, and evaluation.
- Implemented Predictive analytics and machine learning algorithms to forecast key metrics in the form of designed dashboards on to AWS (S3/EC2) and Django platform for the company's core business.
- Worked on AWS S3 buckets and secure intra-cluster file transfer between PNDA and S3.
- Reduced total medical cost and out-of-network utilization by identifying opportunities in surgery.
- Mined patterns of procedures that end in surgery using association rules.
- Improved prediction of the likelihood that patients with congestive heart failure (Confidential) are at risk of re-hospitalization within 30 days using logistic regression, in order to reach out to them and reduce the cost of care.
- Developed unsupervised machine learning models in the Hadoop/Hive environment on AWS EC2 instance.
- Implemented public segmentation with unsupervised machine learning by applying the k-means algorithm in PySpark.
- Performed K-means clustering, Multivariate analysis and Support Vector Machines.
- Worked on Natural Language Processing with NLTK module for application development for automated customer response.
- Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, & KNN for data analysis.
- Performed advanced text analytics using deep learning techniques such as convolutional neural networks to determine the sentiment of text.
- Supported client by developing Machine Learning Algorithms on Big Data using PySpark to analyze transaction fraud, Cluster Analysis etc.
- Used Spark MLlib for test data analytics and analyzed performance to identify bottlenecks.
- Converted the unstructured data into structured data using Apache Avro.
- Designed predictive models using the H2O machine learning platform and its Flow UI.
- Used the Agile Scrum methodology to build the different phases of Software development life cycle.
- Built multi-layer neural networks to implement deep learning using TensorFlow and Keras.
- Worked with Machine learning algorithms like Regressions (linear, logistic), SVMs and Decision trees.
- Used text mining and NLP techniques to determine sentiment about the organization.
- Developed MapReduce/Spark modules for machine learning & predictive analytics in Hadoop on AWS.
- Utilized various supervised and unsupervised machine learning algorithms and tools to perform NLP tasks and compare their performance.
- Worked with SAS for extracting data, manipulating, validating and generating reports.
- Performed data visualization using matplotlib library function such as Histograms, Pie charts, Bar charts, scatter plots etc.
- Used various PROC and DATA statements like MEANS, UNIVARIATE, PRINT, LABEL, FORMAT and loops in SAS to read and write data.
- Responsible for creating repositories in Git for new user stories.
- Verified that the proper files were uploaded to the right Git repositories.
- Created reports with Crystal Reports and scheduled to run on a daily basis.
- Accomplished multiple tasks from collecting data to organizing data and interpreting statistical information.
- Evaluated the performance of various algorithms, models, and strategies on real-world data sets.
- Worked with different data science teams and provided data as required on an ad-hoc basis.
- Assisted both application engineering and data scientist teams in mutual agreements/provisions of data.
Tools: Spark 3.0, MLbase, PySpark, AWS, Agile, SAS, ODS, MapReduce, regression, logistic regression, random forest, neural networks, Avro, NLTK, XML, MLlib, Git & JSON.
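The k-means segmentation work listed above can be sketched as follows. This hypothetical example uses scikit-learn rather than PySpark, on made-up customer features, purely to illustrate the technique:

```python
# Illustrative k-means customer segmentation on invented data;
# the feature names and cluster count are assumptions, not project details.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Made-up customer features: [annual spend, visit frequency]
customers = np.vstack([
    rng.normal([200, 2], [30, 0.5], size=(100, 2)),    # low spenders
    rng.normal([1500, 12], [200, 2], size=(100, 2)),   # high spenders
])

# Scale features so both dimensions contribute comparably to distances
X = StandardScaler().fit_transform(customers)
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(segments))    # customers per discovered segment
```

In the PySpark setting described above, `pyspark.ml.clustering.KMeans` plays the same role over a DataFrame of assembled feature vectors.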
Confidential - Worcester, MA
Data Scientist
Responsibilities:
- Led the full machine learning system implementation process: collecting data, model design, feature selection, system implementation, and evaluation.
- Worked with Machine learning algorithms like Regressions (linear, logistic), SVMs and Decision trees.
- Implemented Predictive analytics and machine learning algorithms to forecast key metrics in the form of designed dashboards on to AWS (S3/EC2) and Django platform for the company's core business.
- Worked on AWS S3 buckets and intra cluster file transfer between PNDA and s3 securely.
- Developed predictive models on large scale datasets to address various business problems through leveraging advanced statistical modeling, machine learning and deep learning.
- Built multi-layer neural networks to implement deep learning using TensorFlow and Keras.
- Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
- Researched the nature of the customers extensively, designed multiple models tailored to the client's needs, and performed extensive behavioral modeling and customer segmentation using K-means clustering to discover customer behavior patterns.
- Designed and developed various machine learning frameworks using MATLAB.
- Superintended the use of open-source tools such as RStudio (R) for statistical analysis and building machine learning models.
- Developed a Machine Learning test-bed with different model learning and feature learning algorithms.
- Through systematic search, demonstrated performance surpassing state-of-the-art deep learning baselines.
- Used text mining and NLP techniques to determine sentiment about the organization.
- Developed unsupervised machine learning models in the Hadoop/Hive environment on AWS EC2 instance.
- Used clustering technique K-Means to identify outliers and to classify unlabeled data.
- Worked with data-sets of varying degrees of size and complexity including both structured and unstructured data.
- Participated in all phases of Data mining, Data cleaning, Data collection, developing models, Validation, Visualization and Performed Gap analysis.
- Used the R programming language to graphically critique the datasets and gain insights into the nature of the data.
- Established a robust machine learning process (MLbase) to ensure the quality of all predictive analytics algorithms and processes.
- Ensured optimal operational execution of production data science routines and processes.
- Implemented supervised learning algorithms such as Neural networks, SVM, Decision trees and Naïve Bayes for advanced text analytics.
- Performed data wrangling to clean, transform, and reshape data using the NumPy and pandas libraries.
- Contributed to data mining architectures, modeling standards, reporting, and data analysis methodologies.
- Conducted research and made recommendations on data mining products, services, protocols, and standards in support of procurement and development efforts.
- Involved in defining the Source to Target data mappings, Business rules, data definitions.
- Worked with different data science teams and provided data as required on an ad-hoc basis.
- Assisted both application engineering and data scientist teams in mutual agreements/provisions of data.
Tools: AWS S3, NLP, EC2, Neural networks, SVM, Decision trees, MLbase, Mahout, NoSQL, PL/SQL, MDM, MATLAB, MLlib & Git.
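The NumPy/pandas data wrangling mentioned above (cleaning, transforming, and reshaping) might look like this minimal sketch; the records and column names are invented for illustration only:

```python
# Hypothetical raw records with common data-quality issues:
# missing values, inconsistent casing, and a wide layout to reshape.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "region": ["East", "east", "West", None],
    "q1_sales": [100.0, np.nan, 250.0, 90.0],
    "q2_sales": [120.0, 80.0, np.nan, 95.0],
})

clean = (
    raw.assign(region=raw["region"].str.title())     # normalize casing
       .dropna(subset=["region"])                    # drop rows with no region
       .fillna({"q1_sales": 0.0, "q2_sales": 0.0})   # impute missing sales
)

# Reshape from wide to long ("tidy") form for downstream analysis
tidy = clean.melt(id_vars="region", var_name="quarter", value_name="sales")
print(tidy)
```

The long format produced by `melt` is the shape most visualization and modeling tools expect.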
Confidential - Union, NJ
Data Analyst/Data Scientist
Responsibilities:
- Involved with Data Analysis primarily Identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
- Worked closely with other data scientists to assist on feature engineering, model training frameworks, and model deployments implementing documentation discipline.
- Worked with the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
- Performed data testing, tested ETL mappings (Transformation logic), tested stored procedures, and tested the XML messages.
- Created Use cases, activity report, logical components to extract business process flows and workflows involved in the project using Rational Rose, UML and Microsoft Visio.
- Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
- Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions and Incremental loading and unit tested the mappings.
- Wrote test cases, developed Test scripts using SQL and PL/SQL for UAT.
- Created or modified T-SQL queries per business requirements and worked on creating role-playing dimensions, factless fact tables, and snowflake and star schemas.
- Wrote, executed, performance tuned SQL Queries for Data Analysis & Profiling and wrote complex SQL queries using joins, sub queries and correlated sub queries.
- Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects XIR2.
- Predicted house prices and area population income using regression methods in Excel and Octave (MATLAB).
- Developed Data Mapping, Transformation and Cleansing rules for the Master Data Management Architecture involved OLTP, ODS and OLAP.
- Performed Decision Tree analysis and Random Forests for strategic planning and forecasting, and manipulated and cleaned data using the dplyr and tidyr packages in R.
- Involved in data analysis and creating data mapping documents to capture source to target transformation rules.
- Extensively used SQL, T-SQL and PL/SQL to write stored procedures, functions, packages and triggers.
- Prepared data analysis reports weekly, biweekly, and monthly using MS Excel, SQL, and Unix.
- Applied various machine learning algorithms and statistical models such as decision trees, logistic regression, and Gradient Boosting Machines to build predictive models using the scikit-learn package in Python.
Tools: Python 2.7, T-SQL, SSIS, SSRS, SQL, PL/SQL, OLTP, Oracle, MS Access 2007, MS Excel, XML, Microsoft Visio, UML, OLAP, Unix
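The scikit-learn model comparison described in the last bullet (decision tree, logistic regression, gradient boosting) can be sketched as follows; the toy dataset and hyperparameters are illustrative assumptions, not details from the original work:

```python
# Illustrative cross-validated comparison of the classifiers named above
# on a synthetic dataset generated by scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10,
                           n_informative=5, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

Comparing mean cross-validated accuracy like this is one common way to pick a model family before tuning it further.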