Sr. Data Scientist Resume
Cherry Hill, NJ
SUMMARY:
- 7+ years of working experience as a Sr. Data Scientist and Data Analyst.
- Leverage a wide range of data analysis, machine learning and statistical modeling algorithms and methods to solve business problems.
- Experience working in Agile Scrum Software Development.
- Knowledge in Cloud services such as Microsoft Azure and Amazon AWS.
- Knowledge about Big Data toolkits like Mahout, Spark ML, H2O.
- Professional working experience with machine learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering and Association Rules.
- Deep expertise in statistical analysis, data mining and machine learning using R, Python and SQL.
- Proficient in managing the entire data science project life cycle and actively involved in all phases of the project.
- Knowledge of test-driven development with py.test/Nose, and of GPU environments.
- Hands-on with Spark MLlib utilities including classification, regression, clustering, collaborative filtering and dimensionality reduction.
- Strong skills in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, Support Vector Machine, K-Nearest-Neighbors, K-means Clustering, Neural networks, Ensemble Methods.
- Working experience implementing machine learning algorithms using MLlib and Mahout in the Hadoop ecosystem (HDFS, MapReduce, HiveQL) and the Apache Spark framework (Spark SQL, PySpark).
- Hands-on experience with Hadoop, deep learning, text analytics and IBM Data Science Workbench tools.
- Hands on experience in Data Governance, Data Mining, Data Analysis, Data Validation, Predictive modeling, Data Lineage and Data Visualization in all the phases of the Data Science Life Cycle.
- Extensively worked for data analysis using R Studio, SQL, Tableau and other BI tools.
- Work with large, complex datasets, including structured, semi-structured and unstructured data, to discover meaningful business insights.
- Knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
- Highly skilled in statistical analysis using R, SPSS, MATLAB and Excel.
- Experience working with SAS Language for validating data and generating reports.
- Experience working with web technologies such as HTML, CSS and R Shiny.
- Strong Data Analysis skills using business intelligence, SQL and MS Office Tools.
- Strong SQL Server programming skills, with experience working with functions, packages and triggers.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks and Principal Component Analysis, with good knowledge of Recommender Systems.
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
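Several of the algorithms listed above, logistic regression for example, can be sketched in a few lines. A minimal from-scratch illustration on hypothetical toy data (standard library only; a sketch, not production code):

```python
import math

def train_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit a 1-D logistic regression y ~ sigmoid(w*x + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid probability
            gw += (p - y) * x   # gradient of log-loss w.r.t. w
            gb += (p - y)       # gradient of log-loss w.r.t. b
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0

# Hypothetical, linearly separable toy data
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
```

In practice this same model would be fit with a library (e.g. scikit-learn or Spark MLlib); the sketch only shows the underlying mechanics.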
TECHNICAL SKILLS:
Machine Learning: Linear Regression, Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors, K-Means, Avro, MLbase
Data Science tool: R 3.5.0, Python 3.6.5, MATLAB
Database: Oracle 12c, MS Access 2016, SQL Server 2017, Sybase and DB2, Teradata r15, Hive 2.3
Data Modeling: Erwin 9.7, ER/Studio, Star Schema Modeling, Snowflake Schema Modeling, FACT and dimension tables, Pivot Tables.
Big Data: Hadoop 3.0, Spark 2.3, Hive 2.3, Cassandra 3.11, MongoDB 3.6, MapReduce, Sqoop.
Operating Systems: Microsoft Windows 9x/NT/2000/XP/Vista/7/8/10 and Unix.
BI Tools: Tableau, Tableau server, Tableau Reader, SAP Business Objects, Crystal Reports
Languages: SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, Python 3.6.5, C, C++, Java, HTML5, UNIX shell scripting, Perl
Applications: Toad for Oracle, Oracle SQL Developer, MS Word 2017, MS Excel 2016, MS PowerPoint 2017, Teradata r15.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Ralph Kimball and Bill Inmon, Waterfall Model.
PROFESSIONAL EXPERIENCE:
Confidential - Cherry Hill, NJ
Sr. Data Scientist
Responsibilities:
- Working as a Data Scientist, developing predictive models, forecasts and analyses to turn data into actionable solutions.
- Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
- Led data discovery, handling structured and unstructured data, cleaning and performing descriptive analysis, and storing as normalized tables for dashboards.
- Designed predictive models using the H2O machine learning platform and its Flow UI.
- Used the Agile Scrum methodology across the phases of the software development life cycle.
- Developed MapReduce/Spark modules for machine learning & predictive analytics in Hadoop on AWS.
- Implemented classification using supervised algorithms like Logistic Regression, Decision Trees, KNN and Naive Bayes.
- Utilized various supervised and unsupervised machine learning algorithms and software to perform NLP tasks and compare their performance.
- Supported the client by developing machine learning algorithms on Big Data using PySpark to analyze transaction fraud, perform cluster analysis, etc.
- Used Spark MLlib for test data analytics and analyzed performance to identify bottlenecks.
- Evaluated the performance of various algorithms/models/strategies on real-world data sets.
- Regularly used JIRA and other internal issue trackers for project development.
- Communicated findings to team members, leadership and stakeholders to ensure models are well understood and incorporated into business processes.
Environment: Spark 2.0.2, MLbase, PySpark, AWS, Agile, SAS, Jira, ODS, MapReduce, regression, logistic regression, random forest, neural networks, Avro, NLTK, XML, MLlib, Git & JSON.
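One of the supervised classifiers named in this role, Naive Bayes, can be illustrated in plain Python. A minimal Gaussian Naive Bayes on hypothetical two-feature data (a sketch of the technique, not the PySpark production code):

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate per-class feature means, variances and priors."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        vars_ = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9  # small floor avoids /0
                 for c, m in zip(cols, means)]
        model[label] = (means, vars_, len(rows) / len(X))
    return model

def predict_gnb(model, row):
    """Return the class with the highest Gaussian log posterior."""
    best, best_lp = None, float("-inf")
    for label, (means, vars_, prior) in model.items():
        lp = math.log(prior)
        for x, m, v in zip(row, means, vars_):
            lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical two-feature toy data
X = [[1.0, 2.0], [1.2, 1.8], [4.0, 5.0], [4.2, 5.1]]
y = [0, 0, 1, 1]
model = fit_gnb(X, y)
```

At cluster scale the equivalent would be Spark MLlib's `NaiveBayes` estimator; the logic above is only the underlying idea in miniature.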
Confidential - South Portland, ME
Sr. Data Scientist
Responsibilities:
- Led the full machine learning system implementation process: collecting data, model design, feature selection, system implementation and evaluation.
- Worked with Machine learning algorithms like Regressions (linear, logistic), SVMs and Decision trees.
- Developed a Machine Learning test-bed with different model learning and feature learning algorithms.
- Through systematic search, demonstrated performance surpassing the deep learning state of the art.
- Used text mining and NLP techniques to gauge sentiment about the organization.
- Developed unsupervised machine learning models in the Hadoop/Hive environment on AWS EC2 instance.
- Used clustering technique K-Means to identify outliers and to classify unlabeled data.
- Worked with data sets of varying size and complexity, including both structured and unstructured data.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation and visualization; also performed gap analysis.
- Used the R programming language to graphically explore the datasets and gain insight into the nature of the data.
- Implemented predictive analytics and machine learning algorithms to forecast key metrics, deployed as dashboards on AWS (S3/EC2) and the Django platform for the company's core business.
- Performed data wrangling to clean, transform and reshape the data using the NumPy and Pandas libraries.
- Contributed to data mining architectures, modeling standards, reporting and data analysis methodologies.
- Conducted research and made recommendations on data mining products, services, protocols and standards in support of procurement and development efforts.
- Involved in defining the Source to Target data mappings, Business rules, data definitions.
- Worked with different data science teams and provided data as required on an ad-hoc basis.
- Assisted both application engineering and data scientist teams in mutual agreements/provisions of data.
Environment: R Studio 3.5.1, AWS S3, NLP, EC2, Neural Networks, SVM, Decision Trees, MLbase, ad-hoc, Mahout, NoSQL, PL/SQL, MDM, MLlib & Git.
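The K-Means-based outlier identification mentioned in this role follows a simple pattern: cluster the data, then flag points far from every centroid. A minimal sketch on hypothetical 2-D data (plain Python, deterministic initialization for reproducibility; real work would use a library implementation):

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm on 2-D points; first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(vals) / len(cl) for vals in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

def outliers(points, centroids, cutoff):
    """Flag points farther than `cutoff` from every centroid."""
    return [p for p in points
            if min(math.dist(p, c) for c in centroids) > cutoff]

# Hypothetical data: two tight clusters, then one stray observation
normal = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents = kmeans(normal, 2)
flagged = outliers(normal + [(50, 50)], cents, cutoff=5)
```

The distance cutoff is the key tuning knob; in practice it would be chosen from the distribution of within-cluster distances rather than fixed by hand.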
Confidential - St. Louis, MO
Data Scientist
Responsibilities:
- Responsible for retrieving data from the database using SQL/Hive queries and performing analysis enhancements.
- Used R, SAS and SQL to manipulate data, and develop and validate quantitative models.
- Worked as an RLC (Regulatory and Legal Compliance) team member and undertook user stories (tasks) with critical deadlines in an Agile environment.
- Applied regression to estimate the probability of an agent's location relative to the insurance policies sold.
- Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP to analyze the data and prepare programs.
- Performed various statistical tests to give the client a clear understanding of the data.
- Actively involved in analysis, development and unit testing of the data and delivery assurance of user stories in an Agile environment.
- Cleaned data by analyzing and eliminating duplicate and inaccurate data using R.
- Experience retrieving unstructured data from different sites in formats such as HTML and XML.
- Worked with data frames and other data interfaces in R for retrieving and storing data.
- Responsible for ensuring the data is accurate and free of outliers.
- Applied various machine learning algorithms such as Decision Trees, K-Means, Random Forests and Regression in R with the required packages installed.
- Applied the K-Means algorithm to determine an agent's position based on the collected data.
- Read data from various files, including HTML, CSV and sas7bdat formats, using SAS/Python.
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Coded, tested, debugged, implemented and documented data using R.
- Researched multi-layer classification algorithms and built a Natural Language Processing model through ensembling.
- Worked with Quality Control teams to develop test plans and test cases.
Environment: R 3.5, Decision Trees, K-Means, Random Forests, Microsoft Excel, Agile, SAS, SQL, NLP
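The regression work in this role (done in R) reduces, in its simplest form, to an ordinary least-squares fit. A closed-form sketch on hypothetical data, shown in Python rather than R purely for consistency with the other snippets:

```python
def fit_ols(xs, ys):
    """Closed-form simple linear regression: y ~ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)            # variance term
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # covariance term
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_ols(xs, ys)
```

In R the same fit is `lm(y ~ x)`; the point of the sketch is only the underlying normal-equation arithmetic.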
Confidential
Data Analyst/Data Scientist
Responsibilities:
- Worked closely with data scientists to assist on feature engineering, model training frameworks, and model deployments implementing documentation discipline.
- Involved with Data Analysis primarily Identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
- Worked with the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
- Performed data testing, tested ETL mappings (Transformation logic), tested stored procedures, and tested the XML messages.
- Created Use cases, activity report, logical components to extract business process flows and workflows involved in the project using Rational Rose, UML and Microsoft Visio.
- Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
- Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions and Incremental loading and unit tested the mappings.
- Wrote test cases and developed test scripts using SQL and PL/SQL for UAT.
- Created and modified T-SQL queries per business requirements, and worked on role-playing dimensions, factless fact tables, and snowflake and star schemas.
- Wrote, executed and performance-tuned SQL queries for data analysis and profiling, including complex queries using joins, subqueries and correlated subqueries.
- Wrote complex SQL queries to validate data against different kinds of reports generated by Business Objects XI R2.
- Developed data mapping, transformation and cleansing rules for the Master Data Management architecture, involving OLTP, ODS and OLAP.
- Performed Decision Tree analysis and Random Forests for strategic planning and forecasting, manipulating and cleaning data using the dplyr and tidyr packages in R.
- Involved in data analysis and creating data mapping documents to capture source to target transformation rules.
- Extensively used SQL, T-SQL and PL/SQL to write stored procedures, functions, packages and triggers.
- Prepared data analysis reports weekly, biweekly and monthly using MS Excel, SQL and Unix.
- Applied various machine learning algorithms and statistical models such as decision trees, logistic regression and Gradient Boosting Machines to build predictive models using the scikit-learn package in Python.
Environment: Python 2.7, T-SQL, SSIS, SSRS, SQL, PL/SQL, OLTP, Oracle, MS Access 2007, MS Excel, XML, Microsoft Visio, UML, OLAP, Unix
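The SCD Type 2 dimension loads described above follow a standard pattern: on an attribute change, expire the current row and insert a new version. A minimal in-memory sketch (column names `key`, `attr`, `valid_from`, `valid_to` are illustrative only; a real load would do this in the ETL tool or SQL MERGE):

```python
from datetime import date

def apply_scd2(dimension, incoming, today):
    """SCD Type 2 upsert: close the current row on change, append a new version.

    Rows are dicts; valid_to is None for the current version of a member.
    """
    current = {r["key"]: r for r in dimension if r["valid_to"] is None}
    for rec in incoming:
        cur = current.get(rec["key"])
        if cur is None:                    # brand-new dimension member
            dimension.append({**rec, "valid_from": today, "valid_to": None})
        elif cur["attr"] != rec["attr"]:   # tracked attribute changed
            cur["valid_to"] = today        # expire the old version
            dimension.append({**rec, "valid_from": today, "valid_to": None})
    return dimension

# Hypothetical dimension: one member whose attribute changes on reload
dim = [{"key": 1, "attr": "NJ", "valid_from": date(2020, 1, 1), "valid_to": None}]
dim = apply_scd2(dim, [{"key": 1, "attr": "ME"}], date(2021, 6, 1))
```

SCD Type 1 is the degenerate case: overwrite `attr` in place and keep a single row per key, trading history for simplicity.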
Confidential
Jr. Data Analyst
Responsibilities:
- Gathered and translated business requirements into detailed, production-level technical specifications, new features, and enhancements to existing technical business functionality.
- Data analysis and reporting using MS PowerPoint, MS Access and SQL Assistant.
- Generated periodic reports based on statistical analysis of the data using SQL Server Reporting Services.
- Used advanced Microsoft Excel to create pivot tables and applied VLOOKUP and other Excel functions.
- Worked on CSV files while taking input from the MySQL database.
- Created functions, triggers, views and stored procedures using MySQL.
- Worked on database testing, wrote complex SQL queries to verify the transactions and business logic.
- Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Developed SQL scripts for creating tables, sequences, triggers, views and materialized views.
- Compiled data from various public and private databases to perform complex analysis and data manipulation for actionable results.
- Used and maintained database in MS SQL Server to extract the data inputs from internal systems.
- Interacted with the Client and documented the Business Reporting needs to analyze the data.
- Used SAS for pre-processing data, SQL queries, data analysis, generating reports, graphics, and statistical analyses.
- Migrated databases from legacy systems and SQL Server to Oracle.
- Performed data analysis and statistical analysis, and generated reports, listings and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect and SAS/Access.
- Developed data mapping documentation to establish relationships between source and target tables, including transformation processes, using SQL.
- Used the Waterfall methodology across the phases of the software development life cycle.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation and visualization; also performed gap analysis.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions and data life cycle management.
- Extensive data cleansing and analysis using pivot tables, formulas (VLOOKUP and others), data validation, conditional formatting, and graph and chart manipulation in Excel.
- Created pivot tables and charts using worksheet data and external resources, modified pivot tables, sorted items and group data, and refreshed and formatted pivot tables.
- Developed SQL queries to fetch complex data from different tables in remote databases using joins, database links and bulk collects.
Environment: Erwin 9.0, SDLC, MS PowerPoint, MS Access, MS SQL Server 2008, SAS, Oracle 11g, Microsoft Excel
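The pivot-table work described in this role is, at bottom, a group-and-aggregate. A minimal sketch of a sum-aggregating pivot on hypothetical sales rows (plain Python; Excel or pandas `pivot_table` would be used in practice):

```python
from collections import defaultdict

def pivot(rows, index, column, value):
    """Minimal pivot table: sum `value` grouped by (`index`, `column`)."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[index]][r[column]] += r[value]
    return {k: dict(v) for k, v in table.items()}

# Hypothetical sales rows (field names are illustrative)
rows = [
    {"region": "East", "year": 2007, "sales": 100.0},
    {"region": "East", "year": 2008, "sales": 150.0},
    {"region": "West", "year": 2007, "sales": 80.0},
    {"region": "East", "year": 2007, "sales": 25.0},
]
summary = pivot(rows, "region", "year", "sales")
```

Swapping the `+=` accumulator for `max`, a count, or a running mean gives the other common pivot aggregations.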