Sr. Data Scientist Resume
Franklin Lakes, NJ
PROFESSIONAL SUMMARY:
- Around 5 years of working experience as a Data Scientist and Data Analyst.
- Leverage a wide range of data analysis, machine learning and statistical modeling algorithms and methods to solve business problems.
- Experience working in Agile Scrum Software Development.
- Knowledge of cloud services such as Microsoft Azure and Amazon Web Services (AWS).
- Knowledge about Big Data toolkits like Mahout, Spark ML, H2O.
- Professional working experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Random Forests, Decision Trees, K-Means Clustering and Association Rules.
- Deep expertise with Statistical Analysis, Data mining and Machine Learning Skills using R, Python and SQL.
- Proficient in managing the entire data science project life cycle and actively involved in all phases of the project.
- Hands-on experience with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering and dimensionality reduction.
- Strong skills in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, Support Vector Machine, K-Nearest-Neighbors, K-Means Clustering, Neural Networks and Ensemble Methods (see the scikit-learn sketch after this summary).
- Working experience implementing Machine Learning algorithms using MLlib and Mahout in the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL and PySpark.
- Hands-on experience with Hadoop, Deep Learning, Text Analytics and IBM Data Science Workbench tools.
- Hands-on experience in Data Governance, Data Mining, Data Analysis, Data Validation, Predictive Modeling, Data Lineage and Data Visualization in all phases of the Data Science Life Cycle.
- Worked extensively on data analysis using RStudio, SQL, Tableau and other BI tools.
- Experience working with large, complex datasets that include structured, semi-structured and unstructured data to discover meaningful business insights.
- Knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
- Highly skilled in statistical analysis using R, SPSS, MATLAB and Excel.
- Experience working with SAS Language for validating data and generating reports.
- Experience working with web technologies such as HTML, CSS and R Shiny.
- Strong Data Analysis skills using business intelligence, SQL and MS Office Tools.
- Strong SQL Server programming skills, with experience in working with functions, packages and triggers.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks and Principal Component Analysis, with good knowledge of Recommender Systems.
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
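For illustration, a minimal scikit-learn sketch of the kind of supervised classification workflow named in the summary above; the dataset is synthetic and every parameter below is an assumption, not a record of project code:

```python
# Illustrative only: two of the classifiers named above, fit and compared
# on a synthetic dataset (all data and parameters are assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=100, random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__, accuracy_score(y_test, model.predict(X_test)))
```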
TECHNICAL SKILLS:
Machine Learning: Linear Regression, Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors, K-Means, Avro, MLbase
Data Science Tools: R 3.5.0, Python 3.6.5 and MATLAB
Databases: Oracle 12c, MS Access 2016, SQL Server 2017, Sybase, DB2, Teradata R15, Hive 2.3.
Data Modeling Tools: Erwin 9.7, ER/Studio, Star-Schema Modeling, Snowflake Schema Modeling, FACT and dimension tables, Pivot Tables.
Big Data: Hadoop 3.0, Spark 2.3, Hive 2.3, Cassandra 3.11, MongoDB 3.6, MapReduce, Sqoop.
Operating Systems: Microsoft Windows 9x/NT/2000/XP/Vista/7/8/10, UNIX, Solaris
BI Tools: Tableau, Tableau server, Tableau Reader, SAP Business Objects, Crystal Reports
Languages: SQL, PL/SQL, T-SQL, ASP, Visual Basic, XML, Python 3.6.5, C, C++, Java, HTML5, UNIX shell scripting, Perl
Applications: Toad for Oracle, Oracle SQL Developer, MS Word 2017, MS Excel 2016, MS PowerPoint 2017, Teradata R15.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Ralph Kimball and Bill Inmon, Waterfall Model.
PROFESSIONAL EXPERIENCE:
Confidential - Franklin Lakes, NJ
Sr. Data Scientist
Responsibilities:
- Working as a Data Scientist, extracting and preparing data according to business requirements.
- Led data discovery, handling structured and unstructured data, cleaning and performing descriptive analysis, and preparing data sets.
- Worked with Machine learning algorithms like Regressions (linear, logistic), SVMs and Decision trees.
- Extracted text data from XML files and performed topic modeling on it (see the LDA sketch after this list).
- Worked extensively with IVR developers and Business Analysts to understand the IVR call flow.
- Developed MapReduce/Spark modules for machine learning & predictive analytics in Hadoop.
- Similarly, extracted useful data from CSV files and performed analytics on it.
- Merged and matched different data sets from different data sources.
- Worked extensively with Python to optimize code for better performance.
- Performed analytics on the final data set and gave inputs to coreAI for improving IVR performance.
- Worked in SAS programming to audit data, develop data sets, perform data validation QA and improve the efficiency of SAS programs.
- Explained and validated results and insights, applied tools for client analysis, and documented SAS files and library structures.
- Understood customer business use cases and translated them into analytical data applications and models, with a vision for how to implement them.
- Evaluated the performance of various topic modeling algorithms using text analytics/mining.
- Communicated with team members, leadership and the Director on findings to ensure models are well understood and incorporated into business processes.
Environment: Spyder, Jupyter, Python, SAS, Teradata SQL Assistant, LDA, Teradata ODBC, NLP, Text Analytics/Mining, Clustering, Hadoop, SAP, Teradata, Virtual Machine.
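A minimal sketch of the XML-to-topic-model flow described in the bullets above, here with scikit-learn's LDA; the XML layout, tag names, sample text and topic count are all hypothetical:

```python
# Hypothetical XML layout and tag names; the flow (parse XML, vectorize
# text, fit LDA, inspect topics) mirrors the bullets above.
import xml.etree.ElementTree as ET
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

xml_data = """<calls>
  <call><transcript>billing question about last month's invoice</transcript></call>
  <call><transcript>reset my account password please</transcript></call>
  <call><transcript>invoice charge dispute billing</transcript></call>
</calls>"""

# Pull the free-text field out of each record (tag names are invented).
docs = [c.findtext("transcript") for c in ET.fromstring(xml_data).iter("call")]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)  # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

# Show the top words for each discovered topic.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    print(f"topic {i}:", [terms[j] for j in topic.argsort()[-3:]])
```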
Confidential - Portland, OR
Sr. Data Scientist
Responsibilities:
- Worked as Data Scientist and developed predictive models, forecasts and analyses to turn data into actionable solutions.
- Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
- Led data discovery, handling structured and unstructured data, cleaning and performing descriptive analysis, and storing as normalized tables for dashboards.
- Designed predictive models using the H2O machine learning platform and its Flow UI.
- Used the Agile Scrum methodology to build the different phases of the software development life cycle.
- Developed MapReduce/Spark modules for machine learning & predictive analytics in Hadoop on AWS.
- Implemented classification using supervised algorithms like Logistic Regression, Decision trees, KNN, and Naive Bayes.
- Utilized various new supervised and unsupervised machine learning algorithms/software to perform NLP tasks and compared performances.
- Supported the client by developing Machine Learning algorithms on Big Data using PySpark to analyze transaction fraud, perform cluster analysis, etc. (see the PySpark sketch after this list).
- Used Spark with MLlib for test data analytics and analyzed performance to identify bottlenecks.
- Evaluated the performance of various algorithms/models/strategies based on real-world data sets.
- Regularly used JIRA and other internal issue trackers for project development.
- Communicated with team members, leadership and stakeholders on findings to ensure models are well understood and incorporated into business processes.
Environment: Spark 2.0.2, MLbase, PySpark, AWS, Agile, SAS, Jira, ODS, MapReduce, regression, logistic regression, random forest, neural networks, Avro, NLTK, XML, MLlib, Git & JSON.
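A minimal PySpark sketch of the kind of MLlib classification described above; the schema, column names and sample rows are invented for illustration:

```python
# Invented schema and rows; shows the assemble-features-then-fit pattern
# used for MLlib classifiers such as logistic regression.
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fraud-sketch").getOrCreate()

# Tiny in-memory stand-in for a transactions table (hypothetical columns).
df = spark.createDataFrame(
    [(120.0, 3, 0.0), (9800.0, 41, 1.0), (45.5, 1, 0.0), (7600.0, 37, 1.0)],
    ["amount", "txns_per_day", "is_fraud"],
)

assembler = VectorAssembler(inputCols=["amount", "txns_per_day"],
                            outputCol="features")
train = assembler.transform(df)

model = LogisticRegression(featuresCol="features", labelCol="is_fraud").fit(train)
model.transform(train).select("is_fraud", "prediction").show()

spark.stop()
```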
Confidential - Monroe, LA
Sr. Data Scientist
Responsibilities:
- Led the full machine learning system implementation process: collecting data, model design, feature selection, system implementation and evaluation.
- Worked with Machine learning algorithms like Regressions (linear, logistic), SVMs and Decision trees.
- Developed a Machine Learning test-bed with different model learning and feature learning algorithms.
- Through a thorough systematic search, demonstrated performance surpassing the state of the art (deep learning).
- Used Text Mining and NLP techniques to find the sentiment about the organization.
- Developed unsupervised machine learning models in the Hadoop/Hive environment.
- Used the K-Means clustering technique to identify outliers and to classify unlabeled data (see the sketch after this list).
- Worked with data-sets of varying degrees of size and complexity including both structured and unstructured data.
- Participated in all phases of Data mining, Data cleaning, Data collection, developing models, Validation, Visualization and Performed Gap analysis.
- Used the R programming language to graphically examine the datasets and gain insights into the nature of the data.
- Implemented Predictive analytics and machine learning algorithms to forecast key metrics for the company's core business.
- Performed data wrangling to clean, transform and reshape the data using the NumPy and pandas libraries.
- Contribute to data mining architectures, modeling standards, reporting, and data analysis methodologies.
- Conduct research and make recommendations on data mining products, services, protocols, and standards in support of procurement and development efforts.
- Involved in defining the Source to Target data mappings, Business rules, data definitions.
- Worked with different data science teams and provided the respective data as required on an ad-hoc basis.
- Assisted both application engineering and data scientist teams in mutual agreements/provisions of data.
Environment: RStudio 3.5.1, AWS S3, NLP, EC2, Neural networks, SVM, Decision trees, MLbase, ad-hoc, Mahout, NoSQL, PL/SQL, MDM, MLlib & Git.
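A minimal Python sketch of using K-Means centroid distances to flag outliers, as in the clustering bullet above; the data, cluster count and threshold are assumptions:

```python
# Assumed data and threshold; points far from their assigned centroid
# are flagged as outliers.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # dense cluster near the origin
               rng.normal(8, 1, (100, 2)),   # second dense cluster
               [[20.0, 20.0]]])              # one obvious outlier

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance from each point to its assigned cluster centroid.
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Flag points beyond the 99th percentile of distances as outliers.
threshold = np.quantile(dists, 0.99)
print("outlier rows:", np.where(dists > threshold)[0])
```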
Confidential - NY
Jr. Data Scientist
Responsibilities:
- Responsible for retrieving data from the database using SQL/Hive queries and performing analysis enhancements.
- Used R, SAS and SQL to manipulate data, and develop and validate quantitative models.
- Worked as a RLC (Regulatory and Legal Compliance) Team Member and undertook user stories (tasks) with critical deadlines in Agile Environment.
- Applied regression to identify the probability of an agent's location with respect to the insurance policies sold.
- Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP to analyze the data and prepare programs.
- Performed various statistical tests to give the client a clear understanding.
- Actively involved in Analysis, Development and Unit testing of the data and delivery assurance of the user story in Agile Environment.
- Cleaned data by analyzing and eliminating duplicate and inaccurate data using R.
- Retrieved unstructured data from different sites in formats such as HTML and XML.
- Worked with data frames and other data interfaces in R to retrieve and store data.
- Responsible for making sure the data is accurate, with no outliers.
- Applied various machine learning algorithms such as Decision Trees, K-Means, Random Forests and Regression in R with the required packages installed.
- Applied the K-Means algorithm to determine the position of an agent based on the data collected.
- Read data from various file types, including HTML, CSV and sas7bdat, using SAS and Python (see the pandas sketch after this list).
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Coded, tested, debugged, implemented and documented data using R.
- Researched multi-layer classification algorithms and built a Natural Language Processing model through ensembling.
- Worked with Quality Control Teams to develop Test Plan and Test Cases.
Environment: R 3.5, Decision Trees, K-Means, Random Forests, Microsoft Excel, Agile, SAS, SQL, NLP, Visio, UML, OLAP, Unix
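A minimal pandas sketch of reading the file types named above; the file names are hypothetical placeholders and the files are assumed to exist:

```python
# Hypothetical file names; pandas provides a reader for each file type
# mentioned above (CSV, SAS sas7bdat, and HTML tables).
import pandas as pd

csv_df = pd.read_csv("agents.csv")          # comma-separated data
sas_df = pd.read_sas("policies.sas7bdat")   # SAS binary dataset
html_tables = pd.read_html("report.html")   # list of tables found in the page

print(csv_df.shape, sas_df.shape, len(html_tables))
```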
Confidential
Data Analyst
Responsibilities:
- Gathered and translated business requirements into detailed, production-level technical specifications, new features, and enhancements to existing technical business functionality.
- Data analysis and reporting using MS PowerPoint, MS Access and SQL Assistant.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services.
- Used advanced Microsoft Excel to create pivot tables and used VLOOKUP and other Excel functions.
- Worked on CSV files while trying to get input from the MySQL database.
- Created functions, triggers, views and stored procedures using MySQL.
- Worked on database testing, wrote complex SQL queries to verify the transactions and business logic.
- Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Developed SQL scripts for creating tables, Sequences, Triggers, views and materialized views.
- Compiled data from various sources, both public and private databases, to perform complex analysis and data manipulation for actionable results.
- Used and maintained database in MS SQL Server to extract the data inputs from internal systems.
- Interacted with the Client and documented the Business Reporting needs to analyze the data.
- Used SAS for pre-processing data, SQL queries, data analysis, generating reports, graphics, and statistical analyses.
- Migrated databases from legacy systems and SQL Server to Oracle.
- Performed data analysis and statistical analysis, and generated reports, listings and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect and SAS/Access.
- Developed data mapping documentation to establish relationships between source and target tables including transformation processes using SQL.
- Used the Waterfall methodology to build the different phases of Software development life cycle.
- Participated in all phases of data mining, data collection, data cleaning, developing models, validation and visualization, and performed gap analysis.
- Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data life cycle management.
- Performed extensive data cleansing and analysis using pivot tables, formulas (VLOOKUP and others), data validation, conditional formatting, and graph and chart manipulation in Excel.
- Created pivot tables and charts using worksheet data and external resources; modified pivot tables, sorted items, grouped data, and refreshed and formatted pivot tables (see the pandas sketch after this list).
- Developed SQL queries to fetch complex data from different tables in remote databases using joins, database links and bulk collects.
Environment: Erwin 9.0, SDLC, MS PowerPoint, MS Access, MS SQL Server 2008, SAS, Oracle 11g, Microsoft Excel
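A minimal pandas sketch reproducing the Excel-style pivot-table summaries described above; the columns and figures are invented sample data:

```python
# Invented sample data; pivot_table mirrors an Excel pivot table with
# rows = region, columns = quarter, values = summed revenue.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "revenue": [100.0, 120.0, 90.0, 110.0, 95.0],
})

pivot = sales.pivot_table(index="region", columns="quarter",
                          values="revenue", aggfunc="sum", fill_value=0)
print(pivot)
```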