Data Scientist Resume
New York
SUMMARY:
- 8+ years of experience in Data Analysis, Data Integration, Migration, Master Data Management (MDM), and Configuration Management.
- Experience in Analysis, Design, Development, Deployment and Maintenance of critical software.
- Developed various machine learning applications with the Python scientific stack and R.
- Experienced with machine learning and deep learning frameworks such as scikit-learn and TensorFlow.
- Experienced data analyst with a solid understanding of data mapping, data warehousing (OLTP and OLAP systems), data mining, data governance, and data management services with quality assurance.
- Adept in statistical data analysis, exploratory data analysis, machine learning, data mining, and data visualization using R, Python, Base SAS, SAS Enterprise Guide, SAS Enterprise Miner, Tableau, and SQL.
- Proficient in Tableau and R Shiny data visualization tools for analyzing large datasets and creating visually powerful, actionable interactive reports and dashboards.
- Strong working experience in the financial and insurance industries, with substantial knowledge of front-end, back-end, and overall end-to-end processes.
- Proficient in data mining tools such as R, SAS, Python, SQL, Excel, and Java.
- Proficient in preparing ETL mappings (Source-Stage, Stage-Integration, ISD), requirements gathering, data reporting, data visualization, and advanced business dashboards, and in presenting results to clients.
- Worked with NoSQL databases including HBase, Cassandra, and MongoDB.
- Extensive experience in Object-Oriented Analysis and Design (OOAD) techniques with UML, using flow charts, use cases, class diagrams, sequence diagrams, activity diagrams, and state transition diagrams.
- Strong working knowledge of SQL, SQL Server, Oracle, SAS, Tableau, and Jupyter, along with MATLAB applications across multiple projects.
- Experience in data scaling, data wrangling, and data visualization in R, Python, SAS, and Tableau (see the sketch after this list).
- Strong SQL query skills and Spark experience, including designing and verifying databases using Entity-Relationship Diagrams (ERDs) and data profiling using queries, dashboards, macros, etc.
- Worked closely with the QA team to execute test scenarios and plans, provide test data, create test cases, issue STRs upon identification of bugs, and collect test metrics.
- Experience performing user acceptance testing (UAT) and end-to-end testing, monitoring test results, and escalating issues based on priority.
- Experience working with Waterfall and Agile methodologies, consistently delivering high-quality output.
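As a small illustration of the data wrangling and visualization claim above, here is a minimal Python sketch (pandas, scikit-learn, matplotlib): load a dataset, clean and scale it, and plot one column. The file name "claims.csv" and its columns are hypothetical placeholders, not details from the resume.

```python
# Minimal sketch of a wrangle-scale-visualize pass; the dataset and
# column names are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("claims.csv")     # hypothetical dataset
df = df.dropna()                   # basic cleaning: drop incomplete rows

# Scale the numeric columns so downstream models see comparable ranges.
numeric_cols = df.select_dtypes("number").columns
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])

df["claim_amount"].hist(bins=30)   # quick visual sanity check
plt.title("Scaled claim amounts")
plt.savefig("claim_amounts.png")
```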
TECHNICAL SKILLS:
Languages: SQL, PL/SQL, Java, C, C++, XML, HTML, MATLAB, Python, R.
Statistical Analysis: R, Python, MATLAB, Minitab, Jupyter
Databases: SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 11g/10g/9i.
DWH / BI Tools: Microsoft Power BI, Tableau, SSAS, Visual Studio, Crystal Reports, Informatica, R-Studio.
Database Design Tools and Data Modeling: MS Visio, ERWIN, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques.
Tools and Utilities: Import & Export Wizard, Microsoft Management Console, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau.
PROFESSIONAL EXPERIENCE:
Confidential, New York
Data Scientist
Responsibilities:
- Performed data extraction, scaling, transformation, modeling, and visualization using R, SQL, and Tableau based on requirements.
- Performed ad-hoc reporting and customer profiling/segmentation using R and Python.
- Imported data from different sources into the R environment.
- Worked with text data and data from a variety of sources and file formats (CSV, HTML, XML, JSON) to analyze customer-product relationships.
- Used R packages such as tidyr, reshape2, lubridate, stringr, and validate to clean the data.
- Used R packages such as dplyr, data.table, plyr, and sqldf to transform the data into the desired formats.
- Worked extensively with Python libraries such as NumPy, pandas, and matplotlib.
- Implemented a logistic regression model using R's glm() function on the apps data.
- Built machine learning models such as decision tree classifiers with the party package and random forests with the randomForest package to solve classification problems in the data (see the sketch after this list).
- Applied clustering techniques such as k-means and hierarchical clustering, along with k-NN, for customer segmentation.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics on Hadoop on AWS.
- Designed a machine learning pipeline using Microsoft Azure Machine Learning for predictive and prescriptive analytics, and implemented a machine learning scenario for a given data problem.
- Designed and developed NLP models using the NLP and tm packages in R.
- Built neural networks using the neuralnet package in R.
- Classified market sentiment as positive or negative as part of sentiment analysis.
- Performed data mining and generated reports and dashboards with figures and graphs to monitor loan processing times and ensure the work pipeline was routed properly.
- Worked in an Agile model and attended daily scrums.
- Created SQL wellness-check scripts to verify that both stage and production environments receive data as intended, and addressed production issues.
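The modeling bullets above cite R's glm(), party, and randomForest; the sketch below shows an equivalent classification workflow in Python with scikit-learn, assuming a tabular dataset with numeric features. The file name "apps.csv" and target column "churned" are hypothetical placeholders, not details from the role.

```python
# Minimal sketch of the logistic-regression and random-forest workflow,
# using scikit-learn in place of R's glm()/party/randomForest.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

apps = pd.read_csv("apps.csv")        # hypothetical apps dataset
X = apps.drop(columns=["churned"])    # assumes numeric feature columns
y = apps["churned"]                   # hypothetical binary target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__)
    print(classification_report(y_test, model.predict(X_test)))
```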
Environment: Python, R, Tableau, Machine learning, SQL
Confidential, Jersey City
Data Analyst/Machine Learning
Responsibilities:
- Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, Scala, SQL, Git, MongoDB, and Hadoop.
- Used MLlib, Spark's machine learning library, to build and evaluate different models (see the sketch after this list).
- Used R and Python for exploratory data analysis, A/B testing, ANOVA, and hypothesis testing to compare and identify the effectiveness of creative campaigns.
- Created several ETL mapping documents (Source to Stage, Stage to Integration) for different SORs, making sure the column load rules, data types, and TDQ/BDQ rules were in place for all required source data fields.
- Created corresponding ISDs (Integration Specific Documents), including interface lists (inbound/outbound), detailed file specifications, and production support details (SLAs, servers, etc.).
- Experienced in developing complex Informatica mappings, with strong data warehousing concepts and an understanding of standard ETL transformation methodologies.
- Experienced with dimensional and relational database design and ETL lifecycle development using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager, and Workflow Monitor).
- Handled defects and escalations, making sure they were addressed on time by updating the corresponding documents (DDLs, mappings, model changes, etc.).
- Created SQL wellness-check scripts to verify that both stage and production environments receive data as intended, and addressed production issues.
- Performed data mining and generated reports and dashboards with figures and graphs to monitor loan processing times and ensure the work pipeline was routed properly.
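Since the first bullet names Spark MLlib, here is a minimal sketch of building and evaluating a model with the pyspark.ml API. The file name "campaigns.csv" and the columns "f1", "f2", and "label" are hypothetical placeholders.

```python
# Minimal sketch of building and evaluating a Spark MLlib model,
# assuming a CSV with numeric features "f1", "f2" and a binary "label".
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()
df = spark.read.csv("campaigns.csv", header=True, inferSchema=True)

# Assemble feature columns into the single vector column MLlib expects.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

model = LogisticRegression(labelCol="label").fit(train)
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(
    model.transform(test)
)
print(f"Test AUC: {auc:.3f}")
spark.stop()
```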
Environment: RStudio, Python, Spark, ETL, Informatica, SQL Server 2012/2014
Confidential, Grand Rapids, MI
Data Analyst
Responsibilities:
- Ran daily, weekly, and monthly reports from both SQL Server and Oracle data warehouses.
- Developed automated and customized reports, along with improved template formats, for various data metrics as part of reporting (see the sketch after this list).
- Worked with the marketing team and analyzed marketing data using Access/Excel and SQL tools to generate reports, tables, listings and graphs.
- Strong functional domain knowledge of data governance and data quality.
- Served as Scrum Master for two internal Agile projects, both automating manual processes to improve delivery timeliness and quality.
- Strong working knowledge of Excel for data mining, filtering, pivot tables, and formulas, including setting up database connections for automatic data refresh and publishing to SharePoint links.
- Conducted data loads using Informatica tools and handled performance tuning of Informatica jobs.
- Extensive experience in data migration, extraction, cleansing, and staging of operational sources using ETL (Informatica) processes.
- Analyzed data from many systems using ad-hoc queries and SQL scripts, and delivered comparisons, trends, statistics, error reports, and suggestions.
- Maintained the change control process, conducted thorough analysis of various parameters, and documented and presented findings to reporting managers.
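As an illustration of the automated reporting described above, here is a minimal Python sketch: pull a metric from SQL Server with pyodbc, pivot it with pandas, and export it to Excel. The DSN, table, and column names are hypothetical placeholders.

```python
# Minimal sketch of an automated report: query, pivot, export.
# The DSN "warehouse" and the sales_metrics table are hypothetical.
import pandas as pd
import pyodbc

conn = pyodbc.connect("DSN=warehouse")   # hypothetical ODBC DSN
df = pd.read_sql(
    "SELECT region, report_month, revenue FROM sales_metrics", conn
)

# Pivot into a region-by-month report and write it out for distribution.
report = df.pivot_table(
    index="region", columns="report_month", values="revenue", aggfunc="sum"
)
report.to_excel("monthly_revenue_report.xlsx")
```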
Environment: Informatica, PL/SQL, XML, Windows NT 4.0, SAS, UNIX shell scripting.
Confidential
Data Analyst
Responsibilities:
- Created new database objects such as procedures, functions, triggers, indexes, and views using T-SQL in development and production environments on SQL Server 2000.
- Actively participated in requirements gathering and system specification.
- Developed SQL queries to fetch complex data from tables in remote databases using joins and database links, formatted the results into reports, and kept logs.
- Strong understanding of Agile data warehouse development.
- Worked on complex T-SQL statements and implemented various code modules and functions.
- Installed, authored, and managed reports using SQL Server 2005 Reporting Services.
- Wrote Transact-SQL utilities to generate table INSERT and UPDATE statements.
- Developed and optimized database structures, stored procedures, DDL triggers and user-defined functions.
- Implemented new T-SQL features added in SQL Server 2005, including error handling through TRY...CATCH blocks and Common Table Expressions (CTEs); see the sketch after this list.
- Created stored procedures to transform data and worked extensively in T-SQL on the various transformations needed while loading data.
- Participated in developing prototype models and communicated results to the requesting individuals.
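All sketches in this document are in Python, so the SQL Server 2005 features named above (TRY...CATCH and a CTE) are shown here as a T-SQL batch executed through pyodbc. This is a minimal illustration, not code from the role: the DSN, table names, and columns are hypothetical placeholders.

```python
# Minimal sketch: run a T-SQL batch using TRY...CATCH and a CTE via pyodbc.
import pyodbc

TSQL = """
BEGIN TRY
    -- CTE: latest balance date per account, loaded into a summary table.
    WITH latest_balance AS (
        SELECT account_id, MAX(as_of_date) AS as_of_date
        FROM account_balances
        GROUP BY account_id
    )
    INSERT INTO account_summary (account_id, as_of_date)
    SELECT account_id, as_of_date FROM latest_balance;
END TRY
BEGIN CATCH
    -- Surface the T-SQL error details back to the caller.
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH
"""

conn = pyodbc.connect("DSN=sql2005")   # hypothetical DSN
cursor = conn.cursor()
cursor.execute(TSQL)
conn.commit()
```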
Environment: Python, SQL.
Confidential
Data Analyst
Responsibilities:
- Developed the ETL (SSIS) pipelines for data extraction.
- Developed software tools in Python to automatically scrutinize documents and electronic content (see the sketch after this list).
- Developed .NET-based applications for Microsoft Office document processing.
- Developed the database SQL schema for the data pipelines.
- Performed data analysis and produced reports for QA teams to prioritize issues.
- Strong functional domain knowledge of data governance and data quality, ensuring compliance standards were properly incorporated.
- Participated in requirements gathering and the development of value-adding use cases and applications in close cooperation with other intra-organizational units, product managers, and product development teams.
- Developed the SQL jobs for generating the analytic reports.
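As an illustration of the document-scrutiny tooling described above, here is a minimal Python sketch: walk a folder, scan text-like files for flagged patterns, and report the hits. The directory name and patterns are hypothetical placeholders.

```python
# Minimal sketch of automated document scrutiny: scan files for flagged
# patterns. The root directory and patterns are hypothetical placeholders.
import re
from pathlib import Path

FLAGGED = [re.compile(p, re.IGNORECASE)
           for p in (r"\bconfidential\b", r"\bSSN\b")]

def scan_documents(root: str):
    """Yield (file, pattern) pairs for every flagged match found."""
    for path in Path(root).rglob("*.txt"):
        text = path.read_text(errors="ignore")
        for pattern in FLAGGED:
            if pattern.search(text):
                yield path, pattern.pattern

for path, pattern in scan_documents("incoming_docs"):
    print(f"{path}: matched {pattern}")
```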
Environment: Python, SSIS, SQL, C#.