Data Scientist Resume
New York
PROFESSIONAL SUMMARY:
- Over 8 years of experience in Data Analysis, Data Integration, Migration, Master Data Management (MDM), and Configuration Management.
- Developed various machine learning applications with the Python scientific stack and R.
- Experienced with machine learning and deep learning frameworks such as scikit-learn, TensorFlow, and Keras.
- Experienced Data Analyst with a solid understanding of data mapping, data warehousing (OLTP, OLAP), data mining, data governance, and data management services with quality assurance.
- Adept in statistical data analysis, exploratory data analysis, machine learning, data mining, and data visualization using R, Python, Base SAS, SAS Enterprise Guide, SAS Enterprise Miner, Tableau, and SQL.
- Strong working experience in the financial and insurance industries, with substantial knowledge of front-end, back-end, and overall end-to-end processes.
- Proficient in data mining tools such as R, SAS, Python, SQL, and Excel, with staff leadership experience.
- Proficient in preparing ETL mappings (Source-to-Stage, Stage-to-Integration, ISD), requirements gathering, data reporting, data visualization, building advanced business dashboards, and presenting to clients.
- Extensive experience in Object-Oriented Analysis and Design (OOAD) techniques with UML, using flow charts, use cases, class diagrams, sequence diagrams, activity diagrams, and state transition diagrams.
- Strong working knowledge of SQL, SQL Server, Oracle, SAS, Tableau, and Jupyter, along with MATLAB applications in multiple projects.
- Expertise in handling various forms of data such as master data, metadata, and source data, with the ability to provide data analytics using various tools (Access, Excel, reporting tools, etc.) and to deliver qualitative and quantitative assessments of data.
- Experience in data scaling, data wrangling, and data visualization in R, Python, SAS, and Tableau.
- Strong SQL query skills and Spark experience; designed and verified databases using Entity-Relationship Diagrams (ERDs) and performed data profiling using queries, dashboards, macros, etc.
- Worked closely with the QA team in executing test scenarios and plans, providing test data, creating test cases, issuing STRs upon identification of bugs, and collecting test metrics.
- Experience performing user acceptance testing (UAT) and end-to-end testing, monitoring test results, and escalating issues based on priority.
- Experience working with Waterfall and Agile methodologies, consistently delivering high-quality output.
TECHNICAL SKILLS:
Languages: SQL, PL/SQL, Java, C, C++, XML, HTML, MATLAB, Python, R.
Statistical Analysis: R, Python, SAS Enterprise Miner 7.1, SAS programming, MATLAB, Minitab, Jupyter
Databases: SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 11g/10g/9i.
DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Pentaho, Kettle, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1, R-Studio.
Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques.
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Microsoft Management Console, Visual Source Safe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau.
PROFESSIONAL EXPERIENCE:
Confidential, New York
Data Scientist
Responsibilities:
- Data extraction, data scaling, data transformation, data modeling, and visualization using R, SQL, and Tableau, based on requirements.
- Adept in writing R scripts while working with Oracle R Enterprise (ORE).
- Performed ad-hoc reporting, customer profiling, and segmentation using R/Python (see the segmentation sketch after this list).
- Created different database schemas (Oracle) with several tables containing data related to application details, DB and OS details, asset configuration details, and server details, and developed several queries to obtain encryption-readiness results.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS (a PySpark sketch follows this list).
- Created several corresponding ISDs (Integration Specific Documents), which include the interface list (inbound/outbound), detailed file sizes, and production support details (SLAs, servers, etc.).
- Created a functional specification document for the Phase 3 work, including but not limited to Informatica requirements, architectural diagrams, ETL sequence diagrams, data mappings, quality management, use cases, and data reconciliation details.
- Performed wellness checks (SQL) to make sure both stage and production environments received data as intended, and addressed all production-related issues.
- Worked with various LOB Directors to identify third-party hosted applications and prioritized them by severity of risk and vulnerability using a risk assessment matrix.
- Worked on the Cyber Security Questionnaire, requesting responses from all corresponding Application Managers within the due date for each phase, based on risk rating (Critical, High, Medium, and Low).
- Worked closely with ADMs (Application Development Managers) for data-at-rest applications to determine DB and OS types and versions, then created a GTAC-supported compatibility matrix in Excel to show which applications had a supported solution and which needed an upgrade (software/hardware).
- Designed a machine learning pipeline using Microsoft Azure Machine Learning for predictive and prescriptive analytics, and implemented a machine learning scenario for a given data problem.
- Designed and developed neural-network NLP models for sentiment analysis (a Keras sketch follows this list).
- Experience with dimensional and relational database design, and ETL and life-cycle development using Informatica PowerCenter, Repository Manager, Designer, Workflow Manager, and Workflow Monitor.
- Strong work in data mining; responsible for generating reports and dashboards with numbers and graphs to ensure loan processing times were met and the work pipeline was properly routed.
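For illustration, a minimal sketch of the Python side of the segmentation work above, assuming scikit-learn's KMeans and hypothetical customer features (the actual Oracle/ORE sources are not shown):

```python
# Minimal customer-segmentation sketch (hypothetical feature names; the
# real data sources are not reproduced here).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Assume a customer table with numeric behavioral features.
customers = pd.DataFrame({
    "annual_spend":  [1200, 300, 4500, 800, 5200, 150],
    "visits_per_mo": [4, 1, 12, 3, 15, 1],
    "tenure_years":  [2, 1, 6, 3, 7, 1],
})

# Scale features so no single column dominates the distance metric.
scaled = StandardScaler().fit_transform(customers)

# Cluster customers into k segments; k would normally be chosen via
# the elbow method or silhouette scores.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(scaled)
print(customers.groupby("segment").mean())
```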
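Likewise, a hedged sketch of what one of the Spark Python predictive-analytics modules might look like, assuming Spark MLlib; the paths and column names are placeholders, not the originals:

```python
# Sketch of a PySpark predictive-analytics module of the kind described
# above; the S3 paths and columns are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("predictive-analytics").getOrCreate()

# Load feature data from a placeholder location.
df = spark.read.parquet("s3://example-bucket/features/")

# Assemble numeric columns into the single feature vector MLlib expects.
assembler = VectorAssembler(
    inputCols=["feature_a", "feature_b", "feature_c"],
    outputCol="features",
)
train = assembler.transform(df).select("features", "label")

# Fit a simple baseline classifier; in practice this would sit inside
# a Pipeline with cross-validation.
model = LogisticRegression(maxIter=20).fit(train)
model.write().overwrite().save("s3://example-bucket/models/baseline-lr")
```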
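And a minimal neural-network sentiment-analysis sketch in Keras, assuming an integer-encoded text corpus (toy stand-in data below, not the production models):

```python
# Minimal sentiment-analysis sketch (Keras); vocabulary size, sequence
# length, and training data here are assumptions for illustration.
import numpy as np
from tensorflow.keras import layers, models

VOCAB, MAXLEN = 10_000, 100  # assumed vocabulary size / sequence length

model = models.Sequential([
    layers.Embedding(VOCAB, 64),
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(1, activation="sigmoid"),  # positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Toy stand-in data: integer-encoded token sequences and 0/1 labels.
x = np.random.randint(1, VOCAB, size=(256, MAXLEN))
y = np.random.randint(0, 2, size=(256,))
model.fit(x, y, epochs=2, batch_size=32, validation_split=0.2)
```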
Environment: SQL Server, Oracle 10g/11g, MS Office, Informatica, ER Studio, XML, Hive, HDFS, Flume, Sqoop, R connector, C++, Python, R, Tableau 9.2
Confidential, Jersey City
Data Analyst / Machine Learning
Responsibilities:
- Worked on Phase 3 of the overall PPM program, ensuring that data from SOR systems answered the business monitoring questions and that data was available in the interim database in hourly cycles for reporting purposes.
- Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, Scala, SQL, Git, MongoDB, and Hadoop.
- Used R and Python for exploratory data analysis and A/B testing, working with HQL, VQL, a data lake, AWS Redshift, Oozie, and PySpark, and applied ANOVA and hypothesis tests to compare and identify the effectiveness of creative campaigns (see the hypothesis-test sketch after this list).
- Responsible for creating several ETL mapping documents (Source to Stage, Stage to Integration) for different SORs, making sure the column load rules, data types, and TDQ/BDQ rules were in place for all required source data fields.
- Created several corresponding ISDs (Integration Specific Documents), which include the interface list (inbound/outbound), detailed file sizes, and production support details (SLAs, servers, etc.).
- Created a functional specification document for the Phase 3 work, including but not limited to Informatica requirements, architectural diagrams, ETL sequence diagrams, data mappings, quality management, use cases, and data reconciliation details.
- Experienced in developing complex Informatica mappings; strong in data warehousing concepts and standard ETL transformation methodologies.
- Experience with dimensional and relational database design, and ETL and life-cycle development using Informatica PowerCenter, Repository Manager, Designer, Workflow Manager, and Workflow Monitor.
- Responsible for handling defects and escalations, making sure they were addressed on time by updating the corresponding documents (DDLs, mappings, model changes, etc.).
- Created wellness-check scripts (SQL) to make sure both stage and production environments received data as intended, and addressed all production issues.
- Strong work in data mining; responsible for generating reports and dashboards with numbers and graphs to ensure loan processing times were met and the work pipeline was properly routed.
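A minimal sketch of the A/B-testing step above, assuming two samples of campaign metrics and a Welch two-sample t-test via SciPy (synthetic data, not the campaign data):

```python
# Compare two creative campaigns with a two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
campaign_a = rng.normal(loc=0.052, scale=0.01, size=500)  # e.g., CTRs
campaign_b = rng.normal(loc=0.057, scale=0.01, size=500)

# Welch's t-test: does campaign B outperform campaign A on average?
t_stat, p_value = stats.ttest_ind(campaign_b, campaign_a, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
```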
Environment: RStudio, Python, Tableau, C++, Java, SQL Server 2012/2014, Oracle 10g/11g
Confidential, Grand Rapids, MI
Data Analyst
Responsibilities:
- Responsible for running daily, weekly, and monthly reports from both SQL Server and Oracle data warehouses.
- Developed various automated and customized reports, along with improved template formats for various data metrics, as part of reporting.
- Worked with the marketing team and analyzed marketing data using Access/Excel and SQL tools to generate reports, tables, listings, and graphs.
- Strong functional domain knowledge of data governance and data quality.
- Scrum Master for two internal projects, both automating various manual processes in Agile mode to improve timely delivery and quality.
- Responsible for creating sales reporting metrics using Cognos across all markets (11 states) and providing improvement solutions that benefited individual market sales revenue.
- Strong working knowledge of Excel for data mining, filtering, pivot tables, and formulas, including setting up database connections for automatic data refresh and publishing to SharePoint links.
- Experience conducting data loads using Informatica tools and handling performance tuning of Informatica jobs.
- Extensive experience in data migration, extraction, data cleansing, and data staging of operational sources using ETL (Informatica) processes.
- Analyzed different kinds of data from many systems using ad-hoc queries, SQL scripts, and Cognos report designs, and delivered various comparisons, trends, statistics, error findings, and suggestions.
- Maintained the change control process, conducted thorough analysis of various parameters, and documented and presented the results to the reporting managers.
Environment: ER Studio, Informatica PowerCenter 8.1/9.1, PowerConnect/PowerExchange, Oracle 11g, Mainframes, DB2, MS SQL Server 2008, SQL, PL/SQL, XML, Windows NT 4.0, Tableau, Workday, SPSS, SAS, Business Objects, Unix Shell Scripting, Teradata, Netezza.
Confidential
Data Analyst
Responsibilities:
- Worked as part of a team that developed machine learning models using natural language processing with Python scikit-learn to provide insights on fraudulent claims and recommend actionable next steps (see the classifier sketch after this list).
- Created new database objects such as procedures, functions, packages, triggers, indexes, and views using T-SQL in development and production environments for SQL Server 2000.
- Actively participated in requirements gathering and system specification.
- Developed SQL queries to fetch complex data from different tables in remote databases using joins and database links, formatted the results into reports, and kept logs.
- Strong understanding of Agile data warehouse development.
- Worked on complex T-SQL statements and implemented various procedures and functions.
- Installed, authored, and managed reports using SQL Server 2005 Reporting Services.
- Wrote Transact-SQL utilities to generate table INSERT and UPDATE statements.
- Developed and optimized database structures, stored procedures, DDL triggers, and user-defined functions.
- Implemented new T-SQL features added in SQL Server 2005, including error handling through the TRY...CATCH construct and Common Table Expressions (CTEs); a short sketch follows this list.
- Created stored procedures to transform the data and worked extensively in T-SQL on the various transformations needed while loading the data.
- Participated in developing prototype models and communicated results to the requesting stakeholders.
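A minimal sketch of the scikit-learn NLP classifier described in the first bullet above, assuming a TF-IDF plus logistic-regression baseline on toy claim notes (the real corpus and labels are confidential):

```python
# Toy NLP claim-fraud classifier: TF-IDF features feeding a linear model,
# a common baseline; the example claims below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

claims = [
    "vehicle stolen overnight no witnesses police report delayed",
    "minor fender bender exchanged insurance details at scene",
    "total loss fire damage claim filed day after policy increase",
    "windshield chip repaired at certified shop with receipt",
]
labels = [1, 0, 1, 0]  # 1 = flagged as potentially fraudulent

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(claims, labels)
print(clf.predict_proba(["fire damage reported shortly after policy change"]))
```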
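And an illustrative look at the SQL Server 2005 features mentioned above (a CTE and TRY...CATCH), driven from Python via pyodbc; the DSN, table, and column names are placeholders:

```python
# Illustrative T-SQL snippets (CTE and TRY...CATCH) executed via pyodbc;
# the connection string and schema are assumptions, not the originals.
import pyodbc

conn = pyodbc.connect("DSN=ExampleSqlServer;Trusted_Connection=yes")
cursor = conn.cursor()

# Common Table Expression: rank claims per customer, keep the latest.
cursor.execute("""
    WITH ranked AS (
        SELECT customer_id, claim_id,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY filed_date DESC) AS rn
        FROM dbo.Claims
    )
    SELECT customer_id, claim_id FROM ranked WHERE rn = 1;
""")
print(cursor.fetchall())

# TRY...CATCH: roll back the batch and surface the error message.
cursor.execute("""
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE dbo.Claims SET status = 'CLOSED' WHERE claim_id = ?;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
        SELECT ERROR_MESSAGE() AS error_message;
    END CATCH
""", 42)
conn.commit()
```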
Environment: Statistical Modeling, Machine Learning, NLP, Python, scikit-learn, Gensim, Pandas, PySpark, R, Matplotlib, SAS, Seaborn, Tableau, Power BI, SQL.
Confidential
Data Analyst
Responsibilities:
- Developed the ETL (SSIS) pipelines for data extraction.
- Developed software tools in Python to automatically scrutinize documents and electronic content (a minimal sketch follows this list).
- Developed .NET-based applications for Microsoft Office document processing.
- Developed the SQL database schema for the data pipelines.
- Performed data analysis and produced subsequent reports for QA teams to prioritize the issues.
- Strong functional domain knowledge of data governance and data quality, ensuring compliance standards were properly incorporated.
- Participated in requirements gathering and the development of value-adding use cases and applications in close cooperation with other intra-organizational units, product managers, and product development teams.
- Developed the SQL jobs for generating the analytics reports.
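A minimal sketch of the kind of document-scrutiny tool described in the second bullet above, assuming plain-text inputs; the directory and scan patterns are illustrative placeholders:

```python
# Walk a folder and flag files whose text matches review patterns;
# the patterns and root directory here are invented for illustration.
import re
from pathlib import Path

PATTERNS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def scrutinize(root: str) -> list[tuple[str, str]]:
    """Return (file, rule) pairs for every text file that matches a rule."""
    hits = []
    for path in Path(root).rglob("*.txt"):
        text = path.read_text(errors="ignore")
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                hits.append((str(path), name))
    return hits

if __name__ == "__main__":
    for file, rule in scrutinize("./documents"):
        print(f"{file}: matched {rule}")
```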
Environment: Python, SSIS, SSRS, SQL, Sklearn, C#, Matplotlib, PostgreSQL