Data Scientist Resume
New York
SUMMARY:
- 8+ years of experience in Data Analysis, Data Integration, Migration, Master Data Management (MDM), and Configuration Management.
- Experience in Analysis, Design, Development, Deployment and Maintenance of critical software.
- Developed various Machine Learning applications with Python Scientific Stack and R.
- Experienced with machine learning and deep learning frameworks such as scikit-learn, TensorFlow, and Keras.
- Experienced Data Analyst with a solid understanding of Data Mapping, Data Warehousing (OLTP, OLAP), Data Mining, Data Governance, and Data Management services with Quality Assurance.
- Adept in Statistical Data Analysis, Exploratory Data Analysis, Machine Learning, Data Mining, and Data Visualization using R, Python, Base SAS, SAS Enterprise Guide, SAS Enterprise Miner, Tableau, and SQL.
- Strong working experience in the Financial and Insurance industries, with substantial knowledge of front-end, back-end, and overall end-to-end processes.
- Proficient in data mining tools such as R, SAS, Python, SQL, and Excel, with staff leadership experience.
- Proficient in preparing ETL mappings (Source-Stage, Stage-Integration, ISD), requirements gathering, data reporting, data visualization, advanced business dashboards, and client presentations.
- Extensive experience in Object Oriented Analysis and Design (OOAD) techniques with UML using Flow Charts, Use Cases, Class Diagrams, Sequence Diagrams, Activity Diagrams and State Transition Diagrams
- Strong working knowledge of SQL, SQL Server, Oracle, SAS, Tableau, and Jupyter, as well as MATLAB applications across multiple projects.
- Expertise in handling various forms of data such as master data, metadata, and source data, with the ability to provide data analytics using various tools (Access, Excel, reporting tools, etc.) and to deliver qualitative and quantitative assessments of data.
- Experience in data scaling, wrangling and data visualization in R, Python, SAS and Tableau
- Strong SQL query and Spark skills, with experience designing and verifying databases using Entity-Relationship Diagrams (ERD) and performing data profiling via queries, dashboards, macros, etc.
- Worked closely with the QA team in executing test scenarios and plans, providing test data, creating test cases, issuing STRs upon identification of bugs, and collecting test metrics.
- Experience performing user acceptance testing (UAT) and end-to-end testing, monitoring test results, and escalating issues based on priority.
- Experience working with Waterfall and Agile methodologies, consistently delivering high-quality output.
TECHNICAL SKILLS:
Languages: SQL, PL/SQL, Java, C, C++, XML, HTML, MATLAB, Python, R.
Statistical Analysis: R, Python, SAS Enterprise Miner 7.1, SAS Programming, MATLAB, Minitab, Jupyter
Databases: SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 11g/10g/9i.
DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Pentaho, Kettle, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1, R-Studio.
Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques.
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Microsoft Management Console, Visual Source Safe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau.
PROFESSIONAL EXPERIENCE:
Confidential, New York
Data Scientist
Responsibilities:
- Data extraction, data scaling, data transformation, data modeling, and visualization using R, SQL, and Tableau based on requirements.
- Adept in writing R scripts while working with Oracle R Enterprise (ORE).
- Performed ad-hoc reporting, customer profiling, and segmentation using R/Python (see the segmentation sketch at the end of this list).
- Created different database schemas (Oracle) with several tables containing data on application details, DB and OS details, asset configuration details, and server details, and developed several queries to obtain encryption readiness results.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics on Hadoop in AWS (see the PySpark sketch below).
- Created several corresponding ISDs (Integration Specific Documents), which include interface lists (inbound/outbound), detailed file sizes, and production support details (SLAs, servers, etc.).
- Created a functional specification document for the Phase 3 work, including but not limited to Informatica requirements, architectural references, ETL sequence diagrams, data mappings, quality management, use cases, and data reconciliation details.
- Performed wellness checks (SQL) to make sure both the stage and production environments receive data as intended, and addressed all production-related issues.
- Worked with various LOB Directors to identify third-party hosted applications and prioritized them based on severity of risk and vulnerability using a risk assessment matrix.
- Worked on the Cyber Security Questionnaire, requesting responses from all corresponding Application Managers within the due date for each phase, based on risk rating (Critical, High, Medium, and Low).
- Worked closely with ADMs (Application Development Managers) for data-at-rest applications to determine DB and OS types and versions, then created a GTAC-supported compatibility matrix in Excel showing which applications have a supported solution and which need an upgrade (software/hardware).
- Designed a machine learning pipeline using Microsoft Azure Machine Learning to predict and prescribe, and implemented a machine learning scenario for a given data problem.
- Designed and developed neural-network NLP models for sentiment analysis.
- Experience with dimensional and relational database design, ETL, and life-cycle development using Informatica PowerCenter, Repository Manager, Designer, Workflow Manager, and Workflow Monitor.
- Strong work on data mining; responsible for generating reports and dashboards with figures and graphs to ensure loan processing times are met and the work pipeline is properly routed.
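A minimal sketch of the kind of R/Python customer segmentation described above, here in Python with scikit-learn's KMeans; the customer columns, values, and cluster count are illustrative assumptions, not details from the actual engagement:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical customer profile data; the real features came from project sources.
customers = pd.DataFrame({
    "annual_spend": [1200, 300, 8700, 450, 6100, 150],
    "num_transactions": [14, 3, 52, 6, 40, 2],
    "tenure_years": [2.0, 1.0, 8.0, 1.0, 6.0, 0.5],
})

# Standardize features so no single scale dominates the distance metric.
features = StandardScaler().fit_transform(customers)

# Cluster into an assumed three segments; in practice k would be chosen
# with an elbow plot or silhouette scores.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(features)

# Profile each segment by its average behavior.
print(customers.groupby("segment").mean())
```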
Environment: SQL Server, Oracle 10g/11g, MS Office, Informatica, ER Studio, XML, Hive, HDFS, Flume, Sqoop, R connector, C++, Python, R, Tableau 9.2
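A hedged sketch of what a Spark Python module for predictive analytics (as in the MapReduce/Spark bullet above) might look like, using Spark MLlib; the input path, column names, and choice of logistic regression are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("predictive-analytics").getOrCreate()

# Placeholder path; the real data lived on the AWS Hadoop cluster.
df = spark.read.csv("s3://example-bucket/training_data.csv",
                    header=True, inferSchema=True)

# Assemble assumed numeric feature columns into the single vector MLlib expects.
assembler = VectorAssembler(
    inputCols=["feature_a", "feature_b", "feature_c"], outputCol="features")
train = assembler.transform(df).select("features", "label")

# Fit a simple classifier; the original modules may have used other algorithms.
model = LogisticRegression(maxIter=20).fit(train)
print(model.summary.accuracy)

spark.stop()
```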
Confidential, Jersey City
Data Analyst/Data Engineer
Responsibilities:
- Worked on Phase 3 of the overall PPM program, ensuring that data from SOR systems answers business monitoring questions and that data is available in the interim database in hourly cycles for reporting purposes.
- Completed a highly immersive data science program involving data manipulation and visualization, web scraping, machine learning, Python programming, Scala, SQL, Git, MongoDB, and Hadoop.
- Used R and Python for exploratory data analysis and A/B testing, along with HQL, VQL, a data lake, AWS Redshift, Oozie, and PySpark, applying ANOVA and hypothesis tests to compare and identify the effectiveness of creative campaigns (see the A/B-testing sketch at the end of this list).
- Responsible for creating several ETL mapping documents (Source to Stage, Stage to Integration) for different SORs, making sure the column load rules, data types, and TDQ/BDQ rules are all in place for all required source data fields.
- Created several corresponding ISDs (Integration Specific Documents), which include interface lists (inbound/outbound), detailed file sizes, and production support details (SLAs, servers, etc.).
- Created a functional specification document for the Phase 3 work, including but not limited to Informatica requirements, architectural references, ETL sequence diagrams, data mappings, quality management, use cases, and data reconciliation details.
- Experience developing complex Informatica mappings, with strong data warehousing concepts and an understanding of standard ETL transformation methodologies.
- Experience with dimensional and relational database design, ETL, and life-cycle development using Informatica PowerCenter, Repository Manager, Designer, Workflow Manager, and Workflow Monitor.
- Responsible for handling defects and escalations, making sure they are addressed on time by updating the corresponding documents (DDLs, mappings, model changes, etc.).
- Created wellness check scripts (SQL) to make sure both the stage and production environments receive data as intended and to address all production issues (see the wellness-check sketch below).
- Strong work on data mining; responsible for generating reports and dashboards with figures and graphs to ensure loan processing times are met and the work pipeline is properly routed.
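An illustration of the A/B and hypothesis testing used for campaign comparison above, with invented engagement numbers; SciPy's two-sample t-test and one-way ANOVA stand in for whatever tests the project actually ran:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Invented per-user engagement scores for two creative campaigns.
campaign_a = rng.normal(loc=0.12, scale=0.04, size=500)
campaign_b = rng.normal(loc=0.15, scale=0.04, size=500)

# Two-sample t-test: is the difference in mean engagement significant?
t_stat, p_value = stats.ttest_ind(campaign_a, campaign_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# One-way ANOVA generalizes the comparison to three or more campaigns.
campaign_c = rng.normal(loc=0.13, scale=0.04, size=500)
f_stat, p_anova = stats.f_oneway(campaign_a, campaign_b, campaign_c)
print(f"F = {f_stat:.2f}, p = {p_anova:.4f}")
```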
Environment: RStudio, Python, Tableau, C++, Java, SQL Server 2012/2014, Oracle 10g/11g
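One plausible shape for the wellness-check scripts mentioned above, comparing stage and production row counts from Python via pyodbc; the DSNs and table names are hypothetical placeholders:

```python
import pyodbc

# Hypothetical DSNs; the real environments were SQL Server and Oracle instances.
STAGE_DSN = "DSN=stage_db"
PROD_DSN = "DSN=prod_db"
TABLES = ["loan_facts", "customer_dim"]  # illustrative table names

def row_count(dsn: str, table: str) -> int:
    """Return the row count for a table on the given connection."""
    conn = pyodbc.connect(dsn)
    try:
        return conn.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    finally:
        conn.close()

# Flag any table whose stage and production counts disagree.
for table in TABLES:
    stage, prod = row_count(STAGE_DSN, table), row_count(PROD_DSN, table)
    status = "OK" if stage == prod else "MISMATCH"
    print(f"{table}: stage={stage} prod={prod} -> {status}")
```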
Confidential, Grand Rapids, MI.
Data Analyst
Responsibilities:
- Responsible for running daily, weekly, and monthly reports from both SQL Server and Oracle data warehouses.
- Developed various automated and customized reports, along with improved template formats for various data metrics, as part of reporting.
- Worked with the marketing team and analyzed marketing data using Access/Excel and SQL tools to generate reports, tables, listings and graphs.
- Strong functional domain knowledge of data governance and data quality.
- Scrum Master for two internal projects, both involving automation of various manual processes in Agile mode to improve delivery timeliness and quality.
- Responsible for creating sales reporting metrics using Cognos across all markets (11 states) and providing improvement solutions that benefited individual market sales revenue.
- Strong working knowledge of Excel for data mining, filtering, pivot tables, and formulas, including setting up database connections for automatic data refresh and publishing to SharePoint links (see the pivot-table sketch at the end of this list).
- Experience conducting data loads using Informatica tools and handling performance tuning of Informatica jobs.
- Extensive experience in data migration, extraction, data cleansing, and data staging of operational sources using ETL (Informatica) processes.
- Analyzed different kinds of data from many systems using ad-hoc queries, SQL scripts, and Cognos report designs, and delivered various comparisons, trends, statistics, error analyses, and suggestions.
- Maintained the change control process, conducted thorough analysis of various parameters, and documented and presented findings to the reporting managers.
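Roughly the kind of pivot-table summary described in the Excel bullet above, reproduced in pandas for illustration; the markets and revenue figures are invented:

```python
import pandas as pd

# Invented sales records standing in for the multi-market reporting data.
sales = pd.DataFrame({
    "market": ["NY", "NY", "MI", "MI", "FL", "FL"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "revenue": [120.0, 135.5, 80.2, 95.1, 60.0, 72.3],
})

# Markets as rows, quarters as columns, summed revenue in the cells,
# with row/column totals via margins=True.
report = sales.pivot_table(index="market", columns="quarter",
                           values="revenue", aggfunc="sum", margins=True)
print(report)
```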
Environment: ER Studio, Informatica PowerCenter 8.1/9.1, PowerConnect/PowerExchange, Oracle 11g, Mainframes, DB2, MS SQL Server 2008, SQL, PL/SQL, XML, Windows NT 4.0, Tableau, Workday, SPSS, SAS, Business Objects, Unix Shell Scripting, Teradata, Netezza.
Confidential, Bay Harbor Island, FL
Data Analyst
Responsibilities:
- Worked as part of a team that developed machine learning models using natural language processing with Python scikit-learn to flag fraudulent claims and recommend actionable next steps (see the classification sketch at the end of this list).
- Created new database objects such as procedures, functions, packages, triggers, indexes, and views using T-SQL in development and production environments for SQL Server 2000.
- Actively participated in gathering requirements and system specifications.
- Developed SQL queries to fetch complex data from different tables in remote databases using joins and database links, formatted the results into reports, and kept logs.
- Strong Understanding of Agile Data Warehouse Development.
- Worked on complex T-SQL statements and implemented various code and functions.
- Installed, authored, and managed reports using SQL Server 2005 Reporting Services.
- Wrote Transact-SQL utilities to generate table insert and update statements.
- Developed and optimized database structures, stored procedures, DDL triggers, and user-defined functions.
- Implemented new T-SQL features added in SQL Server 2005, including error handling through TRY...CATCH statements and Common Table Expressions (CTEs) (a sketch follows at the end of this section).
- Created stored procedures to transform the data and worked extensively in T-SQL on the transformations needed while loading the data.
- Participated in developing prototype modeling and communicated results to the requesting individuals.
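An illustrative sketch of the fraud-detection NLP approach described at the top of this section: TF-IDF text features feeding a scikit-learn classifier. The claim texts, labels, and choice of logistic regression are toy stand-ins, not the project's actual data or model:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy claim narratives; the real training data came from historical claims.
claims = [
    "vehicle stolen from driveway overnight police report filed",
    "minor fender bender in parking lot exchanged information",
    "total loss fire same vehicle claimed twice this year",
    "windshield cracked by road debris on highway",
]
labels = [0, 0, 1, 0]  # 1 = flagged as potentially fraudulent

# TF-IDF features into a linear classifier, one plausible baseline pipeline.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(claims, labels)

print(model.predict(["same vehicle claimed twice after fire"]))
```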
Environment: Statistical Modeling, Machine Learning, NLP, Python, scikit-learn, Gensim, Pandas, PySpark, R, Matplotlib, SAS, Seaborn, Tableau, Power BI, SQL.
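A small sketch of the SQL Server 2005 features mentioned above (TRY...CATCH error handling around a Common Table Expression), executed from Python via pyodbc to stay consistent with the other examples; the connection string and table names are hypothetical:

```python
import pyodbc

# Hypothetical connection string for illustration only.
conn = pyodbc.connect("DSN=sqlserver_dev")

# T-SQL batch: a CTE feeding an INSERT, wrapped in TRY...CATCH
# (using only constructs available in SQL Server 2005).
tsql = """
BEGIN TRY
    ;WITH recent_loans AS (
        SELECT loan_id, amount
        FROM dbo.loans            -- hypothetical source table
        WHERE created_at >= DATEADD(day, -1, GETDATE())
    )
    INSERT INTO dbo.loan_stage (loan_id, amount)
    SELECT loan_id, amount FROM recent_loans;
END TRY
BEGIN CATCH
    -- Re-raise the error so the caller sees the failure.
    DECLARE @msg NVARCHAR(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR(@msg, 16, 1);
END CATCH
"""

cursor = conn.cursor()
cursor.execute(tsql)
conn.commit()
conn.close()
```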