Data Scientist/Data Analyst Resume
San Francisco, California
PROFESSIONAL SUMMARY:
- 8+ years of experience working with large structured and unstructured datasets, spanning data visualization, data acquisition, predictive modeling and data validation.
- Developing Logical Data Architecture with adherence to Enterprise Architecture.
- Experience applying predictive modeling and machine learning algorithms to analytical projects.
- Experience extracting data and building value-added datasets using Azure, Python, R, SAS and SQL to analyze customer behaviour, target specific customer segments and surface hidden insights in support of project objectives.
- Collaborated with the lead Data Architect to model the data warehouse in accordance with FSLDM subject areas, snowflake schema and 3NF format.
- Experience designing visualizations in Tableau and publishing and presenting dashboards and storylines on desktop and web platforms.
- Experience coding SQL and PL/SQL procedures, triggers and packages.
- Experience in foundational machine learning models and concepts: regression, random forest, boosting, HMMs, CRFs, GBM, NNs, MRFs, deep learning.
- Highly skilled in using visualization tools like ggplot2, Tableau and d3.js for creating dashboards.
- Proficient with statistical and programming tools and languages: R, C, C++, Java, Python, SQL, UNIX, the QlikView data visualization tool and the Anaplan forecasting tool.
- Cluster analysis, Principal Component Analysis (PCA), recommender systems and association rules (see the illustrative sketch at the end of this summary).
- Strong experience in the analysis, design, development, testing and implementation of Business Intelligence solutions using data warehouse/data mart design, OLAP, BI and client/server applications.
- Integration Architect & Data Scientist experience in Analytics, BPM, SOA, ETL, Big Data and Cloud technologies.
- Proficient in the integration of various data sources with multiple relational databases like Oracle 11g/10g/9i, DB2, MS SQL Server, Teradata and flat files into the staging area, ODS, data warehouse and data mart.
- Experience using statistical procedures and machine learning algorithms such as ANOVA, clustering, regression and time series analysis to analyze data for further model building.
- Proficient in predictive modeling, data mining methods, factor analysis, ANOVA, hypothesis testing, normal distributions and other advanced statistical and econometric techniques.
- Strong understanding of data modeling and data mining.
- Experience in Business Intelligence/Data Warehousing design and architecture, dimensional data modeling, ETL, OLAP cubes, reporting and other BI tools.
- Excellent knowledge of relational database design and data warehouse/OLAP concepts and methodologies.
- Designing physical data architecture for new system engines.
- Experience in data-driven statistical analysis: sampling, estimating data distributions, hypothesis testing, correlation among variables, outlier detection, analysis of variance and probability theory.
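To make the summary concrete, the following is a minimal, illustrative sketch of the cluster analysis and PCA workflow mentioned above, written with scikit-learn on synthetic data; the feature matrix, component count and cluster count are hypothetical stand-ins, not details of any actual engagement.

```python
# Illustrative sketch only: PCA followed by k-means clustering with
# scikit-learn on synthetic data (all parameters are hypothetical).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic stand-in for a customer feature matrix.
X, _ = make_blobs(n_samples=500, n_features=10, centers=4, random_state=42)

# Standardize, reduce to two principal components, then cluster.
X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X_scaled)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_pca)

print("cluster sizes:", np.bincount(labels))
```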
TECHNICAL SKILLS:
Database Design Tools: physical data modeling, fact & dimension tables, normalization and de-normalization techniques, logical data modeling, Kimball methodology.
Databases: MS Access, Oracle 11g, SQL Server 2017, Sybase and DB2.
Languages: C, C++, XML, HTML, DHTML, HTTP, PL/SQL, SQL, T-SQL, MATLAB.
Tools and Utilities: SQL Server, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visu, Microsoft Management Console, Visual SourceSafe 6.0, Excel Power Pivot, Excel Data Explorer, DTS, Crystal Reports, ProClarity, Microsoft Office 2007/10/13.
Operating Systems: Microsoft Windows 10/8/7/XP, UNIX and Linux.
PROFESSIONAL EXPERIENCE:
Confidential
Data Scientist
Responsibilities:
- Responsible for applying machine-learning techniques (regression and classification) to predict outcomes (see the sketch at the end of this list).
- Responsible for design and development of advanced R/Python programs to prepare, transform and harmonize data sets in preparation for modeling.
- Identifying and executing process improvements, hands-on in various technologies such as Oracle, Informatica and Business Objects.
- Designed the prototype of the data mart and documented its expected outcomes for end-users.
- Involved in business process modeling using UML.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Interacted with Business Analysts, SMEs and other Data Architects to understand business needs and functionality for various project solutions.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
- Performed database performance tuning, including index tuning, SQL statement optimization and server monitoring.
- Wrote simple and advanced SQL queries and scripts to create standard and ad-hoc reports for senior managers.
- Collaborated on the source-to-target data mapping document and the data quality assessments for the source data.
- Created PL/SQL packages and database triggers, developed user procedures, and prepared user manuals for the new programs.
- Participated in business meetings to understand the business needs & requirements.
- Prepared the ETL architecture & design document covering ETL architecture, SSIS design, and the extraction, transformation and loading of Duck Creek data into the dimensional model.
- Provided technical and requirements guidance to team members on ETL/SSIS design.
- Designed the ETL framework and led its development.
- Designed logical & physical data models using the MS Visio 2003 data modeler tool.
- Participated in stakeholder meetings to understand the business needs & requirements.
- Participated in solution architecture meetings and provided guidance on dimensional data modeling design.
- Coordinated and communicated with technical teams on any data requirements.
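As a hedged illustration of the regression/classification work listed above: a minimal scikit-learn sketch on synthetic data; the features, split and model settings are hypothetical, not taken from the actual engagement.

```python
# Minimal sketch of a classification workflow in scikit-learn.
# Data and parameters are synthetic/hypothetical.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a baseline classifier and score it on the held-out set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```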
Environment: Python, Spark MLlib, TensorFlow, K-means, ANN, regression, Oryx 2, Accord.NET, Flask, ORM, Jinja2, Amazon Machine Learning (AML), Apache, Django, Mako, Naive Bayes, SVM.
Confidential, Minneapolis, Minnesota
Data Scientist
Responsibilities:
- Data modeling and formulation of statistical equations using advanced statistical forecasting techniques.
- Documented the complete process flow to describe program development, testing, application integration, coding and implementation.
- Scored predictive models per regulatory requirements and validated deliverables with PSI (Population Stability Index) checks.
- Built predictive scorecards for life insurance, TD, car-loan cross-selling and RD products.
- Mentored and provided guidance to team members.
- Developing propensity models for Retail liability products to drive proactive campaigns.
- Performed data transformation and cleansing and created new variables using R.
- Responsible for defining the functional requirement documents for each source-to-target interface.
- Identified the customer and account attributes required for MDM implementation from disparate sources and prepared detailed documentation.
- Approved and presented the designed logical data model to the Data Model Governance Committee (DMGC).
- Tabulated and extracted data from multiple data sources using R and SAS.
- Worked with users to identify the most appropriate source of record and profile the data required for sales and service.
- Extracted data from HDFS and prepared it for exploratory analysis using data munging.
- Responsible for defining the key identifiers for each mapping/interface.
- Validated the machine learning classifiers using ROC curves and lift charts (see the sketch at the end of this list).
- Arranged and chaired data workshops with SMEs and related stakeholders to build a shared understanding of the requirements data catalogue.
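The ROC- and lift-based validation mentioned above might look like the following minimal sketch; the data is synthetic, the classifier and its parameters are illustrative assumptions, and the lift figure is a simple top-decile approximation.

```python
# Minimal sketch of classifier validation with ROC/AUC and a top-decile
# lift figure, on synthetic data (hypothetical, for illustration only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=2000, n_features=15, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))

# Simple top-decile lift: response rate in the top 10% of scores vs overall.
order = np.argsort(scores)[::-1]
top = y_test[order][: len(order) // 10]
print("top-decile lift:", top.mean() / y_test.mean())
```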
Environment: regression, logistic regression, Hadoop, Teradata, OLTP, Unix, Python, MLlib, SAS, random forest, OLAP, HDFS, NLTK, SVM, JSON and XML.
Confidential, San Francisco, California
Data Scientist/Data Analyst
Responsibilities:
- Worked on data cleaning and reshaping, and generated segmented subsets using NumPy and pandas in Python.
- Developed Python scripts to automate the data sampling process; ensured data integrity by checking for completeness, duplication, accuracy and consistency.
- Worked on model selection based on confusion matrices and minimized the Type II error.
- Generated a cost-benefit analysis to quantify the impact of the model implementation relative to the prior baseline.
- Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to extract and merge large volumes of historical data stored in Oracle 11g, validating the ETL-processed data in the target database.
- Conducted model optimization and comparison using stepwise selection based on AIC values.
- Applied machine learning algorithms and statistical models such as decision trees, logistic regression and gradient boosting machines to build predictive models using the scikit-learn package in Python (see the sketch at the end of this list).
- Continuously collected business requirements during the whole project life cycle.
- Identified the variables that significantly affect the target.
- Generated data analysis reports using Matplotlib and Tableau, and delivered and presented the results to C-level decision makers.
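A minimal sketch of the scikit-learn gradient-boosting workflow described above, evaluated with a confusion matrix to keep an eye on Type II errors (false negatives); the data is synthetic and the hyperparameters are hypothetical.

```python
# Minimal sketch: gradient boosting evaluated via a confusion matrix.
# Synthetic data; parameters are hypothetical, not the actual model.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=1500, n_features=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=7).fit(X_train, y_train)

# Unpack the binary confusion matrix to inspect false negatives directly.
tn, fp, fn, tp = confusion_matrix(y_test, gbm.predict(X_test)).ravel()
print(f"false negatives (Type II errors): {fn}")
```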
Environment: Teradata 13, Erwin 8, SQL Server 2008, Oracle 9i, PL/SQL, OLAP, Informatica Power Center, SQL*Loader, ODS, OLTP, SSAS.
Confidential, Brentwood, TN
Data Modeler
Responsibilities:
- Developed integration jobs to transfer data from source systems to Hadoop (see the sketch at the end of this list).
- Installed Talend Studio.
- Authored technical design documents for transformation processes.
- Applied business rules to the data being transferred.
- Allocated tasks for the ETL and reporting team.
- Communicated effectively with the client and their internal development team to deliver product functionality requirements.
- Architected and designed data warehouse ETL processes.
- Demonstrated the POC built for the prospective customer, provided guidance, and gathered feedback for backend ETL testing on SQL Server 2008 using SSIS.
- Created the operational manual document.
- Created integration jobs to back up a copy of the data to a network file system.
- Designed and implemented the ETL data model and created staging, source and target tables in the SQL Server database.
- Led requirements-gathering and analysis meetings with business users and documented meeting outcomes.
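The actual integration jobs were built in Talend Studio; purely as a hedged illustration of the source-to-staging pattern, here is a small pandas/SQLAlchemy sketch. The DSN, table name, column name and file path below are placeholders, not real systems.

```python
# Hypothetical sketch of a source-to-staging ETL step in pandas/SQLAlchemy;
# the real jobs were built in Talend Studio. All names are placeholders.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("mssql+pyodbc://user:pass@source_dsn")  # placeholder DSN
df = pd.read_sql("SELECT * FROM dbo.policies", source)  # hypothetical table

# Example business rule applied in flight: drop rows missing a key field.
df = df.dropna(subset=["policy_id"])

# Land a copy as a delimited file for downstream Hadoop ingestion
# (e.g. picked up by an HDFS put or a Hive LOAD DATA step).
df.to_csv("/staging/policies.csv", index=False)
```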
Environment: ETL, ODS, Hadoop, MS Office, Talend Studio, OLAP, SQL Server 2008.
Confidential
Data Modeler
Responsibilities:
- Developed and implemented predictive models using Natural Language Processing Techniques and machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA and regularization for data analysis.
- Designed and developed Natural Language Processing models for sentiment analysis (see the sketch at the end of this list).
- Applied clustering algorithms, i.e. hierarchical and K-means, with the help of scikit-learn and SciPy.
- Developed visualizations and dashboards using ggplot2 and Tableau.
- Worked on development of data warehouse, data lake and ETL systems using relational and non-relational tools such as SQL and NoSQL.
- Built and analyzed datasets using MATLAB, R, SAS and Python (in decreasing order of usage).
- Participated in all phases of data mining: data collection, data cleaning, model development, validation and visualization, and performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI and Smart View.
- Implemented Agile Methodology for building an internal application.
- Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode and MapReduce concepts.
- As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, and Naive Bayes.
- Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Supported the testing team across system testing, integration testing and UAT.
- Involved in preparation & design of technical documents like Bus Matrix Document, PPDM Model, and LDM & PDM.
- Understood the client's business problems and analyzed the data using appropriate statistical models to generate insights.
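A minimal sketch of the sentiment-analysis approach noted above, using a bag-of-words Naive Bayes pipeline in scikit-learn; the tiny corpus and labels are invented for illustration, and the production models and data were of course far larger.

```python
# Minimal sketch of an NLP sentiment classifier (bag-of-words + Naive
# Bayes) in scikit-learn. Corpus and labels are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great service, very happy", "terrible experience, never again",
         "loved the product", "poor support and slow response"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (hypothetical)

# Vectorize the text and fit the classifier in one pipeline.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["happy with the support"]))
```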
Environment: MDM, QlikView, MLlib, Hadoop, MapReduce, Pig, Mahout, R, Erwin, Tableau, PL/SQL, Java, Hive, AWS, HDFS, Teradata, JSON, Spark, RStudio.
Confidential
Data Analyst
Responsibilities:
- Developed Tableau visualizations and dashboards using Tableau Desktop.
- Applied Business Objects best practices during development with a strong focus on reusability and better performance.
- Developed and executed load scripts using Teradata client utilities FASTLOAD, MULTILOAD and BTEQ.
- Generated periodic reports based on statistical analysis of the data using SQL Server Reporting Services.
- Used graphical entity-relationship diagramming to create new database designs via an easy-to-use graphical interface.
- Maintained metadata (data definitions of table structures) and version control for the data model.
- Designed different types of star schemas for detailed data marts and plan data marts in the OLAP environment (see the sketch at the end of this list).
- Wrote SQL scripts to test the mappings and developed a traceability matrix mapping business requirements to test scripts, ensuring any change control in requirements leads to test case updates.
- Responsible for development and testing of conversion programs for importing data from text files into the Oracle database using Perl, shell scripts & SQL*Loader.
- Utilized Erwin's forward/reverse engineering tools and target database schema conversion process.
- Developed SQL scripts for creating tables, sequences, triggers, views and materialized views.
- Coordinated with business users, stakeholders and SMEs for functional expertise, design and business test scenario reviews, UAT participation, and validation of financial data.
- Worked on creating an enterprise-wide model (EDM) for products and services in the Teradata environment based on data from the PDM; conceived, designed, developed and implemented this model from scratch.
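As a hedged illustration of the star-schema design work above: a toy fact table with two dimensions, issued through SQLite as a stand-in engine (the real marts ran on Teradata and Oracle; all table and column names here are hypothetical).

```python
# Hypothetical sketch of a minimal star schema: one fact table referencing
# two dimension tables. SQLite is a stand-in; names are illustrative only.
import sqlite3

ddl = """
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, cal_date TEXT);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    amount       REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)  # dimensions first, then the fact table
print("star schema created")
```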
Environment: Oracle SQL Developer, MS SQL Server, SQL*PLUS, PL/SQL, Business Objects, SQL*Loader, Tableau, Informatica, XML, Windows XP, TOAD.