Data Scientist Resume
Charlotte, NC
SUMMARY
- Over 8 years of experience in design, administration, analysis, and management of Business Intelligence, data warehousing, web-based applications, and databases, with experience in industries such as Retail, Financial, Accounting, Distribution, Logistics, Inventory, Manufacturing, Marketing, Services, Networking, and Engineering.
- Experience with current BI tools, including Tableau, QlikView dashboard design, and SAS.
- Analyze and extract relevant information from large amounts of data to help automate self-monitoring, self-diagnosing, and self-correcting solutions and to optimize key processes.
- Experience in data architecture design, development, maintenance for Windows and Android device applications.
- Familiar with building models with big data frameworks like Cloudera Manager and Hadoop.
- Experience in managing the full life cycle of Data Science projects, including transforming business requirements into Data Collection, Data Cleaning, Data Preparation, Data Validation, Data Mining, and Data Visualization from structured and unstructured data sources.
- Sound RDBMS concepts and extensive work with Oracle 8i/9i/10g/11g, DB2, SQL Server 8.0/9.0/10.0/10.5/11.0, MySQL, and MS-Access.
- Expertise in Project Management, Analysis, and Estimation, with a unique mix of managerial, functional, domain, technical and client handling skills.
- Expertise and experience in SQL, SAS and relational databases. Deep understanding of and exposure to data mining.
- Excellent knowledge in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for improved database performance in OLTP, OLAP and Data Warehouse/Data Mart environments.
- 2+ years' experience in Agile background of software/data design, development, deployment to build services and customer support in Enterprise applications using Object Oriented Analysis and Design (OOAD).
- Work on gigabytes of text and image files (2-D and 3-D) to solve real-world problems and visualize the data by generating data reports in Google Data Studio for customer usability.
- Experience in using Python and statistical software (R, Excel, Tableau).
- Good track record of working with complex data sets and translating data into insights to drive key business and product decisions.
- Experience with Azure, SQL and Oracle PL/SQL.
- Experience working with Amazon Web Services (AWS) products like S3.
- Involved in an Aveva start-up mode and contributed to projects using Amazon Web Services (AWS) to develop and deploy application support on device and cloud.
- Hands-on experience with scripting languages like Perl, Bash Shell and PHP (for automation).
- Good understanding of scalable data processing to discover hidden patterns, conducting error analysis in the data for financial and statistical modeling
- Familiar with multiple operating systems (OS) and development environments including Linux (Ubuntu), Windows, etc.
- Familiar with nightly build management tools like Visual Studio Team Foundation Server (VSTFS).
- Experience developing software in traditional programming languages (in C, C++) using tools like MS Visual Studio Compact Framework (VSCF) for the Windows Mobile Platform
- Familiar with configuration management and repository management of Subversion (SVN) control system source code using tools like Git and MS Team Foundation Version Control.
- Extensive experience in Data Modeling, Data Analysis and design of OLTP and OLAP systems.
- Expertise in the Data Analysis, Design, Development, Implementation and Testing using Data Conversions, Extraction, Transformation and Loading (ETL) and SQL Server, ORACLE and other relational and non-relational databases.
TECHNICAL SKILLS
Languages: T-SQL, PL/SQL, SQL, C, C++, XML, HTML, DHTML, HTTP, Matlab, DAX, Python
Databases: SQL Server 2014/2012/2008/2005/2000, MS-Access, Oracle 11g/10g/9i and Teradata, big data, Hadoop
DWH / BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.
Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimensions tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .Net, Microsoft Management Console, Visual Source Safe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib
PROFESSIONAL EXPERIENCE
Confidential, Charlotte, NC
Data Scientist
Responsibilities:
- Worked closely with business, data governance, SMEs and vendors to define data requirements.
- Designed and provisioned the platform architecture to execute Hadoop and machine learning use cases under Cloud infrastructure, AWS, EMR, and S3.
- Selected statistical algorithms (Two-Class Logistic Regression, Boosted Decision Tree, Decision Forest classifiers, etc.).
- Used MLlib, Spark's Machine learning library to build and evaluate different models.
- Worked with Teradata 14 tools like FastLoad, MultiLoad, TPump, FastExport, Teradata Parallel Transporter (TPT) and BTEQ.
- Involved in creating a Data Lake by extracting the customer's Big Data from various data sources into Hadoop HDFS. This included data from Excel, Flat Files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata, Netezza and also log data from servers.
- Used Spark DataFrames, Spark SQL and Spark MLlib extensively, and developed and designed POCs using Scala, Spark SQL and MLlib libraries.
- Created high level ETL design document and assisted ETL developers in the detail design and development of ETL maps using Informatica.
- Used R, SQL to create Statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random forest models, Decision trees, Support Vector Machine for estimating the risks of welfare dependency.
- Helped in migration and conversion of data from the Sybase database into Oracle database, preparing mapping documents and developing partial SQL scripts as required.
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems
- Executed ad-hoc data analysis for customer insights using SQL on an Amazon AWS Hadoop cluster.
- Worked on predictive and what-if analysis using R from HDFS and successfully loaded files to HDFS from Teradata, and loaded from HDFS to HIVE.
- Designed the schema, configured and deployed AWS Redshift for optimal storage and fast retrieval of data.
- Analyzed data and predicted end customer behaviors and product performance by applying machine learning algorithms using Spark MLlib.
- Performed data mining using very complex SQL queries to discover patterns, and used extensive SQL for data profiling/analysis to provide guidance in building the data model.
Environment: R, Machine Learning, Teradata 14, Hadoop MapReduce, PySpark, Spark, Spark MLlib, Tableau, Informatica, SQL, Excel, VBA, BO, CSV, Erwin, SAS, AWS Redshift, Scala NLP, Cassandra, Oracle, MongoDB, Informatica MDM, Cognos, SQL Server 2012, DB2, SPSS, T-SQL, PL/SQL, Flat Files, and XML.
Confidential, San Jose, CA
Data Scientist
Responsibilities:
- Involved in data management including Data Modeling, Metadata, Data Analysis, Data Mapping and Data Dictionaries using Erwin 9.1.
- Involved in Data Modeling (Oracle/MySQL/Netezza), Data Characterization, Workflow design and implementation.
- Worked on logical and physical modeling of various data marts as well as the data warehouse using Teradata 14.
- Created and maintained the Logical Data Model (LDM) for the project, including documentation of all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes and glossary terms in accordance with the Corporate Data Dictionary.
- Implemented Bulk Load Process by converting existing Triggers to Oracle 11g packages to improve the data loading process.
- Designed and developed Oracle 11g PL/SQL procedures and UNIX Shell Scripts for Data Import/Export and Data Conversions.
- Maintained warehouse metadata, naming standards and warehouse standards.
- Involved in the validation of the OLAP Unit Testing and System Testing of the OLAP Report Functionality and data displayed in the reports.
- Worked with Teradata 14.1 tools like FastLoad, MultiLoad, TPump, FastExport, Teradata Parallel Transporter (TPT) and BTEQ.
- Designed the overall ETL solution including analyzing data, preparation of high level, detailed design documents, test plans and deployment strategy.
- Strong knowledge of Entity-Relationship concept, Facts and dimensions tables, slowly changing dimensions and Dimensional Modeling (Star Schema and Snow Flake Schema).
- Created Software Development Life Cycle (SDLC) document and excelled in the process of Change Management, Release management and Configuration processes.
- Prepared Data Visualization reports for the management using R.
- Documented logical, physical, relational and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
- Involved in Normalization and De-Normalization of existing tables for faster query retrieval.
- Created MDM base objects, Landing and Staging tables to follow the comprehensive data model in MDM.
- Used ETL methodology for supporting data extraction, transformation and loading processing in a complex MDM using Informatica.
- Performed Data Validation and Data Cleaning using PROC SORT, PROC FREQ and through various SAS formats.
- Designed Data Staging Area and Data Warehouse to integrate the data from various sources including Flat Files to help management make more fact-based decisions.
- Created jobs and alerts to run SSIS and SSRS packages periodically. Created automated processes for activities such as database backups and sequential runs of SSIS and SSRS packages using SQL Server Agent jobs and Windows Scheduler.
- Managed data in different locations across different data marts using Azure.
- Performed reverse engineering of physical data models from databases and SQL scripts.
- Involved in Normalization (3rd normal form) and De-normalization (Star Schema) for Data Warehousing.
- Used SSIS to create ETL packages to validate, extract, transform and load data to data warehouse databases, data mart databases to OLAP databases.
- Implemented slowly changing dimensions Type 2 and Type 3 for accessing history of reference data changes.
- Extensively used SAS to query and subset data, summarize and present data, combine tables using joins and merges and created and modified tables.
- Combined views and reports into interactive dashboards in Tableau Desktop that were presented to Business Users, Program Managers, and End Users.
- Utilized SDLC and Agile methodologies such as SCRUM.
- Worked in PL/SQL programming (stored procedures, triggers, packages) using Oracle (SQL, PL/SQL), SQL Server 2008 and UNIX shell scripting to perform job scheduling.
Environment: ERwin 9.1, Teradata 14, Oracle 10g, PL/SQL, UNIX, Agile, Azure, TIDAL, MDM, ETL, BTEQ, SQL Server 2008, Informatica MDM, Netezza, DB2, SAS, Tableau, SSRS, SSIS, T-SQL, Informatica, SQL
Confidential, Woonsocket, RI
Data/Business Analyst
Responsibilities:
- Interacted with users to understand complex business requirements and documented the requirements.
- Experienced in designing, developing and data modeling of applications, ensuring that they stay within the Salesforce governor limits.
- Implemented end-to-end systems for Data Analytics and Data Automation, integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
- Gathering all the data that is required from multiple data sources and creating datasets that will be used in analysis.
- Performed Exploratory Data Analysis and Data Visualizations using R and Tableau.
- Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with data governance, data quality, data lineage and data architects to design various models and processes.
- Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an Architect implemented MDM hub to provide clean, consistent data for a SOA implementation.
- Developed, implemented and maintained the Conceptual, Logical and Physical Data Models using Erwin for Forward/Reverse Engineered Databases.
- Established Data Architecture strategy, best practices, standards, and roadmaps.
- Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
- Performed data cleaning and imputation of missing values using R.
- Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
- Take up ad-hoc requests based on different departments and locations
- Used Hive to store the data and perform data cleaning steps for huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
- Creating customized business reports and sharing insights to the management.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL, etc.
Confidential, Chicago, IL
Data/Business Analyst
Responsibilities:
- Involved in various activities of the project, like information gathering, analyzing the information, documenting the functional and non-functional requirements.
- Worked in Data Warehousing methodologies/Dimensional Data Modeling techniques such as Star/Snowflake schema using ERWIN 9.1.
- Extensively used Aginity Netezza Workbench to perform various DDL, DML, etc. operations on the Netezza database.
- Designed the Data Warehouse and MDM hub Conceptual, Logical and Physical data models.
- Performed daily monitoring of Oracle instances using Oracle Enterprise Manager, ADDM and TOAD, monitoring users, table spaces, memory structures, rollback segments, logs, and alerts.
- Used ER Studio Data/Modeler for data modeling (data requirements analysis, database design, etc.) of custom developed information systems, including databases of transactional systems and data marts.
- Involved in Teradata SQL Development, Unit Testing and Performance Tuning, and ensured testing issues were resolved on the basis of defect reports.
- Involved in customized reports using SAS/MACRO facility, PROC REPORT, PROC TABULATE and PROC.
- Used Normalization methods up to 3NF and De-normalization techniques for effective performance in OLTP and OLAP systems.
- Generated DDL scripts using Forward Engineering technique to create objects and deploy them into the databases.
- Involved in database testing, writing complex SQL queries to verify the transactions and business logic, such as identifying duplicate rows, using SQL Developer and PL/SQL Developer.
- Used Teradata SQL Assistant, Teradata Administrator, PMON and data load/export utilities like BTEQ, FastLoad, Multi Load, Fast Export, Tpump on UNIX/Windows environments and running the batch process for Teradata.
- Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Worked on Data Warehouse concepts like Data Warehouse Architecture, Star schema, Snowflake schema, Data Marts, and Dimension and Fact tables.
- Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
- Wrote T-SQL objects such as Indexes, Views, Stored Procedures and Triggers in SSMS to fulfill requirements.
- Involved in database migrations from legacy systems and SQL Server to Oracle and Netezza.
- Used SSIS to create ETL packages to validate, extract, transform and load data, pulling data from source servers to the staging database and then to Netezza and DB2 databases.
Environment: Windows XP, SQL Developer, MS-SQL 2008 R2, MS-Access, MS Excel and SQL-PLU, Java, SSRS, SSIS.
Confidential
Data/Business Analyst
Responsibilities:
- Developed Apex Classes, Controller Classes and Apex Triggers for various functional needs in the application.
- Migrated data from external sources and performed Insert, Delete, Upsert & Export operations on millions of records. Designed and developed Service Cloud and Integration.
- Writing and executing customized SQL code for ad hoc reporting duties and used other tools for routine
- Developed stored procedures and complex packages extensively using PL/SQL and shell programs
- Involved in customized reports using SAS/MACRO facility, PROC REPORT, PROC TABULATE and PROC
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy SQL Server database systems
- Used existing UNIX shell scripts and modified them as needed to process SAS jobs, search strings, execute permissions over directories etc.
- Extensively used Star Schema methodologies in building and designing the logical data model into Dimensional Models
- Involved in designing Context Flow Diagrams, Structure Chart and ER- diagrams
- Worked on database features and objects such as partitioning, change data capture, indexes, views and indexed views to develop an optimal physical data model
- Worked with SQL Server Integration Services in extracting data from several source systems, transforming the data and loading it into ODS
- Involved in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatches.
Environment: Windows XP, SQL Developer, MS-SQL 2008 R2, MS-Access, MS Excel and SQL-PLU, Java.
Confidential
.NET Developer
Responsibilities:
- Worked on Agile Methodology (Scrum Framework) to meet timelines with quality deliverables.
- Participated in daily scrums and weekly meetings with the project team to meet expectations and deadlines.
- Involved in gathering and preparing the requirements from clients for product design and enhancements.
- Developed solutions for diverse programming scenarios in C#, employing Object Oriented Programming (OOP) concepts such as: encapsulation, inheritance, polymorphism, and abstraction.
- Designed and developed WinForms using VB.NET and JavaScript for the GUIs using the code-behind class technique.
- Designed, implemented and configured WCF service layer.
- Worked with XML, XSD and XSLT while implementing WCF.
- Utilized ADO.Net technology extensively for data retrieving, querying, storage and manipulation using LINQ.
- Designed the web UI using VB.NET, HTML, DHTML, XSL/XSLT, JavaScript, CSS, Web Forms and AJAX controls.
- Created User Controls, Custom controls, Data Access Layer, Business Logic Layer classes using VB.Net for web pages.
- Extensively used user interface controls built on jQuery and JavaScript to perform client-side validation.
- Wrote Stored Procedures in SQL Server 2005 and Oracle, and used ADO.NET with Grid View, Data List, Details View, Repeaters and Dataset classes for data manipulation.
- Worked extensively with query optimization techniques to fetch data with better performance.
- Developed reports with SQL Server Reporting Services (SSRS).
Environment: Visual Studio 2008, C#, ASP.NET 3.0, Entity Framework 4.0, HTML, DHTML, Web Forms, JavaScript, jQuery, XML, WCF, IIS 7.0, AJAX, ADO.NET, LINQ, SQL Server 2005, Oracle, PL/SQL, UML, SQL Server Reporting Services.
