
Sr. Data Scientist/Data Architect Resume


Richmond, VA

SUMMARY:

  • 10+ years of experience in Data Science, Data Modeling, Data Analysis, Data Warehousing, Machine Learning, and Data Mining with large sets of structured and unstructured data, including Data Acquisition, Data Validation, Predictive Modeling, Statistical Modeling, Data Visualization, Web Crawling, and Web Scraping. Adept in statistical programming languages such as R, Python, SAS, and MATLAB, as well as Apache Spark and Big Data technologies like Hadoop, Hive, and Pig.
  • Experienced in provisioning virtual clusters in the AWS cloud using services such as EC2, S3, and EMR.
  • Expertise in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation (a brief illustrative sketch follows this list), and data visualization.
  • Extensive experience in Relational and Dimensional Data modeling for creating Logical and Physical Design of Database and ER Diagrams using multiple data modeling tools like Erwin and ER Studio.
  • Strong knowledge in all phases of the SDLC (Software Development Life Cycle) from analysis, design, development, testing, implementation and maintenance.
  • Expertise in synthesizing Machine learning, Predictive Analytics and Big data technologies into integrated solutions.
  • Experienced in Data Modeling techniques employing Data warehousing concepts like star/snowflake schema and Extended Star.
  • Experience in using various packages in R and Python such as ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, and rpy2.
  • Expertise in writing functional specifications, translating business requirements into technical specifications, and creating/maintaining/modifying database design documents with detailed descriptions of logical entities and physical tables.
  • Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Excellent experience in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Expertise in OLTP/OLAP system study, analysis and E-R modeling, and in developing database schemas like Star schema and Snowflake schema used in relational, dimensional and multidimensional modeling.
  • Excellent experience in Extract, Transform and Load processes using ETL tools like DataStage, Informatica, Data Integrator and SSIS for Data Migration and Data Warehousing projects.
  • Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica Power Center.
  • Proficient in Machine Learning, Data/Text Mining, Statistical Analysis & Predictive Modeling.
  • Excellent knowledge and experience in OLTP/OLAP system study with a focus on the Oracle Hyperion suite of technology, developing database schemas like Star schema and Snowflake schema (fact tables, dimension tables) used in relational, dimensional and multidimensional modeling, and physical and logical data modeling using the Erwin tool.
  • Hands-on experience in implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
  • Experienced in building data models using machine learning techniques for Classification, Regression, Clustering and Associative mining.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Able to leverage a heavy dose of Mathematics, Applied Statistics, Advanced Analytics and Machine learning with visualization and a healthy sense of exploration.
  • Experienced in statistical data analysis like Chi-square, T-test; Dimensionality reduction methods like PCA, LDA and feature selection methods.
  • Experienced in Teradata RDBMS using FastLoad, FastExport, MultiLoad, TPump, Teradata SQL Assistant and BTEQ utilities.
  • Experienced in SQL and PL/SQL packages, functions, stored procedures, triggers, and materialized views to implement business logic in Oracle databases.
  • Experience in developing analytics solutions based on the Azure Machine Learning platform and selecting statistical algorithms (Two-Class Logistic Regression, Boosted Decision Tree, Decision Forest classifiers, etc.).
  • Strong experience and knowledge in Data Visualization with Tableau creating: Line and scatter plots, Bar Charts, Histograms, Pie chart, Dot charts, Box plots, Time series, Error Bars, Multiple Charts types, Multiple Axes, subplots etc.
  • Extensive experience in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau.
  • Hands on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, Pyspark, SparkSql.
  • Strong experience with Oracle/SQL Server programming skills, with experience in working with functions, packages and triggers.
  • Expertise in Excel macros, Pivot Tables, VLOOKUPs and other advanced functions.
  • Excellent knowledge and understanding of data mining techniques like classification, clustering, regression techniques and random forests.
  • Experience in designing, developing, scheduling reports/dashboards using Tableau and Cognos.
  • Experience working in Agile/SCRUM software environments.
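
Illustrative sketch (not client code): a minimal example of the K-fold cross validation and ROC-based validation mentioned above, using placeholder data from scikit-learn.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Placeholder data; real engagements used client datasets.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

    model = LogisticRegression(max_iter=1000)
    # 5-fold cross validation scored by area under the ROC curve
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print("Mean ROC AUC:", scores.mean())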

TECHNICAL SKILLS:

Data Analytics Tools/Programming: Python (NumPy, SciPy, pandas, Gensim, Keras), R (caret, Weka, ggplot2), MATLAB, Microsoft SQL Server, Oracle PL/SQL.

Data Visualization: Tableau, Visualization packages, Microsoft Excel.

Machine Learning Algorithms: Classifications, Regression, Clustering, Feature Engineering.

Data Modeling: Erwin 9.x, Star Schema, Snowflake Schema, ER Studio.

Big Data Tools: Hadoop, MapReduce, Sqoop, Pig, Hive, NoSQL, Cassandra, MongoDB, Spark, Scala.

Databases: Oracle, SQL Server, Teradata, Netezza, DB2.

ETL: Informatica, SSIS.

Others: Deep Learning, Graph Mining, Text Mining, C, C++, Java, JavaScript, ASP, Shell Scripting, Scala NLP, Spark MLlib, SAS, SPSS, Cognos, Azure.

PROFESSIONAL EXPERIENCE:

Confidential, Richmond, VA

Sr. Data Scientist/Data Architect

Responsibilities:

  • Provided architectural leadership in shaping strategic business technology projects, with an emphasis on application architecture, and applied domain and application-portfolio knowledge to play a key role in defining the future state of large business technology programs.
  • Participated in all phases of data mining including data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Conducted studies and rapid plots, using advanced data mining and statistical modeling techniques to build a solution that optimizes data quality and performance.
  • Demonstrated experience in design and implementation of Statistical models, Predictive models, enterprise data model, metadata solution and data lifecycle management in both RDBMS, Big Data environments.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ and other Teradata utilities.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc. (an illustrative sketch follows this list).
  • Developed LINUX Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to Netezza database.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for the client.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables. Built and published customized interactive reports and dashboards, report scheduling using Tableau server.
  • Used TensorFlow to develop deep learning models for classification and regression for direct response modeling and convolutional neural networks for image classification.
  • Utilized various supervised and unsupervised machine learning techniques, such as multivariate regression with linear and logistic models, naive Bayes, KNN, PCA and K-means clustering.
  • Used the Oracle External Tables feature hands-on to read data from flat files into Oracle staging tables.
  • Analyzed the weblog data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website, and managed and reviewed Hadoop log files.
  • Used Erwin 9.6 for effective model management, sharing, dividing and reusing model information and design for productivity improvement.
  • Wrote ad-hoc data normalization jobs for new data ingested into Redshift.
  • Used JSON schemas to define table and column mapping from S3 data to Redshift.
  • Developed standards for the semantic needs of data, including different kinds of models (e.g. subject areas models, data classification schemes, standards for ontologies).
  • Designed and developed user interfaces and customization of Reports using Tableau and OBIEE and designed cubes for data visualization, mobile/web presentation with parameterization and cascading.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Created SSIS Packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc. to import data into the data warehouse.
  • Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
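
Illustrative sketch of the Spark MLlib classification work referenced in this list (a random forest classifier); the S3 path and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier

    spark = SparkSession.builder.appName("rf-sketch").getOrCreate()

    # Hypothetical S3 path and feature columns
    df = spark.read.parquet("s3://example-bucket/training-data/")
    features = VectorAssembler(inputCols=["f1", "f2", "f3"],
                               outputCol="features").transform(df)

    rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
    model = rf.fit(features)
    scored = model.transform(features)  # adds prediction and probability columns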

Environment: ERwin 9.6, Teradata, Oracle 12c, Hadoop, HDFS, Pig, Hive, MapReduce, PL/SQL, UNIX, Informatica Power Center, MDM, SQL Server, Netezza, DB2, Tableau, Aginity, SAS/GRAPH, SAS/SQL, SAS/CONNECT, SAS/ACCESS, Python, SQL, AWS, EC2, MongoDB, HBase.

Confidential, MN

Sr. Data Scientist/Data Architect

Responsibilities:

  • Utilized Apache Spark with Python to develop and execute Big Data Analytics and Machine Learning applications, and executed machine learning use cases under Spark ML and MLlib.
  • Served as solutions architect for transforming business problems into Big Data and Data Science solutions and defined the Big Data strategy and roadmap.
  • Identified areas of improvement in the existing business by unearthing insights through analysis of vast amounts of data using machine learning techniques.
  • Interpreted problems and provided solutions to business problems using data analysis, data mining, optimization tools, machine learning techniques and statistics.
  • Led technical implementation of advanced analytics projects, defined the mathematical approaches, developed new and effective analytics algorithms and wrote the key pieces of mission-critical source code implementing advanced machine learning algorithms utilizing Caffe, TensorFlow, Scala, Spark, MLlib, Python and other tools and languages as needed.
  • Gathered, analyzed, documented and translated application requirements into data models and supported standardization of documentation and the adoption of standards and practices related to data and applications.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc., and utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Designed and developed NLP models for sentiment analysis.
  • Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries.
  • Defined the table design strategy for Redshift tables, which involved designing the most efficient sort key (SORTKEY) and distribution key (DISTKEY) choices.
  • Led discussions with users to gather business process requirements and data requirements to develop a variety of Conceptual, Logical and Physical Data Models. Expert in Business Intelligence and Data Visualization tools: Tableau, MicroStrategy.
  • Built analytical data pipelines to port data in and out of Hadoop/HDFS from structured and unstructured sources and designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time for the new route, and performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Worked on machine learning on large size data using Spark and MapReduce.
  • Designed conceptual/theoretical frameworks to support richer models of language use (e.g. incorporating frame semantics, the syntax/semantics interface, discourse and pragmatics, etc.).
  • Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms and utilized optimization techniques, linear regression, K-means clustering, Naive Bayes and other approaches.
  • Developed Spark, Scala, and Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Master Data Management Architecture involving OLTP, ODS and OLAP.
  • Extracted, transformed, and loaded data sources to generate CSV data files using Python programming and SQL queries.
  • Stored and retrieved data from data-warehouses using Amazon Redshift.
  • Worked on Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad, TPump, FastLoad and FastExport.
  • Applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, neural networks, SVM and clustering to identify volume, using the scikit-learn package in Python and MATLAB.
  • Used Data Warehousing Concepts like Ralph Kimball Methodology, Bill Inmon Methodology, OLAP, OLTP, Star Schema, Snow Flake Schema, Fact Table and Dimension Table.
  • Refined time-series data and validated mathematical models using analytical tools like R and SPSS to reduce forecasting errors.
  • Worked on data pre-processing and cleaning to perform feature engineering, and performed data imputation for missing values in the dataset using Python (see the sketch after this list).
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
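
A minimal sketch of the Python missing-value imputation step referenced above; the input file and column handling are hypothetical.

    import pandas as pd
    from sklearn.impute import SimpleImputer

    # Hypothetical input file
    df = pd.read_csv("orders.csv")

    num_cols = df.select_dtypes(include="number").columns
    cat_cols = df.select_dtypes(exclude="number").columns

    # Median imputation for numeric gaps, most-frequent for categorical gaps
    df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
    df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])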

Environment: Python, ER Studio, Hadoop, MapReduce, EC2, S3, PySpark, Spark, Spark MLlib, Tableau, Informatica, SQL, Excel, VBA, BO, CSV, Netezza, SAS, MATLAB, AWS, Scala NLP, SPSS, Cassandra, Oracle, Amazon Redshift, MongoDB, SQL Server 2012, Teradata, DB2, T-SQL, PL/SQL, Flat Files, XML.

Confidential, Atlanta, GA

Sr. Data Scientist/Data Architect

Responsibilities:

  • Used R, SQL to create Statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random forest models, Decision trees, Support Vector Machine for estimating the risks.
  • Managed data operations team and collaborated with data warehouse developers to meet business user needs, promote data security, and maintain data integrity.
  • Used R and Python for exploratory data analysis, A/B testing, ANOVA and hypothesis tests to compare and identify the effectiveness of creative campaigns.
  • Designed and provisioned the platform architecture to execute Hadoop and machine learning use cases under Cloud infrastructure, AWS, EMR, S3.
  • Wrote archiving scripts to periodically transfer older data from S3 to Glacier to minimize storage costs.
  • Wrote ETL code in Python 3.x to clean and normalize unstructured data in a legacy Postgres DB to accommodate schema updates in Redshift.
  • Implemented public segmentation using unsupervised machine learning by implementing the k-means algorithm with PySpark (see the sketch after this list).
  • Used ETL Tools for masking and cleaning data and mined data from various sources.
  • Involved in creating a Data Lake by extracting the customer's Big Data from various data sources into Hadoop HDFS, including data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata and Netezza, as well as log data from servers.
  • Developed Python code for data analysis (also using NumPy and SciPy), Curve-fitting.
  • Performed extensive Data Validation, Data Verification against Data Warehouse and performed debugging of the SQL-Statements and stored procedures for business scenarios.
  • Used Spark DataFrames, Spark SQL and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL and the MLlib library.
  • Created and reviewed Informatica mapping documents along with business and data governance rules.
  • Worked on predictive and what-if analysis using R from HDFS and successfully loaded files to HDFS from Teradata, and loaded from HDFS to HIVE.
  • Designed the schema, configured and deployed AWS Redshift for optimal storage and fast retrieval of data.
  • Developed ETL mappings, testing, correction and enhancement and resolved data integrity issues and coordinated multiple OLAP and ETL projects for various data lineage and reconciliation.
  • Analyzed data and predicted end customer behaviors and product performance by applying machine learning algorithms using Spark MLlib.
  • Extensively worked on data profiling and scanning to de-duplicate records in the staging area before data gets processed.
  • Performed transformations of data using Spark and Hive according to business requirements for generating various analytical datasets.
  • Designed and developed ETL processes and created UNIX shell scripts to execute Teradata SQL and BTEQ jobs.
  • Analyzed bug reports in BO reports by running similar SQL queries against the source system(s) to perform root-cause analysis.
  • Used NLTK, Stanford NLP and RAKE to preprocess the data and perform entity extraction and keyword extraction.
  • Used concepts of Data Modeling Star Schema/Snowflake modeling, FACT & Dimensions tables and Logical & Physical data modeling.
  • Translated cell formulas for business users in Excel into VBA code to design, analyze, and deploy programs for their ad-hoc needs.
  • Coded using Teradata analytical functions and Teradata BTEQ SQL, and wrote UNIX scripts to validate, format and execute the SQL in the UNIX environment.
  • Analyzed the data statistically and prepared statistical reports using the SAS tool.
  • Created MapReduce jobs running over HDFS for data mining and analysis using R, and loaded and stored data with Pig scripts and R for MapReduce operations.
  • Created various types of data visualizations using R, and Tableau.
  • Created numerous dashboards in Tableau Desktop based on the data collected from zonal and compass, blending data from MS Excel and CSV files with MS SQL Server databases.
  • Developed SPSS macros, which reduced programming time and increased productivity across the data processing steps.
  • Participated in big data architecture for both batch and real-time analytics and mapped data using scoring system over large data on HDFS
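
Illustrative sketch of the PySpark k-means segmentation referenced above; the input file and attribute names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans

    spark = SparkSession.builder.appName("kmeans-segmentation").getOrCreate()

    # Hypothetical customer attributes
    df = spark.read.csv("customers.csv", header=True, inferSchema=True)
    features = VectorAssembler(inputCols=["recency", "frequency", "monetary"],
                               outputCol="features").transform(df)

    kmeans = KMeans(k=5, seed=1, featuresCol="features")
    model = kmeans.fit(features)
    segments = model.transform(features)  # adds a 'prediction' (cluster id) column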

Environment: Hortonworks - Hadoop MapReduce, PySpark, Spark, R, Spark MLlib, Tableau, Informatica, SQL, Excel, VBA, BO, CSV, Erwin, SAS, AWS Redshift, Scala NLP, Cassandra, Oracle, MongoDB, Cognos, SQL Server 2012, Teradata, DB2, SPSS, T-SQL, PL/SQL, Flat Files, and XML.

Confidential, Chicago, IL

Sr. Data Architect/Data Modeler

Responsibilities:

  • Understood and analyzed business data requirements, architected an accurate, extensible, flexible and logical data model, and defined and implemented conceptual, logical and physical data modeling concepts.
  • Defined data sources and data models, documenting actual data flows, data exchanges, and system interconnections and interfaces, and ensured these were aligned with the enterprise data model.
  • Developed and optimized ETL processes by working closely with multiple data partners and stakeholders across the company to meet growing business needs.
  • Understood and maintained the existing schema with an industry-standard change management process to achieve zero-downtime schema upgrades.
  • Designed and built world-class, high-volume, real-time data ingestion frameworks and automated loading of various data sources into Big Data technologies like Hadoop.
  • Performed data mapping between source systems and target systems, performed logical data modeling, created class diagrams and ER diagrams, and used SQL queries to filter data.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW (a brief illustrative sketch follows this list).
  • Identified and predicted technology needs related to data and reporting for the organization, and proposed technology solutions.
  • Developed and kept current a high-level data strategy that fits the Data Warehouse standards and the overall strategy of the company.
  • Designed different types of star schemas using Erwin with various dimensions like time, services and customers, and FACT tables.
  • Analyzed database infrastructure to ensure compliance with customer security standards and database performance considerations, and reverse-engineered existing database environments.
  • Used Hive, created Hive tables, and was involved in data loading and writing Hive UDFs.
  • Created BTEQ, FastExport, MultiLoad, TPump and FastLoad scripts for extracting data from various production systems.
  • Created database objects like tables, views, materialized views, procedures and packages using Oracle tools like PL/SQL, SQL*Plus and SQL*Loader, and handled exceptions.
  • Worked on Hive for exposing data for further analysis and for generating transforming files from different analytical formats to text files.
  • Extensively used Erwin for developing data model using star schema methodologies.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS using Sqoop.
  • Created, optimized, reviewed and executed Teradata SQL test queries to validate transformation rules used in source to target mappings/source views, and to verify data in target tables.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Provided ad-hoc queries and data metrics to the Business Users using Hive, Pig.
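
One way the raw-data parsing described above can be written as a Hadoop Streaming mapper in Python; the pipe-delimited record layout is hypothetical, and the production jobs may have used native MapReduce.

    #!/usr/bin/env python
    # Hypothetical mapper: parses pipe-delimited raw records and emits
    # (date, amount) pairs for downstream aggregation in the reducer.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 5:
            continue  # skip malformed records
        record_date, amount = fields[1], fields[4]
        print(f"{record_date}\t{amount}")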

Environment: ERWIN 9.x, Informatica Power Mart (Source Analyzer, Data warehousing designer, Mapping Designer, Transformations), MS SQL Server, Oracle, SQL, Hive, Map Reduce, PIG, Sqoop, HDFS, Hadoop, Teradata, Netezza, PL/SQL, Informatica, SSIS, SSRS.

Confidential, Cincinnati, Ohio

Sr. Data Modeler/Data Analyst

Responsibilities:

  • Worked with business users to gather requirements and create data flow, process flows and functional specification documents.
  • Developed Data Mapping, Data Governance and transformation and cleansing rules for the Master Data Management Architecture involving OLTP, ODS.
  • Based on client requirements, created design documents for Workday reporting and created a dashboard that gives all the information regarding those reports.
  • Developed, enhanced and maintained Snowflake schemas within the data warehouse and data mart with conceptual data models.
  • Designed 3rd normal form target data model and mapped to logical model.
  • Involved in extensive data validation using SQL queries and back-end testing, and used SQL for querying the database in a UNIX environment.
  • Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
  • Involved in data analysis and creating data mapping documents to capture source-to-target transformation rules.
  • Used ER Studio and Visio to create 3NF and dimensional data models and published them to the business users and ETL/BI teams.
  • Involved in data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed and sent to an external entity.
  • Developed Informatica SCD Type I, Type II and Type III mappings and tuned them for better performance. Extensively used almost all of the transformations of Informatica, including complex lookups, Stored Procedures, Update Strategy, mapplets and others.
  • Created or modified T-SQL queries as per the business requirements and worked on creating role-playing dimensions, factless fact, snowflake and star schemas.
  • Used the ER Studio modeling tool to publish a data dictionary, review the model and dictionary with subject matter experts, and generate data definition language.
  • Extracted data from databases Oracle, Teradata, Netezza, SQL server and DB2 using Informatica to load it into a single repository for Data analysis.
  • Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
  • Managed full SDLC processes involving requirements management, workflow analysis, source data analysis, data mapping, metadata management, data quality, testing strategy and maintenance of the model.
  • Created custom Workday reports and modify/troubleshoot existing custom reports.
  • Created and modified several UNIX shell Scripts according to the changing needs of the project and client requirements.
  • Identified and tracked the slowly changing dimensions and heterogeneous sources, and determined the hierarchies in dimensions (see the sketch after this list).
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
  • Wrote reports using Report Writer that extract Workday data and manipulate it in other formats (Excel) for various needs.
  • Used Teradata utilities such as Fast Export, MLOAD for handling various tasks.
  • Analysis of functional and non-functional categorized data elements for dataprofiling and mapping from source to target data environment. Developed working documents to support findings and assign specific tasks.
  • Translated business requirements to technical requirements in terms of BO(Business Objects) universe and report design.
  • Involved in fixing invalid mappings, testing of Stored Procedures and Functions, Unit and Integrating testing of Informatica Sessions, Batches and the Target Data.
  • Involved in the validation of OLAP unit testing and system testing of the OLAP report functionality and the data displayed in the reports.
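
A compact pandas sketch of Type 2 slowly changing dimension handling of the kind referenced above (expire changed current rows, append new versions); table and column names are hypothetical, and the production mappings were built in Informatica.

    import pandas as pd

    # Hypothetical dimension and staging extracts
    dim = pd.read_csv("customer_dim.csv")    # has is_current, start_date, end_date
    stg = pd.read_csv("customer_stage.csv")  # latest source snapshot

    # Find current rows whose tracked attribute changed in the source
    cur = dim[dim["is_current"] == 1].merge(stg, on="customer_id", suffixes=("_old", "_new"))
    changed = cur.loc[cur["address_old"] != cur["address_new"], "customer_id"]

    today = pd.Timestamp.today().normalize()
    # Expire the old versions...
    mask = dim["customer_id"].isin(changed) & (dim["is_current"] == 1)
    dim.loc[mask, ["end_date", "is_current"]] = [today, 0]
    # ...and append the new versions as the current rows
    new_rows = stg[stg["customer_id"].isin(changed)].assign(
        start_date=today, end_date=pd.NaT, is_current=1)
    dim = pd.concat([dim, new_rows], ignore_index=True)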

Environment: ER Studio, Informatica Power Center 8.1/9.1, Power Connect/Power Exchange, Oracle 11g, Mainframes, DB2, MS SQL Server 2008, SQL, PL/SQL, XML, Windows NT 4.0, Tableau, Workday, SPSS, SAS, Business Objects, Unix Shell Scripting, Teradata, Netezza, Aginity.

Confidential

Data Analyst

Responsibilities:

  • Understood and articulated business requirements from user interviews and then converted requirements into technical specifications. Effectively communicated with the SMEs to gather the requirements.
  • Involved in translating the functional requirements to specific functionalities and chalking requirement feasibility.
  • Worked on regression in performing Safety Stock and Inventory Analysis using R, and performed data visualizations using Tableau and R (see the sketch after this list).
  • Used SQL to retrieve data from the Oracle database for data analysis and visualization and performed Inventory Analysis with Statistical and Data Visualization Tools.
  • Followed the RUP based methods using Rational Rose to create Use Cases, Activity Diagrams / State Chart Diagrams, Sequence Diagrams.
  • Worked on the client-server model and gathered the requirements and documented accordingly.
  • Involved in executing test cases to validate the data from source to target, evaluating test results and preparing test summary reports.
  • Designed different types of star schemas for detailed data marts and plan data marts in the OLAP environment.
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, Naive Bayes.
  • Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions and data formats.
  • Performed Decision Tree analysis and Random Forests for strategic planning and forecasting, and manipulated and cleaned data using the dplyr and tidyr packages in R.
  • Involved in data analysis and creating data mapping documents to capture source to target transformation rules.
  • Maintained metadata (data definitions of table structures) and version controlling for the data model.
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
  • Creating or modifying the T-SQL queries as per the business requirements and worked on creating role playing dimensions, fact-less Fact, snowflake and star schemas.
  • Wrote, executed, performance tuned SQL Queries for Data Analysis& Profiling and wrote complex SQL queries using joins, sub queries and correlated sub queries.
  • Wrote PL/SQL stored procedures, functions and packages and triggers to implement business rules into the application.
  • Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
  • Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions and Incremental loading and unit tested the mappings.
  • Wrote test cases, developed Test scripts using SQL and PL/SQL for UAT.
  • Worked with the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
  • Transferred data from various OLTP data sources, such as Oracle, MS Access, MS Excel, Flat files, CSV files into SQL Server.
  • Identified issues within the data by querying the source data and identifying the data patterns.
  • Performed data testing, tested ETL mappings (Transformation logic), tested stored procedures, and tested the XML messages.
  • Created Use cases, activity report, logical components to extract business process flows and workflows involved in the project using Rational Rose, UML and Microsoft Visio.
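
The safety stock and inventory analysis above was done in R; the following is a minimal Python sketch of the standard safety-stock formula it relates to, using made-up figures.

    from math import sqrt
    from statistics import mean, stdev

    # Hypothetical daily demand sample (units/day) and lead time
    daily_demand = [120, 95, 130, 110, 105, 140, 98, 115]
    lead_time_days = 7
    z = 1.65  # z-score for roughly a 95% service level

    safety_stock = z * stdev(daily_demand) * sqrt(lead_time_days)
    reorder_point = mean(daily_demand) * lead_time_days + safety_stock
    print(round(safety_stock), round(reorder_point))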

Environment: R, SQL, Tableau, SSRS, Oracle, T-SQL, UNIX Shell Scripting, DB2.
