Sr. Data Scientist/Data Architect Resume
Atlanta, GA
SUMMARY
- Over 9 years of strong experience in Data Science, Machine Learning, and Data Mining with large sets of structured and unstructured data, covering Data Acquisition, Data Validation, Predictive Modeling, Statistical Modeling, Data Modeling, Data Visualization, Web Crawling, and Web Scraping. Adept in statistical programming languages such as R, Python, SAS, Apache Spark, and MATLAB, as well as Big Data technologies like Hadoop, Hive, and Pig.
- Excellent knowledge of Machine Learning, Mathematical Modeling, and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
- Experienced in data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing.
- Deep understanding of Big Data analytics and algorithms using Hadoop, MapReduce, NoSQL, and distributed computing tools.
- Expertise in synthesizing Machine Learning, Predictive Analytics, and Big Data technologies into integrated solutions.
- Experienced in Dimensional and Relational Data Modeling using ER/Studio, Erwin, and Sybase PowerDesigner, including Star Join Schema/Snowflake modeling, Fact and Dimension tables, and conceptual, logical, and physical data modeling.
- Hands-on experience implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
- Expertise in managing the entire data science project life cycle, actively involved in all phases including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross-validation, and data visualization.
- Experienced in writing Pig Latin scripts, MapReduce jobs and HiveQL.
- Extensively used SQL, NumPy, Pandas, Scikit-learn, Spark, and Hive for data analysis and model building.
- Extensively worked with the Erwin tool, using features such as Reverse Engineering, Forward Engineering, Subject Areas, Domains, Naming Standards Documents, etc.
- Experience using various packages in R and Python, such as ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, and rpy2.
- Experienced in importing and exporting data using Sqoop from HDFS to relational database systems/mainframes and vice versa.
- Extensively worked on Sqoop, Hadoop, Hive, Spark, and Cassandra to build ETL and data processing systems spanning various data sources, data targets, and data formats.
- Strong experience and knowledge in data visualization with Tableau, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
- Experienced with Integration Services (SSIS), Reporting Services (SSRS), and Analysis Services (SSAS).
- Expertise in Normalization to 3NF/De-normalization techniques for optimum performance in relational and dimensional database environments.
- Extensive experience in ER Modeling, Dimensional Modeling (Star Schema, Snowflake Schema), Data Warehousing, and OLAP tools.
- Expertise in database programming (SQL, PL/SQL) with MS Access, Oracle 12c/11g/10g/9i, XML, DB2, Informix, and Teradata, as well as database tuning and query optimization.
- Experience in designing, developing, scheduling reports/dashboards using Tableau and Cognos.
- Expertise in performing data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata.
- Expertise in writing Stored Procedures, Functions, Nested Functions, building Packages and developing Public and Private Sub-Programs using PL/SQL and providing Documentation.
- Expertise in loading data by using the Teradata loader connection, writing Teradata utilities scripts (Fastload, Multiload) and working with loader logs.
- Experienced in SAS/BASE, SAS/STAT, SAS/SQL, SAS/MACROS, SAS/GRAPH, SAS/ACCESS, SAS/ODS, SAS/QC, and SAS/ETS in Mainframe, Windows, and UNIX environments.
TECHNICAL SKILLS
Data Analytics Tools/Programming: Python (NumPy, SciPy, pandas, Gensim, Keras), R (caret, Weka, ggplot), MATLAB, SAS, Microsoft SQL Server, Oracle PL/SQL.
Big Data Tools: Hadoop, HDFS, MapReduce, SQOOP, Pig, Hive, NOSQL, Cassandra, MongoDB, Spark, Scala, HBase.
Data Modeling Tools: Erwin 9.x, 8.x, 7.x, ER Studio, and Oracle Designer.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
ETL Tools: SSIS, Informatica.
Programming Languages: Java, Base SAS and SAS/SQL, SQL, T-SQL, HTML, JavaScript, CSS, UNIX shell scripting, PL/SQL, R.
Database Tools: Microsoft SQL Server 2000/2008, Teradata, Oracle 12c/10g/9i, MS Access, PostgreSQL.
Web technologies: HTML, DHTML, XML, JavaScript
Reporting Tools: Business Objects, Crystal Reports.
Packages: Microsoft Office 2010, Microsoft Project 2010, SAP, Microsoft Visio, SharePoint Portal Server 2003/2007.
Operating Systems: Microsoft Windows 9x / NT / 2000/XP / Vista/7 and UNIX
Quality Assurance Tools: Quick Test Pro, Win Runner, Load Runner, Quality Center.
Big Data: Hadoop, HDFS 2, Hive, Pig, HBase, Sqoop, Flume.
Other Tools: MS Office suite (Word, Excel, Project, and Outlook), BTEQ, Teradata SQL Assistant, Aginity, Tableau, Scala NLP, Spark MLlib, SAS, SPSS, Cognos.
PROFESSIONAL EXPERIENCE
Confidential, Atlanta GA
Sr. Data Scientist/Data Architect
Responsibilities:
- Architected and implemented machine learning algorithms for document recommendation, taking advantage of several data sources available in the enterprise.
- Used R and SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random Forest models, Decision Trees, and Support Vector Machines for estimating risks.
- Managed data operations team and collaborated with data warehouse developers to meet business user needs, promote data security, and maintain data integrity.
- Data collection, feature creation, model building (Linear Regression, SVM, Logistic Regression, Decision Tree, Random Forest, GBM), evaluation metrics, and model serving using R, Scikit-learn, Spark SQL, Spark ML, Flask, Redshift, and AWS S3.
- Worked on real-time as well as batch data and built a lambda architecture to process the data using Kafka, Spark Streaming, Spark Core, and Spark SQL.
- Designed and provisioned the platform architecture to execute Hadoop and machine learning use cases on cloud infrastructure using AWS, EMR, and S3.
- Involved in creating a Data Lake by extracting the customer's Big Data from various data sources into Hadoop HDFS. This included data from Excel, flat files, Oracle, SQL Server, MongoDB, Cassandra, HBase, Teradata, and Netezza, as well as log data from servers.
- Developed Python code for data analysis and curve fitting, using NumPy and SciPy.
- Performed extensive data validation and data verification against the Data Warehouse, and debugged SQL statements and stored procedures for business scenarios.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and MLlib libraries.
- Created and reviewed Informatica mapping documents in line with business and data governance rules.
- Created a recommendation system using k-means clustering, NLP, and Flask to generate vehicle lists for potential users, and worked on an NLP pipeline consisting of TF-IDF and LSI over user reviews (see the sketch at the end of this section).
- Worked on predictive and what-if analysis using R on data from HDFS, and successfully loaded files to HDFS from Teradata and from HDFS into Hive.
- Designed the schema, and configured and deployed AWS Redshift for optimal storage and fast retrieval of data.
- Developed ETL mappings along with their testing, correction, and enhancement; resolved data integrity issues; and coordinated multiple OLAP and ETL projects for data lineage and reconciliation.
- Analyzed data and predicted end-customer behaviors and product performance by applying machine learning algorithms using Spark MLlib.
- Involved in moving the complete website infrastructure to Amazon Web Services.
- Predicted function and industry from job text analytics using R, Scikit-learn, NLTK, TF-IDF, a Bayesian classifier, and Gensim.
- Performed transformations of data using Spark and Hive according to business requirements for generating various analytical datasets.
- Designed and developed ETL processes and created UNIX shell scripts to execute Teradata SQL and BTEQ jobs.
- Analyzed bug reports in BO reports by running similar SQL queries against the source system(s) to perform root-cause analysis.
- Used NLTK, Stanford NLP, and RAKE to preprocess the data and perform entity extraction and keyword extraction.
- Used Data Modeling concepts including Star Schema/Snowflake modeling, Fact & Dimension tables, and logical & physical data modeling.
- Translated cell formulas for business users in Excel into VBA code to design, analyze, and deploy programs for their ad-hoc needs.
- Created dimension and fact tables in Redshift, built ETL to load data from different sources into Redshift, and used Tableau for reporting with Redshift as the data source.
- Coded using Teradata analytical functions and Teradata BTEQ SQL, and wrote UNIX scripts to validate, format, and execute the SQL in the UNIX environment.
- Analyzed the data statistically and prepared statistical reports using the SAS tool.
- Created MapReduce jobs running over HDFS for data mining and analysis using R, and loaded and stored data with Pig scripts and R for MapReduce operations.
- Created various types of data visualizations using R and Tableau.
- Created numerous dashboards in Tableau Desktop based on data collected from Zonal and Compass, blending data from MS Excel and CSV files with MS SQL Server databases.
- Developed SPSS macros, which reduced syntax programming time and increased productivity across the data processing steps.
- Participated in big data architecture for both batch and real-time analytics, and mapped data using a scoring system over large datasets on HDFS.
Environment: Hortonworks - Hadoop MapReduce, PySpark, Spark, R, Spark MLlib, Tableau, Informatica, SQL, Excel, VBA, BO, CSV, Erwin, SAS, AWS Redshift, Scala NLP, Cassandra, Oracle, MongoDB, Cognos, SQL Server 2012, Teradata, DB2, SPSS, T-SQL, PL/SQL, Flat Files, and XML.
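Below is a minimal, illustrative sketch of the TF-IDF plus k-means review-clustering approach referenced above, using scikit-learn; the sample reviews and the cluster count are placeholder assumptions, not values from the actual project.

```python
# Illustrative sketch: cluster vehicle reviews with TF-IDF + k-means (scikit-learn).
# The reviews below and n_clusters=2 are placeholder assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

reviews = [
    "great mileage and smooth ride",
    "poor fuel economy, engine noise",
    "comfortable seats, smooth highway ride",
    "engine trouble and frequent repairs",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)           # sparse TF-IDF matrix

km = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = km.fit_predict(X)                      # cluster id per review

for review, label in zip(reviews, labels):
    print(label, review)
```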
Confidential, Brooklyn NY
Sr. Data Scientist/Data Architect
Responsibilities:
- Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications, and executed machine learning use cases under Spark ML and MLlib.
- Served as solutions architect for transforming business problems into Big Data and Data Science solutions, and defined the Big Data strategy and roadmap.
- Fraud detection: a mixture of anomaly detection (to detect new types of fraud) and classification algorithms (learning from already available fraud data) using LSAnomaly, OneClassSVM, k-means (how far the new data is from the nearest centroid), Logistic Regression, Decision Tree, Random Forest, and GBM, with Scikit-learn, Python, Spark SQL, and Spark ML (see the sketch at the end of this section).
- Identified areas of improvement in the existing business by unearthing insights from vast amounts of data using machine learning techniques.
- Interpreted problems and provided solutions to business problems using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
- Led the technical implementation of advanced analytics projects; defined the mathematical approaches, developed new and effective analytics algorithms, and wrote key pieces of mission-critical source code implementing advanced machine learning algorithms using Caffe, TensorFlow, Scala, Spark MLlib, Python, and other tools and languages as needed.
- Involved in customer segmentation, clustering, and segment analytics using k-means clustering, Spark, and Hive; created data visualizations using Matplotlib, mpld3, Flask, Spark, and Hive; wrote custom scripts using the Spark framework (RDDs, DataFrames) to process data for target groups, control groups, ROI calculations, and reports; and improved the performance of the Hadoop cluster, Spark cluster, Hive queries, and Spark queries.
- Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction, and utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
- Designed and developed NLP models for sentiment analysis.
- Led discussions with users to gather business process requirements and data requirements, and developed a variety of conceptual, logical, and physical data models. Expert in Business Intelligence and data visualization tools: Tableau, MicroStrategy.
- Built analytical data pipelines to port data in and out of Hadoop/HDFS from structured and unstructured sources, and designed and implemented the system architecture for an Amazon EC2-based cloud-hosted solution for the client.
- Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM modeling to classify whether a package would be delivered on time for a new route, and performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
- Worked on machine learning over large-scale data using Spark and MapReduce.
- Led the implementation of new statistical algorithms and operators on Hadoop and SQL platforms, utilizing optimization techniques, linear regression, K-means clustering, Naive Bayes, and other approaches.
- Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
- Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Master Data Management Architecture involving OLTP, ODS and OLAP.
- Extracted, transformed, and loaded data sources to generate CSV data files with Python programming and SQL queries.
- Stored and retrieved data from data warehouses using Amazon Redshift.
- Worked on Teradata SQL queries, Teradata indexes, and utilities such as MLoad, TPump, FastLoad, and FastExport.
- Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, regression models, neural networks, SVM, and clustering, to identify volume, using the scikit-learn package in Python and MATLAB.
- Used Data Warehousing concepts such as the Ralph Kimball methodology, the Bill Inmon methodology, OLAP, OLTP, Star Schema, Snowflake Schema, Fact tables, and Dimension tables.
- Refined time-series data and validated mathematical models using analytical tools such as R and SPSS to reduce forecasting errors.
- Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
Environment: Python, ER Studio, Hadoop, MapReduce, EC2, S3, PySpark, Spark, Spark MLlib, Tableau, Informatica, SQL, Excel, VBA, BO, CSV, Netezza, SAS, MATLAB, AWS, Scala NLP, SPSS, Cassandra, Oracle, Amazon Redshift, MongoDB, SQL Server 2012, Teradata, DB2, T-SQL, PL/SQL, Flat Files, and XML.
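Below is a minimal, illustrative sketch of the anomaly-detection side of the fraud-detection work described above, using scikit-learn's OneClassSVM; the synthetic transaction features and the nu parameter are placeholder assumptions.

```python
# Illustrative sketch: flag unusual transactions with a one-class SVM (scikit-learn).
# The synthetic data and nu=0.05 (expected outlier fraction) are placeholder assumptions.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Columns: transaction amount, hours since previous transaction (synthetic).
normal_txns = rng.normal(loc=[50.0, 2.0], scale=[10.0, 0.5], size=(500, 2))
suspect_txns = np.array([[900.0, 0.1], [40.0, 2.1], [1200.0, 0.05]])

scaler = StandardScaler().fit(normal_txns)
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(scaler.transform(normal_txns))

preds = model.predict(scaler.transform(suspect_txns))   # +1 = looks normal, -1 = anomaly
print(preds)
```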
Confidential, Chicago IL
Sr. Data Modeler/Data Science
Responsibilities:
- Collaborated with cross-functional teams in support of business case development and identified modeling methods to provide business solutions. Determined the appropriate statistical and analytical methodologies to solve business problems within specific areas of expertise.
- Generated data models using Erwin 9.6, developed the relational database system, and was involved in logical modeling using dimensional modeling techniques such as Star Schema and Snowflake Schema.
- Guided the full lifecycle of a Hadoop solution, including requirements analysis, platform selection, technical architecture design, application design and development, testing, and deployment.
- Developed best-in-class analytical solutions through a combination of Hadoop distributed systems, Teradata, graph databases, SAS, and R.
- Responsible for migrating the data and data models from SQL Server and Oracle environments to the Teradata environment, and applied Big Data and Hadoop technologies on top of it.
- Solved business problems from inception to completion, from problem definition and data acquisition (extraction, assembly, and cleaning) from multiple systems to data modeling (Erwin) and analysis.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created logical and physical data models using best practices to ensure high data quality and reduced redundancy.
- Consulted on broad areas including data science, spatial econometrics, machine learning, information technology and systems, and economic policy, using R.
- Managed the timely flow of business intelligence information to users, and was involved in normalization and de-normalization of existing tables for faster query retrieval.
- Involved in creating screen designs, use cases, and ER diagrams for the project using Erwin and Visio.
- Consolidated, integrated, and interpreted data from multiple sources, including SAS, R, and graph databases with varying algorithms, into a unified database schema.
- Defined Big Data strategy, including designing multi-phased implementation roadmaps.
- Analyzed the business information requirements and researched the OLTP source systems to identify the measures, dimensions, and facts required for the reports.
- Performed data mapping between source systems and target systems and logical data modeling, created class diagrams and ER diagrams, and used SQL queries to filter data.
- Led the design of high-level conceptual and logical models that facilitate a cross-system/cross-functional view of data requirements.
- Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Used various techniques with R data structures to get the data into the right format for analysis, which is later used by other internal applications to calculate thresholds.
- Maintained conceptual, logical, and physical data models along with corresponding metadata.
- Performed data migration from an RDBMS to a NoSQL database, providing the whole picture for data deployed in various data systems.
- Designed and developed the data dictionary and metadata of the models, and maintained them.
- Involved in data warehouse support: Star Schema and dimensional modeling to help design data marts and the data warehouse.
- Developed Linux shell scripts using the NZSQL/NZLOAD utilities to load data from flat files into the Netezza database.
- Developed triggers, stored procedures, functions, and packages using cursor and ref cursor concepts associated with the project, using PL/SQL.
- Prepared documentation for all entities, attributes, data relationships, primary and foreign key structures, allowed values, codes, business rules, and glossary terms as they evolved and changed during the project.
- Exported the patterns analyzed back to Teradata using Sqoop.
- Troubleshoot test scripts, SQL queries, ETL jobs, and data warehouse/data mart/data store models.
- Used normalization methods up to 3NF and de-normalization techniques for effective performance in OLTP systems.
- Worked on the coding and development of SAS programs, SAS macros, PL/SQL, etc.; was involved in testing the application with test scripts and documented issues for future reference.
- Responsible for various data mapping activities from source systems to Teradata.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS (see the sketch at the end of this section).
- Performed database performance tuning using the EXPLAIN PLAN and TKPROF utilities, and debugged the SQL code.
- Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards.
- Performed analyses such as regression analysis, logistic regression, discriminant analysis, and cluster analysis using SAS programming.
- Used a metadata tool for importing metadata from the repository, creating new job categories, and creating new data elements.
Environment: Erwin r9.6, R, Oracle 12c, MS-SQL Server, Hive, NoSQL, Teradata, Netezza, PL/SQL, MS-Visio, Informatica, T-SQL, SQL, Crystal Reports 2008, Java, SPSS, SAS, Tableau, Excel, HDFS, PIG, SSRS, SSIS, Metadata.
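Below is a minimal PySpark sketch of the import-transform-load pattern referenced above (Hive-backed tables with partitioned output); the database, table, and column names are hypothetical placeholders, not the actual EDW objects.

```python
# Illustrative sketch: read raw staging data, apply a Hive-style transformation,
# and write it back as a partitioned table. All table/column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("refine-raw-data")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.table("staging.raw_sales")                  # hypothetical staging table

refined = (raw
           .filter(F.col("amount").isNotNull())
           .withColumn("sale_date", F.to_date("sale_ts"))
           .groupBy("region", "sale_date")
           .agg(F.sum("amount").alias("daily_amount")))

(refined.write
        .mode("overwrite")
        .partitionBy("sale_date")
        .saveAsTable("edw.refined_sales"))               # hypothetical EDW table
```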
Confidential, Iowa City IA
Sr. Data Modeler/Data Analyst
Responsibilities:
- Worked with business users to gather requirements and create data flow, process flows and functional specification documents.
- Developed Data Mapping, Data Governance and transformation and cleansing rules for the Master Data Management Architecture involving OLTP, ODS.
- Based on client requirements, created design documents for Workday reporting and created a dashboard that gives all the information regarding those reports.
- Developed, enhanced, and maintained Snowflake Schemas within the data warehouse and data mart, with conceptual data models.
- Designed the 3rd normal form target data model and mapped it to the logical model.
- Involved in extensive data validation using SQL queries and back-end testing, and used SQL for querying the database in a UNIX environment (see the sketch at the end of this section).
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Involved in data analysis and creating data mapping documents to capture source-to-target transformation rules.
- Used ER Studio and Visio to create 3NF and dimensional data models and published them to the business users and ETL/BI teams.
- Involved in data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed, and sent to an external entity.
- Developed Informatica SCD Type I, Type II, and Type III mappings and tuned them for better performance. Extensively used almost all of the Informatica transformations, including complex lookups, Stored Procedures, Update Strategy, mapplets, and others.
- Created or modified T-SQL queries as per business requirements, and worked on creating role-playing dimensions, factless fact tables, and snowflake and star schemas.
- Used the ER Studio modeling tool for publishing a data dictionary, reviewing the model and dictionary with subject matter experts, and generating data definition language.
- Extracted data from Oracle, Teradata, Netezza, SQL Server, and DB2 databases using Informatica and loaded it into a single repository for data analysis.
- Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
- Managed full SDLC processes involving requirements management, workflow analysis, source data analysis, data mapping, metadata management, data quality, testing strategy, and maintenance of the model.
- Created custom Workday reports and modified/troubleshot existing custom reports.
- Created and modified several UNIX shell Scripts according to the changing needs of the project and client requirements.
- Identified and tracked the slowly changing dimensions, heterogeneous sources and determined the hierarchies in dimensions.
- Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
- Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
- Wrote reports using Report Writer that extract Workday data and manipulate it into other formats (Excel) for various needs.
- Used Teradata utilities such as FastExport and MLOAD for handling various tasks.
- Analyzed functional and non-functional categorized data elements for data profiling and mapping from the source to the target data environment. Developed working documents to support findings and assign specific tasks.
- Translated business requirements into technical requirements in terms of BO (Business Objects) universe and report design.
- Involved in fixing invalid mappings, testing Stored Procedures and Functions, and unit and integration testing of Informatica Sessions, Batches, and the target data.
- Involved in OLAP unit testing and system testing, validating OLAP report functionality and the data displayed in the reports.
Environment: ER Studio, Informatica PowerCenter 8.1/9.1, PowerConnect/PowerExchange, Oracle 11g, Mainframes, DB2, MS SQL Server 2008, SQL, PL/SQL, XML, Windows NT 4.0, Tableau, Workday, SPSS, SAS, Business Objects, Unix Shell Scripting, Teradata, Netezza, Aginity.
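Below is a minimal, illustrative sketch of the kind of source-to-target data validation described above, expressed with pandas rather than raw SQL; the file names and key column are hypothetical placeholders.

```python
# Illustrative sketch: reconcile row counts and key values between a source
# extract and a target load. File names and the key column are placeholders.
import pandas as pd

source = pd.read_csv("source_extract.csv")      # hypothetical source extract
target = pd.read_csv("target_load.csv")         # hypothetical target load

# Row-count check
print("source rows:", len(source), "target rows:", len(target))

# Keys present in the source but missing from the target
missing_keys = set(source["customer_id"]) - set(target["customer_id"])
print("missing keys:", sorted(missing_keys)[:20])

# Column-level null check on the target
print(target.isnull().sum())
```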
Confidential
Sr. Data Modeler/Data Analyst
Responsibilities:
- Created and maintained logical and physical models for the data mart. Created partitions and indexes for the tables in the data mart.
- Performed data profiling and analysis, applied various data cleansing rules, designed data standards and architecture, and designed the relational models.
- Maintained metadata (data definitions of table structures) and version controlling for the data model.
- Developed SQL scripts for creating tables, sequences, triggers, views, and materialized views.
- Worked on query optimization and performance tuning using SQL Profiler and performance monitoring.
- Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions and Incremental loading and unit tested the mappings.
- Utilized Erwin's forward/reverse engineering tools and target database schema conversion process.
- Worked on creating an enterprise-wide model (EDM) for products and services in the Teradata environment based on the data from the PDM; conceived, designed, developed, and implemented this model from scratch.
- Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
- Wrote SQL scripts to test the mappings, and developed a traceability matrix of business requirements mapped to test scripts to ensure that any change control in requirements leads to test case updates.
- Responsible for the development and testing of conversion programs for importing data from text files into the Oracle database, utilizing Perl shell scripts and SQL*Loader.
- Involved in extensive data validation by writing several complex SQL queries, involved in back-end testing, and worked with data quality issues.
- Developed and executed load scripts using the Teradata client utilities MULTILOAD, FASTLOAD, and BTEQ.
- Exported and imported data between different platforms such as SAS and MS Excel.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
- Worked with the ETL team to document the transformation rules for data migration from OLTP to the warehouse environment for reporting purposes.
- Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues (see the sketch at the end of this section).
- Formatted the data sets read into SAS by using the FORMAT statement in the DATA step as well as PROC FORMAT.
- Applied Business Objects best practices during development, with a strong focus on reusability and better performance.
- Developed Tableau visualizations and dashboards using Tableau Desktop.
- Used graphical entity-relationship diagramming to create new database designs via an easy-to-use graphical interface.
- Designed different types of star schemas for detailed data marts and plan data marts in the OLAP environment.
Environment: Erwin, MS SQL Server 2008, DB2, Oracle SQL Developer, PL/SQL, Business Objects, MS Office suite, Windows XP, TOAD, SQL*PLUS, SQL*LOADER, Teradata, Netezza, SAS, Tableau, SSRS, SQL Assistant, Informatica, XML.
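Below is a small, illustrative sketch of the kind of data-quality checks described above, written with pandas rather than SQL for brevity; the file and column names are hypothetical placeholders.

```python
# Illustrative sketch: basic data-quality checks (duplicate keys, nulls, out-of-range values).
# The file and column names are placeholders for illustration only.
import pandas as pd

df = pd.read_csv("orders_extract.csv")                      # hypothetical extract

# Duplicate primary keys
dupes = df[df.duplicated(subset=["order_id"], keep=False)]
print("duplicate keys:", len(dupes))

# Null counts per column
print(df.isnull().sum())

# Simple range/validity check on a numeric column
bad_amounts = df[(df["order_amount"] < 0) | (df["order_amount"] > 1_000_000)]
print("out-of-range amounts:", len(bad_amounts))
```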