- 8+ years of experience as a Sr. Big Data Engineer with skills in analysis, design, development, testing and deployment of various software applications.
- In-depth knowledge of the software development life cycle (SDLC) and of Waterfall, Iterative and Incremental, RUP, evolutionary prototyping and Agile/Scrum methodologies.
- Extensive experience in technical consulting and end-to-end delivery with data modeling and data governance.
- Good experience writing Hive join queries to fetch information from multiple tables and writing multiple jobs to collect output from Hive.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata and BTEQ.
- Excellent experience in system Analysis, ER Dimensional Modeling, Data Design and implementing RDBMS specific features.
- Hands-on experience in importing, cleaning, transforming, and validating data and drawing conclusions from the data for decision-making purposes.
- Experience with data modeling and design of both OLTP and OLAP systems.
- Good working experience with Big Data ecosystem components like HBase, Sqoop, Oozie, Hive and Pig on the Cloudera Hadoop distribution.
- Experience in Performance tuning of Informatica (sources, mappings, targets and sessions) and tuning the SQL queries.
- Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required.
- Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Proficient in creating dashboards/reports using reporting tools like Tableau and QlikView.
- Involved in data acquisition, data pre-processing and data exploration for a telecommunications project in Scala.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
- Good working experience with Apache NiFi as an ETL tool for batch and real-time processing.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Experience in writing Storm topologies to accept events from Kafka producers and emit them into Cassandra.
- Excellent experience performing data validation and transformation using Python and Hadoop streaming.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT and Dimension tables.
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
- Experience working with Microsoft SQL Server tools like SSAS and SSIS, and in generating on-demand and scheduled reports using SQL Server Reporting Services (SSRS).
- Responsible for providing primary leadership in designing, leading, and directing all aspects of UAT for the Oracle data warehouse.
- Excellent understanding of Microsoft BI toolset including Excel, Power BI, SQL Server Analysis Services, Visio, Access.
Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Hadoop 3.0, Apache NiFi 1.6, Cassandra 3.11
Data Modeling Tools: Erwin R9.7/9.6, ER Studio V17
BI Tools: Tableau 10, Tableau Server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, MS Access, MySQL, DB2, Hive, Microsoft Azure SQL Database
Operating Systems: Microsoft Windows Vista/7/8 and 10, UNIX, and Linux.
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED
ETL/Data warehouse Tools: Informatica v10, SAP Business Objects Business Intelligence 4.2 Service Pack 03, Talend, Tableau, and Pentaho.
Other Tools: TOAD, BTEQ, MS-Office suite (Word, Excel, Project and Outlook).
Confidential - Durham, NC
Sr. Big Data Engineer
- Worked with Big Data engineers and designers to troubleshoot MapReduce job failures and issues with Hive, Pig and Sqoop.
- Used SDLC (System Development Life Cycle) methodologies like RUP and Agile methodology.
- Created data integration and technical solutions for Azure Data Lake for providing analytics and reports for improving marketing strategies.
- Worked on NoSQL databases including Cassandra and implemented a multi-data-center, multi-rack Cassandra cluster.
- Developed and configured the Informatica MDM Hub to support the Master Data Management (MDM) and data warehousing platforms to meet business needs.
- Developed Hive Functions and Queries for massaging data before loading the Hive Tables.
- Used crontab and Oozie workflows to automate the data feed processing from the various sources and the incremental master data loads from the DB2 tables.
- Designed both 3NF data models for OLAP and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Developed Hive scripts to validate the data feeds on HDFS and capture the invalid transactions.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Applied normalization techniques to normalize the data into 3rd Normal Form (3NF).
- Developed MapReduce jobs to create and load files into the NoSQL database Apache HBase for consumption by real-time applications.
- Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Involved in logical and physical designs and transformed logical models into physical implementations for Oracle and Teradata.
- Implemented ETL techniques for Data Conversion, Data Extraction and Data Mapping for different processes as well as applications.
- Used QlikView as a data analyst to create custom reports, charts and bookmarks.
- Captured the data logs from web servers into HDFS using Flume for analysis.
- Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Analyzed existing SSIS packages, made changes to improve their performance, and added standard logging and a configuration system.
- Configured various topics on the Kafka server to handle transactions flowing from multiple ERP systems.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Developed and automated multiple departmental reports using Tableau and MS Excel.
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from Teradata database.
Environment: Hadoop 3.0, Agile, Azure, Cassandra 3.1, MDM, Hive 2.3, Oozie 4.3, OLAP, OLTP, HDFS, Sqoop 1.4, MapReduce, HBase 1.2, MongoDB, Oracle 12c, Teradata r15, ETL, QlikView, Flume 1.8, Pig 0.17, SSIS, Kafka 1.1, Tableau 10.2, MS Excel 2016.
Confidential - Bellevue, WA
- Worked with the Big Data Hadoop ecosystem in ingestion, storage, querying, processing and analysis of big data and conventional RDBMS.
- Followed Agile methodology, working collaboratively with the team to mitigate infrastructure security issues.
- Worked on Amazon Redshift and AWS as a solution to load data, create data models and run BI on it.
- Used a forward engineering approach for designing and creating databases for the OLAP model.
- Developed various operational Drill-through and Drill-down reports using SSRS.
- Used advanced features of T-SQL to design and tune T-SQL code that interfaces with the database.
- Extensively used Metadata & Data Dictionary Management, Data Profiling and Data Mapping.
- Designed OLTP system environment and maintained documentation of Metadata.
- Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
- Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
- Analyzed the source data coming from various data sources like Mainframe & Oracle.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
- Worked with the MDM systems team on technical aspects and report generation.
- Worked on AWS Redshift and RDS to implement data models and load data.
- Developed Hive and MapReduce tools to design and manage HDFS data blocks and data distribution methods.
- Established a process for creating workflow diagrams using Microsoft Visio.
- Designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
- Created the cubes with Star Schemas using facts and dimensions through SQL Server Analysis Services (SSAS).
- Used SSRS to create standard, customized, on-demand and ad-hoc reports, and was involved in analyzing multi-dimensional reports in SSRS.
Environment: Hadoop 3.0, Agile, AWS, OLAP, SSRS, T-SQL, OLTP, Tableau 10.1, Pig 0.17, Oozie 4.3, PL/SQL, Oracle 12c, HBase 1.2, Kafka 1.1, HDFS, MDM, MapReduce, Erwin 9.7, SSAS.
Confidential - Bentonville, AR
- Worked as a Data Modeler to review business requirements and compose source-to-target data mapping documents.
- Responsible for data warehousing, data modeling, data governance, standards, methodologies, guidelines and techniques.
- Participated in JAD sessions with business users, sponsors and subject matter experts to understand the business requirement document.
- Designed and implemented an end-to-end, near real-time data pipeline by transferring data from DB2 tables into Hive on HDFS using Sqoop.
- Worked with DBAs to create a best-fit physical data model from the logical data model.
- Translated business requirements into detailed, production-level technical specifications and new features, and created conceptual models.
- Created 3NF business-area data models with de-normalized physical implementation, and performed data and information requirements analysis using the Erwin tool.
- Involved in the creation and maintenance of the Data Warehouse and of repositories containing Metadata.
- Involved in creating Pipelines and Datasets to load the data into the data warehouse.
- Worked on importing and cleansing high-volume data from various sources like Teradata, flat files and SQL Server.
- Extensive system study, design, development and testing were carried out in the Oracle environment to meet the customer requirements.
- Developed Sqoop jobs to collect master data from Oracle tables to be stored in Hive tables using the Parquet file format.
- Created schema objects like indexes, views and sequences, and tuned and optimized SQL queries.
- Applied Data Governance rules (primary qualifier, class words and valid abbreviations in table names and column names).
- Identified Facts & Dimensions Tables and established the Grain of Fact for Dimensional Models.
- Prepared process flow/activity diagrams for the existing system using MS Visio and re-engineered the design based on business requirements.
- Worked on the reporting requirements and involved in generating the reports for the Data Model.
- Designed and implemented the data lake on the NoSQL database HBase with denormalized tables suited to feed the downstream reporting applications.
- Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP and DW).
- Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
- Developed Star and Snowflake schema-based dimensional models to build the data warehouse.
- Generated the DDL of the target data model and attached it to JIRA to be deployed in different environments.
- Created reports from several discovered patterns using Microsoft Excel to analyze pertinent data by pivoting.
Environment: Hive 1.0, HDFS, Sqoop, Erwin 9.6, Teradata r14, Oracle 11g, MS Visio, OLTP, JIRA.
Confidential - New Albany, OH
Data Analyst/Data Modeler
- Worked as a Data Analyst/Data Modeler to understand business logic and user requirements.
- Presented the data scenarios via ER/Studio logical models and Excel mockups to visualize the data better.
- Extensively used SQL and PL/SQL to write stored procedures, functions, packages and triggers.
- Extensively worked on Shell scripts for running SSIS programs in batch mode on Unix.
- Developed a star schema for the proposed central model and normalized the star schema into a snowflake schema.
- Designed different types of star schemas for detailed data marts and plan data marts in the OLAP environment.
- Configured the report server and authorized permissions for different users in SQL Server Reporting Services (SSRS).
- Designed and developed complex SQL scripts in the SQL Server database for creating tables for Tableau reporting.
- Involved in data profiling for multiple sources and answered complex business questions by providing data to business users.
- Created the Physical Data Model from the Logical Data Model using the Compare and Merge Utility in ER/Studio.
- Identified and documented data sources and transformation rules required to populate and maintain data warehouse content.
- Facilitated in developing testing procedures, test cases and User Acceptance Testing (UAT).
- Extensively used SQL for Data Analysis to understand and document the data behavioral trend.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Performed Gap Analysis on existing data models and helped in controlling the gaps identified.
- Involved in extensive Data validation using SQL queries and back-end testing.
- Prepared complex T-SQL queries, views and stored procedures to load data into staging area.
- Involved in preparing the source-to-target data mapping document and the data quality assessments for the source data.
- Performed Normalization of the existing OLTP systems (3rd NF), to speed up the DML statements execution time.
- Performed Data Modeling, Database Design, and Data Analysis with the extensive use of ER/Studio.
Environment: ER/Studio 0.98, SQL, PL/SQL, UNIX, OLAP, SSRS, T-SQL, OLTP.
- Worked extensively in data analysis by querying in SQL and generating various PL/SQL objects.
- Involved in creating new stored procedures and optimizing existing queries and stored procedures.
- Worked on scripting some complex stored procedures using T-SQL in creating metrics for the data.
- Worked on Data Verifications and Validations to evaluate whether the data generated according to the requirements is appropriate and consistent.
- Analyzed and built proofs of concept to convert SAS reports into Tableau or use SAS datasets in Tableau.
- Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
- Developed Data Mapping, Transformation and Cleansing rules for Data Management involving OLTP and OLAP.
- Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into the target system from multiple sources.
- Gathered and documented the audit trail and traceability of extracted information for data quality.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Involved in Data Profiling and performed Data Analysis based on the requirements, which helped in catching many sourcing issues upfront.
- Produced dashboard SSRS reports under report server projects and published SSRS reports to the report server.
- Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Developed the financial reporting requirements by analyzing the existing Business Objects reports.
- Wrote multiple SQL queries to analyze the data and presented the results using Excel, Access, and Crystal Reports.
- Performed ad-hoc analyses as needed, with the ability to comprehend the analysis required.
Environment: SQL, PL/SQL, T-SQL, Tableau 9.0, SAS, OLTP, OLAP, SSIS, SSRS, Excel 2010.