Sr. Data Engineer Resume
Menlo Park, CA
SUMMARY
- 8+ years of IT experience in Data Engineering and Data Analysis, with high proficiency in developing Data Warehouse and Business Intelligence solutions.
- Extensive experience in Relational and Dimensional Data modeling for creating Logical and Physical Design of Database and ER Diagrams using multiple data modeling tools like Erwin and ER Studio.
- Experienced in performing structural modifications using MapReduce, analyzing data using Hive, and visualizing it in dashboards using Tableau.
- Experienced in Data Management solutions covering DWH/Data Architecture design, Data Governance implementation, and Big Data.
- Experienced working with Big Data technologies such as Hadoop, Spark, PySpark, Hive, and HDFS, as well as NoSQL platforms.
- Experienced in all phases of the software development life cycle (SDLC), from requirements definition through implementation, and supported models through the transformation and analysis phases.
- Experienced in Data Modeling using Dimensional Data Modeling, Star Schema modeling, Fact and Dimensions tables including Physical, Logical data modeling
- Experienced with Data Conversion, Data Quality, Data Profiling, Performance Tuning, System Testing, and implementing RDBMS features.
- Experienced with Business Process Modeling, Process Flow Modeling & Data Flow Modeling.
- Expertise in implementing security models for dashboards at the row, object, role, and dashboard levels.
- Excellent experience in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Excellent experience in the Extract, Transform and Load (ETL) process using tools like DataStage, Informatica, Data Integrator, and SSIS for Data Migration and Data Warehousing projects.
- Experienced in the development of Data Warehouse, Business Intelligence architecture that involves data integration and the conversion of data from multiple sources and platforms.
- Expertise in Data Analysis using SQL on Oracle, MS SQL Server, DB2, Teradata, and Netezza.
- Good knowledge of Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
- Strong experience working with conceptual, logical, and physical data modeling in line with metadata standards.
- Experience working with Agile and Waterfall data modeling methodologies.
- Experience with both the Ralph Kimball and Bill Inmon data warehousing approaches.
- Expertise in UML (class diagrams, object diagrams, use case diagrams, state diagrams, sequence diagrams, activity diagrams, and collaboration diagrams) as a business analysis methodology for application functionality designs using Rational Rose and MS-Visio.
- Involved in various projects related to Data Modeling, System/Data Analysis, Design and Development for both OLTP and Data warehousing environments.
- Facilitated data requirement meetings with business and technical stakeholders and resolved conflicts to drive decisions.
- Experience in data transformation and data mapping from source to target database schemas, as well as data cleansing.
- Experience in performance analysis; created partitions, indexes, and aggregate tables where necessary.
- Experience with DBA tasks involving database creation, performance tuning, creation of indexes, creating and modifying table spaces for optimization purposes.
- Performed extensive Data profiling and analysis for detecting and correcting inaccurate data from the databases and to track data quality.
- Very good exposure to ETL tools such as Informatica.
- Excellent communication skills, self-starter with ability to work with minimal guidance.
TECHNICAL SKILLS
Big Data technologies: MapReduce, HBase, HDFS, Sqoop, Spark, Hadoop, Hive, PIG, Impala.
Data Modeling Tools: ER/Studio 9.7/9.0, Erwin 9.8/9.5, Sybase PowerDesigner.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack
Programming Languages: SQL, PL/SQL, UNIX Shell Scripting, Perl, AWK, SED
Databases: Oracle 12c, Teradata R15, MS SQL Server 2019, DB2.
Testing and defect tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio & Visual SourceSafe
Operating System: Windows 10, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 10.0/9.6, SAP Business Objects XIR3.1/XIR2, Talend
BI Tools: Tableau, Power BI
Project Development Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
PROFESSIONAL EXPERIENCE
Confidential - Menlo Park, CA
Sr. Data Engineer
Responsibilities:
- As a Sr. Data Engineer, designed and deployed scalable, highly available, and fault-tolerant systems on Azure.
- Involved in the complete SDLC life cycle of a big data project, including requirement analysis, design, coding, testing, and production.
- Led estimation, reviewed estimates, identified complexities, and communicated them to all stakeholders.
- Defined the business objectives comprehensively through discussions with business stakeholders, functional analysts and participating in requirement collection sessions.
- Migrated the on-premises environment to the cloud using MS Azure.
- Performed data Ingestion for the incoming web feeds into the Data lake store which includes both structured and unstructured data.
- Designed the business requirement collection approach based on the project scope and SDLC (Agile) methodology.
- Migrated data warehouses to Snowflake Data warehouse.
- Installed and configured Hive, wrote Hive UDFs, and managed cluster coordination services through ZooKeeper.
- Installed and configured Hadoop Ecosystem components.
- Defined virtual warehouse sizing in Snowflake for different types of workloads.
- Extensively used Agile methodology, with daily scrums to discuss project-related information.
- Worked on data ingestion from multiple sources into the Azure SQL data warehouse.
- Transformed and loaded data into Azure SQL Database.
- Wrote Spark applications for Data validation, cleansing, transformations and custom aggregations.
- Developed HIVE scripts to transfer data from and to HDFS.
- Implemented Hadoop based data warehouses, integrated Hadoop with Enterprise Data Warehouse systems.
- Performed reverse engineering using Erwin to redefine entities, attributes, and relationships in the existing database.
- Developed and maintained data pipelines on the Azure analytics platform using Azure Databricks.
- Created Airflow Scheduling scripts in Python.
- Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
- Worked with the Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
- Created Data Pipeline to migrate data from Azure Blob Storage to Snowflake.
- Worked on Snowflake modeling and highly proficient in data warehousing techniques for data cleansing, Slowly Changing Dimension phenomenon, surrogate key assignment and change data capture.
- Maintained a NoSQL database to handle unstructured data; cleaned the data by removing invalid records, unifying formats, and rearranging the structure, then loaded it for subsequent steps.
- Participated in NoSQL database maintenance with Azure SQL DB.
- Worked with Kafka, building use cases relevant to our environment.
- Identified data within different data stores, such as tables, files, folders, and documents to create a dataset in pipeline using Azure HDInsight.
- Optimized and updated UML models (Visio) and relational data models for various applications.
- Wrote Python scripts to parse XML documents and load the data in database.
- Wrote DDL and DML statements for creating and altering tables and converting characters into numeric values.
- Translated business concepts into XML vocabularies by designing XML Schemas with UML.
- Worked on Data load using Azure Data factory using external table approach.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Power BI.
- Designed and generated various dashboards, reports using various Power BI Visualizations.
- Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools.
- Developed purging scripts and routines to purge data on Azure SQL Server and Azure Blob storage.
- Developed Python Scripts for automation purpose and Component unit testing using Azure Emulator.
- Wrote and optimized T-SQL queries in SQL Server.
- Maintained data storage in Azure Data Lake.
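As a minimal sketch of the Python XML-parsing and load scripts described above: the element names, columns, and the in-memory sqlite3 target here are hypothetical stand-ins for the actual feeds and database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; element and attribute names are illustrative only.
SAMPLE_XML = """
<orders>
  <order id="1001"><customer>Acme</customer><amount>250.00</amount></order>
  <order id="1002"><customer>Globex</customer><amount>99.50</amount></order>
</orders>
"""

def parse_orders(xml_text):
    """Parse the XML document into row tuples, converting character data
    to numeric values where the target columns are numeric."""
    root = ET.fromstring(xml_text)
    return [
        (int(o.get("id")), o.findtext("customer"), float(o.findtext("amount")))
        for o in root.findall("order")
    ]

def load_orders(conn, rows):
    """DDL + DML: create the target table and bulk-insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
```

The same parse-then-bulk-insert pattern extends to any relational target by swapping the connection and the DDL.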
Confidential - Dublin, OH
Sr. Data Engineer
Responsibilities:
- Heavily involved in a Data Engineer role, reviewing business requirements and composing source-to-target data mapping documents.
- Extensively used Agile methodology, with daily scrums to discuss project-related information.
- Provided a summary of the Project's goals, and the specific expectation of business users from BI and how it aligns with the project goals.
- Provided a suggestion to implement multitasking for the existing Hive architecture in Hadoop, and also suggested UI customization in Hadoop.
- Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Responsible for a setup of 5 node development cluster for a Proof of Concept which was later implemented as a fulltime project by Fortune Brands.
- Architected and designed the data flow for consolidating 4 legacy data warehouses into an AWS Data Lake.
- Installed and Configured Hadoop cluster using Amazon Web Services (AWS) for POC purposes.
- Connected to AWS Redshift through Tableau to extract live data for real time analysis.
- Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
- Used AWS S3 buckets to store files, ingested the files into Snowflake tables using Snowpipe, and ran deltas using data pipelines.
- Worked on complex SnowSQL and Python queries in Snowflake.
- Configured the above jobs in Airflow.
- Resolve AML related issues to ensure adoption of standards, guidelines in the organization. Resolution of day-to-day issues and worked with the users and testing team towards resolution of issues and fraud incident related tickets.
- Used Erwin tool to develop a Conceptual Model based on business requirements analysis.
- Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into the target system from multiple sources.
- Loaded real-time data from various data sources into HDFS using Kafka.
- Worked on normalization techniques. Normalized the data into 3rd Normal Form (3NF).
- Involved in Data profiling in order to detect and correct inaccurate data and maintain the data quality.
- Worked on configuring and managing disaster recovery and backup on Cassandra Data.
- Implemented Kafka producers with custom partitioners, configured brokers, and implemented high-level consumers for the data platform.
- Designed and developed an entire CDC (change data capture) module in Python and deployed it on AWS Glue using the PySpark library.
- Involved in writing APIs for AWS Lambda to manage some of the AWS services.
- Updated Python scripts to match training data with our database stored in AWS Cloud.
- Normalized the database based on the newly developed model to put it into 3NF for the data warehouse.
- Ran the entire big data workload on AWS environments.
- Involved in extensive data validation by writing several complex SQL queries.
- Performed data cleaning and data manipulation activities using the NZSQL utility.
- Designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Worked with the MDM systems team on technical aspects and report generation.
- Extracted large data sets from AWS Redshift and the Elasticsearch engine using SQL queries to create reports.
- Worked on data governance, data quality, and data lineage establishment processes.
- Executed change management processes surrounding new releases of SAS functionality
- Worked in importing and cleansing of data from various sources.
- Performed data cleaning, feature scaling, and feature engineering using packages in Python.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Created various types of data visualizations using Python and Tableau.
- Wrote and executed unit, system, integration, and UAT scripts in a data warehouse project.
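The CDC module mentioned above ran on AWS Glue with PySpark; its core diff logic can be sketched in plain Python. The record shapes and the `id` key used here are illustrative assumptions, not the production schema.

```python
def capture_changes(previous, current, key="id"):
    """Simplified change data capture: diff two snapshots keyed on a
    primary key and classify rows as inserts, updates, or deletes."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    inserts = [curr[k] for k in curr.keys() - prev.keys()]
    deletes = [prev[k] for k in prev.keys() - curr.keys()]
    updates = [curr[k] for k in curr.keys() & prev.keys() if curr[k] != prev[k]]
    return {"insert": inserts, "update": updates, "delete": deletes}

# Example: one changed row, one new row, one dropped row.
prev_snapshot = [{"id": 1, "city": "Dublin"}, {"id": 2, "city": "Columbus"}]
curr_extract = [{"id": 1, "city": "Dayton"}, {"id": 3, "city": "Toledo"}]
changes = capture_changes(prev_snapshot, curr_extract)
```

In the PySpark version the same classification is expressed with joins on the key column; the snapshot-diff idea is identical.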
Confidential - San Antonio, TX
Sr. Data Modeler
Responsibilities:
- Worked as a Sr. Data Modeler to generate data models using Erwin and developed a relational database system.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Worked on analyzing Hadoop stack and different big data analytic tools.
- Extensively used agile methodology as the Organization Standard to implement the data Models.
- Worked in an environment of Amazon Web Services (AWS) provisioning and used AWS services.
- Created semantically rich logical data models (non-relational/NoSQL) that define the Business data requirements.
- Converted conceptual models into logical models with detailed descriptions of entities and dimensions for Enterprise Data Warehouse.
- Involved in creating Pipelines and Datasets to load the data onto Data Warehouse.
- Coordinated with Data Architects to design Big Data and Hadoop projects and to provide idea-driven design.
- Created database objects in AWS Redshift. Followed AWS best practices to convert data types from oracle to Redshift.
- Worked on NoSQL databases such as Cassandra.
- Identified data organized into logical groupings and domains, independent of any application or system.
- Developed data models and data migration strategies utilizing sound concepts of data modeling including star schema, snowflake schema.
- Created S3 buckets in the AWS environment to store files, sometimes which are required to serve static content for a web application.
- Developed Model aggregation layers and specific star schemas as subject areas within a logical and physical model.
- Identified fact & dimension tables and established the grain of fact for dimensional models.
- Configured Inbound/Outbound in AWS Security groups according to the requirements.
- Established measures to chart progress related to the completeness and quality of metadata for enterprise information.
- Developed the data dictionary for various projects for the standard data definitions related data analytics.
- Managed storage in AWS using Elastic Block Storage, S3, created Volumes and configured Snapshots.
- Generated the DDL of the target data model and attached it to the Jira to be deployed in different Environments.
- Conducted data modeling for JAD sessions and communicated data related standards.
- Investigate Data Quality Issues and provide recommendation & solution to address them.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Validated the data of reports by writing SQL queries in PL/SQL Developer against ODS.
- Prepared reports to summarize the daily data quality status and work activities.
- Performed ad-hoc analyses as needed and communicated the results clearly.
Environment: Erwin 9.7, NoSQL, Sqoop, Cassandra 3.11, AWS, Hadoop 3.0, SQL, Pl/SQL
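As a sketch only: the fact-and-dimension design work above can be illustrated with surrogate-keyed dimension tables and a fact table at a declared grain. The sqlite3 target and all table and column names here are hypothetical, chosen purely to make the example self-contained.

```python
import sqlite3

# Illustrative star schema: the fact table's grain is one row per sale
# per day per product; each dimension carries its own surrogate key.
DDL = """
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,   -- surrogate key
    full_date  TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY, -- surrogate key
    product_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER,
    revenue     REAL
);
"""

def build_star_schema(conn):
    """Forward-engineer the model into physical DDL and list the tables."""
    conn.executescript(DDL)
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

Keeping surrogate keys on the dimensions decouples the warehouse from source-system natural keys, which is what makes slowly changing dimension handling possible later.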
Confidential - Rosemont, IL
Data Modeler
Responsibilities:
- Responsible for the data modeling design delivery, data model development, review, approval and Data warehouse implementation.
- Extensively used Agile methodology as the Organization Standard to implement the data Models.
- Provided a consultative approach with business users, asking questions to understand the business need and deriving the data flow, logical, and physical data models based on those needs.
- Extracted large data sets from Amazon Redshift, AWS, and the Elasticsearch engine using SQL queries to create reports.
- Worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
- Created physical & logical data models from the conceptual model and converted them into the physical database with DDLs using forward engineering options in Erwin.
- Used Erwin Model Mart for effective model management: sharing, dividing, and reusing model information and designs for productivity improvement.
- Used the Model Manager option in Erwin to synchronize the data models in the Model Mart approach.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
- Worked on implementing and executing enterprise data governance and data quality framework.
- Completed enhancements for MDM (Master Data Management) and suggested the implementation of a hybrid MDM.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Created T-SQL queries as per the business requirements.
- Stored and loaded data from HDFS to AWS S3, handled backups, and created tables in the AWS cluster with S3 storage.
- Designed Metadata Repository to store data definitions for entities, attributes & mappings between data warehouse and source system data elements.
- Developed data mapping, data governance, transformation, and cleansing rules for the Master Data Management architecture.
- Worked on designing, implementing, and deploying an enterprise data warehouse into production.
- Used SQL on the new AWS databases like Redshift and Relational Database Service (RDS).
- Designed and Developed Shell Scripts, Data Conversions and Data Cleansing.
- Involved in capturing data lineage, table and column data definitions, valid values and others necessary information in the data model.
- Used SQL for Querying the database in UNIX environment.
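A metadata repository of the kind described above, holding data definitions and source-to-target mappings, can be sketched minimally in Python. The entity, attribute, and mapping names below are invented for illustration.

```python
class MetadataRepository:
    """Minimal metadata repository: data definitions for entities and
    attributes, plus source-to-target mappings for lineage lookups."""

    def __init__(self):
        self.definitions = {}   # (entity, attribute) -> definition text
        self.mappings = {}      # target element -> source element

    def define(self, entity, attribute, definition):
        """Record the business definition of an entity attribute."""
        self.definitions[(entity, attribute)] = definition

    def map(self, target, source):
        """Record that a warehouse element is sourced from a system element."""
        self.mappings[target] = source

    def lineage(self, target):
        """Trace a warehouse element back to its source system element,
        or None if no mapping has been captured."""
        return self.mappings.get(target)

# Illustrative usage with invented element names.
repo = MetadataRepository()
repo.define("customer", "cust_id", "Unique customer identifier")
repo.map("dw.customer.cust_id", "crm.clients.client_no")
```

In practice such a repository lives in database tables or a modeling tool's model mart; the point of the sketch is that mappings captured once support both lineage queries and impact analysis.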
Confidential
Data Analyst
Responsibilities:
- Understood business processes, data entities, data producers, and data dependencies.
- Conducted meetings with the business and technical team to gather necessary analytical data requirements in JAD sessions.
- Created the automated build and deployment process for application, re-engineering setup for better user experience, and leading up to building a continuous integration system.
- Involved in the Complete Software development life cycle (SDLC) to develop the application.
- Used and supported database applications and tools for extraction, transformation and analysis of raw data.
- Assisted in defining business requirements for the IT team and created BRD and functional specifications documents along with mapping documents to assist the developers in their coding.
- Involved in building various logics to handle Slowly Changing Dimensions, Change Data Capture, and deletes for incremental loads into the Data warehouse.
- Involved in designing fact, dimension and aggregate tables for Data warehouse Star Schema.
- Performed reverse engineering of the legacy application using DDL scripts in Erwin, and developed logical and physical data models for Central Model consolidation.
- Monitored the data quality of the daily processes and ensured data integrity was maintained for the effective functioning of the departments.
- Developed data mapping documents for integration into a central model, depicting data flow across systems, and maintained all files in an electronic filing system.
- Extracted data from various sources such as DB2, CSV, XML, and flat files into DataStage.
- Developed and programmed test scripts to identify and manage data inconsistencies and to test ETL processes.
- Created data masking mappings to mask the sensitive data between production and test environment.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Performed data analysis and data profiling using complex SQL on various sources systems including Oracle and DB2.
- Wrote SQL scripts to test the mappings and developed a traceability matrix of business requirements mapped to test scripts, to ensure any change control in requirements leads to test case updates.
- Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked on data quality issues.
- Involved in updating metadata repository with details on the nature and use of applications/data transformations to facilitate impact analysis.
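One way the data masking mappings described above can be approximated is with deterministic hashing, which keeps masked test data joinable (the same input always yields the same token) without exposing production values. The column names here are illustrative assumptions.

```python
import hashlib

# Illustrative set of columns treated as sensitive; the real list would
# come from the data classification in the masking mapping.
SENSITIVE = {"ssn", "email"}

def mask_record(record, sensitive=SENSITIVE):
    """Deterministically mask sensitive fields: identical inputs produce
    identical tokens, so joins across masked tables still line up."""
    masked = {}
    for col, val in record.items():
        if col in sensitive and val is not None:
            digest = hashlib.sha256(str(val).encode()).hexdigest()[:10]
            masked[col] = f"MASK_{digest}"
        else:
            masked[col] = val
    return masked
```

A truncated hash is fine for test environments; where reversibility or format preservation is required, a dedicated masking tool or format-preserving encryption would replace this sketch.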