
Sr. Data Engineer Resume


Menlo Park, CA

SUMMARY

  • 8+ years of IT experience in Data Engineering and Data Analysis, with high proficiency in developing Data Warehouse and Business Intelligence solutions.
  • Extensive experience in Relational and Dimensional Data modeling, creating logical and physical database designs and ER diagrams using data modeling tools such as Erwin and ER Studio.
  • Experienced in performing structural modifications using MapReduce, analyzing data using Hive, and visualizing results in dashboards using Tableau.
  • Experienced in Data Management solutions covering DWH/Data Architecture design, Data Governance implementation, and Big Data.
  • Experienced working with Big Data technologies such as Hadoop, Spark, PySpark, Hive, HDFS, and other NoSQL platforms.
  • Experienced in all phases of the software development life cycle (SDLC), from requirements definition through implementation, and supported models through the transformation and analysis phases.
  • Experienced in Data Modeling using Dimensional Data Modeling, Star Schema modeling, Fact and Dimensions tables including Physical, Logical data modeling
  • Experienced with Data Conversion, Data Quality, Data Profiling, Performance Tuning, System Testing, and implementing RDBMS features.
  • Experienced with Business Process Modeling, Process Flow Modeling & Data Flow Modeling.
  • Expertise in implementing security models for dashboards at the row, object, role, and dashboard levels.
  • Excellent experience in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Excellent experience in Extract, Transform and Load processes using ETL tools like DataStage, Informatica, Data Integrator and SSIS for Data Migration and Data Warehousing projects.
  • Experienced in the development of Data Warehouse, Business Intelligence architecture that involves data integration and the conversion of data from multiple sources and platforms.
  • Expertise in Data Analysis using SQL on Oracle, MS SQL Server, DB2, Teradata, and Netezza.
  • Having good knowledge in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
  • Strong experience working with conceptual, logical and physical data modeling considering metadata standards.
  • Experience working with Agile and Waterfall data modeling methodologies.
  • Experience with the Ralph Kimball and Bill Inmon data warehousing approaches.
  • Expertise in UML (class diagrams, object diagrams, use case diagrams, state diagrams, sequence diagrams, activity diagrams, and collaboration diagrams) as a business analysis methodology for application functionality designs using Rational Rose and MS-Visio.
  • Involved in various projects related to Data Modeling, System/Data Analysis, Design and Development for both OLTP and Data warehousing environments.
  • Facilitated data requirement meetings with business and technical stakeholders and resolved conflicts to drive decisions.
  • Experience in Data transformation and Data mapping from source to target database schemas and also data cleansing.
  • Experience in performance analysis and created partitions, indexes and Aggregate tables where necessary.
  • Experience with DBA tasks involving database creation, performance tuning, creation of indexes, creating and modifying table spaces for optimization purposes.
  • Performed extensive Data profiling and analysis for detecting and correcting inaccurate data from the databases and to track data quality.
  • Very good exposure to ETL tools such as Informatica.
  • Excellent communication skills, self-starter with ability to work with minimal guidance.

TECHNICAL SKILLS

Big Data technologies: MapReduce, HBase, HDFS, Sqoop, Spark, Hadoop, Hive, PIG, Impala.

Data Modeling Tools: ER/Studio 9.7/9.0, Erwin 9.8/9.5, Sybase PowerDesigner.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED

Databases: Oracle 12c, Teradata R15, MS SQL Server 2019, DB2.

Testing and defect tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio & Visual SourceSafe

Operating System: Windows 10, Unix, Sun Solaris

ETL/Data warehouse Tools: Informatica 10.0/9.6, SAP Business Objects XIR3.1/XIR2, Talend

BI Tools: Tableau, Power BI

Project Development Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.

PROFESSIONAL EXPERIENCE

Confidential - Menlo Park, CA

Sr. Data Engineer

Responsibilities:

  • As a Sr. Data Engineer, designed and deployed scalable, highly available, and fault-tolerant systems on Azure.
  • Involved in the complete SDLC of the big data project, including requirement analysis, design, coding, testing, and production.
  • Led the estimation, reviewed the estimates, identified the complexities, and communicated them to all stakeholders.
  • Defined the business objectives comprehensively through discussions with business stakeholders and functional analysts and through participation in requirement collection sessions.
  • Migrated the on-premises environment to the cloud using MS Azure.
  • Performed data Ingestion for the incoming web feeds into the Data lake store which includes both structured and unstructured data.
  • Designed the business requirement collection approach based on the project scope and SDLC (Agile) methodology.
  • Migrated data warehouses to Snowflake Data warehouse.
  • Installed and configured Hive, wrote Hive UDFs, and set up cluster coordination services through ZooKeeper.
  • Installed and configured Hadoop Ecosystem components.
  • Defined virtual warehouse sizing in Snowflake for different types of workloads.
  • Extensively used the Agile method with daily scrums to discuss project-related information.
  • Worked on data ingestion from multiple sources into the Azure SQL data warehouse.
  • Transformed and loaded data into Azure SQL Database.
  • Wrote Spark applications for Data validation, cleansing, transformations and custom aggregations.
  • Developed HIVE scripts to transfer data from and to HDFS.
  • Implemented Hadoop based data warehouses, integrated Hadoop with Enterprise Data Warehouse systems.
  • Performed reverse engineering using Erwin to redefine entities, attributes, and relationships in the existing database.
  • Developed and maintained data pipelines on the Azure analytics platform using Azure Databricks.
  • Created Airflow scheduling scripts in Python (see the sketch after this list).
  • Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
  • Worked with Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
  • Created Data Pipeline to migrate data from Azure Blob Storage to Snowflake.
  • Worked on Snowflake modeling and highly proficient in data warehousing techniques for data cleansing, Slowly Changing Dimension phenomenon, surrogate key assignment and change data capture.
  • Maintained a NoSQL database to handle unstructured data; cleaned the data by removing invalid records, unifying the format, and rearranging the structure, then loaded it for the following steps.
  • Participated in NoSQL database maintenance with Azure SQL DB.
  • Involved in Kafka, building use cases relevant to our environment.
  • Identified data within different data stores, such as tables, files, folders, and documents to create a dataset in pipeline using Azure HDInsight.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Wrote Python scripts to parse XML documents and load the data in database.
  • Wrote DDL and DML statements for creating and altering tables and converting characters into numeric values.
  • Translated business concepts into XML vocabularies by designing XML Schemas with UML.
  • Worked on data loads using Azure Data Factory with an external table approach.
  • Automated recurring reports using SQL and Python and visualized them on BI platforms such as Power BI.
  • Designed and generated various dashboards, reports using various Power BI Visualizations.
  • Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools.
  • Developed purging scripts and routines to purge data on Azure SQL Server and Azure Blob storage.
  • Developed Python Scripts for automation purpose and Component unit testing using Azure Emulator.
  • Wrote and optimized T-SQL queries in SQL Server.
  • Maintained data storage in Azure Data Lake.
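
A minimal sketch (not taken from the project) of the kind of Airflow scheduling script in Python referenced above, staging incoming files from Blob Storage and loading them into Snowflake. The DAG id, schedule, and helper-script paths are hypothetical assumptions.

# Minimal Airflow DAG sketch; dag_id, schedule, and script paths are illustrative assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_blob_to_snowflake",      # hypothetical pipeline name
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",         # nightly at 02:00
    catchup=False,
) as dag:
    # Stage the incoming web-feed files from Azure Blob Storage (assumed helper script).
    stage_files = BashOperator(
        task_id="stage_blob_files",
        bash_command="python /opt/pipelines/stage_blob_files.py",
    )

    # Load the staged files into Snowflake (assumed helper script).
    load_snowflake = BashOperator(
        task_id="load_into_snowflake",
        bash_command="python /opt/pipelines/load_snowflake.py",
    )

    stage_files >> load_snowflake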

Confidential - Dublin, OH

Sr. Data Engineer

Responsibilities:

  • Heavily involved in the Data Engineer role, reviewing business requirements and composing source-to-target data mapping documents.
  • Extensively used the Agile method with daily scrums to discuss project-related information.
  • Provided a summary of the project's goals, the specific expectations business users had of BI, and how these aligned with the project goals.
  • Provided suggestions to implement multitasking for the existing Hive architecture in Hadoop and also suggested UI customization in Hadoop.
  • Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS.
  • Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Responsible for the setup of a 5-node development cluster for a Proof of Concept that was later implemented as a full-time project by Fortune Brands.
  • Architected and designed the data flow for the collapse of 4 legacy data warehouses into an AWS Data Lake.
  • Installed and Configured Hadoop cluster using Amazon Web Services (AWS) for POC purposes.
  • Connected to AWS Redshift through Tableau to extract live data for real time analysis.
  • Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
  • Used AWS S3 buckets to store the files, ingested the files into Snowflake tables using Snowpipe, and ran deltas using data pipelines.
  • Worked on complex SnowSQL and Python queries in Snowflake.
  • Configured the above jobs in Airflow.
  • Resolved AML-related issues to ensure adoption of standards and guidelines in the organization; handled day-to-day issues and worked with users and the testing team toward resolution of issues and fraud-incident-related tickets.
  • Used Erwin tool to develop a Conceptual Model based on business requirements analysis.
  • Used SQL Server Integrations Services (SSIS) for extraction, transformation, and loading data into target system from multiple sources
  • Loaded real-time data from various data sources into HDFS using Kafka.
  • Worked on normalization techniques. Normalized the data into 3rd Normal Form (3NF).
  • Involved in Data profiling in order to detect and correct inaccurate data and maintain the data quality.
  • Worked on configuring and managing disaster recovery and backup for Cassandra data.
  • Implemented Kafka producers to create custom partitions, configured brokers, and implemented high-level consumers for the data platform.
  • Designed and developed a complete change data capture (CDC) module in Python and deployed it in AWS Glue using the PySpark library (see the sketch after this list).
  • Involved in writing APIs for AWS Lambda to manage some of the AWS services.
  • Updated Python scripts to match training data with our database stored in AWS Cloud.
  • Normalized the database based on the new model developed to put it into the 3NF of the data warehouse.
  • Ran the entire big data workload on AWS environments.
  • Involved in extensive data validation by writing several complex SQL queries.
  • Performed data cleaning and data manipulation activities using the NZSQL utility.
  • Designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Worked with the MDM systems team on technical aspects and generating reports.
  • Extracted large volumes of data from AWS Redshift and the Elasticsearch engine using SQL queries to create reports.
  • Worked on data governance, data quality, and data lineage establishment processes.
  • Executed change management processes surrounding new releases of SAS functionality
  • Worked in importing and cleansing of data from various sources.
  • Performed data cleaning, feature scaling, and feature engineering using packages in Python.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Created various types of data visualizations using Python and Tableau.
  • Wrote and executed unit, system, integration, and UAT scripts in a data warehouse project.
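
A simplified PySpark sketch (not taken from the project) of the change data capture comparison referenced above: incoming records are hashed and compared against the current target to flag inserts and updates. The S3 paths, key column, and hashing scheme are illustrative assumptions; the AWS Glue job wrapper and the write-back step are omitted.

# PySpark CDC comparison sketch; paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc_sketch").getOrCreate()

incoming = spark.read.parquet("s3://example-bucket/staging/customers/")   # hypothetical path
current = spark.read.parquet("s3://example-bucket/curated/customers/")    # hypothetical path

key = "customer_id"                                                       # hypothetical business key
tracked_cols = [c for c in incoming.columns if c != key]                  # assumes both tables share a schema

# Hash the tracked columns so a changed row can be detected with a single comparison.
hashed_in = incoming.withColumn("row_hash", F.sha2(F.concat_ws("||", *tracked_cols), 256))
hashed_cur = (current.withColumn("row_hash", F.sha2(F.concat_ws("||", *tracked_cols), 256))
                     .select(key, F.col("row_hash").alias("cur_hash")))

joined = hashed_in.join(hashed_cur, on=key, how="left")

inserts = joined.filter(F.col("cur_hash").isNull())                       # keys not yet in the target
updates = joined.filter(F.col("cur_hash").isNotNull() &
                        (F.col("row_hash") != F.col("cur_hash")))         # keys whose tracked columns changed

print(f"inserts={inserts.count()} updates={updates.count()}")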

Confidential - San Antonio, TX

Sr. Data Modeler

Responsibilities:

  • Worked as a Sr. Data Modeler to generate data models using Erwin and developed relational database systems.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
  • Developed Big Data solutions focused on pattern matching and predictive modeling
  • Worked on analyzing Hadoop stack and different big data analytic tools.
  • Extensively used agile methodology as the Organization Standard to implement the data Models.
  • Worked in an environment of Amazon Web Services (AWS) provisioning and used AWS services.
  • Created semantically rich logical data models (non-relational/NoSQL) that define the Business data requirements.
  • Converted conceptual models into logical models with detailed descriptions of entities and dimensions for Enterprise Data Warehouse.
  • Involved in creating Pipelines and Datasets to load the data onto Data Warehouse.
  • Coordinated with Data Architects to design Big Data and Hadoop projects and to provide an idea-driven design approach.
  • Created database objects in AWS Redshift and followed AWS best practices to convert data types from Oracle to Redshift.
  • Worked on NoSQL databases such as Cassandra.
  • Identified data organized into logical groupings and domains, independent of any application or system.
  • Developed data models and data migration strategies utilizing sound concepts of data modeling including star schema, snowflake schema.
  • Created S3 buckets in the AWS environment to store files, sometimes which are required to serve static content for a web application.
  • Developed Model aggregation layers and specific star schemas as subject areas within a logical and physical model.
  • Identified fact and dimension tables and established the grain of fact for dimensional models.
  • Configured Inbound/Outbound in AWS Security groups according to the requirements.
  • Established measures to chart progress related to the completeness and quality of metadata for enterprise information.
  • Developed the data dictionary for various projects for the standard data definitions related data analytics.
  • Managed storage in AWS using Elastic Block Storage, S3, created Volumes and configured Snapshots.
  • Generated the DDL of the target data model and attached it to the Jira to be deployed in different Environments.
  • Conducted data modeling for JAD sessions and communicated data related standards.
  • Investigated data quality issues and provided recommendations and solutions to address them.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Validated the data of reports by writing SQL queries in PL/SQL Developer against ODS.
  • Prepared reports to summarize the daily data quality status and work activities.
  • Performed ad-hoc analyses as needed, with the ability to interpret and explain the analysis.

Environment: Erwin 9.7, NoSQL, Sqoop, Cassandra 3.11, AWS, Hadoop 3.0, SQL, Pl/SQL

Confidential - Rosemont, IL

Data Modeler

Responsibilities:

  • Responsible for the data modeling design delivery, data model development, review, approval and Data warehouse implementation.
  • Extensively used Agile methodology as the Organization Standard to implement the data Models.
  • Provided a consultative approach with business users, asking questions to understand the business need and deriving the data flow, logical, and physical data models based on those needs.
  • Extracted large volumes of data from Amazon Redshift and the Elasticsearch engine on AWS using SQL queries to create reports.
  • Worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
  • Created physical and logical data models from the conceptual model and converted them into the physical database with DDLs using forward engineering options in Erwin.
  • Used Erwin Model Mart for effective model management, sharing, dividing and reusing model information and designs for productivity improvement.
  • Used Model Manager Option in Erwin to synchronize the data models in the Model Mart approach.
  • Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
  • Worked on implementing and executing enterprise data governance and data quality framework.
  • Completed enhancement for MDM (Master data management) and suggested the implementation for hybrid MDM (Master Data Management)
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Created T-SQL queries as per the business requirements.
  • Stored and loaded data from HDFS to AWS S3 for backup and created tables in the AWS cluster with S3 storage.
  • Designed Metadata Repository to store data definitions for entities, attributes & mappings between data warehouse and source system data elements.
  • Developed data mapping, data governance, transformation, and cleansing rules for the Master Data Management architecture.
  • Worked on designing, implementing, and deploying an enterprise data warehouse into production.
  • Used SQL on the new AWS databases such as Redshift and Relational Database Service (RDS).
  • Designed and Developed Shell Scripts, Data Conversions and Data Cleansing.
  • Involved in capturing data lineage, table and column data definitions, valid values and others necessary information in the data model.
  • Used SQL for Querying the database in UNIX environment.

Confidential

Data Analyst

Responsibilities:

  • Understood business processes, data entities, data producers, and data dependencies.
  • Conducted meetings with the business and technical team to gather necessary analytical data requirements in JAD sessions.
  • Created the automated build and deployment process for application, re-engineering setup for better user experience, and leading up to building a continuous integration system.
  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Used and supported database applications and tools for extraction, transformation and analysis of raw data.
  • Assisted in defining business requirements for the IT team and created BRD and functional specifications documents along with mapping documents to assist the developers in their coding.
  • Involved in building various logics to handle Slowly Changing Dimensions, Change Data Capture, and Deletes for incremental loads into the Data warehouse.
  • Involved in designing fact, dimension and aggregate tables for Data warehouse Star Schema.
  • Performed Reverse Engineering of the legacy application using DDL scripts in Erwin, and developed Logical and Physical data models for Central Model consolidation
  • Monitored the data quality of the daily processes and ensured data integrity was maintained for effective functioning of the departments.
  • Developed data mapping documents for integration into a central model, depicting data flow across systems, and maintained all files in an electronic filing system.
  • Worked on extracting data from various sources such as DB2, CSV, XML, and flat files into DataStage.
  • Developed and programmed test scripts to identify and manage data inconsistencies and testing of ETL processes
  • Created data masking mappings to mask the sensitive data between production and test environment.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and DB2 (see the sketch after this list).
  • Wrote SQL scripts to test the mappings and developed a Traceability Matrix of business requirements mapped to test scripts to ensure that any change control in requirements leads to test case updates.
  • Involved in extensive data validation by writing several complex SQL queries, performed back-end testing, and worked on data quality issues.
  • Involved in updating metadata repository with details on the nature and use of applications/data transformations to facilitate impact analysis.
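
A minimal Python sketch (not taken from the project) of the kind of SQL-based column profiling referenced above: row count, null count, and distinct count per column. The table and column names are hypothetical, and any DB-API 2.0 driver (for example cx_Oracle for Oracle or ibm_db_dbi for DB2) would be plugged in the same way.

# Column-level profiling sketch; the connection, table, and columns are illustrative assumptions.
def profile_table(conn, table, columns):
    """Return row, null, and distinct counts for each listed column of `table`."""
    cur = conn.cursor()
    results = {}
    for col in columns:
        cur.execute(
            f"SELECT COUNT(*), "
            f"SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END), "
            f"COUNT(DISTINCT {col}) "
            f"FROM {table}"
        )
        total_rows, null_count, distinct_count = cur.fetchone()
        results[col] = {"rows": total_rows, "nulls": null_count, "distinct": distinct_count}
    cur.close()
    return results

# Example usage with a hypothetical Oracle connection and table:
# import cx_Oracle
# conn = cx_Oracle.connect("user/password@host:1521/service")
# print(profile_table(conn, "claims_ods.policy", ["policy_id", "status", "effective_dt"]))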
