Sr. Data Engineer Resume
Menlo Park, CA
SUMMARY
- 8+ years of IT experience in Data Engineering and Data Analysis, with high proficiency in developing Data Warehouse and Business Intelligence solutions.
- Extensive experience in Relational and Dimensional Data modeling for creating Logical and Physical Design of Database and ER Diagrams using multiple data modeling tools like Erwin and ER Studio.
- Experienced in performing structural modifications using MapReduce, analyzing data using Hive, and visualizing it in dashboards using Tableau.
- Experienced in Data Management solutions covering DWH/Data Architecture design, Data Governance implementation, and Big Data.
- Experienced working with Big Data technologies such as Hadoop, Spark, PySpark, Hive, and HDFS, as well as NoSQL platforms.
- Experienced in all phases of the software development life cycle (SDLC), from requirements definition through implementation, and supported models through the transformation and analysis phases.
- Experienced in Data Modeling using Dimensional Data Modeling, Star Schema modeling, Fact and Dimensions tables including Physical, Logical data modeling
- Experienced with Data Conversion, Data Quality, Data Profiling, Performance Tuning, System Testing, and implementing RDBMS features.
- Experienced with Business Process Modeling, Process Flow Modeling & Data Flow Modeling.
- Expertise in implementing security models for dashboards at the row, object, role, and dashboard levels.
- Excellent experience in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for effective and optimum performance in OLTP and OLAP environments.
- Excellent experience in the Extract, Transform and Load (ETL) process using tools like DataStage, Informatica, Data Integrator, and SSIS for Data Migration and Data Warehousing projects.
- Experienced in the development of Data Warehouse, Business Intelligence architecture that involves data integration and the conversion of data from multiple sources and platforms.
- Expertise in Data Analysis using SQL on Oracle, MS SQL Server, DB2, Teradata, and Netezza.
- Good knowledge of Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
- Strong experience working with conceptual, logical, and physical data modeling in line with metadata standards.
- Experience working with Agile and Waterfall data modeling methodologies.
- Experience with both the Ralph Kimball and Bill Inmon data warehousing approaches.
- Expertise in UML (class diagrams, object diagrams, use case diagrams, state diagrams, sequence diagrams, activity diagrams, and collaboration diagrams) as a business analysis methodology for application functionality designs using Rational Rose and MS-Visio.
- Involved in various projects related to Data Modeling, System/Data Analysis, Design and Development for both OLTP and Data warehousing environments.
- Facilitated data requirement meetings with business and technical stakeholders and resolved conflicts to drive decisions.
- Experience in data transformation and data mapping from source to target database schemas, as well as data cleansing.
- Experience in performance analysis; created partitions, indexes, and aggregate tables where necessary.
- Experience with DBA tasks involving database creation, performance tuning, creation of indexes, creating and modifying table spaces for optimization purposes.
- Performed extensive Data profiling and analysis for detecting and correcting inaccurate data from the databases and to track data quality.
- Very good exposure to ETL tools such as Informatica.
- Excellent communication skills, self-starter with ability to work with minimal guidance.
TECHNICAL SKILLS
Big Data technologies: MapReduce, HBase, HDFS, Sqoop, Spark, Hadoop, Hive, PIG, Impala.
Data Modeling Tools: ER/Studio 9.7/9.0, Erwin 9.8/9.5, Sybase PowerDesigner.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack
Programming Languages: SQL, PL/SQL, UNIX Shell Scripting, Perl, AWK, SED
Databases: Oracle 12c, Teradata R15, MS SQL Server 2019, DB2.
Testing and defect tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio & Visual SourceSafe
Operating System: Windows 10, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 10.0/9.6, SAP Business Objects XIR3.1/XIR2, Talend
BI Tools: Tableau, Power BI
Project Development Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
PROFESSIONAL EXPERIENCE
Confidential - Menlo Park, CA
Sr. Data Engineer
Responsibilities:
- As a Sr. Data Engineer, designed and deployed scalable, highly available, and fault-tolerant systems on Azure.
- Involved in the complete SDLC life cycle of a big data project, including requirement analysis, design, coding, testing, and production.
- Led estimation, reviewed estimates, identified complexities, and communicated them to all stakeholders.
- Defined the business objectives comprehensively through discussions with business stakeholders, functional analysts and participating in requirement collection sessions.
- Migrated the on-premises environment to the cloud using MS Azure.
- Performed data Ingestion for the incoming web feeds into the Data lake store which includes both structured and unstructured data.
- Designed the business requirement collection approach based on the project scope and SDLC (Agile) methodology.
- Migrated data warehouses to Snowflake Data warehouse.
- Installed and configured Hive, wrote Hive UDFs, and managed cluster coordination services through ZooKeeper.
- Installed and configured Hadoop Ecosystem components.
- Defined virtual warehouse sizing in Snowflake for different types of workloads.
- Extensively used Agile methodology, with daily scrums to discuss project-related information.
- Worked on data ingestion from multiple sources into the Azure SQL data warehouse.
- Transformed and loaded data into Azure SQL Database.
- Wrote Spark applications for Data validation, cleansing, transformations and custom aggregations.
- Developed HIVE scripts to transfer data from and to HDFS.
- Implemented Hadoop based data warehouses, integrated Hadoop with Enterprise Data Warehouse systems.
- Performed reverse engineering using Erwin to redefine entities, attributes, and relationships in the existing database.
- Developed and maintained data pipelines on the Azure analytics platform using Azure Databricks.
- Created Airflow Scheduling scripts in Python.
- Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
- Worked with the Hadoop infrastructure to store data in HDFS and used Hive SQL to migrate the underlying SQL codebase to Azure.
- Created Data Pipeline to migrate data from Azure Blob Storage to Snowflake.
- Worked on Snowflake modeling and highly proficient in data warehousing techniques for data cleansing, Slowly Changing Dimension phenomenon, surrogate key assignment and change data capture.
- Maintained a NoSQL database to handle unstructured data; cleaned the data by removing invalid records, unifying formats, and rearranging the structure, then loaded it for subsequent steps.
- Participated in NoSQL database maintenance with Azure SQL DB.
- Worked with Kafka, building use cases relevant to our environment.
- Identified data within different data stores, such as tables, files, folders, and documents to create a dataset in pipeline using Azure HDInsight.
- Optimized and updated UML models (Visio) and relational data models for various applications.
- Wrote Python scripts to parse XML documents and load the data in database.
- Wrote DDL and DML statements for creating and altering tables and converting characters into numeric values.
- Translated business concepts into XML vocabularies by designing XML Schemas with UML.
- Worked on Data load using Azure Data factory using external table approach.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Power BI.
- Designed and generated various dashboards, reports using various Power BI Visualizations.
- Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom visualization tools.
- Developed purging scripts and routines to purge data on Azure SQL Server and Azure Blob storage.
- Developed Python Scripts for automation purpose and Component unit testing using Azure Emulator.
- Wrote and optimized T-SQL queries in SQL Server.
- Maintained data storage in Azure Data Lake.
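As a minimal sketch of the Python XML-parsing and load scripts described above: the element names, columns, and the in-memory sqlite3 target here are hypothetical stand-ins for the actual feeds and database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; element and attribute names are illustrative only.
SAMPLE_XML = """
<orders>
  <order id="1001"><customer>Acme</customer><amount>250.00</amount></order>
  <order id="1002"><customer>Globex</customer><amount>99.50</amount></order>
</orders>
"""

def parse_orders(xml_text):
    """Parse the XML document into row tuples, converting character data
    to numeric values where the target columns are numeric."""
    root = ET.fromstring(xml_text)
    return [
        (int(o.get("id")), o.findtext("customer"), float(o.findtext("amount")))
        for o in root.findall("order")
    ]

def load_orders(conn, rows):
    """DDL + DML: create the target table and bulk-insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
```

The same parse-then-bulk-insert pattern extends to any relational target by swapping the connection and the DDL.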
Confidential - Dublin, OH
Sr. Data Engineer
Responsibilities:
- Heavily involved in a Data Engineer role, reviewing business requirements and composing source-to-target data mapping documents.
- Extensively used Agile methodology, with daily scrums to discuss project-related information.
- Provided a summary of the Project's goals, and the specific expectation of business users from BI and how it aligns with the project goals.
- Provided a suggestion to implement multitasking for the existing Hive architecture in Hadoop, and also suggested UI customization in Hadoop.
- Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in HDFS.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Responsible for a setup of 5 node development cluster for a Proof of Concept which was later implemented as a fulltime project by Fortune Brands.
- Architected and designed the data flow for consolidating 4 legacy data warehouses into an AWS Data Lake.
- Installed and Configured Hadoop cluster using Amazon Web Services (AWS) for POC purposes.
- Connected to AWS Redshift through Tableau to extract live data for real time analysis.
- Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
- Used AWS S3 buckets to store files, ingested the files into Snowflake tables using Snowpipe, and ran deltas using data pipelines.
- Worked on complex SnowSQL and Python queries in Snowflake.
- Configured the above jobs in Airflow.
- Resolve AML related issues to ensure adoption of standards, guidelines in the organization. Resolution of day-to-day issues and worked with the users and testing team towards resolution of issues and fraud incident related tickets.
- Used Erwin tool to develop a Conceptual Model based on business requirements analysis.
- Used SQL Server Integration Services (SSIS) for extracting, transforming, and loading data into the target system from multiple sources.
- Loaded real-time data from various data sources into HDFS using Kafka.
- Worked on normalization techniques. Normalized the data into 3rd Normal Form (3NF).
- Involved in Data profiling in order to detect and correct inaccurate data and maintain the data quality.
- Worked on configuring and managing disaster recovery and backup on Cassandra Data.
- Implemented Kafka producers with custom partitioners, configured brokers, and implemented high-level consumers for the data platform.
- Designed and developed an entire CDC (change data capture) module in Python and deployed it on AWS Glue using the PySpark library.
- Involved in writing APIs for AWS Lambda to manage some of the AWS services.
- Updated Python scripts to match training data with our database stored in AWS Cloud.
- Normalized the database based on the newly developed model to put it into 3NF for the data warehouse.
- Ran the entire big data workload on AWS environments.
- Involved in extensive data validation by writing several complex SQL queries.
- Performed data cleaning and data manipulation activities using the NZSQL utility.
- Designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Worked with the MDM systems team on technical aspects and report generation.
- Extracted large data sets from AWS Redshift and the Elasticsearch engine using SQL queries to create reports.
- Worked on data governance, data quality, and data lineage establishment processes.
- Executed change management processes surrounding new releases of SAS functionality
- Worked in importing and cleansing of data from various sources.
- Performed data cleaning, feature scaling, and feature engineering using packages in Python.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Created various types of data visualizations using Python and Tableau.
- Wrote and executed unit, system, integration, and UAT scripts in a data warehouse project.
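The CDC module mentioned above ran on AWS Glue with PySpark; its core diff logic can be sketched in plain Python. The record shapes and the `id` key used here are illustrative assumptions, not the production schema.

```python
def capture_changes(previous, current, key="id"):
    """Simplified change data capture: diff two snapshots keyed on a
    primary key and classify rows as inserts, updates, or deletes."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    inserts = [curr[k] for k in curr.keys() - prev.keys()]
    deletes = [prev[k] for k in prev.keys() - curr.keys()]
    updates = [curr[k] for k in curr.keys() & prev.keys() if curr[k] != prev[k]]
    return {"insert": inserts, "update": updates, "delete": deletes}

# Example: one changed row, one new row, one dropped row.
prev_snapshot = [{"id": 1, "city": "Dublin"}, {"id": 2, "city": "Columbus"}]
curr_extract = [{"id": 1, "city": "Dayton"}, {"id": 3, "city": "Toledo"}]
changes = capture_changes(prev_snapshot, curr_extract)
```

In the PySpark version the same classification is expressed with joins on the key column; the snapshot-diff idea is identical.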
Confidential - San Antonio, TX
Sr. Data Modeler
Responsibilities:
- Worked as a Sr. Data Modeler to generate data models using Erwin and developed a relational database system.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Worked on analyzing Hadoop stack and different big data analytic tools.
- Extensively used agile methodology as the Organization Standard to implement the data Models.
- Worked in an environment of Amazon Web Services (AWS) provisioning and used AWS services.
- Created semantically rich logical data models (non-relational/NoSQL) that define the Business data requirements.
- Converted conceptual models into logical models with detailed descriptions of entities and dimensions for Enterprise Data Warehouse.
- Involved in creating Pipelines and Datasets to load the data onto Data Warehouse.
- Coordinated with Data Architects to design Big Data and Hadoop projects and to provide idea-driven design.
- Created database objects in AWS Redshift. Followed AWS best practices to convert data types from oracle to Redshift.
- Worked on NoSQL databases such as Cassandra.
- Identified data organized into logical groupings and domains, independent of any application or system.
- Developed data models and data migration strategies utilizing sound concepts of data modeling including star schema, snowflake schema.
- Created S3 buckets in the AWS environment to store files, sometimes which are required to serve static content for a web application.
- Developed Model aggregation layers and specific star schemas as subject areas within a logical and physical model.
- Identified fact & dimension tables and established the grain of fact for dimensional models.
- Configured Inbound/Outbound in AWS Security groups according to the requirements.
- Established measures to chart progress related to the completeness and quality of metadata for enterprise information.
- Developed the data dictionary for various projects for the standard data definitions related data analytics.
- Managed storage in AWS using Elastic Block Storage, S3, created Volumes and configured Snapshots.
- Generated the DDL of the target data model and attached it to the Jira to be deployed in different Environments.
- Conducted data modeling for JAD sessions and communicated data related standards.
- Investigate Data Quality Issues and provide recommendation & solution to address them.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Validated the data of reports by writing SQL queries in PL/SQL Developer against ODS.
- Prepared reports to summarize the daily data quality status and work activities.
- Performed ad-hoc analyses as needed and communicated the results clearly.
Environment: Erwin 9.7, NoSQL, Sqoop, Cassandra 3.11, AWS, Hadoop 3.0, SQL, Pl/SQL
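As a sketch only: the fact-and-dimension design work above can be illustrated with surrogate-keyed dimension tables and a fact table at a declared grain. The sqlite3 target and all table and column names here are hypothetical, chosen purely to make the example self-contained.

```python
import sqlite3

# Illustrative star schema: the fact table's grain is one row per sale
# per day per product; each dimension carries its own surrogate key.
DDL = """
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,   -- surrogate key
    full_date  TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY, -- surrogate key
    product_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER,
    revenue     REAL
);
"""

def build_star_schema(conn):
    """Forward-engineer the model into physical DDL and list the tables."""
    conn.executescript(DDL)
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

Keeping surrogate keys on the dimensions decouples the warehouse from source-system natural keys, which is what makes slowly changing dimension handling possible later.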
Confidential - Rosemont, IL
Data Modeler
Responsibilities:
- Responsible for the data modeling design delivery, data model development, review, approval and Data warehouse implementation.
- Extensively used Agile methodology as the Organization Standard to implement the data Models.
- Provided a consultative approach with business users, asking questions to understand the business need and deriving the data flow, logical, and physical data models based on those needs.
- Extracted large data sets from Amazon Redshift, AWS, and the Elasticsearch engine using SQL queries to create reports.
- Worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
- Created physical & logical data models from the conceptual model and converted them into the physical database with DDLs using forward engineering options in Erwin.
- Used Erwin Model Mart for effective model management: sharing, dividing, and reusing model information and designs for productivity improvement.
- Used the Model Manager option in Erwin to synchronize the data models in the Model Mart approach.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Involved in Dimensional modeling (Star Schema) of the Data warehouse and used Erwin to design the business process, dimensions and measured facts.
- Worked on implementing and executing enterprise data governance and data quality framework.
- Completed enhancements for MDM (Master Data Management) and suggested the implementation of a hybrid MDM.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Created T-SQL queries as per the business requirements.
- Stored and loaded data from HDFS to AWS S3, handled backups, and created tables in the AWS cluster with S3 storage.
- Designed Metadata Repository to store data definitions for entities, attributes & mappings between data warehouse and source system data elements.
- Developed data mapping, data governance, transformation, and cleansing rules for the Master Data Management architecture.
- Worked on designing, implementing, and deploying an enterprise data warehouse into production.
- Used SQL on the new AWS databases like Redshift and Relational Database Service (RDS).
- Designed and Developed Shell Scripts, Data Conversions and Data Cleansing.
- Involved in capturing data lineage, table and column data definitions, valid values and others necessary information in the data model.
- Used SQL for Querying the database in UNIX environment.
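A metadata repository of the kind described above, holding data definitions and source-to-target mappings, can be sketched minimally in Python. The entity, attribute, and mapping names below are invented for illustration.

```python
class MetadataRepository:
    """Minimal metadata repository: data definitions for entities and
    attributes, plus source-to-target mappings for lineage lookups."""

    def __init__(self):
        self.definitions = {}   # (entity, attribute) -> definition text
        self.mappings = {}      # target element -> source element

    def define(self, entity, attribute, definition):
        """Record the business definition of an entity attribute."""
        self.definitions[(entity, attribute)] = definition

    def map(self, target, source):
        """Record that a warehouse element is sourced from a system element."""
        self.mappings[target] = source

    def lineage(self, target):
        """Trace a warehouse element back to its source system element,
        or None if no mapping has been captured."""
        return self.mappings.get(target)

# Illustrative usage with invented element names.
repo = MetadataRepository()
repo.define("customer", "cust_id", "Unique customer identifier")
repo.map("dw.customer.cust_id", "crm.clients.client_no")
```

In practice such a repository lives in database tables or a modeling tool's model mart; the point of the sketch is that mappings captured once support both lineage queries and impact analysis.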
Confidential
Data Analyst
Responsibilities:
- Understood business processes, data entities, data producers, and data dependencies.
- Conducted meetings with the business and technical team to gather necessary analytical data requirements in JAD sessions.
- Created the automated build and deployment process for application, re-engineering setup for better user experience, and leading up to building a continuous integration system.
- Involved in the Complete Software development life cycle (SDLC) to develop the application.
- Used and supported database applications and tools for extraction, transformation and analysis of raw data.
- Assisted in defining business requirements for the IT team and created BRD and functional specifications documents along with mapping documents to assist the developers in their coding.
- Involved in building various logics to handle Slowly Changing Dimensions, Change Data Capture, and deletes for incremental loads into the Data warehouse.
- Involved in designing fact, dimension and aggregate tables for Data warehouse Star Schema.
- Performed reverse engineering of the legacy application using DDL scripts in Erwin, and developed logical and physical data models for Central Model consolidation.
- Monitored the data quality of the daily processes and ensured data integrity was maintained for the effective functioning of the departments.
- Developed data mapping documents for integration into a central model, depicting data flow across systems, and maintained all files in an electronic filing system.
- Extracted data from various sources such as DB2, CSV, XML, and flat files into DataStage.
- Developed and programmed test scripts to identify and manage data inconsistencies and to test ETL processes.
- Created data masking mappings to mask the sensitive data between production and test environment.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Performed data analysis and data profiling using complex SQL on various sources systems including Oracle and DB2.
- Wrote SQL scripts to test the mappings and developed a traceability matrix of business requirements mapped to test scripts, to ensure any change control in requirements leads to test case updates.
- Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked on data quality issues.
- Involved in updating metadata repository with details on the nature and use of applications/data transformations to facilitate impact analysis.
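One way the data masking mappings described above can be approximated is with deterministic hashing, which keeps masked test data joinable (the same input always yields the same token) without exposing production values. The column names here are illustrative assumptions.

```python
import hashlib

# Illustrative set of columns treated as sensitive; the real list would
# come from the data classification in the masking mapping.
SENSITIVE = {"ssn", "email"}

def mask_record(record, sensitive=SENSITIVE):
    """Deterministically mask sensitive fields: identical inputs produce
    identical tokens, so joins across masked tables still line up."""
    masked = {}
    for col, val in record.items():
        if col in sensitive and val is not None:
            digest = hashlib.sha256(str(val).encode()).hexdigest()[:10]
            masked[col] = f"MASK_{digest}"
        else:
            masked[col] = val
    return masked
```

A truncated hash is fine for test environments; where reversibility or format preservation is required, a dedicated masking tool or format-preserving encryption would replace this sketch.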