Data Engineer Resume
Pittsburgh, Pennsylvania
SUMMARY
- Over 10 years of Information Technology experience in analysis, design, development, and implementation as a Data Engineer/Architect, Data Modeler, and Data Analyst.
- 5 years of Data Engineering and Data Architecture Experience.
- Good understanding and knowledge of Agile and Waterfall environments.
- Excellent knowledge of waterfall and spiral methodologies of Software Development Life Cycle (SDLC).
- Developed data pipelines using Sqoop and MapReduce to ingest workforce data into HDFS for analysis.
- Development-level experience in Microsoft Azure, providing data movement and scheduling functionality for cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
- Hands-on experience with AWS cloud services (Amazon Redshift and AWS Data Pipeline).
- Extensive Python scripting experience for Scheduling and Process Automation.
- Experience across the Hadoop ecosystem in ingestion, storage, querying, processing, and analysis of big data.
- Excellent knowledge of migrating servers, databases, and applications from on-premises environments to AWS and Google Cloud Platform.
- Extensive knowledge of Big Data, Hadoop, MapReduce, Hive and other emerging technologies.
- Experience in performing analytics on structured data using Hive queries and operations.
- Experience in implementing real-time streaming and analytics using technologies such as Spark Streaming and Kafka (see the sketch after this list).
- Strong experience in database design, writing complex SQL queries and stored procedures using PL/SQL.
- Experience with the Oozie scheduler in setting up workflows with MapReduce and Pig jobs.
- Knowledge and experience of the architecture and functionality of NoSQL databases like HBase and Cassandra.
- Experience in designing DB2 architecture for modeling a data warehouse using tools like Erwin, Power Designer, and ER/Studio.
- Extensive experience in various Teradata utilities like Fastload, Multiload and Teradata SQL Assistant.
- Knowledge of star schema and snowflake modeling, fact and dimension tables, and physical and logical modeling.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Experience in writing and executing unit, system, integration, and UAT scripts in data warehouse projects.
- Expert in generating on-demand and scheduled reports for business analysis and management decisions using Power BI.
- Experience with data transformations utilizing SnowSQL in Snowflake.
- Experience in developing ETL framework using Talend for extracting and processing data.
- Sound knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
- Experience with Tableau in the analysis and creation of dashboards and user stories.
- Strong experience using MS Excel and MS Access to load and analyze data based on business needs.
- Ability to learn and adapt quickly to emerging technologies.
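Below is a minimal, illustrative sketch of the kind of Spark Structured Streaming job referenced above (real-time analytics from Kafka); the broker address, topic name, schema, and output paths are assumptions for the example only, not details from any specific engagement.

```python
# Sketch only: requires PySpark plus the spark-sql-kafka-0-10 package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

# Hypothetical event schema; a real job would match the producer's contract.
schema = (StructType()
          .add("event_id", StringType())
          .add("event_type", StringType())
          .add("event_ts", TimestampType()))

# Read a stream from a hypothetical "events" topic on a local broker.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "events")
       .load())

# Kafka delivers bytes; cast the value to string and parse the JSON payload.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Continuously append parsed events to Parquet with checkpointing for fault tolerance.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/tmp/events_parquet")
         .option("checkpointLocation", "/tmp/events_checkpoint")
         .outputMode("append")
         .start())

query.awaitTermination()
```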
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Working as a Data Engineer, involved in the entire project life cycle, from requirements gathering through system integration.
- Used JIRA to track the stories worked on under the Agile methodology.
- Worked with business stakeholders to document Data Governance.
- Involved in designing & managing the data integration architecture based on ad-hoc, continuous and scheduled requests or operations.
- Migrated data using Azure Database Migration Service (DMS).
- Migrated SQL Server and Oracle database to Microsoft Azure Cloud.
- Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources.
- Gathered and documented MDM application, conversion and integration requirements.
- Created, maintained, and defined Data Architecture artifacts based on Enterprise Architecture requirements.
- Worked on Data Management interface requirements.
- Developed, implemented, and managed the Data Governance process.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Involved in Moving Data in and out of Windows Azure SQL Databases and Blob Storage.
- Created ETL pipelines using Spark and Hive to ingest data from multiple sources.
- Automated Cube refresh using the Azure Functions.
- Designed an Operational Data Store (ODS) per the OLTP archive policy, where data is reverse-shared between the ODS and OLTP systems.
- Worked on ER/Studio for Conceptual, logical, physical data modeling and for generation of DDL scripts.
- Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
- Designed and developed user defined functions, stored procedures, triggers for Cosmos DB.
- Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.
- Worked with Azure SQL Database Import and Export Service.
- Developed MapReduce Programs for data analysis and data cleaning.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Worked on Azure cloud technologies such as Azure SQL, Azure Data Lake (ADL), Azure Data Factory, Blob Storage, and HDInsight.
- Involved in writing Pig scripts to transform raw data into baseline data.
- Created tables, views, secure views, user defined functions in Snowflake Cloud Data Warehouse.
- Used Azure Data Factory extensively for ingesting data from disparate source systems.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Developed workflows in Oozie for business requirements to extract the data using Sqoop.
- Implemented row-level security policies in Azure Synapse Analytics (SQL DW) and within Power BI data models.
- Developed near-real-time data pipelines using Spark.
- Created and configured Azure Cosmos DB.
- Developed Python scripts to perform file validations in Databricks and automated the process using ADF (see the sketch after this list).
- Worked with Data Governance, Data Quality and Reporting.
- Involved in generating and documenting Metadata while designing OLTP and OLAP systems environment.
- Involved in MDM process including data modeling, ETL process, and prepared data mapping documents based on graph documents.
- Involved in Data Pipeline and ETL process and Testing.
- Imported data from RDBMS to Hadoop using Sqoop import.
- Implemented Disaster Recovery, backup migration and Azure deployments.
- Worked in the Data Factory editor to create linked services, tables, datasets, and pipelines.
- Created automation and deployment templates for relational and NoSQL databases.
- Attended weekly status meetings and represented the MDM Data platform.
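A minimal sketch of the kind of Databricks file-validation script mentioned above (and triggered from ADF); the mount path, expected columns, and checks are illustrative assumptions, not the actual project rules.

```python
# Sketch only: intended for a Databricks cluster, but a SparkSession is created
# explicitly so the script is self-contained.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

INPUT_PATH = "/mnt/raw/daily_extract.csv"                # hypothetical mount point
EXPECTED_COLUMNS = {"employee_id", "dept", "hire_date"}  # hypothetical file contract

spark = SparkSession.builder.appName("file-validation-sketch").getOrCreate()
df = spark.read.option("header", True).csv(INPUT_PATH)

# Check 1: the agreed columns are all present.
missing = EXPECTED_COLUMNS - set(df.columns)
if missing:
    raise ValueError(f"Missing columns: {missing}")

# Check 2: the file is not empty.
row_count = df.count()
if row_count == 0:
    raise ValueError("Input file contains no rows")

# Check 3: the key column has no nulls.
null_keys = df.filter(col("employee_id").isNull()).count()
if null_keys:
    raise ValueError(f"{null_keys} rows have a null employee_id")

print(f"Validation passed: {row_count} rows")
```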
Environment: ER/Studio, Hadoop 3.3, Agile, Azure Data Lake, ODS, Spark 3.2.0, Power BI, Snowflake Data Warehouse, Azure SQL, Python 3.1, Azure Cosmos DB, MDM, Azure Data Factory, Azure Synapse, HDFS, Hive 3.2.1, Sqoop 1.4.7, MapReduce, JIRA 8.19.1, Kafka 2.8.
Confidential - Pittsburgh, Pennsylvania
Data Architect/Data Engineer
Responsibilities:
- As a Data Architect/Engineer, designed and deployed scalable, highly available, and fault-tolerant systems on Azure.
- Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams.
- Worked with Azure Data Lake, Azure Data Factory, Databricks, Synapse (SQL Data Warehouse), Azure Blob storage, Azure Storage Explorer.
- Defined and streamlined Data Governance processes and procedures to achieve efficiency.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Gathered requirements for MDM tool implementation and user workflow process.
- Extensively used Talend for big data integration on Hadoop.
- Integrated NoSQL databases like HBase with MapReduce to move bulk data into HBase.
- Redesigned the views in Snowflake to increase performance.
- Created designs and process flows on how to standardize Power BI dashboards to meet the business requirements.
- Involved in the solution architecture and Design for data load and the migration of data to Hadoop.
- Played a vital role in Data Analysis of the Legacy data structures and ODS data structures.
- Created automated Python scripts to convert data from different sources and to generate the ETL pipelines (see the sketch after this list).
- Worked with Azure Databricks, Azure Data Factory, and PySpark.
- Analyzed massive and highly complex HIVE data sets, performing ad-hoc analysis and data manipulation.
- Generated JSON files from the JSON models created for Zip Code, Group and Claims using Snowflake DB.
- Implemented Copy activity, custom Azure Data Factory pipeline activities.
- Worked on partitioning, bucketing, join optimizations, and query optimizations in Hive.
- Used MapReduce programs for data cleaning and transformation and loaded the output into Hive tables in different file formats.
- Involved in designing dashboards and key measurement metrics of Data Governance and Data Quality programs.
- Architected and implemented ETL and data movement solutions using Azure Data Platform services (Azure Data Lake, Azure Data Factory, Databricks, Delta lake).
- Worked on snow-flaking the Dimensions to remove redundancy.
- Worked on Oozie workflow engine for job scheduling.
- Created Sqoop job with incremental load to populate Hive External tables.
- Developed MapReduce and Pig scripts to cleanse, transform the raw data into meaningful business information and uploaded it into Hive.
- Worked on loading data into SnowflakeDB in the cloud from various sources.
- Designed, configured and managed the backup and disaster recovery for HDFS data.
- Used Azure reporting services to upload and download reports.
- Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
- Created and maintained tables and views in Snowflake.
- Extensively used Erwin as the main tool for modeling along with Visio.
- Involved in dimensional modeling of the data warehouse to design the business process.
- Involved in testing the XML files and checked whether data is parsed and loaded to staging tables.
- Involved in writing T-SQL programming to implement Stored Procedures.
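A minimal sketch of the kind of automated Python conversion script mentioned above (raw extracts standardized into a curated zone); the directories, naming rules, and Parquet output are illustrative assumptions.

```python
# Sketch only: pandas plus pyarrow (for Parquet output) are assumed to be installed.
from pathlib import Path

import pandas as pd

SOURCE_DIR = Path("source_extracts")   # hypothetical landing folder of raw CSVs
TARGET_DIR = Path("curated")           # hypothetical curated zone
TARGET_DIR.mkdir(exist_ok=True)

def convert_file(csv_path: Path) -> Path:
    """Read one raw CSV extract, apply light standardization, and write Parquet."""
    df = pd.read_csv(csv_path)

    # Standardize column names to snake_case for downstream ETL steps.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # Drop exact duplicate records before they reach the warehouse load.
    df = df.drop_duplicates()

    out_path = TARGET_DIR / (csv_path.stem + ".parquet")
    df.to_parquet(out_path, index=False)
    return out_path

if __name__ == "__main__":
    for csv_file in sorted(SOURCE_DIR.glob("*.csv")):
        print("converted:", convert_file(csv_file))
```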
Environment: Erwin 9.8, Agile, Azure Data Lake, MDM, ODS, Azure Databricks, Azure Synapse, Hadoop 3.0, HBase 1.2, MS Visio, Python, SnowflakeDB, Oozie 4.3, Pig 0.17, Talend, HDFS, Hive, XML, MapReduce, Power BI, T-SQL.
Confidential - Columbus, OH
Data Modeler/Data Architect
Responsibilities:
- Created business requirement documents and integrated the requirements with the underlying platform functionality.
- Followed an Agile methodology for data warehouse development using Kanbanize.
- Created and maintained data model/architecture standards, including Master Data Management (MDM).
- Provided suggestions to implement multitasking for the existing Hive architecture in Hadoop.
- Worked on POC to check various cloud offerings including Google Cloud Platform (GCP).
- Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
- Involved in Data Architecture, data mapping and Data architecture artifacts design.
- Created dashboard for the data governance metrics, using Tableau.
- Developed Hive scripts to transfer data to and from HDFS.
- Created SQL Server Configurations, Performance Tuning of stored procedures, SSIS Packages.
- Created architecture stack blueprint for data access with NoSQL Database Cassandra.
- Used REST APIs with Python to ingest data from external sites into BigQuery.
- Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments.
- Implemented data ingestion and handling clusters in real time processing using Kafka.
- Developed and implemented data cleansing, data security, data profiling and data monitoring processes.
- Built, maintained, and tested infrastructure to aggregate critical business data into Google Cloud Platform (GCP) BigQuery and GCS storage for analysis.
- Built a program in Python and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
- Wrote a Python program to maintain raw-file archival in a GCS bucket.
- Used Google Cloud Functions with Python to load data into BigQuery on arrival of CSV files in a GCS bucket (see the sketch after this list).
- Participated in integration of MDM (Master Data Management) Hub and data warehouses.
- Implemented logical and physical relational database and maintained Database Objects in the data model using Erwin.
- Worked with Sqoop commands to import the data from different databases.
- Developed complex Hive queries to support business analysis.
- Created users and allocated tablespace quotas with privileges and roles for Oracle databases.
- Involved in building Data Models and Dimensional Modeling with 3NF, Star and Snowflake schemas for OLAP and Operational data store (ODS) applications.
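A minimal sketch of the kind of GCS-triggered Cloud Function mentioned above (CSV arrival loaded into BigQuery); the project, dataset, table, and event shape assume a first-generation Python Cloud Function and are illustrative only.

```python
# Sketch only: requires the google-cloud-bigquery client library.
from google.cloud import bigquery

TABLE_ID = "my-project.staging.arrivals"  # hypothetical destination table

def load_csv_to_bq(event, context):
    """Background Cloud Function triggered by a GCS object-finalize event."""
    bucket = event["bucket"]
    name = event["name"]

    if not name.lower().endswith(".csv"):
        return  # ignore non-CSV objects

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    uri = f"gs://{bucket}/{name}"
    load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    load_job.result()  # block until the load job completes
    print(f"Loaded {uri} into {TABLE_ID}")
```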
Environment: GCP, BigQuery, Hadoop, Agile, Cassandra 3.0, SSIS, Kafka 1.1, REST API, Python, Dataproc, GCS, Erwin, Sqoop, Oracle, ODS, MDM, OLAP, Tableau.
Confidential - Minneapolis, MN
Data Modeler
Responsibilities:
- Worked with Data Modeling for requirements gathering, business analysis and project coordination.
- Initiated and conducted JAD sessions, inviting various teams to finalize the required data fields and their formats.
- Worked on reverse engineering the existing data models and updating them.
- Designed the data aggregations on Hive for ETL processing on Amazon EMR to process data as per business requirement
- Created the DDL scripts using ER/Studio and source to target mappings to bring the data from Source to the warehouse.
- Involved in designing and deploying AWS Solutions using EC2, S3, RDS and Redshift.
- Worked on Teradata utilities like FastLoad, MultiLoad, and TPump to load data to staging and the DWH.
- Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Used forward engineering approach for designing and creating databases for OLAP model.
- Performed data profiling and data quality analysis to support data quality controls.
- Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
- Implemented the slowly changing dimension scheme (Type I and Type II) for most of the dimensions (see the sketch after this list).
- Optimized and tuned ETL processes & SQL Queries for better performance.
- Created storage with Amazon S3 for storing data.
- Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
- Involved in workflows and monitored jobs using Informatica tools.
- Used AWS S3 to store large amounts of data in a common repository.
- Generated drill down and drill through reports using SSRS.
- Extracted data from Teradata database and loaded into Data warehouse.
- Created database objects like tables, views, stored procedures, and packages using Oracle utilities like PL/SQL and SQL*Plus.
- Performed Impact Analysis and Gap Analysis.
- Developed statistics and visual analysis for warranty data using MS Excel and Tableau.
- Performed UAT testing before Production Phase of the database components being built.
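A minimal sketch of the Type II slowly changing dimension logic mentioned above, shown with pandas on tiny in-memory frames; the column names, change-detection rule, and dates are illustrative assumptions (the project itself used ETL mappings, not pandas).

```python
# Sketch only: pandas frames stand in for the dimension table and the incoming snapshot.
import pandas as pd

# Current dimension: one open row per customer (end_date stays NaT while current).
dim = pd.DataFrame({
    "customer_id": [1, 2],
    "city": ["Pittsburgh", "Columbus"],
    "start_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "end_date": pd.to_datetime([pd.NaT, pd.NaT]),
    "is_current": [True, True],
})

# Incoming snapshot: customer 2 has a changed attribute.
incoming = pd.DataFrame({"customer_id": [1, 2], "city": ["Pittsburgh", "Cleveland"]})
load_date = pd.Timestamp("2024-06-01")

# Detect attribute changes against the currently open rows.
current = dim[dim["is_current"]].merge(incoming, on="customer_id", suffixes=("_old", "_new"))
changed_ids = current.loc[current["city_old"] != current["city_new"], "customer_id"]

# Type II: expire the old versions of the changed rows...
mask = dim["customer_id"].isin(changed_ids) & dim["is_current"]
dim.loc[mask, "end_date"] = load_date
dim.loc[mask, "is_current"] = False

# ...and append new current versions carrying the updated attribute.
new_rows = incoming[incoming["customer_id"].isin(changed_ids)].assign(
    start_date=load_date, end_date=pd.NaT, is_current=True
)
dim = pd.concat([dim, new_rows], ignore_index=True)
print(dim)
```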
Environment: ER/Studio, Teradata, AWS (EC2, S3, Redshift), SQL, PL/SQL, OLAP, OLTP, Informatica, SSRS, MS Excel, Tableau.
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Performed Data Analysis, Data Modeling, Data Profiling and Requirement Analysis.
- Performed requirements gathering with business groups and transformed the results into Business Requirement Documents (BRDs).
- Extensively performed data validation by writing complex SQL queries, and was involved in back-end testing and working through data quality issues (see the sketch after this list).
- Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable.
- Introduced a Data Dictionary for the process, which simplified a lot of the work around the project.
- Designed class and activity diagrams using Power Designer.
- Worked with developers on the database design and schema optimization and other DB2 aspects.
- Implemented Star Schema methodologies in modeling and designing the logical data model into Dimensional Models.
- Involved in SSIS Package Configuration, Development, Deployment and Support of SSIS.
- Worked on database design for OLTP and OLAP systems.
- Wrote complex SQL, PL/SQL procedures, functions, and packages to validate data and support the testing process.
- Used SQL Profiler for monitoring and troubleshooting performance issues in T-SQL code and stored procedures.
- Performed verification, validation, and transformations on the input data (text files, XML files) before loading into the target database.
- Used advanced Microsoft Excel to create pivot tables.
- Synthesized and translated business data needs into creative visualizations in Tableau.
- Built ad hoc reports using stand-alone tables.
- Created UNIX shell scripts to be used in conjunction with files.
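A minimal sketch of the kind of back-end data validation mentioned above, comparing source and target row counts over a generic ODBC connection; the DSNs, table pairs, and the use of pyodbc are illustrative assumptions, not the project's actual tooling.

```python
# Sketch only: any DB-API driver would work; pyodbc is assumed here for illustration.
import pyodbc

SOURCE_DSN = "DSN=source_db"   # hypothetical ODBC data source names
TARGET_DSN = "DSN=target_db"
CHECKS = [("staging.customers", "dw.dim_customer")]  # hypothetical source/target pairs

def row_count(conn, table):
    """Return the row count of a table (table names come from the trusted config above)."""
    cursor = conn.cursor()
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]

def main():
    src = pyodbc.connect(SOURCE_DSN)
    tgt = pyodbc.connect(TARGET_DSN)
    try:
        for source_table, target_table in CHECKS:
            src_count = row_count(src, source_table)
            tgt_count = row_count(tgt, target_table)
            status = "OK" if src_count == tgt_count else "MISMATCH"
            print(f"{status}: {source_table}={src_count} vs {target_table}={tgt_count}")
    finally:
        src.close()
        tgt.close()

if __name__ == "__main__":
    main()
```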
Environment: Power Designer, DB2, SSIS, OLAP, OLTP, SQL, PL/SQL, Tableau, T-SQL, MS Excel, UNIX.