Data Engineer Resume
Pittsburgh, Pennsylvania
SUMMARY
- Over 10 years of Information Technology experience in analysis, design, development, and implementation as a Data Engineer/Architect, Data Modeler, and Data Analyst.
- 5 years of Data Engineering and Data Architecture Experience.
- Good understanding and knowledge of Agile and Waterfall environments.
- Excellent knowledge of waterfall and spiral methodologies of Software Development Life Cycle (SDLC).
- Developed data pipelines using Sqoop and MapReduce to ingest workforce data into HDFS for analysis.
- Development-level experience in Microsoft Azure, providing data movement and scheduling functionality for cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
- Hands-on experience with AWS cloud services (Amazon Redshift and AWS Data Pipeline).
- Extensive Python scripting experience for Scheduling and Process Automation.
- Experience across the Hadoop ecosystem in ingestion, storage, querying, processing, and analysis of big data.
- Excellent knowledge of migrating servers, databases, and applications from on-premises environments to AWS and Google Cloud Platform.
- Extensive knowledge of Big Data, Hadoop, MapReduce, Hive and other emerging technologies.
- Experience in performing analytics on structured data using Hive queries and operations.
- Experience in implementing real-time streaming and analytics using technologies such as Spark Streaming and Kafka (see the sketch after this list).
- Strong experience in database design, writing complex SQL queries and stored procedures using PL/SQL.
- Experience with the Oozie scheduler in setting up workflows with MapReduce and Pig jobs.
- Knowledge and experience of the architecture and functionality of NoSQL databases like HBase and Cassandra.
- Experience in designing DB2 architecture for modeling a data warehouse using tools like Erwin, Power Designer, and ER/Studio.
- Extensive experience in various Teradata utilities like Fastload, Multiload and Teradata SQL Assistant.
- Knowledge of star schema and snowflake modeling, fact and dimension tables, and physical and logical modeling.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Experience in writing and executing unit, system, integration, and UAT scripts in data warehouse projects.
- Expert in generating on-demand and scheduled reports for business analysis and management decisions using Power BI.
- Experience with data transformations utilizing SnowSQL in Snowflake.
- Experience in developing ETL framework using Talend for extracting and processing data.
- Sound knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
- Experience with Tableau in the analysis and creation of dashboards and user stories.
- Strong experience using MS Excel and MS Access to load and analyze data based on business needs.
- Ability to learn and adapt quickly to emerging technologies.
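Below is a minimal, illustrative sketch of the kind of Spark Structured Streaming job referenced above (real-time analytics from Kafka); the broker address, topic name, schema, and output paths are assumptions for the example only, not details from any specific engagement.

```python
# Sketch only: requires PySpark plus the spark-sql-kafka-0-10 package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

# Hypothetical event schema; a real job would match the producer's contract.
schema = (StructType()
          .add("event_id", StringType())
          .add("event_type", StringType())
          .add("event_ts", TimestampType()))

# Read a stream from a hypothetical "events" topic on a local broker.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "events")
       .load())

# Kafka delivers bytes; cast the value to string and parse the JSON payload.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Continuously append parsed events to Parquet with checkpointing for fault tolerance.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/tmp/events_parquet")
         .option("checkpointLocation", "/tmp/events_checkpoint")
         .outputMode("append")
         .start())

query.awaitTermination()
```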
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Working as a Data Engineer, involved in the entire project life cycle, from requirements gathering through system integration.
- Used JIRA to track the stories worked on under the Agile methodology.
- Worked with business stakeholders to document Data Governance.
- Involved in designing & managing the data integration architecture based on ad-hoc, continuous and scheduled requests or operations.
- Migrated data using Azure Database Migration Service (DMS).
- Migrated SQL Server and Oracle database to Microsoft Azure Cloud.
- Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources.
- Gathered and documented MDM application, conversion and integration requirements.
- Created, maintained, and defined Data Architecture artifacts based on Enterprise Architecture requirements.
- Worked on Data Management interface requirements.
- Developed, implemented, and managed the Data Governance process.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Involved in Moving Data in and out of Windows Azure SQL Databases and Blob Storage.
- Created ETL pipelines using Spark and Hive to ingest data from multiple sources.
- Automated Cube refresh using the Azure Functions.
- Designed an Operational Data Store (ODS) per the OLTP archive policy, where data is reverse-shared between the ODS and OLTP systems.
- Worked on ER/Studio for Conceptual, logical, physical data modeling and for generation of DDL scripts.
- Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
- Designed and developed user defined functions, stored procedures, triggers for Cosmos DB.
- Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.
- Worked with Azure SQL Database Import and Export Service.
- Developed MapReduce Programs for data analysis and data cleaning.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Worked on Azure cloud technologies such as Azure SQL, Azure Data Lake (ADL), Azure Data Factory, Blob Storage, and HDInsight.
- Involved in writing Pig scripts to transform raw data into baseline data.
- Created tables, views, secure views, user defined functions in Snowflake Cloud Data Warehouse.
- Used Azure Data Factory extensively for ingesting data from disparate source systems.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Developed workflows in Oozie for business requirements to extract the data using Sqoop.
- Implemented row-level security policies in Azure Synapse Analytics (SQL DW) and within Power BI data models.
- Developed near-real-time data pipelines using Spark.
- Created and configured Azure Cosmos DB.
- Developed Python scripts to perform file validations in Databricks and automated the process using ADF (see the sketch after this list).
- Worked with Data Governance, Data Quality and Reporting.
- Involved in generating and documenting Metadata while designing OLTP and OLAP systems environment.
- Involved in MDM process including data modeling, ETL process, and prepared data mapping documents based on graph documents.
- Involved in Data Pipeline and ETL process and Testing.
- Imported data from RDBMS to Hadoop using Sqoop import.
- Implemented Disaster Recovery, backup migration and Azure deployments.
- Worked in the Data Factory editor to create linked services, tables, datasets, and pipelines.
- Created automation and deployment templates for relational and NoSQL databases.
- Attended weekly status meetings and represented the MDM Data platform.
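A minimal sketch of the kind of Databricks file-validation script mentioned above (and triggered from ADF); the mount path, expected columns, and checks are illustrative assumptions, not the actual project rules.

```python
# Sketch only: intended for a Databricks cluster, but a SparkSession is created
# explicitly so the script is self-contained.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

INPUT_PATH = "/mnt/raw/daily_extract.csv"                # hypothetical mount point
EXPECTED_COLUMNS = {"employee_id", "dept", "hire_date"}  # hypothetical file contract

spark = SparkSession.builder.appName("file-validation-sketch").getOrCreate()
df = spark.read.option("header", True).csv(INPUT_PATH)

# Check 1: the agreed columns are all present.
missing = EXPECTED_COLUMNS - set(df.columns)
if missing:
    raise ValueError(f"Missing columns: {missing}")

# Check 2: the file is not empty.
row_count = df.count()
if row_count == 0:
    raise ValueError("Input file contains no rows")

# Check 3: the key column has no nulls.
null_keys = df.filter(col("employee_id").isNull()).count()
if null_keys:
    raise ValueError(f"{null_keys} rows have a null employee_id")

print(f"Validation passed: {row_count} rows")
```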
Environment: ER/Studio, Hadoop 3.3, Agile, Azure Data Lake, ODS, Spark 3.2.0, Power BI, Snowflake Data Warehouse, Azure SQL, Python 3.1, Azure Cosmos DB, MDM, Azure Data Factory, Azure Synapse, HDFS, Hive 3.2.1, Sqoop 1.4.7, MapReduce, JIRA 8.19.1, Kafka 2.8.
Confidential - Pittsburgh, Pennsylvania
Data Architect/Data Engineer
Responsibilities:
- As a Data Architect/Engineer, designed and deployed scalable, highly available, and fault-tolerant systems on Azure.
- Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams.
- Worked with Azure Data Lake, Azure Data Factory, Databricks, Synapse (SQL Data Warehouse), Azure Blob storage, Azure Storage Explorer.
- Defined and streamlined Data Governance processes and procedures to achieve efficiency.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Gathered requirements for MDM tool implementation and user workflow process.
- Extensively used Talend for big data integration on Hadoop.
- Integrated NoSQL databases like HBase with MapReduce to move bulk data into HBase.
- Redesigned the views in Snowflake to increase performance.
- Created designs and process flows on how to standardize Power BI dashboards to meet the business requirements.
- Involved in the solution architecture and Design for data load and the migration of data to Hadoop.
- Played a vital role in Data Analysis of the Legacy data structures and ODS data structures.
- Created automated Python scripts to convert data from different sources and to generate the ETL pipelines (see the sketch after this list).
- Worked with Azure Databricks, Azure Data Factory, and PySpark.
- Analyzed massive and highly complex HIVE data sets, performing ad-hoc analysis and data manipulation.
- Generated JSON files from the JSON models created for Zip Code, Group and Claims using Snowflake DB.
- Implemented Copy activity, custom Azure Data Factory pipeline activities.
- Worked on partitioning, bucketing, join optimizations, and query optimizations in Hive.
- Used MapReduce programs for data cleaning and transformation and loaded the output into Hive tables in different file formats.
- Involved in designing dashboards and key measurement metrics of Data Governance and Data Quality programs.
- Architected and implemented ETL and data movement solutions using Azure Data Platform services (Azure Data Lake, Azure Data Factory, Databricks, Delta lake).
- Worked on snow-flaking the Dimensions to remove redundancy.
- Worked on Oozie workflow engine for job scheduling.
- Created Sqoop job with incremental load to populate Hive External tables.
- Developed MapReduce and Pig scripts to cleanse, transform the raw data into meaningful business information and uploaded it into Hive.
- Worked on loading data into SnowflakeDB in the cloud from various sources.
- Designed, configured and managed the backup and disaster recovery for HDFS data.
- Used Azure reporting services to upload and download reports.
- Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
- Created and maintained tables and views in Snowflake.
- Extensively used Erwin as the main tool for modeling along with Visio.
- Involved in dimensional modeling of the data warehouse to design the business process.
- Involved in testing the XML files and checked whether data is parsed and loaded to staging tables.
- Involved in writing T-SQL programming to implement Stored Procedures.
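A minimal sketch of the kind of automated Python conversion script mentioned above (raw extracts standardized into a curated zone); the directories, naming rules, and Parquet output are illustrative assumptions.

```python
# Sketch only: pandas plus pyarrow (for Parquet output) are assumed to be installed.
from pathlib import Path

import pandas as pd

SOURCE_DIR = Path("source_extracts")   # hypothetical landing folder of raw CSVs
TARGET_DIR = Path("curated")           # hypothetical curated zone
TARGET_DIR.mkdir(exist_ok=True)

def convert_file(csv_path: Path) -> Path:
    """Read one raw CSV extract, apply light standardization, and write Parquet."""
    df = pd.read_csv(csv_path)

    # Standardize column names to snake_case for downstream ETL steps.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # Drop exact duplicate records before they reach the warehouse load.
    df = df.drop_duplicates()

    out_path = TARGET_DIR / (csv_path.stem + ".parquet")
    df.to_parquet(out_path, index=False)
    return out_path

if __name__ == "__main__":
    for csv_file in sorted(SOURCE_DIR.glob("*.csv")):
        print("converted:", convert_file(csv_file))
```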
Environment: Erwin 9.8, Agile, Azure Data Lake, MDM, ODS, Azure Databricks, Azure Synapse, Hadoop 3.0, HBase 1.2, MS Visio, Python, SnowflakeDB, Oozie 4.3, Pig 0.17, Talend, HDFS, Hive, XML, MapReduce, Power BI, T-SQL.
Confidential - Columbus, OH
Data Modeler/Data Architect
Responsibilities:
- Created business requirement documents and integrated the requirements with the underlying platform functionality.
- Followed an Agile methodology for data warehouse development using Kanbanize.
- Created and maintained data model/architecture standards, including Master Data Management (MDM).
- Provided suggestions to implement multitasking for the existing Hive architecture in Hadoop.
- Worked on POC to check various cloud offerings including Google Cloud Platform (GCP).
- Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
- Involved in Data Architecture, data mapping and Data architecture artifacts design.
- Created dashboard for the data governance metrics, using Tableau.
- Developed Hive scripts to transfer data to and from HDFS.
- Created SQL Server Configurations, Performance Tuning of stored procedures, SSIS Packages.
- Created architecture stack blueprint for data access with NoSQL Database Cassandra.
- Used REST APIs with Python to ingest data from external sites into BigQuery.
- Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments.
- Implemented data ingestion and handling clusters in real time processing using Kafka.
- Developed and implemented data cleansing, data security, data profiling and data monitoring processes.
- Built, maintained, and tested infrastructure to aggregate critical business data into Google Cloud Platform (GCP) BigQuery and GCS storage for analysis.
- Built a program in Python and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
- Wrote a Python program to maintain raw-file archival in a GCS bucket.
- Used Google Cloud Functions with Python to load data into BigQuery on arrival of CSV files in a GCS bucket (see the sketch after this list).
- Participated in integration of MDM (Master Data Management) Hub and data warehouses.
- Implemented logical and physical relational database and maintained Database Objects in the data model using Erwin.
- Worked with Sqoop commands to import the data from different databases.
- Developed complex Hive queries to support business analysis.
- Created users and allocated tablespace quotas with privileges and roles for Oracle databases.
- Involved in building Data Models and Dimensional Modeling with 3NF, Star and Snowflake schemas for OLAP and Operational data store (ODS) applications.
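A minimal sketch of the kind of GCS-triggered Cloud Function mentioned above (CSV arrival loaded into BigQuery); the project, dataset, table, and event shape assume a first-generation Python Cloud Function and are illustrative only.

```python
# Sketch only: requires the google-cloud-bigquery client library.
from google.cloud import bigquery

TABLE_ID = "my-project.staging.arrivals"  # hypothetical destination table

def load_csv_to_bq(event, context):
    """Background Cloud Function triggered by a GCS object-finalize event."""
    bucket = event["bucket"]
    name = event["name"]

    if not name.lower().endswith(".csv"):
        return  # ignore non-CSV objects

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    uri = f"gs://{bucket}/{name}"
    load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    load_job.result()  # block until the load job completes
    print(f"Loaded {uri} into {TABLE_ID}")
```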
Environment: GCP, BigQuery, Hadoop, Agile, Cassandra 3.0, SSIS, Kafka 1.1, REST API, Python, Dataproc, GCS, Erwin, Sqoop, Oracle, ODS, MDM, OLAP, Tableau.
Confidential - Minneapolis, MN
Data Modeler
Responsibilities:
- Worked with Data Modeling for requirements gathering, business analysis and project coordination.
- Initiated and conducted JAD sessions, inviting various teams to finalize the required data fields and their formats.
- Worked on reverse engineering the existing data models and updating them.
- Designed the data aggregations on Hive for ETL processing on Amazon EMR to process data as per business requirement
- Created the DDL scripts using ER/Studio and source to target mappings to bring the data from Source to the warehouse.
- Involved in designing and deploying AWS Solutions using EC2, S3, RDS and Redshift.
- Worked on Teradata utilities like FastLoad, MultiLoad, and TPump to load data to staging and the DWH.
- Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Used forward engineering approach for designing and creating databases for OLAP model.
- Performed data profiling and data quality analysis to support data quality controls.
- Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
- Implemented the slowly changing dimension scheme (Type I and Type II) for most of the dimensions (see the sketch after this list).
- Optimized and tuned ETL processes & SQL Queries for better performance.
- Created storage with Amazon S3 for storing data.
- Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
- Involved in workflows and monitored jobs using Informatica tools.
- Used AWS S3 to store large amounts of data in a common repository.
- Generated drill down and drill through reports using SSRS.
- Extracted data from Teradata database and loaded into Data warehouse.
- Created database objects like tables, views, stored procedures, and packages using Oracle utilities like PL/SQL and SQL*Plus.
- Performed Impact Analysis and Gap Analysis.
- Developed statistics and visual analysis for warranty data using MS Excel and Tableau.
- Performed UAT testing before Production Phase of the database components being built.
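A minimal sketch of the Type II slowly changing dimension logic mentioned above, shown with pandas on tiny in-memory frames; the column names, change-detection rule, and dates are illustrative assumptions (the project itself used ETL mappings, not pandas).

```python
# Sketch only: pandas frames stand in for the dimension table and the incoming snapshot.
import pandas as pd

# Current dimension: one open row per customer (end_date stays NaT while current).
dim = pd.DataFrame({
    "customer_id": [1, 2],
    "city": ["Pittsburgh", "Columbus"],
    "start_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "end_date": pd.to_datetime([pd.NaT, pd.NaT]),
    "is_current": [True, True],
})

# Incoming snapshot: customer 2 has a changed attribute.
incoming = pd.DataFrame({"customer_id": [1, 2], "city": ["Pittsburgh", "Cleveland"]})
load_date = pd.Timestamp("2024-06-01")

# Detect attribute changes against the currently open rows.
current = dim[dim["is_current"]].merge(incoming, on="customer_id", suffixes=("_old", "_new"))
changed_ids = current.loc[current["city_old"] != current["city_new"], "customer_id"]

# Type II: expire the old versions of the changed rows...
mask = dim["customer_id"].isin(changed_ids) & dim["is_current"]
dim.loc[mask, "end_date"] = load_date
dim.loc[mask, "is_current"] = False

# ...and append new current versions carrying the updated attribute.
new_rows = incoming[incoming["customer_id"].isin(changed_ids)].assign(
    start_date=load_date, end_date=pd.NaT, is_current=True
)
dim = pd.concat([dim, new_rows], ignore_index=True)
print(dim)
```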
Environment: ER/Studio, Teradata, AWS (EC2, S3, Redshift), SQL, PL/SQL, OLAP, OLTP, Informatica, SSRS, MS Excel, Tableau.
Confidential
Data Analyst/Data Modeler
Responsibilities:
- Performed Data Analysis, Data Modeling, Data Profiling and Requirement Analysis.
- Performed requirements gathering with business groups and transformed the results into Business Requirement Documents (BRDs).
- Extensively performed data validation by writing complex SQL queries, and was involved in back-end testing and working through data quality issues (see the sketch after this list).
- Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable.
- Introduced a Data Dictionary for the process, which simplified a lot of the work around the project.
- Designed class and activity diagrams using Power Designer.
- Worked with developers on the database design and schema optimization and other DB2 aspects.
- Implemented Star Schema methodologies in modeling and designing the logical data model into Dimensional Models.
- Involved in SSIS Package Configuration, Development, Deployment and Support of SSIS.
- Worked on database design for OLTP and OLAP systems.
- Wrote complex SQL, PL/SQL procedures, functions, and packages to validate data and support the testing process.
- Used SQL Profiler for monitoring and troubleshooting performance issues in T-SQL code and stored procedures.
- Performed verification, validation, and transformations on the input data (text files, XML files) before loading into the target database.
- Used advanced Microsoft Excel to create pivot tables.
- Synthesized and translated business data needs into creative visualizations in Tableau.
- Built ad hoc reports using stand-alone tables.
- Created UNIX shell scripts to be used in conjunction with files.
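A minimal sketch of the kind of back-end data validation mentioned above, comparing source and target row counts over a generic ODBC connection; the DSNs, table pairs, and the use of pyodbc are illustrative assumptions, not the project's actual tooling.

```python
# Sketch only: any DB-API driver would work; pyodbc is assumed here for illustration.
import pyodbc

SOURCE_DSN = "DSN=source_db"   # hypothetical ODBC data source names
TARGET_DSN = "DSN=target_db"
CHECKS = [("staging.customers", "dw.dim_customer")]  # hypothetical source/target pairs

def row_count(conn, table):
    """Return the row count of a table (table names come from the trusted config above)."""
    cursor = conn.cursor()
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]

def main():
    src = pyodbc.connect(SOURCE_DSN)
    tgt = pyodbc.connect(TARGET_DSN)
    try:
        for source_table, target_table in CHECKS:
            src_count = row_count(src, source_table)
            tgt_count = row_count(tgt, target_table)
            status = "OK" if src_count == tgt_count else "MISMATCH"
            print(f"{status}: {source_table}={src_count} vs {target_table}={tgt_count}")
    finally:
        src.close()
        tgt.close()

if __name__ == "__main__":
    main()
```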
Environment: Power Designer, DB2, SSIS, OLAP, OLTP, SQL, PL/SQL, Tableau, T-SQL, MS Excel, UNIX.