
Data Engineer Resume


Pittsburgh, Pennsylvania

SUMMARY

  • 10+ years of Information Technology experience in Analysis, Design, Development and Implementation as a Data Engineer/Architect, Data Modeler and Data Analyst.
  • 5 years of Data Engineering and Data Architecture Experience.
  • Good understanding of and experience with Agile and Waterfall environments.
  • Excellent knowledge of Waterfall and Spiral methodologies of the Software Development Life Cycle (SDLC).
  • Developed data pipelines using Sqoop and MapReduce to ingest workforce data into HDFS for analysis.
  • Development-level experience in Microsoft Azure, providing data movement and scheduling functionality to cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
  • Hands-on experience with AWS cloud services (Amazon Redshift and AWS Data Pipeline).
  • Extensive Python scripting experience for Scheduling and Process Automation.
  • Experience in the Hadoop ecosystem for ingestion, storage, querying, processing and analysis of big data.
  • Excellent knowledge in Migrating servers, databases, and applications from on premise to AWS and Google Cloud Platform.
  • Extensive knowledge of Big Data, Hadoop, MapReduce, Hive and other emerging technologies.
  • Experience in performing analytics on structured data using Hive queries and operations.
  • Experience in implementing real-time streaming and analytics using technologies such as Spark Streaming and Kafka.
  • Strong experience in database design, writing complex SQL queries and stored procedures using PL/SQL.
  • Experience with the Oozie Scheduler in setting up workflow jobs with MapReduce and Pig jobs.
  • Knowledge and experience of the architecture and functionality of NoSQL databases like HBase and Cassandra.
  • Experience in designing DB2 architecture for modeling a Data Warehouse using tools like Erwin, Power Designer and ER/Studio.
  • Extensive experience in various Teradata utilities like Fastload, Multiload and Teradata SQL Assistant.
  • Knowledge of Star Schema and Snowflake modeling, FACT and Dimension tables, and physical and logical modeling.
  • Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
  • Experience in writing and executing unit, system, integration and UAT scripts in data warehouse projects.
  • Expert in generating on-demand and scheduled reports for business analysis and management decisions using Power BI.
  • Experience with data transformations utilizing SnowSQL in Snowflake.
  • Experience in developing ETL framework using Talend for extracting and processing data.
  • Sound knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
  • Experience with Tableau in analysis and creation of dashboard and user stories.
  • Strong experience in using MS Excel and MS Access to dump the data and analyze based on business needs.
  • Ability to learn and adapt quickly to the emerging new technologies.

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Worked as a Data Engineer involved in the entire life cycle of the project, from requirements gathering to system integration.
  • Used JIRA as an agile tool to keep track of the stories that were worked on using the Agile methodology.
  • Worked with business stakeholders to document Data Governance.
  • Involved in designing & managing the data integration architecture based on ad-hoc, continuous and scheduled requests or operations.
  • Migrated data using Azure Database Migration Service (DMS).
  • Migrated SQL Server and Oracle databases to the Microsoft Azure cloud.
  • Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources.
  • Gathered and documented MDM application, conversion and integration requirements.
  • Created, maintained, and defined Data Architecture artifacts based on Enterprise Architecture requirements.
  • Worked on Data Management interface requirements.
  • Developed, implemented and managed Data Governance process.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Involved in Moving Data in and out of Windows Azure SQL Databases and Blob Storage.
  • Created ETL pipelines using Spark and Hive to ingest data from multiple sources.
  • Automated Cube refresh using the Azure Functions.
  • Designed a data store per the OLTP archive policy, where data is reverse-shared between the ODS and OLTP.
  • Worked on ER/Studio for Conceptual, logical, physical data modeling and for generation of DDL scripts.
  • Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
  • Designed and developed user defined functions, stored procedures, triggers for Cosmos DB.
  • Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.
  • Worked with Azure SQL Database Import and Export Service.
  • Developed MapReduce Programs for data analysis and data cleaning.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Worked on Azure cloud technologies such as Azure SQL, Azure Data Lake (ADL), Azure Data Factory, Blob Storage, and HDInsight.
  • Involved in writing Pig scripts to transform raw data into baseline data.
  • Created tables, views, secure views, user defined functions in Snowflake Cloud Data Warehouse.
  • Used Azure Data Factory extensively for ingesting data from disparate source systems.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Developed workflows in Oozie for business requirements to extract the data using Sqoop.
  • Implemented row-level security policies in Azure Synapse Analytics (SQL DW) and within Power BI data models.
  • Developed a near real-time data pipeline using Spark.
  • Created and Configured Azure CosmosDB.
  • Developed Python scripts to perform file validations in Databricks and automated the process using ADF (a minimal sketch follows this list).
  • Worked with Data Governance, Data Quality and Reporting.
  • Involved in generating and documenting Metadata while designing OLTP and OLAP systems environment.
  • Involved in MDM process including data modeling, ETL process, and prepared data mapping documents based on graph documents.
  • Involved in Data Pipeline and ETL process and Testing.
  • Imported data from RDBMS to Hadoop using Sqoop import.
  • Implemented Disaster Recovery, backup migration and Azure deployments.
  • Worked in the Data Factory editor to create linked services, tables, datasets, and pipelines.
  • Created automation and deployment templates for relational and NoSQL databases.
  • Attended weekly status meetings and represented the MDM Data platform.
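
The following is a minimal sketch of the kind of Databricks file-validation step referenced above, written as a PySpark script that an ADF pipeline could trigger. The storage path, required columns, and thresholds are illustrative assumptions, not the project's actual values.

    # Hedged sketch of a Databricks file-validation notebook (PySpark).
    # The ADLS path and required columns are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("file-validation").getOrCreate()

    # Assumed landing zone in Azure Data Lake Storage
    landing_path = "abfss://raw@examplelake.dfs.core.windows.net/workforce/"

    df = spark.read.option("header", "true").csv(landing_path)

    # Basic checks: required columns present, file not empty, no null business keys
    required_cols = {"employee_id", "event_date"}
    missing = required_cols - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    row_count = df.count()
    null_keys = df.filter(F.col("employee_id").isNull()).count()

    if row_count == 0 or null_keys > 0:
        # Raising an exception fails the Databricks activity, so the calling
        # ADF pipeline can branch to its failure/alert path.
        raise ValueError(f"Validation failed: rows={row_count}, null_keys={null_keys}")

    print(f"Validation passed: rows={row_count}")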

Environment: ER/Studio, Hadoop 3.3, Agile, Azure Data Lake, ODS, Spark 3.2.0, Power BI, Snowflake Data Warehouse, Azure SQL, Python 3.1, Azure CosmosDB, MDM, Azure Data Factory, Azure Synapse, HDFS, Hive 3.2.1, Sqoop 1.4.7, MapReduce, JIRA 8.19.1, Kafka 2.8.

Confidential - Pittsburgh, Pennsylvania

Data Architect/Data Engineer

Responsibilities:

  • As a Data Architect/Engineer designed and deployed scalable, highly available, and fault tolerant systems on Azure.
  • Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams.
  • Worked with Azure Data Lake, Azure Data Factory, Databricks, Synapse (SQL Data Warehouse), Azure Blob storage, Azure Storage Explorer.
  • Defined and streamlined Data Governance processes and procedures to achieve efficiency.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Gathered requirements for MDM tool implementation and user workflow process.
  • Extensively used Talend for big data integration with Hadoop.
  • Integrated NoSQL databases like HBase with MapReduce to move bulk amounts of data into HBase.
  • Redesigned views in Snowflake to increase performance.
  • Created designs and process flows on how to standardize Power BI dashboards to meet the business requirements.
  • Involved in the solution architecture and Design for data load and the migration of data to Hadoop.
  • Played a vital role in Data Analysis of the Legacy data structures and ODS data structures.
  • Created automated python scripts to convert the data from different sources and to generate the ETL pipelines.
  • Worked with Azure Databricks, Azure Data Factory and PySpark.
  • Analyzed massive and highly complex Hive data sets, performing ad-hoc analysis and data manipulation.
  • Generated JSON files from the JSON models created for Zip Code, Group and Claims using Snowflake DB.
  • Implemented Copy activity, custom Azure Data Factory pipeline activities.
  • Worked on partitioning, bucketing, join optimizations and query optimizations in Hive.
  • Used MapReduce programs for data cleaning and transformations and loaded the output into Hive tables in different file formats.
  • Involved in designing dashboards and key measurement metrics of Data Governance and Data Quality programs.
  • Architected and implemented ETL and data movement solutions using Azure Data Platform services (Azure Data Lake, Azure Data Factory, Databricks, Delta Lake); a minimal load sketch follows this list.
  • Worked on snow-flaking the Dimensions to remove redundancy.
  • Worked on Oozie workflow engine for job scheduling.
  • Created Sqoop job with incremental load to populate Hive External tables.
  • Developed MapReduce and Pig scripts to cleanse, transform the raw data into meaningful business information and uploaded it into Hive.
  • Worked on loading data into SnowflakeDB in the cloud from various sources.
  • Designed, configured and managed the backup and disaster recovery for HDFS data.
  • Used Azure reporting services to upload and download reports.
  • Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
  • Created and maintained tables and views in Snowflake.
  • Extensively used Erwin as the main tool for modeling along with Visio.
  • Involved in dimensional modeling of the data warehouse to design the business process.
  • Involved in testing the XML files and checked whether data is parsed and loaded to staging tables.
  • Involved in T-SQL programming to implement stored procedures.
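
Below is a minimal sketch of the Databricks/Delta Lake load pattern referenced above. The container names, paths, and the claim_id business key are illustrative assumptions rather than the project's actual values.

    # Hedged sketch: batch load from an Azure Data Lake raw zone into a
    # partitioned Delta table in a curated zone. Paths and columns are
    # hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims-etl").getOrCreate()

    source_path = "abfss://raw@examplelake.dfs.core.windows.net/claims/"
    target_path = "abfss://curated@examplelake.dfs.core.windows.net/delta/claims/"

    raw = spark.read.json(source_path)

    curated = (
        raw.withColumn("load_date", F.current_date())
           .dropDuplicates(["claim_id"])   # assumed business key
    )

    # Append today's batch into a Delta table partitioned by load date
    (curated.write
            .format("delta")
            .mode("append")
            .partitionBy("load_date")
            .save(target_path))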

Environment: Erwin 9.8, Agile, Azure Data Lake, MDM, ODS, Azure Databricks, Azure Synapse, Hadoop 3.0, HBase 1.2, MS Visio, Python, SnowflakeDB, Oozie 4.3, Pig 0.17, Talend, HDFS, Hive, XML, MapReduce, Power BI, T-SQL.

Confidential - Columbus, OH

Data Modeler/Data Architect

Responsibilities:

  • Created business requirement documents and integrated the requirements with the underlying platform functionality.
  • Used the Agile methodology for Data Warehouse development using Kanbanize.
  • Created and maintained data model/architecture standards, including Master Data Management (MDM).
  • Provided suggestion to implement multitasking for existing Hive Architecture in Hadoop.
  • Worked on POC to check various cloud offerings including Google Cloud Platform (GCP).
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python.
  • Involved in Data Architecture, data mapping and Data architecture artifacts design.
  • Created dashboard for the data governance metrics, using Tableau.
  • Developed Hive scripts to transfer data from and to HDFS.
  • Created SQL Server Configurations, Performance Tuning of stored procedures, SSIS Packages.
  • Created architecture stack blueprint for data access with NoSQL Database Cassandra.
  • Used REST APIs with Python to ingest data from external sites into BigQuery.
  • Monitored BigQuery, Dataproc and Cloud Dataflow jobs via Stackdriver across all environments.
  • Implemented data ingestion and handling clusters in real time processing using Kafka.
  • Developed and implemented data cleansing, data security, data profiling and data monitoring processes.
  • Built, maintained and tested infrastructure to aggregate critical business data into Google Cloud Platform (GCP) BigQuery and GCP Storage for analysis.
  • Built a Python program and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
  • Wrote a Python program to maintain raw file archival in a GCS bucket.
  • Used Google Cloud Functions with Python to load data into BigQuery upon arrival of CSV files in a GCS bucket (see the sketch after this list).
  • Participated in integration of MDM (Master Data Management) Hub and data warehouses.
  • Implemented logical and physical relational database and maintained Database Objects in the data model using Erwin.
  • Worked with Sqoop commands to import the data from different databases.
  • Developed complex Hive queries to support business analysis.
  • Created users and allocated tablespace quotas, privileges and roles for Oracle databases.
  • Involved in building Data Models and Dimensional Modeling with 3NF, Star and Snowflake schemas for OLAP and Operational data store (ODS) applications.
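
The GCS-to-BigQuery load mentioned above could look roughly like the following Cloud Function sketch; the project, dataset, and table names are assumptions for illustration only.

    # Hedged sketch of a background Cloud Function (Python) that loads a CSV
    # into BigQuery when a file lands in a GCS bucket. The table ID is a
    # hypothetical placeholder.
    from google.cloud import bigquery

    TABLE_ID = "my-project.staging.raw_events"  # assumed target table

    def load_csv_to_bigquery(event, context):
        """Triggered by a GCS object-finalize event."""
        uri = f"gs://{event['bucket']}/{event['name']}"

        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,   # assumes a header row
            autodetect=True,       # infer schema from the file
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )

        load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
        load_job.result()  # wait for the load job to finish
        print(f"Loaded {uri} into {TABLE_ID}")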

Environment: GCP, BigQuery, Hadoop, Agile, Cassandra 3.0, SSIS, Kafka 1.1, REST API, Python, Dataproc, GCS, Erwin, Sqoop, Oracle, ODS, MDM, OLAP, Tableau.

Confidential - Minneapolis, MN

Data Modeler

Responsibilities:

  • Worked with Data Modeling for requirements gathering, business analysis and project coordination.
  • Initiated and conducted JAD sessions inviting various teams to finalize the required data fields and their formats.
  • Worked on reverse engineering of the existing data models and updated them.
  • Designed the data aggregations on Hive for ETL processing on Amazon EMR to process data per business requirements.
  • Created the DDL scripts using ER/Studio and source to target mappings to bring the data from Source to the warehouse.
  • Involved in designing and deploying AWS Solutions using EC2, S3, RDS and Redshift.
  • Worked on Teradata utilities like FLOAD, MLOAD and TPUMP to load data to stage and DWH.
  • Exported the analyzed data to the RDBMS using Sqoop to generate reports for the BI team.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Used forward engineering approach for designing and creating databases for OLAP model.
  • Performed Data Profiling and Data Quality Analysis to analyze and support data quality controls.
  • Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
  • Implemented the Slowly changing dimension scheme (Type II & Type I) for most of the dimensions.
  • Optimized and tuned ETL processes & SQL Queries for better performance.
  • Created Amazon S3 storage for data.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Involved in workflows and monitored jobs using Informatica tools.
  • Used AWS S3 to store large amounts of data in a common repository (a boto3 sketch follows this list).
  • Generated drill down and drill through reports using SSRS.
  • Extracted data from Teradata database and loaded into Data warehouse.
  • Created database objects like tables, views, stored procedures and packages using Oracle tools like PL/SQL and SQL*Plus.
  • Performed Impact Analysis and Gap Analysis.
  • Developed statistics and visual analysis for warranty data using MS Excel and Tableau.
  • Performed UAT testing before Production Phase of the database components being built.
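
A minimal boto3 sketch of the S3 storage pattern referenced in this list; the bucket name and key prefix are placeholders, not actual project values.

    # Hedged sketch: upload an ETL extract to S3 and list what has landed.
    # Bucket and keys are hypothetical.
    import boto3

    BUCKET = "example-warranty-data"  # assumed bucket name
    s3 = boto3.client("s3")

    # Upload an extract produced by the ETL job
    s3.upload_file("warranty_extract.csv", BUCKET, "raw/warranty_extract.csv")

    # List objects under the raw/ prefix for a quick sanity check
    response = s3.list_objects_v2(Bucket=BUCKET, Prefix="raw/")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])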

Environment: ER/Studio, Teradata, AWS (EC2, S3 & Redshift), SQL, PL/SQL, OLAP, OLTP, Informatica, SSRS, MS Excel, Tableau.

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Performed Data Analysis, Data Modeling, Data Profiling and Requirement Analysis.
  • Performed requirement gathering with business groups and transformed it into Business Requirement Documents (BRDs).
  • Extensively used data validation by writing several complex SQL queries; involved in back-end testing and worked with data quality issues (a reconciliation sketch follows this list).
  • Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable.
  • Introduced a Data Dictionary for the process, which simplified a lot of the work around the project.
  • Designed class and activity diagrams using Power Designer.
  • Worked with developers on the database design and schema optimization and other DB2 aspects.
  • Implemented Star Schema methodologies in modeling and designing the logical data model into Dimensional Models.
  • Involved in SSIS package configuration, development, deployment and support.
  • Worked on database design for OLTP and OLAP systems.
  • Wrote complex SQL, PL/SQL procedures, functions, and packages to validate data and support the testing process.
  • Used SQL Profiler for monitoring and troubleshooting performance issues in T-SQL code and stored procedures.
  • Performed verification, validation, and transformations on the input data (text files, XML files) before loading into the target database.
  • Used advanced Microsoft Excel to create pivot tables.
  • Synthesized and translated business data needs into creative visualizations in Tableau.
  • Built ad-hoc reports using stand-alone tables.
  • Created UNIX shell scripts to be used in conjunction with files.
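
A hedged sketch of the kind of source-to-target reconciliation check behind the data-validation work above, run against DB2 via pyodbc; the DSN, schemas, and table names are illustrative assumptions.

    # Hedged sketch of simple reconciliation checks against DB2 over ODBC.
    # DSN, credentials, and table names are hypothetical placeholders.
    import pyodbc

    conn = pyodbc.connect("DSN=DB2_DW;UID=analyst;PWD=change_me")
    cursor = conn.cursor()

    checks = {
        "row_count_diff": """
            SELECT (SELECT COUNT(*) FROM STG.CUSTOMER)
                 - (SELECT COUNT(*) FROM DW.DIM_CUSTOMER)
            FROM SYSIBM.SYSDUMMY1
        """,
        "null_business_keys": """
            SELECT COUNT(*) FROM DW.DIM_CUSTOMER WHERE CUSTOMER_NK IS NULL
        """,
    }

    # Each check is written so that a result of 0 means "pass"
    for name, sql in checks.items():
        value = cursor.execute(sql).fetchone()[0]
        status = "PASS" if value == 0 else f"FAIL ({value})"
        print(f"{name}: {status}")

    conn.close()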

Environment: Power Designer, DB2, SSIS, OLAP, OLTP, SQL, PL/SQL, Tableau, T-SQL, MS Excel, UNIX.
