
Data Engineer Resume


Scottsdale, AZ

SUMMARY

  • 8 years of IT experience across a variety of industries, including hands-on experience in Hadoop, Hive, Spark, and Sqoop, along with experience in data quality, data governance, master data management, and metadata management.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig. Developed data ingestion frameworks, real-time processing solutions, and data processing/transformation frameworks.
  • Extensive experience in the Big Data ecosystem and its components such as Spark, MapReduce, HDFS, Hive, Pig, Sqoop, Zookeeper, Oozie, and Flume. Well versed in Amazon Web Services (AWS) Cloud services such as EC2, S3, EMR, and Redshift. Experience working with the MapReduce framework and the Spark execution model.
  • Skilled in using different file formats such as JSON and the columnar ORC and Parquet formats. Developed Pig UDFs to pre-process data for analysis.
  • Used Python scripts to build an Autosys workflow to automate tasks across three zones in the cluster. Expertise in coding in technologies such as Python and shell scripting.
  • Experienced in working with business teams to gather and fully understand business requirements. Designed and created data extracts supporting Power BI, Tableau, and other visualization and reporting tools.
  • Hands-on experience in programming with Resilient Distributed Datasets (RDDs), DataFrames, and the Dataset API. Experience in loading Essbase metadata with Oracle Data Integrator.
  • Optimized performance for data access requirements by choosing the appropriate native Hadoop file formats (Avro, Parquet, ORC, etc.) and compression codecs.
  • Expertise in Data Extraction, Transformation, Loading, Data Analysis, Data Profiling, and SQL Tuning.
  • Experienced with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the sketch following this list). Experienced in writing custom Hive UDFs to incorporate business logic into Hive queries.
  • Experience in process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation in Hive.
  • Experience in writing Sqoop commands to import data from relational databases into HDFS. Experience with SQL Server and Oracle databases and in writing queries.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS. Good experience working with Hadoop architecture concepts and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce.
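
The following is an illustrative sketch of the Hive managed/external table design with partitioning and bucketing mentioned above, expressed as Spark SQL run from PySpark; the table names, columns, and HDFS location are hypothetical placeholders, not actual project objects.

    # Minimal sketch (hypothetical tables and HDFS path) of Hive partitioning/bucketing
    # with managed vs. external tables, run through Spark with Hive support enabled.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # External table: Hive tracks only metadata; the data stays at the HDFS location.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
            order_id BIGINT,
            amount   DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        STORED AS PARQUET
        LOCATION 'hdfs:///data/warehouse/sales_ext'
    """)

    # Managed table with bucketing on customer_id to speed up joins and sampling.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_managed (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)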

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential, Scottsdale, AZ

Responsibilities:

  • Implemented solutions utilizing AWS S3 and Snowflake integrated with Big Data/Hadoop distribution frameworks.
  • Used NiFi to import data from source systems through APIs into AWS S3.
  • Experience in loading data from AWS S3 into Snowflake using Snowpipe and moving data between Snowflake and DB2 instances. Experienced working with Snowflake and DB2.
  • Used Snowflake to perform event enrichment and to prepare various levels of user behavioral summaries.
  • Interacted with the infrastructure, network, database, application, and BA teams to ensure data quality and availability. Well experienced with Agile methodology and bi-weekly sprints.
  • Actively participated in sprint planning, retrospectives, daily scrum, sprint backlog, and one-on-one scrum meetings. Worked with Google BigQuery to view the data.
  • Developed ETL/ELT frameworks for data transfer using NiFi. Supported warehouse management application systems and back-end Oracle 8i, 9i databases.
  • Performed fine-tuning of NiFi processors to improve the efficiency and overall processing time for the pipelines.
  • Analyzed the Business requirement for Oracle Data Integrator and mapped the architecture and used ODI for reverse engineering to retrieve metadata from data storage and load it to the repository.
  • Used Oracle Data Integrator Designer (ODI) to develop processes for extracting, cleansing, transforming, integrating, and loading data into data warehouse database.
  • Established and rolled out data governance standards, frameworks, documentation, and tools. Worked with Airflow to trigger and schedule jobs.
  • Experienced working with Snowflake queries and DB2 SQL queries. Created views in Snowflake with specified fields based on the requirements (see the sketch following this list).
  • Worked on Agile spikes to define user stories and the future direction of the project. Used the Postman API client to get a convenient visual display of query results.
  • Well experienced in using HashiCorp Vault to move NiFi flows to production.
  • Implemented the workflows using Cron scheduler in NiFi to automate tasks.
  • Prepare, publish, and present data governance materials to increase awareness and literacy on data governance projects, leading practices, technology and partnerships.
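
Below is a hedged sketch of creating a Snowflake view with selected fields, as described above, using the snowflake-connector-python package; the account, credentials, and table/view names are placeholders rather than the actual project objects.

    # Hypothetical sketch: create a Snowflake view exposing only the required fields.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",      # placeholder
        user="etl_user",           # placeholder
        password="***",            # placeholder; prefer a secrets manager such as Vault
        warehouse="ANALYTICS_WH",  # placeholder
        database="EVENTS_DB",      # placeholder
        schema="PUBLIC",
    )

    try:
        cur = conn.cursor()
        # View that summarizes user behavior from a raw events table.
        cur.execute("""
            CREATE OR REPLACE VIEW user_activity_summary AS
            SELECT user_id,
                   event_type,
                   COUNT(*)        AS event_count,
                   MAX(event_time) AS last_seen
            FROM   raw_events
            GROUP  BY user_id, event_type
        """)
    finally:
        conn.close()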

Sr Data Engineer

Confidential, Rochester MN

Responsibilities:

  • Involved in data warehouse design, data integration, and data transformation using Apache Spark and Python.
  • Created and set up EMR clusters for running data engineering workloads and supporting data scientists.
  • Experience in data warehouse modeling techniques such as Kimball modeling
  • Experience in Conceptual data modeling, Logical data modeling and Physical data modeling.
  • Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
  • Involved in the design and deployment of a multitude of cloud services on the AWS stack, such as Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM, EC2, EMR, and Redshift, while focusing on high availability, fault tolerance, and auto-scaling in AWS CloudFormation.
  • Created data pipelines using Azure Data Factory and Databricks for ETL processing.
  • Retrieved data from DBFS into Spark Data Frames, for running predictive analytics on data.
  • Used HiveContext, which provides a superset of the functionality provided by SQLContext, and preferred to write queries using the HiveQL parser to read data from Hive tables.
  • Hands-on data engineering experience with Scala, Hadoop, EMR, Spark, and Kafka.
  • Experience in Exploratory Data Analysis (EDA), Feature Engineering, Data Visualization
  • Cached RDDs for better performance and performed actions on each RDD.
  • Developed highly complex Python code, which is maintainable, easy to use, and satisfies application requirements, data processing and analytics using inbuilt libraries.
  • Developed and Implemented Data Solutions utilizing Azure Services like Event Hub, Azure Data Factory, ADLS, Databricks, Azure web apps, Azure SQL DB instances.
  • Involved in designing and optimizing Spark SQL queries and DataFrames; imported data from data sources, performed transformations and read/write operations, and saved the results to an output directory in HDFS.
  • Worked on Kafka REST API to collect and load the data on Hadoop file system and used Sqoop to load the data from relational databases.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS (see the streaming sketch after the Environment line below).

Environment: PySpark, Hive, Sqoop, Kafka, Python, Spark Streaming, DBFS, SQLContext, Spark RDD, REST API, Spark SQL, Hadoop, Parquet files, Oracle, SQL Server.
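
The following is a hedged sketch of the Kafka-to-Parquet flow described above, written with Spark Structured Streaming rather than the RDD/DStream API; the broker address, topic, and HDFS paths are placeholders.

    # Structured Streaming sketch: read events from Kafka and land them as Parquet in HDFS.
    # Requires the spark-sql-kafka connector package on the Spark classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-to-parquet-sketch").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
              .option("subscribe", "events")                      # placeholder topic
              .load()
              .select(col("value").cast("string").alias("payload"),
                      col("timestamp")))

    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/events/parquet")       # placeholder path
             .option("checkpointLocation", "hdfs:///chk/events")  # placeholder path
             .outputMode("append")
             .start())

    query.awaitTermination()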

Data Engineer

Confidential, Patskala, Ohio

Responsibilities:

  • Responsible for developing the project from scratch using Spark with Python (PySpark) in an Agile model.
  • Experience in Hive partitioning, bucketing, and performing joins on Hive tables.
  • Used Bitbucket as a common repository for code sharing. Developed and deployed ETL logic with Oracle Data Integrator. Involved in building the ETL architecture and source-to-target mappings to load data into the data warehouse. Involved in writing shell scripts to export log files to the Hadoop cluster through an automated process.
  • Develop our Data Governance operational strategy and corresponding frameworks, and short- and long-term roadmaps for maturing the organization's data governance position.
  • Worked on Spark SQL for table-level validations and fetching metadata from static tables (see the validation sketch following this list).
  • Created a job execution engine using Hive that tracks job status at the process, file, and validation levels.
  • Involved in designing Hive schemas using performance-tuning techniques such as partitioning and bucketing. Developed, implemented, and enhanced the data warehouse data model and performed data loads using Oracle Data Integrator for the data owned by the Vanguard Business Applications organization.
  • Used Sqoop to import the data to Hadoop Distributed File System (HDFS) from RDBMS.
  • Used ETL to develop jobs for extracting, cleaning, transforming, and loading data into data warehouse.
  • Trained and led a team of associates and colleagues in understanding the functionality of the framework and uplifted them technically in Hadoop, Spark, and Big Data technologies.
  • Own and lead the implementation of the Data Governance operational model across stakeholders.
  • Used the Oozie job scheduler for end-to-end data processing pipelines and workflow scheduling. Maintained timely delivery in every sprint. Developed complex logic and generic code for different validations using Spark.
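
Below is an illustrative sketch of the kind of table-level validation done in Spark SQL: comparing the row count of a loaded Hive table against an expected count held in a static control table. The database, table, and column names are hypothetical.

    # Table-level validation sketch: compare actual vs. expected row counts.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("table-validation-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Expected count comes from a static metadata/control table (hypothetical names).
    expected = spark.sql("""
        SELECT expected_row_count
        FROM   control.load_metadata
        WHERE  table_name = 'stage.orders'
    """).collect()[0][0]

    actual = spark.sql("SELECT COUNT(*) AS cnt FROM stage.orders").collect()[0][0]

    if actual != expected:
        raise ValueError(
            "Validation failed for stage.orders: expected %s, got %s" % (expected, actual))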

Data Engineer

Confidential

Responsibilities:

  • Created and enhanced JCL jobs, procs, and batch and online programs; prepared productionized jobs.
  • Responsible for preparing knowledge management documents for the team, shared across other projects, which helped reduce effort.
  • Responsible for helping team members by providing solutions within the same application as well as across other applications of the project.
  • Experience in writing SQL scripts using GROUP BY, JOIN, and similar operations to transform raw data from several data sources into baseline data (see the sketch following this list).
  • Actively participated in code reviews and meetings, troubleshooting and resolving technical issues.
  • Mapped and assessed current processes and structures for data governance and identified gaps and areas for improvement. Automated data governance reporting and monitoring to track data platform performance and adherence to data governance policies and standards.
  • Involved in understanding each application across the project by acquiring cross-functional knowledge.
  • Led and implemented a quarterly data governance roadmap. Wrote calculated columns and measure queries in Power BI Desktop to demonstrate good data analysis techniques.
  • Experience in organizing the database and tables in Oracle database as per the requirement of the incoming source file at the staging area.
  • Involved with the data ingestion team for creating different data pipeline solution for loading data from different sources (RDBMS, CSV and XML).
  • Created Control-M jobs with incremental loads to store the data to the snapshot area in the Oracle DB.
  • Experience in data analysis in OLAP CUBES and in ORACLE databases
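
The snippet below is a hedged example of the kind of GROUP BY/JOIN SQL used to build baseline data from staging tables, executed here through the cx_Oracle driver; the connection details and table/column names are placeholders.

    # Baseline-data sketch: join and aggregate staging tables in Oracle (placeholders throughout).
    import cx_Oracle

    conn = cx_Oracle.connect(user="stg_user", password="***", dsn="dbhost/ORCLPDB1")
    try:
        cur = conn.cursor()
        cur.execute("""
            SELECT c.customer_id,
                   c.region,
                   SUM(o.amount) AS total_amount
            FROM   staging_orders o
            JOIN   staging_customers c
                   ON c.customer_id = o.customer_id
            GROUP  BY c.customer_id, c.region
        """)
        baseline = cur.fetchall()  # baseline rows for downstream comparison
    finally:
        conn.close()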

Big Data Engineer

Confidential

Responsibilities:

  • Performed a PoC for a Big Data solution using Hadoop for data loading and data querying.
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Used Sqoop to transfer data between RDBMS sources and HDFS.
  • Wrote UDFs in Scala and stored procedures to meet specific business requirements; replaced existing MapReduce programs and Hive queries with a Spark application written in Scala.
  • Involved in Normalization and De-Normalization of existing tables for faster query retrieval.
  • Hands-on experience with Airflow for orchestration and building custom Airflow operators (see the operator sketch after the Environment line below).
  • Proficient in Azure Data Factory and Airflow 1.8/1.10 on multiple cloud platforms, and able to understand the process of leveraging Airflow operators.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the Cosmos activity.
  • Experience in designing and developing DataStage jobs to process full data loads from a SQL Server source to an Oracle stage.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs
  • Extensively used MS Access to pull data from various databases and integrate the data.
  • Wrote HiveQL as per the requirements, processed data in the Spark engine, and stored it in Hive tables.
  • Responsible for importing data from PostgreSQL to HDFS and Hive using the Sqoop tool.
  • Experienced in migrating HiveQL into Impala to minimize query response time.
  • Responsible for performing extensive data validation using Hive.
  • Sqoop jobs, Hive scripts were created for data ingestion from relational databases to compare with historical data.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.

Environment: Erwin 9.8, Big Data 3.0, Hadoop 3.0, Oracle 12c, PL/SQL, Scala, Spark SQL, PySpark, Python, Kafka 1.1, SAS, SNS, SQL, MDM, Oozie 4.3, SSIS, T-SQL, ETL, HDFS, Cosmos, Pig 0.17, Sqoop 1.4, MS Access.
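
As referenced in the orchestration bullet above, here is a minimal sketch of a custom Airflow operator using the Airflow 1.10-style API; the operator name and the HDFS path check it performs are hypothetical examples, not the actual project operators.

    # Custom operator sketch: fail the task when an expected HDFS path is missing.
    import subprocess

    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults


    class HdfsPathCheckOperator(BaseOperator):
        """Illustrative operator that verifies an HDFS path exists before downstream tasks run."""

        @apply_defaults
        def __init__(self, hdfs_path, *args, **kwargs):
            super(HdfsPathCheckOperator, self).__init__(*args, **kwargs)
            self.hdfs_path = hdfs_path

        def execute(self, context):
            # 'hdfs dfs -test -e' returns a non-zero exit code when the path does not exist.
            result = subprocess.call(["hdfs", "dfs", "-test", "-e", self.hdfs_path])
            if result != 0:
                raise ValueError("Missing HDFS path: %s" % self.hdfs_path)
            self.log.info("Found HDFS path %s", self.hdfs_path)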
