Data Engineer Resume
SUMMARY
- 7+ years of overall experience designing and developing data engineering solutions, Big Data analytics and development, and administering database projects, including installing, upgrading, and configuring databases, performing deployments, working on capacity planning, and tuning databases to optimize application performance.
- Work closely with business professionals and application developers to architect, design, and develop applications, create complex data models and data pipelines, and manage the data flow between applications. Work with BI professionals to create business reports, manage large data volumes, and tune databases for optimal performance.
- In-depth knowledge of building Spark applications in Python in cluster/client mode, and of monitoring applications and performing health checks through the Spark UI. Good understanding of and hands-on experience with Spark SQL, RDDs, DataFrames, and Datasets.
- Excellent knowledge and hands-on experience with Hadoop ecosystem components such as MapReduce, Spark, Pig, Hive, HBase, Sqoop, YARN, Kafka, ZooKeeper, and HDFS.
- Experience installing Hadoop clusters and configuring their components both on-premises and on AWS EC2 using Cloudera's distribution.
- Excellent knowledge of ETL/ELT for data pipelining, creating data flows between applications using Airflow. Experience writing complex SQL queries, functions, and procedures. Experience building data pipelines in Kafka that fetch data from OLTP systems and store it in the warehouse.
- Extensive experience with UNIX/Linux shell scripting and with Python programming for database operations using mysqlclient and cx_Oracle, building data pipelines, and performing data analysis using NumPy, Pandas, and Matplotlib.
- Excellent knowledge of Google Cloud data engineering services such as BigQuery, Dataproc, Dataflow, Cloud Storage, Cloud SQL, and Cloud Bigtable. Excellent knowledge of AWS and hands-on experience with EC2, RDS, S3, and IAM.
- Hands-on experience with Spark Streaming; currently learning Spark ML.
TECHNICAL SKILLS
Big Data: Spark, Hadoop MapReduce, Hive, HBase, Airflow, Ambari, Kafka, ZooKeeper.
Cloud: Google Cloud (VMs, BigQuery, Dataflow, Dataproc, Data Fusion, Cloud SQL); AWS (EC2, S3, VPC, IAM, Redshift, RDS).
Databases: Oracle, Cassandra, MySQL.
Database Tools: SQL*Plus, PL/SQL, Toad Data Modeler, Toad for Oracle, Oracle APEX
Languages: Python, PL/SQL, UNIX/Linux shell scripting
Operating Systems: RHEL (4, 5, 6, 7), CentOS, Sun Solaris, HP-UX.
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential
Responsibilities:
- A Confidential-based telecommunications company offering voice, data, and video services and solutions through its wireless and wireline segments.
- Performing the role of data engineer and database engineer for Verizon's application databases. The role includes creating data pipelines from application databases to the data lake and warehouse and building stream and batch data processing pipelines.
- Maintaining Cloudera-distribution Hadoop production and dev clusters spanning 300 nodes and more than 16 PB of data. Performing daily health checks, working on alerts, and handling related tasks.
- Developed Spark applications using Spark SQL and RDDs. Solid knowledge of Spark Streaming; currently learning Spark ML.
- Designed and developed a streaming ETL application, auto-scheduled (cron-independent) and auto-scaled (able to add data extraction jobs during runtime), in Python, Bash, and PL/SQL, that extracts data from OLTP application databases and sends it to Kafka brokers (see the producer sketch after this list).
- Installed, configured, and maintain a multi-node standalone Kafka-ZooKeeper cluster for building streaming data pipelines that carry database health information from 200+ Oracle OLTP databases to the warehouse database.
- Developed a PySpark application that processes the databases' health information and stores it in an Oracle database (see the PySpark sketch after this list). In-depth knowledge of monitoring Spark applications through the Spark UI and debugging failed Spark jobs. Maintain existing Spark jobs and adapt them as business needs change.
- Created Hive databases, tables, and queries for data analytics as needed, integrated them with data processing jobs, and dropped them as part of cleanup. Also maintain an HBase database used by other applications.
- Designed and developed an in-house database visualization tool, built on the Oracle Application Express platform, that visualizes real-time database health information and presents management reports.
- Experienced as an AWS cloud engineer: create and manage EC2 instances and S3 buckets and configure RDS instances. Monitor performance, regularly spin up instances, and help developers get access to cloud instances.
- Created a Python Flask application that deploys PL/SQL code automatically onto the production database (see the Flask sketch after this list). Created an adaptive database alerting system in Python, SQL, and Bash that generates database alerts in near real time based on thresholds that can be changed on the fly.
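A minimal sketch of the kind of OLTP-to-Kafka extraction step the streaming ETL above performs, assuming the kafka-python and cx_Oracle client libraries; the broker addresses, the db-health topic, and the source query are illustrative, not the production values:

```python
# Sketch of one extraction job: pull recent rows from an OLTP database and
# publish them to Kafka (illustrative names throughout).
import json
import cx_Oracle
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],  # hypothetical brokers
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Hypothetical source table holding per-database health samples.
conn = cx_Oracle.connect("etl_user/secret@oltp-db:1521/APPDB")
cursor = conn.cursor()
cursor.execute(
    "SELECT db_name, metric_name, metric_value, sample_time "
    "FROM db_health_metrics "
    "WHERE sample_time > SYSDATE - 5 / 1440"  # last five minutes
)

for db_name, metric, value, ts in cursor:
    producer.send("db-health", {  # assumed topic name
        "db": db_name,
        "metric": metric,
        "value": float(value),
        "ts": ts.isoformat(),
    })

producer.flush()
cursor.close()
conn.close()
```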
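A sketch of how a PySpark consumer along the lines described above could parse the health events from Kafka and persist each micro-batch to Oracle over JDBC; the topic, schema, and connection settings are assumptions, and the job would need the spark-sql-kafka package and an Oracle JDBC driver on the classpath:

```python
# Sketch of a PySpark Structured Streaming job: Kafka in, Oracle out.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("db-health-processor").getOrCreate()

# Schema matching the JSON events published by the producer sketch.
schema = (StructType()
          .add("db", StringType())
          .add("metric", StringType())
          .add("value", DoubleType())
          .add("ts", StringType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical
          .option("subscribe", "db-health")                   # assumed topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_to_oracle(batch_df, batch_id):
    # foreachBatch hands each micro-batch to the JDBC sink.
    (batch_df.write
     .format("jdbc")
     .option("url", "jdbc:oracle:thin:@warehouse-db:1521/DWH")  # hypothetical
     .option("dbtable", "DB_HEALTH_EVENTS")
     .option("user", "etl_user")
     .option("password", "secret")
     .mode("append")
     .save())

query = (events.writeStream
         .foreachBatch(write_to_oracle)
         .option("checkpointLocation", "/tmp/db-health-chk")  # illustrative path
         .start())
query.awaitTermination()
```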
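A minimal sketch of the Flask deployment endpoint described above; the route, payload shape, and connection handling are hypothetical, with authentication and error handling omitted for brevity:

```python
# Sketch of a Flask service that applies a submitted PL/SQL block to a
# target database (illustrative; not the production deployment tool).
from flask import Flask, request, jsonify
import cx_Oracle

app = Flask(__name__)

@app.route("/deploy", methods=["POST"])
def deploy():
    payload = request.get_json()
    script = payload["plsql"]       # anonymous PL/SQL block to deploy
    target = payload["target_dsn"]  # e.g. "deploy_user/secret@prod:1521/APPDB"

    conn = cx_Oracle.connect(target)
    try:
        cursor = conn.cursor()
        cursor.execute(script)      # run the PL/SQL block as submitted
        conn.commit()
    finally:
        conn.close()
    return jsonify({"status": "deployed"})

if __name__ == "__main__":
    app.run(port=5000)
```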
Data Engineer
Confidential
Responsibilities:
- Developed Spark ETL scripts that collect application data and store it in HDFS.
- Developed Spark jobs to load database log files from the Linux file system into HDFS for further processing. Also developed Bash scripts that import Oracle database tables into HDFS using Sqoop (see the import sketch after this list).
- Experienced with creating Hive databases and tables and writing Hive queries for data analysis to meet business reporting needs.
- Developed Spark scripts and UDFs, using both DataFrames/SQL and RDD/MapReduce, for data aggregation, manipulation, and ordering, and loaded the results back into OLTP systems through Sqoop (see the aggregation sketch after this list).
- Experienced with designing, architecting, and deploying new applications, including capacity planning and database tuning for optimal performance.
- Experienced with database engineering: creating data models, functions, and procedures, and building data pipelines through ETL jobs and customized shell-script jobs.
- Worked with application developers to design and architect applications from scratch, wrote database code, and suggested changes to their code to optimize application performance.
- Implemented data archival strategies and worked with application developer teams to purge unwanted data to improve database performance.
- Extensive experience with SQL and PL/SQL programming and query tuning.
- As part of performance tuning, used OEM to find long-running queries and analyzed AWR and ASH reports. Designed and scheduled database jobs to monitor database performance, find oversized tables, purge unwanted data from tables, etc.
- Automated DBA monitoring tasks with Bash scripts, such as monitoring ASM disk space, standby database log reports, and tablespace reports (see the monitoring sketch after this list).
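A sketch of the Sqoop import step described above, wrapped in Python rather than the original Bash for consistency with the other sketches; the table list, connection string, and paths are illustrative:

```python
# Sketch: import a set of Oracle tables into HDFS by shelling out to Sqoop.
import subprocess

TABLES = ["ORDERS", "CUSTOMERS", "CALL_DETAIL"]  # hypothetical table list

for table in TABLES:
    subprocess.run(
        [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@oltp-db:1521/APPDB",  # assumed DSN
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqoop_pass",
            "--table", table,
            "--target-dir", f"/data/raw/{table.lower()}",
            "--num-mappers", "4",
        ],
        check=True,  # stop the pipeline if any import fails
    )
```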
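A sketch of the DataFrame/UDF style of aggregation described above; the input path, column names, and the tiering rule inside the UDF are assumptions for illustration:

```python
# Sketch: bucket records with a UDF, aggregate, and write the result.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col, sum as sum_
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

usage = spark.read.parquet("/data/raw/call_detail")  # hypothetical input

@udf(returnType=StringType())
def duration_tier(seconds):
    # Assumed business rule: calls over ten minutes count as "long".
    if seconds is None:
        return "unknown"
    return "long" if seconds > 600 else "short"

report = (usage
          .withColumn("tier", duration_tier(col("duration_sec")))
          .groupBy("customer_id", "tier")
          .agg(sum_("duration_sec").alias("total_sec"))
          .orderBy("customer_id"))

# From here the result could be exported back to the OLTP side with Sqoop.
report.write.mode("overwrite").parquet("/data/agg/usage_by_tier")
```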
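A sketch of the tablespace check from the monitoring tasks above, written in Python rather than the original Bash for consistency with the other sketches; the DSN and the 85% threshold are illustrative:

```python
# Sketch: flag tablespaces above a usage threshold.
import cx_Oracle

THRESHOLD = 85  # assumed alert threshold, percent used

conn = cx_Oracle.connect("monitor/secret@prod-db:1521/APPDB")  # hypothetical
cursor = conn.cursor()
cursor.execute(
    "SELECT tablespace_name, ROUND(used_percent, 1) "
    "FROM dba_tablespace_usage_metrics "
    "WHERE used_percent > :threshold",
    threshold=THRESHOLD,
)

for name, pct in cursor:
    # The real job would page or email; printing stands in for an alert.
    print(f"ALERT: tablespace {name} is {pct}% used")

conn.close()
```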