Data Engineer Resume

SUMMARY

  • Big Data Engineer with 10 years of IT experience, including 9 years in Big Data technologies. Expertise in Hadoop/Spark development, automation tools, and the software design process. Outstanding communication skills and dedicated to maintaining up-to-date IT skills
  • Skilled in managing data analytics and data processing, database, and data-driven projects
  • Skilled in Architecture of Big Data Systems, ETL Pipelines, and Analytics Systems for diverse end-users
  • Skilled in Database systems and administration
  • Proficient in writing technical reports and documentation
  • Adept with various distributions and platforms such as Cloudera Hadoop, Hortonworks, MapR, Elastic Cloud, and Elasticsearch
  • Expert in bucketing and partitioning
  • Expert in Performance Optimization

TECHNICAL SKILLS

APACHE: Apache Ant, Apache Flume, Apache Hadoop, Apache YARN, Apache Hive, Apache Kafka, Apache Maven, Apache Oozie, Apache Spark, Apache Tez, Apache ZooKeeper, Cloudera Impala, HDFS, Hortonworks, MapR, MapReduce

SCRIPTING: HiveQL, MapReduce, XML, FTP, Python, UNIX, shell scripting, Linux

OPERATING SYSTEMS: Unix/Linux, Windows 10, Ubuntu

FILE FORMATS: Parquet, Avro, JSON, ORC, text, CSV

DISTRIBUTIONS: Cloudera CDH 4/5, Hortonworks HDP 2.5/2.6, Amazon Web Services (AWS), Elastic (ELK)

DATA PROCESSING (COMPUTE) ENGINES: Apache Spark, Spark Streaming, Apache Flink, Apache Storm

DATA VISUALIZATION TOOLS: QlikView, Tableau, Power BI, Matplotlib

DATABASES: Microsoft SQL Server (2005, 2008 R2, 2012), Apache Cassandra, Amazon Redshift, Amazon DynamoDB, Apache HBase, Apache Hive, MongoDB

SOFTWARE: Microsoft Project, VMware, Microsoft Word, Excel, Outlook, PowerPoint; technical documentation skills

PROFESSIONAL EXPERIENCE

Confidential

DATA ENGINEER

Responsibilities:

  • Design and build data processing pipelines using tools and frameworks in the Hadoop ecosystem
  • Used PySpark Streaming to receive real-time data from Kafka (a minimal consumer sketch follows this list)
  • Created Hive tables, loaded them with data, and wrote Hive queries to process the data
  • Split the JSON file into RDD partitions to be processed in parallel for better performance and fault tolerance
  • Designed Hive queries to perform data analysis, data transfer, and table design
  • Collected data from a REST API over HTTPS, sending GET requests and publishing the responses through a Kafka producer (a producer sketch follows this list)
  • Wrote a Spark program that parses out the needed data using SparkContext, selects the columns containing the target information, and assigns column names
  • Configured ZooKeeper to coordinate the cluster servers, maintain data consistency, and monitor services
  • Design and build ETL pipelines to automate the ingestion of structured and unstructured data
  • Design and Build pipelines to facilitate data analysis
  • Implement and configure big data technologies as well as tune processes for performance at scale
  • Working closely with stakeholders and the solution architect
  • Ensuring the architecture meets the business requirements
  • Building highly scalable, robust, and fault-tolerant systems
  • Finding ways to extract value from existing data; proficient in best practices for Hadoop (YARN, HDFS, MapReduce)
  • Used AWS EMR to process big data across Hadoop clusters of virtual servers, with data stored in Amazon Simple Storage Service (S3)
  • Automated AWS components such as EC2 instances, security groups, ELB, RDS, Lambda, and IAM through AWS CloudFormation templates
  • Installed, configured, and managed monitoring tools such as the ELK stack and Amazon CloudWatch for resource monitoring
  • Work with engineering team members to explore and create interesting solutions while sharing knowledge within the team
  • Work across product teams to help solve customer-facing issues
  • Demonstrated experience designing technological solutions to complex data problems, developing and testing modular, reusable, efficient, and scalable code to implement those solutions
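
A minimal PySpark Structured Streaming consumer sketch for the Kafka ingestion described above. The broker address, topic name, and message schema are illustrative assumptions, not values from the actual project.

    # Sketch: consume JSON events from Kafka with PySpark Structured Streaming.
    # Broker, topic, and schema below are assumptions for illustration only.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

    # Hypothetical schema of the JSON events on the topic.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_time", TimestampType()),
        StructField("value", DoubleType()),
    ])

    # Subscribe to the topic (requires the spark-sql-kafka connector on the classpath).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
           .option("subscribe", "events")                      # assumed topic
           .option("startingOffsets", "latest")
           .load())

    # Kafka values arrive as bytes; parse the JSON payload into typed columns.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Console sink for the sketch; a real pipeline would write to HDFS/S3 or a Hive table.
    query = events.writeStream.outputMode("append").format("console").start()
    query.awaitTermination()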
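
A sketch of the REST-to-Kafka collection step mentioned above: send an HTTPS GET request and publish the JSON response through a Kafka producer. The endpoint URL, topic, and broker address are placeholders, and the kafka-python client is assumed.

    # Sketch: collect data from a REST API over HTTPS and publish it to Kafka.
    import json

    import requests
    from kafka import KafkaProducer  # kafka-python client (assumed)

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],                        # assumed broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # JSON-encode each record
    )

    def collect_and_publish(url: str, topic: str) -> None:
        """Fetch one page of data over HTTPS and publish each record to Kafka."""
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        for record in response.json():   # assumes the API returns a JSON list
            producer.send(topic, value=record)
        producer.flush()

    if __name__ == "__main__":
        collect_and_publish("https://api.example.com/v1/events", "events")  # placeholder URL/topic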

Confidential

BIG DATA ENGINEER

Responsibilities:

  • Support, maintain, and document the Hadoop and MySQL data warehouses
  • Iterate and improve existing features in the pipeline as well as add new ones
  • Design, develop, document, and test new requirements in the data pipeline using BASH, FLUME, HDFS and SPARK in the Hadoop ecosystem
  • Provide full operational support - analyze code to identify root causes of production issues and provide solutions or workarounds and lead it to resolution
  • Participate in full development life cycle including requirements analysis, design, development, deployment, and operations support
  • Created and managed cloud VMs with the AWS EC2 command-line clients and the AWS Management Console
  • Used Spark DataFrame API over the Cloudera platform to perform analytics on Hive data.
  • Added support for Amazon S3 and RDS to host static/media files and the database in the AWS cloud
  • Used an Ansible Python script to generate inventory and push deployments to AWS instances
  • Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets
  • Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with storage in Amazon Simple Storage Service (S3) and AWS Redshift
  • Implemented AWS Lambda functions to run scripts in response to events in an Amazon DynamoDB table or S3 (a handler sketch follows this list)
  • Populated database tables via Amazon Kinesis Data Firehose delivery into AWS Redshift (a delivery sketch follows this list)
  • Automated the installation of the ELK agent (Filebeat) with an Ansible playbook; developed a Kafka queue system to collect log data without data loss and publish it to various destinations
  • Used AWS CloudFormation templates alongside Terraform and its existing plugins
  • Developed AWS Cloud Formation templates to create a custom infrastructure of our pipeline
  • Implemented AWS IAM user roles and policies to authenticate and control access
  • Specified nodes and performed data analysis queries on Amazon Redshift clusters on AWS
  • Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded the results into AWS Redshift
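
A sketch of an event-driven Lambda handler like the one described above: triggered by an S3 object-created event, it records object metadata in a DynamoDB table. The table name and attribute names are assumptions.

    # Sketch: S3-triggered Lambda that writes object metadata to DynamoDB.
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("ingest-audit")  # hypothetical table name

    def lambda_handler(event, context):
        # An S3 notification can carry multiple records.
        records = event.get("Records", [])
        for record in records:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            size = record["s3"]["object"].get("size", 0)
            table.put_item(Item={
                "object_key": key,   # assumed partition key
                "bucket": bucket,
                "size_bytes": size,
            })
        return {"processed": len(records)}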
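
A minimal sketch of pushing records into an Amazon Kinesis Data Firehose delivery stream that loads Redshift, as in the bullet above. The stream name and record shape are assumptions.

    # Sketch: batch-deliver JSON records to a Firehose stream backed by Redshift.
    import json

    import boto3

    firehose = boto3.client("firehose")

    def deliver(records, stream_name="redshift-delivery"):  # assumed stream name
        """Send a batch of dicts (up to 500) to Firehose as newline-delimited JSON."""
        firehose.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{"Data": (json.dumps(r) + "\n").encode("utf-8")} for r in records],
        )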

Confidential

BIG DATA ENGINEER

Responsibilities:

  • Work in a fast-paced agile development environment to quickly analyze, develop, and test potential use cases for the business
  • Develop and build frameworks/prototypes that integrate Big Data and advanced analytics to support business decisions
  • Assist application development teams during application design and development for highly complex and critical data projects
  • Worked with AWS Kinesis to process large volumes of real-time data
  • Developed multiple Spark Streaming and batch Spark jobs using Java, Scala, and Python on AWS
  • Worked with RDS, CloudFormation, AWS IAM, and security groups in public and private subnets within a VPC
  • Worked with AWS Lambda functions for event-driven processing to various AWS resources
  • Assisted in the installation and configuration of Hive, Sqoop, Flume, and Oozie on the Hadoop cluster with the latest patches
  • Created Hive queries to spot emerging trends by comparing Hadoop data with historical metrics (a sketch follows this list)
  • Loaded ingested data into Hive managed and external tables
  • Wrote custom user-defined functions (UDFs) for complex Hive queries (HQL)
  • Performed upgrades, patches and bug fixes in Hadoop in a cluster environment
  • Wrote shell scripts to automate workflows to pull data from various databases into Hadoop framework for users to access the data through Hive based views
  • Wrote Hive queries (HiveQL) to analyze data in the Hive warehouse
  • Built Hive views on top of the source data tables and built a secured provisioning layer
  • Used Cloudera Manager for installation and management of single-node and multi-node Hadoop cluster
  • Wrote shell scripts for automating the process of data loading
  • Work closely with development, test, documentation, and product management teams to deliver high-quality products and services in a fast-paced environment
  • Developed algorithms on high-performance systems
  • Create data management policies, procedures, and standards
  • Working with the end-user to make sure the analytics transform data to knowledge in very focused and meaningful ways
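
An illustrative PySpark-over-Hive sketch of the trend analysis described above: compare current Hive data against historical metrics with a small Python UDF. Table names, column names, and the threshold are assumptions; the project's actual UDFs were written for Hive (HQL), so this only mirrors the approach.

    # Sketch: flag emerging trends by comparing current Hive data with historical metrics.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    spark = (SparkSession.builder
             .appName("trend-comparison")
             .enableHiveSupport()        # read Hive managed/external tables
             .getOrCreate())

    # Percentage change between a current value and its historical baseline.
    def pct_change(current, baseline):
        if current is None or baseline in (None, 0):
            return None
        return (current - baseline) / baseline * 100.0

    pct_change_udf = F.udf(pct_change, DoubleType())

    current = spark.table("warehouse.daily_metrics")       # assumed Hive table
    history = spark.table("warehouse.historical_metrics")  # assumed Hive table

    trend = (current.join(history, "metric_name")
             .withColumn("pct_change",
                         pct_change_udf(F.col("current_value"), F.col("baseline_value")))
             .filter(F.col("pct_change") > 20.0))          # assumed trend threshold

    trend.show()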

Confidential (International Paper), GA

DATA ADMINISTRATOR

Responsibilities:

  • Execute moderately complex functional work tracks for the team
  • Work in an agile environment and continuously improve the agile processes
  • Maintain existing ETL workflows, data management, and data query components
  • Wrote database objects such as stored procedures and triggers for Oracle, MS SQL Server, and Hive
  • Good knowledge of PL/SQL and HQL, with hands-on experience writing medium-complexity SQL queries
  • Good knowledge of Impala, Spark (Scala), and Storm
  • Expertise in preparing test cases, documenting, and performing unit and integration testing
  • Installed and configured Hive and wrote Hive UDFs
  • Imported and exported data into HDFS and Hive using Sqoop
  • Developed Sqoop jobs to populate Hive external tables using incremental loads
  • Installed Oozie workflow engine to run multiple Hive jobs
  • Used Spark to store data on HDFS (a sketch follows this list)
  • Develop automation and data collection frameworks
  • Develop innovative solutions to Big Data issues and challenges within the team
  • Known for being a smart, analytical thinker who approaches their work with logic and enthusiasm
  • Drive the optimization, testing and tooling to improve data quality
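
A minimal sketch of the Spark-to-HDFS storage step referenced above: read incoming extracts and persist them to HDFS as partitioned Parquet. The paths and partition column are placeholders.

    # Sketch: persist source extracts to HDFS as partitioned Parquet with Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("store-to-hdfs").getOrCreate()

    # Read incoming CSV extracts (placeholder path).
    df = (spark.read
          .option("header", "true")
          .csv("hdfs:///landing/extracts/"))

    # Write as Parquet partitioned by load date for downstream Hive/Impala queries.
    (df.write
     .mode("append")
     .partitionBy("load_date")   # assumed partition column
     .parquet("hdfs:///warehouse/extracts_parquet/"))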
