Data Engineer Resume
SUMMARY
- Big Data Engineer with 10 years of IT experience, including 9 years in Big Data technologies. Expertise in Hadoop/Spark development, automation tools, and the software design process. Outstanding communication skills, dedicated to maintaining up-to-date IT skills
- Skilled in managing data analytics, data processing, database, and data-driven projects
- Skilled in Architecture of Big Data Systems, ETL Pipelines, and Analytics Systems for diverse end-users
- Skilled in Database systems and administration
- Proficient in writing technical reports and documentation
- Adept with various distributions and platforms such as Cloudera Hadoop, Hortonworks, MapR, Elastic Cloud, and Elasticsearch
- Expert in bucketing and partitioning
- Expert in Performance Optimization
TECHNICAL SKILLS
APACHE: Apache Ant, Apache Flume, Apache Hadoop, Apache YARN, Apache Hive, Apache Kafka, Apache Maven, Apache Oozie, Apache Spark, Apache Tez, Apache Zookeeper, Cloudera Impala, HDFS, Hortonworks, MapR, MapReduce
SCRIPTING: HiveQL, MapReduce, XML, FTP, Python, UNIX/Linux shell scripting
OPERATING SYSTEMS: Unix/Linux, Windows 10, Ubuntu
FILE FORMATS: Parquet, Avro, JSON, ORC, text, CSV
DISTRIBUTIONS: Cloudera CDH 4/5, Hortonworks HDP 2.5/2.6, Amazon Web Services (AWS), Elastic (ELK)
DATA PROCESSING (COMPUTE) ENGINES: Apache Spark, Spark Streaming, Apache Flink, Apache Storm
DATA VISUALIZATION TOOLS: QlikView, Tableau, Power BI, Matplotlib
DATABASE: Microsoft SQL Server (2005, 2008 R2, 2012), Apache Cassandra, Amazon Redshift, Amazon DynamoDB, Apache HBase, Apache Hive, MongoDB; database design & data structures
SOFTWARE: Microsoft Project, VMware, Microsoft Word, Excel, Outlook, PowerPoint; technical documentation skills
PROFESSIONAL EXPERIENCE
Confidential
DATA ENGINEER
Responsibilities:
- Design and build data processing pipelines using tools and frameworks in the Hadoop ecosystem
- Implemented PySpark Streaming to receive real-time data from Kafka (see the streaming sketch after this list)
- Created Hive tables, loaded them with data, and wrote Hive queries to process the data
- Split JSON files at the RDD level so they could be processed in parallel for better performance and fault tolerance
- Designed Hive queries to perform data analysis, data transfer, and table design
- Collected data via a REST API: built an HTTPS connection with the client server, sent GET requests, and collected the responses in a Kafka producer
- Wrote a Spark program that used the Spark context to parse out the needed data, select the columns with the target information, and assign column names
- Configured Zookeeper to coordinate the servers in the cluster, maintain data consistency, and monitor services
- Design and build ETL pipelines to automate the ingestion of structured and unstructured data
- Design and Build pipelines to facilitate data analysis
- Implement and configure big data technologies as well as tune processes for performance at scale
- Working closely with the stakeholders & solution architect.
- Ensuring architecture meets the business requirements.
- Building highly scalable, robust & fault-tolerant systems.
- Finding ways & methods to find the value out of existing data. Proficiency and knowledge of best practices with the Hadoop (YARN, HDFS, MapReduce)
- AWS EMR to process big data across Hadoop clusters of virtual servers on Amazon Simple Storage Service (S3)
- Automated AWS components like EC2 instances, Security groups, ELB, RDS, Lambda and IAM through AWS Cloud Formation templates
- Installed, Configured and Managed AWS Tools such as ELK, Cloud Watch for Resource Monitoring
- Work with engineering team members to explore and create interesting solutions while sharing knowledge within the team
- Work across product teams to help solve customer-facing issues
- Demonstrable experience designing technological solutions to complex data problems, developing & testing modular, reusable, efficient and scalable code to implement those solutions
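Streaming sketch: a minimal PySpark Structured Streaming variant of the Kafka ingestion and column-selection work above. The broker address, topic name, JSON schema, and output paths are assumptions, not the project's actual values.

```python
# Minimal PySpark Structured Streaming sketch: read JSON events from Kafka,
# keep only the target columns, and land them on HDFS as Parquet.
# Requires the spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

# Assumed schema for the incoming JSON payload.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events")                      # placeholder topic
       .load())

# Parse the JSON value and select/rename the columns of interest.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select(col("e.event_id").alias("id"),
                  col("e.user_id"),
                  col("e.amount")))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")            # placeholder output path
         .option("checkpointLocation", "hdfs:///chk/events")
         .start())
query.awaitTermination()
```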
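CloudFormation sketch: a hedged boto3 example of driving the AWS automation described above. The template file, stack name, region, and parameters are illustrative placeholders, not the project's actual configuration.

```python
# Create a CloudFormation stack (EC2, security groups, ELB, RDS, Lambda, IAM)
# from a template file using boto3.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

with open("pipeline-stack.yaml") as f:          # hypothetical template file
    template_body = f.read()

cfn.create_stack(
    StackName="data-pipeline",                  # placeholder stack name
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],      # needed when the template creates IAM roles
    Parameters=[{"ParameterKey": "Environment", "ParameterValue": "dev"}],
)

# Block until stack creation completes.
cfn.get_waiter("stack_create_complete").wait(StackName="data-pipeline")
```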
Confidential
BIG DATA ENGINEER
Responsibilities:
- Support, maintain and document Hadoop and MySQL data warehouse
- Iterate and improve existing features in the pipeline as well as add new ones
- Design, develop, document, and test new requirements in the data pipeline using BASH, FLUME, HDFS and SPARK in the Hadoop ecosystem
- Provide full operational support - analyze code to identify root causes of production issues and provide solutions or workarounds and lead it to resolution
- Participate in full development life cycle including requirements analysis, design, development, deployment, and operations support
- Created and managed cloud VMs with the AWS EC2 command-line interface and the AWS Management Console.
- Used the Spark DataFrame API over the Cloudera platform to perform analytics on Hive data (see the DataFrame sketch after this list).
- Added support for Amazon AWS S3 and RDS to host static/media files and the database into Amazon Cloud.
- Used an Ansible Python script to generate inventory and push deployments to AWS instances.
- Executed Hadoop/Spark jobs on AWS EMR against data stored in S3 buckets.
- Implemented Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with data stored in Amazon Simple Storage Service (S3) and results loaded into AWS Redshift
- Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 (see the Lambda sketch after this list).
- Populated AWS Redshift database tables via Amazon Kinesis Firehose.
- Automated the installation of the ELK agent (Filebeat) with an Ansible playbook. Developed a Kafka queue system to collect log data without data loss and publish it to multiple consumers.
- Used AWS CloudFormation templates alongside Terraform with existing plugins.
- Developed AWS CloudFormation templates to create a custom infrastructure for our pipeline
- Implemented AWS IAM user roles and policies to authenticate and control access
- Specified nodes and performed data analysis queries on Amazon Redshift clusters on AWS
- Processed multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and loaded the results into AWS Redshift
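DataFrame sketch: a small example of the Spark DataFrame work over Hive described above. The database, table, and column names are assumptions used only for illustration.

```python
# Read a Hive table through the metastore and compute a simple aggregate.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-analytics")
         .enableHiveSupport()                    # read Hive metastore tables
         .getOrCreate())

orders = spark.table("sales_db.orders")          # hypothetical Hive table

# Example analytic: daily revenue per region.
daily_revenue = (orders
                 .groupBy("region", F.to_date("order_ts").alias("order_date"))
                 .agg(F.sum("amount").alias("revenue"))
                 .orderBy("order_date"))

daily_revenue.write.mode("overwrite").saveAsTable("sales_db.daily_revenue")
```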
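Lambda sketch: a hedged example of an event-driven Lambda handler of the kind described above, shown here for S3 object-created events (a DynamoDB Streams trigger would deliver a similar Records payload). Bucket names and the processing step are illustrative.

```python
# Lambda handler triggered by S3 events: fetch each new object and process it.
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Run a small script in response to each new S3 object."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        obj = s3.get_object(Bucket=bucket, Key=key)
        payload = obj["Body"].read()
        # Placeholder processing step; the real function forwarded data onward.
        print(f"processed {len(payload)} bytes from s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps("ok")}
```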
Confidential
BIG DATA ENGINEER
Responsibilities:
- Work in a fast-paced agile development environment to quickly analyze, develop, and test potential use cases for the business
- Develop and build frameworks/prototypes that integrate Big Data and advanced analytics to make business decisions
- Assist application development teams during application design and development for highly complex and critical data projects
- Worked on AWS Kinesis for processing huge amounts of real-time data
- Developed multiple Spark Streaming and batch Spark jobs using Java, Scala, and Python on AWS
- Configured RDS, CloudFormation, AWS IAM, and Security Groups in public and private subnets within a VPC
- Worked with AWS Lambda functions for event-driven processing to various AWS resources
- Assisted in the installation and configuration of Hive, Sqoop, Flume, and Oozie on the Hadoop cluster with the latest patches
- Created Hive queries to spot emerging trends by comparing Hadoop data with historical metrics
- Loaded ingested data into Hive managed and external tables.
- Wrote custom user-defined functions (UDFs) for complex Hive queries (HQL) (see the UDF sketch after this list)
- Performed upgrades, patches and bug fixes in Hadoop in a cluster environment
- Wrote shell scripts to automate workflows to pull data from various databases into Hadoop framework for users to access the data through Hive based views
- Writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language
- Built Hive views on top of the source data tables and set up secured data provisioning
- Used Cloudera Manager for installation and management of single-node and multi-node Hadoop cluster
- Wrote shell scripts for automating the process of data loading
- Work closely with development, test, documentation, and product management teams to deliver high-quality products and services in a fast-paced environment
- Algorithm development on high-performance systems
- Create data management policies, procedures, and standards
- Working with the end-user to make sure the analytics transform data to knowledge in very focused and meaningful ways
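UDF sketch: the Hive UDF bullet above refers to custom functions used in HQL; as a hedged Python analogue, this registers a Spark SQL UDF and calls it in an HQL-style query over a Hive table. The table and column names are assumptions.

```python
# Register a Python UDF and use it in an HQL-style query over a Hive table.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("udf-demo")
         .enableHiveSupport()
         .getOrCreate())

def normalize_phone(raw):
    """Keep digits only, e.g. '(555) 123-4567' -> '5551234567'."""
    return "".join(ch for ch in raw if ch.isdigit()) if raw else None

spark.udf.register("normalize_phone", normalize_phone, StringType())

result = spark.sql("""
    SELECT customer_id, normalize_phone(phone) AS phone_clean
    FROM crm_db.customers              -- hypothetical Hive table
    WHERE phone IS NOT NULL
""")
result.show()
```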
International Paper, Data Administrator
Confidential, GA
Responsibilities:
- Executes moderately complex functional work tracks for the team
- Work in an agile environment and continuously improve the agile processes
- Maintain existing ETL workflows, data management, and data query components
- Wrote database objects such as stored procedures and triggers for Oracle, MS SQL Server, and Hive
- Good knowledge of PL/SQL and HQL; hands-on experience writing intermediate-level SQL queries
- Good knowledge of Impala, Spark (Scala), and Storm
- Expertise in preparing test cases, documenting, and performing unit and integration testing
- Installed and configured Hive and wrote Hive UDFs
- Experience importing and exporting data into HDFS and Hive using Sqoop.
- Developed Sqoop jobs to populate Hive external tables using incremental loads (see the Sqoop sketch after this list)
- Installed Oozie workflow engine to run multiple Hive jobs
- Used Spark modules to store the data on HDFS
- Develop automation and data collection frameworks
- Develops innovative solutions to Big Data issues and challenges within the team
- Known for being a smart, analytical thinker who approaches their work with logic and enthusiasm
- Drive the optimization, testing and tooling to improve data quality
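Sqoop sketch: a hedged example of the incremental-load jobs mentioned above, driven from Python via subprocess. The JDBC URL, credentials, table, check column, and HDFS paths are placeholders; in practice the last value is tracked by a saved Sqoop job rather than hard-coded.

```python
# Run a Sqoop incremental import into the HDFS directory backing a Hive external table.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db-host:3306/sales",   # placeholder JDBC URL
    "--username", "etl_user",
    "--password-file", "/user/etl/.db_password",
    "--table", "orders",
    "--target-dir", "/warehouse/external/orders",     # HDFS dir behind the Hive external table
    "--incremental", "append",
    "--check-column", "order_id",
    "--last-value", "1000000",                        # placeholder; a saved job tracks this
    "--as-parquetfile",
    "-m", "4",
]

subprocess.run(sqoop_cmd, check=True)
```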