Data Engineer Resume
SUMMARY
- 7 years of experience developing custom Hadoop Big Data solutions, platforms, pipelines, data migrations, and data visualizations.
- Able to troubleshoot and tune code in SQL, Java, Python, Scala, Pig, Hive, Spark RDDs/DataFrames, and MapReduce; able to design elegant solutions from well-defined problem statements.
- Created classes that model real-world objects and wrote loops to perform actions on data.
- Proficient with AWS tools (Redshift, Kinesis, S3, EC2, EMR, DynamoDB, Elasticsearch, Athena, Firehose, Lambda).
- Experience importing and exporting data with Sqoop between HDFS and relational/non-relational database systems.
- Accustomed to working with large, complex data sets, real-time/near-real-time analytics, and distributed big data platforms.
- Experience with multiple terabytes of data stored in AWS using Elastic MapReduce (EMR) and Redshift.
- Developed and optimized data queries using HiveQL.
- Expertise in developing Pig Latin scripts and Hive Query Language for data analytics; implemented partitioning, dynamic partitioning, and bucketing in Hive to compute data metrics.
- Strong knowledge of Pig's and Hive's analytical functions; extended Hive and Pig core functionality by writing custom UDFs.
- Experience developing REST APIs for use in single-page and native applications.
- Created Hive managed and external tables with partitions and buckets and loaded data into Hive (see the partitioned-table sketch after this list).
- In-depth knowledge of Hadoop architecture and its components (HDFS, JobTracker, TaskTracker, NameNode, DataNode) and of MapReduce concepts; experience writing MapReduce programs with Apache Hadoop to analyze large datasets efficiently.
- Excellent knowledge of Big Data infrastructure: the distributed file system (HDFS), parallel processing with the MapReduce framework, and the complete Hadoop ecosystem - Hive, Hue, Pig, HBase, ZooKeeper, Sqoop, Kafka, Storm, Spark, Flume, and Oozie.
- In-depth knowledge of real-time ETL/Spark analytics using Spark SQL with visualization.
- Hands-on experience with the YARN (MapReduce 2.0) architecture and its components (ResourceManager, NodeManager, Container, and ApplicationMaster) and with the execution of a MapReduce job.
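The following is a minimal PySpark sketch of the Hive partitioning and dynamic-partitioning work described above. The table, column, and path names (sales_ext, staging_sales, load_date) are hypothetical placeholders; bucketing would be added to the same DDL with a CLUSTERED BY ... INTO N BUCKETS clause.

    from pyspark.sql import SparkSession

    # Hypothetical table, column, and path names for illustration only.
    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # External Hive table partitioned by load date, stored as ORC.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS ORC
        LOCATION '/data/warehouse/sales_ext'
    """)

    # Dynamic partitioning: Hive derives each partition value from the data itself.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE sales_ext PARTITION (load_date)
        SELECT order_id, customer_id, amount, load_date
        FROM staging_sales
    """)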
TECHNICAL SKILLS
PROJECT MANAGEMENT: Agile, Kanban, Scrum, DevOps, Continuous Integration, Test-Driven Development, Unit Testing, Functional Testing, Design Thinking, Lean, Six Sigma
DATABASE: SQL, NoSQL, Apache Cassandra, MongoDB, HBase, RDBMS, Hive
SOFTWARE: AutoCAD, MATLAB, Revit, LTspice, PSpice, Multisim, Microsoft Office Suite
BIG DATA PLATFORMS: Amazon AWS, Microsoft Azure, Elasticsearch, Apache Solr, Lucene, Cloudera Hadoop, Cloudera Impala, Databricks, Hortonworks Hadoop
PROGRAMMING: Python, Scala, PHP, Bash, LISP, SQL, JavaScript, jQuery, C, C++, XML, HTML, CSS, Visual Basic, VBA, .NET, Spark, HiveQL, Spark API, REST API
DATA VISUALIZATION: Tableau, Microsoft Power BI
FILES: HDFS, Avro, Parquet, Snappy, Gzip, SQL, Ajax, JSON, GSON, ORC
OPERATING SYSTEMS: Linux, macOS, Microsoft Windows
HADOOP ECOSYSTEM COMPONENTS & TOOLS: Apache Ant, Apache Cassandra, Apache Flume, Apache Hadoop, Apache Hadoop YARN, Apache HBase, Apache HCatalog, Apache Hive, Apache Kafka, Apache Maven, Apache Oozie, Apache Pig, Apache Spark, Spark Streaming, Spark MLlib, GraphX, SciPy, Pandas, RDDs, DataFrames, Datasets, Mesos, Apache Tez, Apache ZooKeeper, Apache Airflow, Apache Camel, Apache Lucene, Elasticsearch, Apache Solr, Apache Drill, Presto, Apache Hue, Sqoop, Kibana
PROFESSIONAL EXPERIENCE
DATA ENGINEER
Confidential
Responsibilities:
- Used a Hadoop cluster to manage and perform data ingestion from Rapid API.
- Created and maintained a cluster of multiple Kafka brokers to ingest data from Kafka producers.
- Used Spark to build and process real-time data streams from Kafka producers.
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on data.
- Defined and implemented the schema for a custom HBase table.
- Used Spark SQL to create and populate the HBase warehouse.
- Executed Hadoop/Spark jobs on AWS EMR with programs and data stored in S3 buckets.
- Added support for Amazon S3 and RDS to host static/media files and the database in the AWS cloud.
- Implemented advanced feature-engineering procedures for the data science team using the in-memory computing capabilities of Apache Spark, written in Scala.
- Wrote streaming applications with Spark Streaming/Kafka.
- Used the Spark SQL module to store data in HDFS.
- Configured Kafka brokers for the project's Kafka cluster and streamed the data to Spark Structured Streaming, applying a schema to produce structured data (see the streaming sketch after this list).
- Handled millions of messages per day funneled through Kafka topics.
- Worked with Jenkins for CI/CD and Git for version control.
- Optimized ETL jobs to reduce memory and storage consumption.
- Communicated and presented findings, orally and visually, in a way easily understood by business counterparts.
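Below is a minimal sketch of the Kafka-to-Spark Structured Streaming flow referenced above, assuming a hypothetical broker address, topic, JSON payload schema, and HDFS paths; running it also requires the spark-sql-kafka connector package on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    # Hypothetical broker, topic, schema, and paths for illustration only.
    spark = SparkSession.builder.appName("kafka-structured-streaming-sketch").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Kafka delivers each record's payload as a binary 'value' column.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "events")
           .option("startingOffsets", "latest")
           .load())

    # Apply the schema to turn the JSON payload into structured columns.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Persist the structured stream to HDFS as Parquet, with checkpointing for recovery.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/events")
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .outputMode("append")
             .start())

    query.awaitTermination()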
DATA ENGINEER
Confidential
Responsibilities:
- Engaged constructively with project teams to support project objectives through the application of sound architectural principles
- Configured the Flume agent source, channel, and sink for collecting the data stream from the API.
- Used Flume for the collection and ingestion of data from the API into HDFS.
- Integrated Flume with Spark Streaming for real-time data processing.
- Used Spark to load and process data from HDFS (see the batch-processing sketch after this list).
- Used Sqoop to export data from HDFS to a MySQL database for deeper analysis queries.
- Developed a proof of concept in Scala, deployed it on a YARN cluster, and compared the performance of Spark with Hive and SQL.
- Used Hive for queries and incremental imports, and Spark jobs for data processing and analytics.
- Installed, configured, and monitored the Kafka cluster; architected a lightweight Kafka broker; integrated Kafka with Spark for real-time data processing.
- Built a Spark proof of concept with Python using PySpark
- Implemented advanced feature-engineering procedures for the data science team using the in-memory computing capabilities of Apache Spark, written in Scala.
- Extracted the needed data from the server into the Hadoop file system (HDFS) and bulk-loaded the cleaned data into HBase using Spark.
- Demonstrated ability to think strategically about business, product, and technical challenges in an enterprise environment
- Used Spark SQL with Spark Structured Streaming for real-time processing of structured data.
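The following is a minimal sketch of the HDFS batch-processing step referenced above, assuming hypothetical landing paths, column names, and a hypothetical Hive target table (analytics.api_events_clean).

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    # Hypothetical paths, columns, and table names for illustration only.
    spark = (SparkSession.builder
             .appName("hdfs-batch-processing-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Load raw JSON records landed in HDFS (e.g., by a Flume HDFS sink).
    raw = spark.read.json("hdfs:///landing/api_events/")

    # Basic cleaning: drop rows missing a key, derive a date column, filter bad values.
    cleaned = (raw.dropna(subset=["event_id"])
               .withColumn("event_date", to_date(col("event_time")))
               .filter(col("amount") >= 0))

    # Persist the cleaned result as a Hive table for downstream queries.
    cleaned.write.mode("append").saveAsTable("analytics.api_events_clean")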
DATA CLOUD ENGINEER
Confidential
Responsibilities:
- Created and maintained a cluster of multiple Kafka brokers to ingest data from Kafka producers.
- Used Spark to build and process real-time data streams from Kafka producers.
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on data.
- Defined and implemented the schema for a custom HBase table.
- Used Spark SQL to create and populate the HBase warehouse.
- Executed Hadoop/Spark jobs on AWS EMR with programs and data stored in S3 buckets (see the EMR/S3 sketch after this list).
- Added support for Amazon S3 and RDS to host static/media files and the database in the AWS cloud.
- Wrote streaming applications with Spark Streaming/Kafka.
- Used the Spark SQL module to store data in HDFS.
- Configured Kafka brokers for the project's Kafka cluster and streamed the data to Spark Structured Streaming, applying a schema to produce structured data.
- Worked with Jenkins for CI/CD and Git for version control.
- Optimized ETL jobs to reduce memory and storage consumption.
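Below is a minimal sketch of running a Spark analytics job against S3 data on EMR, as referenced above. The bucket names, prefixes, and columns (example-data-lake, orders, amount) are hypothetical, and the aggregation is a stand-in for the actual analytics.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, sum as spark_sum

    # Hypothetical bucket names, prefixes, and columns for illustration only.
    spark = SparkSession.builder.appName("emr-s3-analytics-sketch").getOrCreate()

    # On EMR, s3:// paths resolve through EMRFS, so S3 reads like any other data source.
    orders = spark.read.parquet("s3://example-data-lake/orders/")

    # A simple DataFrame aggregation standing in for the actual analytics.
    daily_revenue = (orders
                     .filter(col("status") == "COMPLETED")
                     .groupBy("order_date")
                     .agg(spark_sum("amount").alias("revenue")))

    # Write results back to S3 for downstream consumers (e.g., Athena or Redshift Spectrum).
    daily_revenue.write.mode("overwrite").parquet("s3://example-data-lake/reports/daily_revenue/")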