Big Data Engineer Resume
Bentonville, AR
PROFESSIONAL SUMMARY:
- Over 4 years of work experience as a software developer in the IT industry.
- Extensive experience with Big Data technologies and Hadoop ecosystem components such as Spark, MapReduce, Hive, Pig, YARN, HDFS, Oozie, Sqoop, Flume, and Kafka, as well as NoSQL systems such as Cassandra, Couchbase Server, and Elasticsearch.
- Strong knowledge of distributed systems architecture and parallel processing, with an in-depth understanding of the MapReduce framework and the Spark execution framework.
- Good experience creating data ingestion pipelines, data transformations, data management and data governance processes, and real-time streaming engines at an enterprise level.
- Very good experience building real-time data streaming solutions using Apache Spark / Spark Streaming.
- Hands-on experience deploying Spark applications to production on multi-node distributed clusters.
- Expertise in writing end-to-end batch data processing jobs to analyze data using MapReduce and Spark.
- Strong hands-on experience with Confluent's Kafka APIs, including the Kafka Connect API and the Kafka Streams API, and experience deploying Kafka applications to production.
- Experience integrating Flume and Kafka to move data from sources to sinks in real time.
- Experience working with Hadoop distributions such as Cloudera and Hortonworks.
- Experience working with Kerberos-integrated Hadoop clusters.
- Hands-on experience in Hive data modeling, with a very good understanding of partitioning and bucketing concepts; designed both managed and external Hive tables to optimize performance.
- Extensive experience importing and exporting data between RDBMSs and the Hadoop ecosystem using Apache Sqoop.
- Experience building, deploying, and integrating applications with Maven.
- Hands-on experience developing Oozie workflows that execute MapReduce, Sqoop, Flume, Hive, and Pig scripts.
- Experience building distributed cache systems using Couchbase Server.
TECHNICAL SKILLS:
Big Data Ecosystem Components: HDFS, MapReduce, Hive, Pig, ZooKeeper, YARN, Spark, Kafka, Sqoop, Cassandra, Hue, Oozie, Apache Flume
Programming Languages: C, Java, PL/SQL, Scala, HiveQL, Pig Latin
Scripting Languages: PHP, Python, Shell
Distribution Platform: Cloudera, Hortonworks HDP
UI/UX Technologies: HTML, CSS3, JavaScript, jQuery, AngularJS, Bootstrap, Ajax, JSON, XML
Databases: Couchbase Server, Oracle 9i, Cassandra, Elasticsearch
Cloud Platforms: Microsoft Azure, OpenStack
IDE & Build Tools: Eclipse, IntelliJ IDEA, Maven
Version Control Systems: BitBucket, TFS
PROFESSIONAL EXPERIENCE:
Confidential, Bentonville, AR
Big data engineer
Responsibilities:
- Designed and developed Spark Streaming applications that consume data from Kafka, perform enrichment and validation operations, and load the results into an HDFS sink (a minimal sketch follows this list).
- Performance-tuned the Spark Streaming applications.
- Designed and developed a shell script that merges the small files produced by Spark Streaming into files of at least 128 MB, optimizing the performance of all jobs running on top of the data.
- Designed and developed an Elasticsearch connector using the Kafka Connect API, with Kafka as the source and Elasticsearch as the sink.
- Designed and developed a Cassandra sink connector using the Kafka Connect API, with Kafka as the source and Cassandra as the sink.
- Instrumented code with metrics and integrated them with Medusa, a visualization tool.
- Extensively used Spark SQL and the DataFrames API in building Spark applications.
- Involved in setting up the Kafka Streams framework that is the core of the enterprise inventory system.
- Designed and built the item relationship cache in Couchbase.
- Developed a Java-based application using the Couchbase Java Client API to load data from a remote server into Couchbase, which many other applications use for various purposes (see the Couchbase sketch at the end of this section).
- Configured Flume with multiple Kafka topics as sources, a Kafka channel, and HDFS as the sink.
- Developed a MapReduce job that runs over encoded, encrypted JSON messages; the job decodes, decrypts, and parses each message, extracts the necessary fields, and stores them as text files.
- Created partitioned external Hive tables and loaded data into them.
- Modeled and implemented partitioned Hive tables in ORC, Parquet, and text formats based on consumer applications.
- Developed an Oozie coordinator workflow that runs hourly and includes MapReduce and Hive actions.
- Familiar with various file formats such as Avro, Parquet, and SequenceFile, chosen based on the scenario.
- Developed shell scripts that purge data based on the retention period.
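Below is a minimal, illustrative sketch of the Kafka-to-HDFS Spark Streaming pattern described in this section, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and HDFS path are hypothetical placeholders, and the filter step stands in for the actual enrichment and validation logic.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class KafkaToHdfsStreamingJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-to-hdfs");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(60));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");        // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "inventory-stream");             // hypothetical consumer group
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", false);

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Arrays.asList("inventory-events"), kafkaParams)); // hypothetical topic

        stream.map(ConsumerRecord::value)
              .filter(v -> v != null && !v.isEmpty())                // stand-in for validation/enrichment
              .foreachRDD((rdd, time) -> {
                  if (!rdd.isEmpty()) {
                      // One output directory per micro-batch under a hypothetical HDFS landing zone
                      rdd.saveAsTextFile("hdfs:///data/inventory/landing/" + time.milliseconds());
                  }
              });

        ssc.start();
        ssc.awaitTermination();
    }
}
```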
Environment: Hortonworks HDP 2.7.4, Spark Streaming, Spark SQL, HDFS, MapReduce, Hive, Flume, Oozie, Kafka, Couchbase, Elasticsearch, Cassandra, Java, Scala, BitBucket, IntelliJ IDEA, Microsoft Azure, OpenStack
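A minimal sketch of loading a document into Couchbase from Java, assuming the Couchbase Java SDK 2.x API; the node address, bucket name, document id, and fields are hypothetical, not the actual item-relationship schema.

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

public class ItemRelationshipLoader {
    public static void main(String[] args) {
        // Hypothetical node and bucket names
        Cluster cluster = CouchbaseCluster.create("cb-node1");
        Bucket bucket = cluster.openBucket("item-relationships");

        // One upsert per relationship record pulled from the remote source
        JsonObject content = JsonObject.create()
                .put("itemId", "12345")
                .put("parentItemId", "67890");
        bucket.upsert(JsonDocument.create("item::12345", content));

        cluster.disconnect();
    }
}
```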
Confidential
Java/Oracle developer
Responsibilities:
- Developed the GUI of the system using JSP and HTML.
- Used the Struts framework in conjunction with JSP and tag libraries to develop user interfaces.
- Developed session beans for the necessary transactions, such as fetching the required data.
- Designed the database from the requirements given by our professor so that it aligned with his needs.
- Once the database design was approved, implemented it in Oracle.
- Wrote the DDL and DML statements required to bring the database to life.
- Accessed the database from the front end using the JDBC API (see the JDBC sketch at the end of this section).
- Involved in writing PL/SQL procedures, packages, and triggers whenever required.
Environment: Java, SQL, PL/SQL, Eclipse, Oracle 9i, PL/SQL Developer
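A minimal sketch of the JDBC access pattern referenced above, assuming an Oracle thin-driver connection; the connection details, table, and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class StudentDao {
    // Hypothetical connection details
    private static final String URL = "jdbc:oracle:thin:@dbhost:1521:orcl";
    private static final String USER = "app_user";
    private static final String PASSWORD = "changeit";

    public String findStudentName(int studentId) throws SQLException {
        String sql = "SELECT name FROM students WHERE student_id = ?"; // hypothetical table/columns
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, studentId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```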
Confidential, Michigan
Hadoop developer
Responsibilities:
- Migrated data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Created partitioned Hive tables, loaded data into them, and analyzed the data with Hive queries (HiveQL) to study customer behavior.
- Developed shell scripts that purge data based on the retention period.
- Performance-tuned Hive queries.
- Developed simple to complex MapReduce jobs in Java for processing and validating the data (see the MapReduce sketch at the end of this section).
- Extensively used HDFS filesystem commands, the FileSystem API, HiveQL, and shell scripting.
Environment: Java, HDFS, MapReduce, Hive, Sqoop, Eclipse, Cloudera, Shell
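A minimal sketch of a map-only validation job of the kind described above; the delimiter, expected field count, and input/output paths are hypothetical placeholders rather than the actual data layout.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecordValidationJob {

    // Map-only job: keep records with the expected column count, count the rest as malformed
    public static class ValidationMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private static final int EXPECTED_FIELDS = 12;   // hypothetical schema width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|", -1);   // hypothetical pipe-delimited input
            if (fields.length == EXPECTED_FIELDS) {
                context.write(value, NullWritable.get());
            } else {
                context.getCounter("validation", "malformed_records").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-validation");
        job.setJarByClass(RecordValidationJob.class);
        job.setMapperClass(ValidationMapper.class);
        job.setNumReduceTasks(0);                        // map-only
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```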
Confidential
Hadoop Developer
Responsibilities:
- Understood business needs, analyzed functional specifications, mapped them to Hadoop ecosystem components, and extracted results according to those needs.
- Developed MapReduce jobs for cleansing, accessing, and validating the data.
- Developed Sqoop scripts to import/export data between Oracle and HDFS and to load it into Hive tables.
- Stored the data in tabular form using Hive tables.
- Effectively used Hive partitioning and bucketing to improve the performance of HiveQL queries.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries (see the UDF sketch at the end of this section).
- Analyzed web log data using HiveQL to extract the required columns in detail.
- Involved in developing Pig scripts and Pig UDFs and storing unstructured data in HDFS.
- Used Hive join optimizations to improve performance.
- Created partitioned tables and loaded data using both static and dynamic partitioning.
- Used different data formats (text, Avro) while loading data into HDFS.
- Used Oozie to automate end-to-end pipelines and Oozie coordinators to schedule the workflows.
Environment: HDFS, MapReduce, Pig, Java, Sqoop, Oozie, Cloudera, Eclipse, Shell, Hive, Linux.
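A minimal sketch of a Hive generic UDF of the kind mentioned above; the function shown (masking the local part of an email address) is a hypothetical example of embedding business logic, not the actual logic used on the project.

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: masks the local part of an email address, keeping the domain
public class MaskEmailUDF extends GenericUDF {
    private transient StringObjectInspector stringOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1 || !(args[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentException("mask_email() expects a single string argument");
        }
        stringOI = (StringObjectInspector) args[0];
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        Object value = args[0].get();
        if (value == null) {
            return null;
        }
        String email = stringOI.getPrimitiveJavaObject(value);
        int at = email.indexOf('@');
        return new Text(at > 0 ? "***" + email.substring(at) : "***");
    }

    @Override
    public String getDisplayString(String[] children) {
        return "mask_email(" + children[0] + ")";
    }
}
```

Such a UDF would typically be packaged in a JAR, added to the Hive session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before being used in a query.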