Big Data Engineer Resume
San Jose, CA
SUMMARY:
- Around 7 years of experience across Big Data and Java, including extensive work with Big Data technologies and the development of web applications in multi-tiered environments using Hadoop, Spark, Scala, HBase, Java, Sqoop, and Kafka.
- Hands-on experience with Hadoop, HDFS, MapReduce, and Hadoop ecosystem components such as HBase.
- Experience working with various RDBMSs, including Oracle and MySQL.
- Experience writing SQL queries based on business requirements.
- Experience developing data pipelines with the Kafka-Spark API.
- Experience loading data into Spark schema RDDs and querying them with Spark SQL; see the sketch following this list.
- Good at writing custom RDDs in Scala and implementing design patterns to improve performance.
- Excellent understanding of the Hadoop Distributed File System and experienced in developing efficient MapReduce jobs to process large datasets.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Experienced with file formats such as fixed-width, CSV, text, SequenceFile, XML, JSON, and Avro.
- Worked on NoSQL databases such as HBase to store structured and unstructured data.
- Experience with object-oriented languages, primarily Core Java.
- Worked with the Agile and Scrum software development frameworks for managing product development.
- Strong analytical and problem-solving skills. Willingness and ability to quickly adapt to new environments and learn new technologies.
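A minimal sketch of the Spark SQL pattern above, using the DataFrame API (the successor of the schema RDD); the input path and field names are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

object QuickSqlExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("QuickSqlExample").getOrCreate()

    // Hypothetical input path and field names, for illustration only.
    val events = spark.read.json("hdfs:///data/events.json")
    events.createOrReplaceTempView("events")

    // Once registered as a view, the data can be queried with plain SQL.
    spark.sql("SELECT event_type, COUNT(*) AS n FROM events GROUP BY event_type").show()

    spark.stop()
  }
}
```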
TECHNICAL SKILLS:
Hadoop Ecosystem: Kafka, Spark, Sqoop, MapReduce, HDFS, ZooKeeper
Databases: Oracle, MySQL
Methodologies: Agile, Scrum
NoSQL: HBase
Languages: Java, Scala, HTML, SQL, Python
Others: Eclipse, Maven, JIRA, Git, Linux
PROFESSIONAL EXPERIENCE:
Big Data Engineer
Confidential, San Jose, CA
Responsibilities:
- Developed a Spark job to consume data from a Kafka topic and validate the data before pushing it into HBase and Oracle databases.
- Developed a Spark job to perform various analytics on the Oracle database.
- Developed a Spark Streaming and Spark SQL job with window functions to find the highest revenue per seller for each month (sketched below).
- Developed code to handle exceptions and push the affected records into an exception Kafka topic.
- Involved in Requirement Analysis, Development and Documentation.
Environment: Scala, Apache Kafka, Apache Spark, Spring Framework, Spring Boot, Hive, HBase
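A minimal sketch of the window-function job above, assuming a hypothetical sales DataFrame whose column names (seller, sale_date, amount) are invented for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object TopSellerPerMonth {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("TopSellerPerMonth").getOrCreate()
    import spark.implicits._

    // Stand-in data: one row per sale (seller, sale date, amount).
    val sales = Seq(
      ("s1", "2018-01-05", 120.0), ("s2", "2018-01-20", 300.0),
      ("s1", "2018-02-02", 500.0), ("s2", "2018-02-10", 450.0)
    ).toDF("seller", "sale_date", "amount")

    // Total revenue per seller per month.
    val monthly = sales
      .withColumn("month", date_format(to_date($"sale_date"), "yyyy-MM"))
      .groupBy($"month", $"seller")
      .agg(sum($"amount").as("revenue"))

    // Rank sellers within each month and keep the top earner.
    val w = Window.partitionBy($"month").orderBy($"revenue".desc)
    monthly.withColumn("rank", row_number().over(w))
      .where($"rank" === 1)
      .show()

    spark.stop()
  }
}
```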
Big Data Engineer
Confidential, Tampa, FL
Responsibilities:
- Designed a Kafka producer client using Confluent Kafka and produced events into a Kafka topic (a sketch follows this job entry).
- Subscribed to the Kafka topic with a Kafka consumer client and processed the events in real time using Spark.
- Developed RESTful APIs using the Spring framework.
- Generated code automatically using the Apache Avro plug-in.
- Used the Avro serializer and deserializer when developing the Kafka clients.
- Good knowledge of defining Avro schemas.
- Good knowledge of microservice architecture.
- Involved in Agile methodologies, daily Scrum meetings, and sprint planning.
Environment: Apache Kafka, Scala, Spring Framework, Spring Boot, Bitbucket, Spark
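A minimal sketch of a Confluent Kafka producer with Avro serialization, as described above; the broker address, Schema Registry URL, topic name, and record schema are all assumptions for illustration:

```scala
import java.util.Properties
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object AvroOrderProducer {
  // Hypothetical Avro schema, for illustration only.
  private val schema = new Schema.Parser().parse(
    """{"type":"record","name":"Order","fields":[
      |  {"name":"id","type":"string"},
      |  {"name":"amount","type":"double"}]}""".stripMargin)

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")                   // assumed broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
    props.put("schema.registry.url", "http://schema-registry:8081") // assumed registry

    val producer = new KafkaProducer[String, GenericRecord](props)
    val order: GenericRecord = new GenericData.Record(schema)
    order.put("id", "o-1001")
    order.put("amount", 42.5)

    // The Avro serializer registers the schema and encodes the record.
    producer.send(new ProducerRecord[String, GenericRecord]("orders", "o-1001", order))
    producer.flush()
    producer.close()
  }
}
```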
Big Data Engineer
Confidential
Responsibilities:
- Involved in Requirement Analysis, Development and Documentation.
- Developed scripts to schedule various Sqoop jobs.
- Developed MapReduce programs to clean and aggregate the data.
- Imported and exported data into HDFS and Hive using Sqoop.
- Carried out Hadoop development using HDFS, MapReduce, Hive, and Sqoop.
- Hands-on coding and scripting (automation) experience with object-oriented languages such as Java and Python.
- Created relational and dimensional models for online services such as online banking and automated bill pay.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Worked on Apache Spark with the Scala programming language to transfer data faster and more efficiently.
- Ingested real-time data into HBase from Kafka through Spark Streaming (a sketch follows this job entry).
- Developed Spark Streaming applications for real-time processing.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Developed an HBase data model on top of HDFS data to perform real-time analytics using Java.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Worked extensively on the development and maintenance of Hadoop applications using Java and MapReduce.
Environment: Hadoop, HDFS, HBase, Spark, Kafka, Java, MapReduce, Python, Sqoop, ZooKeeper, Cloudera, NoSQL
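A minimal sketch of the Kafka-to-HBase ingestion above, using the spark-streaming-kafka-0-10 integration; the broker, topic, table, and column names are assumptions for illustration:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object EventsToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("EventsToHBase"), Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092",            // assumed broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-ingest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Open one HBase connection per partition, not per record.
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("events")) // assumed table
        records.foreach { rec =>
          // Fall back to the offset as the row key if the message has no key.
          val rowKey = Option(rec.key()).getOrElse(rec.offset().toString)
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value()))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```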
Software Engineer
Confidential
Responsibilities:
- Developed Hadoop jobs to parse raw Hadoop logs and convert them into the easier-to-work-with Avro format.
- Developed MapReduce jobs that read the Avro data, aggregate it per hour, and write it back out in Avro format (a sketch follows this job entry).
- Collaborated with the team to tune and optimize the MapReduce jobs.
- Coordinated with the team every week to review progress and update goals.
- Developed MRUnit test cases for the project.
Environment: Java, Python, Hadoop, HDFS, Avro
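A minimal sketch of the per-hour aggregation described above, written against the plain-text Hadoop MapReduce API for brevity (the actual jobs read and wrote Avro); the timestamp layout and field position are assumptions:

```scala
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Mapper, Reducer}

// Mapper: bucket each log line by its hour, assuming the first field
// is an ISO-style timestamp such as "2015-06-01T13:45:12".
class HourMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one  = new IntWritable(1)
  private val hour = new Text()

  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split("\\s+")
    if (fields.nonEmpty && fields(0).length >= 13) {
      hour.set(fields(0).substring(0, 13)) // keep up to the hour: "2015-06-01T13"
      ctx.write(hour, one)
    }
  }
}

// Reducer: sum the per-hour counts emitted by the mapper.
class HourReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    ctx.write(key, new IntWritable(sum))
  }
}
```

An MRUnit MapDriver/ReduceDriver pair would exercise these classes in the same way the test cases above describe.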