Big Data Developer Resume
SUMMARY
- Over 1.5 years of IT experience in Big Data development with Hadoop and Spark.
- Experience with Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Oozie and HBase, with development in Java and Scala.
- Working knowledge of distributed system architecture and parallel processing frameworks.
- In-depth understanding of the Spark execution model and the internals of the MapReduce framework.
- Good working experience developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL and Spark Streaming APIs (sketched below).
- Experience with Cloudera Hadoop distributions (CDH 4 and CDH 5).
- Worked extensively on tuning long-running Spark applications for better parallelism and more efficient use of executor memory for caching.
- Good experience with both batch and real-time processing using the Spark framework.
- Proficient in Apache Spark and Scala for analyzing large datasets and processing real-time data.
- Good working knowledge of developing Pig Latin scripts and writing HiveQL queries.
- Good working experience tuning Hive query performance and troubleshooting issues such as problematic joins and memory exceptions in Hive.
- Good understanding of partitioning and bucketing concepts in Hive; designed both internal (managed) and external Hive tables to optimize performance.
- Good experience with file formats such as Avro, RCFile, ORC and Parquet.
- Good working experience optimizing MapReduce jobs using combiners and custom partitioners.
- Experience with NoSQL databases such as HBase, Cassandra and MongoDB, and their integration with Hadoop clusters.
- Experience with shell scripting (Bash).
- Experience collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- In-depth understanding of Hadoop architecture and its components, including HDFS (NameNode, DataNode, Secondary NameNode) and the MapReduce programming paradigm.
- Worked with Sqoop to move data (import/export) between relational databases and Hadoop.
- Well versed in Agile/Scrum environments, using JIRA and version control tools such as Git.
- Flexible, enthusiastic and project-oriented team player with excellent communication skills.
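A minimal sketch of the kind of Spark batch work summarized above, written in Scala against the Spark 2.x DataFrame and Spark SQL APIs; the input path, column names and aggregation are illustrative assumptions, not an actual project artifact.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal Spark batch job: read raw records, transform them with the
// DataFrame API, then query the result through Spark SQL.
object TransactionSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TransactionSummary")
      .enableHiveSupport()            // allows reading/writing Hive tables
      .getOrCreate()

    // Hypothetical input: Parquet files of transaction records on HDFS.
    val txns = spark.read.parquet("hdfs:///data/raw/transactions")

    // DataFrame transformations: drop bad records, aggregate per account.
    val summary = txns
      .filter(col("amount").isNotNull && col("amount") > 0)
      .groupBy("account_id")
      .agg(sum("amount").as("total_amount"), count("*").as("txn_count"))

    // The same data is also accessible through Spark SQL via a temp view.
    summary.createOrReplaceTempView("txn_summary")
    spark.sql("SELECT account_id, total_amount FROM txn_summary WHERE txn_count > 10").show()

    spark.stop()
  }
}
```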
TECHNICAL SKILLS
Big Data Ecosystems: HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Pig, Spark, HBase, Impala, Flume, Zookeeper, Oozie
NO-SQL Databases: HBase, Cassandra, MongoDB
Languages: Java, Scala
Hadoop Distributions: Cloudera
IDE’s & Utilities: Eclipse, IntelliJ
Operating Systems: Windows, Linux
PROFESSIONAL EXPERIENCE
Confidential
Big Data Developer
Responsibilities:
- Examined transaction data, identified outliers and inconsistencies, and manipulated data to ensure data quality and integration.
- Developed a data pipeline using Sqoop, Spark and Hive to ingest, transform and analyze operational data (see the Hive load sketch below).
- Used Spark SQL with Scala to create DataFrames and performed transformations on them.
- Implemented Spark jobs in Scala, using the DataFrame and Spark SQL APIs for faster data processing.
- Streamed data in real time using Spark and Kafka.
- Fine-tuned Spark applications to reduce overall processing time for the pipelines.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics (see the producer sketch below).
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the streaming sketch below).
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations and other features.
- Experience with Kafka for high-throughput streaming data, sustaining reads and writes of thousands of megabytes per second.
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked extensively with Sqoop to import data from Oracle and MySQL databases.
- Involved in creating Hive tables and loading and analyzing data using Hive scripts.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Built applications using SBT and integrated them with continuous integration servers such as Jenkins to run build jobs.
- Compiled and built the application using SBT, with Git as the version control system.
- Used SBT extensively to build JAR files of Spark programs and deployed them to the cluster.
- Performed data migration from RDBMS to HDFS using Sqoop.
- Worked with Spark SQL to read and write data from JSON, text and Parquet files and SchemaRDDs.
Environment: Hadoop, Hive, HBase, Spark, Scala, Git, Sqoop, Kafka, Cloudera, IntelliJ, Agile, and Jira
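A minimal sketch of a Kafka producer of the kind described above, written in Scala against the standard Kafka producer client; the REST endpoint, broker address and topic name are illustrative assumptions.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.io.Source

// Minimal Kafka producer: poll an external REST endpoint and publish each
// response payload to a Kafka topic. Endpoint, broker and topic are placeholders.
object RestToKafkaProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      while (true) {
        // Fetch the latest payload from the (hypothetical) REST API.
        val payload = Source.fromURL("https://example.com/api/transactions/latest").mkString
        producer.send(new ProducerRecord[String, String]("transactions", payload))
        Thread.sleep(1000L)        // simple fixed polling interval
      }
    } finally {
      producer.close()
    }
  }
}
```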
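A minimal sketch of a Spark Streaming consumer that reads a Kafka topic and writes records to HBase, using the spark-streaming-kafka-0-10 integration and the HBase client API; the topic, table, column family and broker address are assumptions for illustration.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

// Minimal Spark Streaming job: consume a Kafka topic and persist each record
// to an HBase table. Topic, table and column-family names are placeholders.
object KafkaToHBaseStreaming {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHBase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "txn-consumers",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("transactions"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Open one HBase connection per partition rather than per record.
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("transactions"))
        records.foreach { rec =>
          val rowKey = Option(rec.key()).getOrElse(java.util.UUID.randomUUID().toString)
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value()))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```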
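A minimal sketch of the Spark SQL side of the batch pipeline: reading JSON and Parquet inputs and appending them into a dynamically partitioned Hive table. Table, path and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Minimal Spark SQL load job: read JSON and Parquet inputs and append them
// into a dynamically partitioned Hive table.
object LoadTransactionsToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadTransactionsToHive")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partitioning on insert.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical target table, partitioned by transaction date.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS txn_history (
        |  account_id STRING,
        |  amount     DOUBLE)
        |PARTITIONED BY (txn_date STRING)
        |STORED AS PARQUET""".stripMargin)

    val json    = spark.read.json("hdfs:///data/landing/transactions_json")
    val parquet = spark.read.parquet("hdfs:///data/landing/transactions_parquet")

    // Column order must match the table, with the partition column last;
    // cast amount so the two sources line up for the union.
    json.selectExpr("account_id", "cast(amount as double) as amount", "txn_date")
      .union(parquet.selectExpr("account_id", "cast(amount as double) as amount", "txn_date"))
      .write
      .mode(SaveMode.Append)
      .insertInto("txn_history")

    spark.stop()
  }
}
```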