
Big Data Developer Resume


SUMMARY

  • Over 1.5 years of IT experience in Big Data development with Hadoop and Spark.
  • Experience with Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Oozie, and HBase, programming in Java and Scala.
  • Working knowledge of distributed systems architecture and parallel processing frameworks.
  • In-depth understanding of the Spark execution model and the internals of the MapReduce framework.
  • Good working experience developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL, and Spark Streaming APIs (a batch-job sketch appears after this list).
  • Experience with the Cloudera Hadoop distribution (CDH 4 and 5).
  • Worked extensively on tuning long-running Spark applications for better parallelism and on sizing executor memory for more caching.
  • Good experience with both batch and real-time processing using Spark.
  • Proficient in Apache Spark and Scala for analyzing large datasets and processing real-time data.
  • Good working knowledge of developing Pig Latin scripts and using Hive Query Language (HiveQL).
  • Good working experience tuning Hive query performance and troubleshooting issues related to joins and memory exceptions in Hive.
  • Good understanding of partitioning and bucketing concepts in Hive; designed both internal (managed) and external Hive tables to optimize performance.
  • Good experience with file formats such as Avro, RCFile, ORC, and Parquet.
  • Good working experience optimizing MapReduce jobs using combiners and custom partitioners.
  • Experience with NoSQL databases such as HBase and Cassandra (column-oriented) and MongoDB (document-oriented), and their integration with Hadoop clusters.
  • Experience with shell scripting in Bash.
  • Experience collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • In-depth understanding of Hadoop architecture and its components, including HDFS (NameNode, DataNode, Secondary NameNode) and the MapReduce programming paradigm.
  • Worked with Sqoop to move (import/export) data between relational databases and Hadoop.
  • Well versed in Agile/Scrum environments, using JIRA and version control tools such as Git.
  • Flexible, enthusiastic and project-oriented team player with excellent communication skills.
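
As a brief illustration of the Spark and Hive experience above, here is a minimal sketch of a batch job that reads raw data from HDFS, applies a simple quality filter, and writes to a date-partitioned Hive table. It assumes the Spark 2.x SparkSession API; the path, database, table, and column names are hypothetical placeholders, not taken from an actual project.

    import org.apache.spark.sql.SparkSession

    object TransactionBatchJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("transaction-batch-job")
          .enableHiveSupport()   // needed to read and write Hive tables
          .getOrCreate()
        import spark.implicits._

        // Raw data previously landed in HDFS (e.g. by a Sqoop import), stored as Parquet
        val raw = spark.read.parquet("/data/raw/transactions")

        // Basic data-quality filter: drop rows with null keys or negative amounts
        val clean = raw.filter($"txn_id".isNotNull && $"amount" >= 0)

        // Partitioning by date lets Hive prune scans down to the requested days
        clean.write
          .mode("overwrite")
          .partitionBy("txn_date")
          .saveAsTable("analytics.transactions_clean")

        spark.stop()
      }
    }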

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Pig, Spark, HBase, Impala, Flume, ZooKeeper, Oozie

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: Java, Scala

Hadoop Distributions: Cloudera

IDEs & Utilities: Eclipse, IntelliJ

Operating Systems: Windows, Linux

PROFESSIONAL EXPERIENCE

Confidential

Big Data Developer

Responsibilities:

  • Examined transaction data, identified outliers and inconsistencies, and cleansed data to ensure data quality and integrity.
  • Developed data pipeline using Sqoop, Spark and Hive to ingest, transform and analyze operational data.
  • Used Spark SQL with Scala to create DataFrames and performed transformations on them.
  • Implemented Spark jobs in Scala, using the DataFrame and Spark SQL APIs for faster data processing.
  • Streamed data in real time using Spark Streaming and Kafka.
  • Fine-tuned Spark applications to improve overall pipeline processing time.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics (see the producer sketch after this list).
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the consumer sketch after this list).
  • Handled large datasets using Spark’s in-memory capabilities, broadcast variables, and efficient joins and transformations.
  • Used Kafka for high-throughput streaming workloads, performing reads and writes on the order of gigabytes per second.
  • Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked extensively with Sqoop to import data from Oracle and MySQL databases.
  • Involved in creating Hive tables and in loading and analyzing data using Hive scripts.
  • Created Hive tables with dynamic partitions and buckets for sampling, and queried them using HiveQL.
  • Built applications with SBT and integrated them with continuous integration servers such as Jenkins.
  • Packaged Spark programs into JAR files with SBT, deployed them to the cluster, and used Git for version control.
  • Migrated data from RDBMS to HDFS using Sqoop.
  • Worked with Spark SQL to read and write data in JSON, text, and Parquet formats, including SchemaRDDs.
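
The producer sketch referenced in the Kafka bullet above: a minimal Kafka producer that forwards records pulled from an external REST API to a topic. The broker address, topic name, and the fetchFromRestApi helper are hypothetical placeholders.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object RestToKafkaProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        try {
          // fetchFromRestApi is a stand-in for the HTTP call returning (key, json) pairs
          fetchFromRestApi().foreach { case (key, json) =>
            producer.send(new ProducerRecord[String, String]("transactions", key, json))
          }
        } finally {
          producer.close()
        }
      }

      // Placeholder so the sketch is self-contained; a real job would call the REST API here
      def fetchFromRestApi(): Seq[(String, String)] = Seq.empty
    }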
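
And the consumer sketch: a minimal Spark Streaming job, assuming the spark-streaming-kafka-0-10 integration and the standard HBase client API, that consumes a Kafka topic and writes each record to an HBase table. The broker, group id, topic, table, and column-family names are hypothetical placeholders.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object KafkaToHBaseStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-to-hbase")
        val ssc = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "txn-consumers",
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("transactions"), kafkaParams))

        stream.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            // One HBase connection per partition, not per record
            val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
            val table = conn.getTable(TableName.valueOf("txn_events"))
            records.filter(_.key != null).foreach { rec =>
              val put = new Put(Bytes.toBytes(rec.key))
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value))
              table.put(put)
            }
            table.close()
            conn.close()
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

Opening the HBase connection once per partition rather than once per record keeps connection overhead out of the hot path.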

Environment: Hadoop, Hive, HBase, Spark, Scala, Git, Sqoop, Kafka, Cloudera, IntelliJ, Agile, and JIRA
