Big Data Developer Resume
SUMMARY
- Over 1.5 years of IT experience in Big Data development with Hadoop and Spark.
- Experience with Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Oozie and HBase, with development in Java and Scala.
- Working knowledge of distributed system architecture and parallel processing frameworks.
- In-depth understanding of the Spark execution model and the internals of the MapReduce framework.
- Good working experience developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL and Spark Streaming APIs (sketched below).
- Experience with Cloudera Hadoop distributions (CDH 4 and CDH 5).
- Worked extensively on tuning long-running Spark applications for better parallelism and more efficient use of executor memory for caching.
- Good experience with both batch and real-time processing using the Spark framework.
- Proficient in Apache Spark and Scala for analyzing large datasets and processing real-time data.
- Good working knowledge of developing Pig Latin scripts and writing HiveQL queries.
- Good working experience tuning Hive query performance and troubleshooting issues such as problematic joins and memory exceptions in Hive.
- Good understanding of partitioning and bucketing concepts in Hive; designed both internal (managed) and external Hive tables to optimize performance.
- Good experience with file formats such as Avro, RCFile, ORC and Parquet.
- Good working experience optimizing MapReduce jobs using combiners and custom partitioners.
- Experience with NoSQL databases such as HBase, Cassandra and MongoDB, and their integration with Hadoop clusters.
- Experience with shell scripting (Bash).
- Experience collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- In-depth understanding of Hadoop architecture and its components, including HDFS (NameNode, DataNode, Secondary NameNode) and the MapReduce programming paradigm.
- Worked with Sqoop to move data (import/export) between relational databases and Hadoop.
- Well versed in Agile/Scrum environments, using JIRA and version control tools such as Git.
- Flexible, enthusiastic and project-oriented team player with excellent communication skills.
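A minimal sketch of the kind of Spark batch work summarized above, written in Scala against the Spark 2.x DataFrame and Spark SQL APIs; the input path, column names and aggregation are illustrative assumptions, not an actual project artifact.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal Spark batch job: read raw records, transform them with the
// DataFrame API, then query the result through Spark SQL.
object TransactionSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TransactionSummary")
      .enableHiveSupport()            // allows reading/writing Hive tables
      .getOrCreate()

    // Hypothetical input: Parquet files of transaction records on HDFS.
    val txns = spark.read.parquet("hdfs:///data/raw/transactions")

    // DataFrame transformations: drop bad records, aggregate per account.
    val summary = txns
      .filter(col("amount").isNotNull && col("amount") > 0)
      .groupBy("account_id")
      .agg(sum("amount").as("total_amount"), count("*").as("txn_count"))

    // The same data is also accessible through Spark SQL via a temp view.
    summary.createOrReplaceTempView("txn_summary")
    spark.sql("SELECT account_id, total_amount FROM txn_summary WHERE txn_count > 10").show()

    spark.stop()
  }
}
```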
TECHNICAL SKILLS
Big Data Ecosystems: HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Pig, Spark, HBase, Impala, Flume, Zookeeper, Oozie
NO-SQL Databases: HBase, Cassandra, MongoDB
Languages: Java, Scala
Hadoop Distributions: Cloudera
IDE’s & Utilities: Eclipse, IntelliJ
Operating Systems: Windows, Linux
PROFESSIONAL EXPERIENCE
Confidential
Big Data Developer
Responsibilities:
- Examined transaction data, identified outliers and inconsistencies, and manipulated data to ensure data quality and integration.
- Developed a data pipeline using Sqoop, Spark and Hive to ingest, transform and analyze operational data (see the Hive load sketch below).
- Used Spark SQL with Scala to create DataFrames and performed transformations on them.
- Implemented Spark jobs in Scala, using the DataFrame and Spark SQL APIs for faster data processing.
- Streamed data in real time using Spark and Kafka.
- Fine-tuned Spark applications to reduce overall processing time for the pipelines.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics (see the producer sketch below).
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the streaming sketch below).
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations and other features.
- Experience with Kafka for high-throughput streaming data, sustaining reads and writes of thousands of megabytes per second.
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked extensively with Sqoop to import data from Oracle and MySQL databases.
- Involved in creating Hive tables and loading and analyzing data using Hive scripts.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Built applications using SBT and integrated them with continuous integration servers such as Jenkins to run build jobs.
- Compiled and built the application using SBT, with Git as the version control system.
- Used SBT extensively to build JAR files of Spark programs and deployed them to the cluster.
- Performed data migration from RDBMS to HDFS using Sqoop.
- Worked with Spark SQL to read and write data from JSON, text and Parquet files and SchemaRDDs.
Environment: Hadoop, Hive, HBase, Spark, Scala, Git, Sqoop, Kafka, Cloudera, IntelliJ, Agile, and Jira
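A minimal sketch of a Kafka producer of the kind described above, written in Scala against the standard Kafka producer client; the REST endpoint, broker address and topic name are illustrative assumptions.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.io.Source

// Minimal Kafka producer: poll an external REST endpoint and publish each
// response payload to a Kafka topic. Endpoint, broker and topic are placeholders.
object RestToKafkaProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      while (true) {
        // Fetch the latest payload from the (hypothetical) REST API.
        val payload = Source.fromURL("https://example.com/api/transactions/latest").mkString
        producer.send(new ProducerRecord[String, String]("transactions", payload))
        Thread.sleep(1000L)        // simple fixed polling interval
      }
    } finally {
      producer.close()
    }
  }
}
```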
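A minimal sketch of a Spark Streaming consumer that reads a Kafka topic and writes records to HBase, using the spark-streaming-kafka-0-10 integration and the HBase client API; the topic, table, column family and broker address are assumptions for illustration.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

// Minimal Spark Streaming job: consume a Kafka topic and persist each record
// to an HBase table. Topic, table and column-family names are placeholders.
object KafkaToHBaseStreaming {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHBase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "txn-consumers",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("transactions"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Open one HBase connection per partition rather than per record.
        val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("transactions"))
        records.foreach { rec =>
          val rowKey = Option(rec.key()).getOrElse(java.util.UUID.randomUUID().toString)
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value()))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```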
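A minimal sketch of the Spark SQL side of the batch pipeline: reading JSON and Parquet inputs and appending them into a dynamically partitioned Hive table. Table, path and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Minimal Spark SQL load job: read JSON and Parquet inputs and append them
// into a dynamically partitioned Hive table.
object LoadTransactionsToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadTransactionsToHive")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partitioning on insert.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical target table, partitioned by transaction date.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS txn_history (
        |  account_id STRING,
        |  amount     DOUBLE)
        |PARTITIONED BY (txn_date STRING)
        |STORED AS PARQUET""".stripMargin)

    val json    = spark.read.json("hdfs:///data/landing/transactions_json")
    val parquet = spark.read.parquet("hdfs:///data/landing/transactions_parquet")

    // Column order must match the table, with the partition column last;
    // cast amount so the two sources line up for the union.
    json.selectExpr("account_id", "cast(amount as double) as amount", "txn_date")
      .union(parquet.selectExpr("account_id", "cast(amount as double) as amount", "txn_date"))
      .write
      .mode(SaveMode.Append)
      .insertInto("txn_history")

    spark.stop()
  }
}
```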