Big Data Developer Resume

SUMMARY

1.5 years of experience in analysis, development, and implementation of Big Data solutions using Hadoop, HDFS, Hive, Sqoop, Flume, HBase, Impala, Spark, SparkSQL, and Hue, as well as reporting.
Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and YARN, and of the MapReduce programming paradigm.
Experience importing data into and exporting data from HDFS.
Involved in creating Hive tables with partitioning and bucketing, loading them with data, and writing Hive queries.
Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
Experienced in migrating MapReduce programs to Spark RDD transformations and actions to improve performance.

TECHNICAL SKILLS

Programming Languages: C++, Java

Big Data Technologies: Hadoop, MapReduce Programming, Hive, Impala, Pig, Scala, Spark, SparkSQL, Hue

Data Ingestion: Sqoop, Flume

NoSQL: HBase

Database: MySQL

Operating System: Linux, UNIX, Windows

Tools: Microsoft Office, NetBeans, PuTTY, Visual Studio, Eclipse, SBT

PROFESSIONAL EXPERIENCE

Confidential

Big Data Developer

Responsibilities:

Analyzed data on a Hadoop cluster using big data analytics tools including Flume, Hive, Sqoop, Spark, and SparkSQL.
Imported data into and exported data from HDFS and Hive using Sqoop.
Imported and exported data between local/external file systems, RDBMS, and HDFS. Loaded log data into HDFS using Flume.
Worked with different file formats (JSON, Avro, Parquet, etc.).
Loaded data into HBase tables from HDFS using Flume.
Created Hive tables as per requirements, as managed or external tables with appropriate static and dynamic partitions for efficiency.
Implemented partitioning and bucketing in Hive for better organization of the data.
Analyzed the data using Hive queries (HiveQL).
Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext and SparkSQL.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
Worked in an Agile methodology.