
Hadoop and Spark Developer Resume

Framingham, MA

PROFESSIONAL SUMMARY:

  • Hadoop/Spark developer with 4+ years of experience in big data application development using the Hadoop and Spark frameworks.
  • Excellent knowledge and understanding of Distributed Computing and Parallel processing frameworks.
  • Experience with the Hadoop ecosystem, including HDFS (Hadoop Distributed File System), MapReduce, Hive, Sqoop, Oozie, and Pig.
  • Hands-on experience with the Cloudera and Hortonworks distributions.
  • Experience in data cleansing using Spark Map and Filter Functions.
  • Experience in creating Hive Tables and loading the data from different file formats.
  • Implemented partitioning and bucketing in Hive.
  • Experience developing and debugging Hive queries.
  • Good experience importing and exporting data between Hive/HDFS and relational databases using Sqoop.
  • Experience with Apache Spark, Spark SQL, Spark Streaming.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Hands-on experience in in-memory data processing with Apache Spark.
  • Experience with file formats such as text files, SequenceFiles, JSON, Parquet, and ORC.
  • Experienced at performing read and write operations on HDFS file system.
  • Experience working with large data sets and making performance improvements.
  • Strong knowledge on UNIX/LINUX commands.
  • Adequate knowledge of the Scala language.
  • Adequate knowledge of Scrum, Agile, and Waterfall methodologies.
  • Highly motivated and committed to the highest levels of professionalism.
  • Strong written and oral communication skills; learn and adapt quickly to emerging technologies and paradigms.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
  • Proficient in using SQL to manipulate data, including query expressions, join statements, and subqueries.
  • Good experience in Scala programming.
  • Detail-oriented professional, ensuring the highest level of quality in reports and data analysis.
  • Experience in processing data using HiveQL and Pig Latin scripts for data analytics.
  • Experience using the Spark Streaming programming model with Kafka for real-time data processing.
  • Experience in designing and developing application in Spark using Scala.
  • Experience in using the Producer and Consumer APIs of Apache Kafka.
  • Experienced in using Apache Kafka to collect logs and error messages across the cluster.
  • Experience creating and driving large scale ETL pipelines.
  • Good with version control systems like GIT.
  • Provided batch-processing solutions for large volumes of unstructured data using Spark.
  • Extended Hive core functionality by writing UDFs for data analysis.
  • Experience in data modeling, data warehousing, ETL processing, database programming, ETL testing, and DBMS.
  • Advanced written and verbal communication skills.
  • Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
  • Substantial experience working in a fast-paced Agile development framework with Scrum principles, in an ownership- and results-oriented culture.
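The Spark map/filter data-cleansing work listed above can be sketched with plain Scala collection operations, whose semantics RDD transformations deliberately mirror; field names and cleansing rules here are illustrative, not from an actual project:

```scala
// Raw CSV-like records; the blank field and malformed row are illustrative.
val raw = List(
  "1001,alice,NY",
  "1002, ,CA",   // blank name -> dropped by the filter
  "bad-record",  // wrong arity -> dropped by the filter
  "1003,carol,MA"
)

// map: split and trim fields; filter: keep well-formed rows with a non-blank name;
// map: normalize types and casing. The same chain would run on an RDD[String].
val cleaned = raw
  .map(_.split(",").map(_.trim))
  .filter(f => f.length == 3 && f(1).nonEmpty)
  .map(f => (f(0).toInt, f(1).toLowerCase, f(2)))
```

On a real cluster the only change is the source: `sc.textFile(path)` instead of a local `List`, since `map` and `filter` have the same shape on RDDs.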

TECHNICAL SKILLS:

HADOOP

Detailed knowledge of Hadoop components and MapReduce.

HIVE

Wrote complex HQL queries using analytic functions such as RANK, DENSE_RANK, and cumulative distribution (CUME_DIST); developed complex join queries.
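The semantics of the DENSE_RANK analytic function mentioned above (in HQL, `DENSE_RANK() OVER (ORDER BY salary DESC)`) can be illustrated in plain Scala; the names and salaries are hypothetical:

```scala
// (employee, salary) rows. With DENSE_RANK, ties share a rank and
// no rank value is skipped afterwards (unlike RANK).
val rows = List(("ann", 90), ("bob", 80), ("cal", 90), ("dee", 70))

// Distinct salaries in descending order define the dense ranks.
val ranks = rows.map(_._2).distinct.sorted(Ordering.Int.reverse).zipWithIndex.toMap

// Attach 1-based dense rank to each row.
val denseRanked = rows.map { case (name, sal) => (name, sal, ranks(sal) + 1) }
```

Note that "ann" and "cal" tie at rank 1 and "bob" still gets rank 2; under RANK, "bob" would get rank 3.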

PIG

Developed optimized complex scripts using advanced operators such as COGROUP and nested FOREACH.

SQOOP

Imported data from RDBMS to HDFS.

SPARK

Designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.

SPARK-SQL

Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark 2.1 for data aggregation.

KAFKA

Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.

OOZIE

Scheduled jobs using Oozie workflow scripts.

SCALA

Spark with Scala
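The MapReduce pattern referenced in the HADOOP and SPARK entries above reduces to a map phase that emits key/value pairs and a reduce phase that aggregates per key. A minimal word-count sketch in plain Scala (the same shape as `flatMap`/`map`/`reduceByKey` on a Spark RDD; the input lines are illustrative):

```scala
val lines = List("spark and hive", "spark and kafka")

// Map phase: split lines into words and emit (word, 1) pairs.
// Reduce phase: group by word and sum the counts per key.
val counts = lines
  .flatMap(_.split("\\s+"))
  .map(w => (w, 1))
  .groupBy(_._1)
  .map { case (w, pairs) => (w, pairs.map(_._2).sum) }
```

In Spark the last two steps collapse into `reduceByKey(_ + _)`, which also avoids materializing the per-key groups.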

PROFESSIONAL EXPERIENCE:

Hadoop and Spark Developer

Confidential, Framingham, MA

Responsibilities:

  • Involved in creating data-lake by extracting customer's data from various data sources to HDFS which include data from Excel, databases, and log data from servers.
  • Developed SQOOP jobs to import data in Avro file format from RDBMS to HDFS and created Hive tables on top of it.
  • Performed Spark jobs such as transformations and actions on RDDs using Scala.
  • Implemented SparkSQL to access hive tables into Spark for faster processing of data.
  • Worked on transforming the queries written in Hive to Spark Application.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed a preprocessing job using Spark DataFrames to transform JSON documents into flat files.
  • Worked with various HDFS file formats like Avro, Sequence File, Parquet and various compression formats.
  • Good understanding on NoSQL databases such as HBase and MongoDB.
  • Good understanding on Kafka architecture i.e., Topics, Consumers, Producers, Brokers, Partitions and Clusters.
  • Provided support to data analysts in running Pig and Hive queries.

Environment: Cloudera CDH 5, HDFS, Spark, Spark SQL, Hive, MapReduce, Hue, Sqoop, Flume, Oozie, PuTTY, Scala, Linux, YARN, Agile methodology.

Hadoop Developer

Confidential, Columbus, OH

Responsibilities:

  • Designed and Developed data migration from legacy systems to Hadoop environment.
  • Imported the data from RDBMS to HDFS & Hive and performed incremental imports using Sqoop Job in various formats such as Avro, Text and Parquet formats.
  • Performed various Sqoop operations such as eval, import, export, and job.
  • Created external and managed tables in Hive, loaded data into them, and processed Hive queries that run internally as MapReduce jobs.
  • Pre-processed log data in Pig-Latin by parsing using regular expressions.
  • Involved in processing xml & JSON file formats, created partitioning of the data and implemented bucketing in Hive for performance optimization.
  • Used the JSON, XML, and Avro SerDes packaged with Hive for serialization and deserialization, to parse file contents.
  • Loaded data with complex data types such as Maps, Arrays and Structs into Hive tables.
  • Experience in building Pig Latin scripts to extract, transform and load data onto HDFS.
  • Developed a workflow in Oozie to automate the task of loading the data into HDFS using Sqoop and processing it with Hive.
  • Exported analyzed data to relational databases using Sqoop, for visualization and generating reports for the BI team.
  • Used Cloudera Manager to pull metrics on cluster features such as the JVM and running map and reduce tasks.
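The Hive bucketing used above for performance optimization assigns each row to one of N buckets by hashing the clustering column (`CLUSTERED BY ... INTO N BUCKETS`). A simplified sketch of the assignment rule in plain Scala, assuming a non-negative hash modulo the bucket count (Hive's actual hash function differs by type; the customer IDs are hypothetical):

```scala
val numBuckets = 4

// Simplified bucket assignment: non-negative hash of the key, modulo bucket count.
// Rows with equal keys always land in the same bucket, enabling bucket-map joins
// and efficient sampling.
def bucketOf(key: String, n: Int): Int = ((key.hashCode % n) + n) % n

val customerIds = List("c-101", "c-102", "c-103")
val assignments = customerIds.map(id => id -> bucketOf(id, numBuckets))

// Every assignment must land in [0, numBuckets).
val allValid = assignments.forall { case (_, b) => b >= 0 && b < numBuckets }
```

Because the assignment is deterministic, two bucketed tables clustered the same way on the join key can be joined bucket-by-bucket without a full shuffle.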

Environment: Hadoop (CDH 5), Hive, Pig, Sqoop, Flume, MapReduce, HDFS, Hue

Hadoop developer

Confidential, Jersey City, NJ

Responsibilities:

  • Ingested data from relational databases into HDFS using Sqoop and processed it with Hive jobs.
  • Created data pipeline using Hive, Spark, and HBase to ingest, transform and analyze the customer behavioral data.
  • Performed partitioning and bucketing in Hive to improve the performance.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
  • Implemented Spark RDD transformations and performed actions to implement business analysis.
  • Involved in creating DataFrames.
  • Created Spark jobs to write the data to HBase tables.
  • Involved in improving performances of Spark jobs at application level.
  • Involved in tuning the Hive jobs.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Used Spark for interactive queries, batch data processing, and integration with a NoSQL database for huge volumes of data.
  • Involved in writing the data to HBase tables using Hive and Pig.
  • Used Hive scripts to compute aggregations and store them on HBase for low latency applications.
  • Collected, aggregated, and moved log data from servers to HDFS using Flume.
  • Held weekly meetings with technical collaborators and actively participated in code-review sessions with senior and junior developers, following Agile methodology.
  • Worked with Network, Database, Application, QA and BI teams to ensure data quality and availability.

Environment: Sqoop, Flume, Hive, HBase, HDFS, YARN, Spark, Cloudera (CDH5), Zookeeper, Shell Scripting, Linux

Hadoop Developer

Confidential

Responsibilities:

  • Extracted the data from MySQL into HDFS using Sqoop.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Collected log data from web servers and ingested it into HDFS using Flume.
  • Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.
  • Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Resolved performance issues in Hive and Pig scripts by analyzing joins, grouping, and aggregation.
  • Developed Pig-Latin scripts to extract data from the web server output files to load into HDFS.
  • Worked on Hue interface for querying the data in Hive and Pig editors.
  • Experience in using Sequence File, Avro, Text File and Parquet formats.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
  • Exported the processed data from HDFS to RDBMS using Sqoop Export.

Environment: Hadoop, Sqoop, Hive, Pig, MapReduce, HDFS
