
Spark/Hadoop Developer Resume


Tukwila, WA

SUMMARY:

  • Overall 5+ years of experience building big data applications with the Hadoop, Spark, Hive, Sqoop, and Kafka frameworks.
  • Hands-on experience with Spark/Hadoop distributions: Cloudera and Hortonworks.
  • Experience working with large datasets and making performance improvements.
  • Experience in performing Read and Write operations on HDFS.
  • Experience in implementing Spark integration with Hadoop ecosystem.
  • Experience in designing and developing Spark applications using Scala.
  • Experience migrating MapReduce programs to Spark RDD transformations and actions to improve performance.
  • Worked with RDDs for parallel processing of datasets from HDFS, MySQL, and other sources.
  • Experience creating Hive tables, loading data from different file formats, and applying partitioning, dynamic partitioning, and bucketing in Hive.
  • Experience developing and debugging Hive queries.
  • Experience converting HiveQL/SQL queries into Spark transformations using the Spark RDD and DataFrame APIs in Scala.
  • Good experience importing data into and exporting data out of Hive and HDFS with Sqoop.
  • Used Oozie to manage and schedule Spark jobs on a Hadoop cluster.
  • Experience using Kafka's producer and consumer APIs.
  • Skilled in integrating Kafka with Spark Streaming for faster data processing (see the sketch after this list).
  • Experience with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
  • Experience creating and driving large-scale ETL pipelines.
  • Experience with file formats such as text files, SequenceFiles, JSON, Parquet, and ORC.
  • Connected Tableau to Hive using JDBC/ODBC drivers.
  • Strong knowledge of UNIX/Linux commands.
  • Good with version control systems such as Git.
  • Adequate knowledge of the Python scripting language.
  • Adequate knowledge of Scrum, Agile, and Waterfall methodologies.
  • Highly motivated and committed to highest levels of professionalism.
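
Illustrative sketch (Scala): a minimal Spark Structured Streaming job that consumes a Kafka topic, corresponding to the Kafka/Spark Streaming integration listed above. The broker address, topic name, and checkpoint path are placeholders for illustration only, and the spark-sql-kafka connector is assumed to be on the classpath.

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: consume a Kafka topic with Spark Structured Streaming and
    // print the parsed records to the console. Broker, topic, and checkpoint
    // location are hypothetical placeholders.
    object KafkaStreamSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("KafkaStreamSketch")
          .getOrCreate()

        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
          .option("subscribe", "events")                     // placeholder topic
          .option("startingOffsets", "latest")
          .load()
          .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

        val query = events.writeStream
          .format("console")
          .option("checkpointLocation", "/tmp/checkpoints/events") // placeholder path
          .start()

        query.awaitTermination()
      }
    }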

TECHNICAL SKILLS:

Big Data Technologies: Apache Spark, Apache Hadoop, MapReduce, HDFS, Apache Hive, YARN, Apache Oozie, Apache Kafka, Apache Sqoop, Apache Flume, Apache ZooKeeper

Languages: Scala, Python, SQL

Databases: MySQL, Oracle 11g

Operating systems: Mac OS, Windows 7/10, Linux

Tools: IntelliJ, Eclipse, Maven, GitHub, Jenkins

PROFESSIONAL EXPERIENCE:

Confidential, Tukwila, WA.

Spark/Hadoop Developer.

Responsibilities:

  • Worked on the Cloudera distribution, CDH 5.13.
  • Imported and transformed large volumes of data from various data sources into HDFS.
  • Imported and exported data between relational database systems and HDFS using Sqoop.
  • Processed data sets from HDFS in Spark using Scala and Spark SQL for faster testing and processing of data.
  • Loaded data into Spark RDDs and DataFrames and performed in-memory computation to generate the output response.
  • Wrote various Spark transformations in Scala for data validation, cleansing, and filtering of data in HDFS.
  • Developed Scala scripts using both DataFrames and RDDs in Spark for data aggregation queries (see the sketch after this list).
  • Performed sort, join, filter, and other transformations on the datasets.
  • Created Hive tables to load the transformed data and stored it in HDFS.
  • Performed performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing Hive tables, and using map joins.
  • Analyzed the Hive tables against the business logic by writing HiveQL queries for faster data processing.
  • Appended DataFrames to pre-existing data in Hive.
  • Performed data cleansing to meet business requirements and stored the output data in Hive and HDFS.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Performed a proof of concept for Spark Streaming with Kafka for real-time data processing.
  • Used Git extensively as the version control tool.
  • Worked with Jenkins for continuous integration.
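
Illustrative sketch (Scala): the general shape of the DataFrame-based validation, cleansing, filtering, and aggregation work described in this role, ending with an append into a pre-existing Hive table. The input path, schema, column names, and table name are hypothetical placeholders rather than the project's actual artifacts.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    // Sketch: read raw records from HDFS, validate/cleanse/filter them,
    // aggregate, and append the result to a Hive table. All names are placeholders.
    object DataFrameCleanseSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("DataFrameCleanseSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Raw CSV files landed on HDFS by the ingestion layer (placeholder path).
        val raw = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/raw/transactions")

        // Basic validation and cleansing: drop incomplete rows and bad values.
        val cleansed = raw
          .na.drop(Seq("txn_id", "amount"))
          .filter(col("amount") > 0)

        // Aggregation equivalent to a simple HiveQL GROUP BY query.
        val daily = cleansed
          .groupBy(col("txn_date"), col("customer_id"))
          .agg(sum("amount").alias("total_amount"), count(lit(1)).alias("txn_count"))

        // Append to a pre-existing Hive table (placeholder name; the table must
        // already exist with a matching column layout).
        daily.write.mode("append").insertInto("analytics.daily_transactions")
      }
    }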

Environment: Cloudera 5.13, Hadoop 3.0, HDFS, Spark 2.4, Hive 3.0, Spark SQL, Scala, Sqoop, Oozie, Linux shell, Git, Jenkins, Agile.

Confidential, Chicago, IL.

Spark/Hadoop Developer.

Responsibilities:

  • Worked on the Hortonworks (HDP) distribution.
  • Worked on large sets of structured and semi-structured historical data.
  • Worked with Sqoop to import data from RDBMS into Hive.
  • Created Hive tables to load the data and stored them as ORC files for processing.
  • Implemented Hive partitioning and bucketing for further classification of the data.
  • Worked on performance tuning and optimization of Hive.
  • Involved in cleansing and transforming the data.
  • Used Spark SQL to sort, join, and filter the data (see the sketch after this list).
  • Copied the ORC files to Amazon S3 buckets using Sqoop for further processing in Amazon EMR.
  • Performed data aggregation operations using Spark SQL queries.
  • Copied output data back to Hive from Amazon S3 buckets using Sqoop once the output desired by the business was produced.
  • Automated daily filter and join operations that merge new data into the respective Hive tables using Oozie workflows.
  • Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule workflows.
  • Used Git as the version control system.
  • Worked with Jenkins for continuous integration.
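
Illustrative sketch (Scala): a Spark SQL join/filter/aggregation over ORC-backed Hive tables with the result written as ORC to an S3 location for downstream EMR processing, as described above. The table names, columns, and S3 bucket are assumptions for illustration, and the s3a filesystem and credentials are presumed to be configured on the cluster.

    import org.apache.spark.sql.SparkSession

    // Sketch: join and filter ORC-backed Hive tables with Spark SQL, aggregate,
    // and write the result as ORC to an S3 bucket. Names and paths are placeholders.
    object SparkSqlOrcToS3Sketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SparkSqlOrcToS3Sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Join, filter, and aggregate directly in Spark SQL.
        val result = spark.sql(
          """SELECT c.region,
            |       SUM(o.amount) AS total_amount,
            |       COUNT(*)      AS order_count
            |FROM   orders o
            |JOIN   customers c ON o.customer_id = c.customer_id
            |WHERE  o.order_date >= '2017-01-01'
            |GROUP  BY c.region""".stripMargin)

        // Write the output as ORC to S3 for further processing on EMR
        // (bucket and prefix are hypothetical).
        result.write.mode("overwrite").orc("s3a://example-bucket/curated/region_totals/")
      }
    }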

Environment: HDP 2.5, HDFS, Spark 2.1, Hadoop 2.8, Kafka, Amazon S3, EMR, Sqoop, Oozie, Hive 2.3, Tez, Hue, Linux shell, Git, Jenkins, Agile.

Confidential

Hadoop Developer.

Responsibilities:

  • Worked on the Cloudera distribution.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Built and sized the Hadoop cluster based on the data extracted from all the sources.
  • Monitored Hadoop cluster job performance and capacity planning.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Imported data from RDBMS to HDFS using Sqoop.
  • Developed simple to complex MapReduce jobs to transform the ingested data.
  • Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
  • Created Hive tables to load the transformed data.
  • Designed and implemented partitioning and bucketing in Hive for better classification (see the sketch after this list).
  • Performed different joins on Hive tables and implemented Hive SerDes such as RegexSerDe and JsonSerDe.
  • Performed transformations using HiveQL and MapReduce, and loaded data into HDFS.
  • Developed Hive queries to analyze and transform the data.
  • Used the Oozie workflow engine to automate MapReduce jobs.
  • Used SVN as the version control system.
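
Illustrative sketch (Scala): the Hive partitioning and bucketing mentioned above expressed as generic HiveQL DDL, issued here through the HiveServer2 JDBC driver (the same statement could equally be run from the Hive shell). The endpoint, database, table, and column names are hypothetical and not the project's actual schema.

    import java.sql.DriverManager

    // Sketch: create a partitioned, bucketed Hive table by sending HiveQL DDL
    // through the HiveServer2 JDBC driver. Endpoint and schema are placeholders.
    object HivePartitionBucketSketch {
      def main(args: Array[String]): Unit = {
        // Requires the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "etl_user", "")
        val stmt = conn.createStatement()
        try {
          // Partition by load date for partition pruning; bucket by customer_id
          // to improve joins and sampling.
          stmt.execute(
            """CREATE TABLE IF NOT EXISTS transactions (
              |  txn_id       BIGINT,
              |  customer_id  BIGINT,
              |  amount       DOUBLE
              |)
              |PARTITIONED BY (load_date STRING)
              |CLUSTERED BY (customer_id) INTO 32 BUCKETS
              |STORED AS ORC""".stripMargin)
        } finally {
          stmt.close()
          conn.close()
        }
      }
    }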

Environment: CDH 4.0, Hadoop 2.4, MapReduce, Hive 0.13, Tez, Hue, Oozie, Flume, Sqoop, HBase, SVN, Jenkins, Agile.

Confidential

Junior Hadoop Developer

Responsibilities:

  • Worked on the Cloudera distribution.
  • Loaded data from the Linux file system into HDFS.
  • Monitored Hadoop cluster job performance and capacity planning.
  • Created Hive tables to load the transformed data.
  • Imported data from RDBMS to HDFS using Sqoop.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Developed Hive queries to analyze and transform the data.
  • Used the Oozie workflow engine to automate MapReduce jobs.

Environment: CDH 4.0, Hadoop 2.4, MapReduce, Hive 0.13, Hue, Oozie, Sqoop, HBase, Agile.
