Sr. Spark/Hadoop Developer Resume

Charlotte, NC

SUMMARY

  • 7+ years of IT experience, including 4 years of Hadoop/Big Data experience and 3 years of Java programming, covering the entire software development life cycle: design, development, implementation, testing, and maintenance of various web-based applications using Java and J2EE technologies.
  • Experience working with the Cloudera, Hortonworks, and Amazon EMR Hadoop distributions.
  • Experience in dealing with large data sets and making performance improvements.
  • Experience implementing Spark integrated with the Hadoop ecosystem.
  • Experience using Spark RDDs for parallel processing of datasets in HDFS, MySQL, and other data sources.
  • Experience in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Experience in using different build tools like SBT and Maven.
  • Implemented Spark Streaming for fast data processing.
  • Experience in designing and developing applications in Spark using Scala.
  • Skilled in integrating Kafka with Spark Streaming for high-speed data processing.
  • Experience managing and scheduling Spark jobs on a Hadoop cluster using Oozie.
  • Experience in data cleansing using Spark map and filter functions.
  • Implemented a POC to migrate MapReduce programs to Spark RDD transformations and actions to improve performance.
  • Experience in developing and debugging Hive queries.
  • Experience in performing read and write operations on HDFS.
  • Experience working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
  • Good experience importing and exporting data to and from Hive and HDFS with Sqoop.
  • Experience in creating Hive tables and loading data from different file formats.
  • Experience in processing data using HiveQL for data analytics.
  • Extended core Hive functionality by writing UDFs for data analysis.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.
  • Connected Tableau to Hive using JDBC/ODBC drivers.
  • Experience working with file formats such as SequenceFile, Avro, and Parquet.
  • Good knowledge of NoSQL databases, including HBase and MongoDB.
  • Experience working with the Tableau visualization tool.
  • Experience in using the Producer and Consumer APIs of Apache Kafka.
  • Experience in creating and driving large-scale ETL pipelines.
  • Extensively used Apache Flume to collect the logs and error messages across the cluster.
  • Proficient with version control tools such as GitHub and SVN.
  • Worked with MySQL, Oracle 11g, and MariaDB databases.
  • Strong knowledge of UNIX/Linux commands.
  • Strong knowledge of the Python scripting language.
  • Used Talend to import and export data between RDBMS and Hadoop.
  • Adequate knowledge of Scrum, Agile and Waterfall methodologies.

TECHNICAL SKILLS

Big Data Technologies: Apache Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Flume, Apache Oozie, Apache ZooKeeper, Cassandra.

Hadoop Distributions: Cloudera, Hortonworks.

Programming Languages: Scala, Python, Java.

Shell Scripting: Shell Script.

Build Tools: Maven, SBT.

Version Control Tools: Git, SVN.

Cloud: AWS, Azure.

Databases: MySQL, Oracle 10g/11g/12c, MariaDB.

NoSQL Databases: HBase, Cassandra.

Operating Systems: Windows 7/10, Linux (CentOS, Red Hat, Ubuntu), macOS.

Development Tools: IntelliJ IDEA, Eclipse, NetBeans.

PROFESSIONAL EXPERIENCE

Sr. Spark/Hadoop Developer

Confidential - Charlotte, NC

Responsibilities:

  • Worked on the Cloudera distribution (CDH 5.13).
  • Involved in ingesting weblog data into HDFS using Kafka.
  • Processed JSON data with Spark SQL.
  • Cleansed the data into the desired format.
  • Involved in writing Spark SQL DataFrames to Parquet files.
  • Involved in tuning Spark jobs for optimal efficiency.
  • Wrote Scala functions, procedures, constructors, and traits.
  • Created Hive tables to load the transformed data.
  • Applied partitioning and bucketing in Hive for easier data classification.
  • Involved in analyzing data by writing HiveQL queries for faster data processing.
  • Involved in using Sqoop to load data into an RDBMS.
  • Created a data pipeline using Oozie that runs on a daily basis.
  • Involved in persisting metadata into HDFS for further data processing.
  • Loaded data from Linux filesystems to HDFS and vice versa.
  • Involved in creating tables, partitioning and bucketing tables, and creating UDFs, along with fine-tuning in Hive.
  • Loaded the cleaned data into Hive tables and performed analysis based on the requirements.
  • Responsible for performing sort, join, aggregation, filter, and other transformations on the datasets.
  • Used Agile and Scrum methodologies to help manage and organize a team of developers, with regular code review sessions.

Environment: HDFS, Apache Spark, Apache Hive, Scala, Oozie, Flume, Kafka, Agile Methodology, Cloudera, Cassandra.

Spark/Hadoop Developer

Confidential - Plano, TX

Responsibilities:

  • Worked on Hortonworks HDP Enterprise.
  • Worked on large sets of structured and semi-structured data.
  • Involved in copying large volumes of data from Amazon S3 buckets to HDFS using Flume.
  • Used Spark SQL with Scala to create DataFrames and perform transformations on them.
  • Involved in working with Avro files using Spark SQL.
  • Wrote UDFs in Spark SQL using Scala.
  • Performed data aggregation operations using Spark SQL queries.
  • Configured Spark Streaming to receive data from Kafka and store the streamed data in HDFS using Scala.
  • Implemented Hive partitioning and bucketing for data analytics.
  • Worked on performance tuning in Hive.
  • Extensively used Maven as the build tool.
  • Used Git as the version control system.
  • Involved in working with Sqoop to export data from Hive to S3 buckets.
  • Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.

Environment: Apache Spark, Apache Flume, Amazon S3, Apache Sqoop, Apache Oozie, Apache Kafka, Hive, Apache.