
Sr. Spark/Hadoop Developer Resume


Charlotte, NC

SUMMARY:

  • 7+ years of IT experience, including 4 years of Hadoop/Big Data experience and 3 years of Java programming, covering the entire Software Development Life Cycle: design, development, implementation, testing, and maintenance of various web-based applications using Java and J2EE technologies.
  • Experience working with the Cloudera, Hortonworks, and Amazon EMR Hadoop distributions.
  • Experience in dealing with large data sets and making performance improvements.
  • Experience in implementing Spark integrated with the Hadoop ecosystem.
  • Experience in using Spark RDDs for parallel processing of datasets from HDFS, MySQL, and other data sources.
  • Experience in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a minimal sketch follows this list).
  • Experience in using different build tools such as SBT and Maven.
  • Implemented Spark Streaming for fast data processing.
  • Experience in designing and developing applications in Spark using Scala.
  • Skilled in integrating Kafka with Spark Streaming for high-speed data processing.
  • Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
  • Experience in data cleansing using Spark map and filter functions.
  • Implemented a POC to migrate MapReduce programs into Spark RDD transformations and actions to improve performance.
  • Experience in developing and debugging Hive queries.
  • Experience in performing read and write operations on the HDFS filesystem.
  • Experience working on EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
  • Good experience in importing and exporting data to Hive and HDFS with Sqoop.
  • Experience in creating Hive tables and loading data from different file formats.
  • Experience in processing data using Hive HQL for data analytics.
  • Extended Hive core functionality by writing UDFs for data analysis.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Worked on Tableau with Hive using JDBC/ODBC drivers.
  • Experience in dealing with different file formats such as SequenceFile, Avro, and Parquet.
  • Good knowledge of the NoSQL databases HBase and MongoDB.
  • Experience in working with the Tableau visualization tool.
  • Experience in using the Producer and Consumer APIs of Apache Kafka.
  • Experience in creating and driving large-scale ETL pipelines.
  • Extensively used Apache Flume to collect logs and error messages across the cluster.
  • Proficient with version control systems such as GitHub and SVN.
  • Worked with MySQL, Oracle 11g, and MariaDB databases.
  • Strong knowledge of UNIX/Linux commands.
  • Strong knowledge of the Python scripting language.
  • Worked on Talend to import/export data between RDBMS and Hadoop.
  • Adequate knowledge of Scrum, Agile, and Waterfall methodologies.
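For illustration, below is a minimal Scala sketch of the kind of Hive/SQL-to-Spark conversion referenced above: a GROUP BY/SUM query expressed as RDD transformations. The input path, CSV layout, and column positions are hypothetical placeholders, not the actual migrated queries.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HiveQueryAsRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveQueryAsRdd"))

    // Equivalent of: SELECT category, SUM(amount) FROM sales GROUP BY category
    // Input lines are assumed to be CSV: order_id,category,amount (hypothetical layout)
    val totals = sc.textFile("hdfs:///data/sales/*.csv")
      .map(_.split(","))
      .filter(_.length == 3)                     // drop malformed rows
      .map(cols => (cols(1), cols(2).toDouble))  // (category, amount)
      .reduceByKey(_ + _)                        // GROUP BY category, SUM(amount)

    totals.saveAsTextFile("hdfs:///output/sales_by_category")
    sc.stop()
  }
}
```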

TECHNICAL SKILLS:

Big Data Technologies: Apache Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Spark, HDFS, Apache Kafka, Apache Flume, Apache Oozie, Apache Zookeeper, Cassandra.

Hadoop Distributions: Cloudera, Hortonworks.

Programming Languages: Scala, Python, Java.

Shell Scripting: UNIX/Linux shell scripts.

Build Tools: Maven, SBT.

Version Control Tools: Git, SVN.

Cloud: AWS, Azure.

Databases: MySQL, Oracle 10g/11g/12c, MariaDB.

NOSQL Databases: HBase, Cassandra.

Operating Systems: Windows 7/10, Linux (CentOS, Red Hat, Ubuntu), macOS.

Development Tools: IntelliJ IDEA, Eclipse, NetBeans.

WORK EXPERIENCE:

Sr. Spark/Hadoop Developer

Confidential - Charlotte, NC

Responsibilities:

  • Worked with the Cloudera distribution, CDH 5.13.
  • Ingested weblog data into HDFS using Kafka.
  • Processed JSON data with Spark SQL.
  • Cleansed the data into the desired format.
  • Wrote Spark SQL DataFrames out as Parquet files (see the sketch after this list).
  • Tuned Spark jobs for optimal efficiency.
  • Wrote Scala functions, procedures, constructors, and traits.
  • Created Hive tables to load the transformed data.
  • Performed partitioning and bucketing in Hive for easy data classification.
  • Analyzed data by writing HiveQL queries for faster data processing.
  • Worked with Sqoop to load data into an RDBMS.
  • Created a data pipeline using Oozie that runs on a daily basis.
  • Persisted metadata into HDFS for further data processing.
  • Loaded data from Linux file systems to HDFS and vice versa.
  • Created tables with partitioning and bucketing, wrote UDFs, and performed fine-tuning in Hive.
  • Loaded the cleansed data into Hive tables and performed analysis based on the requirements.
  • Performed sort, join, aggregation, filter, and other transformations on the datasets.
  • Utilized Agile and Scrum methodology to help manage and organize a team of developers, with regular code review sessions.
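A minimal Scala sketch of the JSON-to-Parquet flow described above, assuming Spark 2.x on CDH 5.13; the HDFS paths and column names (userId, url, requestId) are hypothetical placeholders for the actual weblog schema.

```scala
import org.apache.spark.sql.SparkSession

object WeblogJsonToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WeblogJsonToParquet")
      .enableHiveSupport()
      .getOrCreate()

    // Read the ingested JSON weblog data from HDFS
    val weblogs = spark.read.json("hdfs:///data/weblogs/raw")

    // Basic cleansing: keep well-formed records and drop duplicates
    val cleaned = weblogs
      .filter(weblogs("userId").isNotNull && weblogs("url").isNotNull)
      .dropDuplicates("requestId")

    // Persist the cleansed DataFrame as Parquet for downstream Hive tables
    cleaned.write.mode("overwrite").parquet("hdfs:///data/weblogs/cleaned")

    spark.stop()
  }
}
```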

Environment: HDFS, Apache Spark, Apache Hive, Scala, Oozie, Flume, Kafka, Agile Methodology, Cloudera, Cassandra.

Spark/Hadoop Developer

Confidential -Plano, TX

Responsibilities:

  • Worked with the Hortonworks HDP Enterprise distribution.
  • Worked on large sets of structured and semi-structured data.
  • Copied large volumes of data from Amazon S3 buckets to HDFS using Flume.
  • Used Spark SQL with Scala to create DataFrames and performed transformations on them.
  • Worked with Avro files using Spark SQL.
  • Wrote UDFs in Spark SQL using Scala.
  • Performed data aggregation operations using Spark SQL queries.
  • Configured Spark Streaming to receive data from Kafka and store the streamed data to HDFS using Scala (see the sketch after this list).
  • Implemented Hive partitioning and bucketing for data analytics.
  • Worked on performance and tuning operations in Hive.
  • Extensively used the Maven build tool.
  • Used Git as the version control system.
  • Worked with Sqoop to export data from Hive to S3 buckets.
  • Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
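One way the Kafka-to-HDFS streaming job described above could be wired up, sketched here with Spark Structured Streaming in Scala (the DStream API is an equally valid alternative); the broker address, topic name, and HDFS paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaToHdfsStream")
      .getOrCreate()

    // Subscribe to the Kafka topic (broker address and topic name are placeholders)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Continuously append the streamed records to HDFS as Parquet
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streams/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```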

Environment: Apache Spark, Apache Flume, Amazon S3, Apache Sqoop, Apache Oozie, Apache Kafka, Hive.

Hadoop Developer

Confidential - San Francisco, CA

Responsibilities:

  • Used Flume as a data pipeline system to ingest unstructured events from various web servers into HDFS.
  • Altered the unstructured events from web servers on the fly using various Flume interceptors.
  • Wrote various Spark transformations in Scala to perform data cleansing, validation, and summarization on user behavioral data (a minimal sketch follows this list).
  • Parsed the unstructured data into a semi-structured format by writing complex algorithms in Spark.
  • Developed a generic parser to transform any format of unstructured data into a consistent data model.
  • Configured Flume with Spark Streaming to transfer data from web servers into HDFS at regular intervals for processing.
  • Persisted frequently used transformed DataFrames for faster processing.
  • Built Hive tables on the transformed data and used different SerDes to store the data in HDFS in different formats.
  • Loaded the transformed data into the Hive tables and performed analysis based on the requirements.
  • Implemented partitioning on the Hive data to improve processing performance.
  • Analyzed the data by performing Hive queries (HiveQL) to study customer behavior.
  • Created Pig Latin scripts to sort, group, join, and filter the data during transformation.
  • Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Exported the analyzed data to relational databases using Sqoop, so the BI team could visualize it and generate reports.
  • Implemented a custom workflow to automate the jobs on a daily basis.
  • Created custom workflows to automate Sqoop jobs on weekly and monthly schedules.
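A minimal Scala/RDD sketch of the cleansing, validation, and summarization pass on user behavioral data mentioned above; the tab-separated event layout and HDFS paths are assumptions for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BehaviorSummary {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BehaviorSummary"))

    // Raw events delivered by Flume, one tab-separated line per event:
    // timestamp \t userId \t action  (hypothetical layout)
    val raw = sc.textFile("hdfs:///data/behavior/raw")

    // Validation: keep only well-formed events with a non-empty user id
    val events = raw
      .map(_.split("\t"))
      .filter(fields => fields.length == 3 && fields(1).nonEmpty)
      .map(fields => (fields(1), fields(2)))  // (userId, action)

    // Summarization: number of recorded actions per user
    val actionsPerUser = events
      .mapValues(_ => 1L)
      .reduceByKey(_ + _)

    actionsPerUser.saveAsTextFile("hdfs:///data/behavior/summary")
    sc.stop()
  }
}
```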

Environment: HDFS, Hive, Sqoop, Flume, Spark, Scala, MapReduce, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology, Cloudera.

Java Developer

Confidential

Responsibilities:

  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
  • Used Struts tag libraries in the JSP pages.
  • Worked with JDBC and Hibernate.
  • Used SVN as the version control system.
  • Developed Web Services using XML messages that use SOAP.
  • Configured Development Environment using Tomcat and Apache Web Server.
  • Developed and maintained complex outbound notification applications running on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss, and web services.
  • Worked with complex SQL queries, functions, and stored procedures.
  • Developed Test Scripts using JUnit and JMockit.
  • Worked with ANT and Maven to develop build scripts.
  • Worked with Hibernate, JDBC to handle data needs.

Environment: Java, J2EE, XML, Oracle 11g, MySQL, Apache Tomcat.
