
Hadoop And Spark Developer Resume


Charlotte, NC

SUMMARY:

  • 6+ years of relevant IT industry experience in designing, developing and implementing Big Data solutions with Hadoop, MapReduce, Pig, Hive, Sqoop, NiFi, Oozie, ZooKeeper and Flume on CDH4 and CDH5 distributions
  • Replaced existing jobs with Spark data transformations for efficient data processing and performance
  • Developed Hive UDFs (User Defined Functions) to preprocess the data for analysis.
  • Wrote Hive Queries, Pig Scripts for data analysis to meet the requirements
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Experience developing RDDs, DataFrames and SQL queries in Spark SQL
  • Worked with different flavors of Hadoop distributions which includes Cloudera and Hortonworks.
  • Good knowledge of Spark SQL and Spark using Scala (a Spark SQL UDF is sketched after this list).
  • Hands-on experience with Spark batch processing and a POC on the Kafka-Spark Streaming process
  • Experienced in using IDEs and tools such as Eclipse, NetBeans, GitHub, Maven and IntelliJ
  • Monitored and managed Hadoop cluster using the Cloudera Manager web-interface.
  • Hands-on experience with NoSQL databases like HBase and MongoDB.
  • Knowledge of Hadoop architecture and its different components such as HDFS, Job Tracker, Task Tracker, Resource Manager, Name Node, Data Node and MapReduce concepts.
  • Experience with Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
  • Implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Experience in analyzing the data using HQL, Pig Latin, HBase and custom Map Reduce programs in Java
  • Experienced with data formats such as JSON, Parquet, Avro, RC and ORC
  • Utilized Flume to analyze log files and write into HDFS.
  • Experience importing and exporting data between HDFS and RDBMS with Sqoop and migrating it according to client requirements
  • Experience with data ingestion, storage, processing and analysis of big data.
  • Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
  • Used the GitHub version control tool to push and pull updated code from the repository
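
The UDF and Spark SQL bullets above could look roughly like the following minimal Scala sketch of a preprocessing UDF applied before analysis. The table name, column name and normalization rule are illustrative assumptions, not details from an actual engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfPreprocessingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-preprocessing-sketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical cleanup rule: normalize free-text state codes before analysis.
    val normalizeState = udf((raw: String) =>
      Option(raw).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    // "customer_raw" and the "state" column are placeholder names for illustration.
    val cleaned = spark.table("customer_raw")
      .withColumn("state_code", normalizeState($"state"))

    // Expose the cleaned data to downstream Spark SQL / HiveQL queries.
    cleaned.createOrReplaceTempView("customer_clean")
    spark.sql("SELECT state_code, COUNT(*) AS cnt FROM customer_clean GROUP BY state_code").show()

    spark.stop()
  }
}
```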

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, Hive, Sqoop, Pig, HBase, Flume, Zookeeper, Oozie, Impala, Kafka.

Spark Components: Spark Core, Spark SQL (RDDs and DataFrames), Scala.

Programming Languages: SQL, Core Java, Scala, Shell, Pig Latin, HiveQL

Databases: Oracle 12c/11g/10g, MySQL, SQL Server 2014/2012/2008 R2, MongoDB, HBase

Java/J2EE Technologies: Java, J2EE, JSP, JDBC, RESTful APIs

IDES & Command Line Tools: Eclipse, NetBeans, Jenkins, IntelliJ, MavenWeb Technologies: HTML, XML, JavaScript, jQuery, AJAX, SOAP, and WSDL.

PROFESSIONAL EXPERIENCE:

Confidential, Charlotte, NC

Hadoop and Spark Developer

Responsibilities:

  • Worked with a large-scale Hadoop YARN cluster for distributed data processing and analysis using connectors, Spark Core, Spark SQL, Sqoop, Pig, Hive and NoSQL databases
  • Created partitions and buckets on Hive tables to improve process performance and speed up data access.
  • Exported data from HDFS to RDBMS with Sqoop export for the BI team to analyze and generate reports.
  • Performed coalesce and repartition on DataFrames to optimize Spark queries (see the write sketch after this list).
  • Used Snappy compression to save storage in HDFS.
  • Created RDDs, DataFrames and Datasets to process the data using Spark.
  • Used Impala for faster querying and processing of data.
  • Used Hive on Spark to run Hive queries on the Spark execution engine for better performance.
  • Worked with ORC, Parquet, Avro and JSON file formats
  • Used Hive joins across multiple tables to meet business requirements.
  • Replaced HiveQL with Spark SQL for better performance.
  • Worked with Agile Methodology.
  • Hands on experience with Azure HDInsight/Spark/Event hub
  • Worked with Kafka message queue for Spark streaming
  • Extended Spark SQL functionality by writing UDFs in Scala.
  • Migrated all existing jobs to Spark for better performance and reduced execution time.
  • Created Hive UDFs for Hive tables depending on the business requirement.
  • After processing the data with Hive and Spark, delivered it to the BI team for analysis.
  • Used Oozie for Automation of jobs in Hadoop.
  • Created NiFi flows to trigger Spark jobs, with email notifications sent on any failures.
  • Worked with NoSQL databases like HBase to load large amounts of semi-structured data coming from different sources.
  • Monitored all NiFi flows, with notifications sent when no data flowed through a flow for longer than a specified interval.
  • Exported the data using Sqoop to RDBMS and processed the data for ETL operations.
  • Imported data from different sources like HDFS and HBase into Spark RDDs
  • Developed a POC on streaming data using Apache Kafka and Spark Streaming (sketched after this list)
  • Imported and exported data into HDFS, Hive and Pig using Sqoop.
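
As referenced in the coalesce/repartition bullet above, a minimal Scala sketch of writing a partitioned, bucketed Hive table as Snappy-compressed Parquet is shown here. The database, table and column names are assumptions for illustration, and the partition and bucket counts are not tuned values.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object PartitionedWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-write-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // "transactions_stage" and its columns are placeholder names.
    val txns = spark.table("transactions_stage")

    // Repartition by the partition column to reduce small files; 200 is illustrative.
    val compacted = txns.repartition(200, txns("txn_date"))

    compacted.write
      .mode(SaveMode.Overwrite)
      .format("parquet")
      .option("compression", "snappy")   // Snappy-compressed Parquet to save HDFS storage
      .partitionBy("txn_date")           // Hive-style partitions for faster access by date
      .bucketBy(8, "account_id")         // bucketing on a join/filter key, as an example
      .sortBy("account_id")
      .saveAsTable("analytics.transactions_daily")

    spark.stop()
  }
}
```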
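
The Kafka/Spark Streaming POC bullets above could be realized in several ways; the sketch below uses Spark Structured Streaming as one possible approach and assumes the spark-sql-kafka-0-10 package is on the classpath. Broker addresses, topic name and HDFS paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object KafkaStreamingPocSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-spark-streaming-poc")
      .getOrCreate()

    // Read the topic as an unbounded stream; connection details are placeholders.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "events-topic")
      .load()
      .selectExpr("CAST(value AS STRING) AS payload")

    // Land the raw payloads on HDFS as Parquet; Hive tables can be defined over this path later.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```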

Environment: Hadoop, HDFS, Oozie, Pig, Hive, Impala, HBase, Spark, IntelliJ, Linux, Java, Hortonworks, NiFi, MapReduce, Sqoop, Shell Scripting, Apache Kafka, Scala

Confidential, California

Hadoop Developer

Responsibilities:

  • Developed automated scripts to install Hadoop clusters
  • Worked on loading and transforming of large sets of structured, semi structured and unstructured data
  • Worked on User Defined Functions in Hive to load data from HDFS and run aggregation functions over multiple rows.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Used Bzip2 compression technique to compress the files before loading it to Hive.
  • Used Zookeeper to manage Hadoop clusters and Oozie to schedule job workflows
  • Used Sqoop to export the analyzed data to relational database for analysis by data analytics team.
  • Implemented the workflows using Apache Oozie framework to automate tasks
  • Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
  • Used Impala for faster processing of Data.
  • Good experience developing Hive DDLs to create, alter and drop Hive tables
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Responsible for building scalable distributed data pipelines using Hadoop.
  • Assisted in exporting analyzed data to relational databases using Sqoop
  • Implemented partitioning and bucketing on Hive tables to meet business requirements
  • Loaded and transformed large structured and semi-structured datasets in different file formats such as Avro, Parquet and SequenceFile (see the sketch after this list)
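
A minimal Scala sketch of loading the file formats named in the last bullet is shown below. Paths are placeholders, and reading Avro by the short name "avro" assumes the spark-avro module is available (older Spark versions use the com.databricks.spark.avro format name instead).

```scala
import org.apache.spark.sql.SparkSession

object FileFormatLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("file-format-load-sketch")
      .getOrCreate()

    // Paths below are placeholders for illustration.
    val parquetDf = spark.read.parquet("hdfs:///data/raw/parquet/")
    val avroDf    = spark.read.format("avro").load("hdfs:///data/raw/avro/")

    // SequenceFiles are key/value pairs, so they come in through the RDD API first.
    val seqValues = spark.sparkContext
      .sequenceFile[String, String]("hdfs:///data/raw/seq/")
      .map { case (_, value) => value }

    println(s"parquet=${parquetDf.count()}, avro=${avroDf.count()}, seq=${seqValues.count()}")
    spark.stop()
  }
}
```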

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Spark, Scala, HBase, Impala, Eclipse.

Confidential

SQL Developer/ETL Developer

Responsibilities:

  • Developed ER diagrams/ Dimensional diagrams for business requirements using ERWIN.
  • Involved in the data analysis and data discrepancy reduction for the source and target schemas.
  • Developed SQL scripts to perform nested queries, joins, subqueries, and insert, update and delete operations on MySQL database tables.
  • Developed generic ETL framework components using Column Import and Column Export stages with RCP enabled
  • Developed and implemented stored procedures, packages and triggers.
  • Extracted data from XML files into SQL tables and produced reports from the data files using SQL Server 2008.
  • Built data connections to the database using MS SQL Server.
  • Wrote advanced SQL queries, procedures, cursors and triggers.

Environment: MySQL, SQL Server 2008 (SSRS & SSIS), PL/SQL, Visual Studio 2000/2005

Confidential

Java Developer

Responsibilities:

  • Developed the application using Struts Framework that leverages classical Model View Controller (MVC) architecture.
  • Designed the user interfaces using JSPs, developed custom tags, and used JSTL Taglib.
  • Developed various Java business classes for handling different functions.
  • Developed controller classes using Struts and the Tiles API
  • Involved in documentation and use case design using UML modeling, including development of Class diagrams, Sequence diagrams, and Use Case diagrams.
  • Participated in design and code reviews
  • Developed the user interface using AJAX in JSP and performed client-side validation
  • Developed JUnit test cases for all the developed modules. Used SVN as version control

Environment: Java, J2EE, JSP, Struts 1.x, JNDI, DB2, HTML, XML, DOM, SAX, ANT, AJAX, Rational Rose, Eclipse Indigo 3.5, SOAP, Apache Tomcat, Oracle 10g, LOG4J, SVN.
