
Hadoop Spark Developer Resume


Irvine, CA

SUMMARY

  • 7+ years of experience in analysis, architecture, design, development, testing, maintenance, and user training of software applications, including 3+ years in Big Data, Hadoop, and HDFS environments, along with experience in Java/J2EE.
  • Hands-on experience with Hadoop (HDFS, MapReduce, Pig, Hive, Sqoop, etc.).
  • Hands-on experience with Spark (1.5, 1.6) and Scala as a full-stack developer.
  • Seasoned Hadoop/Spark/Scala/Java developer with experience in object-oriented programming.
  • Experience in installing, configuring, and testing Hadoop ecosystem components on Hadoop clusters using major Hadoop distributions: Cloudera (CDH3, CDH4, and CDH5) and Hortonworks.
  • Experienced in building highly scalable Big Data solutions on Hadoop across multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
  • Experience in analyzing data using HiveQL and Pig Latin and in writing custom MapReduce programs in Java and Python.
  • Experienced in converting HiveQL queries into Spark transformations using Spark RDDs and Scala (see the sketch following this list).
  • Hands-on experience with Apache Sqoop, Apache Storm, and Apache Hive integration.
  • Hands-on experience with different file formats, such as TextFile, JSON, Avro, and ORC, for Hive querying and processing.
  • Experience in data modeling using star and snowflake schemas; also worked with metadata.
  • Experience in the AWS cloud environment, including S3 storage and EC2 instances.
  • Experience with Apache Kafka as a message broker and for log aggregation and stream processing.
  • Expertise in migrating data from different databases (Oracle, DB2, Teradata) to HDFS.
  • Experience in designing and coding web applications using Core Java and web technologies (JSP, Servlets, JDBC), with a solid understanding of the J2EE technology stack, including frameworks such as Spring and ORM frameworks (Hibernate).
  • Good interpersonal and communication skills, strong problem-solving ability, and sound analytical judgment.
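
As a minimal illustration of the HiveQL-to-Spark conversion mentioned above: the sketch below replaces a hypothetical GROUP BY query with equivalent RDD transformations. It is shown with Spark's Java API (Java 8 lambdas) to match the other examples in this resume, while the original work used Scala; the input path, delimiter, and column layout are illustrative assumptions.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class ClickCounts {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("ClickCounts");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // RDD equivalent of the HiveQL:
            //   SELECT page, COUNT(*) FROM clicks GROUP BY page
            JavaRDD<String> lines = sc.textFile("hdfs:///data/clicks"); // hypothetical path
            JavaPairRDD<String, Long> counts = lines
                .mapToPair(line -> new Tuple2<>(line.split("\t")[0], 1L)) // first column is the page
                .reduceByKey(Long::sum);

            counts.saveAsTextFile("hdfs:///data/click_counts");
            sc.stop();
        }
    }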

TECHNICAL SKILLS

Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Storm, ZooKeeper, Kafka, Impala, HCatalog, Apache Spark, Spark Streaming, Spark SQL, HBase, Cassandra, AWS, Hortonworks, Cloudera

Web technologies: JSP, Servlets, JDBC, JavaScript, CSS

Application Servers: IBM WebSphere, Tomcat

Development and BI Tools: TOAD, Visio, Rational Rose, Endure, Informatica 9.1

Databases: Oracle 9i/10g, MySQL 4.x/5.x, HBase, NoSQL

Programming Languages: Java (JDK 5/JDK 6), C/C++, Python, Scala, HTML, SQL

Operating Systems: UNIX, Linux, Windows, Mac OS X

Development Methodologies: Agile (Scrum), Hybrid

PROFESSIONAL EXPERIENCE

Confidential, Irvine, CA

Hadoop Spark Developer

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
  • Analyzed the Hadoop cluster and various big data analytics and processing tools, including Pig, Hive, Spark, and Spark Streaming.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Migrated various Hive UDFs and queries into Spark SQL for faster processing.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and exported data from HDFS to MySQL using Sqoop.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala (see the sketch following this list).
  • Hands-on experience with Spark and Spark Streaming: creating RDDs and applying operations (transformations and actions).
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Used Apache Kafka for log aggregation.
  • Worked with different data formats, such as flat files, ORC, Avro, and JSON.
  • Developed Talend jobs for reading log files.
  • Involved in implementing a Cassandra cluster to address HBase limitations.
  • Provided design recommendations and thought leadership to sponsors/stakeholders, improving review processes and resolving technical problems.
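
A minimal sketch of the Kafka-to-HDFS pipeline described above, using the Spark 1.x receiver-based Kafka integration (spark-streaming-kafka). It is written against Spark's Java API, while the original work used Scala; the ZooKeeper quorum, consumer group, topic name, and output path are hypothetical.

    import java.util.Collections;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;
    import scala.Tuple2;

    public class KafkaToHdfs {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("KafkaToHdfs");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            // One receiver thread on a hypothetical "events" topic
            Map<String, Integer> topics = Collections.singletonMap("events", 1);
            JavaPairReceiverInputDStream<String, String> stream =
                KafkaUtils.createStream(jssc, "zk1:2181", "hdfs-sink", topics);

            // Persist each non-empty micro-batch to HDFS
            stream.map(Tuple2::_2) // keep only the message value
                  .foreachRDD(rdd -> {
                      if (!rdd.isEmpty()) {
                          rdd.saveAsTextFile("hdfs:///data/events/" + System.currentTimeMillis());
                      }
                  });

            jssc.start();
            jssc.awaitTermination();
        }
    }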

Environment: MapReduce, HDFS, Hive, Pig, Spark, Spark Streaming, Spark SQL, Apache Kafka, Sqoop, Java, Scala, CDH4, CDH5, AWS, Eclipse, Oracle, Git, Shell Scripting, and Cassandra.

Confidential, Fort Worth, TX

Hadoop Developer

Responsibilities:

  • Involved in various stages of the Software Development Life Cycle (SDLC) deliverables of the project, using the Agile software development methodology.
  • Worked on importing data from various sources and performed transformations using Cloudera MapReduce and Hive to load data into HDFS.
  • Loaded data from different data sources (Teradata, DB2, Oracle, and flat files) into HDFS using Sqoop, then into partitioned Hive tables.
  • Created various Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project business flow.
  • Integrated Apache Sqoop, Apache Storm, and Apache Hive as part of the project implementation.
  • Used Apache Storm to build real-time data integration systems that analyze, clean, normalize, and resolve large amounts of non-unique data points with low latency and high throughput.
  • Worked on log files using Apache Storm.
  • Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes.
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
  • Developed scripts to pull log files from an FTP server and process them for loading into Hive tables.
  • Developed Hive UDFs in Java (see the sketch following this list).
  • Created statistics from logs and extracted useful information from those statistics in real time using Apache Storm.
  • Moved data from HDFS to HBase using MapReduce and a bulk output format class.
  • Implemented rack topology scripts for the Hadoop cluster.
  • Developed a helper class abstracting HBase cluster connections to act as a core toolkit.
  • Participated in day-to-day meetings and status meetings, communicating effectively with team members.
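
A minimal sketch of a Hive UDF in Java, as described above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the function itself (stripping non-digit characters) is a hypothetical example, not one of the project's actual UDFs.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    /** Strips everything but digits, e.g. "(817) 555-0199" -> "8175550199". */
    public final class DigitsOnly extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null; // let Hive propagate NULL
            }
            return new Text(input.toString().replaceAll("[^0-9]", ""));
        }
    }

After packaging into a JAR, such a UDF would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.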

Environment: MapReduce, HDFS, Hive, Pig, HBase, Apache Storm, HDP, Sqoop, Java, Eclipse, Oracle, Linux, Shell Scripting, Maven, Git.

Confidential, Denver, CO

Hadoop Developer

Responsibilities:

  • Responsible for architecting Hadoop clusters with CDH3.
  • Extensively involved in installation and configuration of Cloudera's Hadoop distribution (CDH2, CDH3): NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
  • Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
  • Managed and reviewed Hadoop log files.
  • Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation (see the sketch following this list).
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Designed a data warehouse using Hive. Created partitioned tables in Hive.
  • Mentored analyst and test teams in writing Hive queries.
  • Installed and configured Hive, and wrote Hive UDFs.
  • Involved in HDFS maintenance and in administering it through the web UI and the Hadoop Java API.
  • Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Created HBase tables to store various data formats of PII data coming from different portfolios.
  • Extensively used Pig for data cleansing.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
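
A minimal sketch of the kind of MapReduce job described above, counting HTTP status codes across web server logs. It is written against the MRv2 Job API for brevity (CDH3-era code would use the older new Job(conf, name) constructor), and the access-log column layout is an assumption.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class StatusCodeCount {

        public static class StatusMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text code = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(" ");
                if (fields.length > 8) {   // assumes Apache combined-log layout
                    code.set(fields[8]);   // HTTP status code column
                    ctx.write(code, ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                ctx.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "status-code-count");
            job.setJarByClass(StatusCodeCount.class);
            job.setMapperClass(StatusMapper.class);
            job.setCombinerClass(SumReducer.class); // safe: sum is associative
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }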

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Oozie, Java (JDK 1.6), Oracle 11g/10g, PL/SQL, SQL*Plus, Windows NT, UNIX Shell Scripting.

Confidential

Java Developer

Responsibilities:

  • Involved in various stages of enhancements to the application, performing the required analysis, development, and testing.
  • Prepared the high-level and low-level design documents and worked on digital signature generation.
  • Developed logic and code for the registration and validation of enrolling customers.
  • Extensively used Java multithreading to download files from URLs (see the sketch following this list).
  • Extensively used the Eclipse IDE for developing, debugging, integrating, and deploying the application.
  • Developed web-based user interfaces using J2EE technologies.
  • Handled client-side validations using JavaScript.
  • Used a validation framework for server-side validations.
  • Created test cases for unit and integration testing.
  • Integrated the front end with an Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.
  • Developed the required stored procedures and database functions using PL/SQL.
  • Developed, tested, and debugged various components on WebLogic Application Server.
  • Used XML and XSL for data presentation, report generation, and customer feedback documents.
  • Implemented a logging framework using Log4j.
  • Involved in code reviews and documentation reviews of technical artifacts.
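
A minimal sketch of the multithreaded download pattern described above, kept JDK 5/6-compatible (anonymous Runnable, no lambdas); the URLs, output file naming, and pool size are hypothetical.

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.net.URL;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelDownloader {
        public static void main(String[] args) throws Exception {
            // Hypothetical list of files to fetch
            List<String> urls = Arrays.asList(
                "http://example.com/a.pdf",
                "http://example.com/b.pdf");

            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (final String url : urls) {
                pool.submit(new Runnable() {
                    public void run() {
                        download(url);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
        }

        // Streams one URL to a local file named after the last path segment
        static void download(String url) {
            String name = url.substring(url.lastIndexOf('/') + 1);
            try {
                InputStream in = new URL(url).openStream();
                FileOutputStream out = new FileOutputStream(name);
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                out.close();
                in.close();
            } catch (Exception e) {
                System.err.println("Failed to download " + url + ": " + e);
            }
        }
    }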

Environment: Java Servlets, JSP, JavaScript, XML, HTML, UML, Apache Tomcat, Eclipse, JDBC, Oracle 11g and other basic office tools.
