Spark And Hadoop Developer Resume
Malvern, PA
PROFESSIONAL SUMMARY:
- 6+ years of professional experience in IT, including around 4 years of comprehensive experience as a Hadoop and Spark Developer working with related technologies.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, NameNode, DataNode, MapReduce, Spark, and Spark SQL.
- Experience in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Exposure to Apache Kafka for building data pipelines that move logs as streams of messages using producers and consumers.
- Knowledge of managing Kafka clusters; created several topologies to support real-time processing requirements.
- Experience converting queries over various file formats into Spark transformations using DataFrames and Datasets.
- Experience developing SQL scripts in Spark to handle different data sets and comparing their performance against equivalent MapReduce jobs.
- Experience in creating Kafka producers and consumers for Spark Streaming.
- Exposure to Spark, Spark Streaming, and Scala, including creating and handling DataFrames in Spark with Scala.
- In-depth understanding of Spark architecture, including Spark SQL, DataFrames, and Spark Streaming.
- Extensive experience importing and exporting data using stream processing platforms like Flume and Kafka.
- Implemented Sqoop for large dataset transfer between Hadoop and RDBMS.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving the results to output directories in HDFS (a minimal sketch follows this list).
- Worked with AWS to migrate entire data centers to the cloud using EC2, S3, and EMR.
- Experience in importing and exporting multi-terabyte data sets between HDFS and Relational Database Systems (RDBMS) using Sqoop.
- Experience in writing shell scripts to dump shared data from MySQL and Oracle servers to HDFS.
- Highly experienced in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
- Good experience working with Amazon AWS for setting up Hadoop clusters.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, ZooKeeper, and Flume.
- Worked on custom Spark transformations and a variety of data formats such as JSON, compressed CSV, ORC, and Avro, reading data from sources like HBase and Hive.
- Performed map-side joins on RDDs.
- Good understanding of Amazon Web Services such as Elastic MapReduce (EMR) and EC2.
- Experience in migrating ETL operations from Hive to Spark.
- Gained hands on experience in writing shell scripts in UNIX.
- Experienced in processing different file formats such as Avro, XML, JSON, and SequenceFile using Spark.
- Experience in implementing Spark jobs using Scala and Spark SQL for faster analysis and processing of data.
- Good understanding of configuring simple to complex workflows using Oozie.
- Good understanding of NoSQL databases like MongoDB and HBase.
- Worked on different operating systems like UNIX/Linux, Windows.
- Worked on managing VMs for Cloudera and Hortonworks distributions.
- Experience as a Java Developer in client/server technologies using JSP, JDBC, and SQL.
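A minimal sketch of the Hive-to-Spark conversion work described above, assuming a hypothetical Hive table web_logs with status and bytes columns; the table name, column names, and output path are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()          // lets Spark SQL read existing Hive tables
      .getOrCreate()

    // Equivalent of:
    //   SELECT status, COUNT(*) AS hits, SUM(bytes) AS total_bytes
    //   FROM web_logs GROUP BY status
    val summary = spark.table("web_logs")          // hypothetical Hive table
      .groupBy("status")
      .agg(count(lit(1)).as("hits"), sum("bytes").as("total_bytes"))

    // Write the result back to HDFS as Parquet (path is illustrative)
    summary.write.mode("overwrite").parquet("/user/etl/output/status_summary")

    spark.stop()
  }
}
```

Expressing the query through the DataFrame API lets Spark's Catalyst optimizer plan the job, which is the usual motivation for moving Hive/SQL workloads onto Spark.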
TECHNICAL SKILLS:
Big Data/Hadoop Ecosystem: HDFS, MapReduce, Hive, HBase, Sqoop, Flume, Oozie, Spark, Hue, Impala, Kafka, Spark DataFrames, Spark SQL.
Hadoop Distribution Platforms: Cloudera (CDH4/CDH5), Hortonworks.
Programming Languages: Core Java, Scala, SQL, Linux shell scripting.
Application Servers: Tomcat, WebLogic
Databases: MySQL, Oracle; NoSQL: HBase.
IDEs and Tools: Eclipse, IntelliJ, PuTTY, MobaXterm, Tableau.
Operating Systems: CentOS, Windows (7/8/10), Ubuntu, UNIX
PROFESSIONAL EXPERIENCE:
Confidential, Malvern, PA
Spark and Hadoop Developer
Responsibilities:
- Worked on Spark Streaming with Apache Kafka for real-time data processing.
- Used Kafka to ingest data into the Spark engine.
- Configured Spark Streaming to receive continuous data from Kafka and store the streamed data in HDFS (see the sketch after this list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, DataFrames, Spark SQL, and Scala.
- Performed advanced procedures such as text analytics and processing, leveraging the in-memory computing capabilities of Spark with Scala.
- Developed Scala and Spark SQL code to extract data from various databases.
- Used Spark SQL to process large amounts of structured data and implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Extensively worked with unstructured, semi-structured, and structured data.
- Developed multiple Kafka producers and consumers as per the software requirement specifications.
- Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
- Used Kafka extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
- Developed a robust code base that is tested, automated, well structured, and efficient.
- Performed map-side joins using RDDs, Spark SQL, and DataFrames.
- Migrated the required data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS using Sqoop.
- Implemented the data backup strategies for the data in the HDFS cluster.
- Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Imported the data from relational databases into HDFS using Sqoop.
- Understanding of data storage and retrieval techniques, ETL, and databases, including relational databases, tuple stores, Hadoop, Hue, MySQL, and Oracle.
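A minimal sketch of the Kafka-to-HDFS streaming flow referenced in this list, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, batch interval, and output path are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.KafkaUtils

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfsStream")
    val ssc  = new StreamingContext(conf, Seconds(10))     // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",              // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "log-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("app-logs"), kafkaParams))

    // Keep only the message payloads and append each micro-batch to HDFS
    stream.map(record => record.value)
      .saveAsTextFiles("hdfs:///data/streams/app-logs/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```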
Environment: Apache Spark, Cloudera (CDH 5.12), Scala, Spark SQL, DataFrames, Kafka, SBT, Hue, Sqoop, ZooKeeper, HDFS.
Confidential, Littleton, CO
Hadoop Developer
Responsibilities:
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala (a sketch follows this list).
- Developed an analytical component using Scala and Spark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Streamed data in real time using Spark with Flume and stored the streamed data in HDFS using Scala.
- Installed, configured, and troubleshot Hadoop within the Cloudera ecosystem, including JobTracker nodes, TaskTracker (data) nodes, MapReduce, Spark, Hive, HBase, Sqoop, and Oozie.
- Involved in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig, and Hive programs.
- Extensively used Sqoop to get data from RDBMS sources like Teradata.
- Worked on reading multiple data formats on HDFS using Scala.
- Used Cloudera Manager to monitor and manage Hadoop Cluster.
- Deployed a multi-node Hadoop cluster.
- Worked extensively with Sqoop to import metadata from Oracle.
- Extensively used HiveQL queries to search for strings in Hive tables stored in HDFS.
- Wrote Hive jobs to parse logs and structure them in a tabular format to facilitate effective querying of the log data.
- Created HBase tables to store data in various formats coming from different sources.
- Used Impala to query the Hadoop data stored in HDFS.
- Worked on streaming log data from web servers into HDFS using Flume.
- Responsible for importing data (log files) from various sources into HDFS using Flume.
- Created Hive tables on JSON data and ran Hive queries in Hue and the CLI.
- Implemented a POC using Apache Impala for data processing on top of Hive.
- Migrated ETL jobs to Spark to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Implemented a Flume and Spark Streaming framework for real-time data processing.
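A minimal sketch of the MapReduce-to-Spark conversion mentioned above: a classic map/reduce count rewritten as Spark RDD transformations in Scala. The input path, log layout, and output path are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogCountByLevel {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogCountByLevel"))

    // Equivalent of a MapReduce job that maps each log line to (level, 1)
    // and sums the counts in the reducer.
    val counts = sc.textFile("hdfs:///data/raw/app-logs/*")
      .map(_.split("\\s+"))
      .filter(_.length > 2)
      .map(fields => (fields(2), 1L))        // assumes the third field is the log level
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/processed/log-level-counts")
    sc.stop()
  }
}
```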
Environment: UNIX, HDFS, Hive, Spark, Scala, Flume, Sqoop, HBase, Zookeeper, CDH 5.4.
Confidential, Burns Harbor, IN
Hadoop Developer
Responsibilities:
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components like Hive and HBase.
- Handled data exchange between HDFS and RDBMS using Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop Cloudera Distribution.
- Developed several advanced MapReduce programs to process incoming data files.
- Developed Hive scripts and Hive UDFs to load data files into Hadoop (see the UDF sketch after this list).
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Imported data from various sources such as HDFS and HBase into Hive.
- Extensively worked on Hive for ETL Transformations and optimized Hive Queries.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Created tables and partitioned tables in Impala, ran Hive queries in Hue, created tables in HBase, and created folders in HDFS.
- Imported and exported data between MySQL/Oracle and Hive.
- Imported and exported data between MySQL/Oracle and HDFS.
- Wrote Impala queries.
- Installed Oozie workflow engine to run multiple Hive jobs.
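A minimal sketch of a Hive UDF written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API, in the spirit of the UDF work mentioned above; the class name and the normalization logic are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: trims and upper-cases a raw code column before it is loaded into Hive tables.
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

Packaged into a jar, a function like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from the load queries.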
Environment: CDH 5.9, HDFS, Hive, Hue, Sqoop, Impala, HBase, Oozie, Flume, Linux.
Confidential
Java Developer
Responsibilities:
- Involved in Analysis, Design, Development, Integration and Testing of application modules.
- Involved in developing a custom framework, similar to the Spring Framework, with additional features to meet business needs.
- Performed requirement analysis, design, coding and implementation, team coordination, code review, testing, and installation.
- Developed server-side utilities using Java technologies such as Servlets and JSP.
- Developed presentation layers using JSP custom tags and JavaScript.
- Implemented design patterns: Business Delegate, Singleton, Flow Controller, DAO, and Value Object.
- Developed Role Based Access Control to restrict the users to access specific modules based on their roles.
- Used Oracle as the back-end database and the Hibernate Framework for ORM.
- Deployed the application on WebSphere server using Eclipse as the IDE.
- Used Tomcat server 5.5 and configured it with Eclipse IDE.
- Performed extensive Unit Testing for the application.
- Responsible for the design and development of web pages using HTML and CSS, including Ajax controls and XML.
Environment: WebSphere 5.1, Tomcat 5.0, Oracle 9i, Hibernate 3.0, Eclipse 3.2, JSP, JavaScript, Servlets, XML, JUnit, Spring, Tomcat plug-ins.
Confidential
Jr Java Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Implemented the database using SQL Server.
- Designed tables and indexes.
- Wrote complex SQL queries and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Developed user and technical documentation.
Environment: Java, JSP, Servlets, JDBC, HTML, JavaScript, MySQL, JUnit, Eclipse IDE.
