Spark/Hadoop Developer Resume
SUMMARY
- Overall 8+ years of experience in all phases of software application development, including requirement analysis, design, development, and maintenance of Hadoop/Big Data applications and web applications using Java/J2EE technologies.
- 3+ years of hands-on experience with Big Data ecosystems, including Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Flume, Oozie, and ZooKeeper, across a range of industries.
- Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, as well as the MapReduce programming paradigm.
- Experience in writing Hive queries for processing and analyzing large volumes of data.
- Experience in importing and exporting data between relational database systems and HDFS using Sqoop.
- Developed Oozie workflows that integrate all tasks related to a project and scheduled the jobs per requirements.
- Automated the jobs that pull data from upstream servers into Hive tables using Oozie workflows.
- Implemented several MapReduce optimizations, such as combiners, distributed cache, data compression, and custom partitioners, to speed up jobs.
- Used HBase alongside Hive when real-time, low-latency queries were required.
- Solid knowledge of Spark architecture and real-time streaming with Spark.
- Extensively used Spark SQL and the PySpark and Scala APIs for querying and transforming data residing in Hive (see the sketch after this list).
- Good knowledge of Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS, and VPC.
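A minimal sketch of the Spark SQL work described above, assuming a Hive-enabled SparkSession; the `sales` table and its columns are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hive-enabled session; the table and column names below are hypothetical.
val spark = SparkSession.builder()
  .appName("HiveQuerySketch")
  .enableHiveSupport()
  .getOrCreate()

// Read an existing Hive table and apply a simple aggregation.
val sales = spark.table("sales")
val dailyTotals = sales
  .filter(col("amount") > 0)
  .groupBy(col("order_date"))
  .agg(sum("amount").alias("total_amount"))

// Persist the result back to Hive for downstream consumers.
dailyTotals.write.mode("overwrite").saveAsTable("sales_daily_totals")
```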
TECHNICAL SKILLS
Hadoop Ecosystem: Hadoop, MapReduce, HDFS, Kafka, Hive, Pig, Sqoop, Impala, Oozie, Flume, YARN, ZooKeeper, HBase.
Spark Components: Spark Core, Spark SQL, Spark Streaming, PySpark.
AWS Cloud Services: S3, EBS, EC2, VPC, Redshift, EMR
Programming Languages: Java, Scala, SQL, Shell scripting
Databases: HBase, Oracle, DB2, MySQL, SQLite, MS SQL Server.
Development Processes: Agile, Scrum
Big data Platforms: Cloudera
PROFESSIONAL EXPERIENCE
Confidential, Nashville, TN
Spark/Hadoop Developer
Responsibilities:
- Interacted with multiple teams to understand their business requirements and design flexible, common components.
- Used Sqoop to import and export data between source systems and HDFS/Hive.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Implemented Spark SQL to access Hive tables from Spark for faster data processing.
- Worked on Spark Streaming with Apache Kafka for real-time data processing.
- Used Hive for transformations, joins, filters, and pre-aggregations after storing the data in HDFS.
- Developed Spark scripts in Scala, using Spark SQL to query Hive tables.
- Extensively worked on Text, ORC, Avro and Parquet file formats and compression techniques like Snappy, Gzip and Zlib.
- Used Hive optimization techniques, including partitioning and bucketing, to query data more efficiently.
- Involved in creating Hive tables and loading and analyzing data with Hive queries.
- Developed Hive queries to process data and generate data cubes for visualization.
- Created Kafka producers and consumers for Spark Streaming.
- Conceived and designed custom POCs using Kafka 0.10 and Spark Streaming in standalone mode (see the sketch after this list).
- Automated the jobs with Oozie and scheduled them with Autosys.
- Participated in evaluation and selection of new technologies to support system efficiency.
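A minimal sketch of such a POC, assuming the spark-streaming-kafka-0-10 integration; the master URL, broker address, topic, and group id are hypothetical:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Standalone-mode job; the master URL and Kafka settings are hypothetical.
val conf = new SparkConf().setMaster("spark://master:7077").setAppName("KafkaStreamingPOC")
val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "streaming-poc",
  "auto.offset.reset" -> "latest"
)

// Subscribe to a topic and count records per micro-batch.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

stream.map(_.value).count().print()

ssc.start()
ssc.awaitTermination()
```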
Environment: Hadoop, HDFS, Hive, HBase, Spark, Autosys, Kafka, Sqoop, Java, Scala, Eclipse, Teradata, UNIX, Maven.
Confidential, St. Louis, Missouri
Hadoop Developer
Responsibilities:
- Developed Spark programs for batch processing.
- Developed Spark scripts using Java and Scala shell commands as per requirements.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Used Spark SQL with Scala to create DataFrames and perform transformations on them.
- Implemented Spark SQL to access Hive tables in Spark for faster data processing.
- Installed and configured Hive and wrote Hive UDFs (one possible shape is sketched after this list).
- Involved in creating Hive tables, loading data and writing Hive queries.
- Imported and exported data into HDFS using Sqoop, including incremental loads.
- Defined job flows and managed and reviewed Hadoop log files.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Managed jobs with the Fair Scheduler and cluster coordination services through ZooKeeper.
- Involved in loading data from UNIX file system to HDFS.
- Hands-on experience with Oozie job scheduling.
- Worked closely with AWS to migrate entire data centers to the cloud using VPC, EC2, S3, and EMR.
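One possible shape for the Hive UDFs mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class and function names are hypothetical:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// A simple Hive UDF that upper-cases a string column.
// After packaging into a jar, it would be registered in Hive with, e.g.:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION to_upper AS 'com.example.ToUpperUDF';
class ToUpperUDF extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.toUpperCase)
  }
}
```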
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Scala, Spark, Hortonworks, HBase, Amazon EMR, EC2, S3.
Confidential
Hadoop Developer
Responsibilities:
- Part of the team developing and writing Pig scripts.
- Loaded data from an RDBMS server into Hive using Sqoop.
- Created Hive tables to store the processed results in a tabular format.
- Developed Sqoop scripts to move data between Hive and a MySQL database.
- Developed Java Mapper and Reducer programs for complex business requirements.
- Developed custom Java record readers, partitioners, and serialization techniques.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Performed complex HiveQL queries on Hive tables and created custom user-defined functions in Hive.
- Optimized Hive tables with partitioning and bucketing to improve HiveQL query performance.
- Created partitioned tables and loaded data using both static and dynamic partitioning (see the sketch after this list).
- Performed Sqoop imports from Oracle to load data into HDFS and directly into Hive tables.
- Performed incremental data movement to Hadoop using Sqoop.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Analyzed Hadoop logs using Pig scripts to identify errors introduced by the team.
- Gathered requirements from the client, provided estimates for development projects, and delivered projects on time.
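A minimal sketch of static and dynamic partition loads, run through a Hive-enabled SparkSession; the table, column names, and dates are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("PartitionedHiveTables")
  .enableHiveSupport()
  .getOrCreate()

// Create a table partitioned by load date (names are hypothetical).
spark.sql("""
  CREATE TABLE IF NOT EXISTS orders_partitioned (
    order_id BIGINT,
    amount   DOUBLE
  )
  PARTITIONED BY (load_date STRING)
  STORED AS ORC
""")

// Static partitioning: the target partition is named explicitly.
spark.sql("""
  INSERT OVERWRITE TABLE orders_partitioned PARTITION (load_date = '2017-01-01')
  SELECT order_id, amount FROM staging_orders WHERE load_date = '2017-01-01'
""")

// Dynamic partitioning: Hive derives the partition from the trailing column.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
  INSERT OVERWRITE TABLE orders_partitioned PARTITION (load_date)
  SELECT order_id, amount, load_date FROM staging_orders
""")
```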
Environment: Java, Hadoop, MapReduce, HDFS, Pig, Hive, Spark, Scala, Hortonworks, HBase.
Confidential
Java Developer
Responsibilities:
- Involved in Analysis, Design, Development and Testing of the application.
- Incorporated UML diagrams (Class diagrams, Activity diagrams, Sequence diagrams) as part of design documentation and other system documentation.
- Enhanced the Port search functionality by adding a VPN Extension Tab.
- Created end to end functionality for view and edit of VPN Extension details.
- Used an Agile process to develop the application, as it allows faster development compared to RUP.
- Used Struts MVC framework and WebLogic Application Server in this application.
- Involved in creating DAOs and used Hibernate for ORM mapping.
- Implemented the application using the Spring Framework for rapid development and ease of maintenance.
- Wrote procedures and triggers to validate metadata consistency.
- Used both implicit and explicit cursors to process multiple rows within a PL/SQL block.
- Designed and developed business logic for creating hourly, daily, weekly, monthly, quarterly, and yearly summaries of balance sheet data (records) using Oracle PL/SQL programs.
- Implemented various PL/SQL objects, including tables, views, procedures, packages, triggers, functions, materialized views, global temporary tables, cursors, bulk collect, collections, bind variables, ref cursors, sequences, synonyms, and indexes.
- Wrote SQL code blocks using cursors to move records between tables based on validation checks.
- Fixed defects and generated input XMLs to run on the SOA client, producing output XML for testing web services.
Environment: Java, JSP, Servlets, J2EE, EJB, Struts Framework, JDBC, Oracle 10g, OLAP/OLTP, Toad, Windows XP, SQL*Loader, SQL Developer, Agile, UNIX, Web Services, CVS, Eclipse.