
Sr. Hadoop, Spark Developer Resume


Wichita, KS

SUMMARY:

  • IT Professional with 9+ years of experience in Design, Analysis, Development, Testing, Documentation, Deployment, Integration, and Maintenance of web-based and client-server applications using Java, Scala, and Big Data platforms.
  • 4+ years of professional experience using Big Data technologies including HDFS, MapReduce, Hive, Spark, Kafka, Sqoop, Oozie, YARN, HBase, Flume, and ZooKeeper.
  • Expertise in installing and configuring Hadoop components such as Hive, Pig, Sqoop, HBase, and ZooKeeper.
  • Experience in fetching live streams of data from DB2 into Hive tables using Spark Streaming and Apache Kafka (see the streaming sketch after this list).
  • Experience in Apache Spark for rapid analytics on object relationships.
  • Expertise in data transformations, RDDs, DataFrames, and Spark SQL.
  • Experience in creating databases, tables, users, views, triggers, macros, stored procedures, functions, packages in Oracle Database.
  • Experienced in writing complex queries using Cloudera Impala.
  • Expertise in non-relational and relational data modelling and database engineering.
  • Expertise in building scalable distributed data solutions using HBase, MongoDB and Cassandra.
  • Expertise in building clusters on AWS using Amazon EC2 and Cloudera Manager.
  • Expertise in Big Data platforms such as Cloudera, Hortonworks, Apache Hadoop, and Amazon AWS.
  • Excellent knowledge on Agile methodology and Scrum process.
  • Versatile team player and quick learner with good analytical, interpersonal, and problem-solving skills.
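
A minimal sketch of the Kafka-to-Hive streaming pattern referenced above, written in Java with Spark Structured Streaming. It is illustrative only: the broker address, topic name, and warehouse paths are hypothetical, and it assumes Spark 2.x with the spark-sql-kafka connector on the classpath.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;
    import org.apache.spark.sql.streaming.StreamingQueryException;

    public class KafkaToHiveStream {
        public static void main(String[] args) throws StreamingQueryException {
            SparkSession spark = SparkSession.builder()
                    .appName("KafkaToHiveStream")
                    .enableHiveSupport()
                    .getOrCreate();

            // Consume the change feed published to Kafka (topic name is hypothetical).
            Dataset<Row> feed = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker1:9092")
                    .option("subscribe", "db2.orders.feed")
                    .load()
                    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)");

            // Land the records as Parquet under the Hive warehouse path so an
            // external Hive table defined over that location picks them up.
            StreamingQuery query = feed.writeStream()
                    .format("parquet")
                    .option("path", "/apps/hive/warehouse/orders_stream")
                    .option("checkpointLocation", "/tmp/checkpoints/orders_stream")
                    .start();

            query.awaitTermination();
        }
    }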

TECHNICAL SKILLS:

Hadoop Ecosystem: Hadoop, MapReduce, HDFS, Kafka, Hive, Pig, Sqoop, Oozie, Storm, YARN, ZooKeeper, Spark 2.0, Spark Core, Spark SQL, Solr, Hortonworks Hadoop Stack

Languages: Java, Scala, Python, SQL, Shell scripting, HTML5 and CSS

Web Technologies: JavaScript, JDBC

Operating System: Windows, Linux

Databases: Cassandra, MongoDB, HBase, Oracle, DB2, MySQL

Methodology: Agile, Scrum

Defect Tracking: Bugzilla, HP Quality Center 9.2, HP ALM

Other Tools: SOA Client, Putty, Scrum Works 1.8.3, Stylus Studio 2008 XML Enterprise Suite

Big data Platforms: Hortonworks HDP 2.5, Cloudera CDH 5.x, Amazon AWS

Applications: JIRA, Amazon EC2, S3, EMR, MySQL, MS Office

PROFESSIONAL EXPERIENCE:

Confidential, Wichita, KS

Sr. Hadoop, Spark Developer

Responsibilities:

  • Involved in analysis, design, testing phases and responsible for documenting technical specifications.
  • Developed Spark code using Spark SQL and Spark Streaming for faster testing and processing of data.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Installed/Configured/Maintained Hortonworks Hadoop clusters for application development.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive (see the Spark SQL sketch after this list).
  • Responsible for building scalable distributed data solutions using a Hadoop cluster environment with AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR, and Amazon Elastic Compute Cloud (Amazon EC2).
  • Loaded the data into Spark RDD and performed in-memory data computation to generate the output response.
  • Developed and executed shell scripts to automate jobs and wrote complex Hive queries and UDFs.
  • Worked on reading multiple data formats on HDFS using Spark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Developed multiple POCs using Spark and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
  • Analyzed the SQL scripts and designed the solution to implement using Spark.
  • Involved in loading data from UNIX file system to HDFS, AWS S3.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Handled importing of data from various data sources like AWS S3, Cassandra.
  • Performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS.
  • Managed and reviewed Hadoop log files.
  • Developed Kafka producers and consumers, HBase clients, Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive using AWS EMR (see the Kafka producer sketch after this list).
  • Used Apache Atlas for metadata exchange between MariaDB and Hive.
  • Facilitated the daily scrum meetings, sprint planning, sprint review, and sprint retrospective.
  • Worked extensively on the Spark Core and Spark SQL modules.
  • Involved in running Hadoop streaming jobs to process terabytes of data from AWS S3.
  • Implemented Oozie job for importing real-time data to Hadoop using Kafka and for daily imports.
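
A minimal sketch of the Spark-over-Hive analytics pattern referenced in this list: a Java Spark SQL job, submitted to YARN via spark-submit, that reads a Hive table, aggregates it, and writes the result back to Hive. The database, table, and column names are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class HiveAnalyticsJob {
        public static void main(String[] args) {
            // Hive support lets Spark SQL read metastore tables directly;
            // the job runs on YARN when submitted with --master yarn.
            SparkSession spark = SparkSession.builder()
                    .appName("HiveAnalyticsJob")
                    .enableHiveSupport()
                    .getOrCreate();

            // Table and column names are illustrative placeholders.
            Dataset<Row> clicks = spark.table("analytics.click_events");

            Dataset<Row> dailyCounts = clicks
                    .filter("event_date >= '2017-01-01'")
                    .groupBy("event_date", "page_id")
                    .count();

            // Persist the aggregate back to Hive for downstream reporting.
            dailyCounts.write().mode("overwrite").saveAsTable("analytics.daily_page_counts");

            spark.stop();
        }
    }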
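
A minimal sketch of a Kafka producer of the kind mentioned in this list, using the standard Java client. The broker address, topic, key, and payload are hypothetical placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("acks", "all");                          // wait for full replication

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Keyed records so related events land in the same partition.
                producer.send(new ProducerRecord<>("events", "order-42", "{\"status\":\"shipped\"}"));
                producer.flush();
            }
        }
    }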

Environment: Hadoop, HDFS, Hive, Scala, Spark, SQL, MongoDB, MariaDB, UNIX Shell Scripting, AWS S3, EMR, Hortonworks HDP 2.5 Hadoop Stack, Apache Ranger and Apache Atlas

Confidential, Irvine, CA

Sr. Hadoop Developer

Responsibilities:

  • Involved in gathering requirements from the client, providing estimates for development, and delivering projects on time.
  • Designed conceptual model with Spark for performance optimization.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark with Scala.
  • Managed and reviewed Hadoop log files. Used Scala to integrate Spark with Hadoop.
  • Used Ambari to manage the Hortonworks distribution of Hadoop, especially for fault-tolerant workflows and error handling.
  • Developed Pig scripts and loaded data from an RDBMS server into Hive using Sqoop.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Sqoop scripts to move data between Hive and the MySQL database.
  • Developed Java Mapper and Reducer programs for complex business requirements.
  • Developed custom Java record readers, partitioners, and serialization techniques.
  • Used different data formats (Text format and Avro format) while loading the data into HDFS.
  • Created Managed tables and External tables in Hive and loaded data from HDFS.
  • Performed complex HiveQL queries on Hive tables and created custom user-defined functions in Hive (see the UDF sketch after this list).
  • Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with HiveQL queries.
  • Created partitioned tables and loaded data using both static partition and dynamic partition method.
  • Performed Sqoop imports from MongoDB to ingest the data into HDFS and directly into Hive tables.
  • Performed incremental data movement to Hadoop using Sqoop.
  • Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Analyzed Hadoop logs using Pig scripts to review errors encountered by the team.
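
A minimal sketch of a custom Hive UDF of the kind mentioned in this list, using the classic org.apache.hadoop.hive.ql.exec.UDF base class from that era of Hive. The class name, function name, and normalization logic are illustrative.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    /**
     * Simple Hive UDF that normalizes free-text codes (trim whitespace, uppercase).
     * Registered in Hive with, for example:
     *   ADD JAR /path/to/udfs.jar;
     *   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
     */
    public final class NormalizeCode extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;       // pass NULLs through unchanged
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }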

Environment: HDFS, MapReduce, Apache Spark, Hive, Sqoop, Pig, Flume, HBase, Oozie Scheduler, Java, Oracle, Shell Scripts, Hortonworks, AWS S3, EMR, EC2

Confidential, Gaithersburg, MD

Hadoop Developer

Responsibilities:

  • Involved in loading data from LINUX file system to HDFS.
  • Implemented test scripts to support test driven development and continuous integration.
  • Responsible to manage data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Worked on managing and reviewing Hadoop log files, managing and scheduling Jobs on a Hadoop cluster.
  • Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats into text files.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the mapper sketch after this list).
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Pig as an ETL tool to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data on HDFS.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Involved in writing Hive scripts to extract, transform and load the data into Database.
  • Used JIRA for bug tracking and used CVS for version control.
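
A minimal sketch of a map-only data-cleaning job of the kind mentioned in this list: the mapper drops malformed pipe-delimited records and trims the remaining fields. The delimiter and expected field count are illustrative assumptions; the driver would set the number of reducers to zero.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    /** Map-only cleaning step: skips malformed delimited records and trims fields. */
    public class CleanRecordsMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {

        private static final int EXPECTED_FIELDS = 8;   // illustrative schema width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|", -1);
            if (fields.length != EXPECTED_FIELDS) {
                context.getCounter("cleaning", "malformed").increment(1);
                return;                                  // drop malformed rows
            }
            StringBuilder cleaned = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    cleaned.append('|');
                }
                cleaned.append(fields[i].trim());
            }
            context.write(NullWritable.get(), new Text(cleaned.toString()));
        }
    }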

Environment: Hadoop, Hive, Linux, MapReduce, HDFS, Pig, Sqoop, Shell Scripting, Python, Java (JDK 1.6), Eclipse, Control-M Scheduler, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, JIRA, CVS

Confidential, Charlotte, NC

Java Developer

Responsibilities:

  • Involved in Analysis, Design, Development and Testing of the applications.
  • Incorporated UML diagrams (Class diagrams, Activity diagrams, Sequence diagrams) as part of design documentation and other system documentation.
  • Enhanced the Port search functionality by adding a VPN Extension Tab.
  • Created end to end functionality for view and edit of VPN Extension details.
  • Used Hibernate as the persistence framework.
  • Used the Struts MVC framework and WebLogic Application Server in this application.
  • Involved in creating DAOs and used Hibernate for ORM mapping (see the DAO sketch after this list).
  • Wrote procedures, and triggers for validating the consistency of metadata.
  • Wrote SQL code blocks using cursors for shifting records from various tables based on checks.
  • Fixed defects and generated input XMLs to run on SOA Client to generate output XML for testing web services.
  • Wrote Java classes to test UI and Web services through JUnit and JWebUnit.
  • Extensively involved in release/deployment related critical activities.
  • Performed functional and integration testing and tested the entire application using JUnit and JWebUnit.
  • Used Log4j to log both User Interface and Domain Level messages and used CVS for version control.
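
A minimal sketch of a Hibernate DAO in the style described in this list, using the classic SessionFactory/Transaction pattern. The VpnExtension entity is a hypothetical stand-in (in this era it would be mapped via an hbm.xml file), and the query is illustrative.

    import java.util.List;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    /** Hypothetical persistent entity, mapped via VpnExtension.hbm.xml. */
    class VpnExtension {
        private Long id;
        private String portId;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getPortId() { return portId; }
        public void setPortId(String portId) { this.portId = portId; }
    }

    /** DAO following the classic SessionFactory/Transaction pattern. */
    public class VpnExtensionDao {
        private final SessionFactory sessionFactory;

        public VpnExtensionDao(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public void save(VpnExtension extension) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            try {
                session.saveOrUpdate(extension);
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();                // roll back on any failure
                throw e;
            } finally {
                session.close();
            }
        }

        @SuppressWarnings("unchecked")
        public List<VpnExtension> findByPort(String portId) {
            Session session = sessionFactory.openSession();
            try {
                return session.createQuery("from VpnExtension v where v.portId = :portId")
                              .setParameter("portId", portId)
                              .list();
            } finally {
                session.close();
            }
        }
    }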

Environment: JAVA, JSP, servlets, J2EE, EJB, Struts Framework, JDBC, WebLogic Application Server, Hibernate, Spring Framework, Oracle 9i, Unix, Web Services, CVS, Eclipse, JUnit, JWebUnit

Confidential

Java Developer

Responsibilities:

  • Configured Hibernate ORM as the persistence layer to provide persistence services and persistent objects to the application from the database.
  • Developed the DAO layer using Spring MVC configuration XMLs for Hibernate to manage CRUD operations such as insert, update, and delete.
  • Implemented reusable services using BPEL to transfer data.
  • Implemented dependency injection using the Spring framework.
  • Developed JUnit classes and created JUnit test cases (see the test sketch after this list).
  • Configured logging (enable/disable) using log4j for the application.
  • Created the user interface using HTML, CSS, JSP, jQuery, AJAX, JavaScript, and JSTL.
  • Implemented database operations using PL/SQL procedures and queries.
  • Developed shell scripts for UNIX environment to deploy EAR and read log files.
  • Implemented log4j for logging.
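
A minimal sketch of a JUnit 4 test case in the style described in this list. The service under test is a small hypothetical stand-in defined inline so the example is self-contained.

    import static org.junit.Assert.assertEquals;

    import org.junit.Before;
    import org.junit.Test;

    /** JUnit 4 test for a small (hypothetical) order-total service. */
    public class OrderTotalServiceTest {

        /** Minimal service under test; stands in for the real application class. */
        static class OrderTotalService {
            double totalWithTax(double subtotal, double taxRate) {
                if (subtotal < 0 || taxRate < 0) {
                    throw new IllegalArgumentException("negative amounts are not allowed");
                }
                return subtotal * (1 + taxRate);
            }
        }

        private OrderTotalService service;

        @Before
        public void setUp() {
            service = new OrderTotalService();
        }

        @Test
        public void addsTaxToSubtotal() {
            assertEquals(107.0, service.totalWithTax(100.0, 0.07), 0.0001);
        }

        @Test(expected = IllegalArgumentException.class)
        public void rejectsNegativeSubtotal() {
            service.totalWithTax(-1.0, 0.07);
        }
    }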

Environment: Java, Jest, SQL, JUnit, PL/SQL, SOA Suite 10g BPEL, Struts, Spring, Hibernate, Web Services (JAX-WS), JMS, EJB, WebLogic 10.1 Server, JDeveloper, HTML, LDAP, Maven, XML, CSS, JavaScript, JSON, Oracle, CVS and UNIX
