Hadoop/Spark Developer Resume

Chesterbrook, PA

PROFESSIONAL SUMMARY:

  • 8+ years of overall IT experience across a variety of industries, including 4+ years of hands-on experience in Big Data analytics and development.
  • Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Excellent knowledge of Hadoop components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Experience in real-time and batch processing using Spark. Developed Spark Streaming applications for real-time processing.
  • Experience in using Spark SQL with various data sources such as JSON, Parquet and Hive.
  • Experience in analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Strong experience with Hadoop distributions such as Cloudera, MapR and Hortonworks.
  • Good understanding of NoSQL databases and hands-on experience in writing applications on NoSQL databases such as HBase and Cassandra.
  • Experience in migrating data between HDFS and relational database systems using Sqoop, according to client requirements.
  • Extensive experience in importing and exporting data using stream processing platforms such as Flume and Kafka.
  • Expertise in integrating data from multiple data sources using Kafka.
  • Experienced in writing complex MapReduce programs that work with different file formats such as Text, Sequence, XML and JSON.
  • Experience in using the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 as the storage mechanism.
  • Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
  • Key participant in all phases of the software development life cycle, including analysis, design, development, integration, implementation, debugging, and testing of software applications in client-server, object-oriented and web-based environments.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
  • Excellent implementation knowledge of enterprise, web and client-server applications using Java and J2EE.
  • Strong experience with data warehousing and ETL concepts using Informatica PowerCenter, OLAP, OLTP and AutoSys.
  • Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
  • Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
  • Worked in large and small teams on system requirements, design and development.
  • Experience in using IDEs (Eclipse, IntelliJ) and version control repositories (SVN, Git).
  • Experience in using build tools such as Ant and Maven.
  • Prepared standard coding guidelines and analysis and testing documentation.

TECHNICAL SKILLS:

BigData/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, HBase, Phoenix, Sqoop, Flume, Spark, Kafka, Storm, Drill, Zookeeper and Oozie

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, Java, Scala, SQL, PL/SQL, Pig Latin, HiveQL, Shell Scripting

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB, RESTful

Application Servers: WebLogic, WebSphere

Cloud Computing Tools: Amazon AWS

Databases: Microsoft SQL Server, MySQL, Oracle, DB2

Operating Systems: UNIX, Windows, Linux

Build Tools: Jenkins, Maven, ANT

Business Intelligence Tools: Tableau, Splunk, QlikView

Development Tools: Microsoft SQL Studio, Eclipse, IntelliJ

Development Methodologies: Agile/Scrum, Waterfall

Version Control Tools: Git, SVN

PROFESSIONAL EXPERIENCE:

Confidential

Hadoop/Spark Developer

Responsibilities:

  • Integrated Kafka with Spark Streaming to consume data from external sources and run custom functions.
  • Installed and configured Hadoop, HDFS, Spark and Kafka. Developed multiple Spark jobs in Scala for data cleaning and pre-processing.
  • Wrote multiple Spark jobs to perform data quality checks before files were moved to the data processing layer.
  • Took full advantage of the cluster environment by tuning the performance of Spark jobs with caching.
  • Integrated Spark Streaming and HBase to ingest real-time data into HBase.
  • Reviewed and managed all log files using HBase.
  • Used Scala shell commands to develop Spark scripts.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Used Apache Phoenix on top of HBase to run SQL queries against the HBase tables.
  • Connected HBase tables with Phoenix and integrated them into Spark through the Phoenix JDBC client.
  • Loaded data into HBase using both bulk and non-bulk loads.
  • Involved in loading data from UNIX file system to HDFS.
  • Added, installed and removed components through Ambari.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Managed and reviewed Hadoop log files.
  • Imported and exported data between relational databases such as MySQL and HDFS and HBase using Sqoop.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning of data nodes, troubleshooting, and managing and reviewing data backups and log files.
  • Day-to-day responsibilities include resolving developer issues, deploying code from one environment to another, providing access to new users, delivering quick fixes to reduce impact, and documenting issues to prevent recurrence.
  • Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades.
  • Monitored workload and job performance, and performed capacity planning.
  • Analyzed system failures, identified root causes, and recommended courses of action.
  • Followed Agile Methodologies while working on the project.

Environment: Hadoop, Spark, Spark Streaming, HDFS, HBase, Phoenix, Kafka, Scala, AWS, Sqoop, Oracle MySQL, Java.

Confidential - Chesterbrook, PA

Hadoop Developer / Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Development of Spark jobs for data cleansing and data processing of flat files.
  • Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
  • Developed Spark applications in Scala and built them using SBT.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDAFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process itself.
  • Designed, developed and maintained data pipelines in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, using RDBMS and NoSQL data stores for data access and analysis.
  • Worked on a POC comparing the processing time of Impala with that of Apache Hive for batch applications, with a view to implementing Impala in the project.
  • Worked on a 400-node cluster.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Experience in installing and configuring Apache Hadoop on Amazon AWS (EC2).
  • Involved in creating Hive tables, and in loading and analyzing data using Hive queries.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Used Talend Open Studio to retrieve data.
  • Good experience with continuous integration of applications using Jenkins.
  • Used reporting tools such as Tableau to connect with Hive and generate daily data reports.
  • Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.

Environment: Hadoop YARN, Spark-Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, Impala, Cassandra, Tableau, Oozie, Jenkins, Talend, Cloudera, Oracle 12c, Linux

Confidential - Austin, TX

Hadoop Developer

Responsibilities:

  • Analyzed the Hadoop cluster using big data analytics tools including Hive and MapReduce.
  • Involved in improving system performance by adding real-time components such as Flume and Spark to the platform.
  • Installed and configured Spark, Flume, Zookeeper, Ganglia and Nagios on the Hadoop cluster.
  • Hands-on experience in implementing Spark with Scala.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
  • Developed MapReduce programs for data analysis and data cleaning.
  • Worked with the Apache Crunch library to write, test and run MapReduce pipeline jobs.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Continuously monitored and provisioned the Hadoop cluster through Cloudera Manager.
  • Created Hive tables, loaded data into them and wrote Hive UDFs.
  • Worked on Impala to obtain fast results without any transformation of data.
  • Worked with Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.
  • Used Tableau for visualizing and analyzing the data.
  • Experience in using the Solr search engine for indexing and searching data.

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, CDH4.0, Sqoop, Kafka, Storm, Oozie, HBase, Spark, Scala, Cloudera Manager, Crunch, Tableau, Linux, Unix.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
  • Involved in implementing the design across key phases of the software development life cycle (SDLC), including development, testing, implementation and maintenance support.
  • Applied OOAD principles for the analysis and design of the system.
  • Implemented XML Schema as part of the XQuery query language.
  • Applied J2EE design patterns like Singleton, Business Delegate, Service Locator, Data Transfer Object (DTO), Data Access Objects (DAO) and Adapter during the development of components.
  • Used RAD for the Development, Testing and Debugging of the application.
  • Used Websphere Application Server to deploy the build.
  • Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON and CSS.
  • Used J2EE for the development of business layer services.
  • Developed Struts Action Forms, Action classes and performed action mapping using Struts.
  • Performed data validation in Struts Form beans and Action Classes.
  • Developed a POJO-based programming model using the Spring framework.
  • Used IOC (Inversion of Control) Pattern and Dependency Injection of Spring framework for wiring and managing business objects.
  • Used Oracle 10g database for data persistence and SQL Developer was used as a database client.
  • Extensively worked on Windows and UNIX operating systems.
  • Used SecureCRT to transfer files from the local system to the UNIX system.
  • Performed Test Driven Development (TDD) using JUnit.
  • Used Ant script for build automation.
  • Used the SVN version control system, integrated with the Eclipse IDE, to check in and check out developed artifacts.
  • Used Rational ClearQuest for defect logging and issue tracking.

Environment: Windows XP, RAD 7.0, Core Java, J2EE, Struts, Spring, Hibernate, Web Services, Design Patterns, Websphere, Ant, Servlet, JSP, HTML, AJAX, JavaScript, CSS, jQuery, JSON, SOAP, WSDL, XML, Eclipse, Agile, Jira, Oracle 10g, WinSCP, Log4J, JUnit.

Confidential

Java Developer

Responsibilities:

  • Designed and developed the application using agile methodology.
  • Developed web components using JSP, Servlets and JDBC.
  • Analyzed use cases to understand the business requirements and to assess the technical implementation of the functionality.
  • Used the JavaMail API extensively to send automated emails whenever ticket status or workflow steps changed.
  • Designed, Implemented, Tested and Deployed Enterprise Java Beans both Session and Entity using WebLogic as Application Server.
  • Used tools like TOAD for SQL operations on Oracle Database.
  • Developed database interaction code using the JDBC API, making extensive use of SQL query statements and advanced prepared statements.
  • Used connection pooling through the JDBC interface for optimization.
  • Used EJB entity and session beans to implement business logic and session handling and transactions. Developed user-interface using JSP, Servlets, and JavaScript.
  • Wrote complex SQL queries and stored procedures.
  • Used JavaScript for Client side validation.

Environment: JSPs, Servlets, Java Beans, UML, JDK 1.5, Oracle, TOAD, JavaScript, HTML and CSS.
