Sr. Hadoop/Spark Developer Resume

Richmond, VA

PROFESSIONAL SUMMARY:

  • Around 8 years of experience in Information Technology, including Analysis, Design, Coding, Testing, and Implementation in Java and Big Data Technologies.
  • Over 4.5 years of work experience in Big Data Analytics, with hands-on experience writing Spark and MapReduce jobs on the Hadoop ecosystem, including Hive, Pig, Sqoop, and Amazon Web Services (AWS).
  • Good working expertise in handling terabytes of structured and unstructured data in large cluster environments.
  • Experience in using SDLC methodologies like Waterfall, Agile Scrum, and TDD for design and development.
  • Expertise in implementing Spark modules and tuning their performance.
  • Good knowledge of event-based data processing using Spark Streaming.
  • Experienced in performance tuning of Spark applications using various resource allocation techniques and transformations that reduce shuffles and improve data locality.
  • Expertise in Kerberos Security Implementation and securing the cluster.
  • Expertise in creating Hive internal/external tables and views using a shared metastore, writing scripts in HiveQL, and experience in data transformation and file processing, building analytics using Pig Latin scripts.
  • Expertise in writing custom UDFs in Pig & Hive Core Functionality.
  • Developed, deployed and supported several Map Reduce applications in Java to handle different types of data.
  • Worked with various serialization and compression techniques like Avro, Snappy, and LZO.
  • Hands-on experience with Avro and Parquet file formats, following best practices and improving performance using partitioning, bucketing, map-side joins, and indexes.
  • Expert in implementing advanced procedures like text analytics and processing, and in implementing streaming APIs using the in-memory computing capabilities of Apache Spark, written in Scala and Python.
  • Hands-on experience in dynamic partitioning, bucketing, and extending Hive core functionality by writing custom UDFs, UDTFs, and UDAFs.
  • Exported data to various databases like Teradata (Sales Data Warehouse), SQL Server, and Cassandra using Sqoop.
  • Experienced in creating shell scripts to push data loads from various sources from the edge nodes onto the HDFS.
  • Experienced in performing code reviews and closely involved in smoke testing and retrospective sessions.
  • Used Kafka for reliable and asynchronous exchange of information between multiple business applications.
  • Developed Kafka producers and consumers to produce and consume data from Kafka topics (see the sketch after this list).
  • Built POCs using Confluent Schema Registry and Kafka Connectors for ingesting data into S3 in Avro format.
  • Experience in testing using JUnit, Mockito, and Cucumber.
  • Worked with cloud services like Amazon Web Services (AWS) and was involved in ETL, data integration, and migration.
  • Hands-on experience in Amazon Web Services (AWS) cloud services like EC2 and S3.
  • Ability to spin up different AWS instances using CloudFormation templates.
  • Actively involved in requirements gathering, analysis, development, unit testing, ATDD, and integration testing.
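As a companion to the Kafka producer/consumer bullet above, here is a minimal Scala sketch of that pattern written against the kafka-clients 2.x API; the broker address, topic name, group id, and record contents are hypothetical placeholders rather than details from any specific project.

```scala
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}

object KafkaRoundTrip {
  val brokers = "localhost:9092"   // hypothetical broker address
  val topic   = "business-events"  // hypothetical topic name

  def produce(): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    val producer = new KafkaProducer[String, String](props)
    // send() is asynchronous, so flush before closing to make sure the record leaves the client.
    producer.send(new ProducerRecord(topic, "order-1", """{"amount": 42.0}"""))
    producer.flush()
    producer.close()
  }

  def consume(): Unit = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "events-reader")
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(java.util.Collections.singletonList(topic))
    // Poll once for demonstration; a real consumer would loop and commit offsets.
    val records = consumer.poll(java.time.Duration.ofSeconds(5))
    records.asScala.foreach(r => println(s"${r.key} -> ${r.value}"))
    consumer.close()
  }
}
```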

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, Map Reduce, Spark Core, Spark Streaming, Spark SQL, Hive, Sqoop, Kafka and Zookeeper.

AWS Components: EC2, S3, EMR, CFT, Lambda

NoSQL Databases: HBase, MongoDB

Languages: Java, Scala, Python, PL/SQL, UNIX shell scripts

Java/J2EE Technologies: JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: Struts, Spring, Hibernate

Operating Systems: Red Hat Linux, Ubuntu Linux, and Windows 7/8

Web Technologies: HTML, XML, REST

Web/Application servers: Apache Tomcat, WebLogic

Databases: Oracle 9i/10g/11g, SQL Server, MySQL, Teradata

Tools and IDEs: Maven, Eclipse, ANT, SBT, IntelliJ IDEA, PyCharm

PROFESSIONAL EXPERIENCE:

Confidential, Richmond, VA

Sr. Hadoop/Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Experienced in performance tuning of Spark applications by setting the right batch interval time, the correct level of parallelism, and tuning memory.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, transformations, and other optimizations during the ingestion process itself.
  • Implemented UDFs for Hive extending the GenericUDF, UDTF, and UDAF base classes to change time zones, implement business logic, and extract required parameters according to the business specification.
  • Extensive working knowledge of partitioning, UDFs, performance tuning, and compression-related properties on Hive tables.
  • Developed the UNIX shell scripts for creating the reports from Hive data.
  • Implemented Spark scripts in Python to perform extraction of required data from the data sets and storing it on HDFS.
  • Developed spark scripts and python functions that involve performing transformations and actions on data sets.
  • Configured Spark Streaming in Python to receive real-time data from Kafka and store it on HDFS (see the sketch after this list).
  • Experienced in building analytics on top of Spark using the Spark ML (spark.ml) machine learning library.
  • Involved in optimizing the Hive queries using Map-side join, Partitioning, Bucketing and Indexing.
  • Involved in tuning the Spark modules with various memory and resource allocation parameters, setting the right batch interval time and varying the number of executors to meet the increasing load over time.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Worked with various file formats like JSON, Avro, and Parquet.
  • Implemented Spark SQL to connect to Hive to read data, using distributed processing to make it highly scalable.
  • Implemented RESTful Web Services to interact with Oracle/Cassandra to store/retrieve the data.
  • Generated detailed design documentation for the source-to-target transformations.
  • Wrote UNIX scripts to monitor data load/transformation.
  • Operated the cluster on AWS using EC2, EMR, S3, Lambda, and Elasticsearch.
  • Used AWS services like EC2 and S3 for small data sets.
  • Developed unit tests using JUnit and Scala unit test frameworks; used mocking tools like Mockito and PowerMock.
  • Developed load tests to benchmark the performance of applications using the Cucumber framework.
  • Involved in planning process of iterations under the Agile Scrum methodology.
  • Wrote automation scripts in Cucumber framework for System and Product Acceptance Tests following Test Driven Development approach.
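As a companion to the Spark Streaming and batch-interval tuning bullets above (the bullet describes a Python implementation; this sketch is in Scala for consistency with the other examples here), the Kafka-to-HDFS flow looks roughly like the following, using the spark-streaming-kafka-0-10 integration; the broker, topic, HDFS path, and 30-second batch interval are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    // Batch interval and back-pressure are the kinds of knobs referenced in the tuning bullets.
    val conf = new SparkConf()
      .setAppName("kafka-to-hdfs")
      .set("spark.streaming.backpressure.enabled", "true")
    val ssc = new StreamingContext(conf, Seconds(30)) // hypothetical 30-second batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",  // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "stream-loader",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist the value of each record to HDFS, one directory of text files per micro-batch.
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```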

Environment: Java, Scala, Python, Hadoop, HDFS, Spark, Zookeeper, AWS, Confluent Kafka, CSV, JSON, and Avro data files.

Confidential, Dallas, TX

Hadoop/Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common word2vec data model, which gets data from Kafka in near real time and persists it into Cassandra.
  • Operated the cluster on AWS using EC2, EMR, S3, and Elasticsearch.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Migrated existing MapReduce programs to Spark using Scala and Python
  • Developed Scala scripts and UDFs using DataFrames in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the sketch after this list).
  • Experienced in performance tuning of Spark applications by setting the right batch interval time, the correct level of parallelism, and tuning memory.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Worked on a sample POC (proof of concept) moving data from Hive to Redshift.
  • Built an on-demand secure EMR launcher with custom spark-submit steps using S3 Events, SNS, KMS, and a Lambda function.
  • Migrated an existing on-premises application to AWS.
  • Used AWS services like EC2 and S3 for small data sets.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, transformations, and other optimizations during the ingestion process itself.
  • Created Hive tables, loaded the data using Sqoop, and worked on them using HiveQL.
  • Responsible for developing custom UDFs, UDAFs and UDTFs in Pig and Hive.
  • Optimized Hive queries using various file formats like JSON, Avro, ORC, and Parquet.
  • Implemented Spark SQL to connect to Hive to read data, using distributed processing to make it highly scalable.
  • Experienced in job management using the Fair Scheduler; developed job processing scripts using Oozie workflows to run multiple Spark jobs in sequence for processing data.
  • Generated diverse types of reports using HiveQL for business to analyze the data feed from sources.
  • Implemented RESTful Web Services to interact with Oracle/Cassandra to store/retrieve the data.
  • Generated detailed design documentation for the source-to-target transformations.
  • Involved in planning process of iterations under the Agile Scrum methodology.
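As noted in the DataFrame aggregation and Spark SQL-on-Hive bullets above, that pattern looks roughly like the sketch below. It is written against the Spark 2.x SparkSession API rather than the Spark 1.6 HiveContext mentioned in the bullet, and the database, table, column names, and output path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, countDistinct, sum}

object HiveAggregation {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets Spark SQL read tables registered in the shared Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-aggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical partitioned Hive table; the filter on the partition column prunes partitions.
    val sales = spark.table("warehouse.sales")
      .filter(col("load_date") === "2017-01-01")

    val byRegion = sales
      .groupBy(col("region"))
      .agg(
        sum(col("amount")).as("total_amount"),
        countDistinct(col("customer_id")).as("customers"))

    // Write the aggregate back as Parquet; the export to the OLTP system would be a separate Sqoop job.
    byRegion.write.mode("overwrite").parquet("hdfs:///data/curated/sales_by_region")

    spark.stop()
  }
}
```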

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Spark, Oozie, Zookeeper, AWS, RDBMS/DB, MySQL, CSV, and Avro data files.

Confidential, Santa Clara, CA

Hadoop Developer

Responsibilities:

  • Gathered data from multiple sources like Teradata, Oracle, and SQL Server using Sqoop and loaded it into HDFS.
  • Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Responsible for cleansing and validating data.
  • Responsible for writing a MapReduce job which joins the incoming slices of data and picks only the fields needed for further processing.
  • Found the right join conditions and created datasets conducive to data analysis.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs (see the sketch after this list).
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Worked hands on with ETL process.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior like shopping enthusiasts, travelers, music lovers etc. Wrote REST Web services to expose the business methods to external services.
  • Exported the patterns analyzed back into Teradata using Sqoop.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
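As referenced in the Hive UDF bullet above, a simple UDF can be written against Hive's classic org.apache.hadoop.hive.ql.exec.UDF API (the GenericUDF API mentioned elsewhere in this resume follows the same packaging model). The class name and the normalization logic below are hypothetical; this is a Scala sketch rather than the exact UDFs from the project.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: trims and upper-cases a free-text code column.
// Hive locates evaluate() by reflection, so the method signature is what matters.
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

After packaging the class into a jar and adding it to the Hive session (ADD JAR ...), it would be registered and used with something like CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode'; followed by SELECT normalize_code(region) FROM customers; (function, table, and column names hypothetical).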

Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, Cloudera, Oozie, UNIX

Confidential, Pittsburgh, PA

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce.
  • Loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
  • Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence purposes.
  • Used default MapReduce Input and Output Formats.
  • Developed HQL queries to implement select, insert, and update operations against the database by creating HQL named queries.
  • Developed simple to complex Map/Reduce jobs using Java, and scripts using Hive and Pig.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion and egress.
  • Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
  • Experienced in loading and transforming large sets of structured and semi-structured data.
  • Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
  • Exported filtered data into HBase for fast querying (see the sketch after this list).
  • Involved in creating Hive tables, loading with data and writing Hive queries.
  • Created data-models for customer data using the Cassandra Query Language.
  • Involved in developing shell scripts to orchestrate the execution of all other scripts (Pig, Hive, and MapReduce) and to move data files within and outside of HDFS.
  • Queried and analyzed data from Datastax Cassandra for quick searching, sorting and grouping.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
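As mentioned in the HBase export bullet above, writing filtered records into HBase and reading one back by row key looks roughly like this sketch against the standard HBase client API; the table name, column family, qualifier, and row key are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseExport {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()  // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("filtered_events")) // hypothetical table

    try {
      // Write one filtered record keyed by event id.
      val put = new Put(Bytes.toBytes("event-0001"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("APPROVED"))
      table.put(put)

      // Point lookup by row key -- the "fast query" access pattern from the bullet above.
      val result = table.get(new Get(Bytes.toBytes("event-0001")))
      val status = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status")))
      println(s"status = $status")
    } finally {
      table.close()
      connection.close()
    }
  }
}
```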

Environment: Apache Hadoop (Cloudera), HBase, Hive, Pig, Map Reduce, Sqoop, Oozie, Eclipse, Java

Confidential

J2EE Developer

Responsibilities:

  • Interacted with business managers to transform requirements into technical solutions.
  • Transformed the Use Cases into Class Diagrams, Sequence Diagrams and State diagrams
  • Generated Domain Layer classes using DAOs from the database schema.
  • Derived the Spring MVC Controllers from the Use Cases and integrated them with the Service Layer to carry out business logic operations, returning the resultant Data Model object, if any.
  • Worked on the Service Layer implementing the core business logic and providing access to external REST based services.
  • Integrated the designed JSP pages with the View Resolvers to display the view after carrying out the desired operations in the Service layer.
  • Worked on Spring Web Flow Design using the Sequence Diagrams and configured the flows between the pre-defined Views and Controllers.
  • Developed Validation Layer providing Validator classes for input validation, pattern validation and access control.
  • Defined set of classes for the Helper Layer which validates the Data Models from the Service Layer and prepares them to display in JSP Views.
  • Used AJAX calls to dynamically assemble the data in JSP page, on receiving user input.
  • Used Log4j to print logging, debugging, warning, and info messages on the server console.
  • Involved in creation of Test Cases for JUnit Testing and carried out Unit testing.
  • Used SVN as configuration management tool for code versioning and release.
  • Deployment on Oracle WebLogic Server 10.3.
  • Used ANT tool for deployment of the web application on the WebLogic Server.
  • Involved in the functional tests of the application and resolved production issues

Environment: Java 1.6, J2EE 5, Servlet, JSP, Spring 2.5, Oracle WebLogic, Log4j, Web Services, JavaScript, SQL Server 2005, SQL Management Studio, PL/SQL, UML, Rational Rose, CVS, Eclipse.

Confidential

Jr Java Developer

Responsibilities:

  • Involved in specification analysis and identifying the requirements.
  • Participated in design discussions for the methodology of requirement implementation
  • Involved in preparation of the Code Review Document & Technical Design Document
  • Designed the presentation layer by developing the JSP pages for the modules.
  • Developed controllers and JavaBeans encapsulating the business logic
  • Developed classes to interface with underlying web services layer
  • Used patterns including MVC, DAO, DTO, Front Controller, Service Locator and Business Delegate.
  • Worked on Service Layer which provided business logic implementation.
  • Involved in building PL/SQL queries and stored procedures for database operations.
  • Used Jasper Reports to provide print preview of Financial Reports and Monthly Statements.
  • Carried out integration testing & acceptance testing
  • Used JMeter to carry out performance tests on external web service calls, database connections and other dynamic resources.
  • Participated in the team meetings and discussed enhancements, issues and proposed feasible solutions.

Environment: Java 1.4, J2EE 1.4, Servlet, JSP, JDBC, XML, ANT, Apache Tomcat 5.0, Oracle 8i, JUnit, PL/SQL, UML, NetBeans.
