
Senior Hadoop Developer Resume


Pleasanton, CA

SUMMARY

  • 7+ years of overall software development experience in Big Data technologies, the Hadoop ecosystem, and Java/J2EE technologies, with programming experience in Java, Scala, Python, and SQL.
  • 4+ years of strong hands-on experience with the Hadoop ecosystem, including Spark, MapReduce, Hive, Pig, HDFS, YARN, HBase, Oozie, Kafka, Sqoop, and Flume.
  • Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
  • Worked with real-time data processing and streaming techniques using Spark Streaming, Storm, and Kafka.
  • Experience in moving data into and out of the HDFS and Relational Database Systems (RDBMS) using Apache Sqoop.
  • Experience developing Kafka producers and consumers for streaming millions of events per second (a short producer sketch follows this summary).
  • Significant experience writing custom UDFs in Hive and custom InputFormats in MapReduce.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
  • Good knowledge of data warehousing, ETL development, distributed computing, and large-scale data processing.
  • Good knowledge of Informatica as an ETL tool and of stored procedures to pull data from source systems/files, then cleanse, transform, and load it into databases.
  • Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using various distributions such as Apache Spark, Cloudera, and the AWS service console.
  • Designed and implemented a test environment on AWS and built data pipelines from scratch using multiple AWS services.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, and other services of the AWS family.
  • Strong experience productionizing end-to-end data pipelines on the Hadoop platform.
  • Good experience in designing and implementing end-to-end data security and governance within the Hadoop platform using Kerberos.
  • Experience in architecting, designing, and building distributed software systems.
  • Extensively worked on UNIX shell scripts for batch processing.
  • Experience in using various Hadoop Distributions like Cloudera, Hortonworks and Amazon EMR.
  • Experience in developing service components using JDBC.
  • Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
  • Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, JSON, and Avro.
  • Expertise in database design, creation and management of schemas, and writing stored procedures, functions, and DDL/DML SQL queries.
  • Good knowledge of NoSQL databases Cassandra, MongoDB, and HBase.
  • Worked on HBase to load and retrieve data for real-time processing using a REST API.
  • Experience in developing applications using Waterfall and Agile (XP and Scrum) methodologies.
  • Strong problem-solving and analytical skills, with the ability to make balanced and independent decisions.
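
For illustration, below is a minimal sketch of a Kafka producer of the kind referenced above, written in Scala against the standard Kafka client API. The broker addresses, topic name, key, and payload are hypothetical placeholders.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object EventProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092,broker2:9092") // hypothetical brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("acks", "all") // wait for full acknowledgement for durability

        val producer = new KafkaProducer[String, String](props)
        try {
          // Keyed records: events with the same key land on the same partition, preserving per-key order.
          producer.send(new ProducerRecord[String, String]("events", "device-42", """{"temp": 71.3}"""))
        } finally {
          producer.close() // flushes buffered records before shutdown
        }
      }
    }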

TECHNICAL SKILLS

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Sqoop, Flume, NoSQL (HBase, Cassandra), Spark, Kafka, ZooKeeper, Oozie, Hue, Cloudera Manager, Amazon AWS, Hortonworks clusters

AWS Ecosystems: S3, EC2, EMR, Redshift, Athena, Glue, Lambda, SNS, CloudWatch

Java/J2EE & Web Technologies: J2EE, JMS, JSF, JDBC, Servlets, HTML, CSS, XML, XHTML, AJAX, JavaScript

Languages: C, C++, Core Java, Shell Scripting, SQL, PL/SQL, Python, Pig Latin

Operating systems: Windows, Linux and Unix

DBMS / RDBMS / NoSQL: Oracle, Microsoft SQL Server 2012/2008, MySQL, DB2, Teradata, MongoDB, Cassandra, HBase; ETL: Talend

IDE and Build Tools: Eclipse, NetBeans, MS Visual Studio, Ant, Maven, JIRA, Confluence

Version Control: Git, SVN, CVS

Web Services: RESTful, SOAP

Web Servers: WebLogic, WebSphere, Apache Tomcat

PROFESSIONAL EXPERIENCE

Senior Hadoop Developer

Confidential, Pleasanton, CA

Responsibilities:

  • Responsible for the installation and configuration of Hive, Pig, HBase, and Sqoop on the Hadoop cluster, and created Hive tables to store the processed results in tabular format.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Scala (a sketch follows this list).
  • Developed Sqoop scripts to move data between Hive and the Vertica database.
  • Ingested data into HDFS and analysed it using MapReduce, Pig, and Hive to produce summary results for downstream systems.
  • Built servers on AWS: importing volumes, launching EC2 instances, and creating security groups, Auto Scaling groups, load balancers, Route 53, SES, and SNS within the defined virtual private cloud (VPC).
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Streamed AWS log groups into a Lambda function to create ServiceNow incidents.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analysed them by running Hive queries and Pig scripts.
  • Created Managed tables and External tables in Hive and loaded data from HDFS.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing, and performed complex HiveQL queries on Hive tables.
  • Scheduled several time-based Oozie workflows by developing Python scripts.
  • Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, SPLIT to extract data from data files to load into HDFS.
  • Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
  • Worked on S3 buckets on AWS to store CloudFormation templates and used AWS to create EC2 instances.
  • Designed the ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, and MySQL.
  • End-to-end architecture and implementation of client-server systems using Scala, Akka, Java, JavaScript, and related technologies on Linux.
  • Optimized Hive tables using techniques such as partitioning and bucketing to improve query performance.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Implemented Hadoop on AWS EC2 using a small number of instances to gather and analyse log files.
  • Worked with Spark and Spark Streaming, creating RDDs and applying operations (transformations and actions).
  • Created partitioned tables and loaded data using both static partition and dynamic partition method.
  • Developed custom Apache Spark programs in Scala to analyse and transform unstructured data.
  • Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Used Kafka for publish-subscribe messaging as a distributed commit log, leveraging its speed, scalability, and durability.
  • Followed a Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
  • Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Involved in cluster maintenance, monitoring, and troubleshooting, and managed and reviewed data backups and log files.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
  • Analysed the Hadoop cluster and various Big Data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Improved performance by tuning Hive and MapReduce jobs.
  • Researched, evaluated, and utilized modern technologies, tools, and frameworks in the Hadoop ecosystem.
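
Below is a minimal sketch of the Spark Streaming ingestion described above (Kafka to HDFS in Scala), assuming the spark-streaming-kafka-0-10 integration. The broker address, topic, consumer group, batch interval, and HDFS path are hypothetical placeholders.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHdfs")
        val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",            // hypothetical broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "hdfs-ingest",
          "auto.offset.reset" -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist the value of each record; each micro-batch is written under a time-stamped HDFS prefix.
        stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/events/batch")

        ssc.start()
        ssc.awaitTermination()
      }
    }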

Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Flume, Vertica, Oozie scheduler, Java, shell scripts, Teradata, Oracle, HBase, MongoDB, Cassandra, Cloudera, AWS, JavaScript, JSP, Kafka, Spark, Scala, ETL, Python.

Hadoop Spark Developer

Confidential, Plano, TX

Responsibilities:

  • Examined data, identified outliers and inconsistencies, and manipulated data to ensure data quality and integration.
  • Developed data pipeline using Sqoop, Spark and Hive to ingest, transform and analyse operational data.
  • Used Spark SQL with Scala to create DataFrames and performed transformations on them (a sketch follows this list).
  • Developed custom multi-threaded Java based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.
  • Streamed data in real time using Spark and Kafka.
  • Worked on troubleshooting Spark applications to make them more fault tolerant.
  • Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
  • Extended Hive and Pig core functionality with custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregate Functions (UDAFs) written in Python.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Wrote Spark-Streaming applications to consume the data from Kafka topics and write the processed streams to HBase.
  • Handled large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations, and other features.
  • Experience with Kafka sustaining thousands of megabytes of reads and writes per second on streaming data.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Experience working with EMR clusters in the AWS cloud and with S3.
  • Involved in creating Hive tables, loading and analysing data using Hive scripts.
  • Created Hive tables, dynamic partitions, buckets for sampling and working on them using Hive QL.
  • Used Maven extensively for building JAR files of MapReduce programs and deployed them to the cluster.
  • Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
  • Performed tuning and increased operational efficiency on a continuous basis.
  • Worked on Spark SQL, reading and writing data from JSON, text, and Parquet files and schema RDDs.
  • Worked on POCs with Apache Spark using Scala to adopt Spark in the project.
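
A minimal sketch of the Spark SQL work described above: building a DataFrame from JSON, applying transformations in Scala, and writing Parquet output. The input/output paths, column names, and aggregation logic are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object OrdersEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("OrdersEtl")
          .enableHiveSupport()
          .getOrCreate()

        // Read semi-structured JSON into a DataFrame (schema is inferred).
        val orders = spark.read.json("hdfs:///data/raw/orders")

        // Filter, derive a date column, and aggregate per day and region.
        val daily = orders
          .filter(col("status") === "COMPLETED")
          .withColumn("order_date", to_date(col("order_ts")))
          .groupBy("order_date", "region")
          .agg(count("*").as("order_count"), sum("amount").as("total_amount"))

        // Write the curated result as Parquet for downstream consumption.
        daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_orders")
      }
    }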

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, HBase, Teradata, PowerCenter, Tableau, Oozie, Oracle, Linux

Senior Hadoop Developer

Confidential, Houston, TX

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
  • Involved in developing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
  • Worked on moving all log files generated from various sources to HDFS for further processing.
  • Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
  • Developed a predictive analytics product using Apache Spark, SQL/HiveQL, JavaScript, and Highcharts.
  • Wrote Spark programs to load, parse, refine, and store sensor data into Hadoop, and to process, analyze, and aggregate data for visualizations.
  • Developed data pipeline programs with Spark Scala APIs, data aggregations with Hive, and JSON formatting of data for visualization, generating Highcharts output such as outlier, data-distribution, correlation/comparison, and two-dimensional charts using JavaScript.
  • Created various views on HBase tables and leveraged the performance of Hive on top of HBase.
  • Developed the Apache Storm, Kafka, and HDFS integration project to do a real time data analysis.
  • Designed and developed the Apache Storm topologies for Inbound and outbound data for real time ETL to find the latest trends and keywords.
  • Developed MapReduce programs for parsing data and loading it into HDFS.
  • Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries; wrote a Hive UDF to sort struct fields and return a complex data type (a simplified UDF sketch follows this list).
  • Responsible for loading data from the UNIX file system to HDFS.
  • Developed ETL applications using Hive, Spark, Impala, and Sqoop, and automated them using Oozie.
  • Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MR testing library.
  • Designed and developed a distributed processing system running to process binary files in parallel and crunch the analysis metrics into a Data Warehousing platform for reporting.
  • Developed workflows in Control-M to automate tasks of loading data into HDFS and pre-processing it with Pig.
  • Provided cluster coordination services through ZooKeeper.
  • Used Maven extensively for building JAR files of MapReduce programs and deployed them to the cluster.
  • Modelled Hive partitions extensively for data separation and faster processing, and followed Pig and Hive best practices for tuning.
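
A simplified sketch of a custom Hive UDF of the kind described above, implemented as a JVM class (shown here in Scala). The function name and normalization logic are hypothetical; a UDF that sorts struct fields and returns a complex type would typically use the more general GenericUDF API rather than the simple UDF base class shown here.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical UDF that trims and lower-cases a string column.
    // After packaging into a JAR and running ADD JAR in Hive, it can be registered with:
    //   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase)
      }
    }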

Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Oozie, Sqoop, AWS, Spark, Unix, Tableau, Cosmos.

Java Developer

Confidential

Responsibilities:

  • Involved in development of business domain concepts into Use Cases, Sequence Diagrams, Class Diagrams, Component Diagrams and Implementation Diagrams.
  • Implemented various J2EE Design Patterns such as Model-View-Controller, Data Access Object, Business Delegate and Transfer Object.
  • Responsible for analysis & design of the application based on MVC Architecture, using open source Struts Framework.
  • Involved in configuring Struts, Tiles and developing the configuration files.
  • Developed Struts Action classes and Validation classes using Struts controller component and Struts validation framework.
  • Developed and deployed UI layer logic using JSP, XML, JavaScript, and HTML/DHTML.
  • Used Spring Framework and integrated it with Struts .
  • Designed a lightweight model for the product using Inversion of Control principle and implemented it successfully using Spring IOC Container.
  • Used the transaction interceptor provided by Spring for declarative transaction management.
  • Managed dependencies between classes with Spring using dependency injection to promote loose coupling.
  • Provided database connections using JDBC and developed SQL queries to manipulate the data (a brief sketch follows this list).
  • Wrote stored procedure and used JAVA APIs to call these procedures.
  • Developed various test cases, including unit tests, mock tests, and integration tests, using JUnit.
  • Experience writing Stored Procedures, Functions and Packages.
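
A brief sketch of the JDBC access pattern described above. The original work was in Java; this Scala sketch uses a hypothetical customers table, query, and connection details.

    import java.sql.DriverManager
    import scala.collection.mutable.ListBuffer

    object CustomerDao {
      private val url = "jdbc:oracle:thin:@//db-host:1521/ORCL" // hypothetical connection string

      def findEmails(city: String): List[String] = {
        val conn = DriverManager.getConnection(url, "app_user", "app_password")
        try {
          // A parameterized statement keeps the query safe and lets the driver reuse the plan.
          val stmt = conn.prepareStatement("SELECT email FROM customers WHERE city = ?")
          stmt.setString(1, city)
          val rs = stmt.executeQuery()
          val emails = ListBuffer[String]()
          while (rs.next()) emails += rs.getString("email")
          emails.toList
        } finally {
          conn.close()
        }
      }
    }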

Environment: Java, J2EE, Struts MVC, Tiles, JDBC, JSP, JavaScript, HTML, Spring IoC, Spring AOP, JAX-WS, Ant, WebSphere Application Server, Oracle, JUnit, Log4j, Eclipse

Java Developer

Confidential

Responsibilities:

  • Competency in using XML web services over SOAP to transfer data to supply-chain and domain monitoring systems.
  • Worked with Maven as the build tool for building JAR files.
  • Used the Hibernate framework (ORM) to interact with the database.
  • Knowledge of the Struts Tiles framework for layout management.
  • Worked on the design, analysis, development, and testing phases of the application.
  • Developed user interface using JSP and HTML.
  • Used JDBC for the Database connectivity.
  • Involved in projects utilizing Java, JavaEE web applications in the creation of fully integrated client management systems.
  • Executed SQL statements for searching contractors based on criteria.
  • Development and integration of the application using Eclipse IDE.
  • Developed JUnit tests for server-side code.
  • Involved in building, testing and debugging of JSP pages in the system.
  • Involved in a multi-tiered J2EE design utilizing the Spring (IoC) architecture and Hibernate.
  • Involved in the development of front-end screens using technologies such as JSP, HTML, AJAX, and JavaScript.
  • Configured Spring-managed beans.
  • Used the Spring Security API to configure security.
  • Investigated, debugged, and fixed potential bugs in the implementation code.

Environment: Java, J2EE, JSP, Hibernate, Struts, XML Schema, SOAP, JavaScript, PL/SQL, JUnit, AJAX, HQL, HTML, JDBC, Maven, Eclipse.
