
Hadoop Developer Resume


Wilmington, DE

SUMMARY

  • Around 8 years of professional experience in developing, implementing, configuring, and testing systems using Hadoop technologies.
  • Good knowledge of Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker, TaskTracker, and YARN.
  • Excellent understanding of distributed storage systems like HDFS and batch processing frameworks like MapReduce and YARN.
  • Experience in developing efficient MapReduce programs for analyzing large data sets according to business requirements.
  • Experience in using HiveQL for querying, analyzing, and summarizing huge data sets.
  • Used Pig as an ETL tool for transformations, event joins, filtering, and pre-aggregation.
  • Developed User Defined Functions (UDFs) for Apache Pig and Hive using Python and Java.
  • Queried both managed and external Hive tables using Impala.
  • Experience in loading logs from multiple sources directly into HDFS using Flume.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Experience in processing real-time data using Spark and Scala.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data (a sketch follows this list).
  • Hands-on experience with message brokers such as Apache Kafka and RabbitMQ.
  • Worked on configuring ZooKeeper and Kafka clusters.
  • Experience in large-scale streaming data analytics using Storm.
  • Experience in NoSQL databases like Cassandra and HBase.
  • Knowledge of Cassandra read and write paths and its internal architecture.
  • Developed various complex data flows in Apache NiFi.
  • Worked on large data sets to generate insights using Splunk.
  • Experience in ingesting data into the Hadoop data lake from databases such as MySQL and Oracle using Sqoop.
  • Hands-on evaluation of ETL (Talend) and OLAP tools, recommending the most suitable solutions based on business requirements.
  • Experience in generating reports using Tableau by connecting to Hive.
  • Actively involved in Apache Solr integration, enhancing Solr performance and migrating from REST to Solr.
  • Involved in testing AWS Redshift connectivity with a SQL database for storing data as part of a POC.
  • Knowledge of running Hadoop on AWS (Amazon EC2).
  • Experience in using Kerberos to authenticate end users on a secured Hadoop cluster.
  • Experience in working with file formats such as Parquet, Avro, RCFile, SequenceFile, and JSON records.
  • Performed testing of MapReduce jobs using MRUnit.
  • Excellent knowledge of UNIX and shell scripting.
  • Expertise in the design and development of web applications using J2EE technologies including Java, Spring, EJB, Hibernate, Servlets, JSP, Struts, Web Services, XML, JMS, and JDBC.
  • Extensive experience in using relational databases such as Oracle, SQL Server, and MySQL.
  • Experience in working with different Hadoop distributions like CDH and Hortonworks.
  • Experience in using build management tools like Maven and Ant.
  • Expertise in using the Tomcat web server as well as application servers such as JBoss and WebLogic.
  • Experience in all phases of software development life cycle.
  • Supported development, testing, and operations teams during new system deployments.
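
A minimal Scala sketch of the Spark Streaming and Kafka integration noted in the streaming bullet above, written against the spark-streaming-kafka-0-10 direct stream API; the broker address, topic name, consumer group, and comma-delimited event format are illustrative assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("KafkaEventCounts")
val ssc = new StreamingContext(conf, Seconds(10))

// Assumed broker, topic, and consumer-group names
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "event-analytics",
  "auto.offset.reset" -> "latest"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

// Count events per type in each 10-second micro-batch,
// assuming the event type is the first comma-delimited field
stream.map(record => (record.value.split(",")(0), 1L))
  .reduceByKey(_ + _)
  .print()

ssc.start()
ssc.awaitTermination()
```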

TECHNICAL SKILLS

Programming Languages: C, C++, Java, Python and Scala.

Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, Storm, Spark, Kafka, Impala.

Databases: MySQL, Oracle, SQL Server.

NoSQL Databases: HBase, Cassandra.

Operating Systems: Windows XP/7/8/10, Linux (Red Hat, Ubuntu, CentOS).

Cluster Monitoring Tools: Hortonworks, Cloudera Manager.

Web Technologies: HTML, CSS, JavaScript, XML.

Tools/Technologies: Servlets, Struts, JDBC, JSP, Web Services, Spring, Hibernate.

Web/App Servers: Apache Tomcat, JBoss, WebLogic.

IDEs: Eclipse, Microsoft Visual Studio, NetBeans.

Business Intelligence Tools: Tableau.

PROFESSIONAL EXPERIENCE

Confidential - Wilmington, DE

Hadoop Developer

Responsibilities:

  • Analyzing inbound and outbound data delivery requirements; designing and implementing data extraction, transformation, and load from AWS S3 to the Hadoop-based Enterprise Data Hub and vice versa.
  • Cleansing data to remove header lines and HTML tags from the data received from the Salesforce team.
  • Designing parameter files per business requirements, creating external Hive metadata tables, and using them to drive bash scripts that perform data ingestion and data delivery.
  • Implementing partitioning in Hive by creating date-based partitions for incremental data loads.
  • Creating Impala tables for faster querying of the data, as Impala is a massively parallel processing (MPP) SQL query engine.
  • Optimizing Impala query performance with COMPUTE STATS and metadata refreshes wherever required.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Used the Spark DataFrame API to ingest data from HDFS into S3 (see the sketch after this list).
  • Developing UNIX shell scripts for Oozie workflows to automate the data loading process.
  • Developing test cases for inbound and outbound data flows and performing data quality checks on various data sets.
  • Involved in ad hoc stand-up and architecture meetings to set daily priorities and track work status as part of a highly agile work environment.
  • Producing technical documentation for the project, including Standard Operating Procedure, Low-Level Design, and High-Level Design documents.
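
A minimal sketch of the HDFS-to-S3 delivery referenced above, assuming the Spark 2.x DataFrame API (the same read/write/partitionBy calls are available on a SQLContext in Spark 1.x on CDH 5.5); the paths, bucket name, and load_date column are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HdfsToS3Delivery")
  .getOrCreate()

// Read the refined data set from HDFS (Parquet, matching the cluster's storage format)
val refined = spark.read.parquet("hdfs:///data/refined/orders")

// Deliver to S3, partitioned by load date so downstream Hive/Impala
// tables can prune partitions during incremental loads
refined.write
  .mode("append")
  .partitionBy("load_date")
  .parquet("s3a://enterprise-data-hub/outbound/orders/")
```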

Environment: CDH 5.5, HDFS, YARN, Hive, Oozie, Apache Impala, Spark, ZooKeeper, Parquet, UNIX, AWS S3.

Confidential - Lyndhurst, NJ

Hadoop Developer

Responsibilities:

  • Involved in data ingestion into HDFS using Sqoop from a variety of sources, using connectors such as JDBC and import parameters.
  • Developed MapReduce programs using Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
  • Transformed raw data from several data sources into baseline data by developing Pig scripts and loaded the data into HBase tables.
  • Designed the HBase row-key schema to avoid hotspotting (see the salting sketch after this list) and exposed data from HBase tables to the UI through a REST API.
  • Helped market analysts by creating Hive queries to spot emerging trends by comparing fresh data with HDFS reference tables and historical metrics.
  • Worked on creating Kafka topics and partitions and writing custom partitioner classes.
  • Used Kafka to store events from various systems and processed them with Spark Streaming to perform near-real-time analytics.
  • Defined job flows in Oozie to automate data loading into HDFS and Pig processing.
  • Involved in creating POCs to ingest and process streaming data using Spark Streaming and Kafka.
  • Performed various performance optimizations such as using the distributed cache for small data sets, partitioning and bucketing in Hive, and reduce-side joins in MapReduce.
  • Worked on Talend ETL to load data from various sources into an Oracle database, using tMap, tReplicate, tFilterRow, tSort, and various other Talend features.
  • Involved in verifying cleansed data with other departments using the Talend tool.
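
A minimal sketch of the row-key salting approach used to avoid HBase hotspotting, written in Scala against the HBase 1.x client API; the table name, column family, bucket count, and sample key are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

// Prefix the natural key with a hash-based salt so sequential keys
// (e.g. timestamps or monotonically increasing IDs) spread across regions
def saltedRowKey(naturalKey: String, buckets: Int = 16): Array[Byte] = {
  val salt = Math.floorMod(naturalKey.hashCode, buckets)
  Bytes.toBytes(f"$salt%02d|$naturalKey")
}

val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
val table = connection.getTable(TableName.valueOf("baseline_data"))

val put = new Put(saltedRowKey("2016-03-01T12:00:00|order-98765"))
put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("129.99"))
table.put(put)

table.close()
connection.close()
```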

Environment: HDFS, MapReduce, Pig, Hive, HBase, Spark, Scala, Sqoop, Oozie, Kafka, Talend, Linux, Java, Maven, Git, Jenkins.

Confidential - Naples, FL

Hadoop Developer

Responsibilities:

  • Involved in developing Hive scripts to parse raw data, populate staging tables, and store refined data in partitioned Hive tables.
  • Developed Hive UDFs (User Defined Functions) in Python and Java where the required functionality was too complex for standard HiveQL.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Wrote a script for generating Flume configuration files for different log types and environments.
  • Designed and implemented a Cassandra-based NoSQL database and an associated RESTful web service that persists high-volume user profile data for vertical teams (see the sketch after this list).
  • Assessed the Cassandra clusters for performance improvements and to eliminate the existing timeouts.
  • Pre-processed large sets of structured and semi-structured data in different formats such as text files, Avro, SequenceFile, and JSON records.
  • Worked on POCs and was involved in fixing performance and storage issues using various compression techniques, custom combiners, and custom partitioners.
  • Used MRUnit for testing raw data and executed performance scripts.
  • Involved in automation of Hadoop jobs using Oozie workflows and coordinators.
  • Assisted with data capacity planning and node forecasting.
  • Analyzed migration plans for various versions of the Cloudera Distribution of Apache Hadoop, performed impact analysis, and proposed mitigation plans for the same.
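
A minimal sketch of persisting a user profile row into Cassandra from Scala via the DataStax Java driver (2.x/3.x style), as in the web service mentioned above; the contact point, keyspace, table, and column names are hypothetical.

```scala
import com.datastax.driver.core.Cluster

// Connect to the cluster and a hypothetical "profiles" keyspace
val cluster = Cluster.builder()
  .addContactPoint("cassandra-node1")
  .build()
val session = cluster.connect("profiles")

// INSERT acts as an upsert in Cassandra, which suits high-volume profile updates
session.execute(
  "INSERT INTO user_profile (user_id, display_name, last_login) VALUES (?, ?, ?)",
  "u-1001", "Sample User", new java.util.Date())

session.close()
cluster.close()
```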

Environment: Hive, Flume, Oozie, Cassandra, MRUnit, Java, Python, UNIX, Cloudera, Maven, GIT.

Confidential

Java Developer

Responsibilities:

  • Involved in requirement analysis, design, and development of the application using Java technologies.
  • Developed the login screen so that the application could be accessed only by authenticated and authorized administrators.
  • Used HTML, CSS, and JSPs to design and develop the front end, and used JavaScript to perform user validation.
  • Designed, developed, and configured server-side J2EE components such as EJBs, JavaBeans, and Servlets.
  • Involved in creating tables, functions, triggers, sequences, and stored procedures in PL/SQL.
  • Implemented business logic by developing Session Beans.
  • Used Hibernate as the ORM and PL/SQL for handling database processing.
  • Used the JDBC API to communicate with the database.
  • Developed the application using the Waterfall software development methodology.
  • Involved in technical documentation of the project.

Environment: Java, HTML, CSS, JSP, Servlets, EJB, JDBC, Hibernate, PL/SQL, Oracle 8i.

Confidential

Associate Java Developer

Responsibilities:

  • Used the Struts MVC architecture to develop the modules.
  • Developed the UI using JavaScript, JSP, HTML, and CSS for interactive, cross-browser functionality and complex user interfaces.
  • Used Servlets and Session Beans to implement business logic and deployed them on the WebLogic server.
  • Designed the application using the Struts MVC framework.
  • Developed complex SQL Queries and PL/SQL Stored procedures.
  • Prepared the Functional, Design and Test case specifications.
  • Performed some database side validations by writing Stored Procedures in Oracle.
  • Used JUnit for unit testing of the application.
  • Provided technical support for production environments and resolved issues.

Environment: Java, HTML, JavaScript, CSS, Oracle, JDBC, Swing, Eclipse.
