Hadoop Developer Resume
Wilmington, DE
SUMMARY
- Around 8 years of professional experience in developing, implementing, configuring, and testing systems built on Hadoop technologies.
- Good knowledge of Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker, TaskTracker, and YARN.
- Excellent understanding of distributed storage systems like HDFS and batch processing systems like MapReduce and YARN.
- Experience in developing efficient MapReduce programs for analyzing large data sets according to business requirements.
- Experience in using HiveQL for analyzing, querying, and summarizing huge data sets.
- Used Pig as an ETL tool for transformations, event joins, filtering, and pre-aggregation.
- Developed User Defined Functions (UDFs) for Apache Pig and Hive in Python and Java (a minimal Java sketch follows this summary).
- Queried both managed and external Hive tables using Impala.
- Experience in loading logs from multiple sources directly into HDFS using Flume.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Experience in processing real-time data using Spark and Scala.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data.
- Hands-on experience with message brokers such as Apache Kafka and RabbitMQ.
- Worked on configuring ZooKeeper and Kafka clusters.
- Experience in large-scale streaming data analytics using Storm.
- Experience in NoSQL databases like Cassandra and HBase.
- Knowledge of Cassandra read and write paths and internal architecture.
- Developed various complex data flows in Apache NiFi.
- Worked on large datasets to generate insights by using Splunk.
- Experience in ingesting data into the Hadoop data lake from databases such as MySQL and Oracle using Sqoop.
- Hands-on evaluation of ETL (Talend) and OLAP tools, recommending the most suitable solutions based on business requirements.
- Experience in generating reports using Tableau by connecting to Hive.
- Actively involved in Apache Solr integration, enhancing Solr performance and migrating from REST to Solr.
- Involved in testing AWS Redshift connectivity with SQL databases and in storing data for a POC.
- Knowledge of Hadoop distributions running on AWS (Amazon EC2).
- Experience in using Kerberos to authenticate end users on a secured Hadoop cluster.
- Experience in working with file formats such as Parquet, Avro, RCFile, SequenceFile, and JSON records.
- Performed testing of MapReduce jobs using MRUnit.
- Excellent knowledge of UNIX and shell scripting.
- Expertise in the design and development of web applications using J2EE technologies, including Java, Spring, EJB, Hibernate, Servlets, JSP, Struts, Web Services, XML, JMS, and JDBC.
- Extensive experience in using Relational databases like Oracle, SQL Server and MySQL.
- Experience in working with different Hadoop distributions like CDH and Hortonworks.
- Experience in using build management tools like Maven and Ant.
- Expertise in using Apache Tomcat as well as application servers such as JBoss and WebLogic.
- Experience in all phases of software development life cycle.
- Support development, testing, and operations teams during new system deployments.
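The Pig and Hive UDF bullet above refers to the kind of code sketched below: a minimal Hive UDF in Java. The class name, normalization rules, and column semantics are hypothetical and only illustrate the extend-UDF / evaluate() pattern.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes free-text country values; the name and rules are illustrative only.
public class NormalizeCountry extends UDF {

    /** Hive calls evaluate() once per row; returning null propagates SQL NULL. */
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String cleaned = input.toString().trim().toUpperCase();
        if (cleaned.equals("USA") || cleaned.startsWith("UNITED STATES")) {
            cleaned = "US";
        }
        return new Text(cleaned);
    }
}
```

A UDF like this is typically packaged as a JAR, added to the session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before use in HiveQL.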
TECHNICAL SKILLS
Programming Languages: C, C++, Java, Python and Scala.
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, Storm, Spark, Kafka, Impala.
Databases: MySQL, Oracle, SQL Server.
NoSQL Databases: HBase, Cassandra.
Operating Systems: Windows XP/7/8/10, Linux (Red Hat/Ubuntu/CentOS).
Cluster Monitoring Tools: Hortonworks, Cloudera Manager.
Web Technologies: HTML, CSS, JavaScript, XML.
Tools & Technologies: Servlets, Struts, JDBC, JSP, Web Services, Spring, Hibernate.
Web/ App Servers: Apache Tomcat Server, JBoss, WebLogic.
IDEs: Eclipse, Microsoft Visual Studio, NetBeans.
Business Intelligence Tools: Tableau.
PROFESSIONAL EXPERIENCE
Confidential - Wilmington, DE
Hadoop Developer
Responsibilities:
- Analyzing inbound and outbound data delivery requirements; designing and implementing data extraction, transformation, and load from AWS S3 to a Hadoop-based Enterprise Data Hub and vice versa.
- Performing data cleansing to remove header lines and HTML tags from data received from the Salesforce team.
- Designing parameter files as per business requirements, creating Hive external metadata tables, and using them to drive bash scripts that perform data ingestion and data delivery.
- Implementing partitioning in Hive by creating date-based partitions for incremental data loads.
- Creating Impala tables for faster querying of the data, as Impala is a massively parallel processing (MPP) SQL query engine.
- Optimizing Impala query performance by using compute stats and metadata refreshes wherever required.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Used Spark DataFrame APIs to ingest data from HDFS to S3 (see the sketch after this section).
- Developing UNIX shell scripts for Oozie workflows to automate the data loading process.
- Developing test cases for inbound and outbound data flows and performing data quality check on various data sets.
- Involved in ad hoc stand-up and architecture meetings to set daily priorities and track work status as part of a highly agile work environment.
- Prepared technical documentation for the project, including the Standard Operating Procedure, Low-Level Design, and High-Level Design documents.
Environment: CDH 5.5, HDFS, YARN, Hive, Oozie, Apache Impala, Spark, ZooKeeper, Parquet, UNIX, AWS S3.
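Below is a minimal sketch of the HDFS-to-S3 DataFrame ingestion mentioned above, written against the Spark Java API. The paths, bucket name, and load_date partition column are placeholders, and it assumes the S3A connector is configured on the cluster.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HdfsToS3Ingest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("HdfsToS3Ingest")
                .getOrCreate();

        // Read Parquet data from HDFS (path is a placeholder).
        Dataset<Row> events = spark.read().parquet("hdfs:///data/landing/events");

        // Illustrative cleansing step: drop rows where every column is null.
        Dataset<Row> cleaned = events.na().drop("all");

        // Write to S3, partitioned by the (hypothetical) load_date column.
        cleaned.write()
                .mode("overwrite")
                .partitionBy("load_date")
                .parquet("s3a://example-bucket/enterprise-data-hub/events");

        spark.stop();
    }
}
```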
Confidential - Lyndhurst, NJ
Hadoop Developer
Responsibilities:
- Involved in data ingestion into HDFS using Sqoop from a variety of sources, using connectors such as JDBC and import parameters.
- Developed MapReduce programs using Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
- Transformed raw data from several data sources into baseline data by developing Pig scripts and loaded the data into HBase tables.
- Designed the HBase schema to avoid hot-spotting and exposed data from HBase tables to the UI through a REST API.
- Helped market analysts by creating Hive queries to spot the emerging trends by comparing fresh data with HDFS reference tables and historical metrics.
- Worked on creating Kafka topics and partitions and writing custom partitioner classes.
- Used Kafka to store events from various systems and processed them with Spark Streaming to perform near-real-time analytics (see the sketch after this section).
- Defined job flows in Oozie to automate data loading into HDFS and Pig processing.
- Involved in creating POCs to ingest and process streaming data using Spark streaming and Kafka.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and reduce-side joins in MapReduce.
- Worked on Talend ETL to load data from various sources into Oracle DB; used tMap, tReplicate, tFilterRow, tSort, and various other Talend features.
- Involved in verifying cleansed data with other departments using Talend.
Environment: HDFS, MapReduce, Pig, Hive, HBase, Spark, Scala, Sqoop, Oozie, Kafka, Talend, Linux, Java, Maven, GIT, Jenkins.
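The Kafka plus Spark Streaming work above followed the direct-stream pattern sketched below (spark-streaming-kafka-0-10 API). The broker address, topic name, batch interval, and per-batch count are illustrative stand-ins for the real near-real-time analytics.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaEventStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaEventStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // Consumer settings; broker, group id, and topic name are placeholders.
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("group.id", "event-analytics");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("events"), kafkaParams));

        // Count records per micro-batch as a stand-in for the real analytics logic.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```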
Confidential - Naples, FL
Hadoop Developer
Responsibilities:
- Involved in developing Hive scripts to parse raw data, populate staging tables, and store the refined data in partitioned Hive tables.
- Developed Hive UDFs (User Defined Functions) in Python and Java where the required functionality was too complex for built-in functions.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Wrote scripts to generate Flume config files for different log types and environments.
- Designed and implemented a Cassandra NoSQL based database and associated RESTful web service that persists high-volume user profile data for vertical teams.
- Performed an assessment of the Cassandra clusters for performance improvements and to eliminate existing timeouts.
- Pre-processed large sets of structured and semi-structured data in different formats, such as text files, Avro, SequenceFile, and JSON records.
- Worked on POCs and involved in fixing performance & storage issues using different Compression Techniques, Custom Combiners and Custom Partitioners.
- Used MRUnit for testing MapReduce jobs on raw data and executed performance scripts (see the sketch after this section).
- Involved in automation of Hadoop jobs using Oozie workflows and coordinators.
- Assisted with data capacity planning and node forecasting.
- Analyzed migration plans for various versions of the Cloudera Distribution of Apache Hadoop to draw up the impact analysis and proposed a mitigation plan.
Environment: Hive, Flume, Oozie, Cassandra, MRUnit, Java, Python, UNIX, Cloudera, Maven, GIT.
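The MRUnit testing noted above follows the pattern in this minimal sketch; the mapper and its comma-separated input are hypothetical and exist only to show the MapDriver withInput / withOutput / runTest flow.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class EventMapperTest {

    // Trivial mapper used only to illustrate the test harness: emits (event type, 1) per CSV line.
    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            context.write(new Text(fields[0]), new IntWritable(1));
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new EventMapper());
    }

    @Test
    public void emitsOneCountPerRecord() throws IOException {
        mapDriver.withInput(new LongWritable(0), new Text("click,2016-01-01"))
                 .withOutput(new Text("click"), new IntWritable(1))
                 .runTest();
    }
}
```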
Confidential
Java Developer
Responsibilities:
- Involved in requirements analysis, design, and development of the application using Java technologies.
- Developed the login screen so that the application can be accessed only by authorized and authenticated administrators.
- Used HTML, CSS, and JSPs to design and develop the front end, and JavaScript to perform user validation.
- Designed, developed, and configured server-side J2EE components such as EJBs, JavaBeans, and Servlets.
- Involved in creating tables, functions, triggers, sequences, and stored procedures in PL/SQL.
- Implemented business logic by developing Session Beans (see the sketch after this section).
- Used Hibernate as the ORM and PL/SQL for handling database processing.
- Used JDBC-API to communicate with the Database.
- Developed the application following the Waterfall software development methodology.
- Involved in technical documentation of the project.
Environment: Java, HTML, CSS, JSP, Servlets, EJB, JDBC, Hibernate, PL/SQL, Oracle 8i.
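The Session Bean business logic mentioned above would look roughly like the sketch below, written in EJB 3 annotation style with plain JDBC for brevity (the deployment descriptors used in that era are omitted); the bean, data source name, table, and columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.sql.DataSource;

// Hypothetical login-check bean; the data source, table, and columns are illustrative only.
@Stateless
public class AdminLoginBean {

    @Resource(name = "jdbc/appDS")   // container-managed DataSource (name is a placeholder)
    private DataSource dataSource;

    /** Returns true when the administrator credentials match a stored account. */
    public boolean authenticate(String username, String passwordHash) {
        String sql = "SELECT COUNT(*) FROM admin_users WHERE username = ? AND password_hash = ?";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, username);
            ps.setString(2, passwordHash);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && rs.getInt(1) > 0;
            }
        } catch (SQLException e) {
            throw new IllegalStateException("Login lookup failed", e);
        }
    }
}
```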
Confidential
Associate Java Developer
Responsibilities:
- Used the Struts MVC architecture to develop the modules.
- Developed the UI using JavaScript, JSP, HTML, and CSS for interactive cross-browser functionality and complex user interfaces.
- Used Servlets and Session Beans for creating business logic and deployed them on the WebLogic server.
- Designed the application using the Struts MVC framework.
- Developed complex SQL Queries and PL/SQL Stored procedures.
- Prepared the Functional, Design and Test case specifications.
- Performed database-side validations by writing stored procedures in Oracle.
- Used JUnit for unit testing of the application (see the sketch after this section).
- Provided technical support for production environments and resolved issues.
Environment: Java, HTML, JavaScript, CSS, Oracle, JDBC, Swing, and Eclipse.
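The JUnit unit testing referenced above followed the structure in this minimal sketch; the validator class and its rules are hypothetical and only illustrate the JUnit 4 test layout.

```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

// Hypothetical validation helper declared inline so the test is self-contained.
public class OrderValidatorTest {

    static class OrderValidator {
        /** Returns true when the input is a positive whole number. */
        boolean isValidQuantity(String input) {
            try {
                return Integer.parseInt(input) > 0;
            } catch (NumberFormatException e) {
                return false;
            }
        }
    }

    private final OrderValidator validator = new OrderValidator();

    @Test
    public void acceptsPositiveQuantity() {
        assertTrue(validator.isValidQuantity("3"));
    }

    @Test
    public void rejectsNonNumericInput() {
        assertFalse(validator.isValidQuantity("abc"));
    }
}
```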