Hadoop Developer Resume
Wilmington, DE
SUMMARY
- Around 8 years of professional experience in developing, implementing, configuring, and testing systems built on Hadoop technologies.
- Good knowledge of Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker, TaskTracker, and YARN.
- Excellent understanding of distributed storage systems like HDFS and batch processing systems like MapReduce and YARN.
- Experience in developing efficient MapReduce programs for analyzing large data sets according to business requirements.
- Experience in using HiveQL for analyzing, querying, and summarizing huge data sets.
- Used Pig as an ETL tool for transformations, event joins, filtering, and pre-aggregation.
- Developed User Defined Functions (UDFs) for Apache Pig and Hive in Python and Java (a minimal Java sketch follows this summary).
- Queried both managed and external Hive tables using Impala.
- Experience in loading logs from multiple sources directly into HDFS using Flume.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
- Experience in processing real-time data using Spark and Scala.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data.
- Hands-on experience with message brokers such as Apache Kafka and RabbitMQ.
- Worked on configuring ZooKeeper and Kafka clusters.
- Experience in large-scale streaming data analytics using Storm.
- Experience in NoSQL databases like Cassandra and HBase.
- Knowledge of Cassandra read and write paths and internal architecture.
- Developed various complex data flows in Apache NiFi.
- Worked on large datasets to generate insights by using Splunk.
- Experience in ingesting data into the Hadoop data lake from databases such as MySQL and Oracle using Sqoop.
- Hands-on evaluation of ETL (Talend) and OLAP tools, recommending the most suitable solutions based on business requirements.
- Experience in generating reports using Tableau by connecting to Hive.
- Actively involved in Apache Solr integration, enhancing Solr performance and migrating from REST to Solr.
- Involved in testing AWS Redshift connectivity with SQL databases and in storing data for a POC.
- Knowledge of Hadoop distributions running on AWS (Amazon EC2).
- Experience in using Kerberos to authenticate end users on a secured Hadoop cluster.
- Experience in working with file formats such as Parquet, Avro, RCFile, SequenceFile, and JSON records.
- Performed testing of MapReduce jobs using MRUnit.
- Excellent knowledge of UNIX and shell scripting.
- Expertise in the design and development of web applications using J2EE technologies, including Java, Spring, EJB, Hibernate, Servlets, JSP, Struts, Web Services, XML, JMS, and JDBC.
- Extensive experience in using Relational databases like Oracle, SQL Server and MySQL.
- Experience in working with different Hadoop distributions like CDH and Hortonworks.
- Experience in using build management tools like Maven and Ant.
- Expertise in using Apache Tomcat as well as application servers such as JBoss and WebLogic.
- Experience in all phases of software development life cycle.
- Support development, testing, and operations teams during new system deployments.
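The Pig and Hive UDF bullet above refers to the kind of code sketched below: a minimal Hive UDF in Java. The class name, normalization rules, and column semantics are hypothetical and only illustrate the extend-UDF / evaluate() pattern.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes free-text country values; the name and rules are illustrative only.
public class NormalizeCountry extends UDF {

    /** Hive calls evaluate() once per row; returning null propagates SQL NULL. */
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String cleaned = input.toString().trim().toUpperCase();
        if (cleaned.equals("USA") || cleaned.startsWith("UNITED STATES")) {
            cleaned = "US";
        }
        return new Text(cleaned);
    }
}
```

A UDF like this is typically packaged as a JAR, added to the session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before use in HiveQL.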
TECHNICAL SKILLS
Programming Languages: C, C++, Java, Python and Scala.
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, Storm, Spark, Kafka, Impala.
Databases: MySQL, Oracle, SQL Server.
NoSQL Databases: HBase, Cassandra.
Operating Systems: Windows XP/7/8/10, Linux (Red Hat/Ubuntu/CentOS).
Cluster Monitoring Tools: Hortonworks, Cloudera Manager.
Web Technologies: HTML, CSS, JavaScript, XML.
Tools & Technologies: Servlets, Struts, JDBC, JSP, Web Services, Spring, Hibernate.
Web/ App Servers: Apache Tomcat Server, JBoss, WebLogic.
IDEs: Eclipse, Microsoft Visual Studio, NetBeans.
Business Intelligence Tools: Tableau.
PROFESSIONAL EXPERIENCE
Confidential - Wilmington, DE
Hadoop Developer
Responsibilities:
- Analyzing inbound and outbound data delivery requirements; designing and implementing data extraction, transformation, and load from AWS S3 to a Hadoop-based Enterprise Data Hub and vice versa.
- Performing data cleansing to remove header lines and HTML tags from data received from the Salesforce team.
- Designing parameter files as per business requirements, creating Hive external metadata tables, and using them to drive bash scripts that perform data ingestion and data delivery.
- Implementing partitioning in Hive by creating date-based partitions for incremental data loads.
- Creating Impala tables for faster querying of the data, as Impala is a massively parallel processing (MPP) SQL query engine.
- Optimizing Impala query performance by using compute stats and metadata refreshes wherever required.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Used Spark DataFrame APIs to ingest data from HDFS to S3 (see the sketch after this section).
- Developing UNIX shell scripts for Oozie workflows to automate the data loading process.
- Developing test cases for inbound and outbound data flows and performing data quality check on various data sets.
- Involved in ad hoc stand-up and architecture meetings to set daily priorities and track work status as part of a highly agile work environment.
- Prepared technical documentation for the project, including the Standard Operating Procedure, Low-Level Design, and High-Level Design documents.
Environment: CDH 5.5, HDFS, YARN, Hive, Oozie, Apache Impala, Spark, ZooKeeper, Parquet, UNIX, AWS S3.
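Below is a minimal sketch of the HDFS-to-S3 DataFrame ingestion mentioned above, written against the Spark Java API. The paths, bucket name, and load_date partition column are placeholders, and it assumes the S3A connector is configured on the cluster.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HdfsToS3Ingest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("HdfsToS3Ingest")
                .getOrCreate();

        // Read Parquet data from HDFS (path is a placeholder).
        Dataset<Row> events = spark.read().parquet("hdfs:///data/landing/events");

        // Illustrative cleansing step: drop rows where every column is null.
        Dataset<Row> cleaned = events.na().drop("all");

        // Write to S3, partitioned by the (hypothetical) load_date column.
        cleaned.write()
                .mode("overwrite")
                .partitionBy("load_date")
                .parquet("s3a://example-bucket/enterprise-data-hub/events");

        spark.stop();
    }
}
```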
Confidential - Lyndhurst, NJ
Hadoop Developer
Responsibilities:
- Involved in data ingestion into HDFS using Sqoop from a variety of sources, using connectors such as JDBC and import parameters.
- Developed MapReduce programs using Java to parse the raw data, populate staging tables and store the refined data in partitioned tables in HDFS.
- Transformed raw data from several data sources into baseline data by developing Pig scripts and loaded the data into HBase tables.
- Designed the HBase schema to avoid hot-spotting and exposed data from HBase tables to the UI through a REST API.
- Helped market analysts by creating Hive queries to spot the emerging trends by comparing fresh data with HDFS reference tables and historical metrics.
- Worked on creating Kafka topics and partitions and writing custom partitioner classes.
- Used Kafka to store events from various systems and processed them with Spark Streaming to perform near-real-time analytics (see the sketch after this section).
- Defined job flows in Oozie to automate data loading into HDFS and Pig processing.
- Involved in creating POCs to ingest and process streaming data using Spark streaming and Kafka.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and reduce-side joins in MapReduce.
- Worked on Talend ETL to load data from various sources into Oracle DB; used tMap, tReplicate, tFilterRow, tSort, and various other Talend features.
- Involved in verifying cleansed data with other departments using Talend.
Environment: HDFS, MapReduce, Pig, Hive, HBase, Spark, Scala, Sqoop, Oozie, Kafka, Talend, Linux, Java, Maven, GIT, Jenkins.
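The Kafka plus Spark Streaming work above followed the direct-stream pattern sketched below (spark-streaming-kafka-0-10 API). The broker address, topic name, batch interval, and per-batch count are illustrative stand-ins for the real near-real-time analytics.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaEventStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaEventStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // Consumer settings; broker, group id, and topic name are placeholders.
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("group.id", "event-analytics");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("events"), kafkaParams));

        // Count records per micro-batch as a stand-in for the real analytics logic.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```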
Confidential - Naples, FL
Hadoop Developer
Responsibilities:
- Involved in developing Hive scripts to parse raw data, populate staging tables, and store the refined data in partitioned Hive tables.
- Developed Hive UDFs (User Defined Functions) in Python and Java where the required functionality was too complex for built-in functions.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Wrote scripts to generate Flume config files for different log types and environments.
- Designed and implemented a Cassandra NoSQL based database and associated RESTful web service that persists high-volume user profile data for vertical teams.
- Performed an assessment of the Cassandra clusters for performance improvements and to eliminate existing timeouts.
- Pre-processed large sets of structured and semi-structured data in different formats, such as text files, Avro, SequenceFile, and JSON records.
- Worked on POCs and involved in fixing performance & storage issues using different Compression Techniques, Custom Combiners and Custom Partitioners.
- Used MRUnit for testing MapReduce jobs on raw data and executed performance scripts (see the sketch after this section).
- Involved in automation of Hadoop jobs using Oozie workflows and coordinators.
- Assisted with data capacity planning and node forecasting.
- Analyzed migration plans for various versions of the Cloudera Distribution of Apache Hadoop to draw up the impact analysis and proposed a mitigation plan.
Environment: Hive, Flume, Oozie, Cassandra, MRUnit, Java, Python, UNIX, Cloudera, Maven, GIT.
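The MRUnit testing noted above follows the pattern in this minimal sketch; the mapper and its comma-separated input are hypothetical and exist only to show the MapDriver withInput / withOutput / runTest flow.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class EventMapperTest {

    // Trivial mapper used only to illustrate the test harness: emits (event type, 1) per CSV line.
    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            context.write(new Text(fields[0]), new IntWritable(1));
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new EventMapper());
    }

    @Test
    public void emitsOneCountPerRecord() throws IOException {
        mapDriver.withInput(new LongWritable(0), new Text("click,2016-01-01"))
                 .withOutput(new Text("click"), new IntWritable(1))
                 .runTest();
    }
}
```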
Confidential
Java Developer
Responsibilities:
- Involved in requirements analysis, design, and development of the application using Java technologies.
- Developed the login screen so that the application can be accessed only by authorized and authenticated administrators.
- Used HTML, CSS, and JSPs to design and develop the front end, and JavaScript to perform user validation.
- Designed, developed, and configured server-side J2EE components such as EJBs, JavaBeans, and Servlets.
- Involved in creating tables, functions, triggers, sequences, and stored procedures in PL/SQL.
- Implemented business logic by developing Session Beans (see the sketch after this section).
- Used Hibernate as the ORM and PL/SQL for handling database processing.
- Used JDBC-API to communicate with the Database.
- Developed the application following the Waterfall software development methodology.
- Involved in technical documentation of the project.
Environment: Java, HTML, CSS, JSP, Servlets, EJB, JDBC, Hibernate, PL/SQL, Oracle 8i.
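The Session Bean business logic mentioned above would look roughly like the sketch below, written in EJB 3 annotation style with plain JDBC for brevity (the deployment descriptors used in that era are omitted); the bean, data source name, table, and columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.sql.DataSource;

// Hypothetical login-check bean; the data source, table, and columns are illustrative only.
@Stateless
public class AdminLoginBean {

    @Resource(name = "jdbc/appDS")   // container-managed DataSource (name is a placeholder)
    private DataSource dataSource;

    /** Returns true when the administrator credentials match a stored account. */
    public boolean authenticate(String username, String passwordHash) {
        String sql = "SELECT COUNT(*) FROM admin_users WHERE username = ? AND password_hash = ?";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, username);
            ps.setString(2, passwordHash);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && rs.getInt(1) > 0;
            }
        } catch (SQLException e) {
            throw new IllegalStateException("Login lookup failed", e);
        }
    }
}
```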
Confidential
Associate Java Developer
Responsibilities:
- Used the Struts MVC architecture to develop the modules.
- Developed the UI using JavaScript, JSP, HTML, and CSS for interactive cross-browser functionality and complex user interfaces.
- Used Servlets and Session Beans for creating business logic and deployed them on the WebLogic server.
- Designed the application using the Struts MVC framework.
- Developed complex SQL Queries and PL/SQL Stored procedures.
- Prepared the Functional, Design and Test case specifications.
- Performed database-side validations by writing stored procedures in Oracle.
- Used JUnit for unit testing of the application (see the sketch after this section).
- Provided technical support for production environments and resolved issues.
Environment: Java, HTML, JavaScript, CSS, Oracle, JDBC, Swing, and Eclipse.
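The JUnit unit testing referenced above followed the structure in this minimal sketch; the validator class and its rules are hypothetical and only illustrate the JUnit 4 test layout.

```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

// Hypothetical validation helper declared inline so the test is self-contained.
public class OrderValidatorTest {

    static class OrderValidator {
        /** Returns true when the input is a positive whole number. */
        boolean isValidQuantity(String input) {
            try {
                return Integer.parseInt(input) > 0;
            } catch (NumberFormatException e) {
                return false;
            }
        }
    }

    private final OrderValidator validator = new OrderValidator();

    @Test
    public void acceptsPositiveQuantity() {
        assertTrue(validator.isValidQuantity("3"));
    }

    @Test
    public void rejectsNonNumericInput() {
        assertFalse(validator.isValidQuantity("abc"));
    }
}
```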