
Hadoop Developer Resume


Phoenix, AZ

SUMMARY:

  • 8+ years of software development experience, including 5 years on Big Data technologies such as Hadoop, Pig, Sqoop, Hive, HBase, Flume and Spark.
  • Excellent knowledge of Hadoop architecture and ecosystem, including MRv1 and MRv2, HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and the MapReduce programming paradigm.
  • Experience in retrieving data from databases such as MySQL, Teradata, DB2 and Oracle into HDFS using Sqoop and ingesting it into HBase and Cassandra.
  • Worked on NoSQL databases such as HBase, Cassandra and MongoDB and their integration with the Hadoop cluster.
  • Experience in importing and exporting data with Sqoop between HDFS and relational databases.
  • Expertise in working with Hive, using partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Wrote custom UDFs in Hive and Pig to meet specific business requirements (a minimal UDF sketch appears after this list).
  • Experience with join patterns; implemented map-side and reduce-side joins.
  • Involved in converting Hive/SQL queries into Spark transformations using Python and Scala.
  • Good experience integrating Kafka with Spark Streaming for high throughput and reliability.
  • Experience with Apache Flume for collecting, aggregating and moving large volumes of data from sources such as web servers.
  • Implemented a real-time framework to capture streaming data and store it in HDFS using Kafka and Spark (a streaming sketch also appears after this list).
  • Developed Kafka consumer components for near-real-time and real-time data processing in Java and Scala.
  • Deep understanding of performance tuning and partitioning for optimizing Spark applications.
  • Involved in creating, transforming and running actions on RDDs, DataFrames and Datasets using Scala and Python, and integrating the applications with the Spark framework using the SBT and Maven build automation tools.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Experience in job workflow scheduling and monitoring tools such as Oozie.
  • Used NiFi to design workflows graphically, creating DAGs in NiFi and using them for further debugging.
  • Experience in configuring ZooKeeper to coordinate the servers in a cluster and maintain data consistency.
  • Hands-on experience handling different file formats such as sequence files, CSV, XML, JSON, Avro and Parquet.
  • Strong experience in RDBMS technologies such as SQL, stored procedures, triggers and functions.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Experience in data modeling, connecting to Cassandra from Spark, and saving summarized DataFrames to Cassandra.
  • Implemented ETL processes to load data from different sources, perform data mining and analyze data with visualization/reporting tools to leverage the performance of OpenStack.
  • Experience in working with scripting technologies such as Python and UNIX shell scripts.
  • Expertise in the design and development of web and enterprise applications using technologies such as JSP, Servlets, Struts, Hibernate, Spring MVC, JDBC, Spring Boot, JMS, JSF, XML, AJAX, SOAP and RESTful web services.
  • Strong understanding of Software Development Lifecycle (SDLC) and various methodologies like Waterfall and Agile.
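
A minimal sketch of a custom Hive UDF of the kind referenced above. The class name, masking logic and column semantics are hypothetical placeholders, not actual production code.

    // Hypothetical Hive UDF: masks all but the last four characters of a value.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class MaskValueUDF extends UDF {
        // Hive calls evaluate() once per row for the column the function is applied to.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String value = input.toString();
            if (value.length() <= 4) {
                return input;
            }
            StringBuilder masked = new StringBuilder();
            for (int i = 0; i < value.length() - 4; i++) {
                masked.append('*');
            }
            masked.append(value.substring(value.length() - 4));
            return new Text(masked.toString());
        }
    }

Such a UDF is typically packaged as a JAR, registered with ADD JAR and CREATE TEMPORARY FUNCTION, and then invoked from HiveQL like any built-in function.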
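
A minimal sketch of the Kafka-to-HDFS streaming pattern mentioned above, written against the Spark Structured Streaming Kafka source (assumed to be on the classpath); the broker address, topic name and paths are placeholder assumptions, and the actual framework may equally have used the DStream API.

    // Hypothetical sketch: read a Kafka topic with Spark Structured Streaming and
    // persist the raw events to HDFS as Parquet.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    public class KafkaToHdfs {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("kafka-to-hdfs")
                    .getOrCreate();

            // Each Kafka record arrives as key/value plus topic, partition, offset, timestamp.
            Dataset<Row> events = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder broker
                    .option("subscribe", "clickstream")                  // placeholder topic
                    .load()
                    .selectExpr("CAST(value AS STRING) AS event", "timestamp");

            // Append each micro-batch to HDFS as Parquet; the checkpoint directory lets the
            // query recover its Kafka offsets after a restart.
            StreamingQuery query = events.writeStream()
                    .format("parquet")
                    .option("path", "hdfs:///data/clickstream/raw")
                    .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
                    .start();

            query.awaitTermination();
        }
    }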

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, Zookeeper, Kafka, Apache Spark (Spark SQL, Spark Streaming)

Hadoop Distributions: Cloudera, Hortonworks, AWS EMR

Languages: Java, PL/SQL, Python, Pig Latin, HiveQL, Scala

IDE Tools: Eclipse, NetBeans, IntelliJ, RAD.

Web Technologies: HTML, CSS, JavaScript, XML, RESTful web services.

Operating Systems: Windows, UNIX, LINUX, Ubuntu, CentOS.

Reporting Tools/ETL Tools: Tableau, Power View for Microsoft Excel.

Methodologies: Waterfall, Agile, TDD.

Databases: Oracle, SQL Server, MySQL, MS Access, PostgreSQL, NoSQL databases (HBase, Cassandra, MongoDB).

Build Automation tools: Ant, Maven, Gradle.

PROFESSIONAL EXPERIENCE:

Confidential, Phoenix, AZ.

Hadoop Developer

Responsibilities:

  • Responsible for building a customer-centric data lake in Hadoop to serve as the analysis and data science platform.
  • Responsible for building scalable distributed data solutions on the Cloudera distribution of Hadoop.
  • Used Sqoop and Kafka for data migration and incremental imports into HDFS and Hive from various other data sources.
  • Modeled and built Hive tables to combine and store structured and unstructured data for the best possible access.
  • Integrated the Cassandra file system with Hadoop using MapReduce to perform analytics on Cassandra data.
  • Used Cassandra to store billions of records to enable faster and more efficient querying, aggregation and reporting.
  • Implemented real-time analytics on Cassandra data using the Thrift API.
  • Developed Spark jobs using the Scala and Python (PySpark) APIs.
  • Migrated SAS and Python programs to Spark jobs for various processes.
  • Involved in job management and developed job-processing scripts using Oozie workflows.
  • Implemented optimization techniques in Hive such as table partitioning, data de-normalization and bucketing.
  • Used Spark SQL to create structured data with DataFrames, querying Hive and other data sources (a Spark SQL sketch appears after this list).
  • Supported data scientists with data and platform setup for their analysis and migrated their finished products to production.
  • Worked on cleansing and extracting meaningful information from clickstream data using Spark and Hive.
  • Involved in performance tuning of Spark applications, setting the right level of parallelism and tuning memory.
  • Used Spark optimization techniques such as data serialization and broadcasting.
  • Optimized existing Hadoop algorithms using Spark, Spark SQL and DataFrames.
  • Implemented a POC for persisting clickstream data with Apache Kafka.
  • Implemented data pipelines to move processed data from Hadoop to MPP, RDBMS and NoSQL databases.
  • Followed Agile and Scrum principles in developing the project.
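
A minimal sketch of the Spark SQL usage and broadcast-join optimization described above, assuming hypothetical warehouse tables; it illustrates the pattern rather than the project's actual job.

    // Hypothetical sketch: query Hive through Spark SQL and broadcast a small
    // dimension table so the join avoids shuffling the large fact table.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    import static org.apache.spark.sql.functions.broadcast;

    public class HiveSparkSqlJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("hive-spark-sql")
                    .enableHiveSupport()   // lets spark.sql() and spark.table() see the Hive metastore
                    .getOrCreate();

            // Large fact table read straight from Hive (placeholder schema).
            Dataset<Row> transactions = spark.sql(
                    "SELECT account_id, amount, txn_date FROM warehouse.transactions");

            // Small lookup table; broadcasting it keeps the join map-side.
            Dataset<Row> accounts = spark.table("warehouse.accounts");

            Dataset<Row> enriched = transactions.join(
                    broadcast(accounts),
                    transactions.col("account_id").equalTo(accounts.col("account_id")));

            enriched.write().mode("overwrite").saveAsTable("warehouse.enriched_transactions");
            spark.stop();
        }
    }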

Environment: Hadoop (2.6.5), HDFS, Spark (2.0.2), Spark SQL, Sqoop, Hive, Apache Kafka (0.10.1.0), Python, Scala (2.11), PySpark, Cassandra, Oozie, Cloudera (CDH 5).

Confidential, Jacksonville, Florida.

Hadoop Developer

Responsibilities:

  • Extensively involved in the design phase and delivered design documents.
  • Worked on Hortonworks HDP 2.3.
  • Worked with the business team to gather the requirements and participated in the Agile planning meetings to finalize the scope.
  • Imported and exported data into HDFS and Hive using Sqoop, and migrated huge amounts of data from different databases (e.g., Oracle, SQL Server) to Hadoop.
  • Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Loaded and transformed large sets of structured and semi-structured data; responsible for managing data coming from different sources.
  • Developed a data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis (a mapper sketch appears after this list).
  • Involved in creating Hive tables, loading data and writing Hive queries.
  • Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
  • Developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Involved in defining job flows; used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Managed and reviewed the Hadoop log files.
  • Developed complex Hive queries using joins and partitions for huge data sets per business requirements, loaded the filtered data from source to edge-node Hive tables, and validated the data.
  • Performed bucketing and partitioning of data using Apache Hive, which saved processing time and generated proper sample insights.
  • Moved all log/text files generated by various products into HDFS.
  • Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
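
A minimal sketch of the kind of MapReduce cleansing step used in the ingestion pipeline above; the field layout, delimiter and class name are hypothetical.

    // Hypothetical log-cleansing mapper: drop malformed lines and emit only the
    // fields needed downstream.
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LogCleanMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private final Text out = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            // Skip header rows and records that do not have the expected column count.
            if (fields.length < 5 || "timestamp".equals(fields[0])) {
                return;
            }
            // Keep only timestamp, customer id and event type, tab separated.
            out.set(fields[0] + "\t" + fields[1] + "\t" + fields[4]);
            context.write(out, NullWritable.get());
        }
    }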

Environment: Hortonworks HDP, Hadoop, Eclipse, Java, Sqoop, Pig, Oozie, Hive, Flume, MySQL, Oracle DB.

Confidential - Mobile, AL.

Hadoop Developer

Responsibilities:

  • Gathered business requirements in meetings for successful implementation and a POC (proof of concept) of the Hadoop cluster.
  • Imported data on a regular basis using Sqoop into Hive partitions and controlled the workflow using Apache Oozie.
  • Developed Sqoop jobs to both import data into HDFS from relational database management systems such as Oracle and DB2 and export data from HDFS to Oracle.
  • Developed HQL queries to implement select, insert, update and delete operations against the database by creating HQL named queries.
  • Involved in data extraction, including analyzing, reviewing and modeling based on requirements, using higher-level tools such as Hive and Impala.
  • Migrated HiveQL queries to Impala to minimize query response time.
  • Involved in creating Hive tables, loading them with data and writing Hive queries.
  • Developed Pig functions to preprocess the data for analysis.
  • Created HBase tables to store all data (a client write sketch appears after this list).
  • Deployed the HBase cluster in a cloud (AWS) environment with scalable nodes per the business requirement.
  • Analyzed identified defects and their root causes and recommended courses of action.
  • Loaded data into Hive tables from the Hadoop Distributed File System (HDFS) to provide SQL-like access to Hadoop data.
  • Worked on streaming the analyzed data to the existing relational databases using Sqoop to make it available for visualization and report generation by the BI team.
  • Generated reports and predictions using the BI tool Tableau; integrated data using Talend.
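
A minimal sketch of writing into one of the HBase tables mentioned above with the HBase Java client; the table name, column family, qualifiers and row key are placeholders.

    // Hypothetical sketch: put a single record into an HBase table.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseWriter {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("customer_events"))) {
                // Row key plus one column per attribute in the "d" (data) column family.
                Put put = new Put(Bytes.toBytes("customer-123"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event"), Bytes.toBytes("login"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("ts"), Bytes.toBytes("2016-04-01T10:00:00Z"));
                table.put(put);
            }
        }
    }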

Environment: HDFS, Hive, MapReduce, Sqoop, Impala, Java, Pig, SQL Server, HBase, Oracle and Tableau, AWS.

Confidential

Java Developer

Responsibilities:

  • Designed specifications for application development, covering the front end and back end, using design patterns.
  • Developed prototype test screens in HTML and JavaScript.
  • Involved in developing JSPs for client data presentation and client-side data validation within the forms.
  • Developed the application by using the Spring MVC framework.
  • Used the Collections framework to transfer objects between the different layers of the application.
  • Used Spring IoC to inject values for the dynamic parameters.
  • Actively involved in code review and bug fixing for improving the performance.
  • Documented application for its functionality and its enhanced features.
  • Created connections through JDBC and used JDBC statements to call stored procedures (see the sketch after this list).
  • Created UML diagrams like use cases, class diagrams, interaction diagrams, and activity diagrams.
  • Extensively worked on the user interface for a few modules using JSPs, JavaScript and Ajax.
  • Wrote complex SQL queries and stored procedures.
  • Developed the XML Schema and Web services for the data maintenance and structures.
  • Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for PostgreSQL database.
  • Wrote test cases in JUnit for unit testing of classes.
  • Involved in creating templates and screens in HTML and JavaScript.
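
A minimal sketch of calling a stored procedure over JDBC as described above; the connection URL, credentials and procedure signature are hypothetical.

    // Hypothetical sketch: invoke a stored procedure through a CallableStatement.
    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class CustomerDao {
        public void updateStatus(long customerId, String status) throws SQLException {
            String url = "jdbc:postgresql://dbhost:5432/appdb";   // placeholder URL
            try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
                 CallableStatement cs = conn.prepareCall("{call update_customer_status(?, ?)}")) {
                cs.setLong(1, customerId);
                cs.setString(2, status);
                cs.execute();
            }
        }
    }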

Environment: Java, Eclipse, Java SDK 1.6, XML, JavaScript, HTML/DHTML

Confidential

Jr. Java Developer

Responsibilities:

  • Involved in various phases of Software Development Life Cycle (SDLC) of the application like Requirement gathering, design, development and documentation.
  • Designed the application using J2EE design patterns and technologies based on the MVC architecture.
  • Involved in designing the user interfaces using HTML, CSS and JavaScript for client-side validation.
  • Developed custom tags and JSTL to support custom user interfaces.
  • Involved in writing unit tests covering positive and negative test cases.
  • Developed the Maven scripts for preparing WAR files used to deploy J2EE components.
  • Created tables, views, triggers, stored procedures on MySQL server for data manipulation and retrieval.
  • Used JDBC to invoke stored procedures and for connectivity to the database server.
  • Used Log4J to capture logs, including runtime exceptions (see the sketch after this list).
  • Involved in Bug fixing and functionality enhancements.
  • Developed the project using Agile methodology.
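
A minimal sketch of the Log4J usage described above, capturing a runtime exception with context; the class, method and message are placeholders.

    // Hypothetical sketch: log normal flow at INFO and failures at ERROR with the stack trace.
    import org.apache.log4j.Logger;

    public class OrderService {
        private static final Logger LOG = Logger.getLogger(OrderService.class);

        public void placeOrder(String orderId) {
            LOG.info("Placing order " + orderId);
            try {
                // ... business logic ...
            } catch (RuntimeException ex) {
                // Keep the exception as the second argument so the stack trace is logged.
                LOG.error("Failed to place order " + orderId, ex);
                throw ex;
            }
        }
    }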

Environment: J2EE, Java (SE 6), UNIX, Red Hat, Putty, MVC, JSP, JDBC, Eclipse IDE, Apache Tomcat (6.0.26), CSS, HTML, JavaScript, SQL Server (10.50).
