Jr. Hadoop Developer Resume

VA

SUMMARY:

  • 3 years of professional experience in project development, implementation, deployment and maintenance using Java/J2EE and Big Data-related technologies.
  • Hadoop Developer with 2+ years of experience designing and implementing complete end-to-end Hadoop-based data analytics solutions using HDFS, HBase-Hive integration, Spark, YARN, Pig, Hive, Sqoop, HBase, Impala, R, Oozie, etc.
  • Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real time streaming at an enterprise level.
  • Profound experience in creating real time data streaming solutions using Apache Spark/Spark Streaming, Kafka.
  • Worked on Oozie to manage and schedule jobs on the Hadoop cluster.
  • Experience in shell and Python scripting.
  • Experience in administration and management of large-scale Hadoop production clusters.
  • Hands-on experience with full life-cycle implementations using the CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
  • Hands on experience in application development using Java, RDBMS and Linux Shell Scripting.
  • Experience in analyzing data using HiveQL and custom MapReduce programs in Java.
  • In-depth understanding of Hadoop architecture and its components such as the ResourceManager, ApplicationMaster, NameNode and DataNode, as well as HBase design principles; documented these designs in Confluence pages integrated with JIRA projects.
  • Strong knowledge of distributed-systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
  • Profound understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the sketch after this summary).
  • Experience migrating data between relational databases, unstructured sources and HDFS using Sqoop.
  • Extensive knowledge in programming with Resilient Distributed Datasets (RDDs).
  • Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.
  • Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
  • Experience writing Shell scripts in Linux OS and integrating them with other solutions.
  • Strong Experience in working with Databases like Oracle 11g/10g/9i, DB2, SQL Server 2008 and MySQL and proficiency in writing complex SQL queries.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Excellent communication, interpersonal and analytical skills; a highly motivated team player with the ability to work independently.
  • Ability to learn and adapt quickly to the emerging new technologies and paradigms.
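
The following is a minimal sketch of the partitioned and bucketed table design referred to above. It assumes Spark's HiveContext as the entry point; the table, column and path names (orders, customer_id, /data/warehouse/orders) are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveTableSetup {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hive-table-setup"))
        val hc = new HiveContext(sc)

        // External table over raw files in HDFS, partitioned by load date so that
        // date-filtered queries scan only the matching directories, and bucketed
        // by customer_id so joins and sampling on that key stay local.
        hc.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS orders (
            |  order_id    BIGINT,
            |  customer_id BIGINT,
            |  amount      DOUBLE
            |)
            |PARTITIONED BY (load_date STRING)
            |CLUSTERED BY (customer_id) INTO 32 BUCKETS
            |STORED AS ORC
            |LOCATION '/data/warehouse/orders'""".stripMargin)

        // Partition pruning: only the load_date='2016-01-01' directory is read.
        hc.sql("SELECT COUNT(*) FROM orders WHERE load_date = '2016-01-01'").show()
      }
    }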

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Spark, Yarn, Kafka, PIG, HIVE, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, Zookeeper.

Programming Languages: Java, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL

Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, CSS, HTML, XHTML, XML, AngularJS, AJAX, JavaScript, jQuery.

Development Tools: Eclipse, NetBeans, SVN, Git, Maven

Databases: Oracle 11g/10g/9i, Microsoft Access, MS SQL

NoSQL Databases: Apache Cassandra, MongoDB, HBase

Frameworks: Struts, Spring MVC, Hibernate.

Web/Application servers: WebLogic, WebSphere, Apache Tomcat

Distributed platforms: Hortonworks, Cloudera.

Operating Systems: UNIX, Ubuntu Linux and Windows 2000/XP/Vista/7/8

Network protocols: TCP/IP fundamentals, LAN and WAN

PROFESSIONAL EXPERIENCE:

Confidential

Jr. Hadoop Developer

Responsibilities:

  • Worked on requirement gathering and analysis, and translated business requirements into technical designs within the Hadoop ecosystem.
  • Worked collaboratively to manage build-outs of large data clusters and real-time streaming with Spark.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for large volumes of data.
  • Developed various complex Hive queries as per business logic.
  • Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for Hive queries.
  • Created big data workflows to ingest data from various sources into Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Hive, Sqoop and Python scripts.
  • Created custom Python/shell scripts to import data via Sqoop from various SQL databases such as Teradata, SQL Server and Oracle.
  • Maintained Bitbucket repositories for the DevOps environment (automation code and configuration).
  • Created technical documentation for the integration between Eclipse and Bitbucket, and between SourceTree and Bitbucket.
  • Created JIRA projects integrating workflows, screen schemes, field configuration schemes, permission schemes, project roles, and notification schemes.
  • Built Spark pipelines and workflows using Scala.
  • Experienced in Agile processes and delivered quality solutions in regular sprints.
  • Responsible for handling streaming data from web server console logs.
  • Designed docs and specs for the near real-time data analytics using Hadoop and HBase.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Used Java to develop RESTful APIs for a database utility project. Responsible for performing extensive data validation using Hive.
  • Optimized and tuned existing ETL scripts (SQL and PL/SQL).
  • Moved data from HDFS to RDBMS and vice versa using Sqoop.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala (see the first sketch after this list).
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (see the second sketch after this list).
  • Work with cross functional consulting teams within the data science and analytics team to design, develop, and execute solutions to derive business insights and solve clients' operational and strategic problems.
  • Monitored the ticketing tool for tickets indicating reported issues/incidents and resolved them with the appropriate fixes in the project.
  • Developed a fully automated continuous integration system using Git, Bitbucket, MySQL and custom tools developed in Python and Bash
  • Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets.
  • Used CloudWatch Logs to move app logs to S3 and created alarms based on exceptions raised by applications.
  • Experienced in writing HIVE JOIN Queries.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
  • Continuously monitored and managed the Hadoop/Spark cluster using Cloudera Manager.
  • Experience in developing custom UDFs for Hive.
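
First sketch: the kind of Hive/SQL-to-Spark conversion described above, with a simple GROUP BY aggregation re-expressed as RDD transformations. The file layout, paths and column positions are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}

    object DailyRevenue {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("daily-revenue"))

        // Equivalent of the Hive query:
        //   SELECT order_date, SUM(amount) FROM orders GROUP BY order_date
        // expressed as RDD transformations over comma-delimited order records.
        val totals = sc.textFile("hdfs:///data/warehouse/orders")   // hypothetical input path
          .map(_.split(","))
          .filter(_.length >= 3)                                    // drop malformed rows
          .map(cols => (cols(1), cols(2).toDouble))                 // (order_date, amount)
          .reduceByKey(_ + _)

        totals.saveAsTextFile("hdfs:///data/reports/daily_revenue") // hypothetical output path
      }
    }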
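
Second sketch: a Spark Streaming job of the shape described above, reading from Kafka and landing each micro-batch in HDFS. It assumes the Spark 1.6 / Kafka 0.8 direct-stream API from the environment listed below; the broker addresses, topic name and output path are hypothetical.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object ClickstreamIngest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("clickstream-ingest")
        val ssc  = new StreamingContext(conf, Seconds(60))          // one micro-batch per minute

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092") // hypothetical brokers
        val topics      = Set("weblogs")                                             // hypothetical topic

        // Receiver-less (direct) stream; each record's value is a raw log line.
        val lines = KafkaUtils
          .createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
          .map(_._2)

        // Persist each non-empty micro-batch to a time-stamped HDFS directory
        // for downstream Hive and Spark batch jobs.
        lines.foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) {
            rdd.saveAsTextFile(s"hdfs:///data/raw/weblogs/batch_${time.milliseconds}")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }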

Environment: Java, Scala 2.10.5, Apache Spark 1.6.0, CDH 5.8.2, Spring 3.0.4, Hive, HDFS, YARN, MapReduce, Sqoop 1.4.3, Flume, UNIX Shell Scripting, Python 2.6, Azure, AWS, Kafka, Bitbucket, Jira, Oracle 11g, Sha, HBase, MongoDB.

Confidential, VA

Hadoop Intern

Responsibilities:

  • Provided a solution using Hive and Sqoop (to export/import data) for faster data loads, replacing the traditional ETL process with HDFS-based loading into target tables.
  • Maintained and monitored the Hive data warehouse: created tables, managed data distribution by implementing partitioning and bucketing, and wrote and optimized HiveQL queries.
  • Designed Pig Latin scripts to sort, group, join and filter the data as part of data transformation, per the business requirements.
  • Merged data files and loaded them into HDFS using Java code; maintained the merge-history tracking in HBase.
  • Created Hive tables and worked on them using HiveQL.
  • Wrote Apache Pig scripts to process HDFS data.
  • Created Java UDFs for Pig and Hive (see the sketch after this list).
  • Involved in analysis of the specifications from the client and actively participated in SRS Documentation.
  • Developed scripts and scheduled AutoSys jobs to filter the data.
  • Monitored AutoSys file-watcher jobs, tested data for each transaction and verified whether jobs ran properly.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Implemented Object Relational mapping in the persistence layer using Hibernate Framework in conjunction with Spring Functionality.
  • Involved in planning process of iterations under the Agile Scrum methodology.
  • Involved in writing PL/SQL, SQL queries.
  • Involved in testing the Business Logic layer and Data Access layer using JUnit.
  • Used Oracle DB for writing SQL scripts, PL/SQL code for procedures and functions.
  • Wrote JUnit test cases to test the functionality of each method in the DAO layer. Configured and deployed the WebSphere application Server.
  • Prepared technical reports and documentation manuals for efficient program development.
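
The sketch below shows the general shape of the Hive UDFs mentioned above (the originals were written in Java; it is rendered in Scala here for consistency with the other sketches). The function name and cleansing rule are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical cleansing UDF: trims and lower-cases a string column.
    // Registered in Hive with, for example:
    //   ADD JAR clean-text-udf.jar;
    //   CREATE TEMPORARY FUNCTION clean_text AS 'CleanText';
    class CleanText extends UDF {
      def evaluate(input: Text): Text =
        if (input == null) null else new Text(input.toString.trim.toLowerCase)
    }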

Environment: Java, HDP 2.2 YARN cluster, HDFS, MapReduce, Apache Hive, Apache Pig, HBase, Sqoop, XML, Oracle 8i, UNIX.

Confidential

Software Trainee

Responsibilities:

  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design and development.
  • Analyzed large datasets to provide strategic direction to the company.
  • Collected the logs from the physical machines and integrated into HDFS using Flume.
  • Involved in system and business analysis.
  • Developed SQL statements to improve back-end communications.
  • Loaded unstructured data into the Hadoop Distributed File System (HDFS).
  • Created reports and dashboards using structured and unstructured data.
  • Involved in importing data from MySQL to HDFS using SQOOP.
  • Involved in writing Hive queries to load and process data in HDFS.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Worked with Impala for data retrieval.
  • Performed sentiment analysis on reviews of the products on the client's website.
  • Developed custom MapReduce programs to extract the required data from the logs (see the sketch after this list).
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
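
The sketch below illustrates the kind of custom MapReduce program described above: a job that pulls error records out of raw log files and counts them per error code. The log layout, field positions and paths are hypothetical, and the code is written in Scala against the standard Hadoop MapReduce API.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Mapper: keep only ERROR lines and emit (errorCode, 1).
    class ErrorMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
      private val one = new IntWritable(1)
      override def map(key: LongWritable, value: Text,
                       ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
        val line = value.toString
        if (line.contains("ERROR")) {
          val code = line.split("\\s+").lift(2).getOrElse("UNKNOWN") // hypothetical log layout
          ctx.write(new Text(code), one)
        }
      }
    }

    // Reducer: sum occurrences per error code.
    class ErrorReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                          ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
        var sum = 0
        val it = values.iterator()
        while (it.hasNext) sum += it.next().get()
        ctx.write(key, new IntWritable(sum))
      }
    }

    object LogErrorCount {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "log-error-count")
        job.setJarByClass(classOf[ErrorMapper])
        job.setMapperClass(classOf[ErrorMapper])
        job.setReducerClass(classOf[ErrorReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))    // e.g. raw log directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args(1)))  // e.g. report output directory
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }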
