
Sr. Hadoop Developer Resume


Washington, DC

SUMMARY

  • 6+ years of professional IT experience in Big Data using Hadoop, with experience building distributed applications and high-quality software through object-oriented methods, project leadership, and rapid, reliable development.
  • Good knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
  • Solid background in DBMS technologies such as Oracle and MySQL and in data warehousing architectures; performed migrations from SQL Server, Oracle, and MySQL databases to Hadoop.
  • Extensive knowledge of J2EE technologies, including object-oriented programming (OOP) techniques, JSP, and JDBC.
  • In-depth knowledge of analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Good understanding of Zookeeper for monitoring and managing Hadoop jobs.
  • Extensive knowledge of RDBMSs such as Oracle, Microsoft SQL Server, and MySQL.
  • Experienced in migrating ETL transformations to Pig Latin scripts, including transformations and join operations.
  • Hands-on experience with different software development approaches, including Spiral, Waterfall, and Agile iterative models.
  • Extended Pig and Hive core functionality by writing custom User Defined Functions for data analysis and file processing, invoked from Pig Latin scripts.
  • Good understanding of NoSQL databases such as HBase, Cassandra and MongoDB.
  • Experience with operating systems: Linux, RedHat, and UNIX.
  • Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts for data processing in Java.
  • Experience developing Spark jobs in Scala in a test environment for faster data processing, using Spark SQL for querying.
  • Proficient in configuring Zookeeper, Cassandra, and Flume on an existing Hadoop cluster.
  • Expertise in web technologies including HTML, XML, JDBC, JSP, JavaScript, AJAX, and SOAP.
  • Extensive experience with IDEs such as Eclipse and NetBeans.
  • Extensive experience using MVC architecture, Struts, and Hibernate to develop web applications with Java, JSPs, JavaScript, HTML, jQuery, AJAX, XML, and JSON.
  • Excellent Java development skills using J2EE, Spring, J2SE, Servlets, JUnit, MRUnit, JSP, and JDBC.
  • Excellent global exposure to various work cultures and client interaction with diverse teams.

TECHNICAL SKILLS

Programming Languages: Java, J2EE, C, SQL, PL/SQL, Pig Latin, Scala, HTML, XML.

Web Technologies: JDBC, JSP, JavaScript, AJAX, SOAP, HTML, jQuery, CSS, XML.

Hadoop/Big Data: MapReduce, Spark, Spark SQL, PySpark, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, Zookeeper, Elasticsearch.

Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.

Cloud: Azure, AWS

NoSQL: MongoDB, HBase, Apache Cassandra.

Tools/IDEs: NetBeans, Eclipse, Git, PuTTY.

Operating Systems: Linux, Windows, Ubuntu, Red Hat Linux, UNIX.

Methodologies: Agile, Waterfall model.

PROFESSIONAL EXPERIENCE

Sr. Hadoop Developer

Confidential, Washington DC

Responsibilities:

  • Gathered user requirements and designed technical and functional specifications.
  • Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Zookeeper, and Sqoop.
  • Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs) written in Python (see the UDF sketch after this list).
  • Imported and exported the analyzed data to and from relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop.
  • Used Flume to handle streaming data and load it into the Hadoop cluster.
  • Developed and executed Hive queries to de-normalize the data.
  • Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
  • Executed Hive queries through the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting Impala in the project.
  • Worked on a cluster of 130 nodes.
  • Designed an Apache Airflow entity-resolution module for data ingestion into Microsoft SQL Server.
  • Developed a batch-processing pipeline using Python and Airflow, and scheduled Spark jobs with Airflow.
  • Wrote, tested, and ran MapReduce pipelines using Apache Crunch (a pipeline sketch also follows this list).
  • Managed and reviewed Hadoop log files, analyzed SQL scripts, and designed a Spark-based solution for the process.
  • Created reports in Tableau to visualize the data sets, and tested native Drill, Impala, and Spark connectors.
  • Developed Python scripts to find vulnerabilities in SQL queries through SQL injection tests, permission checks, and performance analysis.
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results back into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
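
A minimal sketch of the kind of Hive UDF described in the list above. The work itself is described as Python-based; this is an illustrative Java variant, with a hypothetical class name and normalization rule rather than the project's actual code.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical Hive UDF: normalizes a raw source-system code so joins line up.
    public final class NormalizeCode extends UDF {
        public Text evaluate(final Text raw) {
            if (raw == null) {
                return null;  // preserve NULLs instead of failing the query
            }
            // trim stray whitespace and upper-case the value
            return new Text(raw.toString().trim().toUpperCase());
        }
    }

Packaged into a jar, a function like this is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in function in HiveQL.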
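
Likewise, a short Apache Crunch sketch in the spirit of the MapReduce pipelines mentioned above; the input path, field layout, and filtering logic are placeholder assumptions.

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;
    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;
    import org.apache.crunch.types.writable.Writables;

    public class LogPipeline {
        public static void main(String[] args) {
            // A Crunch pipeline compiles down to one or more MapReduce jobs.
            Pipeline pipeline = new MRPipeline(LogPipeline.class);
            PCollection<String> lines = pipeline.readTextFile("/data/raw/logs");

            // Keep only the fields needed downstream (assumed tab-separated input).
            PCollection<String> slimmed = lines.parallelDo(new DoFn<String, String>() {
                @Override
                public void process(String line, Emitter<String> emitter) {
                    String[] fields = line.split("\t");
                    if (fields.length >= 3) {
                        emitter.emit(fields[0] + "\t" + fields[2]);
                    }
                }
            }, Writables.strings());

            pipeline.writeTextFile(slimmed, "/data/processed/logs");
            pipeline.done();  // triggers the underlying MapReduce job(s)
        }
    }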

Environment: Hadoop, YARN, HBase, Teradata, DB2, NoSQL, Kafka, Python, Zookeeper, Oozie, Tableau, Apache Crunch, Apache Storm, MySQL, SQL Server, jQuery, JavaScript, HTML, AJAX, and CSS.

Hadoop Developer

Confidential, Boca Raton, FL

Responsibilities:

  • Analyzed the weblog data using HiveQL and integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (Java programs and shell scripts).
  • Transferred purchase transaction details from legacy systems to HDFS.
  • Developed Java MapReduce programs to transform log data into a structured form for deriving user location, age group, and time spent (a mapper/reducer sketch follows this list).
  • Developed Pig UDFs to manipulate the data per business requirements and built custom Pig loaders (see the Pig UDF sketch after this list).
  • Collected and aggregated large amounts of weblog data from sources such as web servers and mobile and network devices using Apache Flume, and stored the data in HDFS for analysis.
  • Ingested files into HDFS from remote systems using MFT (Managed File Transfer).
  • Monitored and managed the Cassandra cluster.
  • Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Wrote MapReduce jobs to parse the weblogs stored in HDFS.
  • Developed services to run the MapReduce jobs on an as-needed basis.
  • Imported and exported data between an Oracle 10.2 database and HDFS using Sqoop.
  • Extracted data from the Cassandra NoSQL database through Sqoop and placed it in HDFS for processing.
  • Managed data coming from different sources.
  • Analyzed the data using Pig to extract the number of unique patients per day and the most purchased medicines.
  • Wrote UDFs for Hive and Pig that helped spot market trends.
  • Ran Hadoop Streaming jobs to process terabytes of XML-format data.
  • Analyzed the functional specifications.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
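
A minimal sketch of the weblog MapReduce work described above; the tab-separated field layout and the hits-per-location metric are illustrative assumptions, not the actual log format.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WeblogHits {

        // Assumed layout: the third tab-separated field carries the user's location.
        public static class LocationMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text location = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                if (fields.length > 2) {
                    location.set(fields[2]);
                    context.write(location, ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int total = 0;
                for (IntWritable v : values) {
                    total += v.get();
                }
                context.write(key, new IntWritable(total));  // hits per location
            }
        }
    }

A standard Hadoop Job driver wires the mapper and reducer together and points them at the weblog input path in HDFS.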
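
And a hedged sketch of a Pig UDF of the kind mentioned above; the age-group bucketing stands in for the real business rules.

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Hypothetical Pig UDF: maps a numeric age to a coarse age-group label.
    public class AgeGroup extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            int age = Integer.parseInt(input.get(0).toString());
            if (age < 18) return "minor";
            if (age < 35) return "18-34";
            if (age < 55) return "35-54";
            return "55+";
        }
    }

In the Pig script the jar is loaded with REGISTER and the function is invoked inside a FOREACH ... GENERATE statement.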

Environment: Hadoop, HDFS, Pig, Hive, Tez, Accumulo, Flume, Sqoop, Oozie, Cassandra.

Hadoop Developer

Confidential, Princeton, NJ

Responsibilities:

  • Analyzed the weblog data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website (a Hive-over-JDBC sketch follows this list).
  • Developed optimal strategies for distributing the weblog data over the cluster, and imported and exported the stored weblog data into HDFS and Hive using Sqoop.
  • Loaded data from the UNIX file system and Teradata into HDFS.
  • Installed, configured, and operated the Apache stack (Hive, HBase, Pig, Sqoop, Zookeeper, Oozie, Flume, and Mahout) on the Hadoop cluster.
  • Created a high-level design for the data ingestion and data extraction module, and enhanced a Hadoop MapReduce job that joins the incoming slices of data and picks only the fields needed for further processing.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Wrote optimized Pig scripts and developed and tested Pig Latin scripts.
  • Tested raw data and executed performance scripts.
  • Worked with the HBase NoSQL database to create tables and store data (see the HBase client sketch after this list).
  • Developed industry-specific UDFs (user defined functions).
  • Used Flume to collect, aggregate, and store the weblog data from various sources such as web servers and mobile and network devices, and pushed it to HDFS.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (Java programs and shell scripts).
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Resolved issues, answered questions, and provided day-to-day support to users and clients for Hadoop and its ecosystem, including the HAWQ database.
  • Installed, configured, and operated data integration and analytics tools (Informatica, Chorus, SQLFire, GemFire XD) for business needs.
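
A hedged sketch of the unique-visitors-per-day analysis above, expressed as a HiveQL query run over JDBC against HiveServer2; the host, table, and column names are assumptions.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class UniqueVisitors {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Assumed HiveServer2 endpoint and weblog table.
            try (Connection conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT log_date, COUNT(DISTINCT visitor_id) AS unique_visitors "
                         + "FROM weblogs GROUP BY log_date")) {
                while (rs.next()) {
                    System.out.println(rs.getString("log_date") + "\t" + rs.getLong("unique_visitors"));
                }
            }
        }
    }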
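
And a minimal sketch of the HBase table work mentioned above, using the standard HBase client API; the table name, column family, and row-key scheme are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WeblogStore {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml from the classpath
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("weblogs"))) {

                // A row key of visitor id + timestamp keeps one visitor's hits together.
                Put put = new Put(Bytes.toBytes("visitor123_20160101T120000"));
                put.addColumn(Bytes.toBytes("hit"), Bytes.toBytes("page"), Bytes.toBytes("/checkout"));
                put.addColumn(Bytes.toBytes("hit"), Bytes.toBytes("duration"), Bytes.toBytes("42"));
                table.put(put);
            }
        }
    }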

Environment: Cloudera, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, Zookeeper, Python, flat files, AWS, Teradata, UNIX/Linux.

Java Developer

Confidential

Responsibilities:

  • Designed and developed servlets and session and entity beans to implement business logic, and deployed them on the JBoss application server.
  • Developed numerous JSP pages and used JavaScript for client-side validation.
  • Used an MVC framework to develop the J2EE-based web application.
  • Involved in all phases of the SDLC, including requirements collection, design and analysis of customer specifications, and development and customization of the application.
  • Developed the user-interface screens using AJAX, JSP, and HTML.
  • Used JDBC to retrieve data from the database for various inquiries.
  • Wrote stored procedures and used JDBC for database interaction (a JDBC sketch follows this list).
  • Developed stored procedures in PL/SQL for Oracle 10g.
  • Used Eclipse as the IDE to write and debug application code, and SQL Developer to test and run SQL statements.
  • Implemented client-side and server-side data validations using JavaScript.
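
A minimal JDBC sketch of the stored-procedure usage described above; the connection URL, procedure name, and parameters are hypothetical.

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Types;

    public class InquiryDao {
        public static String lookupStatus(String accountId) throws Exception {
            // Assumed Oracle connection; the application obtained connections from a pooled DataSource.
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret");
                 CallableStatement call = conn.prepareCall("{call get_account_status(?, ?)}")) {
                call.setString(1, accountId);                 // IN parameter
                call.registerOutParameter(2, Types.VARCHAR);  // OUT parameter returned by the PL/SQL procedure
                call.execute();
                return call.getString(2);
            }
        }
    }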

Environment: Java, Eclipse Galileo, HTML 4.0, JavaScript, SQL, PL/SQL, CSS, JDBC, JBoss 4.0, Servlets 2.0, JSP 1.0, Oracle.
