
Sr. Hadoop Developer Resume


SUMMARY:

  • Overall 5+ years of professional IT experience in Big Data using Hadoop, with experience building distributed applications and high-quality software through object-oriented methods, project leadership, and rapid, reliable development.
  • Good knowledge of Hadoop architecture and its components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
  • Solid background in DBMS technologies such as Oracle and MySQL and in data warehousing architectures; performed migrations from SQL Server, Oracle, and MySQL databases to Hadoop.
  • Extensive knowledge of J2EE technologies such as JSP and JDBC, along with object-oriented programming (OOP) techniques.
  • In-depth knowledge of analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Good understanding of ZooKeeper for monitoring and managing Hadoop jobs.
  • Extensive knowledge of RDBMS such as Oracle, Microsoft SQL Server, and MySQL.
  • Experienced in migrating ETL transformations to Pig Latin scripts, including transformations and join operations.
  • Hands-on experience with software development approaches such as Spiral, Waterfall, and Agile iterative models.
  • Extended Pig and Hive core functionality by writing custom User Defined Functions (UDFs) for data analysis and file processing, run through Pig Latin scripts.
  • Good understanding of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Experience with operating systems: Linux, Red Hat, and UNIX.
  • Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts for data processing in Java.
  • Experience developing Spark jobs using Scala in a test environment for faster data processing, and used Spark SQL for querying.
  • Proficient in configuring ZooKeeper, Cassandra, and Flume on an existing Hadoop cluster.
  • Expertise in web technologies including HTML, XML, JDBC, JSP, JavaScript, AJAX, and SOAP.
  • Extensive experience with IDEs such as Eclipse and NetBeans.
  • Extensive experience using MVC architecture, Struts, and Hibernate to develop web applications with Java, JSPs, JavaScript, HTML, jQuery, AJAX, XML, and JSON.
  • Excellent Java development skills using J2EE, J2SE, Spring, Servlets, JUnit, MRUnit, JSP, and JDBC.
  • Excellent global exposure to various work cultures and client interaction with diverse teams.

TECHNICAL SKILLS:

Programming Languages: Java, J2EE, C, SQL/PL-SQL, Pig Latin, Scala, HTML, XML.

Web Technologies: JDBC, JSP, JavaScript, AJAX, SOAP, HTML, jQuery, CSS, XML.

Hadoop/Big Data: MapReduce, Spark, Spark SQL, PySpark, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, ZooKeeper, Elasticsearch.

RDBMS: MySQL, PL/SQL.

Cloud: Azure, AWS

NoSQL: MongoDB, HBase, Apache Cassandra.

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Tools: NetBeans, Eclipse, Git, PuTTY.

Operating Systems: Linux, Windows, Ubuntu, Red Hat Linux, UNIX.

Methodologies: Agile, Waterfall.

PROFESSIONAL EXPERIENCE:

Sr. Hadoop Developer

Confidential

Responsibilities:

  • Defined Oozie job workflows according to their dependencies.
  • Enhanced and optimized the product's Spark code to aggregate and group data and run data mining tasks (see the PySpark sketch after this list).
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and data processing.
  • Automated daily import jobs using the Oozie framework.
  • Used ZooKeeper and Oozie operational services to coordinate the cluster and schedule workflows.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Worked across the complete software development life cycle (analysis, design, development, testing, implementation, and support) using Agile methodologies.
  • Regularly tuned existing Spark jobs on the Hadoop cluster to improve data processing and retrieval.
  • Used the MapReduce programming model to analyze data stored in HDFS.
  • Queried both managed and external Hive tables using Impala.
  • Processed data using MapReduce on YARN; worked on Kafka as a proof of concept for log processing.
  • Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
  • Wrote shell and Bash scripts on UNIX for application deployments to the production region.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
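A minimal sketch of the aggregation and Spark SQL querying described above. The project code was written in Scala; this PySpark version is an illustrative equivalent, and the table and column names (analytics.web_events, event_date, event_type, user_id) are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical session; table and column names are placeholders.
    spark = (SparkSession.builder
             .appName("event-aggregation")
             .enableHiveSupport()
             .getOrCreate())

    events = spark.table("analytics.web_events")

    # Aggregate and group events, as in the tuned Spark jobs described above.
    daily_counts = (events
                    .groupBy("event_date", "event_type")
                    .agg(F.countDistinct("user_id").alias("unique_users"),
                         F.count(F.lit(1)).alias("total_events")))

    # The same result expressed through Spark SQL for ad-hoc querying.
    events.createOrReplaceTempView("web_events")
    daily_counts_sql = spark.sql("""
        SELECT event_date, event_type,
               COUNT(DISTINCT user_id) AS unique_users,
               COUNT(*) AS total_events
        FROM web_events
        GROUP BY event_date, event_type
    """)

    daily_counts.write.mode("overwrite").saveAsTable("analytics.daily_event_counts")
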

Sr. Hadoop Developer

Confidential, Washington, DC

Responsibilities:

  • Gathered User requirements and designed technical and functional specifications.
  • Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and then into partitioned Hive tables.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) written in Python (see the streaming-script sketch after this list).
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop.
  • Used Flume to handle streaming data and load it into the Hadoop cluster.
  • Developed and executed Hive queries to de-normalize the data.
  • Developed an Apache Storm, Kafka, and HDFS integration project for real-time data analysis.
  • Executed Hive queries using the Hive command line, the Hue web GUI, and Impala to read, write, and query data in HBase.
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, to decide whether to adopt Impala in the project.
  • Worked on a cluster of 130 nodes.
  • Designed an Apache Airflow entity-resolution module for data ingestion into Microsoft SQL Server.
  • Developed a batch processing pipeline using Python and Airflow, and scheduled Spark jobs with Airflow (see the DAG sketch after this list).
  • Involved in writing, testing, and running MapReduce pipelines using Apache Crunch.
  • Managed and reviewed Hadoop log files, analyzed SQL scripts, and designed a Spark-based solution for the process.
  • Created reports in Tableau to visualize the data sets, and tested native Drill, Impala, and Spark connectors.
  • Worked hands-on with NoSQL databases such as MongoDB for a POC on storing images and URIs.
  • Designed and implemented MongoDB and the associated RESTful web service.
  • Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
  • Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
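One common way the Python-based Hive UDF work above is wired up is through Hive's TRANSFORM clause with a streaming script. This is a minimal sketch; the script name (clean_phone.py) and the table and column names are hypothetical.

    #!/usr/bin/env python
    # clean_phone.py - hypothetical Hive streaming script, invoked from HiveQL as:
    #   ADD FILE clean_phone.py;
    #   SELECT TRANSFORM (customer_id, phone)
    #     USING 'python clean_phone.py'
    #     AS (customer_id, phone_normalized)
    #   FROM staging.customers;
    import re
    import sys

    for line in sys.stdin:
        customer_id, phone = line.rstrip("\n").split("\t")
        # Keep digits only so downstream joins see a consistent phone format.
        digits = re.sub(r"\D", "", phone)
        print("\t".join([customer_id, digits]))
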
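A sketch of how the Airflow scheduling of Spark jobs mentioned above can look, written against the Airflow 2.x API; the DAG id, schedule, and script path are hypothetical placeholders.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical daily batch pipeline; owner, retries, and paths are placeholders.
    default_args = {"owner": "data-eng", "retries": 2, "retry_delay": timedelta(minutes=10)}

    with DAG(
        dag_id="daily_incremental_load",
        start_date=datetime(2020, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:

        # Submit the Spark transformation for the previous day's partition.
        run_spark_job = BashOperator(
            task_id="spark_transform",
            bash_command=(
                "spark-submit --master yarn --deploy-mode cluster "
                "/opt/pipelines/transform_daily.py --run-date {{ ds }}"
            ),
        )
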

Environment: Hadoop, YARN, HBase, Teradata, DB2, NoSQL, Kafka, Python, ZooKeeper, Oozie, Tableau, Apache Crunch, Apache Storm, MySQL, SQL Server, jQuery, JavaScript, HTML, AJAX, and CSS.

Hadoop Developer/Admin

Confidential, Boca Raton, FL

Responsibilities:

  • Administered, provisioned, patched, and maintained Cloudera Hadoop clusters on Linux.
  • Designed and developed applications on the data lake to transform data for business users to perform analytics.
  • Developed shell scripts to perform data quality validations such as record counts, file-name consistency, and duplicate-file checks, and to create tables and views (see the validation sketch after this list).
  • Created views that mask PHI columns so that unauthorized teams cannot see the PHI data.
  • Worked with the Parquet file format for better storage and performance of published tables.
  • Worked with NoSQL databases such as HBase, creating HBase tables to store audit data for the RAWZ and APPZ tables.
  • Managed data coming from different sources and was involved in HDFS maintenance and in loading structured and unstructured data.
  • Developed a RESTful API using Scala for tracking open-source projects on GitHub and computing in-process metrics for those projects.
  • Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming.
  • Experience using the Docker container system with Kubernetes integration.
  • Developed a web application in Java using the Google Web Toolkit API with PostgreSQL.
  • Used R to prototype data exploration on a sample and identify the best algorithmic approach, then wrote Scala scripts using the Spark machine learning module.
  • Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS; implemented a Python-based distributed random forest via Python streaming.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, Caffe, TensorFlow, MLlib, and Python with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Excellent understanding of Hadoop architecture and components such as HDFS, HBase, Hive, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
  • Built a Kafka-Spark-Cassandra simulator in Scala for Met stream, a big data consultancy, along with Kafka-Spark-Cassandra prototypes.
  • Designed a data analysis pipeline in Python using Amazon Web Services such as S3, EC2, and Elastic MapReduce.
  • Implemented applications with Scala along with Akka and the Play framework.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Built a Python- and Scala-based analytics system with ML libraries.
  • Worked with NoSQL platforms and gained an extensive understanding of relational databases versus NoSQL platforms; created and worked on large data frames with schemas of more than 300 columns.
  • Ingested data into Amazon S3 using Sqoop and applied data transformations using Python scripts.
  • Created Hive tables and loaded and analyzed data using Hive scripts; implemented partitioning and dynamic partitions in Hive (see the dynamic-partition sketch after this list).
  • Deployed and analyzed large chunks of data using Hive as well as HBase.
  • Queried data using Spark SQL on top of the PySpark engine.
  • Used Amazon EMR to run PySpark jobs in the cloud.
  • Created Hive tables to store various formats of PII data coming from the raw Hive tables.
  • Developed Sqoop jobs to import/export data between RDBMS and the S3 data store.
  • Designed and implemented PySpark UDFs for evaluating, filtering, loading, and storing data (see the UDF sketch after this list).
  • Fine-tuned PySpark applications/jobs to improve the efficiency and overall processing time of the pipelines.
  • Wrote Hive queries and ran scripts in Tez mode to improve performance on the Hortonworks Data Platform.
  • Used a microservices architecture, with Spring Boot-based services interacting through REST.
  • Built Spring Boot microservices for the delivery of software products across the enterprise.
  • Created the ALB, ELBs, and EC2 instances to deploy the applications into the cloud environment.
  • Provided service discovery for all microservices using the Spring Cloud Kubernetes project.
  • Developed fully functional, responsive modules based on business requirements using Scala with Akka.
  • Developed new listeners for producers and consumers for both RabbitMQ and Kafka.
  • Used microservices with Spring Boot interacting through a combination of REST and Apache Kafka message brokers.
  • Built servers such as DHCP, PXE with Kickstart, DNS, and NFS, and used them to build infrastructure in a Linux environment; automated build, testing, and integration with Ant, Maven, and JUnit.
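The data quality validations above ran as shell scripts against HDFS; this Python sketch over a local directory shows the same record-count, file-name, and duplicate-file logic. The landing directory and naming pattern are hypothetical.

    import hashlib
    import os
    import re
    import sys
    from collections import defaultdict

    # Hypothetical landing directory and file naming convention.
    LANDING_DIR = "/data/landing/claims"
    NAME_PATTERN = re.compile(r"^claims_\d{8}\.csv$")

    files = sorted(os.listdir(LANDING_DIR))

    # File-name consistency: every file must match the agreed naming convention.
    bad_names = [f for f in files if not NAME_PATTERN.match(f)]

    # Record count: reject files that contain only a header (or nothing at all).
    empty_files = []
    for f in files:
        with open(os.path.join(LANDING_DIR, f)) as fh:
            if sum(1 for _ in fh) <= 1:
                empty_files.append(f)

    # Duplicate-file check: identical content under different names means a re-send.
    by_checksum = defaultdict(list)
    for f in files:
        with open(os.path.join(LANDING_DIR, f), "rb") as fh:
            by_checksum[hashlib.md5(fh.read()).hexdigest()].append(f)
    duplicates = [names for names in by_checksum.values() if len(names) > 1]

    if bad_names or empty_files or duplicates:
        print("Data quality checks failed:", bad_names, empty_files, duplicates)
        sys.exit(1)
    print("Data quality checks passed for", len(files), "files")
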
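A sketch of Hive dynamic partitioning, shown here through spark.sql so it stays in Python; the database, table, and column names are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("dynamic-partitions")
             .enableHiveSupport()
             .getOrCreate())

    # Allow Hive-style dynamic partitioning for the insert below.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    # Hypothetical partitioned target table keyed by load_date.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS analytics.orders_partitioned (
            order_id BIGINT,
            customer_id BIGINT,
            amount DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS PARQUET
    """)

    # Each distinct load_date in the staging data lands in its own partition.
    spark.sql("""
        INSERT OVERWRITE TABLE analytics.orders_partitioned PARTITION (load_date)
        SELECT order_id, customer_id, amount, load_date
        FROM staging.orders
    """)
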
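A minimal sketch of a PySpark UDF of the kind described above, applied here to the PII-masking use case; the table and column names (rawz.customers, ssn) and the masking rule are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .appName("udf-example")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical masking rule: keep only the last four characters of an identifier.
    def mask_identifier(value):
        if value is None:
            return None
        return "*" * max(len(value) - 4, 0) + value[-4:]

    mask_udf = F.udf(mask_identifier, StringType())

    customers = spark.table("rawz.customers")
    published = customers.withColumn("ssn_masked", mask_udf(F.col("ssn"))).drop("ssn")
    published.write.mode("overwrite").saveAsTable("appz.customers_published")
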

Environment: Apache Hive, HBase, PySpark, Python, Agile, StreamSets, Bitbucket, Cloudera, shell scripting, Amazon EMR, Amazon S3, PyCharm, Jenkins, Scala, Java.

Hadoop Developer

Confidential, Princeton, NJ

Responsibilities:

  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website (see the query sketch after this list).
  • Developed optimal strategies for distributing the web log data over the cluster and for importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Loaded data from the UNIX file system and Teradata into HDFS.
  • Installed, configured, and operated the Apache stack (Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Flume, and Mahout) on the Hadoop cluster.
  • Created a high-level design for the data ingestion and data extraction module, and enhanced a Hadoop MapReduce job that joins the incoming slices of data and picks only the fields needed for further processing.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Wrote optimized Pig scripts and developed and tested Pig Latin scripts.
  • Tested raw data and executed performance scripts.
  • Worked with the NoSQL database HBase to create tables and store data.
  • Developed industry-specific UDFs (user-defined functions).
  • Used Flume to collect and aggregate web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Resolved issues, answered questions, and provided day-to-day support for users and clients on Hadoop and its ecosystem, including the HAWQ database.
  • Installed, configured, and operated data integration and analytics tools (Informatica, Chorus, SQLFire, GemFire XD) for business needs.
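A minimal sketch of the kind of web-log analysis HiveQL described above, run here through spark.sql so it stays in Python; the table (weblogs.page_requests) and columns (event_time, visitor_id, session_id) are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("weblog-analysis")
             .enableHiveSupport()
             .getOrCreate())

    # Unique visitors and page views per day, from a hypothetical request-level table.
    daily_traffic = spark.sql("""
        SELECT to_date(event_time) AS visit_date,
               COUNT(DISTINCT visitor_id) AS unique_visitors,
               COUNT(*) AS page_views
        FROM weblogs.page_requests
        GROUP BY to_date(event_time)
        ORDER BY visit_date
    """)

    # Approximate visit duration per session: last hit minus first hit, in seconds.
    session_duration = spark.sql("""
        SELECT session_id,
               unix_timestamp(MAX(event_time)) - unix_timestamp(MIN(event_time)) AS duration_seconds
        FROM weblogs.page_requests
        GROUP BY session_id
    """)

    daily_traffic.show()
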

Environment: Cloudera, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, ZooKeeper, Python, flat files, AWS, Teradata, UNIX/Linux.

Jr. Java Developer

Confidential

Responsibilities:

  • Designed and developed Servlets and Session and Entity Beans to implement business logic, and deployed them on the JBoss application server.
  • Developed many JSP pages and used JavaScript for client-side validation.
  • Used an MVC framework for developing the J2EE-based web application.
  • Involved in all the phases of SDLC including Requirements Collection, Design and Analysis of the Customer specifications, Development and Customization of the Application.
  • Developed the User Interface Screens for presentation using Ajax, JSP and HTML.
  • Used JDBC for data retrieval from the database for various inquiries.
  • Wrote stored procedures and used JDBC for database interaction.
  • Developed stored procedures in PL/SQL for Oracle 10g.
  • Used Eclipse as the IDE to write and debug application code, and SQL Developer to test and run SQL statements.
  • Implemented client-side and server-side data validations using JavaScript.

Environment: Java, Eclipse Galileo, HTML 4.0, JavaScript, SQL, PL/SQL, CSS, JDBC, JBoss 4.0, Servlets 2.0, JSP 1.0, Oracle.
