Hadoop Admin Resume

Orlando, FL

PROFESSIONAL SUMMARY:

  • Strong experience in developing applications that perform large-scale distributed data processing using Big Data ecosystem tools: Hadoop, Hive, Pig, Sqoop, MapReduce, YARN, Cassandra, Spark, Kafka, Oozie, ZooKeeper, and Flume.
  • Good understanding of Spark Core, Spark SQL, and Spark Streaming.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm, with good hands-on experience in Scala and SQL queries.
  • Hands-on experience with major Hadoop ecosystem components, including Hive, HBase (and Hive-HBase integration), Sqoop, and Flume, plus knowledge of the MapReduce/HDFS framework.
  • Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
  • Good knowledge of Cloudera distributions and of AWS services, including Amazon S3, EC2, and EMR.
  • Set up standards and processes for Hadoop-based application design and implementation.
  • Worked on NoSQL databases including HBase and MongoDB.
  • Experience with Hortonworks and Cloudera environments.
  • Set up data in AWS using S3 buckets and configured instance backups to S3.
  • Good experience in analysis using Pig and Hive, and a solid understanding of Sqoop.
  • Expertise in database performance tuning and data modeling.
  • Experienced in securing Hadoop clusters with Kerberos and integrating with LDAP/AD at the enterprise level.
  • Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
  • Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
  • Hands-on experience in Apache Spark creating RDDs and DataFrames, applying transformations and actions, and converting RDDs to DataFrames (sketched in the example after this list).
  • Migrated various Hive UDFs and queries to Spark SQL for faster execution.
  • Experience in data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Experience in using Apache Kafka for log aggregation.
  • Developed a data pipeline using Kafka and Spark Streaming to store data in HDFS and performed real-time analytics on the incoming data.
  • Experience importing real-time data into Hadoop using Kafka and implementing Oozie jobs for daily imports.
  • Loaded data into EMR from sources such as S3 and processed it using Hive scripts.
  • Explored various Spark modules and worked with DataFrames, RDDs, and SparkContext.
  • Performed map-side joins on RDDs and imported data from sources such as HDFS and HBase into Spark RDDs.
  • Experience working with Solr to build search over unstructured data in HDFS.
  • Experience in production and application support, including bug fixing.
  • Used HP Quality Center for logging test cases and defects.
  • Major strengths: familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments; a self-motivated, focused team player and quick learner with excellent interpersonal, technical, and communication skills.
  • Experience working with Java/J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB, and Servlets.
  • Expert in developing web page interfaces using JSP, Java Swing, and HTML.
  • Experience using IDEs such as Eclipse and NetBeans, and build tools such as Maven.
  • Good understanding of Scrum methodology, Test-Driven Development, and Continuous Integration.
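
A minimal Scala sketch of the RDD and DataFrame work summarized above (creating an RDD, applying transformations and actions, and converting to a DataFrame). The input path, delimiter, and the Txn case class are hypothetical and only illustrate the pattern.

    import org.apache.spark.sql.SparkSession

    object RddToDataFrameSketch {
      // Hypothetical record type used only for illustration.
      case class Txn(id: String, amount: Double)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("rdd-to-dataframe-sketch")
          .getOrCreate()
        import spark.implicits._

        // Build an RDD from a text file (placeholder path), then apply
        // transformations (map, filter) and an action (count).
        val lines = spark.sparkContext.textFile("hdfs:///data/txns.csv")
        val txns = lines.map(_.split(","))
          .filter(_.length == 2)
          .map(a => Txn(a(0), a(1).toDouble))
        println(s"parsed records: ${txns.count()}")

        // Convert the RDD to a DataFrame and continue with the DataFrame API.
        val df = txns.toDF()
        df.filter($"amount" > 100.0).show(10)

        spark.stop()
      }
    }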

SKILLS:

Hadoop Core Services: HDFS, MapReduce, Spark, YARN, Hue, Hive, Scala, Kafka, Apache Tez, Infra Solr, Oozie, ZooKeeper, Genie, Atlas, Elasticsearch, Docker.

Hadoop Distribution: Hortonworks

NoSQL Databases: HBase, MongoDB

Cloud Computing Tools: Amazon AWS, EMR, S3, EC2.

Languages/Scripting: Scala, SQL, HiveQL, Unix shell scripting, Ansible, basic Python

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts

Application Servers: WebLogic, WebSphere, JBoss, Tomcat.

Databases: Oracle 10g/11g, MySQL, SQL

Operating Systems: UNIX, Windows, macOS, Linux, RHEL 6/7

Build Tools: Jenkins, Maven

Development Tools: Microsoft SQL Studio, Eclipse, Visual Studio, IntelliJ

Development methodologies: Agile/Scrum, Waterfall

PROFESSIONAL WORK EXPERIENCE:

Confidential, Orlando, FL

Hadoop Admin

Responsibilities:

  • Excellent knowledge of Hadoop architecture, administration, and support, including HDFS, ZooKeeper, MapReduce, and YARN.
  • Experience with Hadoop ecosystem security (Kerberos, AD integration, and components such as Ranger and Knox).
  • Responsible for upgrades, installations, and systems management of Hadoop and related utilities.
  • Strong Linux system administration skills (RHEL/CentOS).
  • Experience with AWS technologies such as EMR, EC2, IAM, S3, Lambda, and Data Pipeline.
  • Managed and monitored the Hadoop cluster and platform infrastructure.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Handled deployments and code and data movement between Dev, QA, and Prod environments using Jenkins pipelines (deployment groups, folder copy, data copy, etc.).
  • Used network monitoring daemons such as Ganglia and service monitoring tools such as Nagios.
  • Configured inter-node communication between Apache Solr nodes and clients using SSL encryption.
  • Experience in database administration, performance tuning, backup and recovery, and troubleshooting in a large-scale, customer-facing environment.
  • Worked with Storm, creating a topology that runs continuously over a stream of incoming data.
  • Expertise in commissioning and decommissioning nodes in the clusters, backup configuration, and recovery from NameNode failure.
  • Good working knowledge of importing and exporting data between databases such as MySQL and HDFS/Hive using Sqoop.
  • Strong knowledge of YARN and high-availability Hadoop clusters.
  • Experience with monitoring and alert notification in a Hadoop production environment.
  • Configured and started the Kafka server, created Kafka topics, and started consumer services that pull the data published by Kafka producers (see the consumer sketch after this list).
  • Strong Linux/Unix system administrator skills.
  • Developed a Spark SQL job in Scala to read data from a Kafka topic and store it in ORC format in HDFS (see the Kafka-to-ORC sketch after this list).
  • Created Hive external tables over the data placed in HDFS, used by the BO universe to build reports based on business needs.
  • Implemented repartitioning, caching, and broadcast variables on RDDs and DataFrames to achieve better performance on the cluster.
  • Created DataFrames by reading validated JSON files and ran SQL queries through SQLContext to get the common transaction data from all systems.
  • Worked with the Spark Core, Spark Streaming, and Spark SQL modules.
  • Developed a POC for Apache Kafka and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
  • Built a POC for setting up a single-node cluster on an EC2 instance for different project teams' end-to-end operations.
  • Good hands-on experience writing automated Ansible scripts for patching environments and for Ambari maintenance.
  • Expert understanding of ETL principles and how to apply them within Hadoop.
  • Experience in configuring and administering NiFi installations.
  • Hands-on experience creating custom NiFi processors.
  • Experience building data ingestion workflows/pipelines using NiFi and NiFi Registry.
  • Created a website for different project teams gathering the links used in their daily activities; building the HTML took about three days.
  • Wrote YAML files for different project teams for job submissions, moving data between environments, and moving data to S3 buckets.
  • Worked with different project teams on their code fixes and release planning.
  • Experience building a search engine on Elasticsearch.
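
A minimal sketch of a Kafka consumer service like the one described above, using the standard Kafka client API from Scala. The broker address, consumer group, and topic name are placeholders.

    import java.time.Duration
    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import scala.collection.JavaConverters._

    object TopicConsumer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")       // placeholder broker
        props.put("group.id", "example-consumer-group")      // placeholder group
        props.put("key.deserializer",
          "org.apache.kafka.common.serialization.StringDeserializer")
        props.put("value.deserializer",
          "org.apache.kafka.common.serialization.StringDeserializer")
        props.put("auto.offset.reset", "earliest")

        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(Collections.singletonList("example-topic"))

        // Poll in a loop and hand each record to downstream processing.
        while (true) {
          val records = consumer.poll(Duration.ofMillis(500))
          for (rec <- records.asScala) {
            println(s"offset=${rec.offset()} key=${rec.key()} value=${rec.value()}")
          }
        }
      }
    }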
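
A sketch of the Spark SQL job and Hive external table flow described above: batch-read a Kafka topic, land the payload in HDFS as ORC, and expose it to reporting through a Hive external table. Broker, topic, HDFS path, and table name are all placeholders.

    import org.apache.spark.sql.SparkSession

    object KafkaToOrc {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-orc-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Batch-read everything currently on the topic.
        val raw = spark.read
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "example-topic")
          .option("startingOffsets", "earliest")
          .option("endingOffsets", "latest")
          .load()

        // Keep the payload as a string column and land it in HDFS as ORC.
        val payload = raw.selectExpr("CAST(value AS STRING) AS payload")
        payload.write.mode("append").orc("hdfs:///data/landing/example_topic")

        // Expose the ORC files to reporting tools via a Hive external table.
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS example_topic_raw (payload STRING)
            |STORED AS ORC
            |LOCATION 'hdfs:///data/landing/example_topic'""".stripMargin)

        spark.stop()
      }
    }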

Environment: AWS, Hadoop Ecosystem, Kafka, Spark, YARN, Genie, Storm, Tableau, Elasticsearch, Ansible, NiFi, Jenkins, Docker.

Confidential, Orlando, FL

Hadoop and Spark Developer

Responsibilities:

  • Developed Spark jobs to extract data from an Oracle database, write it to a Kafka topic, and store it in HDFS in ORC and JSON formats.
  • Configured and started the Kafka server, created Kafka topics, and started consumer services that pull the data published by Kafka producers.
  • Developed a Spark SQL job in Scala to read data from a Kafka topic and store it in ORC format in HDFS.
  • Created Hive external tables over the data placed in HDFS, used by the BO universe to build reports based on business needs.
  • Implemented repartitioning, caching, and broadcast variables on RDDs and DataFrames to achieve better performance on the cluster (see the tuning sketch after this list).
  • Created DataFrames by reading validated JSON files and ran SQL queries through SQLContext to get the common transaction data from all systems.
  • Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Used Scala to write the code for all the use cases in Spark and Spark SQL.
  • Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
  • Implemented Spark batch jobs using spark-submit in cluster mode.
  • Worked with the Spark Core, Spark Streaming, and Spark SQL modules.
  • Developed Spark jobs using Scala in a sandbox environment for faster data processing and used Spark SQL for querying.
  • Created the SBT project structure, packaged the application as a JAR, and ran it with spark-submit.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed a POC for Apache Kafka and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
  • Populated HDFS with huge amounts of data using Apache Kafka.
  • Configured Spark Streaming in Scala to receive real-time data from Apache Kafka and store the stream data in HDFS (see the streaming sketch after this list).
  • Worked with Genie to set up spark-submit in cluster mode using the UC4 scheduler and crontab.
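
A sketch of the repartition, cache, and broadcast tuning mentioned above, at the DataFrame level. Paths, partition counts, and column names are placeholders chosen for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object TuningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("tuning-sketch").getOrCreate()

        val transactions = spark.read.orc("hdfs:///data/transactions")  // large fact data
        val smallLookup  = spark.read.orc("hdfs:///data/dim_lookup")    // small dimension

        // Repartition on the join key so work is spread evenly, then cache
        // the result because several downstream queries reuse it.
        val byKey = transactions.repartition(200, transactions("account_id")).cache()

        // Broadcast the small table so the join avoids shuffling the large
        // side (the DataFrame equivalent of a map-side join).
        val joined = byKey.join(broadcast(smallLookup), Seq("account_id"))

        println(joined.count())
        byKey.unpersist()
        spark.stop()
      }
    }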
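
A sketch of streaming Kafka data into HDFS as described above. The bullet refers to Spark Streaming; this version uses the Structured Streaming API, which covers the same Kafka-to-HDFS flow. Broker, topic, and paths are placeholders.

    import org.apache.spark.sql.SparkSession

    object KafkaStreamToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-stream-sketch").getOrCreate()

        val stream = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "example-topic")
          .option("startingOffsets", "latest")
          .load()
          .selectExpr("CAST(value AS STRING) AS payload")

        // Continuously append incoming records to HDFS as ORC files,
        // tracking progress through a checkpoint directory.
        val query = stream.writeStream
          .format("orc")
          .option("path", "hdfs:///data/streaming/example_topic")
          .option("checkpointLocation", "hdfs:///checkpoints/example_topic")
          .outputMode("append")
          .start()

        query.awaitTermination()
      }
    }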

Environment: Hadoop, Spark, Scala, Apache Kafka, Hive, HDFS, Sqoop, HBase, Oracle, Teradata, Genie, UC4.

Confidential, Cleveland, OH

Hadoop and Spark Developer

Responsibilities:

  • Implemented AWS solutions using EC2, S3 and load balancers.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Stored and loaded data between HDFS and Amazon S3 and backed up the namespace data.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals, and testing HDFS and Hive access.
  • Involved in creating Hadoop streaming jobs using Python.
  • Enabled concurrent access to Hive tables with shared and exclusive locking, backed by the cluster's ZooKeeper implementation.
  • Worked on performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins (see the partitioning/bucketing sketch after this list).
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and reporting.
  • Developed multiple MapReduce jobs in Java for data cleaning.
  • Developed Hive UDF to parse the staged raw data to get the Hit Times of the claims from a specific branch for a particular insurance type code.
  • Ran many performance tests using the Cassandra-stress tool in order to measure and improve the read and write performance of the cluster.
  • Created data model for structuring and storing the data efficiently. Implemented partitioning and bucketing of tables in Cassandra.
  • Built wrapper shell scripts to launch Oozie workflows.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (see the query-conversion sketch after this list).
  • Worked on distributed/cloud computing (MapReduce/Hadoop, Pig, HBase, Avro, ZooKeeper, etc.) and Amazon Web Services (S3, EC2, EMR, etc.).
  • Provided ad hoc queries and data metrics to business users using Hive and Pig.
  • Worked with MRJ to query semi-structured data as per analytic needs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Used a SolrCloud implementation to provide real-time search capabilities on a repository with terabytes of data.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Integrated Apache Storm with Kafka to perform web analytics and uploaded clickstream data from Kafka to HDFS, HBase, and Hive through Storm.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Familiarity with NoSQL databases such as Cassandra.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
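
A sketch of the partitioning and bucketing layout mentioned above. The bullet describes Hive-side partitioning and bucketing; this version writes the same layout with Spark's DataFrameWriter so the example stays in Scala. Paths, table, and column names are placeholders.

    import org.apache.spark.sql.SparkSession

    object PartitionBucketSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("partition-bucket-sketch")
          .enableHiveSupport()
          .getOrCreate()

        val claims = spark.read.orc("hdfs:///data/claims_raw")  // placeholder input

        // Partition by load date and bucket on the join key; bucketing on
        // the key used in joins is what enables bucketed map-side joins.
        claims.write
          .partitionBy("load_date")
          .bucketBy(32, "claim_id")
          .sortBy("claim_id")
          .format("orc")
          .saveAsTable("claims_clean")

        spark.stop()
      }
    }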
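
A sketch of converting a Hive/SQL query into Spark transformations, as described above. The HiveQL query, table, and column names are made up for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, sum}

    object HiveQueryToSpark {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-spark-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Original HiveQL, for reference:
        //   SELECT branch, SUM(amount) AS total
        //   FROM claims_clean
        //   WHERE load_date = '2018-01-01'
        //   GROUP BY branch;

        // The same logic expressed as DataFrame transformations.
        val totals = spark.table("claims_clean")
          .filter(col("load_date") === "2018-01-01")
          .groupBy("branch")
          .agg(sum("amount").as("total"))

        totals.show()
        spark.stop()
      }
    }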

Environment: Hadoop, Hadoop 2, MapReduce, Hive, HDFS, Cassandra, Pig, Sqoop, Oozie, EMR, Solr, HBase, ZooKeeper, CDH5, MongoDB, Oracle, NoSQL, Unix/Linux, Apache Kafka, Amazon Web Services.

Confidential, Warsaw, IN

Bigdata/Hadoop Developer

Responsibilities:

  • Responsible for managing data coming from different sources.
  • Involved in gathering the business requirements from the Business Partners and Subject Matter Experts.
  • Involved in work including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists and custom sorting.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
  • Worked extensively on importing metadata into Hive using Python and migrated existing tables and applications to work on the AWS cloud (S3).
  • Enhanced Hive query performance using Tez for customer attribution datasets.
  • Extracted and restructured the data into MongoDB using import and export command line utility tool.
  • Designed and maintained Tez workflows to manage the flow of jobs in the cluster.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Loaded log data into HDFS using Flume and performed ETL integration.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Developed a Flume ETL job with an HTTP source and an HDFS sink.
  • Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it into HDFS.
  • Good understanding of the DAG for an entire Spark application flow in the Spark web UI.
  • Developed Spark SQL scripts and was involved in converting Hive UDFs to Spark SQL UDFs (see the UDF sketch after this list).
  • Implemented procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Developed multiple Spark jobs in Scala/Python for data cleaning, pre-processing, and aggregation.
  • Developed Spark programs in Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
  • Worked on NiFi to automate the data movement between different Hadoop systems.
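
A sketch of moving a Hive UDF to a Spark SQL UDF, as described above. The original Hive UDF and the normalization rule shown here are hypothetical, as are the table and column names.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, udf}

    object HiveUdfToSparkUdf {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("udf-migration-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // The Hive UDF's logic re-implemented as a plain Scala function.
        val normalizeCode = (code: String) =>
          Option(code).map(_.trim.toUpperCase).getOrElse("UNKNOWN")

        // Register it for use from SQL, in place of the old Hive UDF.
        spark.udf.register("normalize_code", normalizeCode)

        spark.sql(
          """SELECT normalize_code(insurance_type) AS type_code, COUNT(*) AS hits
            |FROM staged_claims
            |GROUP BY normalize_code(insurance_type)""".stripMargin).show()

        // The same function can also be used through the DataFrame API.
        val normalizeUdf = udf(normalizeCode)
        spark.table("staged_claims")
          .select(normalizeUdf(col("insurance_type")).as("type_code"))
          .show(10)

        spark.stop()
      }
    }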

Environment: Hadoop, Hive, Hortonworks, Spark, NiFi, Tez, Linux, MapReduce, HDFS, Pig, HBase, Sqoop, Flume, Shell Scripting, Storm, Java 6 (JDK 1.6), Eclipse, Oracle 10g, PL/SQL, SQL*Plus.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in requirements analysis, design, development, and testing.
  • Involved in setting up the different roles and maintaining authentication for the application.
  • Designed, deployed, and tested a multi-tier application using Java technologies.
  • Involved in front-end development using JSP, HTML, and CSS.
  • Implemented the Application using Servlets.
  • Deployed the application on Oracle WebLogic Server.
  • Implemented multithreading concepts in Java classes to avoid deadlocks.
  • Used a MySQL database to store data and execute SQL queries on the backend.
  • Prepared and maintained the test environment.
  • Tested the application before it went live to production.
  • Documented and communicated test results to the team lead on a daily basis.
  • Involved in weekly meeting with team leads and managers to discuss the issues and status of the projects.

Environment: J2EE (Java, JSP, JDBC, multithreading), HTML, Oracle WebLogic Server, Eclipse, MySQL.

Confidential

Java Developer

Responsibilities:

  • Used JSP pages through a Servlet controller for the client-side view.
  • Created jQuery and JavaScript plug-ins for the UI.
  • Followed Java/J2EE best practices to minimize unnecessary object creation.
  • Implemented RESTful web services with the Struts framework.
  • Verified them with the JUnit testing framework.
  • Working experience using an Oracle 10g backend database.
  • Used JMS queues to develop an internal messaging system.
  • Developed UML use case, activity, sequence, and class diagrams using Rational Rose.
  • Developed Java, JDBC, and JavaBeans components using the JBuilder IDE.
  • Developed JSP pages and Servlets for customer maintenance.
  • Deployed the application on Apache Tomcat.
  • Involved in building the modules in a Linux environment with Ant scripts.
  • Used Resource Manager to schedule jobs on the UNIX server.
  • Performed Unit testing, Integration testing for all the modules of the system.
  • Developed JavaBean components utilizing AWT and Swing classes.

Environment: Java, JDK, Servlets, JSP, HTML, JBuilder, JavaScript, CSS, Tomcat, Apache HTTP Server, XML, JUnit, EJB, RESTful, Oracle.
