
Hadoop Admin Resume


PA

SUMMARY

  • 9+ years of IT experience as a Big Data Administrator, Developer, Designer and Quality Tester, with cross-platform integration experience using Hadoop, Java, J2EE and software functional testing.
  • Hands-on experience with Hadoop, HDFS, MapReduce and the Hadoop ecosystem, including Pig, Hive, Impala, Oozie, Zookeeper, Sqoop, Flume, Spark, Kafka and HBase.
  • Hands-on experience with the Cloudera and Hortonworks Hadoop distributions.
  • Hands-on experience installing and configuring Apache Hadoop ecosystems using Cloudera Manager, Apache Ambari, Puppet and Chef.
  • Strong understanding of the various Hadoop services and of MapReduce and YARN architecture.
  • Responsible for writing MapReduce programs.
  • Experienced in importing and exporting data into HDFS using Sqoop (a brief Sqoop sketch follows this summary).
  • Loaded log data into HDFS using Flume.
  • Experience loading data into Hive partitions and creating buckets in Hive.
  • Logical implementation of and interaction with HBase.
  • Developed MapReduce jobs to automate data transfer from HBase.
  • Wrote MapReduce programs in Hadoop as well as Pig, Hive and Scala.
  • Expertise in analysis using Pig, Hive and MapReduce.
  • Worked on installation and configuration in multiple environments.
  • Experienced in developing UDFs for Hive using Java.
  • Strong understanding of NoSQL databases like HBase, MongoDB and Cassandra.
  • Familiar with handling complex data processing workflows using Oozie.
  • Scheduled all Hadoop/Hive/Sqoop/HBase jobs using Oozie.
  • Experience in SQL; worked on databases such as Oracle, IBM DB2, MySQL and MongoDB.
  • Ability to learn quickly in a work environment; fluent communicator with productive interpersonal skills and the ability to understand and cooperate with group requirements efficiently.
  • Dedicated to successful project completion with the ability to work in a team or as an individual, and as a liaison between different teams
  • Experience setting up clusters on Amazon EC2 & S3, including automating cluster setup and scaling in the AWS cloud.
  • Developed core modules in large cross-platform applications using Java, J2EE, Spring, Struts, Hibernate, JAX-WS Web Services, and JMS.
  • Worked with debugging tools such as DTrace, truss and top. Expert in setting up SSH, SCP and SFTP connectivity between UNIX hosts.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
  • Major strengths include familiarity with multiple software systems; the ability to learn new technologies quickly and adapt to new environments; and being a self-motivated, focused, adaptive team player and quick learner with excellent interpersonal, technical and communication skills.
  • Experience in defining detailed application software test plans, including organization, participants, schedule, and test and application coverage scope.
  • Experience in gathering and defining functional and user interface requirements for software applications.
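
For illustration, a minimal sketch of the kind of Sqoop import/export workflow referenced in the summary above. The hosts, databases, tables and paths are placeholders, not details from any actual engagement:

```bash
#!/bin/bash
# Hypothetical Sqoop import/export; hosts, databases, tables and paths are placeholders.

# Import a MySQL table into HDFS as tab-separated text using 4 parallel mappers.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --fields-terminated-by '\t' \
  --num-mappers 4

# Export analyzed results from HDFS back into a reporting table in the RDBMS.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/reports \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table order_summary \
  --export-dir /data/processed/order_summary \
  --input-fields-terminated-by '\t'
```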

TECHNICAL SKILLS

Hadoop/Big Data: Hadoop, MapReduce, HDFS, Zookeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

PROFESSIONAL EXPERIENCE

Confidential, PA

Hadoop Admin

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from LINUX file system to HDFS.
  • Performed architecture design, data modeling, and implementation of the Big Data platform and analytic applications for consumer products.
  • Analyzed the latest Big Data analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration (see the decommissioning sketch after this list).
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of MapReduce jobs.
  • Implemented MapReduce using Hadoop, Pig, Hive and Scala.
  • Responsible for managing data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Experience in managing and reviewing Hadoop log files.
  • Job management using the Fair Scheduler.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Responsible for maintaining Content Management System on daily basis.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Created Oozie workflows to run multiple MR, Hive and Pig jobs.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
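
A hedged sketch of the DataNode decommissioning step mentioned above; it assumes hdfs-site.xml already points dfs.hosts.exclude at the exclude file shown, and the host name is a placeholder:

```bash
#!/bin/bash
# Hypothetical DataNode decommissioning flow; assumes hdfs-site.xml already sets
# dfs.hosts.exclude to /etc/hadoop/conf/dfs.exclude. Host name is a placeholder.

# 1. Add the node being retired to the exclude file on the NameNode.
echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Ask the NameNode to re-read its include/exclude lists; the node switches to
#    "Decommission in progress" while its blocks are re-replicated elsewhere.
hdfs dfsadmin -refreshNodes

# 3. Watch the node until it reports "Decommissioned", then stop its DataNode daemon.
hdfs dfsadmin -report | grep -A 3 "datanode07.example.com"

# 4. Optionally rebalance the remaining nodes (threshold is % disk-usage spread).
hdfs balancer -threshold 10
```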

Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala

Confidential

HadoopAdmin/Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Extensive experience in designing and implementing data flow pipelines from RDBMS sources.
  • Designed ETL data pipeline flows to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, SSIS packages and MySQL.
  • Created Sqoop jobs to import the data and load it into HDFS (see the Sqoop job sketch after this list).
  • Involved in setting up Multi Node cluster in Amazon Cloud by creating instances on Amazon EC2.
  • Created MapReduce Jobs on Amazon Elastic Map Reduce (Amazon EMR).
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Devised procedures that solve complex business problems with due considerations for hardware/software capacity and limitations, operating times and desired results.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior like shopping enthusiasts, travelers, music lovers etc.
  • Exported the patterns analyzed back into Teradata using Sqoop.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Processed real-time data with Spark.
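
A brief, hypothetical sketch of the saved Sqoop job pattern described above for recurring RDBMS-to-HDFS ingestion; connection details, table and column names are placeholders:

```bash
#!/bin/bash
# Hypothetical saved Sqoop job for recurring ingestion; connection details,
# table and column names are placeholders.

# Define a reusable incremental-import job (note the bare "--" before "import").
sqoop job --create daily_orders_ingest -- import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --incremental append \
  --check-column order_id \
  --last-value 0

# Execute the job (e.g. from cron or an Oozie shell action); Sqoop's metastore
# remembers the last imported order_id between runs.
sqoop job --exec daily_orders_ingest
```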

Environment: Hadoop, MapReduce, HDFS, Hive, Oozie, Java (jdk1.6), Cloudera, NoSQL, Oracle 11g, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting, Spark.

Confidential

Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce on EC2.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Worked on debugging, performance tuning of Hive & Pig Jobs.
  • Implemented test scripts to support test driven development and continuous integration.
  • Involved in running Hadoop jobs to process millions of records and applying compression techniques.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Worked on tuning the performance of Pig queries.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Moved relational database data using Sqoop into Hive dynamic partition tables via staging tables (a staging-to-partition sketch follows this list).
  • Optimizing the Hive queries using Partitioning and Bucketing techniques, for controlling the data distribution.
  • Involved in loading data from LINUX file system to HDFS.
  • Imported and exported data into HDFS and HBase from MySQL using Sqoop.
  • Experience working on processing semi-structured data using Pig and Hive.
  • Supported MapReduce programs running on the cluster.
  • Gained experience in managing and reviewing Hadoop log and JSON files.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Designed and documented the project use cases, wrote test cases, led the offshore team, and interacted with the client.
  • Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
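
A minimal sketch of the staging-table to Hive dynamic-partition load described above, assuming hypothetical staging and analytics databases; table, column and bucket settings are illustrative only:

```bash
#!/bin/bash
# Hypothetical staging-to-partition load; database, table and column names are
# placeholders. Assumes the raw data was already landed in staging.user_events_raw
# (e.g. by Sqoop).
hive -e "
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.enforce.bucketing = true;

CREATE TABLE IF NOT EXISTS analytics.user_events (
  user_id    BIGINT,
  event_type STRING,
  event_ts   TIMESTAMP
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- The partition column goes last in the SELECT; Hive routes each row to the
-- matching event_date partition automatically.
INSERT OVERWRITE TABLE analytics.user_events PARTITION (event_date)
SELECT user_id, event_type, event_ts, to_date(event_ts) AS event_date
FROM staging.user_events_raw;
"
```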

Environment: Hadoop, HDFS, HBase, Pig, Hive, MapReduce, Sqoop, Oozie, LINUX, S3, EC2, AWS and Big Data.

Confidential

Hadoop Admin

Responsibilities:

  • Attended requirements meetings with Business Analysts and Business Users.
  • Worked on a multi-node clustered environment and set up Cloudera in the Hadoop ecosystem.
  • Performed basic Hadoop Administration responsibilities including software installation, configuration, software upgrades, backup and recovery, commissioning and decommissioning data nodes, cluster setup, cluster performance and monitoring on a daily basis.
  • Involved in analyzing system failures, identifying the root causes and recommending actions to be taken.
  • Created user accounts and set user’s access in the Hadoop cluster.
  • Configuring Hadoop Ecosystem tools including Pig, Hive, Hbase, Sqoop, Kafka, Oozie, Zookeeper and Spark in Cloudera Environment.
  • Performed capacity planning, performance tuning, cluster monitoring and troubleshooting.
  • Created Hive tables, loaded data and wrote Hive queries, which run internally as MapReduce jobs.
  • Working experience on importing and exporting data into HDFS and Hive using Sqoop.
  • Creating and managing the database objects such as tables, indexes and views.
  • Experience on importing & exporting data using Sqoop from MySQL to Hive.
  • Troubleshot cluster issues such as DataNodes going down, network failures and missing data blocks.
  • Implemented Kerberos to authenticate all services in the Hadoop cluster and manage security.
  • Managed the Hadoop cluster and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
  • Hands-on setup of data pipelines using Kafka and the Spark platform.
  • Assigned permissions on topics to different consumers and groups (see the Kafka CLI sketch after this list), managed Spark RDDs, worked with Datasets and DataFrames, and saved data as Hive tables using the HCatalog server.
  • Managed different file formats for Hive tables such as text, RC, ORC, Sequence, Parquet and Avro.
  • Understanding of the AWS cloud computing platform and related services.
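
A hedged sketch of the Kafka CLI steps behind the topic permissions mentioned above, assuming a ZooKeeper-based cluster with an ACL authorizer enabled on the brokers; hosts, topic and principal names are placeholders:

```bash
#!/bin/bash
# Hypothetical Kafka CLI usage on a ZooKeeper-managed (CDH-era) cluster; hosts,
# topic and principal names are placeholders.

# Create a topic for the Kafka -> Spark pipeline.
kafka-topics.sh --create \
  --zookeeper zk01.example.com:2181 \
  --topic clickstream \
  --partitions 6 \
  --replication-factor 3

# Allow the analytics consumer (and its consumer group) to read from the topic;
# assumes an ACL authorizer is enabled on the brokers.
kafka-acls.sh --authorizer-properties zookeeper.connect=zk01.example.com:2181 \
  --add \
  --allow-principal User:analytics \
  --operation Read \
  --topic clickstream \
  --group spark-analytics
```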

Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala

Confidential

Hadoop Developer/QA

Responsibilities:

  • Experience in Importing and exporting data into HDFS and Hive using Sqoop.
  • Developed Pig programs for loading and filtering the streaming data ingested into HDFS using Flume.
  • Experienced in handling data from different data sets, joining them and pre-processing using Pig join operations.
  • Moved bulk data into HBase using MapReduce integration.
  • Developed MapReduce programs to clean and aggregate the data.
  • Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
  • Developed different kinds of custom filters and handled pre-defined filters on HBase data using the API.
  • Imported and exported data from Teradata to HDFS and vice-versa.
  • Strong understanding of Hadoop eco system such as HDFS, MapReduce, HBase, Zookeeper, Pig, Hadoop streaming, Sqoop, Oozie and Hive
  • Implemented counters on HBase data to count total records in different tables.
  • Experienced in handling Avro data files by pushing the schema into HDFS using Avro tools and MapReduce (see the avro-tools sketch after this list).
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • Used Amazon Web Services to perform big data analytics.
  • Implemented secondary sorting to sort reducer output globally in MapReduce.
  • Implemented a data pipeline by chaining multiple mappers using ChainMapper.
  • Created Hive dynamic partitions to load time-series data.
  • Experienced in handling different types of joins in Hive such as map joins, bucket map joins and sorted bucket map joins.
  • Created tables, partitions, buckets and perform analytics using Hive ad-hoc queries.
  • Experienced in importing/exporting data into HDFS/Hive from relational databases and Teradata using Sqoop.
  • Handled continuous streaming data from different sources using Flume, with HDFS as the destination.
  • Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
  • Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
  • Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
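
A short, hypothetical sketch of the avro-tools handling referenced above; the jar version and file names are placeholders:

```bash
#!/bin/bash
# Hypothetical avro-tools usage; jar version and file names are placeholders.

# Extract the embedded schema from an Avro data file...
java -jar avro-tools-1.7.7.jar getschema part-00000.avro > user_events.avsc

# ...and publish it to HDFS so MapReduce jobs can load the schema at runtime.
hdfs dfs -mkdir -p /schemas/user_events
hdfs dfs -put -f user_events.avsc /schemas/user_events/

# Quick sanity check: dump a few records as JSON.
java -jar avro-tools-1.7.7.jar tojson part-00000.avro | head -n 5
```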

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, RDBMS/DB, flat files, Teradata, MySQL, CSV, Avro data files.
