
Hadoop Admin Resume


Cupertino, CA

PROFESSIONAL SUMMARY:

  • 15 years of IT experience in the analysis, design, development, implementation and testing of enterprise-wide applications, data warehouses, client-server technologies and web-based applications.
  • 10+ years of experience with Apache Hadoop components such as Confidential, MapReduce, Hive, HBase, Pig, Sqoop, Nagios, Spark, Impala, Oozie and Flume for Big Data and Big Data analytics.
  • Experienced in administrative tasks such as Hadoop installation in pseudo-distributed mode and on multi-node clusters, and installation of Apache Ambari on Hortonworks Data Platform (HDP 2.5).
  • Installation, configuration, support and management of Hortonworks Hadoop clusters.
  • In-depth understanding of Hadoop architecture and its components, such as Confidential, NameNode, JobTracker, DataNode, TaskTracker and MapReduce concepts.
  • Experience in installation, configuration, support and management of a Hadoop Cluster.
  • Experience in task automation using Oozie, cluster coordination through Pentaho and MapReduce job scheduling using the Fair Scheduler.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Experience in writing custom Confidential to extend Hive and Pig core functionality.
  • Experienced in managing and reviewing Hadoop log files.
  • Worked with Sqoop to move (import/export) data between relational databases and Hadoop, and used Flume to collect data and populate Hadoop (see the Sqoop sketch after this list).
  • Worked with HBase to perform quick lookups (updates, inserts and deletes) in Hadoop.
  • Experience in working with cloud infrastructure like Amazon Web Services (AWS) and Rackspace.
  • Worked on migrating Python MapReduce programs to Spark transformations.
  • Extensive experience with Spark Streaming (version 1.5.2) through the core Spark API, running Scala, Java and Python scripts to transform raw data from several data sources into baseline data.
  • Extensively worked on several ETL assignments to extract, transform and load data into tables as part of data warehouse development, with highly complex data models based on relational, star and snowflake schemas.
  • Highly proficient in extracting, transforming and loading data into target systems using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager and Workflow Monitor).
  • Extensive experience in loading data, troubleshooting and debugging mappings, and performance tuning of Informatica objects (sources, targets, mappings and sessions), fine-tuning transformations for better session performance.
  • Administration of Hadoop and Vertica clusters for structured and unstructured data warehousing
  • Administration of HBase, Hive, Ganglia, Sqoop, Confidential and MapReduce.
  • Good working knowledge of Vertica DB architecture, column orientation, and High Availability.
  • Strong command of Informatica PowerCenter, Oracle, Vertica, Hive, SQL Server, shell scripting and QlikView.
  • Experienced in all phases of Software Development Life Cycle ( SDLC ).
  • Experience in Data Modeling, Data Extraction, Data Migration, Data Integration, Data Testing and Data Warehousing using Ab Initio.
  • Configured Informatica environment to connect to different databases using DB config, Input Table, Output Table, Update table Components.
  • Performed systems analysis for several information systems documenting and identifying performance and administrative bottlenecks.
  • Good understanding of Big Data and experience in developing predictive analytics applications using open source technologies.
  • Good understanding and extensive work experience on SQL and PL/SQL.
  • Experience in storing and managing data using the HCatalog data model.
  • Experience in writing SQL queries to perform joins across Hive tables and NoSQL databases.
  • Excellent communication skills, interacting with people at all levels across projects and playing an active role in business analysis.
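
Illustrative sketch of the Sqoop ingestion work described above (not taken from any specific project below): a minimal import of one relational table into Hive. The JDBC URL, credentials, schema/table and Hive database names are placeholders.

    # Import one Oracle table into a Hive table with 4 parallel mappers.
    # Host, credentials and table names are hypothetical; assumes the table
    # has a primary key Sqoop can use for splitting.
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table SALES.TRANSACTIONS \
      --hive-import \
      --hive-database staging \
      --hive-table transactions \
      --num-mappers 4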

TECHNICAL SKILLS:

Operating Systems: UNIX/Linux (Red Hat 3/4/5/6, Ubuntu), Windows XP/Vista/7/10

Database: MySQL

Computer Skills: Email software (Outlook, Thunderbird, etc.), peripheral devices (scanners, printers, etc.), personal computers, presentation software (PowerPoint, Flash, etc.), spreadsheet software (Calc, Excel, etc.), word processing software (Word, WordPerfect, etc.) and Adobe Photoshop.

Big Data Technologies: Confidential, Hive, Map Reduce, Pig, Sqoop, Oozie, Zookeeper, YARN, Avro, Spark

BI Reporting Tools: Tableau, Crystal Reports and Power Pivot

Tools: Quality Center v11.0/ALM, HP QTP, HP UFT, Selenium, TestNG, JUnit

Programming Languages: Java, C++, Confidential, SQL, PL/SQL

QA methodologies: Waterfall, Agile, V-model.

Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP

Java Frameworks: MVC, Apache Struts 2.0, Spring and Hibernate

NoSQL: Cassandra; RDBMS: Oracle 9i, Oracle 10g, MS Access, MS SQL Server, IBM DB2, PL/SQL.

Operating Systems: Linux, UNIX, Mac, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8

TOOLS AND TECHNIQUES:

Big Data Ecosystem: Cloudera, Hortonworks, MapR Hadoop, Confidential, HBase, ZooKeeper, Nagios, Hive, Pig, Ambari, Spark, Impala

Data Modeling: Star-Schema Modeling, Snowflakes Modeling, Erwin 4.0, Visio

RDBMS: Oracle … 13.0, Teradata V2R6, Teradata 4.6.2, DB2, MS SQL Server 2000

Programming: UNIX Shell Scripting, C++, Java, Korn Shell, SQL*Plus, PL/SQL, HTML

WORK EXPERIENCE:

Hadoop Admin

Confidential, Cupertino, CA

Responsibilities:

  • Worked on importing and exporting data between Oracle/DB2 and Confidential/Hive using Sqoop.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slot configuration.
  • Installed and managed a 50+ node Hadoop production cluster with 10 PB of storage on the HDP distribution, using Ambari 1.7 and HDP 2.1.3.
  • Upgraded the production cluster from Ambari 1.7 to 2.1 and from HDP 2.1 to 2.2.6.
  • Designed and developed ETL jobs using HP Vertica, DataStage, Teradata and Linux shell scripting.
  • Provided high-level recommendations and implemented ETL standards.
  • Developed Spark scripts using Scala and Python shell commands as per requirements.
  • Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in Confidential.
  • Involved in migrating ETL processes from Oracle to Hive to simplify data manipulation.
  • Managed log files, backups and capacity.
  • Found and troubleshot Hadoop errors
  • Created Ambari Views for Tez, Hive and Confidential .
  • Architected and designed a 30-node Hadoop innovation cluster with Sqrrl, Spark, Puppet and HDP 2.2.4.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals and testing access to Confidential and Hive.
  • Managed a 350+ node HDP 2.2.4 cluster with 4 petabytes of data using Ambari 2.0 and CentOS 6.5.
  • Maintained and administered Confidential through the Hadoop Java API, shell scripting and Python.
  • Exported data to Teradata using Sqoop; data was stored in Vertica database tables, and Spark was used to load the data from the Vertica tables.
  • Worked with 50+ source systems and received batch files from heterogeneous systems such as UNIX, Windows, Oracle, Teradata, mainframe and DB2.
  • Migrated 1000+ tables from Teradata to HP Vertica.
  • Loaded the data from Vertica to Hive using Sqoop.
  • Created Hive tables to store the processed results in a tabular format. Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning and system monitoring (see the monitoring sketch after this list).
  • Upgraded the Hadoop cluster from CDH4.7 to CDH5.2.
  • Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
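
A minimal sketch of the kind of Bash monitoring script mentioned above, assuming a standard HDFS client on the node; the /data* mount pattern is a placeholder.

    #!/usr/bin/env bash
    # Daily capacity/health check sketch: HDFS usage summary, under-replicated
    # blocks, and local disk usage on this node.

    echo "=== HDFS capacity summary ==="
    hdfs dfsadmin -report | grep -E 'Configured Capacity|DFS Used|DFS Remaining|datanodes'

    echo "=== Under-replicated blocks ==="
    hdfs fsck / 2>/dev/null | grep -i 'under-replicated'

    echo "=== Local disk usage ==="
    df -h /data* 2>/dev/null || df -h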

Environment: Hive, Pig, HBase, Apache NiFi, PL/SQL, Java, UNIX shell scripting, Sqoop, ETL, Python, Ambari 2.0, Linux CentOS, MongoDB, Cassandra, Ganglia and Cloudera Manager.

Hadoop Admin

Confidential, Irvine, CA

Responsibilities:

  • Analyzed Hadoop cluster and other big data analysis tools including Pig
  • Implemented multiple nodes on CDH3 Hadoop cluster on Red hat Linux
  • Built a scalable distributed data solution
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Imported data from Linux file system to Confidential
  • Transmitted data from SQL to HBase using Sqoop
  • Worked with a team to successfully tune Pig query performance.
  • Excelled in managing and reviewing Hadoop log files.
  • Worked on evaluating, architecting and installing the Hortonworks 2.1/1.8 Big Data ecosystem, which includes Apache Hadoop Confidential, Pig, Hive and Sqoop.
  • Used Apache Spark API over Hortonworks Hadoop YARN cluster to perform analytics on data in Hive.
  • Configured Oozie to run multiple Hive and Pig jobs.
  • Used data integration tools like Flume and Sqoop
  • Set up automated processes to analyze the system and find errors.
  • Supported IT department in cluster hardware upgrades
  • Contributed to building hands-on tutorials for the community to learn how to use Hortonworks Data Platform (powered by Hadoop) and Hortonworks Dataflow (powered by NiFi) covering categories such as Hello World, Real-World use cases, Operations.
  • Installing, Upgrading and Managing Hadoop Cluster on Hortonworks
  • Setup, configured, and managed security for the Cloudera Hadoop cluster.
  • Loaded log data into Confidential using Flume
  • Created a multi-cluster test to evaluate system performance and failover.
  • Created ETL test data for all ETL mapping rules to test the functionality of the Informatica Mapping.
  • Tested the ETL Informatica mappings and other ETL Processes (Data Warehouse Testing).
  • Developed Scala and Python scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark.
  • Worked on migrating Python MapReduce programs to Spark transformations.
  • Built a scalable Hadoop cluster for data solution
  • Used Python to write scripts that move data across clusters.
  • Expertise in designing Python scripts to interact with middleware/back-end services.
  • Worked on Python scripts to analyze customer data.
  • Helped design scalable Big Data clusters and solutions.
  • Commissioned and decommissioned nodes as needed (see the decommissioning sketch after this list).
  • Extensively used mapping parameters and variables, post-SQL, pre-SQL, SQL overrides and lookup overrides in Informatica objects.
  • Added build notes for Salesforce deployments and updated the deployment document for TIBCO and the ISS task template for Informatica code movement.
  • Used mapping parameters and variables to parameterize connections in Workflow Manager.
  • Tuned the performance of Informatica objects to load faster.
  • Created UNIX scripts to handle data quality issues and to invoke the Informatica workflows.
  • Experience in creating and performance-tuning Vertica and Hive scripts.
  • Strong command of Informatica PowerCenter, Oracle, Vertica, Hive, SQL Server, shell scripting and QlikView. Designed and implemented Azure cloud infrastructure using ARM templates.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Designed the Data Warehousing ETL procedures for extracting the data from all source systems to the target system.
  • Worked with developers to set up a full Hadoop system on AWS.
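
A minimal sketch of the DataNode decommissioning flow referenced above, assuming dfs.hosts.exclude already points at the exclude file; the hostname and file path are placeholders.

    # Add the node to the HDFS exclude list (path is hypothetical).
    echo "worker42.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode to re-read its include/exclude lists; the node moves to
    # "Decommission in progress" while its blocks are re-replicated elsewhere.
    hdfs dfsadmin -refreshNodes

    # Watch until the node reports "Decommissioned", then it can be stopped.
    hdfs dfsadmin -report | grep -A 2 worker42.example.com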

Environment: Confidential, CDH3, CDH4, HBase, Python, RHEL 4/5/6, Hive, Pig, Perl scripting, AWS S3, EC2, Hadoop, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux

Hadoop Admin

Confidential, Newark, CA

Responsibilities:

  • Installed, configured and administered all UNIX/Linux servers, including the design and selection of hardware to support installations/upgrades of Red Hat (5/6), CentOS 5/6 and Ubuntu operating systems.
  • Network traffic control, IPsec, QoS, VLAN, proxy and RADIUS integration on Cisco hardware via Red Hat Linux software.
  • Developed Sqoop scripts to move data between Hive and the Vertica database.
  • Used Teradata load utilities such as MultiLoad and FastLoad, as well as FastExport.
  • Wrote JCL, PARM and PROC for new processes and wrote Teradata BTEQ in JCL.
  • Experience in creating and performance-tuning Vertica and Hive scripts.
  • Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
  • Provisioned, built and supported physical and virtual Linux servers using VMware for production, QA and development environments.
  • Troubleshooting; managing and reviewing data backups and Hadoop log files.
  • Deployed a data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
  • Hands-on experience installing and configuring Cloudera, MapR and Hortonworks clusters and using Hadoop ecosystem components such as Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume and ZooKeeper.
  • Created a customized BI tool for the management team that performs query analytics using HiveQL.
  • The project also integrates other applications with BI-DARTT.
  • Worked on importing and exporting data between Oracle/DB2 and Confidential/Hive using Sqoop.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slot configuration.
  • Installed, configured and administered a small Hadoop cluster consisting of 10 nodes. Monitored the cluster for performance, networking and data integrity issues.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Expertise with the Hortonworks Hadoop platform (Confidential, Hive, Oozie, Sqoop, YARN).
  • Responsible for reviewing all open tickets and resolving and closing existing tickets.
  • Performing Linux systems administration on production and development servers (Red Hat Linux, CentOS and other UNIX utilities).
  • Participated in meetings with the client to understand business requirements in detail.
  • Designed and implemented ETL frameworks and concepts.
  • Installation, configuration, upgrade and administration of Sun Solaris and Red Hat Linux.
  • Automated administration tasks through scripting and job scheduling using cron (see the cron sketch after this list).
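
A minimal sketch of cron-based task automation as referenced in the last bullet; the script paths, log locations and schedules are placeholders.

    # Example crontab entries (edit with: crontab -e)
    # Nightly HDFS usage report at 01:00
    0 1 * * * /opt/admin/hdfs_usage_report.sh >> /var/log/admin/hdfs_usage.log 2>&1
    # Weekly cleanup of aged HDFS audit logs, Sundays at 03:30
    30 3 * * 0 find /var/log/hadoop/hdfs -name 'hdfs-audit.log.*' -mtime +30 -delete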

Hadoop Admin

Confidential, Naperville, IL

Responsibilities:

  • Tested raw data and executed performance scripts
  • Shared responsibility for administration of Hadoop, Hive and Pig
  • Aided in developing Pig scripts to report data for the analysis
  • Moved data between Confidential and RDBMS systems using Sqoop (see the export sketch after this list).
  • Analyzed MapReduce jobs for data coordination
  • Setup, configured, and managed security for the Cloudera Hadoop cluster.
  • Found and troubleshot Hadoop errors
  • Experience in installing Hadoop clusters using different distributions (Apache Hadoop, Cloudera, Hortonworks). Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Deployment and management of multi-node HDP (Hortonworks) clusters.
  • Installed, upgraded, and patched ecosystem products using Cloudera Manager.
  • Balanced and tuned Confidential, Hive, Impala, MapReduce, and Oozie workflows.
  • Extensive experience designing, developing, documenting and testing ETL jobs and mappings (server and parallel jobs) using DataStage to populate tables in data warehouses and data marts.
  • Developed monitoring and notification tools using Python.
  • Wrote Python routines to log into the websites and fetch data for selected options.
  • Used Collections in Python for manipulating and looping through different user defined objects.
  • Diligently teaming with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability.
  • Configured Fair scheduler to share the resources of the cluster.
  • Experience designing data queries against data in the Confidential environment using tools such as Apache Hive.
  • Responsible for reconciliation between Hive and Informatica using QlikView.
  • Coordinated requests for data restores from the tape library to online metadata. Used Teradata SQL Assistant to run SQL queries within Teradata.
  • Experience with the Informatica PowerCenter tool for data extraction, transformation and loading. Responsible for troubleshooting MapReduce job execution issues by inspecting and reviewing log files.
  • Used Oozie scripts for application deployment and Perforce as the version control system.
  • Performed unit testing, system testing and integration testing.
  • Provided input to the documentation team.
  • Scripting Hadoop package installation and configuration to support fully-automated deployments.
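
A minimal sketch of moving data from Confidential back out to an RDBMS with Sqoop, as referenced above; the JDBC URL, credentials, target table, export directory and field delimiter are placeholders.

    # Export a processed HDFS directory into a MySQL reporting table.
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/reporting \
      --username report_user -P \
      --table daily_aggregates \
      --export-dir /user/hive/warehouse/daily_aggregates \
      --input-fields-terminated-by '\t' \
      --num-mappers 4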

Environment: Linux, Map Reduce, Confidential, Hive, Python, Pig, Shell Scripting

Hadoop Admin

Confidential, Piscataway, NJ

Responsibilities:

  • Involved in installing, configuring and using Hadoop Ecosystems (Hortonworks).
  • Involved in Importing and exporting data into Confidential and Hive using Sqoop.
  • Experienced in managing and reviewing Hadoop log files.
  • This project involves file transmission and electronic data interchange (EDI) trade capture, verification, processing and routing operations, banking report generation and operational management.
  • Developed and modified Oracle packages, procedures, functions and triggers per business requirements.
  • Wrote and tested Python scripts to create new data files for Linux server configuration using a Python template tool.
  • Used AWS remote computing services such as S3, EC2.
  • Involved in upgrading Hadoop Cluster from HDP 1.3 to HDP 2.0.
  • Involved in loading data from the UNIX file system into Confidential (see the sketch after this list).
  • Shared responsibility for administration of Hadoop, Hive and Pig.
  • Experience in DW concepts and technologies using the Vertica platform.
  • Administration of Hadoop and Vertica clusters for structured and unstructured data warehousing.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
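
A minimal sketch of loading files from the UNIX file system into Confidential, as referenced above; the local and HDFS paths are placeholders.

    # Stage local feed files into HDFS and verify the copy.
    hdfs dfs -mkdir -p /data/incoming/trades
    hdfs dfs -put /var/feeds/trades/*.csv /data/incoming/trades/
    hdfs dfs -ls -h /data/incoming/trades | tail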

Environment: Confidential, Hive, Sqoop, ZooKeeper, HBase, UNIX/Linux, Java, Python, MapReduce, Pig, Flume, Kafka, Shell Scripting.

Hadoop Admin

Confidential, Boston, MA

Responsibilities:

  • Engineer on the Big Data team; worked with Hadoop and its ecosystem.
  • Used Flume to collect, aggregate and store the web log data onto Confidential (see the Flume sketch after this list).
  • Wrote Pig scripts to run ETL jobs on the data in Confidential .
  • Used Hive to do analysis on the data and identify different correlations.
  • Knowledge of installing and configuring Cloudera Hadoop in single-node and clustered environments.
  • Worked on setting up of environment and re-configuration activities.
  • Developed and maintained HiveQL and Pig scripts.
  • Participated in meetings with the client to understand business requirements in detail.
  • Designed and implemented ETL frameworks and concepts.
  • Facilitating testing in different dimensions
  • This project involves file transmission and electronic data interchange (EDI) trade capture, verification, processing and routing operations, banking report generation and operational management.
  • Developed and modified Oracle packages, procedures, functions and triggers per business requirements.
  • DBMS development included building data migration scripts using Oracle SQL*Loader.
  • Used crontab to automate scripts.
  • Wrote and modified stored procedures to load and modify data according to business rule changes.
  • Worked on production support environment.
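
A minimal sketch of a Flume agent that tails a web access log into Confidential, as referenced above; the agent name, log path, NameNode URL and target directory are placeholders.

    # /etc/flume/conf/weblog.conf -- one source, one channel, one sink
    weblog.sources  = r1
    weblog.channels = c1
    weblog.sinks    = k1
    weblog.sources.r1.type = exec
    weblog.sources.r1.command = tail -F /var/log/httpd/access_log
    weblog.sources.r1.channels = c1
    weblog.channels.c1.type = memory
    weblog.channels.c1.capacity = 10000
    weblog.sinks.k1.type = hdfs
    weblog.sinks.k1.channel = c1
    weblog.sinks.k1.hdfs.path = hdfs://namenode:8020/data/weblogs/%Y-%m-%d
    weblog.sinks.k1.hdfs.fileType = DataStream
    weblog.sinks.k1.hdfs.useLocalTimeStamp = true

    # Start the agent against that config:
    flume-ng agent --conf /etc/flume/conf \
      --conf-file /etc/flume/conf/weblog.conf --name weblog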

Environment: Apache Hadoop, Pig, Hive, Sqoop, Flume, Python, Java/J2EE, Oracle 11g, JBoss 5.1.0 Application Server, Linux OS, Windows OS, etc.

Hadoop Admin

Confidential, Edison, NJ

Responsibilities:

  • Installed and configured Hadoop and Ecosystem components in Cloudera and Hortonworks environments.
  • Installed and configured Hadoop, Hive and Pig on Amazon EC2 servers
  • Upgraded the cluster from CDH4 to CDH5; the tasks were first performed on the staging platform before being applied to the production cluster.
  • Enabled Kerberos and AD security on the Cloudera cluster running CDH 5.4.4. Implemented Sentry for the Dev Cluster
  • Configured MySQL Database to store Hive metadata.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data (see the streaming sketch after this list).
  • Worked with Linux systems and MySQL database on a regular basis.
  • Supported MapReduce programs that ran on the cluster.
  • Involved in loading data from UNIX file system to Confidential .
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • As an admin, followed standard backup policies to ensure high availability of the cluster.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action. Documented system processes and procedures for future reference.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2 cluster.
  • Managed backups for key data stores
  • Supported configuring, sizing, tuning and monitoring analytic clusters
  • Implemented security and regulatory compliance measures. Streamlined cluster scaling and configuration.
  • Monitored cluster job performance and was involved in capacity planning.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Documented technical designs and procedures
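
A minimal sketch of a Hadoop streaming job as referenced above, using plain shell commands as the mapper and reducer; the streaming jar location and HDFS paths vary by distribution and are placeholders.

    # Smoke-test streaming job following the standard docs pattern: /bin/cat
    # as the mapper and /usr/bin/wc as the reducer (counts lines/words/bytes).
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
      -D mapreduce.job.reduces=2 \
      -input /data/text \
      -output /user/admin/streaming_out \
      -mapper /bin/cat \
      -reducer /usr/bin/wc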

Environment: Confidential, Hive, Pig, sentry, Kerberos, LDAP, YARN, Cloudera Manager, and Ambari

Linux/ Hadoop Admin

Confidential, Los Angeles, CA

Responsibilities:

  • Working on multiple projects spanning from Architecting, Installation, Configuration and Management of Hadoop Clusters.
  • Implemented authentication and authorization service using Kerberos authentication protocol.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (like Map Reduce, Pig, Hive, Sqoop) as well as system specific jobs.
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into Confidential for analysis.
  • Migrated data from SQL Server to HBase using Sqoop (see the sketch after this list).
  • Log data stored in HBase was processed and analyzed, then imported into the Hive warehouse, enabling business analysts to write HQL queries.
  • Integrated Kafka with Flume in a sandbox environment using a Kafka source and Kafka sink.
  • Analyzed alternatives for Confidential data stores and produced extensive documentation comparing HBase and Accumulo.
  • Involved in the setup, installation and configuration of OBIEE 11g on Linux and its integration with the existing environment; troubleshot errors encountered and worked with Oracle Support to analyze issues.
  • Maintained multiple application servers on the latest versions of Red Hat Enterprise Linux 4.x-7.x.
  • Involved in migrating applications from Solaris to Linux (Red Hat Enterprise Linux).
  • Developed various workflows using custom MapReduce, Pig, Hive and scheduled them using Oozie.
  • Responsible for Installing, setup and Configuring Apache Kafka and Apache Zookeeper.
  • Extensive knowledge in troubleshooting code related issues.
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR testing library.
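
A minimal sketch of the SQL Server to HBase migration referenced above; the JDBC URL, credentials, table, column family and row key are placeholders.

    # Import a SQL Server table straight into an HBase table via Sqoop.
    sqoop import \
      --connect 'jdbc:sqlserver://dbhost:1433;databaseName=crm' \
      --username etl_user -P \
      --table Customers \
      --hbase-table customers \
      --column-family cf \
      --hbase-row-key customer_id \
      --hbase-create-table \
      --num-mappers 4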

Environment: Hadoop, Confidential, MapReduce, shell scripting, Spark, Splunk, Solr, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, cluster health monitoring, security, Red Hat Linux, Cloudera Manager.

Hadoop Administrator

Confidential, Minneapolis, MN

Responsibilities:

  • Monitored workload job performance and capacity planning using Cloudera Manager.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action.
  • Imported logs from web servers with Flume to ingest data into Confidential .
  • Retrieved data from Confidential into relational databases with Sqoop. Parsed, cleansed and mined useful and meaningful data in Confidential using MapReduce for further analysis.
  • Fine-tuned Hive jobs for optimized performance. Installed, configured and deployed Hadoop clusters for development, production and testing.
  • As an admin, followed standard backup policies to ensure high availability of the cluster.
  • Experience in benchmarking and in backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster (see the backup sketch after this list).
  • Partitioned and queried the data in Hive for further analysis by the BI team.
  • Involved in extracting the data from various sources into Hadoop Confidential for processing.
  • Effectively used Sqoop to transfer data between databases and Confidential . Worked on streaming data into Confidential from web servers using Flume.
  • Implemented custom interceptors for flume to filter data and defined channel selectors to multiplex the data into different sinks.
  • Involved in MapReduce programs to cleanse data in Confidential obtained from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
  • Used Hive data warehouse tool to analyze the unified historic data in Confidential to identify issues and behavioral patterns.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java Map-Reduce, Hive and Sqoop as well as system specific jobs.
  • Experienced with different kind of compression techniques like LZO, GZIP and Snappy.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Provisioned, built and supported physical and virtual Linux servers using VMware for production, QA and development environments.
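
A minimal sketch of a NameNode metadata backup as referenced above, assuming an HDFS admin user; the backup path is a placeholder.

    # Checkpoint the namespace (requires safe mode), then pull the latest
    # fsimage from the NameNode to a local backup directory.
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
    hdfs dfsadmin -safemode leave

    mkdir -p /backup/namenode/$(date +%F)
    hdfs dfsadmin -fetchImage /backup/namenode/$(date +%F)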

Environment: Cloudera Manager, Hadoop, Confidential, MapReduce, HBase, Hive, Pig, Flume, Oozie, Sqoop, shell scripting, Hue, MySQL, MongoDB, AWS EC2, ETL, Bash, ServiceNow, JIRA.

Hadoop Administrator

Confidential, Albany, NY

Responsibilities:

  • Currently working as a Hadoop administrator on the Hortonworks distribution for 3 clusters, ranging from POC to PROD clusters.
  • Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
  • Monitoring systems and services; architecture design and implementation of Hadoop deployments, configuration management, backup, and disaster recovery systems and procedures.
  • Working knowledge of Hive and its components, and of troubleshooting Hive issues.
  • Responsible for cluster availability and experienced in on-call support.
  • Experienced in setting up projects and volumes for new Hadoop projects.
  • Involved in snapshots and Confidential data backups to maintain copies of cluster data locally and remotely (see the snapshot sketch after this list).
  • Implemented SFTP for projects to transfer data from external servers to Hadoop servers.
  • Installation of various Hadoop Ecosystems and Hadoop Daemons.
  • Experienced in managing and reviewing Hadoop log files.
  • Good knowledge of creating and maintaining MySQL databases, setting up users and maintaining database backups.
  • Extensive knowledge of Confidential and YARN implementation in the cluster for better performance.
  • Experienced in production support, resolving user incidents ranging from sev1 to sev5.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action. Documented system processes and procedures for future reference.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Monitored multiple Hadoop cluster environments using Nagios. Monitored workload and job performance.
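
A minimal sketch of the snapshot and remote backup flow referenced above; the project path and remote cluster URL are placeholders.

    # Allow snapshots on a project directory and take a dated snapshot.
    hdfs dfsadmin -allowSnapshot /projects/finance
    hdfs dfs -createSnapshot /projects/finance snap-$(date +%F)
    hdfs lsSnapshottableDir

    # Copy the read-only snapshot to a remote cluster for off-site backup.
    hadoop distcp \
      /projects/finance/.snapshot/snap-$(date +%F) \
      hdfs://drcluster-nn:8020/backups/finance/$(date +%F)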

Environment: Hortonworks 2.1.7, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, MySQL, Shell Scripting, Red Hat Linux

Linux Admin

Confidential

Responsibilities:

  • The project plan was to build and set up a Big Data environment and support operations; effectively managed and monitored the Hadoop cluster (152 nodes) through Cloudera Manager.
  • Implement and test integration of BI (Business Intelligence) tools with Hadoop stack.
  • Monitor and manage 152 nodes Hadoop Cluster in production with 24x7 on-call support.
  • Automated scripts to onboard new users to Hadoop applications and set up Sentry authorization.
  • Completed Talend and Pentaho training and installed the tools for POC in R&D line.
  • Installed and configured Confluent Kafka in R&D line. Validated the installation with Confidential connector and Hive connectors.
  • Installed and configured data science tools such as R, H2O and Dataiku for analytics.
  • Part of a POC for the Oracle GoldenGate Big Data connector to Hadoop.
  • Worked on importing and exporting data between Oracle/DB2 and Confidential/Hive using Sqoop.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slot configuration.
  • Installed, configured and administered a small Hadoop cluster consisting of 10 nodes. Monitored the cluster for performance, networking and data integrity issues.
  • Wrote automation scripts/Python code for better Hadoop monitoring and maintenance.
  • Proactively addressed Big Data application performance issues for business-critical applications such as Npevent (Network Platform events).
  • Configure Cloudera Manager Triggers and Nagios plugins to automate alerts for critical Hadoop services.
  • Configured YARN dynamic resource management and Fair Scheduler allocation for better resource management and allocation in Hadoop.
  • Added 76 new nodes to the existing Hadoop cluster and rebalanced the cluster to distribute data across nodes (see the balancer sketch below).
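
A minimal sketch of rebalancing HDFS after adding the new nodes, as referenced above; the bandwidth and threshold values are placeholders to be tuned per cluster.

    # Allow each DataNode to use up to ~100 MB/s for balancing traffic, then
    # move blocks until every node is within 10% of the cluster average.
    hdfs dfsadmin -setBalancerBandwidth 104857600
    hdfs balancer -threshold 10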

Environment: Hadoop 1.x, Hive, Pig, HBase, Sqoop, Flume, Python, ZooKeeper, Confidential, Ambari.

Linux Admin

Confidential

Responsibilities:

  • Installing and updating packages using YUM.
  • Installing and maintaining the Linux servers.
  • Created volume groups, logical volumes and partitions on the Linux servers, and created and mounted file systems (see the LVM sketch after this list).
  • Deep understanding of monitoring and troubleshooting mission critical Linux machines.
  • Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
  • Used Oozie scripts for application deployment and Perforce as the version control system.
  • Performed unit testing, system testing and integration testing.
  • Provided input to the documentation team.
  • Hands-on experience working with Spark in both Scala and Python. Performed various actions and transformations on Spark RDDs and DataFrames.
  • Test Driven Development (TDD) process is used in developing the application.
  • Developed the user interface using the Struts MVC framework. Implemented JSPs using Struts tag libraries and developed action classes.
  • Monitored the cluster for performance, networking and data integrity issues.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files
  • Involved in implementing different J2EE design patterns like Session Facade, Message Facade, and Service Locator.
  • Developed Hibernate configuration files and Java persistence classes for database mapping.
  • Ensured data recovery by implementing system and application level backups.
  • Monitoring System Metrics and logs for any problems.
  • Adding, removing, or updating user account information, resetting passwords, etc.
  • Analyze existing automation scripts and tools for any missing efficiencies or gaps.
  • Supported internal and external teams on information security initiatives.
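
A minimal sketch of the LVM work referenced earlier in this list; the device, volume group, logical volume, size and mount point are placeholders.

    # Carve a new data file system out of a fresh disk with LVM and mount it.
    pvcreate /dev/sdb
    vgcreate vg_data /dev/sdb
    lvcreate -n lv_data -L 200G vg_data
    mkfs.ext4 /dev/vg_data/lv_data
    mkdir -p /data01
    mount /dev/vg_data/lv_data /data01
    echo '/dev/vg_data/lv_data /data01 ext4 defaults,noatime 0 0' >> /etc/fstab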

Environment: Java, J2EE, Struts 1.2, JSP, Hibernate 3.0, Spring 2.0, Servlets, JMS, XML, Python, SOAP, JDBC, ANT, HTML, JavaScript, CSS, UML, JAXP, CVS, Log 4J, JUnit, WebLogic 10.3, Eclipse 3.4.1, Oracle 10g.
