
Hadoop Engineer Resume


Santa Clara, CA

SUMMARY:

  • 8 years of experience in the design, development and implementation of robust technology systems, specializing in Hadoop Administration (Cloudera and Hortonworks distributions), Big Data, and Linux Administration.
  • Experience in Hadoop Development and Ecosystem Analytics, and in the development and design of Java-based enterprise applications.
  • Experience in using various Hadoop ecosystem components such as MapReduce, Pig, Hive, Zookeeper, HBase, Sqoop, YARN, Spark, Kafka, Oozie, and Flume for data storage and analysis.
  • Expertise in commissioning and decommissioning nodes in a Hadoop cluster (a minimal command sketch follows this summary).
  • Collected and aggregated large amounts of log data using Apache Flume and stored it in HDFS for further analysis; used job/workflow scheduling and monitoring tools like Oozie.
  • Experience in designing both time-driven and data-driven automated workflows using Oozie.
  • Worked in complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) using Agile Methodologies.
  • Hands-on experience in developing and deploying enterprise applications using major components in the Hadoop ecosystem such as Hadoop 2.x, YARN, Hive, Pig, MapReduce, Spark, Impala, Kafka, Storm, Oozie, HBase, Flume, Sqoop and Zookeeper.
  • Installed and configured various Hadoop distributions like CDH - 5.7 and HDP 2.2 and higher versions.
  • Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
  • Worked independently with Cloudera support and Hortonworks support for any issue/concerns with Hadoop cluster.
  • Extensive experience in installing, configuring and administrating Hadoop cluster for major Hadoop distributions like CDH5 and HDP.
  • Strong experience in designing and developing Business Intelligence solutions in Data Warehouse/Decision Support Systems using ETL tools such as Informatica PowerCenter 8.x and 9.x.
  • Working knowledge of a variety of relational DBMS products, with experience in designing and programming for relational databases, including Oracle, SQL Server, Teradata and DB2.
  • Experience in Sentry, Ranger, Knox configuration to provide the security for Hadoop components.
  • Good experience in designing, configuring and managing backup and disaster recovery for Hadoop data.
  • Experience in setting up of Hadoop cluster in cloud services like AWS and Azure.
  • Knowledge of AWS services such as EC2, S3, Glacier, IAM, EBS, SNS, SQS, RDS, VPC, Load Balancers, Auto Scaling, CloudFormation, CloudFront and CloudWatch.
  • Experience in Linux System Administration, Linux System Security, Project Management and Risk Management in Information Systems.
  • Involved in the functional usage and deployment of applications to Oracle WebLogic, JBOSS, Apache Tomcat, Nginx and WebSphere servers.
  • Experience working with VMware Workstation and VirtualBox.
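
A minimal sketch of the node decommissioning workflow mentioned above, assuming the exclude files below are the ones referenced by dfs.hosts.exclude and yarn.resourcemanager.nodes.exclude-path in the cluster configuration; the hostname and file paths are illustrative:

    # Add the host to the HDFS and YARN exclude files (paths are assumptions)
    echo "worker42.example.com" >> /etc/hadoop/conf/dfs.exclude
    echo "worker42.example.com" >> /etc/hadoop/conf/yarn.exclude
    # Ask the NameNode and ResourceManager to re-read the exclude files
    hdfs dfsadmin -refreshNodes
    yarn rmadmin -refreshNodes
    # Wait until the node reports "Decommissioned" before taking it offline
    hdfs dfsadmin -report | grep -A 3 "worker42.example.com"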

PROFESSIONAL EXPERIENCE

Confidential, SANTA CLARA-CA

HADOOP ENGINEER

Responsibilities:

  • Worked as a Hadoop Engineer on a 180-node Hadoop cluster.
  • Performed Requirement Analysis, Planning, Architecture Design and Installation of the Hadoop cluster
  • Experience in Upgrades and Patches and Installation of Ecosystem Products through Ambari.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Spark, Kafka, Impala, Zookeeper, Hue and Sqoop using both Cloudera and Hortonworks.
  • Automated the configuration management for several servers using Chef and Puppet.
  • Monitored job performances, file system/disk-space management, cluster & database connectivity, log files, management of backup/security and troubleshooting various user issues.
  • Responsible for day-to-day activities which include HDFS support and maintenance, Cluster maintenance, creation/removal of nodes, Cluster Monitoring/Troubleshooting, Manage and review Hadoop log files, Backup restoring and capacity planning.
  • Provided guidance to users on re-writing their queries to improve performance and reduce cluster usage.
  • Provided regular user and application support for highly complex issues involving multiple components such as Hive, Impala, Spark, Kafka, MapReduce.
  • Design and deployment of clustered HPC monitoring systems, including a dedicated monitoring cluster.
  • Develop and document best practices, HDFS support and maintenance, Setting up new Hadoop users.
  • Real time streaming the data using Spark with Kafka.
  • Implementing Hadoop Security on Hortonworks Cluster using Kerberos and Two-way SSL
  • Included DBA Responsibilities like data modeling, design and implementation, software installation and configuration, database backup and recovery, database connectivity and security.
  • Built data platforms, pipelines and storage systems using Apache Kafka, Apache Storm and search technologies such as Elasticsearch.
  • Created Kafka topics, provided ACLs to users, and set up a REST mirror and MirrorMaker to transfer data between two Kafka clusters (see the CLI sketch after this list).
  • Installed GridGain to cache data and serve reads from the cache instead of from Accumulo tables.
  • Implemented concepts of Hadoop eco system such as YARN, MapReduce, HDFS, HBase, Zookeeper, Pig and Hive.
  • Produced the high-level and low-level design for the data transformation process using Pig scripts and UDFs.
  • Written test scripts for test driven development and continuous integration.
  • Migrated data across clusters using DistCp (included in the sketch after this list).
  • In charge of installing, administering, and supporting Windows and Linux operating systems in an enterprise environment.
  • Involved in installing and configuring Ranger for the authentication of users and Hadoop daemons.
  • Experience in methodologies such as Agile, Scrum, and Test driven development.
  • Worked with cloud services like Amazon Web Services (AWS); involved in ETL, data integration, data warehousing, migration, and Kafka installation.
  • Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
  • Responsible for ETL technical design discussions and prepared ETL high level technical design document.
  • Translate requirements and high-level design into detailed functional design specifications.
  • Involved in data validation, data integrity, database performance, field size validations, check constraints and data manipulation.
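
Hedged sketches of two of the items above: Kafka topic/ACL administration with the stock Kafka CLI tools, and a cross-cluster DistCp copy. The ZooKeeper address, principals, topic name, NameNode addresses and paths are all illustrative assumptions:

    # Create a topic (ZooKeeper-based tooling, as used on Kafka releases of this era)
    kafka-topics.sh --zookeeper zk1:2181 --create --topic app.events \
      --partitions 6 --replication-factor 3
    # Grant a producing application write access and a consumer group read access
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 --add \
      --allow-principal User:ingest --producer --topic app.events
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 --add \
      --allow-principal User:analytics --consumer --topic app.events --group analytics-consumers
    # MirrorMaker keeps a second cluster in sync with the source cluster
    kafka-mirror-maker.sh --consumer.config source-cluster.properties \
      --producer.config target-cluster.properties --whitelist "app.*"

    # DistCp: copy only changed files between clusters, preserving attributes, with 20 mappers
    hadoop distcp -update -p -m 20 hdfs://nn-prod:8020/data/events hdfs://nn-dr:8020/data/events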

ENVIRONMENT: Hadoop, MapReduce, Cassandra, HDFS, Hortonworks Cluster, Pig, GIT, Jenkins, Kafka, Puppet, Ansible, Maven, Spark, YARN, HBase, Oozie, MapR, NoSQL, ETL, MySQL, Agile, Windows, UNIX Shell Scripting

Confidential, SAN MATEO-CA

HADOOP ADMIN

Responsibilities:

  • Installed and configured Hadoop and Ecosystem components in Cloudera and Hortonworks environments.
  • Configured Hadoop, Hive and Pig on Amazon EC2 servers; involved in analyzing system failures, identifying root causes and recommending courses of action. Documented system processes and procedures for future reference.
  • Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.2 cluster and Implemented Sentry for the Dev Cluster.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop. Worked on tuning the performance of Pig queries.
  • Converted ETL operations to Hadoop system using Pig Latin Operations, transformations and functions.
  • Implemented business logic using Pig scripts and UDFs; captured data from existing databases that provide SQL interfaces using Sqoop (see the import sketch after this list).
  • Worked on YARN capacity scheduler by creating queues to allocate resource guarantee to specific groups.
  • Implemented the Hadoop stack and different big data analytic tools; migrated data from different databases to Hadoop (HDFS).
  • Developed backup policies for HADOOP systems and action plans for network failure.
  • Involved in the User/Group Management in Hadoop with AD/LDAP integration.
  • Resource management and load management using capacity scheduling and appending changes according to requirements.
  • Implemented strategy to upgrade entire cluster nodes OS from RHEL5 to RHEL6 and ensured cluster remains up and running.
  • Developed scripts in shell and Python to automate many day-to-day admin activities.
  • Implemented HCatalog for making partitions available for Pig/Java MR and established Remote Hive metastore using MySQL.
  • Installed several projects on Hadoop servers and configured each project to run jobs and scripts successfully.
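
A minimal Sqoop sketch of the SQL-to-HDFS capture described above; the JDBC URL, credentials, table name and target directory are illustrative assumptions:

    # Pull a table from a SQL-accessible source database into HDFS with four parallel mappers
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user --password-file /user/etl/.db_password \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --fields-terminated-by '\t'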

ENVIRONMENT: Cloudera Manager 4 & 5, Hortonworks, Ganglia, Tableau, Shell Scripting, Oozie, Pig, Hive, Flume, Bash scripting, Teradata, Kafka, Impala, Sentry, CentOS.

Confidential, SAN ANTONIO-TX

SAN/NAS ADMINISTRATOR

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions (a minimal sketch follows this list).
  • Managing and Scheduling Jobs on a Hadoop cluster.
  • Involved in taking up the Backup, Recovery and Maintenance.
  • Worked with the MapReduce shuffle algorithm, direct disk access, built-in compression, and code written in Java.
  • Worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
  • Developed PIG scripts to extract the data from the web server output files to load into HDFS.
  • Created Hive External tables and loaded the data in to tables and query data using HQL.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Involved in commissioning and decommissioning at the time of node failure.
  • Provided regular user and application support; used Cloudera connectors to improve performance when importing and exporting data.
  • Running transformation on the data sources using Hive and Pig.
  • Written scripts for automating the processes, Scheduled the jobs with Oozie.
  • Good understanding of Hive partitions and of the file format types available in HDFS.
  • Processed XML files using Pig.
  • Installed and maintained the CDH cluster; the installation included HDFS, MR, Hive, Pig, Oozie and Sqoop.
  • Worked on Hadoop Development cluster maintenance including metadata backups and upgrades.
  • Worked actively with the DevOps team to meet the specific business requirements for individual customers and proposed Hadoop solutions.
  • Involved in migration of Informatica ETL code, database objects and flat files from development to different environments.
  • Setting up ODBC, Relational, Native and FTP connections for Oracle, DB2, SQL server, VSAM and flat file.
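
A minimal sketch of the kind of daemon health-check script referred to above, assuming it runs on the master node and that local mail delivery is configured; the alert address and 80% threshold are placeholders:

    #!/bin/bash
    # Alert if a critical Hadoop daemon is not running on this host
    for daemon in NameNode ResourceManager; do
      if ! jps | grep -q "$daemon"; then
        echo "$(date): $daemon is not running on $(hostname)" | \
          mail -s "Hadoop daemon alert" ops@example.com
      fi
    done
    # Alert if overall HDFS usage crosses the threshold
    used=$(hdfs dfsadmin -report | awk -F': ' '/DFS Used%/{print int($2); exit}')
    [ "$used" -gt 80 ] && echo "HDFS usage at ${used}%" | \
      mail -s "HDFS capacity alert" ops@example.com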

ENVIRONMENT: Hadoop, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, Linux, Java, C++, Eclipse, Teradata.

Confidential

System Administrator

Responsibilities:

  • Administer and maintain the Windows 2003 and 2008 Active Directory Infrastructure for Production.
  • Migrated/moved multiple application and print servers, including data, shares and printers, from Windows 2003 to Windows 2008.
  • Created multiple Device Groups depending on the application and the requirements of the environment.
  • Provide user account administration for the distributed server environment and infrastructure Applications.
  • Knowledge of installing and configuring ESX 3.0/3.5 servers, configuring DRS and HA in vSphere, Fault Tolerance, and migrating virtual machines using vMotion.
  • Performed Storage vMotion; installed VirtualCenter and managed ESX hosts through VC.
  • Deployed VMs with clones and templates; hot-added devices to virtual machines.
  • Provided high availability to VMs.
  • Responsible for the general operation of the distributed server environment, including performance, reliability and efficient use of network resources.
  • In charge of overseeing the systems; allocated storage on EMC DMX-3, DMX-4, DMX 3000/2000 and CX600/700 arrays.
  • Performed data replication (BCV, SRDF); used array-based migration techniques, i.e. SRDF/Open Replicator, to migrate data from old DMX 3000/DMX 2000 arrays to DMX-4/DMX-3 storage systems in UNIX, Windows, Linux and AIX environments for online/offline data migration.
  • Implemented Business Continuance features like EMC Time Finder in Symmetrix/DMX arrays.
  • Worked on NetApp SnapMirror, FlexVol, Snapshots, NetApp FilerView, NetApp Management Console, and implementations of Aggregates, FAS 3200 and V-Series 3200.
  • Analyzed and maintained performance data to ensure optimal usage of the storage resources available.
  • Created RAID groups and storage groups, and bound CLARiiON LUNs to the hosts using Navisphere Manager and NaviCLI.
  • Created larger LUNs (metas) to support application needs using SYMCLI.
  • Planned and configured file systems with the CIFS and NFS protocols and implemented them in a multiprotocol environment.
  • Configured CIFS servers and VDMs for a Windows-only environment; protected through DPM 2006, 2007 and 2012 SP1.

ENVIRONMENT: Symmetrix DMX 3000, VNX, CLARiiON CX3-80, CX3-20, CX3-10c, CX700, CX300, CX500, NetApp FAS 270, 960 and 3040 series, Brocade 5300 and 4800.
