Hadoop Engineer Resume Houston, TX - Hire IT People

PROFESSIONAL SUMMARY:

Experience in deploying and managing multi-node Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Tez, Flume, HCatalog, HBase, ZooKeeper) on Cloudera and Hortonworks clusters.
Hands-on experience in provisioning and managing multi-tenant Hadoop clusters on public cloud environments (Amazon Web Services (AWS) EC2, Rackspace) and on private cloud infrastructure (OpenStack cloud platform)
Maintained Hadoop clusters for dev/staging/production
Creation of Volumes, Security group rules, Key pairs, Floating IPs, Images and Snapshots and Deploying Instances on OpenStack
Hands-on experience with “Productionalizing” Hadoop applications, including administration, configuration management, debugging and performance tuning
Worked with application team via scrum to provide operational support, install Hadoop updates, patches and version upgrades as required
Experience in performance tuning for MapReduce, Hive and Sqoop
Monitored Hadoop clusters using tools like Nagios, Ganglia, Ambari and Cloudera Manager
Worked with system engineering team to plan and deploy Hadoop hardware and software environments
Experience with cluster maintenance tasks such as Commissioning and decommissioning of nodes, cluster monitoring and troubleshooting. Manage and review Hadoop log files
Worked on Disaster management with Hadoop cluster
Experience securing Hadoop clusters with Kerberos authentication and SSL encryption
Strong knowledge of configuring NameNode high availability and NameNode federation
Designed an ingestion framework using Flume for streaming logs and aggregating data into HDFS. Built a data transformation framework using MapReduce and Pig
Designed, delivered and helped manage device data analytics at a very large storage vendor
Experience in benchmarking and in performing backup and recovery of NameNode metadata and data residing on the cluster
Experienced in managing DHCP, PXE with Kickstart, DNS and NFS to build infrastructure in Linux environments, and in working with Puppet for application deployment
Strong knowledge of work automation using Chef and Puppet
Hands-on experience in developing Sqoop jobs to import data from RDBMS sources like MySQL and PostgreSQL into HDFS, and to export data from HDFS back to RDBMS.
Experience in writing workflows using Apache Oozie with job controllers like Hive and Sqoop.
Excellent working knowledge of HBase and data pre-processing using Flume NG.
Strong knowledge of NoSQL databases like HBase, Cassandra and MongoDB
Strong knowledge of implementing Kafka.
Knowledge of Apache Tez, an alternative to MapReduce as the execution engine for Hive, for faster execution of HiveQL queries.
Worked with big data teams to move ETL tasks to Hadoop
Hands-on experience with AWS-based deployments on S3, EC2 Auto Scaling, Elasticsearch, etc. in a highly available production environment.
Good knowledge in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON, Web Services and XML files.
Strong knowledge of newer technologies like Spark, Shark, Docker and heavy containers, to keep up with industry developments
Effective problem solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines

TECHNICAL SKILLS:

Languages: C, C++, Java

Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Tez, Oozie, Flume, ZooKeeper

Security: Kerberos

Hadoop management tools: Cloudera Manager, Ambari, Ganglia, Nagios

Databases: Oracle, MySQL, SQL Server

Scripting language: shell, Python

Software Development Tool: Rational Rose, Eclipse, NetBeans

Web Development: PHP, HTML, AJAX, JavaScript, CSS

Web Designing Tools: Adobe Photoshop, Adobe Dreamweaver

NoSQL Databases: HBase, Cassandra

Operating Systems: Windows, Linux (Red Hat, CentOS)

Build Tools: Maven, ANT

PROFESSIONAL EXPERIENCE:

Hadoop Engineer

Confidential, Houston, TX

Responsibilities:

Actively participated with the development team to meet the specific customer requirements and proposed effective Hadoop solutions
Worked closely with application developers and end-users for analyzing requirements
Responsible for architecting both Hortonworks and Cloudera clusters.
Installed, configured and monitored Hortonworks Ambari and Cloudera Manager
Configured and deployed cluster monitoring tools Nagios and Ganglia.
Responsible for Batch architecture and cluster migration planning
Excellent knowledge of VPC security configuration and network access for AWS
Installed Ambari 2.0 and CDH 4.7 and CDH 5.1.3 with HA.
Implemented high-availability for Management Services using NFS and Heartbeat
Provided high availability for Management Services components such as Host Monitor, Service Monitor, Activity Monitor, Event Server and Reports Manager
Tested the high-availability architecture with SSL enabled and disabled, Kerberos enabled and disabled, and Management Services directories on NFS mounts
Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Sqoop and a few system-specific jobs
Development of Pig scripts for handling the raw data for analysis
Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns
Worked extensively in writing Sqoop scripts for importing and exporting of data from RDBMS
Responsible for extracting data from various sources into HDFS for processing.
Worked on streaming real-time log data into HDFS from web servers using Flume.
Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency.
Implemented Hive queries using indexes and buckets for time efficiency
Designed and implemented Pig UDFs for evaluation, filtering, loading and storing of data.
Hands-on experience in installing HBase and HBase RegionServers and creating HBase tables.
Integrated Cognos and Informatica to access Hive tables from the cluster.

Environment: JDK 1.6, CentOS, Flume, HBase, HDFS, Maven, Spark, MapReduce, Hive, Oozie, Red Hat, ZooKeeper, Sqoop, Pig

Hadoop Admin

Confidential, Santa Clara, CA

Responsibilities:

Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager.
Involved in installing Hadoop, Hive, HBase, ZooKeeper, Flume and Oozie.
Worked closely with data analysts to construct creative solutions for their analysis tasks
Led end-to-end efforts to design, develop, and implement data warehousing and business intelligence solutions.
Worked on performing major upgrade of cluster from CDH3u6 to CDH4.4.0
Implemented Cloudera Manager on existing cluster
Optimized Hadoop infrastructure at both the software and hardware levels
Ensured Hadoop clusters were built and tuned optimally to support the activities of the Big Data teams
Developed MapReduce programs to extract and transform the data sets, and exported the results back to RDBMS using Sqoop
Installed, Configured and managed Flume Infrastructure
Created tables in Hive and loaded the structured data produced by MapReduce jobs
Configured the Hive metastore to use a MySQL database so that all tables created in Hive are available to different users simultaneously
Developed many HiveQL queries to extract business-required information
Exported the business-required information to RDBMS using Sqoop so that the BI team could generate reports from the data
Installed and configured DataStax Cassandra cluster

Environment: JDK 1.6, CentOS, Flume, HBase, HDFS, Maven, Redis, Cassandra, MapReduce, Hive, Oozie, ZooKeeper.

Hadoop Consultant

Confidential, Kansas City, MO

Responsibilities:

Installed and configured Cloudera Hadoop on a 24 node cluster.
Loaded data from Oracle database into HDFS.
Analyzed the Big Data business requirements and translated them into Hadoop-centric technologies.
Provided ad-hoc queries and data metrics to the business users using Hive and Pig.
Developed MapReduce pipeline jobs in Apache Crunch to process the data and create necessary HFiles.
Loaded the created HFiles into HBase for faster access to a large customer base without taking a performance hit.
Used Apache Maven for project build.
Performed unit testing of MapReduce jobs on cluster using MRUnit.
Used Oozie scheduler system to automate the pipeline workflow.
Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
Implemented data serialization using Apache Avro.
Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
Analyzed Business Requirement Document written in JIRA and participated in peer code reviews in Crucible.

Environment: Cloudera Hadoop, MapReduce, HDFS, Crunch, HBase, Avro, Oozie, Java (JDK 1.6), JIRA, Crucible, GitHub, Maven.

Linux/Database Administrator

Confidential

Responsibilities:

Installing and maintaining the Linux servers
Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers
Monitoring System Metrics and logs for any problems.
Running cron jobs (crontab) to back up data
Adding, removing, or updating user account information, resetting passwords, etc
Creating and managing logical volumes. Using Java JDBC to load data into MySQL
Maintaining the MySQL server and granting database authentication to required users
Installing and updating packages using YUM
