Hadoop Engineer Resume
West Chester, PA
PROFESSIONAL SUMMARY:
- 7+ years of professional experience in IT, including 3+ years maintaining, monitoring, deploying, and upgrading Apache Hadoop clusters on the Cloudera distribution.
- Experience deploying both MRv1 and MRv2 (YARN).
- Strong knowledge of configuring High Availability and NameNode Federation in a cluster.
- Hands-on experience with Hadoop ecosystem components such as MapReduce, HDFS, ZooKeeper, Oozie, Hive, Cassandra, Sqoop, Pig, and Flume.
- Experience using scripting languages such as Pig Latin to manipulate data.
- Strong understanding of Hadoop architecture and the MapReduce framework.
- Experience designing, implementing, and managing secure authentication for Hadoop clusters with Kerberos.
- Strong knowledge of setting up automatic and manual failover using ZooKeeper and Quorum Journal Nodes.
- Experience upgrading existing Hadoop clusters to the latest releases.
- Experience working with Flume to load log data from multiple sources directly into HDFS.
- Hands-on experience installing, configuring, and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Experience performing joins with MapReduce and Hive.
- Experience configuring Hadoop-based monitoring tools such as Nagios and Ganglia.
- Experience transferring data between HDFS and relational databases with Sqoop.
- Good understanding of distributed systems and parallel processing architectures.
- Hands-on experience upgrading and applying patches to the Cloudera distribution.
- Good knowledge of Cassandra, MongoDB, Netezza, and Vertica.
- Developed administrative scripts, such as Kickstart scripts, in shell.
- Experience using crontab to back up data and schedule jobs (see the crontab sketch after this list).
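A minimal sketch of the kind of crontab-driven HDFS backup referenced above; the script path, schedule, cluster hostnames, and data paths are illustrative assumptions, not details from the original engagement.

```sh
#!/bin/bash
# hdfs-backup.sh -- illustrative nightly backup script; hostnames and paths
# are placeholders. Scheduled via a crontab entry such as:
#   0 2 * * * /usr/local/bin/hdfs-backup.sh >> /var/log/hdfs-backup.log 2>&1

DAY=$(date -d "yesterday" +%Y-%m-%d)                  # date partition to back up
SRC="hdfs://prod-nn:8020/data/events/dt=${DAY}"       # source on the live cluster
DST="hdfs://backup-nn:8020/backups/events/dt=${DAY}"  # destination on the backup cluster

# DistCp runs a MapReduce job to copy data between clusters; -update skips
# files that already exist at the destination with the same size and checksum.
hadoop distcp -update "$SRC" "$DST"
```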
TECHNICAL SKILLS:
Hadoop Ecosystem Development: HDFS, Hive, Pig, Flume, Oozie, ZooKeeper, HBase, and Sqoop.
Operating Systems: Linux, Windows XP, Windows Server 2008.
Databases: MySQL, Oracle, SQL Server
NoSQL: Cassandra
Languages: C, C++, Java, Python, Pig Latin, shell scripting, HiveQL.
Network administration: TCP/IP fundamentals, LAN and WAN.
Cluster Management tools: Cloudera Manager, Ambari
Security: Kerberos
PROFESSIONAL EXPERIENCE:
Confidential, West Chester, PA
Hadoop Engineer
Responsibilities:
- Major tasks included upgrading, maintaining, and monitoring the cluster and its clients.
- Gathered requirements from Engineering and Reporting teams to design solutions on the Hadoop ecosystem
- Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Deployed a CDH cluster integrated with Nagios and Ganglia for monitoring.
- Deployed a remote Hive metastore backed by MySQL.
- Implemented HDFS High Availability using the Quorum Journal Manager (QJM) to eliminate the NameNode as a single point of failure (SPOF).
- Imported weblogs from the web servers into HDFS using Flume (see the Flume sketch after this list).
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Maintained, audited, and built new clusters for testing purposes using Cloudera Manager.
- Worked on analyzing data with Hive and Pig.
- Wrote custom Nagios monitoring scripts to check Hadoop daemons and cluster status (see the plugin sketch after this list).
- Developed Pig scripts to prepare raw threat-analysis data for analysis.
- Performed a POC on Sqoop imports from heterogeneous data sources to HDFS.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation (see the Sqoop sketch after this list).
- Automated job actions such as start, stop, suspend, resume, and rerun using Oozie.
- Wrote custom shell scripts to automate repetitive tasks on the cluster.
- Debugged and resolved major Cloudera Manager issues by working with Cloudera support.
- Worked with the Cassandra Query Language (CQL) for Apache Cassandra.
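A minimal sketch of the kind of Flume configuration behind the weblog ingestion above; the agent name, log path, and HDFS directory are assumptions for illustration.

```sh
# Illustrative Flume agent definition: tail an Apache access log and land
# the events in date-partitioned HDFS directories.
cat > /etc/flume-ng/conf/weblog.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

# Start the agent against that configuration.
flume-ng agent -n a1 -c /etc/flume-ng/conf -f /etc/flume-ng/conf/weblog.conf
```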
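A sketch of a custom Nagios plugin of the sort described above; it follows the standard Nagios exit-code convention (0 = OK, 2 = CRITICAL), and the DataNode check is just one hypothetical example.

```sh
#!/bin/bash
# check_datanode.sh -- hypothetical Nagios plugin: verify that the DataNode
# process is running on this host.
if pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode > /dev/null; then
  echo "OK - DataNode process is running"
  exit 0
else
  echo "CRITICAL - DataNode process not found"
  exit 2
fi
```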
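A minimal sketch of the Sqoop export described above, assuming a MySQL reporting database; the host, schema, and table names are placeholders.

```sh
# Push an analyzed Hive table's HDFS directory into MySQL for reporting.
sqoop export \
  --connect jdbc:mysql://reporting-db:3306/analytics \
  --username report_user -P \
  --table daily_metrics \
  --export-dir /user/hive/warehouse/daily_metrics \
  --input-fields-terminated-by '\001'   # Hive's default field delimiter
```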
Confidential, Louisville, KY
Hadoop Engineer
Responsibilities:
- Installed and configured various components of the Hadoop ecosystem and maintained their integrity.
- Commissioned DataNodes as data volumes grew and decommissioned nodes when hardware degraded.
- Transferred data between RDBMS and HDFS using Sqoop (see the import sketch after this list).
- Installed and configured Ganglia and Nagios to monitor Hadoop clusters.
- Managed Hadoop clusters using Cloudera Manager and performed a major upgrade of the Cloudera distribution.
- Implemented security for the Hadoop Cluster using Kerberos authentication.
- Wrote MapReduce jobs and custom UDFs in Hive and Pig over HDFS data to analyze customer behavior.
- Performed joins with MapReduce and Hive.
- Analyzed large datasets with Pig Latin.
- Handled XML data and wrote MapReduce jobs to parse it.
- Developed event-driven ELT applications that support data flows from RDBMS result sets and log files into HDFS.
- Managed day-to-day cluster operations, including backup, support, and maintenance.
- Converted existing SQL queries to Hive queries.
- Configured passwordless SSH between nodes (see the SSH sketch after this list).
- Applied patches to the Apache Hadoop source package using Mellanox-provided JAR files.
- Loaded data from the UNIX file system into HDFS.
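A sketch of the kind of Sqoop import used above to move RDBMS tables into HDFS; the connection string, credentials, and table are illustrative assumptions.

```sh
# Pull a table from the source database into HDFS with four parallel mappers.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table customers \
  --target-dir /data/raw/customers \
  --num-mappers 4 \
  --split-by customer_id   # column used to partition work across mappers
```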
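A minimal sketch of the passwordless SSH setup mentioned above; the node hostnames are placeholders.

```sh
# Generate a key pair without a passphrase (run once on the admin node).
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Append the public key to authorized_keys on each cluster node.
for node in dn01 dn02 dn03; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub "$node"
done
```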
Confidential
Network Administrator
Responsibilities:
- Worked closely with infrastructure, network, database, and application teams to ensure business applications were highly available and performing within service levels.
- Installed, configured, upgraded, and administered Red Hat and other Linux operating systems.
- Managed patching, performance and network monitoring, backups, risk mitigation, troubleshooting, application enhancements, software upgrades, and modifications of the Linux servers.
- Reviewed existing software/hardware architecture to identify areas for improvement in the areas of scalability, maintainability, and performance
- Created and maintained user account information, setup security policies for users
- Verified that peripherals were working properly.
- Developed administrative scripts, such as Kickstart scripts, in shell.
- Performed backups, file replication, and script management for servers.
- Conducted root cause analysis and resolved production problems and data issues
- Built various servers, including web, FTP, NFS, mail, and NTP servers.
- Performed various configurations, including networking, iptables rules, hostname resolution, and SSH keyless login (see the sketch after this list).
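A sketch of basic iptables hardening of the kind referenced above; the allowed ports and the RHEL-style persistence command are assumptions.

```sh
# Allow loopback traffic and replies to established connections.
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow inbound SSH, then drop everything else by default.
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -P INPUT DROP

# Persist the rules across reboots (RHEL/CentOS 6-style).
service iptables save
```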