Hadoop Engineer Resume
San Jose, CA
SUMMARY:
- 7+ years of IT experience in the full software development life cycle (SDLC), including requirement gathering, analysis, design, development, writing technical specifications, and interface development using Object Oriented Methodologies and RDBMS such as MySQL and Oracle.
- Up to 3 years of experience in developing and administering Hadoop and Big Data technologies such as HDFS, MapReduce, Pig, Hive, Oozie, Flume, HCatalog, Sqoop, and ZooKeeper, and NoSQL databases such as Cassandra.
- Diverse experience utilizing Java tools in business, Web, and client-server environments, including Java Platform, Enterprise Edition (Java EE), Enterprise JavaBeans (EJB), JavaServer Pages (JSP), Java Servlets, and Java Database Connectivity (JDBC) technologies.
- In-depth knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, NameNode, DataNode, Secondary NameNode, JobTracker, and TaskTracker.
- Experience in designing, developing, and deploying Hadoop ecosystem technologies such as HDFS, ZooKeeper, MapReduce, Pig, Hive, Oozie, Flume, Hue, and Sqoop.
- Implemented MapReduce 2.0 (MRv2)/YARN on the Hadoop cluster.
- Experience in deploying Spark to run on YARN and connecting to HiveServer to run Spark SQL commands.
- Experience in working with multiple file formats (JSON, XML, SequenceFile, and RCFile) using SerDes.
- Strong hands-on experience in analyzing data by writing Hadoop MapReduce jobs in Java.
- Expertise in optimizing MapReduce algorithms using Combiners, Partitioners, and the Distributed Cache to deliver the best results (a brief sketch follows this summary).
- Hands-on experience in analyzing data by developing custom UDFs and scripts in Pig.
- Experience in using HCatalog to transfer data between Pig and Hive.
- Hands-on experience in implementing and optimizing queries in HiveQL using partitioning and custom UDFs.
- Experience in importing streaming logs and aggregating the data to HDFS through Flume.
- Hands-on experience in scheduling the Oozie workflow engine to run multiple Flume, MapReduce, Hive, and Pig jobs.
- Experience in working with relational databases and integrating them with the Hadoop infrastructure, piping data into HDFS using Sqoop.
- Experience in administering Linux systems to deploy Hadoop clusters and monitoring them using Nagios and Ganglia.
- Hands-on experience in writing custom shell scripts for system management and automating repetitive tasks.
- Experience in writing Python, Shell, and Java scripts to monitor cluster usage metrics.
- Experience in integrating Hadoop job workflows using Pentaho Data Integration tool.
- Experience in integrating Hadoop with enterprise Business Analytics tools such as Tableau and Pentaho Business Analytics to generate reports.
- Experience in installing and customizing Pentaho Data Integration and Pentaho Business Analytics tools to use Active Directory, LDAP, and custom header authentication.
- Hands-on experience in implementing JUnit, MRUnit, and PigUnit test cases.
- Experience in managing the Maven plug-in in Eclipse for development.
- Hands on experience in scheduling the jobs through Tidal Enterprise Scheduler (TES).
- Experience in installing, configuring, and loading data into an ElasticSearch cluster.
- Experience in installing a static Kibana server that connects to the ElasticSearch cluster to generate reports.
- Well-versed in public cloud environments (Amazon Web Services (AWS), Rackspace) and private cloud infrastructure (the OpenStack cloud platform).
- Experience in writing OpenStack Heat templates in YAML to auto-deploy Hadoop cluster servers with pre-cluster configurations, including required tools such as Hive, Pig, Sqoop, and HBase.
- Hands-on experience in writing and submitting jobs on Amazon Elastic MapReduce.
- Experience in source control using version control tools such as Git and SVN.
- Working experience in configuring and using the automation tool Rundeck.
- Working experience in configuring and using the continuous integration tool Jenkins.
- Hands-on experience in configuring and using Apache load balancers with mod_proxy.
- Good programming experience in SQL, PL/SQL, complex stored procedures, and triggers.
- Experience in working with multiple RDBMS including Oracle 11i/9i/8i, PostgreSQL, SQL Server and MS Access.
- Experienced in installing, configuring, and administering multi-node Hadoop clusters on major distributions including Cloudera, MapR, and Hortonworks.
- Hands-on experience in capacity planning and cluster building based on requirements.
- Fluid understanding of multiple programming languages, including Java, C, C++, HTML, XML, and PHP.
- Ambitious self-starter who plans, prioritizes, and manages multiple technical tasks within deadline-driven environments.
- Excellent communications skills. Adept at building strong working relationships with coworkers and management.
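Illustrative sketch of the MapReduce optimization noted in the summary: a simple event-count job that registers a Combiner and a custom Partitioner to cut shuffle volume. It assumes the Hadoop 2.x (MRv2) Java API; the job, class, and field names are hypothetical rather than taken from a specific project.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EventCountJob {

        // Mapper: emits (eventType, 1) per input line; the field position is hypothetical.
        public static class EventMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text eventType = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 1) {
                    eventType.set(fields[1].trim());
                    context.write(eventType, ONE);
                }
            }
        }

        // Reducer, also registered as the Combiner: sums counts per key.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        // Custom Partitioner: routes keys by first character instead of the default hash.
        public static class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
            @Override
            public int getPartition(Text key, IntWritable value, int numPartitions) {
                String k = key.toString();
                int bucket = k.isEmpty() ? 0 : k.charAt(0);
                return bucket % numPartitions;
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "event count");
            job.setJarByClass(EventCountJob.class);
            job.setMapperClass(EventMapper.class);
            job.setCombinerClass(SumReducer.class);            // local aggregation before shuffle
            job.setPartitionerClass(FirstCharPartitioner.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Reusing the Reducer as the Combiner is valid here because the sum aggregation is associative and commutative.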
TECHNICAL PROFICIENCIES:
Big Data Distributions: Cloudera Hadoop Distributions, MapR, Hortonworks
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Flume, Sqoop, Oozie, ZooKeeper, Spark, HBase, HCatalog, ElasticSearch, Kibana, LogStash
Security: Kerberos, LDAP, Active Directory
Databases: Oracle, MySQL, PostgreSQL, MS-Access
NoSQL Databases: Cassandra
Languages: Shell Scripting, Python, C, C++, Java, JavaScript, JSP, HTML, XML, SQL, YAML, Scala
Technologies: J2EE, JDBC, Servlets, JSP, Web Services, XML
Operating Systems: Windows XP/7/8, Macintosh, Ubuntu, Linux (CentOS 6)
WORK EXPERIENCE:
Confidential, San Jose, CA
Hadoop Engineer
Responsibilities:
- Installed Cloudera Manager on an already existing Hadoop cluster.
- Actively participated with the development team to meet specific customer requirements and proposed effective Hadoop solutions.
- Developed Spark scripts using the Scala shell as per requirements.
- Managed a MapR distribution environment with a 110-node production cluster, an 80-node staging cluster, and a 50-node development cluster.
- Loaded files in different formats, such as CSV and JSON, from various sources into HDFS for processing.
- Responsible for collecting the data required for testing various MapReduce applications from different sources.
- Monitored disk, memory, heap, and CPU utilization on all master and slave machines using Cloudera Manager.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on the Pentaho Data Integration tool to design job workflows that manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Developed and implemented core API services using Scala and Spark.
- Worked on High Availability for the NameNode using Cloudera Manager to avoid a single point of failure.
- Installed, configured, and developed an ElasticSearch cluster with static Kibana to load reports from data in Hive databases.
- Developed sample scripts to test Spark and Spark SQL functionality (a minimal example follows this list).
- Wrote custom scripts in Shell, Python, and Java to monitor the cluster using Nagios and Ganglia.
- Developed deployment packages using Jenkins with version control systems such as SVN and Git.
- Installed and configured Rundeck for automation and remote deployments.
- Effectively used Tidal Enterprise Scheduler to schedule the jobs.
- Worked with the team in installing and configuring the Hadoop cluster with ecosystem tools such as Hive, YARN, Spark, Sqoop, HBase, HCatalog, and Pig.
- Installed and configured monitoring tools such as Nagios to analyze and improve cluster performance.
- Developed OpenStack Heat Templates in YAML to deploy and configure Hadoop cluster in Cloud.
- Configured Apache load balancers with mod_proxy to expose customer-facing Pentaho BI reports.
- Debugged and troubleshot issues related to Big Data configuration and performance.
- Worked with BI team to generate the reports on Pentaho Business Analytics tool.
- Worked with the application team in tuning the performance of Big Data tools such as Hive, MapReduce, ElasticSearch, and Kibana.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using the MRUnit testing library.
- Involved in Minor and Major Release work activities.
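A minimal sketch of the kind of Spark SQL test script referenced in this role. The production scripts were written in the Scala shell; this version assumes the Spark 2.x Java API with Hive support configured on the cluster, and the database, table, and column names are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkSqlSmokeTest {
        public static void main(String[] args) {
            // The master (e.g. YARN) is normally supplied by spark-submit, not hard-coded here.
            SparkSession spark = SparkSession.builder()
                    .appName("spark-sql-smoke-test")
                    .enableHiveSupport()               // read tables from the Hive metastore
                    .getOrCreate();

            // Hypothetical Hive table: verify partitioned data is readable and aggregate a metric.
            Dataset<Row> daily = spark.sql(
                    "SELECT event_date, COUNT(*) AS events "
                  + "FROM analytics.web_logs "
                  + "WHERE event_date >= '2015-01-01' "
                  + "GROUP BY event_date "
                  + "ORDER BY event_date");

            daily.show(20);
            spark.stop();
        }
    }

A job like this would be packaged into a JAR and launched with spark-submit --master yarn so it executes on the YARN cluster.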
Hadoop Engineer
Responsibilities:
- Actively participated with the development team to meet the specific customer requirements and proposed effective Hadoop solutions.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and passwordless SSH login.
- Installed, configured, and deployed a 30-node Hadoop cluster for development, production, and testing.
- Installed and configured Cloudera Manager and brought the cluster under its management.
- Deployed a network file system for NameNode metadata backup.
- Wrote custom monitoring scripts for Nagios to monitor the Hadoop daemons and cluster status.
- Worked on setting up NameNode high availability for the major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
- Implemented Sqoop to transfer data between relational databases and HDFS.
- Involved in loading data from UNIX file system to HDFS.
- Worked on streaming the data into HDFS from web servers using Flume.
- Configured Flume nodes for data ingestion from heterogeneous sources.
- Wrote custom Flume interceptors to multiplex data to different sinks (a minimal sketch follows this list).
- Developed MapReduce jobs in Java for data processing after installing and configuring Hadoop and HDFS.
- Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, and Hive scripts.
- Automated jobs using the Oozie workflow engine to chain together shell scripts, Flume jobs, MapReduce jobs, Hive scripts, and Pig scripts.
- Used Tidal Enterprise Scheduler to schedule and automate daily jobs.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Worked with big data analysts, designers, and scientists in troubleshooting MapReduce job failures and issues with Hive, Pig, Flume, etc.
- Debugging and troubleshooting the issues in development and Test environments.
- Involved in Minor and Major Release work activities.
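A minimal sketch of the custom Flume interceptor work mentioned above: the interceptor tags each event with a header so a multiplexing channel selector can route events to different sinks. The class name, header key, and routing rule are hypothetical; the interfaces are the standard Flume NG interceptor API.

    import java.util.List;
    import java.util.Map;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    // Tags each event with a "log.type" header used by a multiplexing channel selector.
    public class LogTypeInterceptor implements Interceptor {

        private static final String HEADER = "log.type";   // header key (hypothetical)

        @Override
        public void initialize() {
            // no setup required
        }

        @Override
        public Event intercept(Event event) {
            Map<String, String> headers = event.getHeaders();
            String body = new String(event.getBody());
            // Hypothetical rule: separate error lines from regular access lines.
            headers.put(HEADER, body.contains("ERROR") ? "error" : "access");
            return event;
        }

        @Override
        public List<Event> intercept(List<Event> events) {
            for (Event event : events) {
                intercept(event);
            }
            return events;
        }

        @Override
        public void close() {
            // no resources to release
        }

        // Builder referenced from the Flume agent configuration file.
        public static class Builder implements Interceptor.Builder {
            @Override
            public Interceptor build() {
                return new LogTypeInterceptor();
            }

            @Override
            public void configure(Context context) {
                // no interceptor-specific properties in this sketch
            }
        }
    }

In the agent configuration the interceptor is attached to the source through its Builder class, and the source's selector.type is set to multiplexing with selector.header pointing at log.type so events fan out to the appropriate channels.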
Hadoop Engineer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
- Implemented Sqoop to transfer data between relational databases and HDFS.
- Involved in loading data from UNIX file system to HDFS.
- Worked on streaming the data into HDFS from web servers using Flume.
- Developed MapReduce jobs in Java for data processing after installing and configuring Hadoop and HDFS.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented custom Hive UDFs and UDAFs in Java to process and analyze data (a minimal UDF sketch follows this list).
- Worked on custom Pig Loader and Storage classes to handle semi-structured and unstructured data.
- Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, and Hive scripts.
- Automated jobs using the Oozie workflow engine to chain together shell scripts, Flume jobs, MapReduce jobs, Hive scripts, and Pig scripts.
- Used Tidal Enterprise Scheduler to schedule and automate daily jobs.
- Implemented unit testing with MRUnit and JUnit.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Worked with big data analysts, designers, and scientists in troubleshooting MapReduce job failures and issues with Hive, Pig, Flume, etc.
- Created reports for the BI team by using Sqoop to move data into HDFS and Hive.
- Developed test plans, test scripts and test environments to understand and resolve defects.
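A minimal sketch of a custom Hive UDF of the kind described above, built on the classic org.apache.hadoop.hive.ql.exec.UDF base class; the function name and cleanup rule are hypothetical.

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Normalizes free-text columns (trim, lower-case, collapse whitespace) before analysis.
    @Description(name = "clean_str",
            value = "_FUNC_(str) - trims, lower-cases, and collapses whitespace; returns NULL for NULL input")
    public class CleanStringUDF extends UDF {

        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String cleaned = input.toString().trim().toLowerCase().replaceAll("\\s+", " ");
            return new Text(cleaned);
        }
    }

After packaging the class into a JAR, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.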
Associate Java Developer
Responsibilities:
- Participated in planning and development of UML diagrams like use case diagrams, object diagrams and class diagrams to represent a detailed design phase.
- Designed and developed static and dynamic user interface web pages using JSP, HTML, and CSS.
- Identified and fixed transactional issues caused by incorrect exception handling and concurrency issues caused by unsynchronized blocks of code.
- Used JavaScript for developing client-side validation scripts.
- Developed SQL scripts for batch processing of data.
- Created and implemented stored procedures, functions, and triggers using SQL (a minimal JDBC invocation sketch follows this list).
- Incorporated custom logging mechanism for tracing errors, resolving all issues and bugs before deploying the application.
- Performed unit testing, system testing, and user acceptance testing.
- Worked with QA to move the application to production environment.
- Prepared technical reports & documentation manuals during the program development.
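A minimal sketch of how a stored procedure like those mentioned above could be invoked from the Java layer through JDBC; the connection URL, credentials, and procedure signature are placeholders rather than the application's actual values.

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Types;

    public class BatchStatusUpdater {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details; in practice these come from configuration.
            String url = "jdbc:oracle:thin:@//db-host:1521/APPDB";
            try (Connection conn = DriverManager.getConnection(url, "app_user", "app_password")) {
                // Hypothetical procedure: marks a batch as processed and returns the affected row count.
                try (CallableStatement stmt = conn.prepareCall("{call update_batch_status(?, ?, ?)}")) {
                    stmt.setLong(1, 42L);                  // batch id
                    stmt.setString(2, "PROCESSED");        // new status
                    stmt.registerOutParameter(3, Types.INTEGER);
                    stmt.execute();
                    System.out.println("Rows updated: " + stmt.getInt(3));
                }
            }
        }
    }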