Hadoop Admin Resume
CA
SUMMARY
- Overall 8+ years of IT experience in software development, including 4+ years on the Hadoop ecosystem.
- Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation, administration, and support of Hadoop.
- Experience in Big Data technologies and Hadoop ecosystem components such as HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Hue, ZooKeeper, and Oozie; NoSQL systems such as HBase and Cassandra; and data ingestion frameworks such as Flume and Kafka.
- Strong knowledge of the architecture of distributed systems and parallel processing frameworks.
- Experience in deploying and managing the Hadoop cluster using Cloudera Manager and Hortonworks Ambari.
- Experience with Kerberos installation and configuration.
- In-depth understanding of the MapReduce framework and the Spark execution model.
- Extensive experience working with structured data using HiveQL, join operations, and custom UDFs, and in optimizing Hive queries.
- Involved in designing Hive schemas, using performance tuning techniques such as partitioning and bucketing.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in a Cloudera cluster.
- Experience with Apache Flume and Kafka for collecting, aggregating, and moving large volumes of data from various sources such as web servers and netcat/telnet sources.
- Strong experience in data analytics using Spark Streaming, Storm, Hive, Pig Latin, and HBase.
- Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data.
- Extensive experience importing/exporting data between RDBMSs and the Hadoop ecosystem using Apache Sqoop.
- Worked with the Java HBase API to ingest processed data into HBase tables.
- Strong experience working in Linux environments and writing shell scripts.
- Good knowledge of and experience with the real-time streaming technologies Spark and Kafka.
- Experience optimizing MapReduce jobs using combiners and partitioners to deliver the best results.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (a minimal sketch follows this summary).
- Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Extensively worked on NoSQL databases such as HBase, Cassandra, and MongoDB.
- Hands-on experience in application development using Core Java, RDBMS, and Linux shell scripting.
- Experience in setting up Nagios and Ganglia for monitoring Hadoop infrastructure.
- Experience in using NFS (Network File System) mounts for backing up NameNode metadata.
- Experience in configuring high availability for the NameNode, JobTracker (MRv1), and ResourceManager (YARN/MRv2).
- Experience in PL/SQL programming, including SQL queries, stored procedures, and triggers in Oracle and SQL Server using TOAD and Query Manager.
- Experience managing cluster resources by implementing the Fair Scheduler and Capacity Scheduler.
- Experience in developing Pig Latin scripts for data processing on HDFS.
- Installation, patching, upgrading, tuning, configuring, and troubleshooting of Linux-based operating systems (Red Hat and CentOS).
- Capacity planning and monitoring of Hadoop cluster job performance.
- Good understanding of distributed systems and parallel processing architectures.
- Experience in building, deploying, and integrating applications with Ant and Maven.
- Proficient using version control tools like SVN and GIT.
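As an illustration of the Hive partitioning and bucketing pattern referenced above, here is a minimal Scala sketch that issues the DDL over HiveServer2 JDBC. The host, database, table, and column names are hypothetical, and the snippet assumes the Hive JDBC driver is on the classpath.

```scala
import java.sql.DriverManager

// Minimal sketch: create a date-partitioned, bucketed Hive table over
// HiveServer2 JDBC. Host, database, table, and column names are hypothetical.
object HiveLayoutSketch {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(
      "jdbc:hive2://hive-server:10000/analytics", "etl_user", "")
    val stmt = conn.createStatement()
    try {
      // Partitioning by load date lets queries prune whole directories;
      // bucketing on user_id spreads each partition into a fixed number
      // of files, enabling bucketed joins and sampling.
      stmt.execute(
        """CREATE EXTERNAL TABLE IF NOT EXISTS user_events (
          |  user_id BIGINT,
          |  event_type STRING,
          |  payload STRING)
          |PARTITIONED BY (dt STRING)
          |CLUSTERED BY (user_id) INTO 32 BUCKETS
          |STORED AS ORC
          |LOCATION '/data/analytics/user_events'""".stripMargin)
    } finally {
      stmt.close()
      conn.close()
    }
  }
}
```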
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, YARN, HDFS, HBase, Zookeeper, Hive, Hue, Pig, Sqoop, Cassandra, Spark, Oozie, Storm, Flume, Cloudera Manager, Hortonworks clusters.
Languages: C, Java, SQL, Pig Latin, HiveQL, Python, Scala, Unix shell scripting.
Web programming languages: HTML, CSS, XML, JSP.
Databases: Oracle 9i/10g, Microsoft SQL Server, MySQL, DB2, Teradata, MongoDB, Cassandra, HBase.
IDE & Build Tools: Eclipse, NetBeans, Ant, Maven.
Version Control Systems: CVS, SVN, Git (GitHub).
Web Services: SOAP, REST, JAX-WS, JAX-RS.
PROFESSIONAL EXPERIENCE
Confidential, CA
Hadoop Admin
Responsibilities:
- Installed and configured Hadoop cluster in Dev, Test and Production environments.
- Performed upgrades to the existing CDH clusters.
- Commissioned new nodes to and decommissioned nodes from the existing cluster.
- Built distributed high-performance systems using Spark and Scala.
- Performed data ingestion from multiple internal clients using Apache Kafka.
- Integrated Kafka with Spark Streaming for real time data processing.
- Implemented advanced Spark procedures such as text analytics and processing using in-memory computing capabilities.
- Worked with the NoSQL database HBase to deliver real-time data analytics using Apache Spark.
- Prepared System Design document with all functional implementations.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Improved the performance and optimization of existing algorithms in Hadoop using the Spark context, Spark SQL, and Spark on YARN, with Scala.
- Developed Spark scripts using Python shell commands as per the requirements.
- Created/modified shell scripts for scheduling data cleansing scripts and ETL loading processes.
- Developed Spark applications to perform all the data transformations on User behavioral data coming from multiple sources.
- Involved in designing Hive schemas, using performance tuning techniques such as partitioning and bucketing.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream in HDFS for persistence and in Hive for real-time reporting (see the sketch after this section).
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Used Sqoop to import data into the Hadoop Distributed File System (HDFS) from RDBMSs.
- Provided NoSQL solutions in Cassandra for data extraction and for storing large volumes of data.
- Created components such as Hive UDFs to supply functionality missing from Hive for analytics.
- Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Good experience in understanding and visualizing data by integrating Hadoop with Tableau.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
Environment: HDFS, Hadoop 2.x, Pig, Hive, Sqoop, Flume, Spark, Kafka, MapReduce, Scala, Oozie, Oracle 11g, YARN, Cassandra, Agile Methodology, JIRA, Cloudera 5.4.
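A minimal sketch of the Kafka-to-HDFS Spark Streaming flow described in this role, assuming the Spark 1.x DStream API and the spark-streaming-kafka (Kafka 0.8) integration of that era; the broker addresses, topic name, and HDFS path are hypothetical.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Minimal sketch: read messages from Kafka with the direct (receiver-less)
// stream and persist each micro-batch to HDFS, where a Hive external table
// over the same directory serves real-time reporting.
object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc = new StreamingContext(conf, Seconds(30)) // 30-second batches

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("user-events"))

    // Keep only the message payloads and write one directory per batch.
    stream.map { case (_, value) => value }
      .saveAsTextFiles("hdfs:///data/streams/user-events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```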
Groupon, CA
Hadoop Dev
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Worked on automation of delta feeds from Teradata using Sqoop, and from FTP servers to Hive.
- Implemented Hive tables and HQL queries for the reports; wrote and used complex data types in Hive (sketched after this section).
- Developed Hive queries to analyze reducer output data.
- Designed workflows by scheduling Hive processes for log file data, which is streamed into HDFS using Flume.
- Responsible for managing data coming from different sources.
- Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Designed and developed MapReduce jobs to process data arriving in different file formats such as XML, CSV, and JSON.
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
- Developed MapReduce programs using Java.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
Environment: Hadoop 2.x, Hive, HQL, HDFS, MapReduce, Sqoop, Flume, Oozie, Java, Maven, Eclipse, PuTTY, Cloudera Manager 4 and CDH 4.
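To illustrate the Hive complex data types used for the reports above, here is a minimal Scala sketch that creates and queries a table with ARRAY and MAP columns over HiveServer2 JDBC; all names are hypothetical.

```scala
import java.sql.DriverManager

// Minimal sketch: a Hive table using complex types (ARRAY, MAP) and a query
// that flattens them with LATERAL VIEW explode. All names are hypothetical.
object HiveComplexTypesSketch {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(
      "jdbc:hive2://hive-server:10000/reporting", "etl_user", "")
    val stmt = conn.createStatement()
    try {
      stmt.execute(
        """CREATE TABLE IF NOT EXISTS deal_feed (
          |  deal_id BIGINT,
          |  categories ARRAY<STRING>,
          |  attributes MAP<STRING, STRING>)
          |ROW FORMAT DELIMITED
          |  FIELDS TERMINATED BY '\t'
          |  COLLECTION ITEMS TERMINATED BY ','
          |  MAP KEYS TERMINATED BY ':'""".stripMargin)

      // One output row per (deal, category) pair.
      val rs = stmt.executeQuery(
        """SELECT deal_id, category, attributes['city']
          |FROM deal_feed
          |LATERAL VIEW explode(categories) c AS category""".stripMargin)
      while (rs.next())
        println(s"${rs.getLong(1)}\t${rs.getString(2)}\t${rs.getString(3)}")
    } finally {
      stmt.close()
      conn.close()
    }
  }
}
```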
Confidential, TX
Hadoop Dev
Responsibilities:
- Understood business needs, analyzed functional specifications, and mapped them to the design and development of MapReduce programs and algorithms.
- Created Hive tables and loaded transactional data from Teradata using Sqoop.
- Developed MapReduce jobs for cleaning, accessing, and validating the data.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries (a minimal UDF sketch follows this section).
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Wrote Pig scripts to transform raw data from several data sources.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Built applications using Maven and integrated them with the Jenkins continuous integration server to build jobs.
- Worked on Python scripts to analyze customer data.
- Involved in end-to-end implementation of ETL logic.
- Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
- Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
Environment: Hadoop 1.x, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Maven, Shell Scripting, CDH3, Python, Cloudera Manager.
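A minimal sketch of a Hive UDF of the kind described above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API for brevity (the generic UDF API adds ObjectInspector plumbing); the package, class name, and normalization rule are hypothetical.

```scala
package com.example.hive.udf

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Minimal sketch: normalize page URLs so log-analysis queries can group
// variants of the same page under one key. Hive finds evaluate() by
// reflection; a null input yields a null output.
class NormalizeUrl extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val url = input.toString.trim.toLowerCase
    // Drop the query string and any trailing slash.
    new Text(url.split('?').head.stripSuffix("/"))
  }
}
```

Once packaged into a JAR, such a function would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_url AS 'com.example.hive.udf.NormalizeUrl' (names again hypothetical).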
Confidential
Java SQL Programmer
Responsibilities:
- Understanding existing systems and developing enhancements based on new requirements from the functional team.
- Responsible for analyzing new business rules and interfaces in order to design and implement new SQL objects.
- Developing new interfaces for the application using JSP, EJB, and JDBC modules based on the requirements.
- Working with the DBA on creating new DDL & DML statements for connectivity to databases.
- Based on the business requirements, analyzed indexes and partitioning criteria in the database for efficiency in throughput.
- Wrote complex SQL and stored procedures (a minimal invocation sketch follows this section).
- Working on SOAP & WSDL web services.
- Developed user and technical documentation.
- Working on Tableau dashboards to build reports/visualizations for business reporting.
Environment: Oracle 11g SQL, Java, J2EE, EJB, HTML, JavaScript, Tableau, and Windows.
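As an illustration of invoking a stored procedure of the kind written in this role, here is a minimal JDBC sketch in Scala; the connection string, credentials, and the procedure name and signature are hypothetical, not the original project's objects.

```scala
import java.sql.DriverManager

// Minimal sketch: call a hypothetical Oracle stored procedure via the
// standard JDBC escape syntax.
object CallProcedureSketch {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@db-host:1521:ORCL", "app_user", "app_password")
    conn.setAutoCommit(false) // commit explicitly after the call succeeds
    val call = conn.prepareCall("{ call update_order_status(?, ?) }")
    try {
      call.setLong(1, 42L)         // hypothetical order id
      call.setString(2, "SHIPPED") // hypothetical status value
      call.execute()
      conn.commit()
    } finally {
      call.close()
      conn.close()
    }
  }
}
```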
Confidential
Java Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Implemented database using SQL Server.
- Designed tables and indexes.
- Wrote complex SQL and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Developed user and technical documentation.
Environment: Oracle 11g SQL, Java, HTML, JavaScript, Tableau.