Hadoop Admin/Developer Resume
New York, New York
SUMMARY:
- Over 8 years of IT experience as a Developer, Designer & Quality Tester with cross-platform integration experience using the Hadoop Ecosystem, Java and Software Functional Testing.
- Expertise in writing Hadoop Jobs for analyzing structured and unstructured data using HDFS, Hive, HBase, Pig, Spark, Kafka, Scala, Oozie and Talend ETL.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN and MapReduce concepts.
- Experience in working with different kinds of MapReduce programs using Hadoop for Big Data analysis.
- Experience in analyzing data using Hive QL, Pig Latin and custom MapReduce programs in Java.
- Experience in importing/exporting data using Sqoop into HDFS from Relational Database Systems and vice versa.
- Extensive knowledge and experience of real-time data streaming techniques like Kafka, Storm and Spark Streaming.
- Working experience on designing and implementing complete end-to-end Hadoop Infrastructure including PIG, HIVE, Sqoop, Oozie, Flume and zookeeper.
- Good knowledge in providing support to data analysts in running Pig and Hive queries.
- Experience in writing shell scripts to dump the shared data from MySQL servers to HDFS.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Knowledge in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
- Experience in working with various Cloudera distributions (CDH4/CDH5), Hortonworks and Amazon EMR Hadoop Distributions.
- Extensively worked on Hive and Sqoop for sourcing and transformations.
- Experience in automating the Hadoop Installation, configuration and maintaining the cluster by using the tools like Puppet.
- Hands-on experience with NoSQL databases like HBase, Cassandra and MongoDB.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Strong debugging and problem-solving skills with excellent understanding of system development methodologies, techniques and tools.
- Wrote Flume configuration files for importing streaming log data into HBase.
- Processed this data using the Spark Streaming API with Scala (a brief sketch follows this list).
- Worked in the complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) in different application domains, involving technologies varying from object-oriented technology to Internet programming on Windows NT, Linux and UNIX/Solaris platforms, following RUP methodologies.
- Involved in writing shell scripts, Ant scripts for Unix OS for application deployments to production region.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Hands on experience in developing the applications with Java, J2EE, J2EE - Servlets, JSP, EJB, SOAP, Web Services, JNDI, JMS, JDBC2, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle10g and MS-SQL Server RDBMS.
- Very good POC and development experience with Apache Flume, Kafka, Spark, Storm and Scala.
- Good understanding of data ingestion tools such as Kafka, Sqoop and Flume.
- Good working knowledge of the Hadoop ecosystem, including Hue.
- Good knowledge in evaluating big data analytics libraries and using Spark SQL for data exploration.
- Exceptional ability to quickly master new concepts, capable of working in a group as well as independently, with excellent communication skills.
- Benchmarked Hadoop clusters to validate the hardware before and after installation and tweaked configurations to obtain better performance.
- Strong experience in analyzing large data sets by writing PySpark scripts and Hive queries.
- Extensive experience in working with structured data using HiveQL, join operations and custom UDFs, and experienced in optimizing Hive queries.
- Extensive experience in working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database.
- Experience in Apache Flume for collecting, aggregating and moving large volumes of data from various sources such as web servers and telnet sources.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce, Hive, Spark jobs.
- Involved in moving all log files generated from various sources to HDFS and Spark for further processing.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Experience in administering Linux systems to deploy Hadoop clusters and monitoring the clusters.
- Experience in commissioning, decommissioning, balancing and managing nodes, and tuning servers for optimal performance of the cluster.
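The following is a minimal Scala sketch of the kind of Spark Streaming ingestion described above; the broker address, topic name and HDFS paths are hypothetical placeholders, not actual project configuration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object LogStreamToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LogStreamToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))           // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka01:9092",                    // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "log-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("web_logs"), kafkaParams))

    // Keep only non-empty log lines and write each batch to a time-stamped HDFS directory.
    stream.map(_.value).filter(_.nonEmpty).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/raw/web_logs/batch_${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```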
PROFESSIONAL EXPERIENCE:
Confidential, New York, New York
Hadoop Admin/Developer
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure, including KDC server setup and management. Management and support of Hadoop services including HDFS, Hive, Impala and Spark.
- Installing, Upgrading and Managing Hadoop Cluster on Cloudera.
- Troubleshooting many cloud-related issues such as DataNodes going down, network failures, login issues and missing data blocks.
- Worked as Hadoop Admin responsible for everything related to clusters totaling 100 nodes, ranging from POC (proof-of-concept) to PROD clusters, on the Cloudera (CDH 5.5.2) distribution.
- Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.
- Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, and providing quick solutions to reduce impact, documenting them to prevent future issues.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
- Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
- Integrated Flume with Spark for real-time data and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive.
- Configured Kafka for efficiently collecting, aggregating and moving large amounts of click stream data from many different sources to HDFS. Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Using Flume and Spool directory loading the data from local system to HDFS.
- Retrieved data from HDFS into relational databases with Sqoop.
- Parsed, cleansed and mined useful and meaningful data in HDFS using MapReduce for further analysis; fine-tuned Hive jobs for optimized performance.
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and SQOOP.
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Extensively involved in the Design phase and delivered Design documents.
- Involved in Testing and coordination with business in User testing.
- Importing and exporting data into HDFS and Hive using SQOOP.
- Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Experienced in defining job flows.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Implemented a CI/CD pipeline with Docker, Jenkins and GitHub, virtualizing the Dev and Test environment servers with Docker and automating configuration through containerization.
- Worked on Oozie workflow engine for job scheduling.
- Migrated complex MapReduce programs into Spark RDD transformations and actions (a brief sketch follows this list).
- Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
- Developed the MapReduce programs to parse the raw data and store the pre-Aggregated data in the partitioned tables.
- Conducted performance tuning of the Hadoop cluster and MapReduce jobs, as well as the real-time applications, applying best practices to fix design flaws.
- Implemented Oozie work-flow for ETL Process for critical data feeds across the platform.
- Configured Ethernet bonding for all nodes to double the network bandwidth.
- Implemented the Kerberos security authentication protocol for the existing cluster.
- Built high availability for major production cluster and designed automatic failover control using Zookeeper Failover Controller (ZKFC) and Quorum Journal nodes.
- Proficient with container systems like Docker and container orchestration like EC2 Container Service and Kubernetes; worked with Terraform.
- Managed Docker containerization and orchestration with Kubernetes, using it to deploy, scale and manage Docker containers.
- Managed the Git version control system, handling code changes through branching, merging, staging, cherry-picking, etc.
- Demonstrated how Ansible, along with Ansible Tower, can be used to automate different software development processes across the organization.
- Integrated GIT into Continuous Integration Environment using Jenkins/Hudson.
- Used Maven as the build tool to produce build artifacts from the source code.
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
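Below is a minimal, hypothetical Scala sketch of the MapReduce-to-Spark migration pattern mentioned above: a mapper/reducer-style count expressed as RDD transformations followed by a single action. Paths and field positions are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object PageHitCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PageHitCounts").getOrCreate()
    val sc = spark.sparkContext

    sc.textFile("hdfs:///data/raw/web_logs")                 // former mapper input
      .map(_.split("\t"))
      .filter(_.length > 2)                                  // drop malformed records
      .map(fields => (fields(2), 1L))                        // (url, 1) pairs, like mapper output
      .reduceByKey(_ + _)                                    // replaces the reducer
      .map { case (url, hits) => s"$url\t$hits" }
      .saveAsTextFile("hdfs:///data/curated/page_hits")      // single action triggers the job

    spark.stop()
  }
}
```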
Environment: HDFS, MapReduce, Apache Hadoop, Hive, Sqoop, Flume, Cloudera Manager, Red Hat, MySQL, Prometheus, Docker, Kubernetes.
Confidential, Cincinnati, Ohio
Hadoop Architect/Developer
Responsibilities:
- Worked on developing the architecture document and proper guidelines.
- Worked on installing Kafka on Virtual Machine.
- Designed and implemented end to end big data platform solution on AWS.
- Manage Hadoop clusters in production, development, Disaster Recovery environments.
- Implemented Signal Hub, a data science tool, and configured it on top of HDFS.
- Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used HiveQL queries to search for particular strings in Hive tables in HDFS.
- Possess good Linux and Hadoop system administration skills, networking, shell scripting and familiarity with open-source configuration management and deployment tools such as Chef.
- Worked with Puppet for application deployment.
- Configured Kafka to read and write messages from external programs.
- Configured Kafka to handle real time data.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
- Developed MapReduce and Spark jobs to discover trends in data usage by users.
- Implemented Spark using Python and Spark SQL for faster processing of data.
- Developed functional programs in Scala for connecting the streaming data application, gathering web data using JSON and XML and passing it to Flume.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
- Exported the patterns analyzed back into Teradata using Sqoop.
- Used the Spark-Cassandra Connector to load data to and from Cassandra (a brief sketch follows this list).
- Real time streaming the data using Spark with Kafka.
- Good knowledge of building Apache Spark applications using Scala.
- Developed several business services as Java RESTful web services using the Spring MVC framework.
- Managing and scheduling Jobs to remove the duplicate log data files in HDFS using Oozie.
- Expert in creating and designing data ingest pipelines using technologies such as Spring Integration and Apache Kafka.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Implemented test scripts to support test-driven development and continuous integration. Dumped data from HDFS to a MySQL database and vice versa using Sqoop.
- Responsible for managing data coming from different sources.
- Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to find which one better suits the current requirements.
- Used File System check (FSCK) to check the health of files in HDFS.
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Used JAVA, J2EE application development skills with Object Oriented Analysis and extensively involved throughout Software Development Life Cycle (SDLC)
- Involved in the pilot of Hadoop cluster hosted on Amazon Web Services (AWS)
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Extracted files from CouchDB and MongoDB through Sqoop and placed them in HDFS for processing.
- Spark Streaming collects this data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in a NoSQL store (HBase).
- Implemented a multi-tenant Hadoop cluster and onboarded tenants to the cluster.
- Achieved data isolation through Ranger policy-based access control.
- Used YARN capacity scheduler to define compute capacity. Responsible for building a cluster on HDP 2.5.
- Configured Oozie workflow to run multiple Hive and Pig jobs which run independently with time and data availability.
- Optimized Map Reduce code by writing Pig Latin scripts.
- Imported data from external tables into Hive using the load command.
- Created tables in Hive and used static and dynamic partitioning as the data slicing mechanism.
- Working experience with monitoring clusters, identifying risks and establishing good practices to be followed in a shared environment.
- Good understanding of cluster configuration and resource management using YARN.
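A minimal Scala sketch of reading from and writing to Cassandra with the DataStax Spark-Cassandra Connector, as referenced above; the contact point, keyspace, table and column names are hypothetical.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraRoundTrip {
  case class Event(userId: String, ts: Long, action: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CassandraRoundTrip")
      .set("spark.cassandra.connection.host", "cassandra01")   // hypothetical contact point
    val sc = new SparkContext(conf)

    // Read an existing table into an RDD of case classes.
    val events = sc.cassandraTable[Event]("analytics", "events")

    // Simple aggregation: actions per user.
    val perUser = events.map(e => (e.userId, 1L)).reduceByKey(_ + _)

    // Write the result back to another Cassandra table.
    perUser.saveToCassandra("analytics", "user_action_counts",
      SomeColumns("user_id", "action_count"))

    sc.stop()
  }
}
```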
Environment: MapReduce, HDFS, Ambari, Hive, Sqoop, Apache Kafka, Oozie, SQL, Alteryx, Flume, Spark, Cassandra, Scala, Java, AWS, GitHub.
Confidential, Kenilworth, New Jersey
Hadoop Admin/Developer
Responsibilities:
- Launching Amazon EC2 Cloud Instances using Amazon Web Services (Linux/ Ubuntu/RHEL) and Configuring launched instances with respect to specific applications.
- Installed applications on AWS EC2 instances and configured storage on S3 buckets.
- Performed S3 bucket creation and policy configuration, worked on IAM role-based policies and customized the JSON templates.
- Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Managed servers on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
- Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Worked closely with the data modelers to model the new incoming data sets.
- Involved in the start-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling of a few jobs).
- Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools including Pig, Hive, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
- Involved in creating Hive tables and Pig relations, loading data, and writing Hive queries and Pig scripts.
- Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN (a brief sketch follows this list).
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
- Imported data from different sources like HDFS and HBase into Spark RDDs.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
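A minimal Scala sketch of the Spark SQL / DataFrame work described above: querying a partitioned Hive table and writing an aggregate back with dynamic partitioning. The database, table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyClickSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailyClickSummary")
      .enableHiveSupport()                                  // talk to the Hive metastore
      .getOrCreate()

    // Partition pruning: only the requested load_date partition is scanned.
    val clicks = spark.table("web.clicks").where(col("load_date") === "2017-03-01")

    val summary = clicks
      .groupBy("page", "load_date")
      .agg(count("*").as("hits"), countDistinct("user_id").as("unique_users"))

    // Allow dynamic partitioning so each load_date lands in its own Hive partition.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    summary
      .select("page", "hits", "unique_users", "load_date")  // match target column order (partition column last)
      .write.mode("overwrite")
      .insertInto("web.click_summary")                      // pre-created table, partitioned by load_date

    spark.stop()
  }
}
```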
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.
Confidential, Chicago, Illinois
Hadoop Admin
Responsibilities:
- Responsible for implementation and ongoing administration of Hadoop infrastructure and setting up infrastructure
- Cluster maintenance as well as creation and removal of nodes.
- Evaluation of Hadoop infrastructure requirements and design/deployment of solutions (high availability, big data clusters).
- Cluster monitoring and troubleshooting of Hadoop issues.
- Managed and reviewed Hadoop log files.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Created NRF documents which explain the flow of the architecture and measure performance, security, memory usage and dependencies.
- Setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
- Help maintain and troubleshoot UNIX and Linux environment.
- Experience analyzing and evaluating system security threats and safeguards.
- Experience in Importing and exporting data into HDFS and Hive using Sqoop.
- Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
- Experienced in handling data from different data sets, join them and preprocess using Pig join operations.
- Developed Map-Reduce programs to clean and aggregate the data
- Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
- Developed different kinds of custom filters and handled pre-defined filters on HBase data using the API (a brief sketch follows this list).
- Imported and exported data from Teradata to HDFS and vice-versa.
- Strong understanding of Hadoop eco system such as HDFS, MapReduce, HBase, Zookeeper, Pig, Hadoop streaming, Sqoop, Oozie and Hive
- Implemented counters on HBase data to count total records in different tables.
- Experienced in handling Avro data files by passing schema into HDFS using Avro tools and Map Reduce.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Used Amazon Web Services to perform big data analytics.
- Implemented secondary sorting to sort reducer output globally in MapReduce.
- Implemented data pipeline by chaining multiple mappers by using Chained Mapper.
- Created Hive dynamic partitions to load time series data.
- Experienced in handling different types of joins in Hive, such as map joins, bucket map joins and sorted bucket map joins.
- Created tables, partitions and buckets and performed analytics using Hive ad-hoc queries.
- Experienced in importing/exporting data into HDFS/Hive from relational databases and Teradata using Sqoop.
- Handled continuous streaming data coming from different sources using Flume, with HDFS as the destination.
- Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
- Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Worked on the Spring framework for multi-threading.
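A minimal Scala sketch, using the HBase Java client API, of the kind of filter-based scan mentioned above; the table, column family and qualifier names are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import org.apache.hadoop.hbase.filter.{CompareFilter, SingleColumnValueFilter}
import org.apache.hadoop.hbase.util.Bytes

import scala.collection.JavaConverters._

object ActiveUserScan {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()                       // picks up hbase-site.xml
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("user_events"))

    // Keep only rows whose info:status column equals "active".
    val filter = new SingleColumnValueFilter(
      Bytes.toBytes("info"), Bytes.toBytes("status"),
      CompareFilter.CompareOp.EQUAL, Bytes.toBytes("active"))
    filter.setFilterIfMissing(true)                              // drop rows without the column

    val scan = new Scan().setFilter(filter)
    val scanner = table.getScanner(scan)
    try {
      for (result <- scanner.iterator().asScala) {
        val user = Bytes.toString(result.getRow)
        val lastSeen = Bytes.toString(
          result.getValue(Bytes.toBytes("info"), Bytes.toBytes("last_seen")))
        println(s"$user last seen $lastSeen")
      }
    } finally {
      scanner.close()
      table.close()
      connection.close()
    }
  }
}
```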
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, RDBMS/DB, flat files, Teradata, MySQL, CSV, Avro data files, Java, J2EE.
Confidential, New York, New York
SDET
Responsibilities:
- Involved in almost all the phases of SDLC.
- Executed test cases manually and logged defects using Clear Quest
- Automated the functionality and interface testing of the application using Quick Test Professional (QTP)
- Design, Develop and maintain automation framework (Hybrid Framework).
- Analyzed the requirements and prepared automation script scenarios.
- Developed test data for regression testing using QTP.
- Wrote test cases in IBM Rational Manual Tester.
- Conducted cross-browser testing on different platforms.
- Performed client application testing and web-based application performance, stress, volume and load testing of the system using LoadRunner 9.5.
- Analyzed the performance of the application itself under various test loads of many simultaneous users.
- Analyzed the impact on server performance (CPU usage, server memory usage) for varied numbers of multiple, simultaneous users.
- Inserted transactions and rendezvous points into web Vuser scripts.
- Created Vuser scripts using VuGen and used the Controller to generate and execute LoadRunner scenarios.
- Complete involvement in Requirement Analysis and documentation on Requirement Specification.
- Prepared use-case diagrams, class diagrams and sequence diagrams as part of requirement specification documentation.
- Involved in design of the core implementation logic using MVC architecture.
- Used Apache Maven to build and configure the application.
- Developed JAX-WS web services to provide services to the other systems.
- Developed JAX-WS clients to utilize a few of the services provided by other systems.
- Involved in developing EJB 3.0 stateless session beans for the business tier to expose business services to the services component as well as the web tier.
- Implemented Hibernate at the DAO layer by configuring the Hibernate configuration file for different databases.
- Developed business services to utilize Hibernate service classes that connect to the database and perform the required action.
- Developed JavaScript validations to validate form fields.
- Performed unit testing for the developed code using JUnit.
- Developed design documents for the code developed.
- Used SVN repository for version control of the developed code.
Environment: SQL, Oracle 10g, Apache Tomcat, HP LoadRunner, IBM Rational Robot, ClearQuest, Java, J2EE, HTML, DHTML, XML, JavaScript, Eclipse, WebLogic and PL/SQL.