Big Data Architect / Lead Hadoop Admin Resume
Memphis, TN
PROFESSIONAL SUMMARY:
- 9+ years of strong hands-on experience managing and supporting a wide array of software applications, from a cluster of 3 PostgreSQL instances to a Hadoop cluster of 400+ machines.
- Excellent understanding of Hadoop architecture and its components such as HDFS, MapReduce, NameNode, DataNode, ResourceManager, NodeManager, JobTracker and TaskTracker, as well as the Hadoop ecosystem (Hive, Impala, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Spark).
- Well versed in installing, configuring, supporting and managing Big Data workloads and the underlying infrastructure of Hadoop clusters.
- Certified CDH Hadoop and HBase Administrator, currently involved in the administration, management and support of Big Data applications.
- Experience with Cloudera Manager administration, including installing and updating Hadoop and its related components in both single-node and multi-node cluster environments using the Apache and Cloudera distributions.
- Experience in database administration, performance tuning, backup and recovery, and troubleshooting in large-scale, customer-facing environments.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Impala and custom MapReduce programs in Java.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Extensive experience with Database administration, maintenance, and schema design for PostgreSQL and MS SQL Server.
- Experience in data management and implementation of Big Data applications using Hadoop frameworks.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data Ingestion, Oozie for scheduling and HBase as a NoSQL data store.
- Experienced in deployment of Hadoop Cluster using Ambari, Cloudera Manager.
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing and reviewing Hadoop log files.
- Proficient in configuring ZooKeeper, Flume and Sqoop on existing Hadoop clusters.
- Good knowledge of Apache Flume, Sqoop, Hive, HCatalog, Impala, ZooKeeper, Oozie, Ambari and Chef.
- Expertise in deploying Hadoop, YARN, Spark and Storm and integrating them with Cassandra, Ignite, RabbitMQ and Kafka.
- Very good knowledge of YARN (Hadoop 2.x) terminology and high-availability Hadoop clusters.
- Experience in analyzing log files for Hadoop and ecosystem services to determine root causes.
- Performed thread dump analysis for stuck threads and manual heap dump analysis for leaked memory with the Memory Analyzer tool.
- Very good experience with high-volume transactional systems running on Unix/Linux and Windows.
- Involved in all phases of Software Development Life Cycle (SDLC) in large-scale enterprise software using Object Oriented Analysis and Design.
- Provided 24/7 on-call Support for production.
- Able to coordinate work across multiple tight schedules and efficient in meeting deadlines.
- Self-starter, fast learner and team player with strong communication and interpersonal skills.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Kafka, Oozie, ZooKeeper, Storm, Spark, Cassandra, Elasticsearch (ELK).
Languages: C, C++, Core Java, Scala, Python
Databases: Oracle and MySQL
Monitoring and Reporting: Nagios, Ganglia, custom shell scripts
Operating Systems: Linux, Windows
PROFESSIONAL EXPERIENCE:
Confidential, Memphis, TN
Big Data Architect / Lead Hadoop Admin
Responsibilities:
- Served as the single point of contact for cluster planning, setting up the Hadoop clusters, and building the data pipelines for data ingestion and processing.
- Designed solutions for streaming data applications using Apache Storm.
- Worked extensively on the Elasticsearch (ELK) stack.
- Worked extensively on Kafka and Storm integration to score PMML (Predictive Model Markup Language) models.
- Applied transformations to the streaming datasets using Spark.
- Used Kibana for ELK visualization.
- Installed a multi-node cluster on Hortonworks and configured various Hadoop ecosystem components such as HDFS, MapReduce, Hive, HBase, Kafka, Storm and Spark.
- Wrote scripts to load data into Elasticsearch indexes using Logstash.
- Set up Kerberos for authentication and Apache Ranger for policy implementation.
- Set up the Kafka topics and published data into them (see the sketch after this list).
- Created Storm topologies and set up the Storm spouts and bolts.
- Created HBase tables to store data depending on column families.
- Wrote extensive Hive queries for data analysis to meet the business requirements.
- Installed, upgraded and managed the Hadoop cluster on the Hortonworks distribution.
- Good experience with shell scripting; wrote shell scripts to detect and report corrupted HDFS blocks.
- Involved in adding and decommissioning the data nodes.
- Implemented Kerberos for securing the Hadoop cluster.
- Used Nagios and Ganglia monitoring and reporting tools for alerts and reports.
- Responsible for analyzing data using Spark SQL and validating the query results against Hive queries.
- Involved in the requirements and design phases for implementing real-time streaming using Kafka and Storm.
- Used Maven for deployments and processed structured, semi-structured (such as XML) and unstructured data.
- Published digital research papers on Streaming Analytics Manager in HDF (Hortonworks DataFlow) and Apache Zeppelin, and delivered many webinars on new ecosystem components in the Big Data space.
- Developed notebooks and visualization reports using Apache Zeppelin.
- Developed an API to connect to Elasticsearch and perform partial matches, full-text matches, synonym lookups and fuzzy matching.
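A minimal, hedged sketch of the Kafka topic setup referenced in this list, assuming a Hortonworks (HDP) style install; the host names, ports, topic name, partition/replication settings and input file are illustrative placeholders, not details from this engagement.

```bash
# Sketch: create a Kafka topic and publish sample records into it.
# All host names, ports, the topic name and the input file are assumptions.

KAFKA_HOME=/usr/hdp/current/kafka-broker    # assumed HDP install path
ZOOKEEPER=zk1.example.com:2181              # assumed ZooKeeper quorum node
BROKERS=broker1.example.com:6667            # assumed broker (HDP default port 6667)
TOPIC=transactions-stream                   # hypothetical topic name

# Create the topic (HDP-era Kafka releases register topics through ZooKeeper)
$KAFKA_HOME/bin/kafka-topics.sh --create \
  --zookeeper "$ZOOKEEPER" \
  --topic "$TOPIC" \
  --partitions 6 \
  --replication-factor 3

# Publish a few sample records from a file into the topic
$KAFKA_HOME/bin/kafka-console-producer.sh \
  --broker-list "$BROKERS" \
  --topic "$TOPIC" < sample_events.json
```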
Confidential
Big Data Project Lead / Lead Hadoop Admin
Responsibilities:
- Used Sqoop to import and export data to and from an Oracle relational database, handling incremental loads of the client transaction data by date (see the sketch after this list).
- Performed extensive data analysis using Hive and Pig, and developed UDFs to perform data cleansing and transformation for ETL activities.
- Developed data pipelines by ingesting data with Sqoop and Flume, cleaning it with MapReduce, and processing it with Pig and Hive for detailed analysis.
- Installed 100+ multi-node clusters and configured various Hadoop ecosystem components such as HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, HBase, Kafka, Oozie, ZooKeeper and Spark.
- Worked extensively on combiners, partitioners and the distributed cache to improve the performance of MapReduce jobs.
- Used Flume to load streaming data from different sources into HDFS, migrated data from RDBMS to Hadoop using Sqoop for analysis, and implemented Oozie jobs for automatic data imports from the sources.
- Created HBase tables to store data depending on column families.
- Wrote extensive Hive queries for data analysis to meet the business requirements.
- Set up a test cluster on Amazon EC2 instances in the cloud.
- Installed, upgraded and managed the Hadoop cluster on the Cloudera distribution.
- Good experience with shell scripting; wrote shell scripts to detect and report corrupted HDFS blocks.
- Involved in adding and decommissioning the data nodes.
- Implemented Spark jobs using Spark SQL for faster query processing.
- Developed MapReduce jobs in Java and Python for data cleaning and preprocessing.
- Created Hive partitions and buckets and used the ORC file format for optimization.
- Took regular metadata backups, ran file system checks using the fsck command, and copied data across clusters using distcp.
- Implemented Kerberos for securing the Hadoop cluster.
- Used Nagios and Ganglia monitoring and reporting tools for alerts and reports.
- Used Pig loaders such as the XML loader and the Apache log loader for processing data.
- Used HBase as a NoSQL database for row-level retrievals, with a good understanding of HBase architecture in terms of the HMaster and RegionServers.
- Used version control tools such as Git, and used the Eclipse IDE to write MapReduce Java programs.
- Implemented a high-availability standby NameNode with the help of ZooKeeper and JournalNodes.
- Involved in setting up users and groups and assigning name and space quotas.
- Set up the Capacity Scheduler on the YARN cluster.
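A minimal, hedged sketch of the date-based incremental Sqoop import mentioned at the top of this list; the connection string, credentials file, table and column names, directories and dates are hypothetical placeholders, not values from the original engagement.

```bash
# Sketch: date-based incremental Sqoop import from Oracle into HDFS.
# The connection string, credentials file, table/column names and paths are assumptions.

LAST_VALUE="2016-01-01 00:00:00"   # last successfully imported timestamp (assumed)

sqoop import \
  --connect jdbc:oracle:thin:@//oracledb.example.com:1521/ORCL \
  --username etl_user \
  --password-file hdfs:///user/etl/.oracle_pass \
  --table CLIENT_TRANSACTIONS \
  --incremental lastmodified \
  --check-column TXN_DATE \
  --last-value "$LAST_VALUE" \
  --target-dir /data/raw/client_transactions \
  --append \
  -m 4
```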
Confidential
Hadoop Administrator and Developer
Responsibilities:
- Importing and exporting data into HDFS and Hive using Sqoop.
- Used shell scripting, Sqoop, Hive, Pig, Java and MapReduce daily to develop ETL, batch processing and data storage functionality.
- Installed/Configured/Maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Developed MapReduce jobs in Java for data cleaning and preprocessing.
- Used Oozie and Zookeeper for workflow scheduling and coordinating services.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions (see the sketch after this list).
- Use of Sqoop to import and export data from RDBMS to HDFS and vice-versa.
- Used Hive and created Hive external/internal tables and involved in data loading and writing Hive UDFs.
- Cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Nagios, and Hortonworks Enterprise.
- Developed Pig scripts to transform the data into a structured format.
- Developed Hive queries for ad-hoc query analysis.
- Developed Hive UDFs to bring all the customer information into a structured format.
- Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Developed bash scripts to fetch log files from an FTP server and process them for loading into Hive tables.
- Developed MapReduce programs to apply business rules to the data.
- Worked on analyzing data with Hive and Pig.
- Started with plain Apache Hadoop binaries and then moved to the Cloudera distribution.
- Created MapReduce jobs to parse the raw web log data into delimited records.
- Developed Pig UDFs to perform data cleansing and transformation for ETL activities.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest data into HDFS for analysis.
- Worked extensively on combiners, partitioning and the distributed cache to improve the performance of MapReduce jobs.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Responsible for cluster configuration, maintenance, troubleshooting and tuning.
- Helped design scalable Big Data clusters and solutions.
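A minimal, hedged sketch of the kind of daemon health-check script described in this list; the daemon list, alert address and fsck check are illustrative assumptions rather than the original script.

```bash
# Sketch: health check for Hadoop daemons and HDFS block integrity.
# The daemon list, alert address and check details are assumptions.

ALERT_EMAIL="hadoop-ops@example.com"   # hypothetical alert recipient
DAEMONS="NameNode DataNode ResourceManager NodeManager"

for daemon in $DAEMONS; do
    # jps lists running JVM processes; warn if the expected daemon is missing
    if ! jps | grep -qw "$daemon"; then
        echo "$(date) WARNING: $daemon is not running on $(hostname)" \
            | mail -s "Hadoop daemon alert: $daemon" "$ALERT_EMAIL"
    fi
done

# Check HDFS for corrupt or missing blocks and alert if the filesystem is not healthy
FSCK_OUT=$(hdfs fsck / 2>/dev/null | tail -n 20)
if ! echo "$FSCK_OUT" | grep -q "is HEALTHY"; then
    echo "$FSCK_OUT" | mail -s "HDFS fsck reported problems" "$ALERT_EMAIL"
fi
```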
Confidential
Linux System CRM Administrator
Responsibilities:
- Worked in Critical Production environment that deals with Linux Operating Systems.
- Regular admin tasks included building Solaris, Red Hat and Linux servers for production, development and test environments, and supporting the ones in production.
- Installed new software and performed upgrades on the servers.
- Designed the architecture of the CRM system.
- Installed Siebel 8.1.1.11 server environments and implemented a multilingual setup with 14 languages.
- Integrated with BIP 11g for report generation and with Active Directory servers for authentication.
- Integrated the application with a third-party product, JAWS, for blind employees.
- Maintaining, troubleshooting, patching, testing and developing software upgrade/update packages.
- Implemented file system partitions, performance tuning and crash analysis.
- Wrote scripts to fully automate regular activities such as repository migration across non-production environments, daily full SRF compilation, automated genb scripts, automated SRF movement, and regular health monitoring for component and disk space checks.
- Performed day-to-day health checks and maintained the servers by scheduling downtime with the users.
Confidential, OH
Linux System CRM Administrator
Responsibilities:
- Handled server installation and administration service requests logged by customers.
- Monitored and managed production, development and testing environments.
- Installed patches and other software packages; performed disk and file system management.
- Created users and added views and responsibilities.
- Sent daily health check reports for all environments to the concerned teams on time and maintained the good health of all environments.
- Performed installation, patching and troubleshooting; created and modified application-related objects; created profiles, users and roles; and maintained system security.
- Supported production and stage application defects, tracking and documenting them using Quality Center.
- Installing and configuring Apache and supporting them on Linux production servers.
- Created and cloned Linux virtual machines.
- Administered RHEL, including installation, testing, tuning, upgrading and loading patches, and troubleshooting both physical and virtual server issues.
- Set up user and group login IDs, printing parameters, network configuration and passwords, resolved permission issues, and managed user and group quotas.
- Implemented various Unix Shell Scripts for server monitoring purposes.
- Involved in resolving highly escalated issues for customer related to Siebel server set up and administration.
Confidential
Java Developer
Responsibilities:
- Used the Core Java concepts to implement the Business Logic.
- Extensively used Java collections such as ArrayList, HashMap and Hashtable.
- Used Exception handling and Multi-threading for the optimum performance of the application.
- Key responsibilities included requirements gathering, designing and developing the applications.
- Developed business components using core java concepts and classes like Inheritance, Polymorphism, Collections, Serialization and Multithreading.
- Hands-on experience in the SDLC (Software Development Lifecycle) phases of the project: analysis, design, implementation and deployment.
- Experience using HTML and JavaScript for user interface development and design; implemented JavaScript for client-side validations.
- Used Eclipse as an IDE for all development and debugging purposes.
- Developed Proof of Concepts and provided work/time estimates for design and development efforts.
- Coordinated with the QA lead on the development of the test plan, test cases, test code and actual testing; was responsible for defect allocation and ensuring that defects were resolved; and coordinated with the onsite team to provide requirements, resolve issues and review deliverables.
- Worked on product deployment, documentation and support.