Hadoop Administrator Resume
Chicago, IL
SUMMARY:
- 8 years of IT experience, including 3 years in Big Data technologies. Strong experience as a Hadoop Administrator responsible for the smooth running and day-to-day operation of a mission-critical Hadoop cluster. Involved in performing upgrades, managing configuration changes, maintaining system integrity, and monitoring cluster performance in a multi-tenant environment.
- Strong communication skills and a professional attitude; able to work under pressure and support client Hadoop clusters to their full potential.
- HDP CERTIFIED ADMINISTRATOR (HDPCA).
- Expertise in the Hadoop ecosystem, including Hive, HBase, YARN, MapReduce, Sqoop, Pig, Kafka, Storm, and Spark.
- In-depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
- Expertise in managing multiple Hadoop clusters (Production, DR, Stage, Dev, and Lab environments) totaling 300+ nodes.
- Designed and built end-to-end Hadoop cluster in all the environments.
- Practical knowledge of the functionality of each Hadoop daemon, the interactions between them, resource utilization, and dynamic tuning to keep the cluster available and efficient.
- Strong experience configuring security (authentication, authorization, and impersonation) for Big Data ecosystem technologies such as Hive, HBase, Ranger, and Hue.
- Set up disks for HDP, handled disk failures, configured storage pools, and worked with Logical Volume Manager (LVM).
- Managed data with volumes and worked with snapshots, mirror volumes, data protection, and scheduling.
- Replicated cluster metadata and stored it in a separate location to avoid metadata loss.
- Experience in benchmarking and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster (a small backup sketch appears at the end of this summary).
- Experience in setting up and managing High Availability to avoid single points of failure on large Hadoop clusters.
- Worked in multi-cluster environments, setting up Cloudera, Hortonworks, MapR, and IBM BigInsights Hadoop ecosystems.
- Experience in configuring ZooKeeper to provide cluster coordination services.
- Experience in writing UNIX Shell scripts for various purposes like file validation, automation of ETL process and job scheduling using Crontab.
- Introduced the YARN Capacity Scheduler in Hadoop.
- Planned and executed Hadoop cluster upgrades.
- Installed and configured HAWQ on Hadoop cluster.
- Introduced the Jethro acceleration engine to enable real-time Business Intelligence with SAP BO and Tableau.
- Introduced SAP HANA Vora in-memory services on the Hadoop cluster and integrated them with Spark.
- Installed Ranger and added security by providing authorization and auditing.
- Configured the H2O machine learning and predictive analytics engine on Hadoop.
- Implemented user authentication and authorization for the cluster using Kerberos.
- Worked with Sqoop and Flume to load data from databases and other sources into HDFS.
- Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes.
- Customer-facing role, interacting with customers and resolving the issues they raised.
- Good understanding of NoSQL databases such as HBase and MongoDB.
- Experienced in defining job flows with Oozie; configured MySQL as the backing database for Hue and Oozie.
- Excellent knowledge of Data Warehouse life-cycle design and implementation, familiarity with entity-relationship and multidimensional modeling (star schema, snowflake schema), and strong knowledge of MDX and OLAP cubes.
- Strong problem-solving skills and a strong knowledge of ITIL, SDLC Processes.
- Experience in setting up scheduling, deploying and subscriptions in Report Manager.
- Handled L1 database administration activities such as installation, configuration, tuning, and migration of databases, along with database design and management. Good knowledge of RAC and Data Guard.
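
A minimal sketch of the NameNode metadata backup and cron scheduling mentioned above; the script name, backup directory, and schedule are illustrative placeholders, not the exact production setup:

```bash
#!/bin/bash
# backup_fsimage.sh (hypothetical name): pull the latest fsimage from the
# active NameNode into a dated directory on separate storage.
# Run as the hdfs superuser.
BACKUP_DIR=/backup/namenode/$(date +%Y%m%d)   # placeholder backup location
mkdir -p "$BACKUP_DIR"
hdfs dfsadmin -fetchImage "$BACKUP_DIR"       # downloads the most recent fsimage

# Example crontab entry to run the backup nightly at 01:30:
# 30 1 * * * /opt/scripts/backup_fsimage.sh >> /var/log/fsimage_backup.log 2>&1
```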
TECHNICAL SKILLS:
Hadoop/BigData Technologies: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume and Oozie
Programming Languages: Java, SQL, PL/SQL, Shell Scripting
BI Tools: Microsoft Business Intelligence, Report Builder, JasperSoft, Splunk
Web Technologies: HTML, XML, JavaScript, Ajax, SOAP, CSS, jQuery
Databases: Oracle 9i/10g/11g, SQL Server, MySQL
Database Tools: TOAD, Billing tool, Oracle Warehouse Builder (OWB).
Operating Systems: Linux, Unix, Windows, CentOS, RedHat
Other Concepts: OOP, Data Structures, Algorithms, Software Engineering, ETL, Puppet, SVN, VSS
PROFESSIONAL EXPERIENCE:
Confidential, Chicago, IL
Hadoop Administrator
Responsibilities:
- Responsible for cluster maintenance and monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing backups and log files.
- Involved in cluster capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop cluster.
- Worked with the technical architect in upgrading and increasing the size of Hadoop cluster.
- Coordinated with different teams on user issues and resolved them.
- Monitored the Hadoop cluster with the Ambari GUI to ensure the health of Hadoop services.
- Worked with the Hortonworks support team to resolve issues and obtain recommendations.
- Day-to-day responsibilities included resolving developer issues, handling deployments, moving code between environments, provisioning access for new users, providing immediate workarounds to reduce impact, and documenting them to prevent recurring issues.
- Planned and prepared use cases for new Hadoop services and tested them in a sandbox by installing them through Ambari.
- Working experience in designing and implementing complete end-to-end Hadoop Infrastructure which includes all Hadoop Ecosystem.
- Performed rolling and express upgrades from HDP 2.3.2 to HDP 2.4.3.
- Upgraded Ambari 2.1.2 to Ambari 2.4.1.
- Configured Grafana and Zeppelin dashboards for monitoring and cost analysis.
- Configured Knox as a single access point for all REST interactions with Hadoop clusters.
- Configured the Capacity Scheduler with multiple queues and priorities for Hadoop workloads.
- Set up and managed High Availability for the NameNode, ResourceManager, HiveServer2, Hive Metastore, and Oozie to avoid single points of failure in large clusters.
- Implemented the Kerberos authentication infrastructure: KDC server setup, realm creation, managing principals, and generating and managing keytab files for every service using keytab tools (see the sketch after this list).
- Designed and allocated HDFS quotas for multiple groups.
- Created Hive databases and granted appropriate permissions through Ranger policies.
- Introduced SmartSense to obtain tuning recommendations from the vendor and to help troubleshoot issues.
- Moved data from Netezza, Teradata, and DB2 into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Experience supporting data analysts in running Pig and Hive queries.
- Responsible for the design and creation of Hive tables and worked on performance optimizations such as partitioning and bucketing in Hive. Handled incremental data loads from RDBMSs into HDFS using Sqoop.
- Used the Oozie scheduler to automate pipeline workflows and orchestrate the Sqoop, Hive, and Pig jobs that extract data on a schedule.
- Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
- Implemented cross-realm trust between two clusters to enable DistCp.
- Wrote complex Hive and SQL queries for data analysis to meet business requirements.
- Designed and installed SAP HANA Vora on top of the Hadoop cluster and integrated it with the Spark Controller.
- Exported analyzed data to downstream systems using Sqoop for generating end-user reports, Business Analysis reports and payment reports.
- Worked with the Splunk team on getting data/logs from AWS and Azure.
- Experienced working with Apache Solr for indexing and querying.
- Created custom Solr query components to enable optimal search matching.
- Handled development operations using Git and Puppet: configuring modules, uploading them to the master server, and applying them on client servers.
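
As a rough illustration of the keytab provisioning described in the Kerberos bullet above, assuming an MIT KDC; the realm, hostnames, and keytab paths are placeholders:

```bash
# On the KDC host: create a service principal and export its keytab.
sudo kadmin.local -q "addprinc -randkey nn/master01.example.com@EXAMPLE.COM"
sudo kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/master01.example.com@EXAMPLE.COM"

# Verify the keytab contents and test authentication with it.
klist -kt /etc/security/keytabs/nn.service.keytab
kinit -kt /etc/security/keytabs/nn.service.keytab nn/master01.example.com@EXAMPLE.COM
```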
BigData Infrastructure Engineer
Responsibilities:
- Installed, configured, and maintained single-node and multi-node Hadoop clusters for dev and prod.
- Interacted with the UNIX server management team to set up multiple virtual RedHat/CentOS application servers on a single physical box.
- Setup cluster environment for Highly Available systems.
- Installed Cloudera Hadoop CDH 4 and CDH 5 on Linux based Dev servers.
- Upgraded the Hadoop distribution from CDH 4 to CDH 5 on Linux servers.
- Implemented a High Availability configuration for the Hadoop NameNode.
- Installed Pig and Hive per application requirements.
- Configured Sqoop to import data from external databases, SQL Server and MySQL (see the sketch after this list).
- Handled user management on Hadoop for HDFS and MapReduce.
- Set up Hive with a remote metastore and the NoSQL store HBase.
- Performed monthly Linux server maintenance: shutting down the Hadoop NameNode, DataNodes, JobTracker, and TaskTrackers, and restarting Hadoop services including YARN.
- Kerberized the cluster setup to implement security.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Worked with users to resolve issues related to access and jobs running on the cluster.
- Worked with vendors to resolve product-related issues.
- Copied data from one cluster to another using DistCp and automated the copy procedure with shell scripts.
- Helped the team install monitoring systems such as Nagios to check cluster health, and wrote bash scripts to monitor critical components and deliver instant alerts.
- Installed configuration-management tools such as Chef on the cluster.
- Experience in installing, configuring, supporting, and managing CDH in AWS.
- Supported technical team members and formulated procedures in the installation of Hadoop patches, updates and version upgrades.
- Experience in group and user management, file systems backup, crontab scheduling tasks in Linux environment.
- Developed a high-performance cache, stabilizing the site and improving its performance.
- Implemented best income logic using Pig scripts and UDFs.
- Worked on Hive to expose data for further analysis and to transform files from various analytical formats into text files.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
- Supported code/design analysis, strategy development, and project planning.
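
A hedged sketch of the kind of Sqoop import referenced above; the MySQL host, database, table, and credential paths are hypothetical:

```bash
# Import a MySQL table into HDFS as text files, using 4 parallel mappers.
sqoop import \
  --connect jdbc:mysql://dbhost01.example.com:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db.password \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4 \
  --as-textfile
```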
Hadoop /BigData Analyst
Responsibilities:
- Set up automated processes to send alerts in case of predefined system and application level issues.
- Designed architecture of Hadoop systems with the component layout.
- Installed and managed a production Hadoop cluster of 150+ nodes with a storage capacity of 10 PB, running the HDP distribution with Ambari 1.7 and HDP 2.1.3.
- Upgraded the production cluster from Ambari 1.7 to 2.1 and from HDP 2.1 to 2.2.6.
- Benchmarking and tuning of newly built Hadoop clusters.
- Installed, upgraded, and managed multiple clusters (stage, dev, and lab).
- Automated data loading between clusters with retention policies.
- Developed Hive and Pig scripts as part of the data loading and enrichment process.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume.
- Developed the Sqoop scripts for loading data to Hadoop from multiple databases.
- Extracting the data from the Hive tables for data analysis.
- Implemented authentication service using Kerberos authentication protocol.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Java MapReduce programs to transform log data into a structured form and derive user location, age group, and time spent.
- Implemented partitioning, dynamic partitions, and bucketing in Hive (see the sketch after this list) and wrote MapReduce programs to analyze and process the data.
- Worked with firewall teams to have infrastructure setup with firewall exceptions.
- Participated in capacity planning for clusters by providing metrics of the cluster utilization.
- Worked to keep the Hadoop infrastructure operational across environments with 24x7 support.
- Resolving user tickets and issues proactively and documenting a run book for users for reference.
- Worked with multiple teams to set up automatic user onboarding to Hadoop once the necessary approvals are done.
- Provided timely, independent support to Night Ops to minimize business disruptions for L1, L2, and L3 applications. Closed hundreds of incident and problem tickets within SLAs, with an excellent track record for meeting deadlines.
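
An illustrative sketch of the Hive partitioning and bucketing mentioned above; the database, table, and column names are invented for the example:

```bash
# Create a partitioned, bucketed ORC table and load it with a dynamic-partition insert.
hive -e "
CREATE TABLE IF NOT EXISTS analytics.user_events (
  user_id    BIGINT,
  event_type STRING,
  event_ts   TIMESTAMP
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE analytics.user_events PARTITION (event_date)
SELECT user_id, event_type, event_ts, to_date(event_ts)
FROM analytics.user_events_staging;
"
```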
Systems Administrator
Responsibilities:
- Involved in configuration and support of a production environment hosted in a 24x7 setup.
- Responsible for installing, configuring, maintaining, and monitoring system performance and capacity of all Linux servers in Production, Development, and Test environments.
- Installed and maintained security patches on operational and development systems, including RedHat Linux and Apache web services.
- Monitored system availability and server resources and performed day-to-day activities on Linux servers.
- Responsible for daily Linux OS administration, cluster administration, DNS administration, and maintenance of Linux systems.
- Created YUM reports and managed application- and system-level deployments.
- Handled configuration management using tools like Puppet and shell scripts; responsible for deploying applications, assisting developers, and performing root-cause analysis.
- Performed Jumpstart/Kickstart installations of RHEL/OEL using PXE/TPM.
- Experience managing different storage types, SAN (Promise, IBM) and NAS; capacity management through BMC for physical servers and virtual machines.
- Managed shared NFS file systems: mounting and unmounting NFS exports on remote clients, sharing remote folders, and starting and stopping the NFS service. Experience configuring, troubleshooting, and analyzing hardware and software failures on various UNIX servers.
- Troubleshot hardware and OS-related issues faced by various teams, including support and DBAs, on business-critical servers.
- Experience in package management using RedHat RPM/YUM. Basic database knowledge: inserting statements into databases and running queries to retrieve results for a request.
- Responsible for first and second level problem analysis and resolution for system and application tools utilizing already existing UNIX scripts and Linux run books.
- Installation and support of various versions of Oracle, SQL Server and MySQL databases.
- Used various networking tools such as SSH, telnet, rlogin, tcpdump, snoop, Wireshark, FTP and ping to troubleshoot daily networking issues.
- Worked with developers to integrate their web applications with standard infrastructure.
- Wrote scripts to monitor log files and send alerts (see the sketch after this list). Involved in the recovery of UNIX servers.
- Worked on the HPSM request system to resolve and close user request tickets in coordination with other teams such as Engineering, Database, Application, and the Linux & Solaris groups.
- Explored new technologies and design methodologies that enhance the efficiency of system administration and support for my team.
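
A minimal sketch of the log-monitoring alert scripts mentioned above; the log path, match pattern, and recipient address are placeholders:

```bash
#!/bin/bash
# Scan the tail of a log for error patterns and mail any matches to on-call.
LOGFILE=/var/log/messages
PATTERN="error|fail|critical"
RECIPIENT="oncall@example.com"

if tail -n 500 "$LOGFILE" | grep -Ei "$PATTERN" > /tmp/log_alerts.$$; then
    mail -s "ALERT: issues found in $LOGFILE on $(hostname)" "$RECIPIENT" < /tmp/log_alerts.$$
fi
rm -f /tmp/log_alerts.$$
```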
Linux Administrator
Responsibilities:
- Installing, performance tuning, monitoring, and managing Linux systems.
- Installing and configuring Red Hat Enterprise Linux and CentOS servers.
- Performing administration tasks: creation, deletion, quotas, permissions, and file system administration.
- Installing packages on Linux servers using YUM and RPM utilities.
- Installing and Configuring Services like SSH, FTP, HTTP, SAMBA, DNS, DHCP, NFS.
- Performing mount and unmount operations on file systems.
- Accessing clients via Telnet, RSH, and SSH.
- Experienced in troubleshooting and resolving network issues related to the server.
- Responsible for extending volume groups, LVM logical volumes, and swap space (see the sketch after this list).
- Maintaining and scheduling crontab jobs.
- Maintaining and configuring NFS, the FTP server, and AutoFS (automount) for file sharing.
- Configuring and managing apache web server.
- Configuring YUM server as a repository.
- Configuring LVMs per requirements, managing and creating file systems, setting file permissions, and monitoring system performance.
- Creating and assigning users and groups and managing user rights and security.
- Implementation of ACL permissions.
- Backing up files with extended attributes using tar and configuring scheduled tasks using cron.
- Using various monitoring applications for better analysis while maintaining 100+ production servers.
- Implementing and managing backup and recovery strategies.
- Troubleshooting and analyzing issues with Linux and Linux-resident applications.
- Implementing necessary patches, service packs, cumulative upgrades, hotfixes and upgrades for meeting security compliance and maintaining application integrity.
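
A short sketch of the volume group and LVM extension work referenced above, assuming an ext4 filesystem; the device, volume group, and logical volume names are placeholders:

```bash
pvcreate /dev/sdb1                      # initialize the new disk/partition for LVM
vgextend vg_data /dev/sdb1              # add it to the existing volume group
lvextend -L +50G /dev/vg_data/lv_app    # grow the logical volume by 50 GB
resize2fs /dev/vg_data/lv_app           # grow the ext4 filesystem online
```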
MSBI/SQL Developer
Responsibilities:
- Analysis, design, and development of databases, SSIS packages for migration, and SSRS reports for auditing migrated data.
- Created YTD, MTD, and WTD sales and inventory reports at the store level.
- Created several tables, views, and stored procedures using SQL Server.
- Developed several SSIS packages and data flow tasks.
- Extracted data from different sources and stored in SQL Staging tables in BI Staging database.
- Strong experience creating SSIS (SQL Server Integration Services) packages to extract data from OLTP to OLAP systems (SQL Server) and scheduling jobs to call the packages and stored procedures.
- Wrote T-SQL queries and stored procedures and used them to build packages and reports (see the sketch after this list).
- Responsible for optimizing all indexes, SQL queries, stored procedures to improve the quality of software.
- Used Snapshots and Caching options to improve the performance of Report Server.
- Designed and implemented user login and security in SSRS.
- Worked on SSIS packages and SSIS Import/Export for transferring data from heterogeneous sources (Oracle and text-format data) to SQL Server.
- Involved in providing production support services and solutions for applications by analyzing, resolving, and implementing fixes for problem tickets and application enhancements.
- Created reports in SSRS with various features such as chart controls, filters, interactive sorting, and SQL parameters, accessible through the business GUI application.
- Scheduled SSIS packages to trigger via the TWS scheduler and raised requests for ad-hoc package runs for both exporting and importing data.
- Responsible for optimizing all indexes, SQL queries, stored procedures to improve the performance.
- Provided graphs and an update sheet of application usage and performance through daily monitoring before business hours to reduce incidents.
- Used TortoiseSVN for version control and team logging.
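
To keep the examples in one language, here is a shell-wrapped sketch of the kind of T-SQL stored procedure referenced above; the server, database, table, and procedure names are hypothetical:

```bash
# Create a simple stored procedure that summarizes a week of sales by store.
sqlcmd -S sqlhost01 -d SalesDW -Q "
CREATE PROCEDURE dbo.usp_WeeklySalesByStore
    @WeekStart DATE
AS
BEGIN
    SELECT StoreId,
           SUM(SaleAmount) AS TotalSales
    FROM dbo.FactSales
    WHERE SaleDate >= @WeekStart
      AND SaleDate <  DATEADD(DAY, 7, @WeekStart)
    GROUP BY StoreId;
END;
"
```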