Hadoop Administrator/Engineer Resume
CA
SUMMARY
- Over 8 years of work experience across IT (6 years) and non-IT (2 years) fields, including approximately 3 years of mid-level experience in Hadoop administration and support, with hands-on experience in Big Data engineering and its related integrated technologies.
- Strong in Hadoop/Big Data engineering, monitoring and support, with experience in both Hadoop 1.x and 2.x.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Working with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig and MapReduce access for the new users.
- Cluster maintenance, including commissioning and decommissioning of nodes, using monitoring tools such as Cloudera Manager and Ambari (see the sketch after this summary).
- Experience with Cloudera, Pivotal and Hortonworks Hadoop distributions.
- Performance tuning of Hadoop clusters and Hadoop MapReduce, Spark, Kafka and YARN applications.
- Screening Hadoop cluster job performance, performing capacity planning, and setting up queues and schedulers.
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Hadoop application support, maintenance and troubleshooting at the L1, L2 and L3 levels.
- Implementing access control for the Hadoop components Impala, Hive and Hue; implementing data encryption, data migration, data loss prevention, disaster recovery and data loading.
- Collaborating with application teams to install operating system and Hadoop updates, patches, and minor/major version upgrades when required.
- Considerable knowledge of automation/configuration management using Puppet.
- Experience in designing, planning, administering, installing, configuring, troubleshooting, performance monitoring and fine-tuning Hadoop clusters in lab, development and production environments.
- Excellent understanding of Hadoop architecture and its underlying framework, including storage management.
- Experience in architecting, designing, installing, configuring and managing Apache Hadoop clusters on the Hortonworks HDP, Pivotal and Cloudera distributions.
- Capacity planning and performance tuning of Hadoop clusters.
- Strong background in Linux/UNIX administration, with hands-on experience across OS environments such as RHEL, CentOS and Ubuntu.
- Hands-on experience with cloud computing platforms such as AWS (EC2 clusters) and Microsoft Azure (HDInsight clusters).
- Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation, administration and support of Hadoop.
- Ability to use a wide variety of open source technologies.
- Develop and enhance platform best practices and educate application users on best practices.
- Knowledge of best practices and IT operations in an always-up, always-available service.
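For illustration, below is a minimal sketch of the node commissioning/decommissioning workflow referenced above, assuming a plain Apache-style HDFS/YARN configuration (in practice Cloudera Manager or Ambari automates these steps); the hostname and exclude-file paths are hypothetical placeholders.

```sh
# Gracefully decommission a data node: list it in the HDFS exclude file
# referenced by dfs.hosts.exclude, then tell the NameNode to re-read it.
# The hostname and file paths below are hypothetical placeholders.
echo "worker-node-07.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes

# Wait until the node reports "Decommissioned" before stopping its daemons.
hdfs dfsadmin -report | grep -A 2 "worker-node-07"

# Repeat for YARN via the file named by yarn.resourcemanager.nodes.exclude-path.
echo "worker-node-07.example.com" >> /etc/hadoop/conf/yarn.exclude
yarn rmadmin -refreshNodes
```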
TECHNICAL SKILLS
Hadoop Tools: Hortonworks HDP 2.x, Cloudera CDH 5.x, HDFS, MapReduce2 (MR2), YARN, Hive, Pig, Impala, HBase, Spark, Kafka, Solr, Ranger, Oozie, Flume, Kerberos, AWS (EC2, S3), Elasticsearch, MS Azure (HDInsight clusters)
Other Technologies: UNIX, Java EE, C/C++, Scala, Shell, Ruby, SAS, R, Puppet, Oracle, MySQL, Red Hat RHEL 6, Ubuntu, CentOS, Docker, JIRA, HTML, CSS, JavaScript, Jenkins, Git, REST.
PROFESSIONAL EXPERIENCE
Confidential, CA
Hadoop Administrator/Engineer
Responsibilities:
- An active Hadoop engineer on Confidential's CAE Hadoop infrastructure within Confidential Analytics.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning of data nodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
- Installed, configured, and upgraded Hadoop components on the Cloudera CDH5 platform.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig and MapReduce access for the new users.
- Monitored multiple clusters (PC, Dev, New Dev, QA, New QA, and the ANA and OPR production clusters) totaling 450+ data nodes and 11 PB of HDFS data, and addressed immediate issues for users and application teams on the Analytics ad-hoc cluster.
- Worked closely with system administrators, the network, ACL and UNIX teams, BI analysts, developers, testers, and key business leaders on the application and business intelligence teams to establish SLAs and acceptable performance metrics for the Hadoop-as-a-service offering.
- Performed performance tuning of Hadoop clusters and Hadoop MapReduce, Spark, Kafka and YARN applications.
- Screened Hadoop cluster job performance, performed capacity planning, and set up queues and schedulers.
- Tuned infrastructure and Hadoop settings for optimal job performance and verified job throughput.
- Responsible for building robust environments for non-production Hadoop and other clusters.
- Troubleshot cluster issues and supported users through the JIRA ticketing system, providing L1, L2 and L3 support to customers.
- Monitored clusters for ongoing issues using Cloudera Manager and performed fixes.
- Implemented access control for the Hadoop components Impala, Hive and Hue.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and minor/major CDH version upgrades when required.
- Applied CDH patch upgrades, activated Cloudera parcels, and performed version upgrades.
- Set up Kerberos principals on the KDC server, tested HDFS, Hive, Pig and MapReduce access for new users, and created keytabs for service IDs using keytab scripts (see the keytab sketch after this list).
- Submitted firewall requests through Network Analyzer, Tufin and Blueprint.
- Documented setup processes and technical details in the Confluence wiki.
- Worked with different teams such as the UNIX team, the offshore team and the Chase merchant teams.
- Worked with the Cloudera vendor team to resolve cluster issues, following instructions received through the Cloudera support portal.
- Provided security and authentication with Kerberos, which works by issuing Kerberos tickets to users.
- Onboarded new users migrated to our clusters and trained them on best practices.
- Requested user access through the EMC Viper ticketing system.
- Helped users and teams with incidents related to administration and development.
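For illustration, a minimal sketch of the keytab workflow described above, assuming MIT Kerberos tooling on the KDC host; the realm EXAMPLE.COM, the service ID svc_etl and the paths are hypothetical placeholders.

```sh
# On the KDC host: create a principal for a new service ID and export its keytab.
kadmin.local -q "addprinc -randkey svc_etl@EXAMPLE.COM"
kadmin.local -q "xst -k /etc/security/keytabs/svc_etl.keytab svc_etl@EXAMPLE.COM"

# Lock the keytab down to the service account before handing it over.
chown svc_etl:hadoop /etc/security/keytabs/svc_etl.keytab
chmod 400 /etc/security/keytabs/svc_etl.keytab

# As the service user: obtain a ticket from the keytab, then smoke-test access.
kinit -kt /etc/security/keytabs/svc_etl.keytab svc_etl@EXAMPLE.COM
hdfs dfs -ls /user/svc_etl
hive -e 'SHOW DATABASES;'
```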
Environment: Cloudera CDH5, YARN, Resource Manager, HDFS, Hive, Hue, Kerberos, Cloudera Manager, ZooKeeper, Fair Scheduler, JIRA, RHEL, PaaS, IaaS, Viper, Tufin, Network Analyzer, Firewall Blueprint, Tableau, Kafka, Flume, HBase, Oozie, SSL, Solr, Confluence wiki, Impala, Sqoop, Flafka, Queues, ESP Scheduler, Ab Initio, Datameer, Git and GPFS
Confidential, CT
Big Data Engineer / Hadoop Administrator
Responsibilities:
- An active Hadoop engineer in Confidential's Hadoop infrastructure group, Integrated Systems Engineering (ISE).
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Responsible for cluster maintenance, commissioning and decommissioning of data nodes, cluster monitoring, troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Responsible for building robust environments for non-production Hadoop and other clusters.
- Installed and managed Hortonworks HDP DL and DW components per documentation.
- Worked on the Hortonworks HDP 2.3 distribution with services including HDFS, MapReduce2, Tez, Hive, Pig, HBase, Sqoop, Flume, Spark, Ambari Metrics, ZooKeeper, Falcon and Oozie across four clusters (LAB, DEV, QA and PROD) totaling 350+ nodes with 7 PB of data.
- Monitored Hadoop cluster health and configuration through the Ambari monitoring system.
- Led the installation, configuration and deployment of product software on new edge nodes that connect to the Hadoop cluster for data acquisition.
- Rendered L1/L2/L3 support services for BI users, Developers and Testers through Jira ticketing system.
- Managed and reviewed Log files as a part of administration for troubleshooting purposes.
- Communicated and followed up appropriately on tickets raised by BI users, testers and developers through the JIRA ticketing system.
- Performance tuning and ETL, agile software deployment, team building and leadership, and engineering management.
- Set up Kerberos principals on the KDC server, tested HDFS, Hive, Pig and MapReduce access for new users, and created keytabs for service IDs using keytab scripts.
- Capable of upgrading the HDP version in a production environment from HDP 2.3.x to 2.4.x.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance and capacity planning using Ambari.
- Installed the OLAP software AtScale on its designated edge-node server.
- Applied UNIX administration skills as required, accessing servers through PuTTY and the terminal.
- Conducted cluster sizing, tuning, and performance benchmarking on a multi-tenant OpenStack platform to achieve desired performance metrics.
- Provided solutions to users who encountered Java exceptions and errors while running SAS data models and R scripts.
- Good understanding of memory usage when running random-forest data models in R.
- Guided the team working on data ingestion, pulling data (via Sqoop) from traditional RDBMS platforms such as Oracle, MySQL and Teradata into the Hadoop cluster using automated ingestion scripts, and storing data in NoSQL databases such as HBase and Cassandra.
- Provided security, authorization and authentication with Ranger, where Ranger Admin provides administration and UserSync adds new users to the cluster.
- Created and maintained technical documentation for Hive server launch and troubleshooting, installation of product software on new edge nodes, and Hive queries and Pig scripts, and kept it in Confluence for users' reference.
- Experienced in adding and removing components through Ambari on HDP and manually on EC2 clusters.
- Proficient in using OLAP cubes to design backend data models and entity-relationship diagrams (ERDs) for star schemas, snowflake dimensions, fact tables and datasets using AtScale.
- Good troubleshooting skills across the overall Hadoop stack, ETL services, and the Hue and RStudio GUIs that developers and business users rely on for day-to-day activities.
- Set up queues and allocated cluster resources to prioritize Hive jobs (see the queue sketch after this list).
- Hands-on experience with Spark, Scala and Spark SQL running through the Spark/Hive context.
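For illustration, a minimal sketch of Capacity Scheduler queue setup of the kind described above; the queue names, capacities and the sample table are hypothetical placeholders.

```sh
# Hypothetical capacity-scheduler.xml settings carving the cluster into two queues:
#   yarn.scheduler.capacity.root.queues               = etl,adhoc
#   yarn.scheduler.capacity.root.etl.capacity         = 70
#   yarn.scheduler.capacity.root.etl.maximum-capacity = 90
#   yarn.scheduler.capacity.root.adhoc.capacity       = 30

# Apply the queue changes without restarting the ResourceManager.
yarn rmadmin -refreshQueues

# Route a Hive-on-Tez job to the priority queue.
hive -e "SET tez.queue.name=etl; SELECT COUNT(*) FROM default.sample_table;"
```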
Environment: Hortonworks HDP 2.3, MapReduce2, YARN, Resource Manager, HDFS, Hive, Hue, Kerberos, Ambari, Capacity Scheduler, PaaS, IaaS, JIRA, SAS, RStudio, AtScale, NiFi, Tableau, ZooKeeper, Kafka, Flume, Falcon, Tez, HBase, Oozie, SSL, Solr, RHEL 6, Confluence, Ranger, Sqoop, Queues, Flight Tracker, Tomcat, Datameer, Jenkins, H2O, GitLab and Maven.
Confidential, MI
Hadoop/Java DevOps Consultant
Responsibilities:
- Built POC, Dev and QA environments for multiple projects, spanning Hadoop cluster architecture, installation, configuration and management.
- Performed Benchmarking and performance tuning on the Hadoop infrastructure.
- Researched various Java exceptions and troubleshooting methods.
- Deployed developers' source code using tools such as Maven, Ant, Jenkins and Git.
- Provided L1/L2/L3 troubleshooting of Java exceptions and errors that occurred while running applications.
- Transferred data from RDBMS to Hive/HDFS, and from Hive/HDFS back to RDBMS, using Sqoop (see the sketch after this list).
- Hands-on experience with Spark, Scala and Spark SQL running through the Spark/Hive context.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Managed two additional POC clusters running the MapR and Cloudera CDH3 distributions.
- Managed big data and created datasets using the Pig and Hive query languages.
- Provided complete big data management and application troubleshooting support.
- Maintained Hadoop clusters on the Hortonworks HDP distribution as well as CDH3.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs.
- Applied UNIX administration skills as required, working from the CLI and terminal.
- Implemented SFTP/SCP transfers for projects to move data from external servers to cluster servers.
- Installed and managed multiple Hadoop clusters: production, stage and development.
- Installed and managed a 150-node production cluster with 4+ PB of data.
- Implemented a dual-data-center setup for all Cassandra clusters. Performed complex system analysis to improve ETL performance and identified highly critical batch jobs to prioritize.
- Involved in analyzing system failures and identifying root causes, and recommended courses of action, using lab clusters to reproduce issues.
- Designed cluster tests to validate cluster status before and after upgrades.
- Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using Cloudera Manager.
- Automated data loading between production and disaster recovery cluster.
- Helped users and teams with incidents related to administration and development.
- Guided users in development and worked closely with developers to prepare a data lake.
- Created partitioned Hive external tables for loading the parsed data.
- Developed various workflows using custom MapReduce, Pig and Hive, and scheduled them with Oozie.
- Experienced in adding and removing components through Ambari on HDP.
- Worked as a lead on big data integration and analytics based on Hadoop, BI tools and webMethods technologies.
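For illustration, a minimal sketch combining the Sqoop transfer and the partitioned Hive external table described above; the connection string, credentials, paths and table layout are hypothetical placeholders.

```sh
# Pull one day of an RDBMS table into HDFS with Sqoop (-P prompts for the password).
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username loader -P \
  --table orders \
  --target-dir /data/raw/orders/dt=2015-06-01 \
  --num-mappers 4

# Expose the files to Hive as a partitioned external table and register the partition.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS orders (
  order_id BIGINT, customer_id BIGINT, amount DOUBLE)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/orders';
ALTER TABLE orders ADD IF NOT EXISTS PARTITION (dt='2015-06-01')
LOCATION '/data/raw/orders/dt=2015-06-01';"
```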
Confidential
Business Promoter (Hadoop Admin + Marketing + Promotions)
Responsibilities:
- Install, configure, and upgrade Hadoop components in Cloudera.
- Troubleshoot cluster issues and support developers running MapReduce or Tez jobs in Pig and Hive.
- Proactively optimize and tune cluster configuration for performance.
- Organize, document, and improve processes around data inventory management.
- Monitor Hadoop cluster with Cloudera Manager, Nagios, and Ganglia.
- Manage cluster resources and multiple concurrent workloads to maintain availability and SLAs.
- Implement cluster backups and manage production data deployments between data centers (see the DistCp sketch after this list).
- Implement and maintain security policies in the Hadoop & Linux environment.
- Research latest developments in the Hadoop open source platform and recommend solutions and improvements.
- Evaluate tools and technologies to improve cluster performance and ETL processing.
- Install, configure, and administer Linux cloud-based servers running in Amazon Web Services.
- Create documentation on cluster architecture, configuration, and best practices.
- Work with Hortonworks support team to investigate and resolve tickets for cluster or developer issues.
- Contribute to Hadoop architecture, Hive ETL development, and data QA.
- Worked on various distributions, including Cloudera (PROD) and MapR for another cluster.
- Implemented Hadoop on the AWS EC2 system using a few instances to gather and analyze data log files; developed technical solutions to business problems.
- Applied UNIX administration skills as required, accessing servers through PuTTY and the terminal.
- Hands-on experience installing, configuring, administering, debugging and troubleshooting clusters.
- Architecture design and implementation of deployment, configuration management, backup, and disaster recovery systems and procedures.
- Extensive knowledge of troubleshooting cluster-related issues.
- Sanpro was a start-up that was piloting big data in its environment; the aim was to promote a BI product named Go Digital Student ID at colleges.
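For illustration, a minimal sketch of a cross-data-center backup of the kind described in the backups bullet above, using DistCp; the NameNode hostnames and paths are hypothetical placeholders.

```sh
# Copy a production dataset to the disaster-recovery cluster with DistCp;
# -update copies only changed files, -p preserves file attributes
# (permissions, ownership, replication).
hadoop distcp -update -p \
  hdfs://prod-nn01:8020/data/warehouse \
  hdfs://dr-nn01:8020/data/warehouse

# Sanity-check by comparing rolled-up sizes on both sides.
hdfs dfs -du -s -h hdfs://prod-nn01:8020/data/warehouse
hdfs dfs -du -s -h hdfs://dr-nn01:8020/data/warehouse
```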
Confidential
Data Analyst
Responsibilities:
- UNIX/Linux environment, data modelling, data analysis, quality control (QC), GIS, data profiling.
- Processing and analyzing geo data using the Java-based GIS application "Atlas".
- Responsible for loading, extracting and validation of client data.
- Liaising with end users and 3rd party suppliers.
- Analyzing raw data, drawing conclusions & developing recommendations.
- Editing, cleansing & processing data using Excel, Access and SQL.
- Worked in a complete UNIX/Linux environment, involving UNIX/Linux command-line work as well as SQL querying.
- Quality-controlling (reviewing) geo-data labellers' initial-level work at the QA level.
- Data mining using applications and databases such as Oracle, SAP and MySQL.
- Communicating work status to clients every week through presentations and reports.
- Composing PowerPoint slides for trainers who train labellers and QC staff for quality assurance at the testing stage.
- Interacting with various departments involved in SDLC (Trainers, Developers, BI analysts, Team Leaders, Testers).
- Used Informatica and SAS to extract, transform and load source data from transaction systems.
Confidential
UNIX / Linux Package Installer
Responsibilities:
- Administration of RHEL which includes installation, testing, tuning, upgrading and troubleshooting both physical and virtual server issues.
- Managing disk file systems and server performance, creating users, and granting file-access permissions.
- RPM and YUM package installation, patching, and other server management.
- Responsible for handling tickets raised by end users, including package installation, login issues and access issues.
- User management: adding, modifying, deleting and grouping users (see the sketch at the end of this section).
- Responsible for preventive maintenance of the servers on a monthly basis.
- Configuring RAID on the servers and running crontab jobs to back up data.
- Resource management using disk quotas.
- Documenting issues daily in the resolution portal.
- Responsible for change-management releases scheduled by service providers.
- Generating weekly and monthly reports on the tickets worked and reporting them.
- Managing systems operations with final accountability for smooth installation, networking, operation and troubleshooting of hardware and software in a Linux environment.
- Creating groups, adding users to groups, and removing users from groups.
- Established and implemented firewall rules; validated the rules with vulnerability-scanning tools.
- Troubleshooting users' login and home-directory related issues.
- Monitoring System Metrics and logs for any problems.
- Applied Operating System updates, patches and configuration changes.
- Maintaining the MySQL server and granting database authentication to required users.
- Documented various administrative technical issues for this training institute.
- This was a start-up training company for which I often worked on a freelance basis.
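For illustration, a minimal sketch of the routine user-management, cron-backup and disk-quota tasks listed above; the account names, paths and limits are hypothetical placeholders.

```sh
# Create a group and a user, then add the user to a supplementary group.
groupadd developers
useradd -m -g developers -s /bin/bash jdoe
usermod -aG wheel jdoe

# Nightly 2 a.m. backup of /var/www to a dated tarball (note: this replaces
# root's existing crontab, and % must be backslash-escaped in a crontab entry).
echo '0 2 * * * tar -czf /backup/www-$(date +\%F).tar.gz /var/www' | crontab -

# Give the user a 5 GB soft / 6 GB hard block quota on /home (quotas enabled there).
setquota -u jdoe 5242880 6291456 0 0 /home
```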