Hadoop Admin Resume

San Jose, CA

SUMMARY

  • Over 6 years of experience in the IT field, including 3 years of Hadoop Administration across diverse industries, with hands-on experience in Big Data ecosystem technologies.
  • Excellent understanding of Hadoop architecture and underlying framework including storage management.
  • Good experience with the design, management, configuration and troubleshooting of distributed production environments based on NoSQL technologies such as MongoDB, Apache Hadoop/HBase and Couchbase.
  • Experienced in defining, designing, integrating and re-engineering Enterprise Data Warehouses and Data Marts of varying complexity in environments such as Teradata.
  • Extensive database experience and strong SQL skills across Oracle, MS SQL Server, Teradata, Sybase and MS Access, as well as mainframe and flat-file sources.
  • Experience with object-oriented programming (OOP) concepts using Python, C++ and PHP. Hands-on experience developing Web Services in Python.
  • Rich SAS experience with exposure to Base SAS, Macros, SAS CONNECT, SAS ACCESS, SAS ODS and the SAS 9 BI Suite (SAS ETL, SAS Management Console, SAS Enterprise Guide, SAS Stored Procedures, SAS Add-in for Microsoft Office).
  • Experience in the Hadoop Ecosystem including HDFS, Hive, Pig, HBase, Oozie and Sqoop, and knowledge of the MapReduce framework.
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure.
  • Good experience in Hadoop cluster capacity planning and in designing NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker layouts.
  • Hands-on experience in installation, configuration, management and development of big data solutions using Apache, Cloudera (CDH3, CDH4) and Hortonworks distributions.
  • Good experience designing, configuring and managing backup and disaster recovery for Hadoop data.
  • Experience in setting up automated monitoring on Hadoop clusters using Nagios and Ganglia.
  • In-depth knowledge of the modifications required in the static IP (interfaces), hosts and bashrc files, setting up password-less SSH and Hadoop configuration for cluster setup and maintenance (a minimal setup sketch follows this list).
  • Good understanding of Cassandra monitoring and of the Cassandra Query Language (CQL).
  • Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Strong experience as a Java Developer in Web/intranet, client/server technologies using J2SE, J2EE Servlets, JSP, JDBC and SQL.
  • Extensive experience in developing applications using HTML, jQuery, Dojo Toolkit, JSP, Servlets, JavaBeans, EJB, JSTL, JSP Custom Tag Libraries, JDBC, JMS publish/subscribe, JNDI, JavaScript, XML, XSLT, JAXB.
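
A minimal sketch of the password-less SSH setup referenced above, assuming a dedicated hadoop user and illustrative host names (master1, worker1, worker2 are placeholders, not values from an actual cluster):

  # Generate a key pair for the hadoop user (no passphrase); run once on the admin/master node
  ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

  # Push the public key to every node so the master can reach them without a password
  for host in master1 worker1 worker2; do
      ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@"$host"
  done

  # Confirm that no password prompt appears
  ssh hadoop@worker1 hostname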

TECHNICAL SKILLS

Big Data Technologies: HDFS, Hive, MapReduce, Pig, Sqoop, Oozie, ZooKeeper, YARN, Avro, Spark

Scripting Languages: Shell, Python, Perl

Programming Languages: Java, C++, C, SQL, PL/SQL

Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP

Java Frameworks: MVC, Apache Struts 2.0, Spring and Hibernate

Web Services: SOAP (JAX-WS), WSDL, SOA, RESTful (JAX-RS), JMS

Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss

Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2

NoSQL Databases: HBase, MongoDB, Cassandra

RDBMS: Teradata, Oracle 9i, Oracle 10g, MS Access, MS SQL Server, IBM DB2, PL/SQL

Operating Systems: Linux, UNIX, Mac OS, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8

Network Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP, POP3

IDEs / Tools: NetBeans, Eclipse, WSAD, RAD

PROFESSIONAL EXPERIENCE

Confidential, San Jose, CA

Hadoop Admin

Responsibilities:

  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Involved in developing and implementing a web application using Python.
  • Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, and managing and reviewing data backups and Hadoop log files (a decommissioning command sketch follows this list).
  • Involved in implementing security on the Hortonworks Hadoop cluster with Kerberos, working with the operations team to move from a non-secured to a secured cluster.
  • Involved in architecting Hadoop clusters using major Hadoop Distributions - CDH3 & CDH4.
  • Performed bulk data loads from multiple data sources (Oracle 8i, legacy systems) into the Teradata RDBMS using BTEQ, MultiLoad and FastLoad.
  • Monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Wrote Python scripts to parse XML documents and load the data into a database.
  • Addressed data quality using the Informatica Data Quality (IDQ) tool.
  • Utilized Informatica Data Explorer (IDE) to analyze legacy data for data profiling.
  • Developed Informatica mappings and workflows using Informatica 7.1.1.
  • Worked on identifying and eliminating duplicates in datasets through IDQ 8.6.1 components.
  • Optimized the full text search function by connecting MongoDB and ElasticSearch.
  • Utilized big-data technologies such as ElasticSearch, Riak, RabbitMQ, Couchbase, Redis, Docker, Mesos/Marathon, Jenkins, Puppet/Chef and GitHub, among others.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
  • Involved in a POC to implement a failsafe distributed data storage and computation system using Apache YARN.
  • Created a 48-node Cassandra cluster for the Single Point Inventory application on Apache Cassandra 1.2.5.
  • Upgraded the Single Point Inventory 48-node cluster from Apache Cassandra 1.2.5 to DSE 4.6.1.
  • Upgraded the mobile checkout cluster from Apache Cassandra 1.1 to DSE 4.6.7.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Responsible for building scalable distributed data solutions using Datastax Cassandra.
  • Used the Spark - Cassandra Connector to load data to and from Cassandra.
  • Ran many performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster (a sample invocation follows this list).
  • Created User defined types to store specialized data structures in Cassandra.
  • Wrote a technical paper and created slideshow outlining the project and showing how Cassandra can be potentially used to improve performance.
  • Set up Ganglia and Nagios for Hadoop monitoring and alerting, and used them to monitor and maintain the Hadoop/HBase/ZooKeeper clusters.
  • As an admin, followed standard backup policies to ensure high availability of the cluster.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Screened Hadoop cluster job performance and handled capacity planning.
  • Monitored Hadoop cluster connectivity and security, and managed and monitored Hadoop log files.
  • Used Puppet to create scripts, deploy servers, and manage changes through the Puppet master server and its clients.
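
A minimal sketch of the data-node decommissioning step mentioned above, assuming a Hadoop 2-style CLI and an exclude file wired to dfs.hosts.exclude; the path and host name are illustrative placeholders:

  # Add the host to the exclude file referenced by dfs.hosts.exclude in hdfs-site.xml
  echo "worker07.example.com" >> /etc/hadoop/conf/dfs.exclude

  # Tell the NameNode to re-read the include/exclude lists; HDFS re-replicates
  # the node's blocks before marking it "Decommissioned"
  hdfs dfsadmin -refreshNodes

  # Watch progress until the node reports Decommissioned, then take it out of service
  hdfs dfsadmin -report | grep -A 2 worker07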
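
A representative cassandra-stress run of the kind referenced above, assuming the newer (Cassandra 2.1 / DSE era) stress syntax; the node address, row count and thread count are placeholders, not values from the original project:

  # Load one million rows, then replay a mixed read-heavy workload against the cluster
  cassandra-stress write n=1000000 -node 10.0.0.11 -rate threads=100
  cassandra-stress mixed ratio\(write=1,read=3\) n=1000000 -node 10.0.0.11 -rate threads=100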

Confidential, San Francisco, CA

Hadoop Admin

Responsibilities:

  • Working on multiple projects spanning the architecture, installation, configuration and management of Hadoop clusters.
  • Designed and developed Hadoop system to analyze the SIEM (Security Information and Event Management) data using MapReduce, HBase, Hive, Sqoop and Flume.
  • Developed custom Writable MapReduce Java programs to load web server logs into HBase using Flume.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (like MapReduce, Pig, Hive, Sqoop) as well as system specific jobs.
  • Developed entire data transfer model using Sqoop framework.
  • Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink.
  • Configured a Flume agent with the syslog source to receive data from syslog servers.
  • Implemented Hadoop NameNode HA services to make the Hadoop services highly available.
  • Exported data between RDBMS and Hive/HDFS in both directions using Sqoop (a sample import/export follows this list).
  • Installed and managed multiple Hadoop clusters - production, stage and development.
  • Performance tuning for infrastructure and Hadoop settings for optimal performance of jobs and their throughput.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action, including on lab clusters.
  • Performed regular maintenance of commissioned/decommissioned nodes as disk failures occurred, using Cloudera Manager.
  • Performed Benchmarking and performance tuning on the Hadoop infrastructure. Automated data loading between production and disaster recovery cluster.
  • Migrated Hive schemas from the production cluster to the DR cluster.
  • Worked on migrating applications from relational database systems by doing POCs. Helped users and teams with incidents related to administration and development.
  • Extensively worked on Teradata performance optimization, bringing queries down from spool-outs and never-ending runs to seconds or minutes using various Teradata optimization strategies. Migrated data from SQL Server to HBase using Sqoop.
  • Log data stored in HBase was processed and analyzed, then imported into the Hive warehouse, enabling business analysts to write HQL queries.
  • Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in Hive queries.
  • Created Hive external tables with partitions for loading the parsed data.
  • Developed various workflows using custom MapReduce, Pig, Hive and scheduled them using Oozie.
  • Responsible for installing, setting up and configuring Apache Kafka and Apache ZooKeeper.
  • Set up Cassandra with Apache Solr and with Apache Spark.
  • Wrote MapReduce jobs to cleanse and parse data in HDFS obtained from various data sources and migrated it to MPP databases such as Teradata.
  • Generated reports consumed by business analysts using Pentaho.
  • Extensive knowledge in troubleshooting code related issues.
  • Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR testing library.
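
A minimal sketch of the Sqoop transfers referenced above, assuming a MySQL source; the JDBC URL, user, table and directory names are illustrative placeholders:

  # Import an RDBMS table into HDFS and register it as a Hive table
  sqoop import \
    --connect jdbc:mysql://dbhost/sales \
    --username etl_user -P \
    --table orders \
    --target-dir /data/raw/orders \
    --hive-import --hive-table staging.orders \
    -m 4

  # Export processed results from HDFS back to the RDBMS
  sqoop export \
    --connect jdbc:mysql://dbhost/sales \
    --username etl_user -P \
    --table order_summary \
    --export-dir /data/out/order_summary \
    -m 4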

Confidential, New York

Hadoop Consultant

Responsibilities:

  • Worked on Distributed/Cloud Computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Flume, Spark, Avro, ZooKeeper, Tableau, etc.) on the Hortonworks (HDP 2.2.4.2) and Pivotal HD 3.0 distributions across 4 clusters, ranging from POC to PROD, totaling nearly 100 nodes.
  • Developed POCs on Amazon Web Services (S3, EC2, EMR, etc.); also worked on performance tuning and ETL, Agile software development, team building & leadership, and engineering management.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Performed a major upgrade of the production environment from HDP 1.3 to HDP 2.2. As an admin, followed standard backup policies to ensure high availability of the cluster.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
  • Implemented a dual data center setup for all Cassandra clusters. Performed complex system analysis to improve ETL performance and identified highly critical batch jobs to prioritize.
  • Implemented a Spark solution to enable real-time reports from Cassandra data. Also actively involved in designing column families for various Cassandra clusters.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning of data nodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
  • Wrote data ingestion jobs to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as HBase and Cassandra.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Worked on development projects extensively in Hive, Spark, Pig, Sqoop and GemFire XD throughout the development lifecycle until the projects went into production.
  • Experienced in adding/installing new components and removing them through Ambari on HDP and manually on EC2 clusters.
  • Handled architecture design and implementation of deployment, configuration management, backup, and disaster recovery systems and procedures. Hands-on experience with cluster upgrades and patch upgrades without data loss and with proper backup plans.
  • Provided security and authentication with Ranger, where the Ranger admin handles administration and user sync adds new users to the cluster.
  • Good troubleshooting skills across the Hadoop stack components, ETL services and Hue, which provides a GUI for developers/business users for day-to-day activities.
  • Set up Flume for different sources to bring log messages from outside systems into HDFS.
  • Created queues and allocated cluster resources to provide priority for jobs.
  • Involved in snapshots and mirroring to maintain backups of cluster data, including remote copies.
  • Implemented SFTP for projects to transfer data from external servers to the cluster servers. Experienced in managing and reviewing log files.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive, Sqoop and Pig jobs.
  • Working experience in MySQL database creation, setting up users, and maintaining backups of cluster metadata databases with cron jobs (a sample backup cron entry follows this list).
  • Set up MySQL master and slave replication and Postgres, and helped business applications maintain their data.
  • Managed and reviewed log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately.
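
A minimal sketch of the metadata-backup cron job referenced above, assuming the cluster metadata (e.g., the Hive metastore) lives in a local MySQL database named metastore; names and paths are illustrative placeholders:

  # /etc/cron.d/metastore-backup -- nightly dump of the cluster metadata database
  # Credentials are read from /root/.my.cnf; % must be escaped as \% inside cron
  30 1 * * * root mysqldump --single-transaction metastore | gzip > /backups/metastore-$(date +\%F).sql.gz
  # Prune dumps older than 14 days
  45 1 * * * root find /backups -name 'metastore-*.sql.gz' -mtime +14 -delete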

Confidential

Linux Administrator

Responsibilities:

  • Responsible for handling tickets raised by end users, including package installation, login issues and access issues.
  • User management: adding, modifying, deleting and grouping users.
  • Responsible for preventive maintenance of the servers on a monthly basis, configuration of RAID for the servers, and resource management using disk quotas (a sample quota setup follows this list).
  • Documenting the issues on daily basis to the resolution portal.
  • Responsible for change management release scheduled by service providers.
  • Generating weekly and monthly reports for the tickets worked on and sending the report to management.
  • Managing Systems operations with final accountability for smooth installation, networking, and operation, troubleshooting of hardware and software in LINUX environment.
  • Identifying operational needs of various departments and developing customized software to enhance System's productivity.
  • Running a LINUX SQUID proxy server with access restrictions using ACLs and passwords.
  • Established/implemented firewall rules, Validated rules with vulnerability scanning tools.
  • Proactively detecting Computer Security violations, collecting evidence and presenting results to the management.
  • Accomplished system/e-mail authentication using an enterprise LDAP database.
  • Implemented a database-enabled intranet web site using LINUX, Apache and a MySQL database backend.
  • Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method.
  • Monitoring System Metrics and logs for any problems.
  • Running crontab jobs to back up data. Applied operating system updates, patches and configuration changes.
  • Maintaining the MySQL server and authentication for required database users. Appropriately documented various administrative and technical issues.
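
A minimal sketch of the user disk-quota setup referenced above, assuming an ext4 filesystem mounted at /home and an example user jdoe (both placeholders):

  # Enable user quotas on /home (usrquota must also be added to the mount options in /etc/fstab)
  mount -o remount,usrquota /home
  quotacheck -cum /home     # build the aquota.user database
  quotaon /home             # start enforcing quotas

  # Give user jdoe a 5 GB soft / 6 GB hard block limit (values are 1K blocks; 0 = no inode limit)
  setquota -u jdoe 5000000 6000000 0 0 /home

  # Report current usage against the limits
  repquota /home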

Confidential

Linux Administrator

Responsibilities:

  • Responsible for handling tickets raised by end users, including package installation, login issues and access issues.
  • User management: adding, modifying, deleting and grouping users.
  • Responsible for preventive maintenance of the servers on a monthly basis. Configuration of RAID for the servers.
  • Resource management using the Disk quotas.
  • Documenting the issues on daily basis to the resolution portal.
  • Responsible for change management release scheduled by service providers.
  • Generating weekly and monthly reports for the tickets worked on and sending the report to management.
  • Managing Systems operations with final accountability for smooth installation, networking, and operation, troubleshooting of hardware and software in LINUX environment.
  • Identifying operational needs of various departments and developing customized software to enhance System's productivity.
  • Running a LINUX SQUID proxy server with access restrictions using ACLs and passwords.
  • Established/implemented firewall rules, Validated rules with vulnerability scanning tools.
  • Proactively detecting Computer Security violations, collecting evidence and presenting results to the management.
  • Accomplished system/e-mail authentication using an enterprise LDAP database.
  • Implemented a database-enabled intranet web site using LINUX, Apache and a MySQL database backend.
  • Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method.
  • Monitoring System Metrics and logs for any problems.
  • Running crontab jobs to back up data.
  • Applied Operating System updates, patches and configuration changes.
  • Maintaining the MySQL server and authentication for required database users.
  • Appropriately documented various administrative and technical issues.
