Hadoop Admin Resume
San Jose, CA
SUMMARY
- Over 6 years of experience in the IT field, including 3 years of experience in Hadoop Administration across diverse industries, with hands-on experience in Big Data ecosystem technologies.
- Excellent understanding of Hadoop architecture and underlying framework including storage management.
- Good experience with the design, management, configuration and troubleshooting of distributed production environments based on NoSQL technologies such as MongoDB, Apache Hadoop/HBase and Couchbase.
- Well experienced in defining, designing, integrating and re-engineering Enterprise Data Warehouses and Data Marts of various levels of complexity in environments such as Teradata.
- Extensive database experience and highly skilled in SQL in Oracle, MS SQL Server, Teradata, Sybase, Mainframe Files, Flat Files, MS Access.
- Experience working with object-oriented programming (OOP) concepts using Python, C++ and PHP. Hands-on experience in developing web services with the Python programming language.
- Rich SAS experience with exposure to Base SAS, Macros, SAS/CONNECT, SAS/ACCESS, SAS ODS, SAS 9 BI Suite (SAS ETL, SAS Management Console, SAS Enterprise Guide, SAS Stored Procedures, SAS Add-in for Microsoft Office).
- Experience in the Hadoop ecosystem, including HDFS, Hive, Pig, HBase, Oozie and Sqoop, and knowledge of the MapReduce framework.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure.
- Good Experience in Hadoop cluster capacity planning and designing Name Node, Secondary Name Node, Data Node, Job Tracker, Task Tracker.
- Hands-on experience in installation, configuration, management and development of big data solutions using Apache, Cloudera (CDH3, CDH4) and Hortonworks distributions.
- Good experience designing, configuring and managing backup and disaster recovery for Hadoop data.
- Experience in setting up automated monitoring on Hadoop clusters using Nagios and Ganglia.
- In-depth knowledge of modifications required in static IP (interfaces), hosts and bashrc files, setting up password-less SSH and Hadoop configuration for Cluster setup and maintenance.
- Good understanding of Cassandra monitoring and of CQL, the query language for Apache Cassandra.
- Experienced using Sqoop to import data into HDFS from RDBMS and vice versa (a minimal example follows this summary).
- Strong experience as a Java Developer in Web/intranet, client/server technologies using J2SE, J2EE Servlets, JSP, JDBC and SQL.
- Extensive experience in developing applications using HTML, jQuery, Dojo Toolkit, JSP, Servlets, JavaBeans, EJB, JSTL, JSP Custom Tag Libraries, JDBC, JMS publish/subscribe, JNDI, JavaScript, XML, XSLT and JAXB.
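A minimal Sqoop import/export sketch of the kind referenced above; the JDBC URL, credentials, table names and HDFS paths are illustrative placeholders, not from any actual engagement:

```
# Import a table from an RDBMS into HDFS (connection details are placeholders).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --num-mappers 4

# Export processed results back to the RDBMS (the vice-versa direction).
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /user/hadoop/orders_summary
```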
TECHNICAL SKILLS
Big Data Technologies: HDFS, Hive, MapReduce, Pig, Sqoop, Oozie, ZooKeeper, YARN, Avro, Spark
Scripting Languages: Shell, Python, Perl
Programming Languages: Java, C++,C,SQL,PL/SQL
Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP
Java Frameworks: MVC, Apache Struts 2.0, Spring and Hibernate
Web Services: SOAP (JAX-WS), WSDL, SOA, RESTful (JAX-RS), JMS
Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2
NoSQL Databases: HBase, MongoDB, Cassandra
RDBMS: Teradata, Oracle 9i, Oracle 10g, MS Access, MS SQL Server, IBM DB2, PL/SQL
Operating Systems: Linux, UNIX, Mac OS, Windows NT / 98 / 2000 / XP / Vista, Windows 7, Windows 8
Network Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP, POP3
IDEs / Tools: NetBeans, Eclipse, WSAD, RAD
PROFESSIONAL EXPERIENCE
Confidential, San Jose, CA
Hadoop Admin
Responsibilities:
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Involved in the development and implementation of a web application using Python.
- Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files (a decommissioning sketch appears at the end of this section).
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
- Involved in architecting Hadoop clusters using major Hadoop distributions (CDH3 and CDH4).
- Performed bulk data loads from multiple data sources (Oracle 8i, legacy systems) into the Teradata RDBMS using BTEQ, MultiLoad and FastLoad.
- Monitored systems and services, and handled architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Aligned with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Wrote Python scripts to parse XML documents and load the data into a database.
- Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
- Utilized Informatica Data Explorer (IDE) to analyze legacy data for data profiling.
- Development of Informatica mappings and workflows using Informatica 7.1.1.
- Worked on identifying and eliminating duplicates in datasets through IDQ 8.6.1 components.
- Optimized the full-text search function by connecting MongoDB and Elasticsearch.
- Utilized big data technologies such as Elasticsearch, Riak, RabbitMQ, Couchbase, Redis, Docker, Mesos/Marathon, Jenkins, Puppet/Chef and GitHub, among others.
- Implemented a distributed messaging queue integrated with Cassandra using Apache Kafka and ZooKeeper.
- Involved in a POC to implement a fail-safe distributed data storage and computation system using Apache YARN.
- Created a 48-node Cassandra cluster for the Single Point Inventory application on Apache Cassandra 1.2.5.
- Upgraded the Single Point Inventory 48-node cluster from Apache Cassandra 1.2.5 to DSE 4.6.1.
- Upgraded the mobile checkout cluster from Apache Cassandra 1.1 to DSE 4.6.7.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Responsible for building scalable distributed data solutions using DataStax Cassandra.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Ran many performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Created user-defined types to store specialized data structures in Cassandra.
- Wrote a technical paper and created a slide deck outlining the project and showing how Cassandra could potentially be used to improve performance.
- Set up the monitoring tools Ganglia and Nagios for Hadoop monitoring and alerting; monitored and maintained the Hadoop/HBase/ZooKeeper cluster using these tools.
- As an admin, followed standard backup policies to ensure the high availability of the cluster.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Screened Hadoop cluster job performance and performed capacity planning.
- Monitored Hadoop cluster connectivity and security, and was involved in managing and monitoring Hadoop log files.
- Used Puppet for creating scripts, deployment for servers, and managing changes through Puppet master server on its clients.
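A rough sketch of the DataNode decommissioning flow mentioned above, assuming dfs.hosts.exclude points at /etc/hadoop/conf/dfs.exclude (the path and hostname are placeholders; on CDH3/Hadoop 1 the equivalent command is `hadoop dfsadmin -refreshNodes`):

```
# Add the node to the excludes file and tell the NameNode to re-read it.
echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes

# Watch until the node reports "Decommissioned", then shut it down;
# recommissioning is the reverse: remove the entry and refresh again.
hdfs dfsadmin -report | grep -A 3 datanode07
```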
Confidential, San Francisco, CA
Hadoop Admin
Responsibilities:
- Working on multiple projects spanning Hadoop cluster architecture, installation, configuration and management.
- Designed and developed a Hadoop system to analyze SIEM (Security Information and Event Management) data using MapReduce, HBase, Hive, Sqoop and Flume.
- Developed custom Writable MapReduce Java programs to load web server logs into HBase using Flume.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (MapReduce, Pig, Hive, Sqoop) as well as system-specific jobs.
- Developed entire data transfer model using Sqoop framework.
- Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink.
- Configured a Flume agent with a syslog source to receive data from syslog servers (see the sketch at the end of this section).
- Implemented Hadoop NameNode HA services to make the Hadoop services highly available.
- Exported data from RDBMS to Hive and HDFS, and from Hive and HDFS back to RDBMS, using Sqoop.
- Installed and managed multiple Hadoop clusters - production, staging and development.
- Performance tuning for infrastructure and Hadoop settings for optimal performance of jobs and their throughput.
- Involved in analyzing system failures, identifying root causes and recommending courses of action, including on lab clusters.
- Regularly commissioned and decommissioned nodes as disk failures occurred, using Cloudera Manager.
- Performed benchmarking and performance tuning on the Hadoop infrastructure. Automated data loading between the production and disaster recovery clusters.
- Migrated the Hive schema from the production cluster to the DR cluster.
- Worked on migrating applications from relational database systems by doing POCs. Helped users and teams with incidents related to administration and development.
- Extensively worked on Teradata performance optimization, bringing spooled-out and never-ending queries down to seconds or minutes using various Teradata optimization strategies. Migrated data from SQL Server to HBase using Sqoop.
- Log data stored in HBase is processed, analyzed and then imported into the Hive warehouse, enabling business analysts to write HQL queries.
- Built reusable Hive UDF libraries that enabled various business analysts to use these UDFs in Hive queries.
- Created partitioned Hive external tables for loading the parsed data.
- Developed various workflows using custom MapReduce, Pig, Hive and scheduled them using Oozie.
- Responsible for installing, setting up and configuring Apache Kafka and Apache ZooKeeper.
- Set up Cassandra with Apache Solr and set up Cassandra with Apache Spark.
- Wrote MapReduce jobs to cleanse and parse data in HDFS obtained from various data sources and migrated it to MPP databases such as Teradata.
- Generated reports using Pentaho that were consumed by business analysts.
- Extensive knowledge in troubleshooting code related issues.
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MRUnit testing library.
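A minimal sketch of the syslog-to-HDFS Flume agent described above; the agent name, host, port and HDFS path are assumptions for illustration only:

```
# Write a simple agent config: syslog TCP source -> memory channel -> HDFS sink.
cat > /etc/flume/conf/syslog-agent.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type     = syslogtcp
a1.sources.r1.host     = 0.0.0.0
a1.sources.r1.port     = 5140
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type                   = hdfs
a1.sinks.k1.hdfs.path              = hdfs://namenode:8020/data/syslog/%Y-%m-%d
a1.sinks.k1.hdfs.fileType          = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel                = c1
EOF

# Start the agent against that config.
flume-ng agent --conf /etc/flume/conf \
  --conf-file /etc/flume/conf/syslog-agent.conf --name a1
```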
Confidential, New York
Hadoop Consultant
Responsibilities:
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Flume, Spark, Avro, ZooKeeper, Tableau, etc.) with Hortonworks (HDP 2.2.4.2) and Pivotal HD 3.0 distributions for 4 clusters ranging from POC to PROD, totaling nearly 100 nodes.
- Developed POCs on Amazon Web Services (S3, EC2, EMR, etc.); also worked on performance tuning and ETL, Agile software development, team building and leadership, and engineering management.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.2. As an admin, followed standard backup policies to ensure the high availability of the cluster.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
- Implemented a dual data center setup for all Cassandra clusters. Performed many complex system analyses to improve ETL performance and identified highly critical batch jobs to prioritize.
- Implemented a Spark solution to enable real-time reports from Cassandra data. Was also actively involved in designing column families for various Cassandra clusters.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, cluster planning, managing and reviewing data backups, and managing and reviewing log files.
- Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as HBase and Cassandra.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked extensively on development projects with Hive, Spark, Pig, Sqoop and GemFire XD throughout the development lifecycle until the projects went into production.
- Experienced in adding/installing new components and removing them through Ambari on HDP and manually on EC2 clusters.
- Architecture design and implementation of deployment, configuration management, backup, and disaster recovery systems and procedures. Hands-on experience with cluster upgrades and patch upgrades without any data loss and with proper backup plans.
- Provided security and authentication with Ranger, where Ranger Admin provides administration and UserSync adds new users to the cluster.
- Good troubleshooting skills across all Hadoop stack components, ETL services and Hue, which provides a GUI for developers/business users for day-to-day activities.
- Set up Flume for different sources to bring log messages from external systems into HDFS.
- Created queues and allocated cluster resources to prioritize jobs.
- Involved in snapshots and mirroring to maintain backups of cluster data, including remotely.
- Implemented SFTP for projects to transfer data from external servers. Experienced in managing and reviewing log files.
- Involved in scheduling the Oozie workflow engine to run multiple Hive, Sqoop and Pig jobs.
- Working experience creating and maintaining MySQL databases, setting up users, and maintaining backups of the cluster metadata databases with cron jobs (see the sketch at the end of this section).
- Set up MySQL master-slave replication and Postgres, helping business applications maintain their data.
- Managed and reviewed log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately.
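A minimal sketch of the cron-driven metadata backup mentioned above; the database names, credentials file and paths are placeholders, not the actual production setup:

```
# Dump the Hive metastore and Oozie databases nightly and keep compressed copies.
cat > /usr/local/bin/metastore_backup.sh <<'EOF'
#!/bin/bash
set -euo pipefail
mysqldump --defaults-extra-file=/root/.my.cnf --databases hive oozie \
  | gzip > /backups/metastore-$(date +%F).sql.gz
EOF
chmod +x /usr/local/bin/metastore_backup.sh

# Schedule it at 01:30 every night.
echo '30 1 * * * root /usr/local/bin/metastore_backup.sh' > /etc/cron.d/metastore-backup
```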
Confidential
Linux Administrator
Responsibilities:
- Responsible for handling tickets raised by end users, including package installation, login issues and access issues.
- User management: adding, modifying, deleting and grouping users.
- Responsible for preventive maintenance of the servers on a monthly basis. Configured RAID for the servers. Resource management using disk quotas (see the sketch at the end of this section).
- Documented issues on a daily basis in the resolution portal.
- Responsible for change management releases scheduled by service providers.
- Generated weekly and monthly reports for the tickets worked on and sent them to management.
- Managed systems operations with final accountability for smooth installation, networking, operation and troubleshooting of hardware and software in a Linux environment.
- Identified operational needs of various departments and developed customized software to enhance system productivity.
- Ran a Linux Squid proxy server with access restrictions using ACLs and passwords.
- Established and implemented firewall rules; validated rules with vulnerability scanning tools.
- Proactively detected computer security violations, collected evidence and presented results to management.
- Accomplished system/e-mail authentication using an enterprise LDAP database.
- Implemented a database-enabled intranet website using Linux, Apache and a MySQL database backend.
- Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method.
- Monitored system metrics and logs for any problems.
- Ran crontab jobs to back up data. Applied operating system updates, patches and configuration changes.
- Maintained the MySQL server and authentication for required database users. Appropriately documented various administrative and technical issues.
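A minimal sketch of the disk-quota management mentioned above, assuming an ext4 /home filesystem with the usrquota mount option; the user name and limits are placeholders:

```
# Enable quotas, build the quota database and turn enforcement on.
mount -o remount,usrquota /home
quotacheck -cum /home
quotaon /home

# Soft/hard block limits in KB (inode limits left unlimited), then report usage.
setquota -u jdoe 5000000 6000000 0 0 /home
repquota /home
```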
Confidential
Linux Administrator
Responsibilities:
- Responsible for handling tickets raised by end users, including package installation, login issues and access issues.
- User management: adding, modifying, deleting and grouping users.
- Responsible for preventive maintenance of the servers on a monthly basis. Configured RAID for the servers.
- Resource management using disk quotas.
- Documented issues on a daily basis in the resolution portal.
- Responsible for change management releases scheduled by service providers.
- Generated weekly and monthly reports for the tickets worked on and sent them to management.
- Managed systems operations with final accountability for smooth installation, networking, operation and troubleshooting of hardware and software in a Linux environment.
- Identified operational needs of various departments and developed customized software to enhance system productivity.
- Ran a Linux Squid proxy server with access restrictions using ACLs and passwords.
- Established and implemented firewall rules; validated rules with vulnerability scanning tools.
- Proactively detected computer security violations, collected evidence and presented results to management.
- Accomplished system/e-mail authentication using an enterprise LDAP database.
- Implemented a database-enabled intranet website using Linux, Apache and a MySQL database backend.
- Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method.
- Monitored system metrics and logs for any problems.
- Ran crontab jobs to back up data.
- Applied operating system updates, patches and configuration changes.
- Maintained the MySQL server and authentication for required database users.
- Appropriately documented various administrative and technical issues.