
Sr. Hadoop Administrator Resume


Irving, TX

PROFESSIONAL SUMMARY

  • Over 7 years of experience with an emphasis on Big Data technologies and the development and design of Java-based enterprise applications.
  • Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience in installation, configuration, support, and management of Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
  • Hands-on experience with major components in the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, and Flume, plus knowledge of the MapReduce/HDFS framework.
  • Set up standards and processes for Hadoop based application design and implementation.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Good experience in analysis using Pig and Hive, and an understanding of Sqoop and Puppet.
  • Expertise in database performance tuning & data modeling.
  • Experience in Operational Intelligence using Splunk.
  • Prepared, arranged and tested Splunk search strings and operational strings.
  • Worked on large datasets to generate insights using Splunk.
  • Developed software to distribute and generate feeds for Comcast clients using Hadoop (HDFS) Java API, a Java SSH library and Azkaban GUI.
  • Developed automated Unix shell scripts for performing RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database-related activities.
  • Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
  • Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
  • Expertise in working with different databases such as Oracle, MS SQL Server, PostgreSQL, and MS Access 2000, along with exposure to Hibernate for mapping an object-oriented domain model to a traditional relational database.
  • Extensive experience in data analysis using tools like Syncsort and HZ along with Shell Scripting and UNIX.
  • Involved in design and architecture of Enterprise grade technologies associated with Docker.
  • Involved in log file management: logs older than 7 days were moved from the local log folder into HDFS and retained there for 3 months (a rotation sketch appears after this list).
  • Expertise in development support activities including installation, configuration and successful deployment of changes across all environments.
  • Familiarity and experience with data warehousing and ETL tools.
  • Good working knowledge of OOA/OOD using UML and designing use cases.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
  • Experience in production support and application support by fixing bugs.
  • Used HP Quality Center for logging test cases and defects.
  • Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and being a self-motivated, focused team player with excellent interpersonal, technical, and communication skills.
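
A minimal sketch of the 7-day/3-month log rotation mentioned above, assuming a hypothetical local log directory and HDFS archive path; actual paths, file naming, and retention windows were environment-specific.

```bash
#!/bin/bash
# Hypothetical log-rotation sketch: ship local logs older than 7 days to HDFS,
# then purge HDFS copies older than roughly 3 months. Paths are illustrative only.
LOCAL_LOG_DIR=/var/log/app
HDFS_ARCHIVE=/archive/logs

# Move logs older than 7 days into HDFS, removing the local copy on success.
find "$LOCAL_LOG_DIR" -type f -name '*.log' -mtime +7 | while read -r f; do
  hdfs dfs -put "$f" "$HDFS_ARCHIVE/" && rm -f "$f"
done

# Delete HDFS copies older than 90 days, based on the HDFS modification time.
CUTOFF=$(date -d '90 days ago' +%s)
hdfs dfs -ls "$HDFS_ARCHIVE" | awk 'NR>1 {print $6" "$7" "$8}' | while read -r d t path; do
  ts=$(date -d "$d $t" +%s)
  if [ "$ts" -lt "$CUTOFF" ]; then
    hdfs dfs -rm -skipTrash "$path"
  fi
done
```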

TECHNICAL EXPERTISE

Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Cassandra, Pig, Sqoop, Falcon, Flume, ZooKeeper, YARN, Mahout, Oozie, Avro, HBase, Storm, CDH 5.3, CDH 5.4

ETL Tools: Teradata, Pentaho

Databases: IBM DB2, PostgreSQL, MongoDB, MySQL, NoSQL, Oracle 11i/10g/9i

Server: WEBrick, Thin, Unicorn, Apache, AWS

Security: Kerberos

Scripting Languages: Shell Scripting, Puppet, Python, Bash, CSH, Ruby, PHP

Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia

Java Technologies: Java, J2EE, JSP, Servlets, Struts, Hibernate, Spring

Testing: Capybara, WebDriver

Testing Frameworks: RSpec, Cucumber, JUnit, SVN

Operating Systems: Linux RHEL/Ubuntu/CentOS, Windows (XP/7/8/10)

Other tools: Angular.js, knockout.js, backbone.js, ember.js, react.js, node.js, bootstrap, Redmine, Bugzilla, JIRA, Agile SCRUM, SDLC Waterfall

PROFESSIONAL EXPERIENCE

Confidential - Irving, TX

Sr. Hadoop Administrator

Job Responsibilities:

  • Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Flume, Spark, Avro, ZooKeeper, Tableau, etc.) on Hortonworks HDP 2.2.4.2, across 4 clusters ranging from POC to PROD and containing nearly 100 nodes.
  • Installed 5 Hadoop clusters for different teams and developed a data lake that serves as the base layer for storage and analytics. As L3/L4 support for the data lake, installed custom software for developers, upgraded Hadoop components, resolved issues, and helped troubleshoot long-running jobs; also managed clusters for other teams.
  • Performed a major upgrade of the production environment from HDP 1.3 to HDP 2.2; as an admin, followed standard backup policies to ensure high availability of the cluster.
  • Implemented Flume and the Spark/Spark Streaming framework for real-time data processing. Developed analytical components using Scala, Spark, and Spark Streaming. Implemented proofs of concept on the Hadoop and Spark stack and various big data analytic tools, using Spark SQL as an alternative to Impala.
  • Involved in implementing security on the Hortonworks Hadoop cluster with Kerberos, working with the operations team to move from a non-secured cluster to a secured cluster.
  • Responsible for upgrading Hortonworks Hadoop HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment. Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded the data into HDFS (a Sqoop ingestion sketch follows this list).
  • Worked extensively on development projects using Hive, Spark, Pig, Sqoop, and GemFire XD throughout the development lifecycle until the projects went into production. Created reporting views in Impala using Sentry policy files.
  • Responsible for handler configuration and handler ESB mappings. Also involved in integrating Hive with MuleSoft ESB to land data into applications running on Salesforce and vice versa.
  • Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink, and auto-populated HBase tables with data arriving through the Kafka sink.
  • Built automation frameworks for data ingestion and processing in Python, Java, JavaScript, and Scala against NoSQL and SQL databases, using Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD, and Red Hat infrastructure for ingestion, processing, and storage.
  • Migrated services from a managed hosting environment to AWS including: service design, network layout, data migration, automation, monitoring, deployments and cutover, documentation, overall plan, cost analysis, and timeline.
  • Managing Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, and custom-built tooling; designing cloud-hosted solutions with specific AWS product suite experience.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance, and capacity planning using Ambari. Involved in exporting data from Hadoop to Greenplum using the gpload utility.
  • Implemented a dual data center setup for all Cassandra clusters. Performed complex system analyses to improve ETL performance and identified highly critical batch jobs to prioritize.
  • Implemented a Spark solution to enable real-time reports from Cassandra data. Also actively involved in designing column families for various Cassandra clusters.
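
A minimal sketch of the kind of Sqoop ingestion used for the imports described above; the JDBC URL, credentials, table, and target directory are hypothetical placeholders, not actual client values.

```bash
# Hypothetical Sqoop import: pull a relational table into HDFS as the first step
# of a Hive/Spark transformation pipeline. All connection details are placeholders.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table transactions \
  --target-dir /data/raw/transactions \
  --num-mappers 4 \
  --as-textfile
```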

Environment: RHEL, Ubuntu, Cloudera Manager, Cloudera Search, CDH4, HDFS, HBase, Hive, Pig, ZooKeeper, cluster monitoring with automated scripts, MapReduce 2 (YARN), PostgreSQL, MySQL, QAS, and Ganglia.

Confidential, CA

Hadoop Administrator

Job Responsibilities:

  • Involved in generating and applying rules to profile flat-file and relational data, creating rules to cleanse, parse, and standardize data through mappings in IDQ, generated as mapplets in PC. Working knowledge of the Talend ETL tool to filter data based on end requirements.
  • Created Talend mappings for the initial load and daily updates, and was involved in migrating ETL jobs from Informatica to Talend.
  • Worked extensively on Hadoop and Spark clusters and on stream processing using Spark Streaming. Experienced with Spark and the Python interface to Spark. Wrote Sqoop, Spark, and MapReduce scripts and workflows.
  • Interacted day to day with different teams to resolve NoSQL/Hadoop/Elasticsearch issues and presented various products in DRF meetings. Optimized the full-text search function by connecting MongoDB and Elasticsearch.
  • Utilize big data technologies such as Elasticsearch, Riak, RabbitMQ, Couchbase, Redis, Docker, Mesos/Marathon, Jenkins, Puppet/Chef, GitHub, and more.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper. Involved in a POC to implement a failsafe distributed data storage and computation system using Apache YARN.
  • Managing Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, and custom-built tooling; designing cloud-hosted solutions with specific AWS product suite experience.
  • Designed a Solr (Cloudera Search) index pipeline using the Lily Indexer in both batch and service (near real-time) modes. The source of this index will be the MNsure Audit HBase environment.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra. Developed Map Reduce (YARN) jobs for cleaning, accessing and validating the data.
  • Created User defined types to store specialized data structures in Cassandra.
  • Experience with IaaS, managing Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, and boto.
  • Migrated Informatica v6 artifacts to Informatica v9.x. Architected Informatica ETL solutions, performed data source analysis, and performed change data capture loads.
  • Used AWS EC2 cloud computing for provisioning, such as new instance (VM) creation. Designed architecture based on the client's requirements, including Hadoop, HBase, and Solr.
  • Used Apache Solr search engine server to help speed up the search of the transaction logs. Created an XML schema for the Solr search engine based on the Database schema.
  • Wrote a technical paper and created slideshow outlining the project and showing how Cassandra can be potentially used to improve performance.
  • Setting up the monitoring tools Ganglia and Nagios for Hadoop monitoring and alerting. Monitoring and maintaining the Hadoop/HBase/ZooKeeper cluster using Ganglia and Nagios.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Monitored Hadoop cluster connectivity and security, and was involved in managing and monitoring Hadoop log files (a basic health-check sketch follows this list).
  • Used Puppet for creating scripts, deployment for servers, and managing changes through Puppet master server on its clients.
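
A basic cluster health-check sketch in line with the monitoring described above, using standard HDFS/YARN admin commands; the NameNode log path is a common default and differs per distribution.

```bash
# Routine checks: HDFS capacity and datanode status, filesystem integrity,
# NodeManager availability, and a scan of recent NameNode log errors.
hdfs dfsadmin -report | grep -E 'Configured Capacity|DFS Used%|Live datanodes|Dead datanodes'
hdfs fsck / | tail -n 20
yarn node -list -all
# Common default log location; adjust to the distribution's layout.
grep -iE 'error|fatal' /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log | tail -n 50
```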

Environment: Hadoop, HDFS, MapReduce, Shell Scripting, Spark, Splunk, Solr, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, cluster health, monitoring, security, Red Hat Linux, Impala, Cloudera Manager, Hortonworks.

Confidential, Herndon VA

Hadoop Administrator

Job Responsibilities:

  • Working on implementing Hadoop on AWS EC2, using a few instances to gather and analyze data log files. Developed use cases and technical prototypes for implementing Pig, HDP, Hive, and HBase.
  • Working as a lead on Big Data integration and analytics based on Hadoop, Solr, and webMethods technologies. Setting up and supporting Cassandra (1.2)/DataStax (3.2) for POC and prod environments using industry best practices.
  • Developing a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Tuned the Hadoop clusters and monitored memory management and MapReduce jobs to keep the jobs that push data from SQL to the NoSQL store running healthily.
  • Analyzed the alternatives for NoSQL data stores and produced extensive documentation comparing HBase vs. Accumulo.
  • Involved in the setup, installation, and configuration of OBIEE 11g on the Linux operating system and its integration with the existing environment. Involved in troubleshooting errors encountered and worked with Oracle support to analyze issues.
  • Communicated with developers, using in-depth knowledge of Cassandra data modeling, to convert some applications from Oracle to Cassandra.
  • Working on moving data with Sqoop between HDFS and relational database systems, in both directions.
  • Working on loading files into Hive and HDFS from MongoDB for United Health. Responsible for building scalable distributed data solutions using DataStax Cassandra.
  • Hands-on experience installing, configuring, administering, debugging, and troubleshooting Apache Cassandra and DataStax Cassandra clusters.
  • Led the evaluation of Big Data software like Splunk, Hadoop for augmenting the warehouse, identified use cases and led Big Data Analytics solution development for Customer Insights and Customer Engagement teams.
  • Built, stood up, and delivered a Hadoop cluster in pseudo-distributed mode with the NameNode, Secondary NameNode, JobTracker, and TaskTracker running successfully, ZooKeeper installed and configured, and Apache Accumulo (a NoSQL store modeled on Google's Bigtable) stood up in a single-VM environment.
  • Working on Hive/HBase vs. RDBMS comparisons; imported data into Hive on HDP and created tables, partitions, indexes, views, queries, and reports for BI data analysis.
  • Used the Spark - Cassandra Connector to load data to and from Cassandra.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs (a minimal CLI sketch follows this list).
  • Worked on TOAD for data analysis and on ETL/Informatica for data mapping and data transformation between the source and target databases.
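
A minimal sketch of driving the Oozie workflow engine from the CLI, as referenced above; the Oozie endpoint, properties file, and job id are hypothetical placeholders.

```bash
# Hypothetical Oozie submission: job.properties points at a workflow.xml in HDFS
# that chains the Hive and Pig actions. Host and paths are illustrative.
export OOZIE_URL=http://oozie.example.com:11000/oozie

oozie job -config job.properties -run          # submit and start the workflow
oozie job -info <job-id>                       # check status of a submitted job
oozie jobs -filter status=RUNNING -len 10      # list currently running jobs
```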

Environment: Hadoop, HDFS, MapReduce, Impala, Splunk, Sqoop, HBase, Hive, Flume, Oozie, ZooKeeper, Solr, performance tuning, cluster health, monitoring, security, Shell Scripting, NoSQL/HBase/Cassandra, Cloudera Manager.

Confidential

LINUX/UNIX administrator

Job Responsibilities:

  • Responsible for monitoring overall project and reporting status to stakeholders.
  • Developed project user guide documents that help transfer knowledge to new testers, and a solution repository document that enables quick resolution of issues that occurred in the past, thereby reducing the number of invalid defects.
  • Identify repeated issues in production by analyzing production tickets after each release and strengthen the system testing process to prevent those issues from reaching production, enhancing customer satisfaction.
  • Designed and coordinated creation of Manual Test cases according to requirement and executed them to verify the functionality of the application.
  • Manually tested the various navigation steps and basic functionality of the Web based applications.
  • Experience interpreting physical database models and understanding relational database concepts such as indexes, primary and foreign keys, and constraints using Oracle.
  • Writing, optimizing, and troubleshooting dynamically created SQL within procedures.
  • Creating database objects such as Tables, Indexes, Views, Sequences, Primary and Foreign keys, Constraints and Triggers.
  • Responsible for creating virtual environments for the rapid development.
  • Responsible for handling tickets raised by end users, including package installation, login issues, access issues, and user management (adding, modifying, deleting, and grouping users).
  • Responsible for preventive maintenance of the servers on a monthly basis, configuration of RAID for the servers, and resource management using disk quotas.
  • Responsible for change management release scheduled by service providers.
  • Generating the weekly and monthly reports for the tickets worked on and sending the reports to management.
  • Managing systems operations with final accountability for smooth installation, networking, operation, and troubleshooting of hardware and software in a Linux environment.
  • Identifying operational needs of various departments and developing customized software to enhance System's productivity.
  • Established and implemented firewall rules and validated them with vulnerability scanning tools.
  • Proactively detecting Computer Security violations, collecting evidence and presenting results to the management.
  • Accomplished System/e-mail authentication using LDAP enterprise Database.
  • Implemented a database-enabled intranet website using Linux, Apache, and a MySQL database backend.
  • Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers. Monitored system metrics and logs for any problems.
  • Running crontab jobs to back up data (a sample entry follows this list). Applied operating system updates, patches, and configuration changes.
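
A simple, hypothetical crontab entry for the nightly data backup mentioned above; the source directory, backup location, and 30-day retention are illustrative assumptions.

```bash
# Hypothetical nightly backup at 02:30 (install via `crontab -e`):
# tar the data directory with a date stamp, then drop archives older than 30 days.
# Note: % must be escaped as \% inside crontab entries.
30 2 * * * tar -czf /backup/data-$(date +\%F).tar.gz /srv/data && find /backup -name 'data-*.tar.gz' -mtime +30 -delete
```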

Environment: Windows 2008/2007 Server, Unix Shell Scripting, SQL Server Management Studio, Red Hat Linux, Microsoft SQL Server 2000/2005/2008, MS Access, NoSQL, Linux/Unix, PuTTY Connection Manager, PuTTY, SSH.

Confidential

LINUX/UNIX administrator

Job Responsibilities:

  • Day-to-day administration of Sun Solaris and RHEL 4/5, including installation, upgrades, patch management, and package loading.
  • Assist with overall technology strategy and operational standards for the UNIX domains.
  • Manage problem tickets and service request queues, respond to monitoring alerts, execute change controls, and perform routine and preventative maintenance, performance tuning, and emergency troubleshooting and incident support.
  • Performed day-to-day administration tasks like User Management, Space Monitoring, Performance Monitoring and Tuning, alert log monitoring and backup monitoring.
  • Provides accurate root cause analysis and comprehensive action plans.
  • Manage daily system administration cases using BMC Remedy Help Desk.
  • Investigated, installed, and configured a software failover system for production Linux servers.
  • Maintained the continuous integration environment, installed Azkaban jobs, supported Unix machines as system admin.
  • Monitor and maintain disk space, backup systems, and tape libraries, and implement change controls, capacity planning, and growth projections on the systems (a simple disk-space check appears after this list).
  • Experience with Unix and Linux, including shell scripting.
  • Planning and coordinating activities related to upgrades and maintenance on the systems.
  • Create status reports, project plans and attend team meetings to coordinate activities.
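
A small sketch of the kind of disk-space monitoring used in this role, assuming a hypothetical 85% usage threshold and local mail delivery; the threshold and recipient are placeholders.

```bash
#!/bin/bash
# Alert when any mounted filesystem exceeds the usage threshold (placeholder: 85%).
THRESHOLD=85
ALERT=$(df -hP | awk -v t="$THRESHOLD" 'NR>1 {gsub("%","",$5); if ($5+0 > t) print $6" is at "$5"%"}')
if [ -n "$ALERT" ]; then
  echo "$ALERT" | mail -s "Disk usage alert on $(hostname)" root
fi
```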

Environment: Linux/Unix, Sun Solaris, Red Hat Linux, Unix Shell Scripting, Oracle 10g, SQL Server 2005, XML, Windows 2000/NT/2003 Server, UNIX.
