
Sr. Hadoop Administrator Resume


San Ramon, CA

SUMMARY

  • 6+ years of comprehensive IT experience in the Big Data domain with tools like Hadoop, Hive, and other open source tools/technologies across Banking, Healthcare, Insurance, and Energy.
  • Around 3 years of experience in Hadoop development, including MapReduce, Hive, Oozie, Sqoop, HBase, Pig, HDFS, YARN, and SAS interface configuration projects in direct client-facing roles.
  • 2 years of hands-on experience building scalable distributed data solutions using the Hadoop ecosystem.
  • Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation, administration, and support of Hadoop.
  • Experience in understanding Big Data business requirements and providing Hadoop-based solutions for them.
  • Worked extensively on the OBIEE Administration Tool, OBIEE Presentation Services, Answers, Interactive Dashboards, and BI Publisher.
  • Experience in designing, installing, and configuring the complete Hadoop ecosystem (components such as HDFS, MapReduce, Pig, Hive, Oozie, Flume, ZooKeeper).
  • Hands-on experience with major components in the Hadoop ecosystem like MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, and Flume. Well versed in installation, configuration, supporting, and managing Big Data and the underlying infrastructure of the Hadoop cluster.
  • Experience in developing and scheduling ETL workflows in Hadoop using Oozie. Also have substantial experience writing MapReduce jobs in Java (a minimal job sketch appears after this list) and working with Pig, Hive, Flume, ZooKeeper, and Storm.
  • Experience in deploying and managing Hadoop clusters using Cloudera Manager, Pivotal Command Center, and Hortonworks Ambari.
  • Good understanding and experience with tools like Puppet to automate Hadoop installation, configuration, and monitoring.
  • Hands-on experience with productionizing Hadoop applications, including administration, configuration management, monitoring, debugging, and performance tuning.
  • Hands-on experience in installing, configuring, and using ecosystem components like Hadoop MapReduce, HDFS, HBase, Avro, ZooKeeper, Oozie, Hive, HDP, Cassandra, Sqoop, Pig, Flume, HAWQ, and GemFire XD.
  • Capacity planning and performance tuning of Hadoop clusters. Extensive experience in SQL and NoSQL development.
  • Experience in deploying applications on heterogeneous application servers: Tomcat, WebLogic, and Oracle Application Server. Worked on multi-clustered environments and set up Cloudera Hadoop, Hortonworks HDP, Pivotal PHD, and Apache Hadoop.
  • Background with traditional databases such as Oracle, Teradata, Netezza, SQL Server, ETL tools / processes and data warehousing architectures.
  • Proficient using ERwin to design backend data models and entity relationship diagrams (ERDs) for star schemas, snowflake dimensions and fact tables.
  • Excellent conceptual knowledge of NoSQL databases such as HBase, MongoDB, Cassandra.
  • Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
  • Expertise in J2EE Technology - Servlets, JSP, EJB, RMI, JDBC, JNDI, Java
  • Expertise in developing GUIs (Graphical User Interfaces) using Java Swing and JSF
  • Experience in web-based languages such as HTML, CSS, PHP, XML and other web methodologies including Web Services and SOAP.
  • Worked on ServiceNow and PagerDuty for tracking and resolving real-time issues, worked on JIRA for changes in the cluster, and served as an active member of the Change Control Board for several projects.
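
A minimal sketch of the kind of MapReduce job in Java referred to above: a map-only cleaning pass that keeps well-formed delimited records and counts rejects. The class names, delimiter, and expected field count are illustrative assumptions, not details from an actual engagement.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Map-only job that filters out malformed pipe-delimited records before Hive/Pig processing.
    public class RecordCleaner {

        public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
            private static final int EXPECTED_FIELDS = 12; // hypothetical record layout

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\|", -1);
                if (fields.length == EXPECTED_FIELDS) {
                    context.write(NullWritable.get(), value);                 // keep well-formed rows
                } else {
                    context.getCounter("cleaning", "malformed").increment(1); // track rejected rows
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "record-cleaner");
            job.setJarByClass(RecordCleaner.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0);                   // map-only cleaning pass
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }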

TECHNICAL SKILLS

  • HDFS
  • Hive
  • MapReduce
  • Pig
  • Sqoop
  • Flume
  • Zookeeper
  • Oozie
  • Avro
  • HBase
  • Splunk
  • Storm
  • ETL Tool (Informatica)
  • Java
  • J2EE
  • JSP
  • Servlets
  • Struts
  • Hibernate
  • Spring
  • Teradata
  • MySQL
  • NoSQL
  • Oracle 11i/10g/9i

PROFESSIONAL EXPERIENCE

Sr. Hadoop Administrator

Confidential - San Ramon, CA

RESPONSIBILITIES:

  • Set up, configured, and worked on distributed/cloud computing components (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Flume, Spark, Avro, ZooKeeper, Tableau, etc.) on Hortonworks (HDP 2.2.4.2) and Pivotal HD 3.0 distributions for 5 clusters ranging from POC to PROD with nearly 200 nodes.
  • Developed POCs on Amazon Web Services (S3, EC2, EMR, etc.); performance tuning and ETL; Agile software development; team building & leadership; engineering management.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Cluster Planning, Manage and review data backups, Manage & review log files.
  • Worked extensively on development projects with Hive, Spark, Pig, Sqoop, and GemFire XD throughout the development lifecycle until the projects went into production.
  • Experienced in adding/installing new components and removing them through Ambari on HDP, and manually on EC2 clusters.
  • Set up and supported Cassandra (1.2)/DataStax (3.2) and Greenplum DB for POC and prod environments using industry best practices.
  • Working as a lead on Big Data Integration and Analytics based on Hadoop, SOLR and BI technologies.
  • Monitoring systems and services through the Ambari dashboard to keep the clusters available for the business.
  • Architecture design and implementation of deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Hands-on experience with cluster upgrades and patch upgrades without any data loss and with proper backup plans.
  • Provided security and authentication with Ranger, where Ranger Admin provides administration and user sync adds new users to the cluster.
  • Good troubleshooting skills across the overall Hadoop stack components, ETL services, and Hue, which provides a GUI for developers/business users for day-to-day activities.
  • Set up Flume for different sources to bring log messages from outside into HDFS.
  • Implemented NameNode HA in all environments to provide high availability of clusters.
  • Created queues and allocated cluster resources to give priority to jobs.
  • Experienced in setting up projects and volume setups for new projects.
  • Involved in snapshots and mirroring to maintain backups of cluster data, including remote backups.
  • Implemented SFTP for projects to transfer data from external servers into the cluster. Experienced in managing and reviewing log files.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive, Sqoop, and Pig jobs (a minimal submission sketch using the Oozie client API appears after this list).
  • Working experience maintaining MySQL databases: creating databases, setting up users, and maintaining backups of cluster metadata databases with cron jobs.
  • Set up MySQL master and slave replication and Postgres, helping business applications maintain their data.
  • Managed and reviewed Log files as a part of administration for troubleshooting purposes. Communicate and escalate issues appropriately.
  • As an admin, followed standard backup policies to ensure high availability of the cluster.
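
The Oozie scheduling noted above is typically driven by a workflow definition stored on HDFS; the sketch below shows one way such a workflow could be submitted and polled from Java through the Oozie client API. The Oozie URL, application path, and property values are hypothetical placeholders.

    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.WorkflowJob;

    // Submit a workflow to Oozie and poll it until it leaves the PREP/RUNNING states.
    public class SubmitWorkflow {
        public static void main(String[] args) throws Exception {
            OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");   // hypothetical endpoint

            Properties conf = oozie.createConfiguration();
            conf.setProperty(OozieClient.APP_PATH, "hdfs://nameservice1/apps/etl"); // hypothetical workflow dir
            conf.setProperty("nameNode", "hdfs://nameservice1");
            conf.setProperty("jobTracker", "rm-host:8032");

            String jobId = oozie.run(conf);                                         // submit and start
            WorkflowJob.Status status = oozie.getJobInfo(jobId).getStatus();
            while (status == WorkflowJob.Status.PREP || status == WorkflowJob.Status.RUNNING) {
                Thread.sleep(10000L);                                               // poll every 10 seconds
                status = oozie.getJobInfo(jobId).getStatus();
            }
            System.out.println("Workflow " + jobId + " finished with status " + status);
        }
    }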

Hadoop Consultant

Confidential, Pleasanton, CA

RESPONSIBILITIES:

  • Setting up and supporting Hadoop clusters and Cassandra (1.2)/DataStax (3.2) for POC and prod environments using industry best practices.
  • Configured several multi-node Cloudera Hadoop clusters.
  • Working as a lead on Big Data Integration and Analytics based on Hadoop, SOLR and web methods technologies.
  • Worked on implementing Hadoop with AWS EC2, using a few instances to gather and analyze data log files. Developed technical solutions to business problems in the OBIEE repository and OBIA.
  • Involved in the setup, installation, and configuration of OBIEE 11g on the Linux operating system and in integrating it with the existing environment; troubleshot errors encountered and worked with Oracle Support to analyze issues.
  • Communicated with developers, applying in-depth knowledge of Cassandra data modeling to convert some applications from Oracle to Cassandra (a minimal driver sketch appears after this list).
  • Moved data using Sqoop between HDFS and relational database systems, in both directions.
  • Worked on UnitedHealth data, loading files into Hive and HDFS from MongoDB. Responsible for building scalable distributed data solutions using DataStax Cassandra.
  • Hands on experience installing, configuring, administering, debugging and troubleshooting Apache Hadoop, Cloudera Hadoop, Hortonworks and Datastax Cassandra clusters.
  • Led the evaluation of Big Data software like Splunk, Hadoop for augmenting the warehouse, identified use cases and led Big Data Analytics solution development for Customer Insights and Customer Engagement teams.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Worked on TOAD for Data Analysis, ETL/Informatica for data mapping and the data transformation between the source and the target database.
  • Worked on Hive/HBase vs. RDBMS comparisons; imported data into Hive on HDP and created tables, partitions, indexes, views, queries, and reports for BI data analysis.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Tuned and monitored the Hadoop clusters for memory management and MapReduce jobs, enabling healthy operation of MapReduce jobs that push data from SQL to the NoSQL store.
  • Built, stood up, and delivered a Hadoop cluster in pseudo-distributed mode with the NameNode, Secondary NameNode, JobTracker, and TaskTracker running successfully, ZooKeeper installed and configured, and Apache Accumulo (a NoSQL store modeled on Google's Bigtable) stood up in a single-VM environment.
  • Experience in the Extraction, Transformation and Loading of data from multiple sources into Data Warehousing using Informatica Power Center, OLTP, and DSS.
  • Delivered working widget software using Ext JS 4, HTML5, RESTful web services, JSON Store, Linux, Hadoop, ZooKeeper, NoSQL databases, Java, Spring Security, and JBoss Application Server for Big Data analytics.
  • Working on use cases, data requirements and business value for implementing a Big Data Analytics platform.
  • Working on configuring and Maintaining Hadoop environment on AWS.
  • Working on Modifying Chef Recipes used to configure the Hadoop stack.
  • Working on Installing and configuring Hive, HDP, Pig, Sqoop, Flume, Storm and Oozie on the Hadoop cluster.
  • Working on transformation processes using ETL tools such as Informatica PowerCenter 8.x/9.0/9.1/9.5.
  • Developed use cases and technical prototyping for implementing Pig, HDP, Hive, and HBase.
  • Analyzed the alternatives for NoSQL data stores and produced intensive documentation on HBase vs. Accumulo data stores.
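
As a companion to the Cassandra data-modeling and Oracle-to-Cassandra conversion work above, here is a minimal connectivity sketch using the DataStax Java driver; the contact point, keyspace, table, and column names are hypothetical.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    // Connect to a Cassandra cluster, write one row with CQL, and read it back.
    public class CassandraSmokeTest {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("cassandra-node1")            // hypothetical contact point
                    .build();
            Session session = cluster.connect("member_ks");        // hypothetical keyspace

            session.execute("INSERT INTO users (user_id, email) VALUES ('u1001', 'user@example.com')");

            ResultSet rs = session.execute("SELECT email FROM users WHERE user_id = 'u1001'");
            for (Row row : rs) {
                System.out.println(row.getString("email"));        // print the value just written
            }

            cluster.close();                                       // closes the session and the cluster
        }
    }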

Hadoop Consultant

Confidential - San Mateo, CA

RESPONSIBILITIES:

  • Designed, planned and delivered proof of concept and business function/division based implementation of Big Data roadmap and strategy project (Apache Hadoop stack with Tableau) in UnitedHealthcare using Hadoop.
  • Developed MapReduce jobs in Java for data cleaning and preprocessing.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Used Bash shell scripting, Sqoop, Avro, Hive, HDP, Redshift, Pig, and Java MapReduce daily to develop ETL, batch processing, and data storage functionality.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Worked on NoSQL databases including HBase and MongoDB.
  • Worked on data for classification of different Health Care Boards using Mahout.
  • Data stores used included Accumulo/Hadoop and a graph database.
  • Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
  • Worked on Business Intelligence (BI)/Data Analytics, Data Visualization, Big Data with Hadoop and Cloudera based projects, SAS/R, Data warehouse Architecture Design and MDM/Data Governance.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper (a minimal producer sketch appears after this list).
  • Worked on technologies deployed exclusively off-site using the Amazon infrastructure and ecosystem (EMR, Redshift, Hive, DynamoDB).
  • Worked on loading all tables from the reference source database schema through Sqoop.
  • Developed a scalable Big Data architecture that processes terabytes of semi-structured data to extract business insights.
  • Collected data from different databases (i.e., Teradata, Oracle, MySQL) into Hadoop.
  • Used Oozie and ZooKeeper for workflow scheduling and monitoring.
  • Worked on designing and developing ETL workflows using Java for processing data in HDFS/HBase using Oozie (see the HBase client sketch after this list).
  • Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
  • Created User defined types to store specialized data structures in Cassandra.
  • Experienced in managing and reviewing Hadoop log files.
  • Responsible for coding a .NET data ingestion tool for Solr 4.1 and for integration/adaptation of the current IIS/.NET/Microsoft solution into Solr exclusively.
  • Created design approach to lift and shift the existing mappings to Netezza.
  • Conducted vulnerability analyses; reviewed, analyzed, and correlated threat data from available sources such as Splunk.
  • Worked on extracting files from MongoDB through Sqoop, placing them in HDFS for processing.
  • Worked with different file formats (Avro, RCFile).
  • Worked on Data Architecture, Data Modelling, ETL, Data Migration, Performance tuning and optimization.
  • Worked on Hadoop installation & configuration of multiple nodes on AWS EC2 system.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Supported MapReduce programs running on the cluster; managed jobs using the Fair Scheduler.
  • Involved in loading data from the UNIX file system to HDFS. Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on setting up Pig, Hive, Redshift and Hbase on multiple nodes and developed using Pig, Hive, Hbase, MapReduce and Storm.
  • Data scrubbing and processing with Oozie.
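
The Kafka/ZooKeeper messaging integration above could look roughly like the producer sketch below, written against the Kafka Java producer API; the broker address, topic name, and keying scheme are illustrative assumptions.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Publish keyed messages to a Kafka topic that a downstream Cassandra writer consumes.
    public class EventPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka-broker1:9092");   // hypothetical broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            Producer<String, String> producer = new KafkaProducer<String, String>(props);
            for (int i = 0; i < 10; i++) {
                // Keying by member id keeps one member's events on one partition, preserving order.
                producer.send(new ProducerRecord<String, String>("member-events", "m-" + i, "event-" + i));
            }
            producer.close();
        }
    }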
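
For the Java ETL workflows against HDFS/HBase mentioned above, this is a minimal round-trip sketch using the HBase 1.x client API; the table, column family, and row key are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // Write one cell to an HBase table and read it back.
    public class HBaseRoundTrip {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();       // reads hbase-site.xml from the classpath
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("claims"))) {   // hypothetical table

                Put put = new Put(Bytes.toBytes("claim#0001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("APPROVED"));
                table.put(put);

                Result result = table.get(new Get(Bytes.toBytes("claim#0001")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"))));
            }
        }
    }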

Software Developer

Confidential

RESPONSIBILITIES:

  • Solid understanding of Hadoop HDFS, MapReduce, and other ecosystem projects
  • Installation and Configuration of Hadoop Cluster
  • Working with Cloudera Support Team to Fine tune Cluster
  • Extensive experience in Hadoop MapReduce as a Programmer Analyst: business requirement gathering, analysis, scoping, documentation, design, development, and creating test cases.
  • Installation and configuration of other open source software like Pig, Hive, HBase, Flume, and Sqoop
  • Integration with RDBMS using Sqoop and JDBC connectors
  • Working with the Dev team to tune jobs; knowledge of writing Hive jobs
  • Developed customer transaction event path tree extraction model using Hive from customer transaction data.
  • Enhanced and optimized the customer path tree GUI viewer to incrementally load the tree data from HBase
  • Design and implement Map/Reduce jobs to support distributed data processing.
  • Process large data sets utilizing our Hadoop cluster.
  • Developing MapReduce ETL in Java/Pig.
  • Good experience in writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and fine-tuning them per data set
  • Extensive data validation using Hive; also wrote Hive UDFs (a minimal UDF sketch appears after this list)
  • Importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
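
As an example of the Hive UDF work mentioned above, here is a minimal sketch of a Java UDF; the class name and the normalization it performs are illustrative, not taken from a real project. Such a UDF is packaged into a JAR, added to the session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before it can be called in queries.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hive UDF that trims and upper-cases free-text codes before aggregation.
    public final class NormalizeCode extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                         // Hive passes NULLs straight through
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }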

Software Engineer

Confidential

RESPONSIBILITIES:

  • Involved in various phases of the Software Development Life Cycle (SDLC), such as requirement gathering, data modeling, analysis, architecture design, and development of the project.
  • Used the Struts 2.0 framework, which is based on the MVC design pattern, and developed several Action classes extending the ActionSupport class to validate input parameters (a minimal action sketch appears after this list).
  • Developed Login Interceptor for the project and modified struts.xml file for the application.
  • Developed business logic activities by developing business service classes.
  • Developed user interface with JSP, JSTL, and Struts Tag Libraries to populate model objects from value stack in view pages.
  • Designed UI using JSP and HTML and validated with JavaScript for providing the user interface and communication between client and server.
  • Used the Spring framework for dependency injection and security features, and integrated it with the Hibernate framework.
  • Configured the SessionFactory class in the Spring context.xml to integrate Hibernate with the Spring bean container.
  • Extensively worked on Spring bean wiring and the Spring AOP module to achieve loose coupling between different layers of the application.
  • Used join points, advice, pointcuts, and aspects in Spring AOP.
  • Programmed the data access layer using the DAO pattern, with Hibernate used for data access (see the DAO sketch after this list).
  • Created all Hibernate domain classes to map to the database and handled all CRUD operations.
  • Created named queries for the module.
  • Involved in coding Oracle Stored Procedures and functions.
  • Used JAXB API to bind XML schema to java classes.
  • Worked on XML schemas and SOAP messages.
  • Used Log4j for tracking errors and debugging the code.
  • Used Ant scripts to build the application and deployed it to the WebLogic application server.
  • Developed unit test cases and tested the interfaces.
  • Involved in Modules Testing and Integration Testing.
  • MVC implementation using Struts framework.
  • Involved in Unit Testing of Various Modules based on the Test Cases.
  • Involved in bug fixing of various modules in the application for issues raised by the testing teams.
  • Involved in deployment of the application on IBM WebSphere Application Server.
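
To illustrate the Struts 2 Action classes described above, here is a minimal sketch of an action extending ActionSupport with field validation; the field names and result handling are illustrative.

    import com.opensymphony.xwork2.ActionSupport;

    // Struts 2 action that validates the incoming login parameters before execute() runs.
    public class LoginAction extends ActionSupport {
        private String username;
        private String password;

        @Override
        public void validate() {
            if (username == null || username.trim().isEmpty()) {
                addFieldError("username", "Username is required");   // routes the request back to the INPUT result
            }
            if (password == null || password.trim().isEmpty()) {
                addFieldError("password", "Password is required");
            }
        }

        @Override
        public String execute() {
            // Delegate to a business service class here; the returned result name drives struts.xml.
            return SUCCESS;
        }

        public String getUsername() { return username; }
        public void setUsername(String username) { this.username = username; }
        public String getPassword() { return password; }
        public void setPassword(String password) { this.password = password; }
    }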
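
For the DAO-pattern data access layer noted above, the sketch below shows one way a Hibernate-backed DAO could be wired with Spring, with transaction boundaries handled declaratively; the class name and queries are hypothetical.

    import org.hibernate.SessionFactory;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Repository;
    import org.springframework.transaction.annotation.Transactional;

    // DAO-pattern data access: Hibernate performs the CRUD, Spring manages the transaction boundary.
    @Repository
    public class GenericDao {

        @Autowired
        private SessionFactory sessionFactory;   // wired from the SessionFactory bean in the Spring context

        @Transactional
        public void save(Object domainObject) {
            // Hibernate resolves the mapping from the object's class (one of the mapped domain classes).
            sessionFactory.getCurrentSession().saveOrUpdate(domainObject);
        }

        @Transactional(readOnly = true)
        public long countRows(String entityName) {
            // HQL count over a mapped entity, e.g. countRows("Account") for a hypothetical Account mapping.
            return (Long) sessionFactory.getCurrentSession()
                    .createQuery("select count(*) from " + entityName)
                    .uniqueResult();
        }
    }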
