- I am an innovative and skilled Hadoop Administrator with a foundation in Linux and MySQL. Beyond broad familiarity with the latest Big Data technologies, I bring over three years of hands-on experience architecting, installing, and administering distributed data storage systems in very large and complex business intelligence ecosystems.
- 8+ years of IT experience, including 3+ years in Hadoop administration and Big Data technologies and 4+ years in Linux administration, with hands-on experience in the following areas.
- Hands-on experience with productionizing Hadoop applications (i.e. administration, configuration, management, monitoring, debugging, and performance tuning).
- Experience in software configuration, build, release, deployment and DevOps with Windows and UNIX based operating systems
- Installation, configuration, supporting and managing Hadoop Clusters using Hortonworks, Cloudera, MapR.
- Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
- Planning, Installing and Configuring Hadoop Cluster in Cloudera and Hortonworks Distributions.
- Excellent understanding of Hadoop architecture and underlying framework including storage management.
- Experience in building new OpenStack Deployment through Puppet and managing them in production environment.
- Have extensively worked on Pivotal HD (3.0) and Hortonworks (HDP 2.3), MapR, EMR and Cloudera (CDH5) distributions.
- Hands-on experience in creating and upgrading Cassandra clusters.
- Experience in using various Hadoop ecosystem components such as MapReduce, Pig, Hive, Zookeeper, HBase, Sqoop, YARN 2.0, Scala, Spark, Kafka, Storm, Impala, Oozie, and Flume for data storage and analysis.
- Experience with Oozie Scheduler in setting up workflow jobs with MapReduce and Pig jobs.
- Knowledge of the architecture and functionality of NoSQL databases like HBase, Cassandra and MongoDB.
- Extending Hive and Pig core functionality by using custom UDFs.
- Experience in troubleshooting errors in HBase Shell/API, Pig, Hive, Sqoop, Flume, Spark and MapReduce.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems/mainframe and vice versa.
- Collected logs of data from various sources and integrated into HDFS Using Flume.
- Experienced in running MapReduce and Spark jobs over YARN.
- Good understanding of HDFS Designs, Daemons and HDFS high availability (HA).
- Good experience in Big data analytics tools like Tableau and Trifacta.
- Setting up Linux environments: configuring passwordless SSH, creating file systems, disabling firewalls, swappiness, and SELinux, and installing Java.
- Provisioning and managing multi-tenant Hadoop clusters on public cloud (Amazon Web Services (AWS) EC2) and on private cloud infrastructure (OpenStack cloud platform).
- Implementing a Continuous Integration and Continuous Delivery framework using Jenkins, Puppet, Maven and Nexus in a Linux environment. Integration of Maven/Nexus, Jenkins, UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira and Cloud Foundry.
- Hands-on experience configuring and managing Zookeeper for NameNode failure scenarios.
- Worked on Hadoop Security with MIT Kerberos, Ranger with LDAP.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup and realm/domain creation.
- Extensive experience in data analysis using tools like Syncsort and HZ along with Shell Scripting and UNIX.
- Experience with writing Oozie workflows and Job Controllers for job automation.
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, Zookeeper, Mahout, Oozie, Avro, HBase, Storm, CDH 5.3, CDH 5.4
Scripting Languages: Shell Scripting, Puppet, Python, Bash, CSH, Ruby, PHP
Databases: Oracle 11g, MySQL, MS SQL Server, HBase, Cassandra, MongoDB
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP
Monitoring Tools: Cloudera Manager, Solr, Ambari, Nagios, Ganglia
Application Servers: Apache Tomcat, Weblogic Server, WebSphere
Reporting Tools: Kerberos Cognos, Hyperion Analyzer, OBIEE & BI+
Sr. Hadoop DevOps Engineer/Administrator
Confidential, Atlanta, GA
- Installed and configured Hadoop on YARN and other ecosystem components.
- Configured and used HCatalog to access the table data maintained in the Hive metastore and use the same table information for processing in Pig.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Installed 5 Hadoop clusters for different teams and developed a data lake that serves as a base layer for storage and analytics for developers. Provide services to developers: installing their custom software, upgrading Hadoop components, resolving their issues, and helping troubleshoot their long-running jobs. Provide L3 and L4 support for the data lake and also manage clusters for other teams.
- Built automation frameworks for data ingestion, processing, and storage in Python and Scala, using NoSQL and SQL databases along with Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD, and Red Hat infrastructure.
- Serve in a combined DevOps and Hadoop admin role: work on L3 issues, install new components as requirements arise, automate wherever possible, and implemented a CI/CD model.
- Involved in implementing security on a Hortonworks Hadoop cluster using Kerberos, working with the operations team to move a non-secured cluster to a secured cluster.
- Responsible for upgrading Hortonworks Hadoop HDP 2.4.2 and MapReduce 2.0 with YARN in a multi-node cluster environment. Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS.
- Hadoop security setup using MIT Kerberos, AD integration (LDAP) and Sentry authorization.
- Migrated services from a managed hosting environment to AWS including: service design, network layout, data migration, automation, monitoring, deployments and cutover, documentation, overall plan, cost analysis, and timeline.
- Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, or custom-built tooling; designed cloud-hosted solutions using the AWS product suite.
- Configured Zookeeper to implement node coordination in support of clustering.
- Configured Kafka for efficiently collecting, aggregating and moving large amounts of clickstream data from many different sources to MapR-FS.
- Performed a Major upgrade in production environment from HDP 1.3 to HDP 2.4.2.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari. Installed and configured Hortonworks and Cloudera distributions on single-node clusters for POCs.
- Used Trifacta for data cleansing and data wrangling.
- Implemented a Continuous Delivery framework using Jenkins, Puppet, Maven and Nexus in a Linux environment. Integration of Maven/Nexus, Jenkins, UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira and Cloud Foundry.
- Involved in running Hadoop jobs for processing millions of records of text data. Troubleshot build issues during the Jenkins build process. Implemented Docker to create containers for Tomcat servers and Jenkins.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Used ServiceNow and JIRA to track issues; managed and reviewed log files as part of administration for troubleshooting purposes, meeting SLAs on time.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, HBase, ANT and Maven, Chef, Puppet, DevOps, Jenkins, ClearCase.
Confidential, San Jose, CA
- Worked on a live Big Data Hadoop production environment with 200 nodes.
- Installed, configured, and monitored MapR Hadoop on 10 AWS EC2 instances and configured MapR on Amazon EMR with AWS S3 as the default file system for the cluster.
- Developed use cases and technical prototypes for implementing Pig, HDP, Hive and HBase.
- Analyzed alternatives for NoSQL data stores and produced detailed documentation comparing the HBase and Accumulo data stores.
- Communicated with developers, using in-depth knowledge of Cassandra data modeling, to convert some of the applications to use Cassandra instead of Oracle. Responsible for design and development of Big Data applications using Hortonworks Hadoop.
- Applied module-wise data integrity and data validation practices.
- Transferred data using Sqoop from HDFS to relational database systems and vice versa.
- Maintaining and troubleshooting Hadoop core and ecosystem components (HDFS, MapReduce, NameNode, DataNode, JobTracker, TaskTracker, Zookeeper, YARN, Oozie, Hive, Hue, Flume, HBase, and Fair Scheduler). Hands-on experience installing, configuring, administering, debugging and troubleshooting Apache and DataStax Cassandra clusters.
- Led the evaluation of Big Data software like Splunk, Hadoop for augmenting the warehouse, identified use cases and led Big Data Analytics solution development for Customer Insights and Customer Engagement teams.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked on identifying and eliminating duplicates in datasets through IDQ 8.6.1 components.
- Tuned Hadoop clusters and monitored memory management and MapReduce jobs to keep the jobs that push data from SQL to NoSQL stores running healthily.
- Configuring Security with Active Directory onto Hadoop using Kerberos, perimeter defense with Knox, and granular access auditing with Ranger.
- Successfully performed various data migration projects from Oracle to NoSQL databases, and consulting projects at customer sites, using my own Big Data migration products such as Big Data Pumper, MongoDB Pumper, Couchbase Pumper, and NoSQL Viewer.
- Assembled Puppet Master, Agent and Database servers on Red Hat Enterprise Linux Platforms.
- Built, stood up, and delivered a Hadoop cluster in pseudo-distributed mode with the NameNode, Secondary NameNode, JobTracker, and TaskTracker running successfully, with Zookeeper installed and configured and Apache Accumulo (a NoSQL store modeled on Google's Bigtable) stood up in a single-VM environment.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
Environment: Hadoop, MapReduce, HDFS, Pig, Git, Jenkins, Puppet, Chef, Maven, Spark, YARN, HBase, CDH 5.4, Oozie, MapR, NoSQL, ETL, MySQL, Agile, Windows, UNIX Shell Scripting, Teradata.
Confidential, Pittsburgh, PA
- Worked on distributed/cloud computing for clusters ranging from POC to PROD.
- Installed and configured Hadoop cluster across various environments through Cloudera Manager.
- Integrated external components like Informatica BDE, Tibco and Tableau with Hadoop using HiveServer2.
- Processed input from multiple data sources in the same Reducer using GenericWritable and MultipleInputs.
- Created a data pipeline of MapReduce programs using chained Mappers.
- Integrated Kafka with Flume in a sandbox environment using Kafka source and Kafka sink.
- Visualized HDFS data for customers using a BI tool with the help of the Hive ODBC driver.
- Familiarity with a NoSQL database such as MongoDB.
- Implemented an optimized join of different data sets to get top claims by state using MapReduce.
- Worked on big data processing of clinical and non-clinical data using MapR.
- Implemented complex MapReduce programs in Java to perform map-side joins using DistributedCache.
- Responsible for importing log files from various sources into HDFS using Flume.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Used Hive and Pig to generate BI reports.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Created Hive Generic UDFs to process business logic that varies based on policy.
- Moved relational database data using Sqoop into Hive dynamic partition tables using staging tables.
- Worked on custom Pig Loaders and storage classes to work with a variety of data formats such as JSON and XML.
- Processed and analyzed log data stored in HBase and imported it into the Hive warehouse, which enabled business analysts to write HQL queries.
- Experienced with different kinds of compression techniques such as LZO, GZIP, and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks.
- Experienced in monitoring clusters using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, MongoDB, MapReduce, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, SQL, ETL, Cloudera Manager, MySQL.
- Installation, upgrade, and configuration of Red Hat Linux and IBM AIX on IBM Blade servers and P-Series servers using Kickstart, NIM and CD media.
- Installation, configuration, and maintenance of Veritas NetBackup 6.0/5.x.
- Installation and customization of Windows 2003 servers.
- Configuration and administration of NFS, NIS, NIS+, LDAP, DNS, Samba and Sendmail servers.
- Working knowledge of VMware (Virtualization).
- Upgrading VMware server 2.x to 3.x.
- Installed RPM packages and LPP on Linux Servers and IBM P-Series AIX Servers.
- Oracle installation & system level support to clients.
- Installed and configured the iPlanet (Sun One) Web servers & setup firewall filtering with Squid Proxy server for web caching on Sun Solaris.
- Wrote shell scripts to automate administrative tasks using cron and at in AIX and Linux.
- Performance monitoring using sar, iostat, vmstat and mpstat on AIX servers.
- Created tables, views, Types, triggers, complex join queries, stored procedures, and functions and modifications to existing database structure as required for addition of new features using SQL developer.
- Developed various UML diagrams like use cases, class diagrams, sequence and activity diagrams.
- Extensively used Quartz scheduler to schedule the automated jobs and Created POC for running batch jobs.
- Used JIRA for bug tracking, issue tracking and project management.
- Wrote GWT code to create presentation layer using GWT widgets and event handlers.
- Used SVN, CVS, and ClearCase as version control tools.
- Automated the build process by writing ANT build scripts.
- Built the WAR with the help of PuTTY, deployed it into the cloud environment using the cloud controller, and resolved cloud issues.
- Configured and customized logs using Log4J.
- Followed Agile (Scrum) practices, including sprint planning, daily Scrum meetings, and sprint retrospective meetings, to produce quality deliverables on time.
- Worked in Agile Scrum environment and used Kanban board to track progress.
Environment: Red Hat Linux AS3.0, AS4.0, VxFS, IBM P-Series AIX servers, Veritas Volume Manager, Veritas NetBackup
- Wrote a command-line utility used to issue commands across hundreds of hosts in parallel, similar to push but through serial console.
- Wrote an auditing script used to populate and maintain a system that documented addresses for 4500+ hosts.
- Wrote many small utilities to automate a variety of tasks from pulling data out of admin portal, to gathering data from switches, to rebooting systems via ipmitool.
- Identified recurring production issues by analyzing production tickets after each release, and strengthened the system testing process to stop those issues from reaching production and to enhance customer satisfaction.
- Maintain, document and adhere to strict change control procedures for the automated management of RedHat RHEL, SuSE Linux and Sun Solaris Unix server environments.
- Develop and maintain monitoring and automation framework built around CFEngine, Shell and Perl.
- Manage SAN (HP 3Par, EMC) and NAS (Netapp) storage technologies.
- Manage Veritas VxFS file systems and VCS cluster environment for operating high-availability Oracle databases. Wrote, optimized, and troubleshot dynamically created SQL within procedures.
- Creating database objects such as Tables, Indexes, Views, Sequences, Primary and Foreign keys, Constraints and Triggers.
- Responsible for creating virtual environments for the rapid development.
- Responsible for handling tickets raised by end users, including installation of packages, login issues, access issues, and user management such as adding, modifying, deleting, and grouping users.
- Responsible for preventive maintenance of the servers on a monthly basis. Configured RAID for the servers. Managed resources using disk quotas.
- Responsible for change management release scheduled by service providers.
- Generated weekly and monthly reports on the tickets worked on and sent them to management.
- Managed systems operations with final accountability for smooth installation, networking, operation, and troubleshooting of hardware and software in a Linux environment.
- Identifying operational needs of various departments and developing customized software to enhance System's productivity.
- Established and implemented firewall rules; validated rules with vulnerability scanning tools.
- Proactively detecting Computer Security violations, collecting evidence and presenting results to the management.
- Implemented system and e-mail authentication using an LDAP enterprise database.
- Implemented a database-enabled intranet web site using Linux, Apache, and a MySQL database backend.
- Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method. Monitored system metrics and logs for any problems.
- Ran crontab jobs to back up data. Applied operating system updates, patches and configuration changes.
- Maintained the MySQL server and granted database authentication to required users. Documented various administrative and technical issues appropriately.
Environment: Red Hat Enterprise Linux, Ubuntu, CentOS, Sun Solaris 8, 9, 10, VERITAS Cluster Server, Veritas Volume Manager, SLURM, Oracle 11g, HP-UX, HP Blade, IBM AIX, HP ProLiant DL 385, 585, Weblogic, Oracle RAC/ASM, MS Windows 2008 server.