Hadoop Cloudera Administrator Resume
Oak Brook, IL
SUMMARY
- 8+ years of IT industry experience in Linux administration and cloud computing.
- Expertise in configuration management tools such as Chef and Puppet, and in CI/CD with Jenkins.
- Extensively worked with version control systems: CVS, SVN (Subversion), Git, Perforce, and IBM Rational Team Concert.
- Experience working with AWS cloud services (EC2, S3, RDS, Elastic Load Balancing, Auto Scaling) using the AWS Command Line Interface and the AWS SDK for Python.
- Optimized volumes and EC2 instances and created multiple VPCs.
- Experience working with IAM to create new accounts, roles, and groups.
- Set up scalability for application servers using the command line interface, set up and administered DNS in AWS using Route 53, and managed users and groups using AWS Identity and Access Management (IAM).
- Migrated VMware VMs to AWS and managed services such as EC2, S3 buckets, Route 53, ELB, and EBS.
- Able to write scripts in Bash/shell, Perl, Ruby, and Python.
- Strong proficiency in supporting production cloud environments (AWS, Azure, VMware) as well as traditional managed hosting environments.
- Extensively worked with Hudson, Jenkins, and TeamCity for continuous integration and end-to-end automation of all builds and deployments.
- Worked with project management and code review tools: Fisheye, Crucible, and IBM ClearQuest.
- Created Puppet manifests and modules to automate system operations.
- Conceived, designed, installed, and implemented a Puppet configuration management system.
- Expertise in querying RDBMSs such as Oracle and SQL Server using SQL and PL/SQL to ensure data integrity.
- Efficient at working closely with core product teams to ensure high-quality, timely delivery of builds.
- Experience configuring and administering Apache ZooKeeper.
- Used ZooKeeper to communicate data between servers by publishing information.
- Excellent knowledge of ZooKeeper usage in distributed computation.
- Created and configured new JIRA projects and worked with departments to maintain existing JIRA projects.
- Experience using bug tracking systems such as JIRA, Remedy, and HP Quality Center.
- Proficient in deploying applications that use MySQL or similar RDBMSs.
- Expert in Chef and Puppet as configuration management tools to automate repetitive tasks, quickly deploy critical applications, and proactively manage changes.
- Worked with development engineers to ensure automated test efforts were tightly integrated with the build system, and to fix errors encountered during builds and deployments.
- Able to deploy developed code to WebSphere, WebLogic, Apache Tomcat, JBoss, and IIS 7.
- Experience using the weblogic.Admin, weblogic.Deployer, and weblogic.Server commands.
- Supported the implementation of redundant monitoring hosts using Nagios.
- Performed analysis, design, development, enhancement, testing, and maintenance of LDAP applications.
- Extensively used build utilities such as Maven and Ant to build JAR, WAR, and EAR files.
- Defined AWS Security Groups which acted as virtual firewalls that controlled the traffic allowed to reach one or more AWS EC2 instances.
- Experience administering the IBM Rational Suite in a Linux environment.
- Created manifest files and modules in Puppet.
- Created Linux slaves using Groovy scripts.
TECHNICAL SKILLS
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.
Hadoop Distribution: Cloudera Distribution of Hadoop (CDH).
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows Server 2003.
Servers: WebLogic Server, WebSphere, and JBoss.
Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.
Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit, NiFi.
Database: MySQL, NoSQL, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.
PROFESSIONAL EXPERIENCE
Hadoop Cloudera Administrator
Confidential, Oak Brook, IL
Responsibilities:
- Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages. Provided reports to management on cluster usage metrics and charged back customers based on their usage.
- Imported and exported data between Oracle/DB2 and HDFS/Hive using Sqoop.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, Cassandra, and slots configuration.
- Imported and exported data between databases such as MySQL and other RDBMSs and HDFS/HBase using Sqoop.
- Installed and managed a 50+ node Hadoop production cluster with 10 PB of storage capacity, running the HDP distribution (Ambari 1.7, HDP 2.1.3).
- Involved in migrating ETL processes from Oracle to Hive to test ease of data manipulation.
- Performed the cluster upgrade from HDP 2.1 to HDP 2.3.
- Architected and designed a 30-node Hadoop Innovation Cluster with Nagios, Ganglia, Sqrrl, Spark, Chef, Puppet, and HDP 2.2.4.
- Managed log files, backups, and capacity.
- Responsible for providing support to Hadoop developers and assisting in the optimization of MapReduce jobs, Pig Latin scripts, Hive scripts, and HBase ingestion as required.
- Maintained and administered HDFS through the Hadoop Java API, shell scripting, and Python.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Exported data to Teradata using Sqoop. Data was stored in a Vertica database table, and Spark was used to load the data from the Vertica table into Data.
- Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they could read data from HDFS without issues.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Set up the Oozie workflow engine to run multiple Hive and Pig jobs that are triggered independently by time and data availability.
- Troubleshot many cloud-related issues such as DataNodes going down, network failures, and missing data blocks.
- Extracted data from Teradata into HDFS using Sqoop.
- Hands-on experience in the installation, configuration, management, and development of big data solutions using Hortonworks distributions.
- Defined the support plan for all Hadoop environments and supporting technologies, including resourcing needs, communication plan, onshore/offshore hand-offs, and incident management.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
- Performed knowledge transfer for offshore Hadoop support roles, including documentation on environments, monitoring requirements, and access and communication processes.
- Set up a monthly cadence with Hortonworks to review upcoming releases and technologies and to review issues or needs.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Loaded the aggregated data from the Hadoop environment into Oracle using Sqoop for dashboard reporting.
Environment: Hadoop, HDFS, Hive, Pig, HBase, Sqoop, Python scripting, UNIX shell scripting, Nagios, Kerberos, Ganglia, Tidal, Tableau, Informatica, CDH 5.7.6, Cloudera Manager, YARN, Flume, ZooKeeper, Red Hat/CentOS 6.5, JDK 1.6, CDH 4.x, CDH 5.x, MapReduce, Scala, Oracle, Cassandra, AWS, S3, AWS Redshift, Hortonworks, UNIX/Linux.
Hadoop Consultant
Confidential - St. Louis, MO.
Responsibilities:
- Designed and documented CI/CD tool configuration management.
- Responsible for CI/CD processes, including responding to Git triggers, human input, and dependency chains, as well as environment setup.
- Created and maintained documentation of build and release processes and application configuration to comply with audit requirements and industry best practices.
- Configured, automated, and maintained build and deployment CI/CD tools (GitLab, Jenkins) across Local/POC/Non-Prod/Prod environments with a high degree of standardization for both infrastructure and application stack automation on the AWS cloud platform.
- Involved in region-based deployments, which support deploying to multiple regions in the same pipeline run, and blue-green deployments (automatic tracking of active/inactive stacks).
- Performed branching, merging, merge-conflict resolution, and tagging. Set up Continuous Delivery infrastructure in AWS VPCs.
- Created S3 buckets, managed S3 bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Worked on Integration and Production AWS environments.
- Implemented email and Slack channel notifications in the Jenkinsfile that send a message whenever the pipeline runs.
- Deployed a multitude of applications using almost the entire AWS stack (including EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto scaling, using AWS CloudFormation with JSON templates.
- Installed and configured the Chef configuration management tool (Chef Server, workstation, and nodes) via the CLI.
- Developed a CI pipeline for testing cookbooks using an RSpec unit testing strategy that checks versions and dependencies.
- Performed build and release automation following continuous integration/deployment principles.
Environment: AWS (EC2, S3, Elastic Load Balancing, Route 53, IAM), Jenkins, Git, Splunk, Chef, Artifactory, CFT, webhooks, Maven, Windows Server 2012 build agent, RSpec.
Hadoop Developer
Confidential - Columbus, GA
Responsibilities:
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Sqoop, Pig, Hive, and NoSQL databases.
- Created an end-to-end process for delivering reports, including ETL and data-driven subscriptions, so that SSRS reports were delivered in multiple formats to different people across the organization based on specific internal business rules.
- Worked independently on several large, deadline-driven projects and helped the team with peer code reviews to ensure high-quality code went into production.
- Created and maintained ETL technical specification documents.
- Participated in gathering and analyzing requirements and designing technical documents for business requirements.
- Responsible for importing data into HDFS from different RDBMS servers using Sqoop, and exporting aggregated data back to the RDBMS servers with Sqoop for other ETL operations.
- Used Sqoop to import data from RDBMS to HDFS to ensure data reliability.
- Developed new UDFs and used existing ones for custom processing of table data.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Implemented a 100-node CDH4 Hadoop cluster on Red Hat Linux using Cloudera Manager.
- Implemented partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries.
- Responsible for monitoring the cluster using Cloudera Manager.
- Developed Pig scripts to capture changes between newly arrived data and current data.
- Orchestrated hundreds of Sqoop queries, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Responsible for handling different data formats such as Avro, Parquet, and ORC.
Environment: Hadoop, HDFS, Sqoop, Oozie, Pig, Hive, Cassandra, Linux, YARN, Cloudera Manager.
Hadoop Admin
Confidential - Pleasanton, CA
Responsibilities:
- Worked on a Hadoop cluster that ranged from 30 nodes in development and 40 nodes in pre-production to 140 nodes in production.
- Responsible for managing data from different sources and importing structured and unstructured data.
- Handled the installation and configuration of a Hadoop cluster.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open source components such as Hive and HBase.
- Handled data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitored data streaming between web sources and HDFS.
- Monitored Hadoop cluster functioning through monitoring tools.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Changed cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups.
- Responsible for building scalable distributed data solutions using Hadoop.
- Commissioned and decommissioned DataNodes from the cluster in case of problems.
- Set up automated processes to archive or clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Set up and managed NameNode HA and NameNode federation using Apache Hadoop 2.0 to avoid single points of failure in large clusters.
- Set up checkpoints to gather system statistics for critical setups.
- Held regular discussions with other technical teams regarding upgrades, process changes, any special processing, and feedback.
Environment: Hadoop 1.0.0 and Hadoop 2.0.0, HDFS, MapReduce, Cloudera, Sqoop, Hive, Pig, HBase, Java, Flume 1.2.0, Eclipse IDE, CDH3.
Linux Admin
Confidential
Responsibilities:
- Supported Solaris/Linux servers in production/QA/development environments, including Solaris Zones and RHEL VMs.
- Installed the ESXi 4.1 hypervisor on HP servers.
- Installed, configured, and maintained Apache, Samba, and WebSphere and WebLogic application servers.
- Worked on VMware, VMware View, and vSphere 4.0.
- Installed systems using JumpStart for Sun servers and Kickstart for RHEL on HP hardware.
- Configured, supported, and performed routine maintenance of hardware and software for Linux and Solaris servers.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Supported 200+ AWS cloud instances running Ubuntu, Red Hat, and Windows environments.
- Involved in hardware and software evaluation, recommendation, and vendor management.
- Automated various administrative tasks on multiple servers using Puppet.
- Deployed Puppet, Puppet Dashboard, and PuppetDB for configuration management of the existing infrastructure.
- Proficient in installing, configuring, and maintaining applications such as Apache, LDAP, and PHP.
- Involved in installing and managing automation and monitoring tools such as Nagios, Splunk, and Puppet on Red Hat Linux.
- Resolved configuration issues and problems related to the OS, NFS mounts, LDAP user IDs, and DNS.
- Regularly applied patches to Red Hat Linux, Sun, and HP systems.
Software Developer
Confidential
Responsibilities:
- Designed a system and developed a framework using J2EE technologies based on MVC architecture.
- Involved in the iterative/incremental development of project application. Participated in the requirement analysis and design meetings.
- Designed and developed UIs using JSP following the MVC architecture.
- Designed and developed the presentation tier using the Struts framework, JSP, Servlets, TagLibs, HTML, and JavaScript.
- Designed the controller, including class diagrams and sequence diagrams, using Visio.
- Used the Struts framework in the application. Programmed the views using JSP pages with the Struts tag library; the model is a combination of EJBs and Java classes, and the web controllers are servlets.
- Generated XML pages with templates using XSL. Used JSP, Servlets, and EJBs on the server side.
- Developed a complete external build process and maintained it using Ant.
- Implemented Home Interface, Remote Interface, and Bean Implementation class.
- Extensive use of XML for application configuration, navigation, and task-based configuration.
- Designed and developed unit and integration test cases using JUnit.
- Used EJB features effectively: local interfaces to improve performance, abstract persistence schema, and CMRs.
- Used the Struts web application framework to build the presentation tier.
- Wrote PL/SQL queries to access data from the Oracle database.
- Set up WebSphere Application Server and used Ant to build the application and deploy it to WebSphere.
- Implemented JMS for making asynchronous requests.
Environment: Java, J2EE, Struts, Hibernate, JSP, Servlets, HTML, CSS, UML, jQuery, Log4j, XML Schema, JUnit, Tomcat, JavaScript, Oracle 9i, UNIX, Eclipse IDE.
