Hadoop Administrator/ Hadoop Infrastructure Architect Resume
Hoffman Estates, IL
SUMMARY
- 13 years of professional IT experience, including 6 years in Hadoop technologies and extensive experience across Linux flavors.
- Experience implementing Data Warehousing/ETL solutions across financial, telecom, loyalty, retail, and insurance verticals.
- Experience operating and managing large clusters with 650+ nodes and 4+ petabytes of storage.
- Hands-on experience setting up and configuring Hadoop ecosystem components and services such as MapReduce, HDFS, HBase, Impala, Oozie, Hive, Sqoop, Pig, Spark, Flume, Kafka, Sentry, and Ranger.
- Experience in planning, implementing, testing, and documenting performance benchmarking of the Hadoop platform.
- Helped with planning, development, and architecture of the Hadoop ecosystem.
- Experience in both on-premises and cloud environments: AWS, GCP, and Azure.
- Experience securing Hadoop clusters through Kerberos KDC installation, LDAP integration, data-in-transit encryption with TLS, and data-at-rest encryption with Cloudera Navigator Encrypt.
- Experience designing, configuring, and managing backup and disaster recovery using data replication, HDFS snapshots, and Cloudera BDR utilities (see the snapshot sketch after this summary).
- Implemented role-based authorization for HDFS, Hive, and Impala using Apache Sentry.
- Good knowledge of implementing and using cluster monitoring tools such as Cloudera Manager, Ganglia, and Nagios.
- Experienced in implementing and supporting auditing tools such as Cloudera Navigator.
- Experience implementing High Availability for services such as NameNode, Hue, and Impala.
- Hands-on experience deploying and using automation tools such as Puppet for cluster configuration management.
- Experience creating cookbooks/playbooks and documentation for installation, upgrade, and support projects.
- Participated in application onboarding meetings with application owners and architects, helping them identify/review the technology stack, use cases, and resource requirement estimates.
- Experience in documenting standard practices and compliance policies.
- Fixed issues by working with dependent and support teams, and logged cases according to priority.
- Assisted with performance tuning and monitoring of the Hadoop ecosystem.
- Hands-on experience performing functional testing and helping application teams/users incorporate third-party tools into the Hadoop environment.
- Experience analyzing log files and failures, identifying root causes, and taking or recommending corrective actions.
- Experience in Data Warehousing and ETL processes.
- Implemented and configured a High Availability Hadoop cluster using the Hortonworks distribution.
- Knowledge of integration with reporting tools such as Tableau, MicroStrategy, and Datameer.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing job counters and application log files.
- Experience setting up Hortonworks clusters and installing all ecosystem components through Ambari.
- Experience tuning performance for various services.
- Experience with job scheduling and monitoring tools such as Control-M, Nagios, and Ganglia.
- Additional responsibilities include interacting with the offshore team daily, communicating requirements, delegating tasks to offshore/on-site team members, and reviewing their deliverables.
- Good experience managing Linux platform servers.
- Effective problem-solving skills and the ability to learn and use new technologies/tools quickly.
- Good knowledge of Bash shell scripting.
- Experience working with ITIL tools such as JIRA, SUCCEED, and ServiceNow for change management and support processes.
- Excellent communication and interpersonal skills that contribute to timely completion of project deliverables, often ahead of schedule.
- Experience providing 24x7x365 on-call and weekend production support.
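A minimal sketch of the HDFS snapshot workflow referenced in the backup/disaster-recovery bullet above, assuming a hypothetical /data/warehouse directory and HDFS admin privileges; actual paths, retention policies, and BDR replication schedules vary by cluster:

```bash
#!/usr/bin/env bash
set -euo pipefail

DIR=/data/warehouse                  # hypothetical snapshottable directory
SNAP="backup_$(date +%Y%m%d)"        # dated snapshot name

hdfs dfsadmin -allowSnapshot "$DIR"       # one-time: mark the directory as snapshottable
hdfs dfs -createSnapshot "$DIR" "$SNAP"   # take a read-only, point-in-time copy
hdfs lsSnapshottableDir                   # list directories that allow snapshots
hdfs dfs -ls "$DIR/.snapshot"             # snapshots are exposed under the .snapshot path
```

Point-in-time copies land under the directory's .snapshot path; cross-cluster DR would typically layer DistCp or Cloudera BDR replication on top of these snapshots.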
TECHNICAL SKILLS
Operating Systems/Platforms: UNIX & Linux (CentOS 6 & RHEL 6), Ubuntu 14.x, AIX, Windows
Programming Languages: C, C++, Java, Pig Latin, SQL, HQL
Cloud Computing Services: VMware, AWS, Google Cloud, Microsoft Azure
CRM Package: Siebel 7.x, Siebel 8.x
SQL & NoSQL Data Storage: PostgreSQL, MySQL, Cassandra, MongoDB, Teradata, Oracle
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, YARN, Flume, Impala, Ganglia, Storm, Cassandra, Sentry, Kafka, Ranger, R
Management Tools: Cloudera Manager, Ambari
Application Servers: WebLogic 11g, 12c, Tomcat 5.x and 6.x
ETL Tools: Informatica 8.x and 9.x, BODS 4.0/4.1, Talend
Reporting Tools: BI Publisher, Web Intelligence, Tableau, MicroStrategy, Datameer
SCM Tools: Perforce, TeamTrack, VSS, Harvest, SVN, HP Quality Center, JIRA
Methodology: Agile SDLC, UML
Scripting language: Bash, Perl, Pig, Python, Puppet
Security: Kerberos, Sentry, LDAP, AD, SSL/TLS, data-at-rest encryption
Protocols: TCP/IP, UDP, SNMP, Socket Programming, Routing Protocol
PROFESSIONAL EXPERIENCE
Hadoop Administrator/ Hadoop Infrastructure Architect
Confidential - Hoffman Estates, IL
Responsibilities:
- Created new project solutions based on the company's technology direction and ensured that infrastructure services were provisioned to current standards.
- Implemented HA for NameNode and Hue using Cloudera Manager.
- Created and configured cluster monitoring services: Activity Monitor, Service Monitor, Reports Manager, Event Server, and Alert Publisher.
- Created cookbooks/playbooks and documentation for special tasks.
- Configured HAProxy for the Impala service.
- Wrote DistCp scripts for synchronizing data within and across clusters.
- Created HDFS snapshots for in-cluster backup of data.
- Created Sqoop scripts for ingesting data from transactional systems into Hadoop (see the import sketch after this list).
- Regularly used JIRA, ServiceNow, and other internal issue trackers during project development.
- Worked independently with Cloudera and Hortonworks support on any issues or concerns with the Hadoop cluster.
- Conducted technology evaluation sessions on Big Data, data governance, Hadoop, Amazon Web Services, Tableau and R, data analysis, statistical analysis, and data-driven business decisions.
- Integrated Tableau, Teradata, DB2, and Oracle with Hadoop via ODBC/JDBC drivers.
- Worked with application teams to install the operating system, Hadoop updates, patches, version upgrades as required.
- Applied Hortonworks configuration parameters to different environments such as Integration and Production.
- Created scripts to automate data balancing across the cluster using the HDFS balancer utility.
- Created a POC implementing a streaming use case with the Kafka and HBase services.
- Working experience creating and maintaining MySQL databases, setting up users, and maintaining database backups.
- Implemented the Kerberos authentication protocol for the existing cluster.
- Integrated existing LLE and Production clusters with LDAP.
- Implemented TLS for CDH Services and for Cloudera Manager.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS and Hive access (see the onboarding sketch after this section).
- Managed backup and disaster recovery for Hadoop data; coordinated root cause analysis efforts to minimize future system issues.
- Served as lead technical infrastructure Architect and Big Data subject matter expert.
- Deployed Big Data solutions in the cloud; built, configured, monitored, and managed end-to-end Big Data applications on Amazon Web Services (AWS).
- Screened Hadoop cluster job performance and performed capacity planning.
- Spun up clusters in Azure using Cloudera Director as a POC for the cloud migration project.
- Leveraged AWS cloud services such as EC2, Auto Scaling, and VPC to build secure, highly scalable, and flexible systems that handled expected and unexpected load bursts.
- Defined Migration strategy to move the application to the cloud. Developed architecture blueprints and detailed documentation.
- Created a bill of materials, including required cloud services (such as EMR, EC2, S3) and tools; experience scheduling cron jobs on EMR.
- Frequently created Bash scripts as required by the project.
- Worked on GCP cloud architecture design patterns.
- Guided application teams on choosing the right file formats in HDFS (Text, Avro, Parquet) and compression codecs such as Snappy, bzip2, and LZO.
- Improved communication between teams in the matrix environment, which led to an increase in the number of simultaneous projects and average billable hours.
- Substantially improved all areas of the software development life cycle for the company products, introducing frameworks, methodologies, reusable components and best practices to the team
- Implemented VPC, Auto Scaling, S3, EBS, ELB, CloudFormation templates, and CloudWatch services on AWS.
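A minimal sketch of the kind of Sqoop ingestion script referenced above, assuming a hypothetical MySQL source (db-host, the sales.orders table) and HDFS target path; connection details and credential handling would differ in practice:

```bash
#!/usr/bin/env bash
# Import one transactional table into HDFS with Sqoop.
# Host, database, table, and target directory are placeholders.
set -euo pipefail

sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user \
  --password-file /user/etl_user/.db.password \
  --table orders \
  --target-dir /data/raw/sales/orders \
  --num-mappers 4 \
  --fields-terminated-by '\t'
# For incremental loads, add: --incremental append --check-column order_id --last-value <N>
```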
Environment: Over 1,500 nodes, approximately 5 PB of data, Cloudera Distribution of Hadoop (CDH) 5.5, HA NameNode, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Cloudera Navigator, Control-M, Oozie, Hue, White Elephant, Ganglia, Nagios, HBase, Cassandra, Kafka, Storm, Cobbler, Puppet
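A sketch of the new-user onboarding steps mentioned in this section, assuming a Kerberized CDH cluster; the username, realm, admin principal, and HiveServer2 host are hypothetical, and the OS-account step would be skipped where accounts come from LDAP/AD:

```bash
#!/usr/bin/env bash
set -euo pipefail

USER=jdoe            # placeholder username
REALM=EXAMPLE.COM    # placeholder Kerberos realm

useradd -m "$USER"                                            # local OS account (skip if LDAP-managed)
kadmin -p admin/admin@"$REALM" -q "addprinc ${USER}@${REALM}" # create the Kerberos principal

# Create the user's HDFS home directory as the hdfs superuser
sudo -u hdfs hdfs dfs -mkdir -p /user/"$USER"
sudo -u hdfs hdfs dfs -chown "$USER":"$USER" /user/"$USER"

# Smoke test: authenticate, then touch HDFS and Hive
kinit "$USER"@"$REALM"
hdfs dfs -ls /user/"$USER"
beeline -u "jdbc:hive2://hive-host:10000/default;principal=hive/_HOST@${REALM}" -e "show databases;"
```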
Sr. Hadoop Consultant
Confidential - Chicago, IL
Responsibilities:
- Working experience designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie, and ZooKeeper.
- Used Sqoop to migrate data between HDFS and MySQL or Oracle, and deployed Hive-HBase integration to perform OLAP operations on HBase data.
- Designed, planned, and delivered a proof of concept and a business function/division-based implementation of a Big Data roadmap and strategy project.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Involved with a MapReduce-based converged data platform that was built with data movement in mind, including real-time data flows.
- Involved in exporting analyzed data to databases such as Teradata, MySQL, and Oracle using Sqoop, for visualization and report generation by the BI team.
- Worked on the Oozie scheduler to automate pipeline workflows and orchestrate the Sqoop, Hive, and Pig jobs that extract data in a timely manner (see the workflow submission sketch after this section).
- Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Ran periodic MapReduce jobs to load data from Cassandra into Hadoop.
- Created Hive tables per requirements as internal or external tables, defined with appropriate static and dynamic partitions for efficiency (see the DDL sketch after this list).
- Experience analyzing Cassandra and comparing it with other open-source NoSQL databases to determine which best suits the current requirements.
- Transformed the data using Hive and Pig for the BI team to perform visual analytics, according to client requirements.
- Developed scripts and automated end-to-end data management and synchronization between all the clusters.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
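A minimal sketch of the external, partitioned Hive table pattern described above, with a dynamic-partition load from a staging table; the database, table, column, and path names (sales.orders_ext, sales.orders_staging, /data/raw/sales/orders) are illustrative only:

```bash
#!/usr/bin/env bash
set -euo pipefail

hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders_ext (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_date STRING)
STORED AS PARQUET
LOCATION '/data/raw/sales/orders';

-- load from a staging table, letting Hive create partitions dynamically
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE sales.orders_ext PARTITION (order_date)
SELECT order_id, customer_id, amount, order_date FROM sales.orders_staging;
"
```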
Environment: Cloudera CDH 3/4 Distribution, HDFS, MapReduce, Cassandra, Hive, Oozie, Pig, Shell Scripting, MySQL
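A sketch of submitting one of the Oozie-orchestrated workflows mentioned in this section; the Oozie URL, HDFS application path, and property values are placeholders, and the workflow.xml defining the Sqoop/Hive/Pig actions is assumed to be deployed under the application path:

```bash
#!/usr/bin/env bash
set -euo pipefail

APP_PATH=hdfs://nameservice1/user/etl_user/apps/daily-ingest   # placeholder workflow location

# Minimal job.properties for the workflow run
cat > job.properties <<EOF
nameNode=hdfs://nameservice1
jobTracker=yarnRM:8032
queueName=default
oozie.wf.application.path=${APP_PATH}
EOF

oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
oozie jobs -oozie http://oozie-host:11000/oozie -jobtype wf -len 10   # check recent workflow runs
```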
Sr. Hadoop Administrator
Confidential, New York, NY
Responsibilities:
- Strong working experience with open-source technologies.
- Stored unstructured and semi-structured data in HDFS using HBase (see the HBase shell sketch after this list).
- Used change management and incident management processes following company standards.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Demonstrated live proof-of-concept demos to clients.
- Supported technical team members in the management and review of Hadoop log files and data backups.
- Suggested process improvements for all automation scripts and tasks.
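A minimal sketch of storing semi-structured records in HBase via the hbase shell, as referenced above; the table name, column family, row-key design, and sample payload are illustrative only:

```bash
#!/usr/bin/env bash
set -euo pipefail

hbase shell <<'EOF'
create 'events', {NAME => 'raw', VERSIONS => 1}
put 'events', 'device42|2016-01-15T10:00:00Z', 'raw:payload', '{"temp": 21.4, "status": "ok"}'
get 'events', 'device42|2016-01-15T10:00:00Z'
scan 'events', {LIMIT => 5}
exit
EOF
```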
Environment: Hadoop, MapReduce, Hive, Pig, Oozie, HBase, Sqoop, Flume, Java, MySQL, Eclipse, UNIX shell scripting
Sr. Consultant
Confidential, Los Angeles, CA
Responsibilities:
- Prepared test packs, release notes, deployment checklists
- Maintained Siebel Server components and parameters.
- Deployed new releases to different environments.
- Remote administration: managed and monitored all the Remote components.
- Attended P1/P2 calls to resolve issues.
Environment: Siebel 8.x, Oracle, AIX, Harvest, OBIEE
IT Consultant
Confidential
Responsibilities:
- Performed Siebel upgrades and patch installations.
- Analyzed and resolved production incident tickets while adhering to SLAs.
- Prepared deployment plans for major upgrade releases
- Monitored components and server utilization, and troubleshot server issues.
Environment: Siebel 8.x, Oracle, Linux
IT Consultant
Confidential
Responsibilities:
- Compiled the SRF in both English and Spanish.
- Responsible for repository migrations.
- Siebel installations, upgrades, patch application, and repository migrations.
- Performed health checks on the production servers.
Environment: Siebel 8.x, Oracle, Linux
Technical Associate
Confidential
Responsibilities:
- Responsibilities undertaken as Siebel server administrator.
- Installation and configuration of third-party products.
- Non-Repository Migrations to various testing environments and troubleshooting Migration Issues
- Setup and maintenance of offshore developers' data (data extracts and synchronization).
- User creation at the application and database levels.
Environment: Siebel 7.x, UNIX, Solaris servers, Windows 2000, Oracle 10g
Software Engineer
Confidential
Responsibilities:
- Involved in maintenance/enhancement of the Confidential.
- Worked on tailoring Linux to suit the hardware requirements of Confidential thin clients.
- Involved in bug fixing and improvement of the overall Linux product.
- Fixed problems arising due to the window manager issues.
Environment: Linux, GTK, C++, Java, GNU C Compiler