Hadoop Admin/Architect Resume
Hoffman Estates, IL
Summary
- Over the past 8 years, my primary focus has been on Big Data Hadoop Engineering, Data Analytics, and Machine Learning. I have implemented Big Data solutions using Hadoop distributions from Cloudera (CDH) and Hortonworks (HDP) on VMware, Amazon AWS and Microsoft Azure cloud platforms.
- Hands-on experience with ecosystem tools such as Hive, Sqoop, Impala, HBase, Pig, Flume, Kafka, HDFS, Oozie, and MapReduce.
- To handle the need for more speed, I have gained knowledge of Apache Spark as a fast in-memory data processing engine (a minimal PySpark sketch follows this summary).
- Motivated to learn more, I have completed the IBM Data Science Professional Certification, with real-world experience creating prediction models on subjects such as cancer detection, economic trends, customer churn, and banking loan approvals.
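The kind of in-memory aggregation work referenced above could look roughly like the minimal PySpark sketch below; the file path and column names are hypothetical and only illustrate the pattern.

```python
# Minimal PySpark sketch (assumes pyspark is installed; the CSV path and
# column names are placeholders, not from an actual project).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("in-memory-aggregation-demo")
         .getOrCreate())

# Read a CSV into a DataFrame and cache it in memory for repeated queries.
df = spark.read.csv("transactions.csv", header=True, inferSchema=True)
df.cache()

# Example aggregation: transaction count and total amount per customer.
summary = (df.groupBy("customer_id")
             .agg(F.count("*").alias("txn_count"),
                  F.sum("amount").alias("total_amount")))
summary.show()

spark.stop()
```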
Work Experience
Hadoop Admin/Architect
Confidential, Hoffman Estates, IL
Responsibilities:
- Created new project solutions based on the company's technology direction and ensured that infrastructure services were provisioned to current standards.
- Upgraded the cluster from CDH 4.x to CDH 5.x.
- Implemented HA for the NameNode and Hue using Cloudera Manager.
- Created and configured the cluster monitoring services: Activity Monitor, Service Monitor, Reports Manager, Event Server, and Alert Publisher.
- Created cookbooks/playbooks and documentation for special tasks.
- Configured HAProxy for the Impala service.
- Wrote DistCp scripts for synchronizing data within and across clusters.
- Created snapshots for in-cluster backups of data.
- Created Sqoop scripts for ingesting data from transactional systems into Hadoop (see the Sqoop sketch below).
- Regularly used JIRA, ServiceNow, and other internal issue trackers for project development.
- Conducted technology evaluation sessions on Big Data, data governance, Hadoop, Amazon Web Services, Tableau, R, data analysis, statistical analysis, and data-driven business decisions.
- Integrated Tableau, Teradata, DB2, and Oracle with Hadoop via ODBC/JDBC drivers.
- Worked with application teams to install the operating system, Hadoop updates, patches, version upgrades as required.
- Created scripts to automate rebalancing data across the cluster using the HDFS Balancer utility (see the balancer sketch after this list).
- Created a POC implementing a streaming use case with the Kafka and HBase services.
- Created and maintained MySQL databases, set up users, and maintained database backups.
- Implemented Kerberos Security Authentication protocol for existing cluster.
- Integrated existing LLE and production clusters with LDAP.
- Implemented TLS for CDH Services and for Cloudera Manager.
- Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals, and testing HDFS and Hive access.
- Managed the backup and disaster recovery for Hadoop data. Coordinated root cause analysis efforts to minimize future system issues
- Served as lead technical infrastructure Architect and Big Data subject matter expert.
- Deployed Big Data solutions in the cloud. Built, configured, monitored and managed end to end Big Data applications on Amazon Web Services (AWS)
- Screened Hadoop cluster job performance and performed capacity planning.
- Spun up clusters in Azure using Cloudera Director; implemented this as a POC for the cloud migration project.
- Leveraged AWS cloud services such as EC2, auto-scaling and VPC to build secure, highly scalable and flexible systems that handled expected and unexpected load bursts
- Defined the migration strategy to move the application to the cloud. Developed architecture blueprints and detailed documentation. Created a bill of materials, including required cloud services (such as EMR, EC2, S3) and tools; scheduled cron jobs on EMR.
- Frequently created bash scripts based on project requirements.
- Worked on GCP cloud architecture design patterns.
- Guided application teams on choosing the right file formats (Text, Avro, Parquet) and compression codecs (Snappy, bzip2, LZO) in Hadoop file systems.
- Improved communication between teams in the matrix environment which led to increase in number of simultaneous projects and average billable hours
- Substantially improved all areas of the software development life cycle for the company products, introducing frameworks, methodologies, reusable components and best practices to the team
- Implemented VPC, Auto Scaling, S3, EBS, ELB, CloudFormation templates, and CloudWatch services on AWS.
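The balancer automation mentioned above could look roughly like the following Python wrapper. It is only a sketch: it assumes the `hdfs` CLI is on the PATH and that the script runs as a user with HDFS superuser rights.

```python
#!/usr/bin/env python3
"""Illustrative sketch of an HDFS Balancer automation script."""
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

def run_balancer(threshold_pct: int = 10) -> int:
    """Run the HDFS Balancer until DataNode utilization is within
    threshold_pct of the cluster average; return the exit code."""
    cmd = ["hdfs", "balancer", "-threshold", str(threshold_pct)]
    logging.info("Starting balancer: %s", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    logging.info(result.stdout)
    if result.returncode != 0:
        logging.error("Balancer failed: %s", result.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_balancer())
```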
Environment: Over 1,500 nodes, approximately 5 PB of data, Cloudera Distribution of Hadoop (CDH) 5.5, HA NameNode, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Cloudera Navigator, Control-M, Oozie, Hue, White Elephant, Ganglia, Nagios, HBase, Cassandra, Kafka, Storm, Cobbler, Puppet
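The Sqoop ingestion scripts referenced in the list above followed this general shape. The sketch below is illustrative only: the JDBC URL, table names, user, password file, and HDFS paths are placeholders, and it assumes the `sqoop` CLI and the matching JDBC driver are installed.

```python
#!/usr/bin/env python3
"""Illustrative Sqoop ingestion wrapper (all connection details are placeholders)."""
import subprocess

JDBC_URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL"    # placeholder connection string
TABLES = ["ORDERS", "CUSTOMERS"]                     # placeholder table names

def sqoop_import(table: str) -> None:
    """Import one table from the source RDBMS into HDFS as Parquet."""
    cmd = [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl_user",                    # placeholder user
        "--password-file", "/user/etl/.db_password", # password kept in HDFS, not on the CLI
        "--table", table,
        "--target-dir", f"/data/raw/{table.lower()}",
        "--num-mappers", "4",
        "--as-parquetfile",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for t in TABLES:
        sqoop_import(t)
```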
Hadoop Architect / Administrator
Confidential, Norfolk, VA
Responsibilities:
- Built proof of concept Cloudera cluster running in AWS VPC based on client requirements.
- Architected, built, and deployed Development, QA and production clusters on Azure platform.
- Created Python programs to automate manual processes.
- Performed configuration management utilizing Chef and Ansible.
- Handled version control with Git and Bitbucket for software and documentation.
- Performed day-to-day cluster management and security utilizing TLS, Kerberos, Cloudera Manager, and Cloudera Distributed Hadoop.
- SSL (Secure Socket Layer) / TLS (Transport Layer Security) experience
- AD (Active Directory) integration to HDP / HDF experience
- Configured SSL (Secure Socket Layer) /TLS (Transport Layer Security) and AD (Active Directory) integration into Hadoop.
- Configured and optimized Kudu, HDFS, YARN, Sentry, Hue, Navigator, Impala, Spark on YARN, and Hive services to achieve business requirements in a secure cluster.
- Recommended architectural design and drove infrastructure requirements.
- Created dashboards and reports in Tableau after the integration of Apache Hive and Impala.
- Utilized Sqoop and Flume to export data into HDFS from relational databases and log files.
- Performed troubleshooting on SQL Server Integration Services (SSIS) and ETL packages.
- Scheduled Oozie workflows to generate monthly report files automatically.
- Created backup and recovery solutions using Cloudera Enterprise Backup and Disaster Recovery (BDR) and HDFS snapshots (see the snapshot sketch below).
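The snapshot portion of the backup solution can be sketched as a small nightly job. The directory list below is hypothetical, and the sketch assumes each directory was already made snapshottable with `hdfs dfsadmin -allowSnapshot`.

```python
#!/usr/bin/env python3
"""Sketch of a nightly HDFS snapshot job (paths are placeholders)."""
import subprocess
from datetime import datetime

SNAPSHOTTABLE_DIRS = ["/data/warehouse", "/data/raw"]   # placeholder paths

def create_snapshot(path: str) -> None:
    """Create a dated snapshot of one snapshottable directory."""
    name = "nightly-" + datetime.now().strftime("%Y%m%d")
    # Equivalent to: hdfs dfs -createSnapshot <path> <name>
    subprocess.run(["hdfs", "dfs", "-createSnapshot", path, name], check=True)

if __name__ == "__main__":
    for d in SNAPSHOTTABLE_DIRS:
        create_snapshot(d)
```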
Hadoop Administrator / Python Data Analysis
Confidential, New York, NY
Responsibilities:
- Responsible for Cloudera cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups & log files.
- Performed performance tuning on Apache Pig, Hive, and HBase to improve MapReduce job throughput.
- Designed and developed a real-time stream processing application using Kafka and Hive to perform streaming ETL and apply machine learning.
- Performed security configurations with Ranger, Kerberos, and HDFS commands.
- Utilized NoSQL HBase tables to store Internet of Things (IoT) device information.
- Extracted datasets from Excel and RDBMS databases and performed data cleaning, DataFrame manipulation, and summarization using Python (see the Python sketch after this list).
- Built machine learning regression models and data pipelines using Python libraries.
- Researched, designed and prototyped robust and scalable models based on machine learning, data mining, and statistical modeling to answer key business problems
- Worked with development teams & business groups to ensure models can be implemented as part of a delivered solution replicable across departments.
- Converted datasets into actionable modeling inputs to predict and/or analyze habits, budgets, population segmentation, and population classification.
- Utilized Jupyter Notebooks, RStudio IDE, Apache Zeppelin for developing in Python and creating predictive models and visualization.
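An illustrative sketch of the pandas cleaning and regression modeling workflow described above; the file name, column names, and target are hypothetical and stand in for the real datasets.

```python
# Sketch of a pandas cleaning + scikit-learn regression workflow
# (file and column names are placeholders).
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load an extract from Excel and do basic cleaning / DataFrame manipulation.
df = pd.read_excel("loan_extract.xlsx")           # placeholder file
df = df.dropna(subset=["income", "loan_amount"])  # drop rows missing key fields
df["income"] = df["income"].clip(lower=0)         # remove negative data-entry errors

# Simple regression model on two numeric features.
X = df[["income", "credit_score"]]
y = df["loan_amount"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
```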
Big Data Engineer
Confidential, Atlanta, GA
Responsibilities:
- Monitored and managed all Big Data ecosystem services and ensured high availability of services along with performance tuning.
- Utilized RedHat Satellite for infrastructure management to handle security, patching, and compliance-related issues.
- Performed administrative activities for HDFS, YARN, MapReduce, Sqoop, Hive, HBase, Flume, ZooKeeper, Oozie, and Spark.
- Utilized tools such as Puppet for application and configuration management.
- Performed data analytics utilizing both Apache Hive and Python programming.
- Performed day-to-day management of memory, CPU, disk space, and log rotations with the use of UNIX and Python programming.
- Worked with the Infrastructure support team to install operating system and version upgrades as required.
- Monitored multiple Hadoop cluster environments, including workload, job performance, and capacity planning, using Cloudera Manager.
- Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.
- Developed data migration processes across Hadoop clusters using DistCp (see the DistCp sketch below).
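The DistCp-based migration step can be sketched as below. Cluster hostnames and paths are placeholders, and the sketch assumes a valid Kerberos ticket is already held where authentication is required.

```python
#!/usr/bin/env python3
"""Sketch of a cross-cluster migration step using DistCp (hosts/paths are placeholders)."""
import subprocess

SRC = "hdfs://old-cluster-nn:8020/data/warehouse"   # placeholder source NameNode
DST = "hdfs://new-cluster-nn:8020/data/warehouse"   # placeholder target NameNode

# -update copies only files that differ; -pugp preserves user, group, and permissions.
cmd = ["hadoop", "distcp", "-update", "-pugp", SRC, DST]
subprocess.run(cmd, check=True)
```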
UNIX System Engineer
Confidential, Sacramento, CA
Responsibilities:
- Performed file system tuning and growth using Veritas File System (VxFS); coordinated with the SAN team for storage allocation and Dynamic Multi-Pathing (DMP).
- Patched Veritas NetBackup media/master servers set up as a two-node Veritas Cluster running on physical Red Hat Linux, performing OS and security patch upgrades and troubleshooting cluster-related issues while testing failover of service groups between nodes.
- Worked on volume management, disk management, and software RAID solutions using Veritas Volume Manager and Solaris Volume Manager.
- Decommissioned old Unix servers (Linux, AIX, and Solaris) and tracked decommissioned and new servers using an inventory list; migrated local to SAN boot disks on production servers.
- Delivered project applications developed in Shell, Perl, Java, Oracle, SQL, webMethods, and Business Objects.