Hadoop Developer/Admin Resume
Atlanta, GA
EXPERIENCE SUMMARY:
- 8 years of experience in software development life cycle design, development and support of systems application architecture.
- 4+ years of Big Data Hadoop ecosystem experience in ingestion, storage, querying, processing and analysis of big data.
- Experience in working with Hadoop clusters using Cloudera (CDH5) distributions.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce concepts.
- Experience in architecting, designing, installation, configuration and management of Apache Hadoop Clusters & Cloudera Hadoop Distribution.
- In-depth knowledge and hands-on experience in installing, configuring, monitoring and integrating Hadoop ecosystem components (HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Oozie).
- Designing and creating Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets.
- Exposure to Spark, Kafka and Scala programming.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie and ZooKeeper, NoSQL databases such as HBase and Cassandra, and administrative tasks such as installing Hadoop and its ecosystem components (Flume, Oozie, Hive and Pig) and commissioning and decommissioning nodes.
- Experience in securing the cluster using Kerberos authentication, Sentry and Cloudera Navigator.
- Experience in designing, installing and configuring the complete Hadoop ecosystem (components such as Pig, Hive, Oozie, HBase, Flume, ZooKeeper).
- Providing security for Hadoop clusters with Kerberos, Active Directory/LDAP and TLS/SSL, and dynamic tuning to keep the cluster available and efficient.
- Experience in managing the cluster resources by implementing fair scheduler and capacity scheduler.
- Experience in developing and scheduling ETL workflows in Hadoop using Oozie.
- Experience with tools like Chef to automate Hadoop installation, configuration and monitoring.
- Worked on Firewall implementation & Load balancer between various Windows servers.
TECHNICAL SKILLS:
Bigdata: Apache Hadoop, Cloudera, HBase, Hive, MapReduce, Zookeeper, Oozie, Sqoop, Flume, Pig, Solr and Hortonworks
Programming Languages: Core Java, J2EE, Scala, XML, DB2, CICS, SQL, PL/SQL, HiveQL, Pig Latin
Operating Systems: RedHat Linux, CentOS, Windows
Databases: IBM DB2, Oracle, SQL Server 2000/2005/2008, MySQL
Networking and Protocols: TCP/IP, HTTP, FTP, SNMP, LDAP, DNS
Application Servers: Apache HTTP web server, WebSphere, WebLogic
Tools: Nagios, Ganglia, Chef
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Hadoop Developer/Admin
Responsibilities:
- Responsible for architecting and proposing solutions for Big data using AWS services.
- Analyzed and wrote Hadoop MapReduce/Spark jobs using the Java API, Pig and Hive.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Installed and configured EMR clusters with auto scaling to process large volumes of machine data.
- Involved in loading data from edge node to HDFS using shell scripting.
- Used EMR to process data from AWS S3 Buckets using Spark.
- Worked on cluster installation, commissioning and decommissioning of data nodes, name node high availability, capacity planning and slot configuration.
- Worked on a POC to process large data sets on AWS EMR Hadoop clusters.
- Used DynamoDB as a NoSQL platform for storing time-series data.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
- Implemented test scripts to support test driven development and continuous integration.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Responsible for the design and implementation of cross-region replication for data movement using Lambda functions.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Formulated ETL processes (using Talend) to extract data.
- Developed ETL mappings and workflows using Talend for source system data extraction and data transformations.
- Experience in managing and reviewing Hadoop log files.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Worked on Python scripts to analyze the data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Supported setup of the QA environment and updated configurations for implementing scripts with Pig and Sqoop.
Confidential, Chicago, IL
Hadoop Developer/Admin
Responsibilities:
- Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
- Developed Pig Latin scripts to analyze large data sets in areas where extensive coding needed to be reduced.
- Installed and configured Hive and implemented various business requirements by writing Hive UDFs.
- Used Sqoop tool to extract data from a relational database into Hadoop.
- Worked closely with data warehouse architect and business intelligence analyst to develop solutions.
- Responsible for performing peer code reviews, troubleshooting issues and maintaining status report.
- Involved in creating Hive tables, loading them with data and writing Hive queries, which invoke and run MapReduce jobs in the backend.
- Installed and configured Hadoop cluster in DEV, QA and Production environments.
- Recommended adopting Elasticsearch and was responsible for its installation, configuration and administration.
- Performed upgrade to the existing Hadoop clusters.
- Enabled Kerberos for Hadoop cluster authentication and integrated it with Active Directory for managing users and application groups.
- Implemented commissioning and decommissioning of nodes on the existing cluster.
- Worked with systems engineering team for planning new Hadoop environment deployments, expansion of existing Hadoop clusters.
- Responsible for data ingestions using Talend.
- Designed and presented a plan for a POC on Impala.
- Migrated HiveQL queries to Impala to minimize query response time.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Worked with application teams to install OS level updates, patches and version upgrades required for Hadoop cluster environments.
- Supported setup of the QA environment and updated configurations for implementing scripts with Pig, Hive and Sqoop.
Hadoop Administrator/Developer
Confidential, IL
Responsibilities:
- Developed data pipeline using Kafka, Flume, Sqoop, Pig and Spark to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed the Turbo cow framework to filter raw data and enrich it with business rules.
- The framework provides actions such as lookups, replacing nulls with zero, and simple copy for further use.
- Used Sqoop to bring data from Teradata into HDFS for lookups against dimensional data.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting. Developed Hive UDFs for functionality not available out of the box in Apache Hive.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS. Designed and implemented various metrics that can statistically signify the success of the experiment.
- Used IntelliJ to build the application.
- Hands-on experience with Cloudera Hadoop clusters, versions 5.5 and 5.4.
- Involved in extracting the customer’s big data from various data sources into Hadoop HDFS, including data from mainframes and databases as well as server log data.
- Responsible for managing the entire Hive warehouse.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
- Created Hive tables, loaded data and wrote Hive queries that run as MapReduce jobs.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Involved in ingesting streaming data in JSON format from the Kafka cluster, and in importing Teradata tables periodically as batch jobs.
- Maintained Flume agents that ingest the ALS event data stream from Kafka into HDFS in compressed form for batch processing with Spark, and that also stream raw data to Spark Streaming.
- Used Sqoop to import tables and data from Teradata to HDFS periodically.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Worked on Impala performance tuning with different workloads and file formats.
Environment: Hadoop, Spark, Yarn, Hive, Pig, HBase, Oozie, Sqoop, Flume, Kafka, Cloudera 5.5, Impala, Tableau, Eclipse, IntelliJ.
Linux/Hadoop Admin
Confidential, OH
Responsibilities:
- Involved in the end-to-end Hadoop cluster setup process, including installation, configuration and monitoring of the Hortonworks Hadoop cluster.
- Implemented Kerberos and AD integration for authentication purposes.
- Installed and configured Sentry and Cloudera Navigator for authorization purposes.
- Responsible for cluster maintenance and monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Monitoring systems & services, architecture design & implementation of Hadoop deployment, configuration management, backup and disaster recovery systems and procedures.
- Installed various Hadoop ecosystem components and Hadoop daemons.
- Responsible for installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
- Implemented the Hadoop stack and various big data analytics tools, and migrated data from different databases to Hadoop.
- Installed and configured Spark and fine-tuned applications for Apache Spark.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning.
- Expertise in recommending hardware configuration for Hadoop cluster.
- Installing, upgrading and managing Hadoop clusters on the Cloudera distribution.
- Troubleshooting cloud-related issues such as data nodes going down, network failures and missing data blocks.
- Managing and reviewing Hadoop and HBase log files.
- Experience with Unix or Linux, including shell scripting.
- Loading data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and then into partitioned Hive tables.
- Developed bash scripts to fetch the Tlog files from the FTP server and process them for loading into Hive tables.
- Built an automated setup for cluster monitoring and the issue escalation process.
Linux Administrator
Confidential
Responsibilities:
- Installing, configuring and upgrading Linux (primarily Red Hat and Ubuntu) and Windows servers.
- Installing and partitioning disk drives. Creating, mounting and maintaining file systems to ensure access to system, application and user data.
- Checking file system consistency using fsck and other utilities.
- Coordinating with customers' vendors on system upgrades and providing the exact procedure to follow.
- Scheduling the daily/weekly/monthly backups.
- Patching up the system to the latest version as per the recommendations.
- Monitor the health of the servers, Operating system, database and the network.
- Maintenance of Hard disks (Formatting and Setup, Repair from crashes).
- Installing & administering NFS services using automounter.
- Regular disk management such as adding or replacing hard drives on existing servers and workstations, and partitioning.
- Creating new file systems or growing existing ones on the hard drives according to requirements, and managing file systems.
- Creating users, assigning groups and home directories, setting quota and permissions; administering file systems and recognizing file access problems.
- Maintaining appropriate file and system security, monitoring and controlling system access, changing permissions and ownership of files and directories, maintaining passwords, assigning special privileges to selected users and controlling file access, monitoring process status to increase system efficiency, and scheduling system-related and cron jobs.
- Built and maintained Windows 2000 and 2005 servers.
- Managed Windows servers, configured the domain and Active Directory; created users and groups; assigned permissions to the groups on the folders.
- Setup LANs and WLANs; Troubleshoot network problems. Configure routers and access points.
- Worked on migrating from Windows 2000 to 2005.
- Allocated disk space using EMC disk storage.
Environment: Red Hat Linux 3.9/4.5-4.7, Windows 2000/NT 4.0, Apache 1.2, 1.3.36, 2.0, IIS 4.0, Oracle 8i, bash shell, Samba, DNS, PuTTY, WinSCP.
Linux Admin
Confidential
Responsibilities:
- Administration of RHEL, which includes installation, testing, tuning, upgrading and loading patches, troubleshooting both physical and virtual server issues.
- Creating and cloning Linux virtual machines.
- Installing Red Hat Linux using Kickstart and applying security policies to harden the server based on company policies.
- RPM and YUM package installations, patch and other server management.
- Managing routine system backups, scheduling jobs (including enabling and disabling cron jobs), and enabling system and network logging on servers for maintenance, performance tuning and testing.
- Tech and non-tech refresh of Linux servers, which includes new hardware, OS, upgrade, application installation, testing.
- Setting up user and group login IDs, printing parameters, network configuration, passwords, and user and group quotas, and resolving permissions issues.
- Creating physical volumes, volume groups, logical volumes.
- Gathering requirements from customers and business partners, and designing, implementing and providing solutions for building the environment.
- Installing and configuring Apache and supporting it on Linux production servers.
- Troubleshooting Linux network and security-related issues and capturing packets, using tools such as iptables, firewall, TCP Wrappers and Nmap.
