Hadoop Administration Resume
SUMMARY:
- 6 years of overall experience across a variety of industries, including 2 years of hands-on experience with the Big Data and Hadoop technology stack and 4 years of extensive experience with Unix technology.
- Experience in dealing with Apache Hadoop components like HDFS, MapReduce, HIVE, HBase, PIG, SQOOP, OOZIE and Flume.
- Excellent understanding/knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce programming paradigm.
- Experience in managing and reviewing Hadoop log files.
- Hands-on experience in importing and exporting data using the Hadoop data management tool Sqoop.
- Good experience in installing, configuring and administering Hadoop clusters on major Hadoop distributions (Cloudera, Hortonworks, Apache).
- Good understanding and knowledge of NOSQL databases like HBase and Cassandra.
- Good exposure to Hadoop distributions such as CDH3 and CDH4.
- Good knowledge of daemons like Resource Manager, Node Manager in YARN architecture.
- Experience in configuring Zookeeper to coordinate the servers in clusters to maintain the data consistency.
- Good experience with Databases such as MySQL, Oracle 11G.
- Installed, administered, configured and deployed Red Hat Enterprise Linux 4.x/5.x/6.x/7.x, CentOS and Ubuntu operating systems.
- Installed, configured and administered VMware ESX 3.5, 4.1, 5.1, 5.5 and 6.0, and migrated existing servers into VMware infrastructure.
- Implementation and troubleshooting of Logical Volume Manager on RHEL/CentOS servers.
- Used Kickstart to automate Linux installations and build out Linux environments without manual intervention.
- Good knowledge of cloud computing strategies (IaaS, PaaS, SaaS).
- Involved in UNIX Architectural decisions & experience in designing, implementing and supporting UNIX Server technology solutions.
- Experience in Package Management using Red Hat RPM/YUM and Red Hat Satellite server.
- Kernel upgrades on Red Hat Linux and CentOS servers.
- Migration of ESX servers from one version to another and from one hardware platform to another.
- Managing VMware infrastructure and vSphere clusters in production and UAT/deployment environments.
- Extensive knowledge of vSphere/vCenter/vMotion operations in VMware environments.
- Experience in NIC bonding configuration on Linux and UNIX systems to increase bandwidth or provide redundancy, based on application requirements.
- Installation and configuration of multi-node VCS clusters, and administration of VCS clusters, including extending clustered file systems and performing failover and failback between nodes.
- Troubleshooting and recovery of VCS clusters after node or hardware failures.
- Fast learner with good interpersonal skills, having strong analytical and communication skills and interested in problem solving and troubleshooting.
- Self-motivated, excellent team player with a positive attitude who adheres to strict deadlines.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Cassandra, Spark, Storm, Kafka and YARN
RDBMS: Oracle, MySQL and DB2
Web/App. Server: Tomcat, WebLogic, WebSphere
Languages: SQL, Unix Shell Scripting
Virtualization/Cluster: VMware, VCS, EMC
Integration: Active Directory, LDAP, HMC, IVM, DRAC, HP ILO
Storage Array/NAS: IBM SVC, V7000, EMC CLARiiON, NetApp filers
Operating Systems: Windows, Red Hat Enterprise Linux 5.x, 6.x, 7.x, Unix, Solaris, HP-UX
Tools: IEDS, INNOTAS, ClearView, AOTS - Trouble Management, AOTS - Change Management, USET, SRTS, VPN, PuTTY, WinSCP, Remedy, Mercury
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Administration
Responsibilities:
- Coordinated with business users and analysts working on upstream and downstream applications, and planned development tasks using a Git version control repository.
- Responsible for data extraction and data integration from different data sources (CRH, IRH) into the Hadoop Data Lake by creating ETL pipelines using Spark, Pig and Hive.
- Implemented Spark scripts to import and export data between Cassandra, Teradata and Hadoop.
- Developed RDDs/DataFrames in Spark using Scala and applied several transformations to load data from the Hadoop Data Lake into Cassandra.
- Involved in converting Hive queries into Spark RDDs/DataFrames, applying transformations using Scala, and applying compression codecs such as Snappy and LZO to files.
- Developed HQL/Impala queries, Pig scripts, shell scripts and Spark programs in Scala for all ETL loading processes, converting files to compressed Parquet format in the Hadoop file system.
- Extensively involved in partitioning, bucketing and distributed cache, used different types of joins on Hive external tables, and implemented custom Hive UDFs and UDAFs in Java.
- Responsible for data ingestion from RDBMS to Hadoop using Sqoop; also performed data cleansing and transformations using the Piggybank API for further data analytics with Pig.
- Participated in daily Scrum meetings and reported on project development activity, following the Agile Scrum method to ensure effective solutions.
- Worked closely with the Business Intelligence team to generate Confidential using Tableau, based on business needs, by connecting to Hive external tables and Teradata.
Environment: Linux, Java, CDH, Spark, Scala, HDFS, Impala, MPP, Hive, HiveQL, Pig, Sqoop, Oozie, Perl Scripts, Git, Cassandra, Tableau, Agile, SVN and Autosys.
Confidential
Hadoop Developer
Responsibilities:
- Developed data pipelines using Sqoop, Pig and MapReduce (Python) to ingest customer behavior, insurance and claims data into HDFS for data analytics.
- Used Pig for transformations, event joins and pre-aggregations, with the Elephant Bird API, before loading JSON and Avro files onto HDFS.
- Implemented Hive UDFs and UDAFs in Java to simplify query logic and transform data flexibly. These functions were used in Impala queries to return scalar values.
- Created optimized HiveQL queries on partitioned external tables to process large data sets in a distributed manner and minimize container consumption on slave servers, using bucketing, distributed cache, compression and speculative execution.
- Developed Hive queries to handle change data capture, processing incremental records between newly arrived and existing data in Hive tables.
- Involved in creating HBase tables and loading large sets of unstructured and semi-structured data from upstream public data sets for data analysis.
- Developed Perl scripts to automate execution of all other scripts (Pig, Hive, Sqoop and MapReduce) and to move data files within and outside of the Hadoop Data Lake.
Environment: Linux, Java, Cloudera Manager, CDH, HDFS, MapReduce, Hive, HiveQL, Pig, HBase, MySQL, Python, Tableau, Perl Scripts, Oozie, Sqoop and SVN.
Confidential
Hadoop Admin
Responsibilities:
- Worked as Hadoop Admin in the Big Data team.
- Ran ad hoc queries using Pig Latin, Hive and MapReduce.
- Developed Pig Latin scripts to extract data from web server output files.
- Involved in Big data analysis using Pig and User defined functions (UDF).
- Managed and scheduled Jobs on a Hadoop cluster using Oozie.
- Involved in log file management: logs more than 7 days old were removed from the log folder, loaded into HDFS and retained for 3 months.
- Worked on importing and exporting data between Oracle and HDFS/Hive using Sqoop.
- Worked extensively on gathering requirements from business analysts and took part in all requirement clarification calls.
- Set up standards and processes for Hadoop based application design and implementation.
- Provided support to client applications in production and other environments.
- Created Hive external tables, loaded data into them and queried the data using HQL.
- Wrote shell scripts to automate routine day-to-day processes.
- Worked on tickets raised by real-time users, in continuous interaction with end users.
- Prepared the Technical Design Document, understanding document and test cases (UTCs and ITCs).
- Provided Technical & Functional support to the end users during UAT & Production.
- Continuous monitoring of application for 100% availability.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Supported the application on a rotational basis.
Environment: Hadoop, HDFS, VMware, Eclipse, PIG, Hive, HBase, Sqoop, Flume, Linux.
Confidential
Big Data Admin
Responsibilities:
- Gathered and streamed social media data using Flume agents, based on keyword searches for brands and products, and conducted qualitative and quantitative analysis of customer trends and insights.
- Involved in data movement onto HDFS using Sqoop and Flume.
- Implemented MapReduce jobs for data aggregation and aggregate computations.
- Used Pig scripts to perform various operations on subsets of the data.
- Provided output data to the report analytics team in the required formats.
Environment: Hadoop, HDFS, MapReduce, VMware, PIG, Hive, Sqoop, Flume, Linux.
Confidential
Hadoop Administrator
Responsibilities:
- Built all Linux virtual machines for the Hadoop cluster.
- Used JDBC for database connectivity.
- Implemented the Log4J logging component from Apache into the Application.
- Made builds and deployed them to the common development test environment, a WebSphere Application Server environment, to verify functional requirements.
- Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
- Used Pig for extraction, transformation and loading of structured data.
- Created Hive tables to store the processed results in a tabular format.
- Developed the UNIX shell scripts for creating the Confidential from Hive data.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data into HDFS using Sqoop.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
Environment: Linux, VMware, Sqoop, Hive, Pig, SQL and Oracle.
Confidential
Unix and VMware Administrator
Responsibilities:
- Review the requirements and suggest any changes or modifications to the Server Build Document (SBD).
- Build Linux systems as per the SBD and golden image, create a VM template for further replication of the same image, and build physical servers in the UCS environment as per the SBD.
- Migrate VMs from one cluster to another as required.
- Allocation of additional storage & compute resources at the VM level.
- Install and configure Linux OS and VCS as per the requirement.
- Create the respective file systems and configure LVM as per the SBD.
- Create users, groups, file system permissions and hosts file entries, and make kernel parameter changes.
- User, Group and Service account Creation. Set the password aging policies as per the standards.
- Create and mount the respective File Systems as per the request and make sure the FSTAB entries are persistent.
- Set permissions at the group level, file level and file system level.
- Mount all the required NFS Shares.
- Configure & assign the Production IP, Management IP & Backup IP for the newly built servers.
- Verify all DNS entries and validate the DNS server settings.
- Configure POSTFIX on all the server builds.
- Configure NTP and make sure the servers are in sync with the NTP master server.
- Provide sudo access to individual users and service accounts.
- Configure and enable SWAP and SSO on all the server builds.
- Make entries in the FSTAB file with all the local file systems as well as the NFS shares.
- Configure and add resources to the Zones.
- Manage system services using SMF (Service Management Facility).
- Update the kernel and set the KDUMP parameters as per the standards.
