Hadoop Admin / Developer Resume
Tarrytown, NY
SUMMARY:
- 7+ years of professional IT experience, including 4+ years of hands-on Hadoop experience with Cloudera and Hortonworks distributions; working environment includes MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, Cassandra and Flume.
- 2+ years of Java programming experience in developing web-based applications and client-server technologies.
- Hands-on experience in installing, configuring and using Hadoop ecosystem components such as HDFS, MapReduce, YARN, ZooKeeper, Sqoop, Flume, Hive, HBase, Pig, Oozie, Solr, Spark, Spark Streaming, Storm, Kafka, Cassandra, Impala, Snappy, Greenplum and MongoDB.
- In-depth understanding of MapReduce and AWS cloud concepts and their role in analyzing large and complex datasets.
- In-depth knowledge of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce and Spark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Experience in database design using PL/SQL to write stored procedures, functions and triggers, and strong experience writing complex queries for Oracle.
- Good knowledge of NoSQL databases Cassandra and HBase.
- Hands-on experience writing ad-hoc queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
- Working experience in importing and exporting data using Sqoop from Relational Database Systems (RDBMS) to HDFS.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs).
- Developed Pig Latin scripts for data cleansing and transformation.
- Experience importing and exporting data between relational databases such as MySQL and Oracle and HDFS using Sqoop.
- Good knowledge of Apache Hadoop cluster planning, including choosing the hardware and operating systems to host an Apache Hadoop cluster.
- Experience in benchmarking, backup and disaster recovery of NameNode metadata.
- Experience in performing minor and major upgrades of Hadoop clusters.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Expert in designing and building data ingestion pipelines using technologies such as Spring Integration, Apache Storm and Kafka.
- Experience working with popular frameworks such as Spring MVC and Hibernate.
- Experience with Nagios and writing plugins for Nagios to monitor Hadoop clusters.
- Experience as a system administrator on Linux (CentOS, Ubuntu, Red Hat).
- Strong problem-solving and analytical skills, with the ability to make balanced and independent decisions.
- Good team player with strong interpersonal, organizational and communication skills, combined with self-motivation, initiative and project management abilities.
TECHNICAL SKILLS:
Big Data Eco System: HDFS, MapReduce, YARN, Hadoop Streaming, ZooKeeper, Oozie, Sqoop, Hive, Pig, HBase, Spark, Flume.
NoSQL: HBase, Cassandra, MongoDB
Languages: Java/ J2EE, SQL, Shell Scripting, Python
Web Technologies: HTML, JavaScript, CSS, XML, Servlets.
Web/ Application Server: Apache Tomcat Server, LDAP, JBOSS, IIS
Operating system: Windows, Linux and Unix
Frameworks: Spring, Spring MVC, Hibernate
DBMS / RDBMS: Oracle 11g/10g/9i, SQL Server 2012/2008, MySQL
Version Control: SVN, CVS, and GIT
PROFESSIONAL EXPERIENCE:
Confidential, Tarrytown, NY
Hadoop Admin / Developer
Responsibilities:- Installed, configured and maintained Hadoop clusters for application development, along with Hadoop ecosystem tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (see the sketch after this list).
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Imported and exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
- Developed Hive queries for analysis across different banners.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and then into partitioned Hive tables.
- Developed Hive UDFs to bring all customer-related information into a structured format (see the UDF sketch after this list).
- Experience in setting up Hadoop clusters on cloud platforms like AWS.
- Experience in upgrading Hadoop cluster HBase/ZooKeeper from CDH3 to CDH4.
- Developed Bash scripts to pull TLog files from the FTP server and process them for loading into Hive tables.
- Used Flume to handle streaming data and loaded the data into the Hadoop cluster.
- Scheduled all the Bash scripts using the Resource Manager scheduler.
- Developed complex queries using Hive and Impala.
- Moved data from HDFS to Cassandra using MapReduce and BulkOutputFormat class.
- Developed MapReduce programs for applying business rules on the data.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Developed and executed Hive queries for denormalizing the data.
- Developed an Apache Storm, Kafka and HDFS integration project to perform real-time data analysis.
- Responsible for executing Hive queries using the Hive command line, the Hue web GUI and Impala to read, write and query data in HBase.
- Worked with different file formats such as JSON, Avro and Parquet, and compression techniques such as Snappy.
- Responsible for building scalable distributed data solutions using Hadoop cluster environment with Hortonworks distribution
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
- Designed and built data ingestion pipelines using technologies such as Spring Integration, Apache Storm and Kafka.
- Supported Data Analysts in running MapReduce Programs.
- Developed MapReduce ETL in Java and Pig.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system
- Involved in joining and data aggregation using Apache Crunch
- Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats.
- Involved in writing, testing, and running MapReduce pipelines using Apache Crunch
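An illustrative sketch of the kind of MapReduce-to-Spark migration referenced in the bullets above. It is written against the Spark Java API for consistency with the other examples in this document (the project work itself was in Scala), and the application, database, table and column names are hypothetical placeholders rather than project specifics:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DailySalesRollup {
        public static void main(String[] args) {
            // Hive support lets Spark SQL read the partitioned Hive tables directly.
            SparkSession spark = SparkSession.builder()
                    .appName("DailySalesRollup")
                    .enableHiveSupport()
                    .getOrCreate();

            // Equivalent of the old map/reduce aggregation: group transactions by banner and day.
            Dataset<Row> rollup = spark.sql(
                    "SELECT banner, txn_date, SUM(amount) AS total_amount "
                  + "FROM sales.transactions GROUP BY banner, txn_date");

            // Persist the result as a Hive table for the downstream BI reports.
            rollup.write().mode("overwrite").saveAsTable("sales.daily_rollup");
            spark.stop();
        }
    }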
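And a minimal sketch of a Hive UDF of the kind mentioned above for structuring customer data; the class name and masking behavior are invented for illustration only:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Registered in Hive with: CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmailUDF';
    public class MaskEmailUDF extends UDF {
        // Hive calls evaluate once per row; the local part of the email address is masked.
        public Text evaluate(Text email) {
            if (email == null) return null;
            String value = email.toString();
            int at = value.indexOf('@');
            if (at <= 0) return email;          // not a well-formed address, pass through unchanged
            return new Text("****" + value.substring(at));
        }
    }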
Environment: Apache Hadoop, HBase, Hive, Pig, Sqoop, ZooKeeper, Hortonworks, NoSQL, Storm, MapReduce, Cloudera, HDFS, Scala, Impala, Flume, MySQL, JDK 1.6, J2EE, JDBC, JSP, Struts 2.0, Spring 2.0, Hibernate, Python, WebLogic, Spark.
Confidential, San Jose, CA
Hadoop Admin / Developer
Responsibilities:- Imported data from different relational data sources, such as Teradata, into HDFS using Sqoop.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Imported bulk data into HBase using MapReduce programs.
- Performed analytics on time-series data stored in HBase using the HBase API (see the sketch after this list).
- Designed and implemented Incremental Imports into Hive tables.
- Used the REST API to access HBase data for analytics.
- Worked in Loading and transforming large sets of structured, semi structured and unstructured data
- Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop
- Worked in the administration activities in providing installation, upgrades, patching and configuration for all hardware and software Hadoop components
- Configured and implemented Apache Hadoop technologies including HDFS, the MapReduce framework, Pig, Hive, Sqoop and Flume.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Imported and exported data between NoSQL and HDFS
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance and capacity planning using Apache Hadoop.
- Experienced in managing and reviewing the Hadoop log files.
- Migrated ETL jobs to Pig scripts to perform transformations, joins and pre-aggregations before storing the data in HDFS.
- Worked with Avro Data Serialization system to work with JSON data formats.
- Worked on different file formats like Sequence files, XML files and Map files using MapReduce Programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Worked on Oozie workflow engine for job scheduling.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
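A hedged sketch of the HBase time-series analytics mentioned above, shown with the current Connection/Table Java client rather than the CDH3-era HTable API; the table name, column family and row-key layout are assumptions for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SensorReadingsScan {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("sensor_readings"))) {

                // Row keys are assumed to be "<sensorId>#<epochMillis>", so a prefix scan
                // returns one sensor's readings in time order.
                Scan scan = new Scan().setRowPrefixFilter(Bytes.toBytes("sensor42#"));
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result row : scanner) {
                        byte[] value = row.getValue(Bytes.toBytes("d"), Bytes.toBytes("reading"));
                        System.out.println(Bytes.toString(row.getRow()) + " -> " + Bytes.toDouble(value));
                    }
                }
            }
        }
    }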
Environment: Teradata, Hadoop, HDFS, Kerberos, HBase, MapReduce, Hive, Oozie, Sqoop, Pig, Flume, Ganglia, Nagios, Java, Avro, JSON, Rest API, NoSQL, CDH3, CDH4.
Confidential, Gaithersburg, MD
Hadoop Admin / Developer
Responsibilities:- Imported data from relational data stores into Hadoop using Sqoop.
- Created various MapReduce jobs for performing ETL transformations on transactional and application-specific data sources.
- Wrote and executed big data analyses with Pig scripts in the Grunt shell, including user-defined functions (UDFs).
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka (see the producer sketch after this list).
- Used Impala to read, write and query Hadoop data in HDFS, HBase or Cassandra, and configured Kafka to read and write messages from external programs.
- Created Reports and Dashboards using structured and unstructured data.
- Performed joins, group-bys and other operations in MapReduce using Java and Pig.
- Processed the output from Pig and Hive and formatted it before sending it to the Hadoop output file.
- Used Hive table definitions to map the output files to tables.
- Setup and benchmarked Hadoop/HBase clusters for internal use.
- Wrote data ingesters and MapReduce programs.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Wrote MapReduce/HBase jobs.
- Worked with the HBase NoSQL database.
- Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows. Involved in unit testing and delivered unit test plans and results documents.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Implemented Storm topologies to perform cleansing operations before moving data into Cassandra.
- Performed Sqoop transfers through the HBase tables to process data into several NoSQL databases: Cassandra and MongoDB.
- Created tables, secondary indexes, join indexes and views in the Teradata development environment for testing.
- Captured web server logs into HDFS using Flume for analysis.
- Assisted in cluster maintenance, cluster monitoring and troubleshooting, and managed and reviewed data backups and log files.
- Analyzed the SQL scripts and designed the solution for implementation in Scala.
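A rough sketch of the kind of Kafka producer used to feed the HDFS/Cassandra pipelines described above; the broker address, topic name, key and payload are hypothetical:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ClickstreamProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Keying by user id keeps one user's events in the same partition, preserving order.
                producer.send(new ProducerRecord<>("clickstream", "user-123",
                        "{\"page\":\"/home\",\"ts\":1416182400000}"));
                producer.flush();
            }
        }
    }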
Environment: Hadoop, Java, UNIX, Shell Scripting, XML, XSLT, HDFS, HBase, Cassandra, NoSQL, MapReduce, Hive, Pig.
Confidential
Java Developer
Responsibilities:- Designed Java Servlets and Objects using J2EE standards.
- Developed multithreaded processing to improve CPU utilization.
- Used multithreading to process tables concurrently, moving on as each user's data completed in one table (see the sketch after this list).
- Used JDBC calls in the Enterprise Java Beans to access Oracle Database.
- Involved in the design and development of rich internet applications using Flex, ActionScript and Java.
- Designed and developed web pages using HTML 4.0 and CSS, including Ajax controls and XML.
- Worked closely with Photoshop designers to implement mock-ups and the layouts of the application.
- Played a vital role in defining, implementing and enforcing quality practices in the team and organization to ensure internal controls, quality and compliance policies and standards.
- Used JavaScript 1.5 for custom client-side validation.
- Involved in designing and developing the GUI for the user interface with various controls.
- Worked with View State to maintain data between the pages of the application.
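An illustrative sketch of the table-per-thread processing mentioned above; the thread-pool size, table names and the body of processTable are placeholders, not the original application code:

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class TableProcessor {
        public static void main(String[] args) throws InterruptedException {
            List<String> tables = Arrays.asList("ORDERS", "CUSTOMERS", "PAYMENTS");
            ExecutorService pool = Executors.newFixedThreadPool(4);

            // One task per table so the tables are processed concurrently.
            for (String table : tables) {
                pool.submit(() -> processTable(table));
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }

        private static void processTable(String table) {
            // Placeholder for the per-table JDBC work done in the real application.
            System.out.println("Processing " + table + " on " + Thread.currentThread().getName());
        }
    }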
Environment: Core Java, JavaBeans, HTML 4.0, CSS 2.0, PL/SQL, MySQL 5.1, JavaScript 1.5, Flex, AJAX and Windows
Confidential
Linux Administrator
Responsibilities:- Creating and cloning Linux virtual machines.
- Administration, package installation, configuration of Oracle Enterprise Linux 5.x.
- Managing systems routine backup, scheduling jobs like disabling and enabling cron jobs, enabling system logging, network logging of servers for maintenance, performance tuning, testing.
- Tech and non-tech refresh of Linux servers, which includes new hardware, OS, upgrade, application installation, testing.
- Administration of RHEL, which includes installation, testing, tuning, upgrading and loading patches, troubleshooting both physical and virtual server issues.
- Setting up user and group login IDs, printing parameters, network configuration and passwords, resolving permission issues, and managing user and group quotas.
- Creating physical volumes, volume groups, and logical volumes.
- Gathering requirements from customers and business partners and designing, implementing and providing solutions for building the environment; installing Red Hat Linux using Kickstart and applying security policies to harden the servers based on company policies.