- 7 years of overall IT experience and 3 years of comprehensive experience as an Apache Hadoop developer. Expertise in writing Hadoop jobs for analyzing data using Hive, Pig, and Oozie.
- Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of MapReduce concepts.
- Experience in working with MapReduce programs on Hadoop to process big data.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Working experience in designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie, Flume, and ZooKeeper.
- Experience in supporting data analysts in running Pig and Hive queries.
- Developed MapReduce programs to perform data analysis.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in writing shell scripts to dump sharded data from MySQL servers to HDFS.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Experience in setting up an InfiniBand network and building a Hadoop cluster to improve MapReduce performance.
- Experience in tuning Hadoop cluster performance by gathering and analyzing data on the existing infrastructure.
- Experience in automating Hadoop installation, configuration, and cluster maintenance using tools like Puppet.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Strong debugging and problem-solving skills with excellent understanding of system development methodologies, techniques, and tools.
- Worked across the complete Software Development Life Cycle (analysis, design, development, testing, implementation, and support) in different application domains, with technologies ranging from object-oriented technology to Internet programming on Windows NT, Linux, and UNIX/Solaris platforms, following RUP methodologies.
- Familiar with RDBMS concepts; worked on Oracle 8i/9i, SQL Server 7.0, and DB2 7.x/8.x.
- Wrote shell scripts and Ant scripts on UNIX for application deployments to the production region.
- Exceptional ability to quickly master new concepts; capable of working in a group as well as independently, with excellent communication skills.
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Oozie, ZooKeeper, YARN
Programming Languages: Java, SQL, PL/SQL, Unix Shell Scripting, and Perl
Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x, and JPA
Operating Systems: UNIX, Windows, Linux
Application Servers: IBM WebSphere, Tomcat, WebLogic
Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2, MySQL 4.x/5.x
Java IDE: Eclipse 3.x, IBM WebSphere Application Developer, IBM RAD 7.0
Tools: TOAD, SQL Developer, SoapUI, Ant, Maven, Visio, Rational Rose, JSON, and Angular
- Worked with the administrator to set up Hadoop clusters.
- Installed and configured Hadoop, MapReduce, and HDFS. Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Defined job flows using Oozie.
- Managed and reviewed Hadoop log files.
- Extracted files from Teradata through Sqoop, placed them in HDFS, and processed them.
- Ran Hadoop streaming jobs to process terabytes of XML data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Supported MapReduce programs running on the cluster.
- Loaded data from the UNIX file system into HDFS.
- Created Hive tables, loaded them with data, and wrote Hive queries, which run internally as MapReduce jobs.
- Gained strong business knowledge of banking and finance, fraud detection, and general ledger reporting.
Environment: Hadoop, MapReduce, HDFS, Hive, Java JDK 1.6, Cloudera, MapR, Teradata, Flat files, Oracle 11g, PL/SQL, Unix Shell Scripting.
- Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
- Developed Hive queries for analysis across different banners.
- Loaded data from sources such as Teradata and DB2 into HDFS using Sqoop, then into partitioned Hive tables.
- Developed Hive UDFs to bring all customer email IDs into a structured format.
- Developed Bash scripts to fetch the Tlog files from an FTP server and process them for loading into Hive tables.
- Scheduled all the Bash scripts using Resource Manager Scheduler.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Developed MapReduce programs to apply business rules to the data.
- Developed and executed Hive queries to denormalize the data.
- Supported data analysts in running MapReduce programs.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Worked on analyzing data with Hive and Pig.
- Implemented rack topology scripts for the Hadoop cluster.
- Managed the day-to-day operations of the cluster for backup and support.
- Designed and developed MapReduce/YARN programs.
- Analyzed the Hadoop cluster and various big data analytics tools, including Pig, HBase, and Sqoop.
- Analyzed user clickstream data through Google Analytics and GoodData.
- Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing.
- Developed a social media component to store data in Cassandra.
- Developed an analytical component using Scala.
- Provided cluster coordination services through ZooKeeper.
- Integrated Hadoop with BI tools such as GoodData and Tableau.
- Developed a Java RESTful API.
- Implemented Solr search.
- Set up and monitored the Hadoop development environment.
- Set up a Hadoop cluster on Amazon EC2.
- Refined data through Pig and Hive.
- Migrated data between RDBMS and Hadoop through Sqoop.
Environment: Java, Scala, Python, Play, J2EE, Ruby on Rails, Hadoop, Cloudera, CDH5, YARN, HBase, Cassandra, Hive, Sqoop, MySQL, Android, NDK, GitHub, MicroStrategy, Tableau, Linux, Mac, Android, Windows 7 32/64-bit.