- Over 7 years of professional IT experience, including 4+ years in Big Data ecosystem technologies. Expertise in Big Data technologies as a consultant, with proven capability in project-based teamwork as well as individual development, and good communication skills.
- Excellent understanding/knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in working with Hadoop clusters using AWS EMR, Cloudera (CDH5), MapR, and Hortonworks distributions.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce (MR), HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
- Hands-on development and implementation experience in a Big Data Management Platform (BMP) using HDFS, MapReduce, Hive, Pig, Oozie, Apache Kite, and other Hadoop-related ecosystem components as data storage and retrieval systems.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Extended Hive and Pig core functionality by writing UDFs.
- Good experience installing, configuring, and testing Hadoop ecosystem components.
- Highly knowledgeable in the WritableComparable and Writable interfaces, the Mapper and Reducer classes, and Hadoop data types such as IntWritable, ByteWritable, and Text.
- Well experienced with the Mapper, Reducer, Combiner, Partitioner, and shuffle-and-sort phases, along with custom partitioning for efficient bucketing (an illustrative sketch follows this summary).
- Good experience in writing Pig and Hive UDFs that serve as utility classes.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Experience in installation, configuration, support, and management of Cloudera's Hadoop platform, including CDH4 and CDH5 clusters.
- Hands-on experience in Agile and Scrum methodologies.
- Extensive experience working with customers to gather the information required to analyze and provide data or code fixes for technical problems, and providing technical solution documents for users.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Worked on multiple stages of the Software Development Life Cycle, including Development, Component Integration, Performance Testing, Deployment, and Support/Maintenance.
- Quick to adapt to new software applications and products; a self-starter with excellent communication skills and a good understanding of business workflow.
- Expertise in object-oriented analysis and design (OOAD), including UML and the use of various design patterns.
- Working knowledge of SQL, PL/SQL, Stored Procedures, Functions, Packages, DB Triggers, and Indexes.
- Good experience in designing jobs and transformations and loading data sequentially and in parallel for initial and incremental loads.
- Good experience in using various PDI/Kettle (Pentaho Data Integration) steps to cleanse and load data as per business needs.
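Illustrative sketch (not taken from the projects below): a minimal custom Hadoop Partitioner of the kind referenced in the summary above, assuming a Text key and IntWritable value; the class name CustomerIdPartitioner is hypothetical.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical custom partitioner: routes all records sharing a key to the
// same reducer by hashing the key, keeping each bucket's data together.
public class CustomerIdPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Mask the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

Such a class would be registered on a job with job.setPartitionerClass(CustomerIdPartitioner.class).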
Languages: C, C++, Java, SQL and PL/SQL
Big Data Framework and Eco Systems: Hadoop, MapReduce, Hive, Pig, HDFS, Zookeeper, Sqoop, Apache Crunch, Oozie and Flume
NoSQL: Cassandra, HBase and Membase
Databases: Oracle 8i/9i/10g/11g, MySQL, PostgreSQL and MS-Access
Operating Systems: Windows XP/2000/NT, Linux, UNIX
Tools: Ant, Maven, TOAD, ArgoUML, WinSCP, PuTTY, Lucene
IDE Tools: Eclipse 4.x, Eclipse RCP, NetBeans 6, EditPlus
Version Control Tools: CVS, SVN
ETL Tools: PDI / Kettle (Pentaho Data Integration)
- Involved in design and development phases of the Software Development Life Cycle (SDLC) using the Scrum methodology.
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioural data and purchase histories into HDFS for analysis.
- Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
- Used Pig as an ETL tool to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
- Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Worked with the Apache Crunch library to write, test, and run Hadoop MapReduce pipeline jobs (an illustrative pipeline sketch follows this project's environment line below).
- Involved in joins and data aggregation using Apache Crunch.
- Worked on Oozie workflow engine for job scheduling.
- Developed custom implementations of Partitioners, Input/Output Formats, RecordReaders, and RecordWriters.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Loaded the aggregated data onto DB2 for reporting on the dashboard.
- Monitored and debugged Hadoop jobs/applications running in production.
- Provided user support and application support on the Hadoop infrastructure.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Evaluated and compared different tools for test data management with Hadoop.
- Helped and directed the testing team to get up to speed on Hadoop application testing.
- Installed a 20-node UAT Hadoop cluster.
- Created ETL jobs using Pentaho Data Integration to generate and distribute reports from a MySQL database.
- Created ETL jobs using Pentaho Data Integration to handle the maintenance and processing of data.
Environment: JDK1.6, RedHat Linux, HDFS, Mahout, MapReduce, Apache Crunch, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie, DB2, HBase and Pentaho.
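Illustrative sketch (not the project's actual code): a minimal Apache Crunch MapReduce pipeline of the kind referenced above, assuming plain-text input on HDFS; the class name, command-line paths, and the simple line count stand in for the real parsing, join, and aggregation steps.

import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.hadoop.conf.Configuration;

// Hypothetical Crunch pipeline: read raw lines from HDFS, aggregate them
// (a simple count here), and write the result back to HDFS.
public class LogCountPipeline {
    public static void main(String[] args) throws Exception {
        Pipeline pipeline = new MRPipeline(LogCountPipeline.class, new Configuration());
        PCollection<String> lines = pipeline.readTextFile(args[0]);  // input path on HDFS
        PTable<String, Long> counts = lines.count();                 // aggregation step
        pipeline.writeTextFile(counts, args[1]);                     // output path on HDFS
        pipeline.done();                                             // run the MapReduce job(s)
    }
}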
- Developed MapReduce jobs in Java for data cleansing and pre-processing.
- Moved data between Oracle and HDFS using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked with different file formats and compression techniques to determine standards.
- Developed Hive queries and UDFs to analyze/transform the data in HDFS (an illustrative UDF sketch follows this project's environment line below).
- Developed Hive scripts for implementing control tables logic in HDFS.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Developed Pig scripts and UDFs as per the business logic.
- Imported log files into HDFS using Flume and loaded them into Hive tables for querying.
- Developed Pig scripts to convert data from Avro to text file format.
- Developed Sqoop commands to pull the data from Teradata.
- End-to-end implementation with Avro and Snappy compression.
- Analyzed/transformed data with Hive and Pig.
- Developed Oozie workflows and scheduled them to run on a monthly basis.
- Designed and developed read lock capability in HDFS.
- Implemented a Hive UDF for a float equivalent of the Oracle Decimal type.
- Involved in end-to-end implementation of ETL logic.
- Coordinated effectively with the offshore team and managed project deliverables on time.
- Worked on QA support activities, test data creation and Unit testing activities.
Environment: JDK1.6, CDH4.x, Hive, Pig, MapReduce, Oozie, Oracle, Sqoop, Flume, Avro.
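Illustrative sketch (not the project's actual code): a minimal Java-based Hive UDF of the kind referenced above; the class name and the trim/upper-case transformation are placeholders for the real transformation logic.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: trims and upper-cases a string column.
public final class ToUpperTrim extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;  // preserve NULLs
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}

After packaging into a JAR, such a UDF would be exposed in Hive via ADD JAR and CREATE TEMPORARY FUNCTION.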
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing (an illustrative cleansing-mapper sketch follows this project's environment line below).
- Designed and developed Oozie workflows for the sequential flow of job execution.
- Mainly worked on Big Data analytics and Hadoop/MapReduce infrastructure.
- Gained good experience with NoSQL databases.
- Ran MapReduce programs on the cluster.
- Installed and configured Hive and wrote Hive UDFs.
- Implemented CDH3 Hadoop cluster.
- Installed the cluster and handled monitoring/administration of cluster recovery, capacity planning, and slots configuration.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best income logic using Pig scripts.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the Business Intelligence (BI) team.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Wrote Hadoop MR programs to collect logs and feed them into Cassandra for analytics.
- Built, packaged, and deployed code to the Hadoop servers.
- Wrote Unix scripts to manage Hadoop operations.
- Wrote Puppet programs for installation and configuration of Cloudera Hadoop CDH3u1.
Environment: JDK 1.6, Hadoop, MapReduce, HDFS, Hive, Java, SQL, Datameer, Pig, Sqoop, CentOS, Cloudera.
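Illustrative sketch (not the project's actual code): a minimal map-only MapReduce cleansing step of the kind referenced above, assuming pipe-delimited input with a fixed field count; the class name and field count are hypothetical.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical cleansing mapper: drops malformed rows and normalizes delimiters.
public class CleansingMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    private static final int EXPECTED_FIELDS = 5;  // hypothetical record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return;  // skip malformed records
        }
        // Emit the record with pipes normalized to tabs.
        context.write(NullWritable.get(), new Text(value.toString().replace('|', '\t')));
    }
}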
- Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which has been used successfully in a number of production systems (an illustrative sketch follows this project's environment line below).
- Normalized the Oracle database, conforming to design concepts and best practices.
- Maintained records in Excel Spread Sheet and exported data into SQL Server Database using SQL Server Integration Services (SSIS).
- Experience in providing logging and error handling using event handlers, and custom logging, for SSIS packages.
- Resolved product complications at customer sites and funneled the insights to the development and deployment teams to adopt long term product development strategy with minimal roadblocks.
- Convinced business users and analysts of alternative solutions that were more robust and simpler to implement from a technical perspective while satisfying the functional requirements from the business perspective.
- Applied design patterns and OO design concepts to improve the existing Java/JEE based code base.
- Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to unsynchronized blocks of code.
Environment: Java 1.2/1.3, Swing, Applet, Servlets, JSP, custom tags, JNDI, JDBC, XML, XSL, DTD, HTML, CSS, JavaScript, Oracle, DB2, PL/SQL, WebLogic, JUnit, Log4J and CVS.
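Illustrative sketch (not the project's actual code): a minimal example of the front-controller pattern mentioned above, in which a single servlet routes each request to a view; the servlet class name, request parameter, and JSP targets are hypothetical.

import java.io.IOException;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical front controller: one entry point that dispatches every
// request to the appropriate view based on a request parameter.
public class FrontControllerServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String action = request.getParameter("action");  // hypothetical parameter
        String view;
        if ("home".equals(action)) {
            view = "home.jsp";
        } else if ("report".equals(action)) {
            view = "report.jsp";
        } else {
            view = "error.jsp";
        }
        RequestDispatcher dispatcher = request.getRequestDispatcher(view);
        dispatcher.forward(request, response);
    }
}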