- Overall 8 years of experience in Information Technology, including Big Data and the Hadoop ecosystem.
- 5+ years of experience with Big Data, Hadoop, HDFS, MapReduce and Hadoop ecosystem (Pig & Hive) technologies.
- Hands-on experience in configuring HDFS and Hadoop ecosystem components such as HBase, Solr, Hive, Tez, Sqoop, Pig, Flume, Oozie and ZooKeeper.
- Hands-on experience in writing MapReduce jobs using Java.
- Hands-on experience in writing Pig Latin scripts and Hive Query Language (HQL) queries.
- Experience in database development using SQL and PL/SQL and experience working on databases like Oracle, Informix and SQL Server.
- Experience upgrading Hadoop clusters from CDH3 to CDH4, setting up high-availability clusters and integrating Hive with existing applications.
- Experience with all aspects of development, from initial implementation and requirement discovery through release, enhancement and support, using SDLC and Agile techniques.
- Expert in data analysis, gap analysis, coordination with the business, requirement gathering and preparation of technical documents. Experience with multiple distributions, e.g. Hortonworks and Cloudera.
- Hands-on experience with build tools such as Hudson/Jenkins, Maven and Ant, with virtualization and containers (Docker), and with the ESXi and ESX hypervisors.
- Continuous learner who picks up and practices newer tools such as Solr, Elasticsearch, Kibana, Lucene and Spotfire.
- Experience working on NoSQL databases like HBase and knowledge in Cassandra, Redis, MongoDB.
- Experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- In-depth knowledge of Cassandra and hands-on experience installing, configuring and monitoring a DataStax Enterprise cluster.
- Experience using the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
- Experience with Hadoop architecture and its components such as HDFS, NameNode, DataNode, JobTracker, TaskTracker, YARN and MapReduce.
- Experience using Spring REST APIs to retrieve data from HBase.
- Expertise in developing multi-tier components including business components (EJB), presentation-tier components (Servlets and JSP) and database programming (PL/SQL and JDBC).
- Developed a data pipeline using Kafka, HBase, Mesos, Spark and Hive to ingest, transform and analyze customer behavioral data.
- Working experience in databases such as Oracle, SQL Server, Sybase and DB2 in the areas of Object-Relational DBMS Architecture, physical and logical structure of database, Application Tuning and Query optimization.
- Worked on creating Virtual machines using VMware and CHP software.
- Handled virtual machine migrations from Azure Classic to Azure Resource Manager (ARM) using PowerShell.
- Experience with installation and configuration of WebSphere, WebLogic and Tomcat, and deployment of 3-tier applications.
- Proficient in SQL and PL/SQL using Oracle, DB2, Sybase and SQL Server.
- Installed and configured Talend ETL in single- and multi-server environments.
- Created standards and best practices for Talend ETL components and jobs.
- Effective team player with excellent communication skills and the insight to determine priorities, schedule work and meet critical deadlines.
- Strong technical and architectural knowledge in solution development.
- Effective in working independently and collaboratively in teams.
- Good analytical, communication, problem solving and interpersonal skills.
- Flexible and ready to take on new challenges.
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Spark, Oozie, Flume, Impala, Tez, Kafka, Storm, Solr, HCatalog, YARN, Cassandra and Mesos.
Big Data Distributions: Hortonworks, Cloudera, Apache
Programming: Java, C, C++, Python, SAS, R and PL/SQL
J2EE Technologies: J2EE (EJB 2.0/3.0, JSP 2.0 and Servlets 2.3), J2EE Design Patterns, UML, Log4j, JMS, Spring REST and JDBC
Database: Oracle 10g, DB2, SQL, NoSQL (MongoDB, Cassandra, HBase)
Web/App Server: WebSphere Application Server 7.0, Apache Tomcat 5.x/6.0, JBoss 4.0
ETL: Talend, Informatica 9.x/8.x (Integration Service / PowerCenter) and IWX (Infoworks)
Messaging Systems: JMS, Kafka and IBM MQ Series
Version Tools: Git, SVN, and CVS
Analytics: Tableau, SPSS, SAS EM and SAS JMP
Scripts: Shell, Python, Maven and ANT
OS & Others: Windows, Linux, IBM HTTP Server, SVN, ClearCase, PuTTY, WinSCP and FileZilla
Cloud: AWS (EC2, S3, CloudWatch, RDS, ElastiCache, ELB, IAM), Microsoft Azure, Rackspace, OpenStack, Cloud Foundry.
Confidential, Dallas, TX
Sr. Hadoop Developer
- Ingested data from sources such as Oracle and DB2, performed data transformations and exported the transformed data to cubes as per business requirements.
- Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Oracle, DB2 and Teradata into HDFS using Sqoop.
- Created Hive tables and loaded them with data using HQL scripts, which Hive runs internally as MapReduce jobs.
- Wrote custom Hive UDFs in Java where the required functionality was too complex for built-in functions.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing.
- Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis.
- Applied windowing functions, aggregations, time and date functions on the data as per the business logic.
- Developed dynamically partitioned Hive tables, storing data by timestamp and source type to improve query performance.
- Scheduled Sqoop ingestions and Hive transformations (HQL scripts) using the Oozie and Maestro schedulers.
- Worked with different file formats such as TEXTFILE and ORC for Hive querying and processing.
- Implemented a streaming data ingestion framework in which Kafka, as the messaging service, pushes data to HDFS.
- Queried data using Spark SQL on top of the Spark engine and implemented Spark RDDs in Scala.
- Wrote Python applications on Apache Spark to convert and parse txt and xls files.
- Integrated Apache Kafka with Apache Spark for real-time processing.
- Created HBase tables to load large sets of semi-structured data coming from various sources.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Imported and exported data between Oracle, DB2 and HDFS/Hive using Sqoop for analysis, visualization and report generation.
Environment: HDFS, Hive, MapReduce, Java, HBase, Pig, Sqoop, Oozie, MySQL, SQL Server, Windows and Linux.
Confidential, Gresham, OR
- Developed Java MapReduce jobs for aggregation and interest matrix calculation for users.
- Created Hive tables, loaded them with data and wrote Hive queries, which Hive runs internally as MapReduce jobs.
- Experienced in managing and reviewing application log files.
- Ingested application logs into HDFS and processed them using MapReduce jobs.
- Created and maintained the Hive warehouse for Hive analysis.
- Generated test cases for the new MapReduce jobs.
- Involved in the pilot of Hadoop cluster hosted on Amazon Web Services (AWS)
- Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis.
- Developed dynamically partitioned Hive tables to store data partitioned by date and workflow ID.
- Used Apache Sqoop to load incremental user data into HDFS on a daily basis.
- Ran clustering and user recommendation agents on the weblogs and profiles of the users to generate the interest matrix.
- Created ETL jobs to load JSON data and server data into MongoDB, and moved the transformed MongoDB data into the data warehouse.
- Worked on installing and configuring EC2 instances on Amazon Web Services (AWS) for establishing clusters on cloud.
- Installed and configured Hive, and wrote Hive UDFs in Java and Python.
- Prepared the data for consumption by formatting it for upload to the UDB system.
- Led and programmed the recommendation logic for various clustering and classification algorithms using Java.
- Migrated Hadoop jobs to higher environments such as SIT, UAT and Prod.
Environment: Hortonworks Hadoop, MapReduce, MongoDB, HDFS, Hive, Java, SQL, PL/SQL, Scala, Cassandra, Pig, Sqoop, Oozie, ZooKeeper, Teradata, MySQL, Windows, HBase
Confidential, Bridgewater, NJ
- Analyzed the functional specs provided by the client and developed a detailed solution design document with the architect and the team.
- Discussed the solution design with the client business teams to confirm it, and changed the requirements where needed.
- Responsible for building scalable distributed data solutions using Hadoop.
- Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Oracle into HDFS using Sqoop.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Used Pig UDFs to implement business logic in Hadoop.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Loaded the dataset into Hive for ETL (Extract, Transform and Load) operations.
- Deep understanding of ETL tools and how they can be applied in a Big Data environment.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed multiple POCs using Scala, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Imported data from various data sources, performed transformations using Hive, MapReduce and Spark, and loaded data into HDFS.
- Created Hive DDLs on top of HBase tables as requested by the source team.
- Created schema checks for Hive tables against the corresponding HBase tables.
- Used Jenkins to build the code and sent the Nexus link to the QA team.
Environment: Java 8, Apache Hadoop 2.6.0, Hive 1.2.1000, Sqoop 1.4.6, HBase 1.1.2, Pig 0.14.0, Oozie 4.1.0, Storm 0.9.3, Oracle 11g, shell script, Tomcat 7, YARN, Spring 3.2.3, Ambari, JavaScript, JSON, XML, XSLT, XPath, SSMS, Teradata SQL Assistant, Eclipse, Kafka, STS 3.8.
Java/J2EE Developer
- Coded the business methods according to the IBM Rational Rose UML model.
- Used the Apache Log4j logging framework for trace logging and auditing.
- Extensively used Core Java, Servlets, JSP and XML.
- Used Struts 1.2 in presentation tier.
- Used IBM WebSphere Application Server.
- Generated the Hibernate XML and Java mappings for the schemas.
- Used a DB2 database to store the system data.
- Used IBM Rational ClearCase as the version control system.
- Used Rational Application Developer (RAD) as Integrated Development Environment (IDE).
- Unit tested all components using JUnit.
Environment: Java 1.6, Log4j, IBM Rational Application Developer (RAD) 6, AJAX, Rational ClearCase, WebSphere 6.0, iText, Rational Rose, Oracle 9i, JSP, Struts 1.2, Servlets.
Java/J2EE Developer
- Worked as Research Assistant and a Development Team Member
- Coordinated with Business Analysts to gather the requirement and prepare data flow diagrams and technical documents.
- Identified Use Cases and generated Class, Sequence and State diagrams using UML.
- Used JMS for the asynchronous exchange of critical business data and events among J2EE components and the legacy system.
- Designed, coded and maintained Entity Beans and Session Beans using the EJB 2.1 specification.
- Developed the web interface using the Struts MVC framework.
- Developed the user interface using JSP and JSP tags, CSS, HTML and JavaScript.
- Configured database connections using properties files.
- Used a session filter to implement timeouts for idle users.
- Used stored procedures to interact with the database.
- Developed the persistence layer using DAOs and the Hibernate framework.
- Used Log4j for logging.
Environment: J2EE, Struts 1.0, JavaScript, CSS, HTML, XML, XSLT, DTD, JUnit, EJB, Oracle, Tomcat, Eclipse, WebLogic 7.0/8.1