- Over 9 years of professional IT experience with expertise in Java, J2EE, Hadoop and Big Data ecosystem-related technologies.
- 4 years of exclusive experience in Big Data technologies and Hadoop ecosystem components such as Spark, MapReduce, Hive, Pig, YARN and HDFS, NoSQL systems like HBase and Cassandra, and supporting tools including Oozie, Sqoop, Flume, Zookeeper, Hue and Kafka.
- Strong knowledge of the architecture of distributed systems and parallel-processing frameworks.
- In-depth understanding of the MapReduce framework and the Spark execution model.
- Worked extensively on fine-tuning long-running Spark applications to achieve better parallelism and to size executor memory for more effective caching.
- Strong experience working with both batch and real-time processing using Spark framework.
- Hands-on experience in installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Experience in optimizing MapReduce algorithms using Combiners and custom Partitioners.
- Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Messaging Service (JMS) and Java Database Connectivity (JDBC).
- Strong knowledge of performance-tuning Hive queries and troubleshooting various kinds of issues in Hive.
- Experience in NoSQL databases like HBase, Apache Cassandra and MongoDB, and their integration with Hadoop clusters.
- Experienced in writing custom MapReduce programs and UDFs in Java to extend Hive and Pig core functionality.
- Extensive experience in ETL processes consisting of data sourcing, mapping, transformation, conversion and loading.
- Created Talend Mappings to populate the data into dimensions and fact tables.
- Broad design, development and testing experience with Talend Integration Suite and knowledge in Performance Tuning of mappings.
- Worked with Sqoop to move (import/export) data from a relational database into Hadoop.
- Experience working with Hadoop clusters using Cloudera, Amazon AWS and Hortonworks distributions.
- Experience in installation, configuration, support and management of a Hadoop Cluster.
- Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
- Experienced in using agile methodologies including Extreme Programming (XP), Scrum and Test-Driven Development (TDD).
- Experience in creating Hive tables with different file formats like Avro, Parquet, ORC.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Skilled in using different file formats, including row-based Avro and the columnar RCFile, ORC and Parquet formats.
- Proficient in integrating and configuring the object-relational mapping tool Hibernate in J2EE applications, along with other open-source frameworks such as Struts and Spring.
- Experience in building and deploying web applications on multiple application servers and middleware platforms including WebLogic, WebSphere, Apache Tomcat and JBoss.
- Experience in writing test cases in Java Environment using JUnit.
- Hands-on experience in developing logging standards and mechanisms based on Log4j.
- Experience in building, deploying and integrating applications with ANT, Maven.
- Good knowledge of web services, SOAP programming, WSDL, XML parsers like SAX and DOM, and front-end technologies such as AngularJS and responsive design with Bootstrap.
- Flexible, enthusiastic and project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
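The Hive partitioning, bucketing and columnar-format skills above can be sketched with a small DDL. The table name, columns and HDFS path here are hypothetical; in practice the generated file would be run with `hive -f`.

```shell
# Hypothetical Hive DDL: an external, partitioned, bucketed table stored
# as Parquet. The script only writes the DDL file; `hive -f` would run it.
cat > /tmp/clicks.hql <<'HQL'
CREATE EXTERNAL TABLE IF NOT EXISTS clicks (
  user_id BIGINT,
  url     STRING,
  ts      TIMESTAMP
)
PARTITIONED BY (dt STRING)              -- one HDFS directory per day
CLUSTERED BY (user_id) INTO 32 BUCKETS  -- bucketing aids sampling and joins
STORED AS PARQUET                       -- columnar storage for scan pruning
LOCATION '/data/warehouse/clicks';
HQL
echo "wrote /tmp/clicks.hql"
```

An external table leaves the underlying data in place when the table is dropped, and partitioning by `dt` lets queries prune entire directories.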
Big Data Ecosystem: Hadoop, MapReduce, YARN, HDFS, HBase, Zookeeper, Hive, Hue, Pig, Sqoop, Spark, Oozie, Storm, Flume, Talend, Cloudera Manager, Amazon AWS, NiFi, Apache Ambari, Hortonworks, Impala, Redshift, Airflow, Phoenix, Pachyderm, Tableau
Languages: C, C++, Java, Advanced PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL
Java/J2EE: J2EE, Servlets, JSP
Frameworks: Struts, Spring 3.x, ORM (Hibernate), JPA, JDBC
Web Services: SOAP, Restful, JAX-WS
Application/Web Servers: WebLogic, WebSphere, Apache Tomcat, GlassFish 4.0
Scripting Languages: Shell Scripting, JavaScript
Database: Oracle 9i/10g, Microsoft SQL Server, MySQL, DB2, Teradata, PostgreSQL
NoSQL Databases: MongoDB, Cassandra, HBase
IDE & Build Tools: NetBeans, Eclipse, ANT, Jenkins and Maven.
Version Control Systems: GitHub, CVS, SVN.
Confidential, Santa Monica, CA
Sr. Hadoop Developer
- Developed a series of data ingestion jobs for collecting data from multiple channels and external applications.
- Worked on both batch and streaming ingestion of the data.
- Worked on batch processing and stream processing of data using Spark and Spark Streaming.
- Worked with Kafka extensively for writing the streaming data to Kafka topics.
- Imported data from S3 and performed various data transformations and actions using Spark RDD API and Spark-SQL API.
- Worked on developing Oozie workflows to automate the data pipelines.
- Worked on ingesting data from SQL Server to S3 using Sqoop within AWS EMR.
- Migrated MapReduce jobs to Spark applications and integrated them with Apache Phoenix and HBase.
- Involved in loading and transforming large sets of data and analyzed them using Hive Scripts.
- Created tables in Google Cloud using BigQuery and extracted data from Google Cloud for reporting in Tableau.
- Loaded portion of processed data into Redshift tables and automated the process.
- Migrated Oozie workflows to Apache Airflow DAGs.
- Worked on various performance optimizations in Spark, such as using the distributed cache, dynamic allocation, proper resource allocation and custom Spark UDFs.
- Fine-tuned long-running Hive queries using proven techniques such as the Parquet columnar format, partitioning and vectorized execution.
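The Spark tuning described above (dynamic allocation, resource sizing, shuffle parallelism) typically surfaces as spark-submit configuration flags. A sketch follows; the jar name and main class are hypothetical, and the command is echoed rather than run:

```shell
# Sketch of a tuned spark-submit invocation; com.example.ClickstreamJob
# and clickstream-job.jar are hypothetical. Dynamic allocation requires
# the external shuffle service, hence both flags together.
SPARK_SUBMIT_CMD="spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.executor.memory=8g \
  --conf spark.executor.cores=4 \
  --conf spark.sql.shuffle.partitions=400 \
  --class com.example.ClickstreamJob \
  clickstream-job.jar"
echo "$SPARK_SUBMIT_CMD"
```

Executor memory and core counts are workload-dependent; the values here only illustrate the knobs involved.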
Environment: Hadoop 2.x, Pig, HDFS, Scala, Spark, Apache Airflow, Kafka, Sqoop, HBase, Oozie, Java, Maven, IntelliJ, Putty, AWS EMR, S3, Redshift, Tableau.
Confidential, Denver, CO
Sr. Hadoop Developer
- Created custom input adapters for pulling the raw click stream data from FTP servers and AWS S3 buckets.
- Created Kafka producers for streaming real-time click stream events from third-party REST services into our topics.
- Developed Spark streaming applications for consuming the data from Kafka topics.
- Implemented Spark batch applications using Scala for performing various kinds of cleansing, de-normalization and aggregations on hourly click stream logs.
- Worked on automation of delta feeds from Teradata using Sqoop.
- Implemented Hive tables and HQL queries for the reports; wrote and used complex data types in Hive.
- Worked on NiFi for tracking the data from ingestion to aggregation.
- Successfully loaded files to HDFS from Teradata and loaded from HDFS to HIVE.
- Implemented Kafka, Spark streaming and HBase for establishing real time pipeline.
- Used Apache Storm to process data from Kafka and eventually persist it into HDFS and HBase.
- Responsible for troubleshooting and maintaining the accuracy of the jobs running in production.
- Used Sqoop to import the data from databases to Hadoop Distributed File System (HDFS) and performed automated data auditing to validate the accuracy of the loads.
- Involved in loading and transforming large sets of data and analyzed them by running Hive queries.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
- Developed end-to-end data processing pipelines that receive data through the distributed messaging system Kafka and persist it into HBase.
- Implemented daily workflows for extraction, processing and analysis of data with Oozie.
- Created components like Hive UDFs for missing functionality in HIVE for analytics.
- Created HBase tables to store various data formats coming from different portfolios.
- Worked on running Spark jobs with Maven-managed dependencies.
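The Teradata delta-feed automation mentioned above is usually expressed as a Sqoop incremental import. A sketch follows; the connection string, credentials, table and column names are hypothetical, a Teradata JDBC connector is assumed to be on the Sqoop classpath, and the command is echoed rather than run:

```shell
# Hypothetical Sqoop incremental import for delta feeds: pulls only rows
# whose updated_ts changed since the recorded last-value. Echoed, not run.
SQOOP_CMD="sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=sales \
  --username etl_user --password-file /user/etl/.password \
  --table CLICK_EVENTS \
  --incremental lastmodified \
  --check-column updated_ts \
  --last-value '2016-01-01 00:00:00' \
  --target-dir /data/raw/click_events \
  -m 4"
echo "$SQOOP_CMD"
```

Sqoop records the new high-water mark after each run, so a scheduler (Oozie here) can replay the job with an updated `--last-value`.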
Environment: Hadoop 2.x, Hive, HDFS, Scala, Spark, NiFi, Storm, Kafka, Sqoop, HBase, Oozie, Java, Maven, Eclipse, Cassandra, Putty, CDH 5.7
Confidential, Los Angeles, CA
Sr. Hadoop Developer
- Involved in requirement analysis, design, coding and implementation.
- Developed solutions to process data into HDFS, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems.
- Used Pig as an ETL tool for transformations, joins and some pre-aggregations before storing the data in HDFS.
- Responsible for development, support and maintenance of the ETL (Extract, Transform and Load) processes using Talend Integration Suite.
- Implemented Change Data Capture technology in Talend to load deltas to a Data Warehouse.
- Responsible for writing Map Reduce programs.
- Built custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Auto-populated HDFS with data coming from a Flume sink.
- Created and modified shell scripts for scheduling data-cleansing scripts and the ETL loading process.
- Created tables and views in Teradata according to the requirements.
- Implemented Python scripts for auto deployments in AWS.
- Developed Hive queries to analyze reducer output data.
- Created Partitioned and Bucketed Hive tables in Parquet File Formats with Snappy compression and then loaded data into Parquet hive tables from Avro hive tables.
- Used Sqoop to import the data to Hadoop Distributed File System (HDFS) from RDBMS.
- Created components like Hive UDFs for missing functionality in HIVE for analytics.
- Worked on various performance optimizations like using distributed cache for small datasets, Partition, Bucketing in hive and Map Side joins.
- Used Impala to analyze the data present in Hive tables.
- Worked with Teradata Queryman to validate the data in the warehouse for sanity checks.
- Designed and developed REST web service for validating address.
- Wrote recurring Oozie workflows to automate the scheduling flow.
- Addressed issues arising from the huge volume of data and transitions.
- Migration of database objects from previous versions to the latest releases using latest data pump methodologies, when the solution was upgraded.
- Worked on ingesting the data from Amazon S3 buckets to podium data repository.
- Built and ran our RESTful web services using the Maven repository.
- Set up Jenkins on Amazon EC2 servers and configured notifications to the Jenkins server for any changes to the repository.
- Supported in Production rollout which includes monitoring the solution post go-live and resolving any issues that are discovered by the client and client services teams.
- Designed, documented operational problems by following standards and procedures using JIRA.
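The automated data auditing mentioned above (validating Sqoop load accuracy) boils down to comparing a source row count with the row count of the landed file. A minimal runnable sketch, where a local file stands in for the HDFS copy; in production the counts would come from the RDBMS and from `hdfs dfs -cat ... | wc -l`:

```shell
# Minimal post-load audit: compare an expected source row count against
# the rows actually landed. A local file stands in for HDFS here.
audit_load() {
  expected=$1
  loaded_file=$2
  actual=$(wc -l < "$loaded_file")
  if [ "$actual" -eq "$expected" ]; then
    echo "AUDIT OK: $actual rows"
  else
    echo "AUDIT FAIL: expected $expected, got $actual" >&2
    return 1
  fi
}

# demo data: three rows loaded, three rows expected
printf 'a\nb\nc\n' > /tmp/load_sample.txt
audit_load 3 /tmp/load_sample.txt
```

A non-zero exit on mismatch lets the workflow scheduler fail the pipeline stage and alert on bad loads.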
Environment: HDFS, Hadoop, Pig, Hive, HBase, Sqoop, Talend, Flume, MapReduce, Podium Data, Oozie, Java 6/7, Oracle 10g, YARN, UNIX Shell Scripting, SOAP, REST services, Maven, Agile Methodology, JIRA, AutoSys.
Confidential, St Louis, MO
- Installed and configured Hadoop Ecosystem components and Cloudera manager using CDH distribution.
- Developed multiple Map Reduce jobs in Java for complex business requirements including data cleansing and preprocessing.
- Developed Sqoop scripts to import/export data from Oracle to HDFS and into Hive tables.
- Worked on analyzing Hadoop clusters using Big Data Analytic tools including Map Reduce, Pig and Hive.
- Involved in developing Pig scripts to store unstructured data into HDFS.
- Involved in creating tables in Hive and writing scripts and queries to load data into Hive tables from HDFS.
- Scripted complex HiveQL queries on Hive tables for analytical functions.
- Optimized Hive tables using techniques like partitioning and bucketing to improve the execution of HiveQL queries.
- Worked on Hive/HBase vs. RDBMS trade-offs; imported data to Hive and created internal and external tables, partitions, indexes, views, queries and reports for BI data analysis.
- Developed custom record readers, partitioners and serialization techniques in Java.
- Used different data formats (Text format and Avro format) while loading the data into HDFS.
- Created HBase tables and loaded data into them.
- Developed scripts to load data from HBase into Hive and performed MapReduce jobs.
- Created custom UDFs in Pig and Hive.
- Created partitioned tables and loaded data using both static partition and dynamic partition methods.
- Installed the Oozie workflow engine and scheduled it to run date/time-dependent Hive and Pig jobs.
- Designed and developed Dashboards for Analytical purposes using Tableau.
- Ran JSON-processing scripts in Java with the Maven repository.
- Used Jenkins to tie the Maven build to the source tree.
- Analyzed Hadoop log files using Pig scripts to track down errors.
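The static- and dynamic-partition loading mentioned above produces a characteristic directory layout on HDFS: one `dt=<value>` directory per partition value. A runnable illustration using local directories (paths and data are made up for the demo):

```shell
# Illustration of what a dynamic-partition load does on disk: each row is
# routed to a dt=<date> directory keyed by a column value, mirroring the
# layout Hive produces under a partitioned table's HDFS location.
base=/tmp/clicks_demo
rm -rf "$base"
# input rows in "<date>,<payload>" form
printf '2016-03-01,a\n2016-03-01,b\n2016-03-02,c\n' |
while IFS=, read -r dt payload; do
  mkdir -p "$base/dt=$dt"                 # one directory per partition value
  echo "$payload" >> "$base/dt=$dt/part-00000"
done
ls "$base"
```

With static partitioning the `dt` value is fixed in the load statement; with dynamic partitioning, as here, Hive derives it per row from the data itself.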
Environment: HDFS, Map Reduce, Hive, Sqoop, Pig, HBase, Oozie, CDH distribution, Java, Eclipse, Shell Scripts, Tableau, Windows, Linux.
Confidential, Boston, MA
Java/ J2EE Developer
- Performed requirement gathering, design, coding, testing, implementation and deployment.
- Worked on modeling of Dialog process, Business Processes and coding Business Objects, Query Mapper and JUnit files.
- Installed and configured Oracle GoldenGate 11g using integrated Extracts and Replicats.
- Monitored the Oracle GoldenGate processes and checked performance using the GoldenGate Director.
- Involved in the design and creation of Class diagrams, Sequence diagrams and Activity Diagrams using UML models
- Created the Business Objects methods using Java and integrating the activity diagrams.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Worked on web services using SOAP and WSDL.
- Wrote Query Mappers and JUnit test cases, including MQ-related tests.
- Managed software configuration using Clear Case and SVN.
- Designed, developed and tested features and enhancements.
- Performed error rate analysis of production issues and technical errors.
- Developed test environment for testing all the Web Service exposed as part of the core module and their integration with partner services in Integration test.
- Analyzed user requirement documents and developed test plans covering test objectives, test strategies, test environment and test priorities.
- Assisted with the development of the call center's operations, quality and training processes (Enterprise Contact Center Services).
- Responsible for performing end-to-end system testing of the application by writing JUnit test cases.
- Performed functional testing, performance testing, integration testing, regression testing, smoke testing and User Acceptance Testing (UAT).
- Used Jenkins for building and configuring the Java application using Maven.
- Converted complex SQL queries running on mainframes into Pig and Hive as part of a migration from mainframes to a Hadoop cluster.
Sr. Java Developer
- Developed an MVC design pattern-based user interface using JSP, XML, PrimeFaces 5.1, HTML, CSS and Struts.
- Involved in the design and development phases of Scrum Agile Software Development.
- Responsible for creating the detailed design and technical documents based on the business requirements.
- Used Struts validator framework to validate user input.
- Created activity diagrams, class diagrams and sequence diagrams for the tasks.
- Used Spring framework configuration files to manage objects and to achieve dependency injection.
- Involved in implementing DAO pattern for database connectivity and Hibernate for object persistence.
- Configured batch jobs, job steps, job listeners, readers, writers and tasklets using Spring Batch.
- Integrated Spring Batch and Apache Camel using Spring XML to define service beans, batch jobs, Camel routes and Camel endpoints.
- Applied Object Oriented Programming (OOP) concepts (including UML use cases, class diagrams, and interaction diagrams).
- Developed utility classes that allow easy translation from XML to Java and back, as well as a Property Reader to read properties from a flat file.
- Used Java/J2EE Design patterns like Business Delegate and Data Transfer Object (DTO).
- Developed window layouts and screen flows using Struts Tiles.
- Used ANT Script to build WAR and EAR files and deployed on WebSphere.
- Used XML, XSL for Data presentation, Report generation and customer feedback documents.
- Used Java Beans to automate the generation of Dynamic Reports and for customer transactions.
- Developed JUnit test cases for regression testing and integrated with ANT build.
- Implemented the logging framework using Log4j; JUnit and Log4j were used across modules for unit testing and logging.
- Involved in Iterative development using Agile Process.
- Used SVN for version control of the source code.
- Created web services using Apache Axis2 for communication with other applications.
- Created and executed unit and regression test scripts; created personal and common test data, tracked actual vs. expected results, and evaluated quality of modules created.
- Developed and tested the Efficiency Management module using EJB, Servlets, and JSP & Core Java components in WebLogic Application Server.
- Used Spring as the middle-tier application framework, with a persistence strategy based on Spring's Hibernate support for database integration.
- Implemented Hibernate in the data access object layer to access and update information in the Oracle Database.
- Developed the XML data object to generate the PDF documents, and reports.
- Employed Hibernate, DAOs and JDBC for data retrieval and modifications in the database.
- Messaging and interaction with web services was done using SOAP.
- Configured Hibernate mapping files to achieve object-relational mapping.
- Involved in developing Stored Procedures, Queries and Functions.
- Wrote SQL queries to pull information from the back end.
- Compiled and ran the applications, with logging implemented using Log4j.
- Wrote test plans and test cases for the developed screens.
- Executed test cases and fixed bugs found through unit testing.
Environment: Java, J2EE (Servlets, JDBC, EJB, JSP, JMS), HTML, CSS, Struts 1.2, Hibernate, Spring, XML, CVS, Eclipse, Oracle 8i, PL/SQL, Windows, UNIX.