- 8+ years of IT experience as a Big Data/Hadoop Developer across all phases of the Software Development Life Cycle, with hands-on experience in Java/J2EE technologies and Big Data.
- Experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, Oozie, and Accumulo.
- Well versed in Amazon Web Services (AWS) Cloud services such as EC2, S3, EBS, RDS, and VPC.
- Improved the performance of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Proficient in Core Java and enterprise technologies such as EJB, Hibernate, Java Web Services, SOAP, REST services, Java threads, Java sockets, Java Servlets, JSP, and JDBC.
- Good exposure to Service-Oriented Architectures (SOA) built on Web services (WSDL) using the SOAP protocol.
- Wrote multiple MapReduce programs in Python for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Experience working on the Hadoop ecosystem, with some experience installing and configuring the Hortonworks and Cloudera distributions (CDH3 and CDH4).
- Experience in NoSQL databases including HBase, MongoDB, and Cassandra.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
- Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
- Extracted and processed streaming log data from various sources and integrated it into HDFS using Flume.
- Extensively worked with diverse data sources: non-relational sources such as XML files (with SAX and DOM parsers) and relational databases such as Oracle and MySQL.
- Experience working on Application servers like IBM WebSphere, JBoss, BEA WebLogic and Apache Tomcat.
- Expert in deploying code through web application servers such as WebSphere, WebLogic, and Apache Tomcat in the AWS cloud.
- Expertise in Core Java, J2EE, multithreading, JDBC, Hibernate, shell scripting, Servlets, JSP, Spring, Struts, EJBs, Web Services, XML, JPA, JMS, and JNDI, and proficient in using Java APIs for application development.
- Good working experience in Application and web Servers like JBoss and Apache Tomcat.
- Experience in writing Pig and Hive scripts and extending core functionality by writing custom UDFs.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
- Integrated Kafka with Spark Streaming for high-throughput, reliable data processing.
- Worked on Apache Flume to collect and aggregate large amounts of log data, storing it in HDFS for further analysis.
- Extensive experience with Agile Development, Object Modeling using UML and Rational Unified Process (RUP).
- Strong knowledge of Object Oriented Programming (OOP) concepts including the use of Polymorphism, Abstraction, Inheritance and Encapsulation.
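To illustrate the Python MapReduce work mentioned above, here is a minimal sketch of a Hadoop Streaming-style mapper that normalizes JSON and CSV records into tab-separated key/value pairs. The field names ("id", "amount") and record layout are hypothetical, not taken from any actual project.

```python
# Minimal sketch of a Hadoop Streaming-style mapper handling two of the
# input formats listed above (JSON and CSV). Field names are hypothetical.
import csv
import json


def parse_line(line):
    """Turn one JSON or CSV input line into a (record_id, amount) pair."""
    line = line.strip()
    if not line:
        return None
    if line.startswith("{"):
        rec = json.loads(line)
        return rec["id"], float(rec["amount"])
    row = next(csv.reader([line]))
    return row[0], float(row[1])


def map_stream(lines):
    """Emit tab-separated key/value output lines, as a mapper would."""
    out = []
    for line in lines:
        pair = parse_line(line)
        if pair:
            out.append("%s\t%.2f" % pair)
    return out
```

In an actual Hadoop Streaming job, `map_stream` would be driven by `sys.stdin` and write to `sys.stdout`; only the per-format parsing in `parse_line` changes between feeds.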
Confidential, Bloomington, IL
Sr. Hadoop/Spark Developer
Roles & Responsibilities:
- Created end-to-end Spark-Solr applications in Scala to perform data cleansing, validation, transformation, and summarization activities according to requirements.
- Implemented moving averages, interpolations, and regression analysis on input data.
- Tuned Spark applications to improve performance; worked collaboratively to manage build-outs of large data clusters and real-time streaming with Spark.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
- Performance tuning the Spark jobs by changing the configuration properties and using broadcast variables.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and YARN.
- Ran Spark on Treadmill to deploy a cluster from scratch in a couple of minutes; responsible for handling streaming data from web server console logs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Responsible for gathering business requirements for the initial POCs to load enterprise data warehouse data into Greenplum databases.
- Oracle-to-Greenplum migration: designed an automation script using a PL/SQL procedure to convert Oracle DDL to the Greenplum standard.
- Extracted data from heterogeneous sources such as flat files, VSAM, Oracle, SQL Server, and Greenplum into HDFS using Sqoop.
- Analyzed the SQL scripts and designed the solution to implement them using Scala; solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Designed and improved the internal search engine using Big Data and Solr/Fusion.
- Migrated data from various data sources to Solr in stages according to requirements.
- Used Akka as a framework to create reactive, distributed, parallel and resilient concurrent applications in Scala.
- Extensively worked on Jenkins for continuous integration and for End to End automation for all build and deployments.
- Work with cross functional consulting teams within the data science and analytics team to design, develop, and execute solutions to derive business insights and solve clients' operational and strategic problems.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Converted an Ant build into Gradle. Coordinated onsite-offshore synchronization, keeping teams at both ends well connected for smooth project flow and quick resolution of roadblocks.
- Monitored the ticketing tool for reported issues/incidents and resolved them with the appropriate fix in the project.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets.
- Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries.
- Continuously monitored and managed the Hadoop/Spark cluster using Cloudera Manager.
- Applied data science and machine learning techniques using Zeppelin to improve the search engine at a wealth management firm.
- Worked with Architecture and Development teams to understand usage patterns and workload requirements of new projects, ensuring the Hadoop platform could effectively meet performance requirements and application service levels.
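The moving-average and regression analyses listed above can be sketched in plain Python. The window size, data, and ordinary-least-squares formulation here are a generic illustration, not the project's actual code:

```python
# Generic sketch of the trailing moving-average and linear-regression
# analyses described above; window size and inputs are illustrative.

def moving_average(values, window):
    """Trailing moving average over a fixed window size."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def linear_regression(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x
```

In a Spark job the same logic would typically be expressed over DataFrames (window functions, MLlib regression) rather than Python lists; this sketch only shows the arithmetic.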
Environment: Java 1.8, Scala 2.10.5, Apache Spark 1.6.0, Apache Zeppelin, GreenPlum 4.3 (PostgreSQL), Treadmill, CDH 5.8.2, Spring 3.0.4, ivy 2.0, Gradle 2.13, Hive, HDFS, YARN, MapReduce, Sqoop 1.4.3, Flume, SOLR, UNIX Shell Scripting, Python 2.6, AWS, Kafka, Jenkins, Akka
Confidential, Denver, CO
Sr. Hadoop Developer
Roles & Responsibilities:
- Worked on loading disparate data sets coming from different sources into the BDpaas (Hadoop) environment using Sqoop.
- Developed UNIX scripts creating batch-load and driver code to bring large amounts of data from relational databases to the Big Data platform.
- Ingested data from one tenant to another; developed Pig queries to load data into HBase and leveraged Hive queries to create ORC tables.
- Created ORC tables to improve performance for reporting purposes. Involved in the coding and integration of several business-critical modules of the CARE application using Java, Spring, Hibernate, and REST web services on the WebSphere application server.
- Involved in project to provide eligibility, structure and transactional feeds to River Valley Facets platform where heritage and neighborhood health plans and related commercial products are maintained and administered.
- Developed web pages using JSPs and JSTL to help end user make online submission of rebates. Also used XML Beans for data mapping of XML into Java Objects.
- Worked with Systems Analyst and business users to understand requirements for feed generation.
- Created Health Allies Eligibility and Health Allies Transactional feeds extracts using Hive, HBase, Python and UNIX to migrate feed generation from a mainframe application called CES (Consolidated Eligibility Systems) to big data.
- Used bucketing concepts in Hive to improve performance of HQL queries.
- Used numerous user-defined functions in Hive to implement complex business logic in feed generation.
- Developed Spark scripts by using Scala shell commands.
- Created reusable Python script and added it to distributed cache in Hive to generate fixed width datafiles using an offset file.
- Created a MapReduce program which looks into data in HBase current and prior versions to identify transactional updates. These updates are loaded into Hive external tables which are in turn referred by Hive scripts in transactional feeds generation.
- Worked in Agile methodology using Rally.
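The reusable fixed-width generation script noted above might look roughly like this in Python. The offset-file format (one `name,start,length` triple per line) and the field names are assumptions for illustration:

```python
# Sketch of fixed-width record generation driven by an offset file,
# similar in spirit to the reusable Hive/Python step described above.
# The offset-file layout (name,start,length per line) is an assumption.

def load_offsets(lines):
    """Parse 'name,start,length' lines into (name, length) field specs."""
    specs = []
    for line in lines:
        name, _start, length = line.strip().split(",")
        specs.append((name, int(length)))
    return specs


def to_fixed_width(record, specs):
    """Render a dict as one fixed-width line, padding/truncating fields."""
    return "".join(
        str(record.get(name, "")).ljust(length)[:length]
        for name, length in specs
    )
```

Registering such a script in Hive's distributed cache lets a `TRANSFORM` step apply it to every row, which matches the feed-generation pattern above.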
Environment: MAPR, Sqoop, Hive, Pig, Python, UNIX, HBase, Spark, Rally.
Confidential, Columbus, OH
- Involved in requirement analysis, design, coding and implementation.
- Worked in Agile methodology and used JIRA to maintain project stories.
- Analyzed large data sets by running Hive queries.
- Involved in designing and developing the Hive data model, loading data, and writing Java UDFs for Hive.
- Handled importing and exporting data into HDFS, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop to import and export the data from Hadoop Distributed File System (HDFS) to RDBMS.
- Created Hive tables and loaded data from HDFS into Hive tables as per requirements. Wrote custom MapReduce programs to analyze data and used HQL queries to clean unwanted data.
- Created components like Hive UDFs for missing functionality in Hive to analyze and process the large volumes of data.
- Worked on various performance optimizations like using distributed cache for small datasets, Partition, Bucketing in Hive and Map Side joins.
- Involved in writing complex queries to perform join operations between multiple tables.
- Actively verified and tested data in HDFS and Hive tables while Sqooping data from Hive to RDBMS tables.
- Developed scripts and scheduled AutoSys jobs to filter the data.
- Monitored AutoSys file-watcher jobs, tested data for each transaction, and verified whether each run completed properly.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Impala to pull data from Hive tables.
- Used Apache Maven 3.x to build and deploy the application to various environments. Installed the Oozie workflow engine to run multiple Hive jobs independently based on time and data availability.
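The map-side-join optimization mentioned above (holding the small dataset in the distributed cache) can be illustrated in miniature with plain Python; the table contents are invented for the example:

```python
# Miniature illustration of a map-side join: the small dimension table
# is held in memory (as the distributed cache would provide it) and each
# large-table row is joined without a reduce-side shuffle. Data is invented.

def map_side_join(large_rows, small_table):
    """Join (key, value) rows against an in-memory lookup dict."""
    return [
        (key, value, small_table[key])
        for key, value in large_rows
        if key in small_table
    ]
```

This is the same idea Hive applies automatically when one side of a join fits in memory (a map join); rows whose key is absent from the small table are dropped, as in an inner join.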
Environment: HDFS, Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, Oozie, MongoDB, Java 6/7, Oracle 10g, Subversion, Toad, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, JIRA, AutoSys
- Responsible for and active in the analysis, design, implementation, and deployment phases of the full software development life cycle (SDLC) of the project.
- Developed struts action classes, action forms and performed action mapping using Struts Framework and performed data validation in form beans and action classes.
- Involved in multi-tiered J2EE design utilizing MVC architecture (Struts Framework) and Hibernate.
- Extensively used Struts Framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
- Involved in system design and development in core java using Collections, multithreading.
- Defined the search criteria, pulled the customer's record from the database, made the required changes, and saved the updated information back to the database.
- Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
- Used DAO and JDBC for database access.
- Developed applications with ANT based build scripts.
- Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in postproduction support and maintenance of the application.
Environment: Oracle 11g, Java 1.5, Struts 1.2, Servlets, HTML, XML, MS SQL Server 2005, J2EE, JUnit, Tomcat 6.
- Involved in database design.
- Created tables and stored procedures in SQL for data manipulation and retrieval; performed database modifications using SQL, stored procedures, and views in Oracle 10g.
- Created User Interface using JSP.
- Involved in integration testing the Business Logic layer and Data Access layer.
- Involved in JUnit testing of the application using JUnit framework.
- Implemented Stored Procedures functions and views to retrieve the data.
- Used Rational Application Developer (RAD) as Integrated Development Environment (IDE).
- Used unit testing for all the components using JUnit.
- Used the Apache Log4j logging framework for trace logging and auditing.
- Used IBM WebSphere as the application server.
- Used IBM Rational ClearCase as the version controller.
- Responsible for mentoring and working with team members to ensure standards and guidelines were followed and tasks were delivered on time.
Environment: Oracle, MySQL, HTML, SQL, XML, JSP, Servlets, JDBC, Java, Eclipse, UNIX.