- 6+ years of experience with an emphasis on Big Data technologies and the design and development of Java-based enterprise applications.
- 4 years of experience as a Hadoop Developer in Big Data/Hadoop technology development and 3 years in Java application development.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
- Expertise in Hadoop ecosystem components like MapReduce, HDFS, MapR-FS, Hive, Spark Streaming, Oozie, Pig, Kafka, Flume, and HBase.
- Extensive experience in importing and exporting data using stream processing platforms like Flume and Kafka.
- Hands-on experience with databases like Oracle, MS SQL, and DB2, and with RDBMS development including SQL queries, stored procedures, and triggers.
- Expertise in NoSQL databases like MongoDB, MapR-DB, and Cassandra.
- Worked as a Java developer with extensive experience in Java libraries, APIs, and frameworks like Spring and Hibernate.
- Worked with widely used frameworks like Hibernate and MVC. Expertise with MVC, Spring Core, IoC, Spring MVC, JDBC, and Web modules.
- Worked in different environments, including the Hortonworks, Cloudera, and MapR Hadoop distributions.
- Hands-on programming in MapReduce using Java. Knowledge of Amazon Web Services (AWS).
- CI/CD pipeline management through Jenkins. Automation of manual tasks using Shell scripting.
- Experience with application servers and web servers such as WebLogic, JBoss, WebSphere, and Apache.
- Knowledge of writing Spark applications in Python (PySpark).
- Experienced in migrating ETL transformations using Pig Latin scripts, including transformations and join operations.
- Expert in scheduling Spark jobs with Airflow.
- Comfortable in Unix/Linux and Windows environments, working with operating systems like CentOS 6/7, Ubuntu 14/16, and Cosmos.
- Strong in working with both Agile Scrum and Waterfall SDLC methodologies.
- Adept in Agile/Scrum methodology and familiar with the SDLC from requirement analysis to system study, design, testing, debugging, documentation, and implementation.
- Proven ability to finish tasks ahead of schedule, along with good communication skills.
- Collaborative team player with strong analytical and problem-solving skills.
- Leadership skills; technically competent and research-minded.
HDFS, MapR-FS, MapReduce, HBase, Kafka, Pig, Hive, Sqoop, Impala, Flume, Cloudera, Hortonworks, MapR, Cassandra, MapR-DB, Oozie and ZooKeeper
Realtime/Stream Processing: Apache Spark
Operating Systems: Windows, Unix and Linux
Databases: Oracle 9i/10g, SQL Server
IDE Development Tools: Eclipse, NetBeans
Java Technologies: Servlets, Struts, JSP, Spring, Web Services, Hibernate, JDBC
Methodologies: Agile, Scrum and Waterfall
Confidential, Phoenix, Arizona
Big Data Developer
- Developing logic based on the requirements of the business as well as end users (partners) to provide a scalable, self-service solution
- Creating tables in Hive, sending them as files via SFT (secure file transfer) in the formats required by each partner, and publishing them in Tableau dashboards.
- Building HQL in Magellan 2.1, an IDN-developed tool (Amex tool), by importing cornerstone data sources within the use case
- Updating day-to-day work in Rally, completing tasks in user stories, and fixing defects raised in SIT
- Coordinating with different SOR teams to understand the nature of the data and designing solutions to fit our use case needs
- Making changes to the logic based on business requirements requested by external partners (end users)
- Using the Event Engine console to run jobs on the Amex cluster at specified time intervals.
- Creating Hive tables and reviewing and managing Hive log files.
- Optimizing Hive queries for better performance using partitioning, bucketing, and hints, and assigning appropriate cluster resources through Hive properties based on the volume of data being processed
- Using Apache Spark to apply validation rules to streaming data (for example, checking that mandatory fields are not NULL and that values have the expected data types)
- Storing data in Parquet format to reduce disk space usage and improve query performance
- Created shell and Python scripts to automate daily tasks, including production tasks
- Loading the data from Hive to Jethro database to accelerate the Business Intelligence reporting
- Used the HBase NoSQL database for real-time read/write access to huge volumes of data in the use case
- Delivered 430+ KPIs for partner reporting
- Discussing different ideas in daily scrum calls with the team to choose the best option to move forward
Environment: Hadoop, MapR distribution, Hive, Jethro, MySQL, Magellan, Apache Spark, HBase, Event Engine, Tableau, Linux, Rally, Jira.
Confidential, El Paso, Texas
Big Data Developer
- Thoroughly analyzing business requirements from the business partners.
- Part of the team that installed and configured Hadoop MapReduce and HDFS
- Used Flume to create fan-in and fan-out multiplexing flows, along with custom interceptors for data conversion.
- Installed the Oozie workflow engine to run multiple Hive, shell script, Sqoop, Pig, and Java jobs.
- Setting up and managing Kafka for stream processing, including broker and topic creation and configuration.
- Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Created, altered, and deleted topics using Kafka queues when required.
- Using Hive map-side and skew join queries to join multiple tables of a source system and load them into Elasticsearch tables.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to Elastic MapReduce jobs.
- Creating Hive UDFs in Java, compiling them into JARs, adding them to HDFS, and executing them in Hive queries.
- Used Pig as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Proficient in data modeling with Hive partitioning, indexing, bucketing, and other Hive optimization techniques.
- Involved in configuring Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Java.
- Involved in importing metadata into Hive using Java and migrated existing tables and applications to work on Hive. Created entities in Scala and Java, along with named queries, to interact with the database.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Developed an extensible REST client for the RESTful web services using Spring MVC and Java.
- Developed a framework of RESTful web services using Spring MVC, JPA and APIs to help Hadoop developers to automate data quality checks.
- Designed the HBase schema to avoid hotspotting and exposed data from HBase tables to the UI through a REST API.
- Stored data in HBase using Pig and was involved in parsing data using Pig.
- Involved in implementing a script to transfer data from Oracle to HBase using Sqoop
- Carried out development activities using Agile methodologies.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, J2EE, Eclipse, ORC, Parquet, HBase, Kafka, Oozie, ZooKeeper, Spring Boot, Spring Core, Spark RDD, Spark SQL and Spark Streaming.
Big Data Developer
- Involved in cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups.
- Worked in the MapR distribution environment with the MapR File System.
- Used MapR Streams with the Apache Kafka API.
- Also involved with HDFS, Apache Hadoop and MapReduce APIs.
- Used MapR-DB for analytical applications, operational workloads, and real-time sharing.
- Implemented Spark Streaming to read real-time data from Kafka and process it in parallel, saving the results in Parquet format in Hive.
- Involved in a complete comparative analysis of the Avro, Parquet, and ORC file formats, and decided to go with Parquet.
- Worked on MapReduce programs running on the cluster, and managed and reviewed log files.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
- Part of the team working on a Hadoop plug-in that provides the ability to use MongoDB as an input source and an output destination for MapReduce, Hive, and Pig
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them
- Involved in data modeling and in sharding and replication strategies in MongoDB
- Worked on the integration layer for storing data from REST services in MongoDB
- Involved in the migration from Hadoop Java MapReduce programs to Spark Scala APIs
- Writing Scala programs to create Spark RDDs and SQL DataFrames to load processed data into an RDBMS for a mortgage analysis dashboard.
- Composing the application classes as Spring Beans using Spring IOC/Dependency Injection.
- Designed and developed server-side components using Java and REST.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, in order to adopt Impala in the project.
- Optimized Hive/Impala queries for fast results and pushed data from Impala to MicroStrategy
- Used Impala as the primary analytical tool for allowing visualization servers to connect and perform reporting on top of Hadoop directly.
- Used Impala and Tableau to create various reporting dashboards.
- Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
- Created Spark-based Big Data integration jobs to perform lightning-fast analytics over the Spark cluster.
- Involved in migrating MapReduce jobs into Spark jobs and used the Spark SQL and DataFrames APIs to load structured data into Spark clusters.
- Responsible for the design and development of Spark SQL scripts based on functional specifications
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
Environment: MapR Distribution, MapR-File System, MapReduce APIs, Flume, Pig, Scala, Spark RDD, Spark SQL, Hive, Impala, Oozie, Kafka, MongoDB, MapR-DB, SQL, Spring MVC, Spring IOC.
- Developing requirement prototypes according to the business requirements, definitions, and business process flows.
- Involved in requirements gathering and documenting the functional specifications.
- Worked with installation, configuration and testing on several Hadoop ecosystem components like Hive, Pig, HBase, and Sqoop.
- Developed MapReduce jobs in Java for data preprocessing and cleaning.
- Developed Oozie workflows to automate loading data into HDFS, using Pig for preprocessing.
- Monitored the Hadoop cluster using tools like Cloudera Manager, and managed and scheduled jobs on the cluster.
- Developed Kafka producers and consumers, HBase clients, and Spark jobs, along with Hive and HDFS components.
- Worked with coordination services through ZooKeeper.
- Analysis and Design of the Object models using JAVA/J2EE Design Patterns in various tiers of the application
- Developers using the framework build the graphical components and define actions and popup menus in XML
- Developed server-side code in Servlets and JSP, and designed and implemented a Struts framework for Swing.
- Designed the use cases, sequence diagrams, class diagrams and Activity diagrams
- Created the test plan and developed test classes and test cases.
- Executed test cases in JBuilder. Handled defect fixing, client communication, and query resolution.
- Tested the product using regression, unit, and integration testing.
- Created struts-config file and resource bundles for Distribution module using Struts Framework.
- Worked with core Java for multithreading and arrays, and developed the JSPs for the application.
- Designed and developed code for the MVC architecture using the Struts framework with Servlets and JSPs
- Designed the application by implementing JSF Framework based on MVC Architecture with EJB, simple Java Beans as a Model, JSP and JSF UI Components as View and Faces Servlet as a Controller.
- Involved in the local initialization and deployment of the application on Oracle WebLogic Server.
Environment: Java, J2SE 5.0, Struts, Servlets, JSP, Eclipse, Oracle 8i, CouchDB, XML, HTML/DHTML, JBuilder.
Junior Java Developer
- Implemented Action classes and ActionForm classes for the entire Reports module using the Struts framework.
- Created tile definitions, struts-config files and resource bundles using Struts framework.
- Worked with core Java to implement multithreading within the Struts framework.
- Worked with OOP concepts and memory concepts like string pools.
- Implemented various design patterns like MVC, Factory, and Singleton.
- Deployed and tested the JSP pages in Tomcat server.
- Developed Struts framework ActionServlet classes for the Controller and developed Form Beans for transferring data between the Action class and the View layer.
- Implemented the Struts Validator framework to validate the data.
- Used Java Message Service (JMS) for reliable and asynchronous exchange of important information, such as loan status report, between the clients and the bank.
- Developed presentation-layer interfaces using HTML, JSP pages, and Struts.
- Used Hibernate for object-relational mapping and for database operations in Oracle.
Environment: Java, Servlets, Core Java, Multi-Threading, Struts, Hibernate, UML, Oracle, Tomcat, Eclipse, Windows XP, HTML, CSS, JSP.