- Over 7 years of IT experience, including analysis, design, development and testing of complex applications.
- Strong working experience with Big Data and the Hadoop ecosystem, including HDFS, Pig, Hive, HBase, YARN, Sqoop, Flume, Oozie, Hue, MapReduce and Spark.
- Hands-on experience installing, supporting and managing Hadoop clusters using the Cloudera and Hortonworks distributions of Hadoop.
- Extensive experience in analyzing data using HiveQL, Pig Latin and MapReduce programs in Java.
- Extensively implemented POCs on migrating to Spark Streaming to process live data.
- Experienced in using Apache Spark, with applications written in Scala, to implement advanced procedures such as text analytics and processing using its in-memory computing capabilities.
- Hands-on experience with real-time data processing using the distributed technologies Storm and Kafka.
- Used different Spark modules and APIs, including Spark Core, RDDs, DataFrames and Spark SQL.
- Converted various Hive queries into the required Spark transformations and actions.
- Experience in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Analyzed Hadoop clusters and various big data analytics tools, including HBase and Sqoop.
- Worked with data serialization formats for converting complex objects into sequences of bits, including Avro, Parquet, JSON and CSV.
- Good knowledge of Oracle 9i, 10g and 11g databases, and excellent at writing SQL queries and scripts.
- Experience implementing the Kerberos authentication protocol in Hadoop for data security.
- Strong command over relational databases: MySQL, Oracle, SQL Server and MS Access.
- Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration and migration.
- Experience setting up clusters on Amazon EC2 and S3, including automating the setup and extension of clusters in the AWS cloud.
Big Data Technologies: HDFS, Hive, MapReduce, Pig, Sqoop, Oozie, Flume, Kafka, YARN and Spark
Scripting Languages: Shell, Python
Programming Languages: Java, Scala, Python, SQL, C
Hadoop Distributions: Cloudera (CDH4 and CDH5), Hortonworks
NoSQL databases: HBase, Cassandra
Frameworks: Spring, Hibernate
SCM Tools: SVN, GitHub
Web Services: SOAP, REST
Operating systems: UNIX, Linux, Mac OS and Windows
Web servers: WebLogic, WebSphere, Apache Tomcat
Databases: Oracle, SQL Server, MySQL
Confidential, New Jersey NJ
- Responsible for building scalable distributed data solutions using Hadoop.
- Loaded data into Spark RDDs and performed in-memory computation for faster output response.
- Developed Spark and Hive jobs to transform data.
- Developed Spark scripts in Python, writing custom RDDs for data transformations and performing actions on RDDs.
- Worked on the Oozie workflow engine for job scheduling.
- Imported and exported data into MapReduce and Hive using Sqoop.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of the data by date.
- Developed a Kafka consumer component for real-time data processing in Java and Scala.
- Used Impala to query Hive tables for faster query response times.
- Imported real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
- Created partitioned and bucketed Hive tables in Parquet and Avro file formats with Snappy compression, then loaded data into them.
- Wrote Hive queries using Spark SQL, which integrates with the Spark environment.
- Developed MapReduce programs to parse raw JSON data and store the refined data in tables.
- Used Kafka to load data into HDFS and move data into HBase.
- Captured web server log data into HDFS using Flume for analysis.
- Worked on moving some of the data pipelines from the CDH cluster to run on AWS.
- Involved in moving data from HDFS to AWS Simple Storage Service (S3) and worked extensively with S3 buckets in AWS.
- Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions; used Spark to extract the schema of the JSON files.
- Migrated the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift.
Environment: Linux, Hadoop 2, Python, Scala, CDH 5.12.1, SQL, Sqoop, HBase, Hive, Spark, Oozie, Cloudera Manager, Oracle, Windows, YARN, Spring, Sentry, AWS, S3.
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
- Loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Created Java APIs for retrieval and analysis on the NoSQL Cassandra database.
- Helped with the sizing and performance tuning of the Cassandra cluster.
- Developed Hive queries to process the data and generate the results in a tabular format.
- Handled importing of data from multiple data sources using Sqoop, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Extracted data from CSV and JSON files and stored it in Avro and Parquet formats.
- Implemented partitioning and bucketing in Hive and designed both managed and external tables.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, to decide which to implement in the project.
- Loaded and transformed large sets of structured and semi-structured data using Hive.
- Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
- Created Kafka topics and partitions and wrote custom partitioner classes.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Used Spark Streaming APIs to perform necessary transformations and actions on the data.
- Monitored and controlled local file system disk space usage and log files, cleaning up log files with automated scripts.
- Involved in writing Oozie jobs for workflow automation.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
Environment: Unix, Linux, Hortonworks 2.6.2, Scala, HDFS, MapReduce, Hive, Flume, Sqoop, Ganglia, Ambari, Oracle 11g, Ranger, Python, Apache Hadoop, Cassandra.
- Implemented an authentication and authorization service using the Kerberos authentication protocol.
- Worked with different teams to install operating system and Hadoop updates, patches and Cloudera version upgrades as required.
- Analyzed the Hadoop cluster and various big data analytics tools, including Pig, Sqoop, Hive, Spark and ZooKeeper.
- Developed a data pipeline using Flume, Pig and Java MapReduce to ingest claim data into HDFS for analysis.
- Analyzed log files for Hadoop and ecosystem services to find root causes.
- Monitored and troubleshot issues with hosts in the cluster regarding memory, CPU, OS, storage and network.
- Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Scheduled jobs through Oozie.
- Performed HDFS cluster support and maintenance tasks such as adding and removing nodes without affecting running nodes or data.
- Involved in Hadoop cluster administration, including adding and removing cluster nodes, capacity planning, performance tuning and cluster monitoring.
- Worked with big data developers, designers and scientists in troubleshooting MapReduce job failures and issues with Hive and Pig.
- Involved in the setup and benchmarking of Hadoop HBase clusters for internal use.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Exported the result set from Hive to MySQL using shell scripts.
- Actively involved in code review and bug fixing to improve performance.
- Successfully created and implemented complex code changes.
Environment: Hadoop, Cloudera 5.4, Java, HDFS, MapReduce, Pig, Hive, Impala, Sqoop, Flume, Kafka, Kerberos, Sentry, Oozie, HBase, SQL, Spring, Linux, Eclipse.
- Involved in all phases of the SDLC, including requirements collection, design and analysis of the customer specifications, and development and customization of the application.
- Designed and Implemented MVC architecture using Spring MVC.
- Used Eclipse as an IDE for developing the application.
- Primarily focused on Spring components such as the DispatcherServlet, controllers, model and view objects, and the view resolver.
- Implemented multithreading in Java classes, taking care to avoid deadlocks.
- Implemented Java design patterns such as Singleton, Factory and Command.
- Developed test cases and performed unit testing using the JUnit framework.
- Used REST and SoapUI to test web services for server-side changes.
- Designed and developed Web Services to provide services to the various clients using SOAP and WSDL.
- Responsible for developing the configuration, mapping and Java beans for the persistence layer.
- Involved in Production Support. Solved many production issues based on priority.
- Developed the User Interface Screens for presentation using JSP, JSTL tags, HTML and CSS.
- Created automated test cases for the web application using Selenium WebDriver.
- Used JIRA as a defect tracking system for all projects, and GitHub as a code repository to manage project code.
Environment: Java 1.5, EJB 2.0, Spring, Struts, JSP, JSTL, Hibernate, Web Services (SOAP, WSDL), XML, WebLogic 10.3, Ant 1.6, JUnit, Oracle 11g.
- Involved in Analysis, Design, Development, Integration and Testing of application modules and followed agile methodology.
- Involved in developing UML diagrams like Use-case, Class diagrams and Activity diagrams.
- Designed and developed Java components using design patterns such as Singleton, Strategy and Decorator, and used J2EE patterns such as Facade and Service Locator.
- Developed core Java programs for all business rules and workflows using the Spring framework.
- Used TOAD for data modeling in an Oracle 11g database, creating schemas and tables for applications.
- Used AJAX to retrieve data from the server asynchronously as JSON objects.
- Involved in transforming XML data into Java objects using the JAXB binding tool.
- Developed various Action classes and Form bean classes using Struts framework.
- Used the JDBC API to connect to the Oracle 11g database.
- Developed test cases and test suites to test the application using JUnit.
- Used the Eclipse 3.1 IDE to develop and debug the application.
- Involved in SDLC using methodologies like Waterfall.
- Deployed the application on Linux (Red Hat Enterprise Linux 5.0) and Solaris servers using WebLogic.
- Performed requirement gathering and analysis by actively soliciting, analyzing and negotiating customer requirements, and prepared the requirements specification document for the application using Microsoft Word.
- Developed Use Case diagrams, business flow diagrams, Activity/State diagrams.
- Developed the presentation layer using the JavaServer Faces (JSF) MVC framework.
- Used JSP, HTML, CSS and jQuery as view components in MVC.
- Developed custom controllers to handle requests using Spring MVC.
- Used JDBC for connectivity to the SQL database and to invoke stored procedures.
- Deployed the applications on WebLogic Application Server.
- Developed RESTful web services using JSON.
- Created and managed Spring Boot microservices that create, update, delete and retrieve data.
- Created tables in the Oracle database and wrote SQL queries using joins and stored procedures.
- Developed JUnit test cases for unit testing the code.
- Worked with configuration management groups to set up various deployment environments, including System Integration testing and Quality Control testing.
Environment: Java/J2EE, SQL, Oracle, JSP 2.0, JSON, JavaScript, WebLogic 10.0, HTML, JDBC, Spring, Hibernate, XML, JMS, log4j, JUnit, Servlets, MVC, Eclipse.