- 7+ years of experience with emphasis on Big Data technologies and the design and development of Java-based enterprise applications.
- 4+ years of experience in developing applications that perform large-scale distributed data processing using Big Data ecosystem tools: Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Kafka, Oozie, ZooKeeper, Flume, YARN and Avro.
- Excellent understanding/knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm, with good hands-on experience in PySpark and SQL queries.
- Hands-on experience with major components in the Hadoop ecosystem including Hive, HBase, HBase & Hive integration, Sqoop and Flume, and knowledge of the MapReduce/HDFS framework.
- Set up standards and processes for Hadoop based application design and implementation.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Experience with Hortonworks and Cloudera Hadoop environments.
- Setting up data in AWS using S3 bucket and configuring instance backups to S3 bucket.
- Good experience in analysis using Pig and Hive and understanding of Sqoop and Puppet.
- Expertise in database performance tuning and data modeling.
- Experienced in providing security to Hadoop cluster with Kerberos and integration with LDAP/AD at Enterprise level.
- Involved in best practices for Cassandra, migrating the application for Choice from the legacy platform to Cassandra, and upgraded to Cassandra 3.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Hands-on experience in Apache Spark creating RDDs and DataFrames, applying transformations and actions, and converting RDDs to DataFrames.
- Migrated various Hive UDFs and queries into Spark SQL for faster processing.
- Experience in data processing (collecting, aggregating and moving data from various sources) using Apache Flume and Kafka.
- Experience in using Apache Kafka for log aggregation.
- Developed a data pipeline using Kafka and Spark Streaming to store data into HDFS and performed the real-time analytics on the incoming data.
- Experience in importing the real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
- Loaded data into EMR from various sources such as S3 and processed it using Hive scripts.
- Explored various Spark modules and worked with DataFrames, RDDs and SparkContext.
- Performed map-side joins on RDDs and imported data from different sources like HDFS/HBase into Spark RDDs.
- Familiarity and experience with data warehousing and ETL tools. Good working Knowledge in OOA & OOD using UML and designing use cases.
- Experience working on Solr to develop search engine on unstructured data in HDFS.
- Used Solr indexing to enable searching on non-primary-key columns from Cassandra keyspaces.
- Good understanding of Scrum methodologies, Test Driven Development and Continuous integration.
- Experience in production support and application support by fixing bugs.
- Used HP Quality Center for logging test cases and defects.
- Major strengths include familiarity with multiple software systems and the ability to learn new technologies quickly and adapt to new environments; a self-motivated, focused and adaptive team player and quick learner with excellent interpersonal, technical and communication skills.
- Designed and implemented Hive and Pig UDFs using Java for evaluation, filtering, loading and storing of data.
- Experience working with Java/J2EE, JDBC, ODBC, JSP, Java Eclipse, JavaBeans, EJB and Servlets.
- Expert in developing web page interfaces using JSP, Java Swing and HTML.
- Excellent understanding of JavaBeans and the Hibernate framework to implement model logic that interacts with RDBMS databases.
- Experience in using IDEs like Eclipse, NetBeans and Maven.
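The map-side join pattern noted above can be sketched in plain Python (an illustrative stand-in only; the data and function names are hypothetical and no Spark cluster is assumed): the small table is held in memory like a broadcast variable and joined locally, avoiding a shuffle of the large side.

```python
# Sketch of a broadcast-style map-side join (hypothetical data).
# The small lookup table fits in memory, so each "mapper" joins its
# partition of the large dataset locally, with no shuffle.

def map_side_join(large_records, small_table):
    """Join (key, value) records against an in-memory lookup dict."""
    lookup = dict(small_table)  # the "broadcast" copy held by every mapper
    return [(k, v, lookup[k]) for k, v in large_records if k in lookup]

sales = [("u1", 30), ("u2", 45), ("u3", 10)]   # large side
users = [("u1", "Alice"), ("u3", "Cathy")]     # small side
joined = map_side_join(sales, users)
# joined -> [("u1", 30, "Alice"), ("u3", 10, "Cathy")]
```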
Hadoop Core Services: HDFS, MapReduce, Spark, YARN, Hive, Pig, Scala, Kafka, Flume, Tez, Impala, Solr, Oozie, ZooKeeper.
Hadoop Distribution: Hortonworks, Cloudera
NoSQL Databases: HBase, Cassandra, MongoDB
Cloud Computing Tools: Amazon AWS
Languages: Java/J2EE, Python, SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB
Application Servers: WebLogic, WebSphere, JBoss, Tomcat.
Databases: Oracle, MySQL, SQL
Operating Systems: UNIX, Linux, Windows
Build Tools: Jenkins, Maven, ANT
Development Tools: Microsoft SQL Studio, Eclipse
Development methodologies: Agile/Scrum, Waterfall
Sr. Hadoop Developer
Confidential, Alpharetta, GA
- Participated with the team to gather and analyze client requirements.
- Analyzed large data sets distributed across cluster of commodity hardware.
- Worked on MapReduce phases using Core Java and scripting languages, created and exported jar files to HDFS, and used the web UIs for the NameNode, JobTracker and TaskTracker.
- Set up the required Hadoop environments for the cluster to perform MapReduce jobs.
- Data was formatted using Hive queries and stored on HDFS.
- Created complex schemas and tables for analysis using Hive.
- Involved in extracting, transforming and loading data sets from local to HDFS using Hive.
- Imported and exported data from RDBMS to HDFS and vice-versa using Sqoop.
- Involved in writing Pig scripts to analyze or query structured, semi-structured and unstructured data in a file.
- Worked on HBase and MySQL for optimizing data storage and access.
- Worked with Sequence file, Avro and Parquet file formats.
- Monitored Hadoop cluster connectivity and security using tools such as ZooKeeper and Hue.
- Managed and reviewed Hadoop log files.
- Developed and implemented automation processes to increase deployment efficiency.
- Developed scripts to perform ad hoc requests.
- Coordinated and communicated with the team and prepared technical design documents.
- Followed the Agile (Scrum) development methodology to develop the application; participated in daily sprint meetings.
- Involved in managing the backup and disaster recovery for Hadoop data.
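The MapReduce phases worked on in this role can be sketched in plain Python (illustrative only; the input is hypothetical and no Hadoop cluster is assumed): a map phase emits key-value pairs, a shuffle phase groups them by key, and a reduce phase aggregates each group.

```python
from collections import defaultdict

# Sketch of the classic MapReduce word-count phases (hypothetical input).

def map_phase(lines):
    """Map: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle_phase(map_phase(["big data", "big cluster"])))
# counts -> {"big": 2, "data": 1, "cluster": 1}
```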
Environment: CDH 5.0, Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, MySQL, ZooKeeper, Hue, Linux.
Sr. Hadoop Developer
Confidential, Foster City, CA
- Installed, configured and maintained Apache Hadoop cluster.
- Used Sqoop to import data into HDFS/Hive from multiple relational databases, performed operations and exported the results back.
- Extensively used Spark Streaming to analyze sales data in real time over regular window intervals from sources like Kafka.
- Performed Spark transformations and actions on large datasets. Implemented Spark SQL to perform complex data manipulations, and to work with large amounts of structured and semi-structured data stored in a cluster using Data Frames/Datasets.
- Performed Spark join optimizations; troubleshot, monitored and wrote efficient code using Scala.
- Created Hive tables based on business requirements. Wrote many Hive queries, UDFs and implemented concepts like Partitioning, Bucketing for efficient data access, Windowing operations and more.
- Integrated Hive, Sqoop with HBase and performed transactional and analytical processing.
- Configured, designed, implemented and monitored Kafka clusters and connectors. Wrote Kafka producers and consumers using Java.
- Implemented proof of concept (POC) for processing stream data using Kafka -> Spark -> HDFS.
- Developed a data pipeline using Kafka, Spark, and Hive/ HDFS to ingest, transform and analyze data. Automated jobs using Oozie.
- Generated Tableau dashboards and worksheets for large datasets.
- Implemented custom interceptors for Flume to filter data, and defined channel selectors to multiplex the data into different sinks.
- Implemented many Spark jobs and wrote Function definitions, Case and Object classes using Scala.
- Involved in the process of Cassandra data modeling, performing data operations using CQL and Java.
- Performed data integration with a goal of moving more data effectively, efficiently and with high performance to assist in business-critical projects using Talend Data Integration.
- Used SQL queries and other data analysis methods to know the quality of the data.
- Exported the aggregated data onto Oracle using Sqoop for reporting on the Tableau dashboard.
- Involved in QA, test data creation, and unit testing activities.
- Implemented security on Hadoop cluster using Kerberos.
- Involved in design, development and testing phases of Software Development Life Cycle.
- Used the Agile Scrum methodology to help manage and organize the team, with regular code review sessions.
- Held weekly meetings with technical collaborators and participated actively in code reviews with senior and junior developers.
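The windowed real-time analysis described in this role can be sketched in plain Python (illustrative only; the events and window size are hypothetical, and a deque stands in for a Spark Streaming window): each arriving event updates an aggregate over the most recent window of records.

```python
from collections import deque

# Sketch of windowed streaming aggregation (hypothetical sales amounts).

def windowed_sums(events, window_size):
    """Emit the sum of the last `window_size` amounts after each event."""
    window = deque(maxlen=window_size)  # old events fall out automatically
    sums = []
    for amount in events:
        window.append(amount)
        sums.append(sum(window))
    return sums

totals = windowed_sums([10, 20, 30, 40], window_size=2)
# totals -> [10, 30, 50, 70]
```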
Environment: Hadoop (Cloudera), Spark, Hive, Kafka, Sqoop, Oozie, Java 8, Cassandra, Oracle 12c, 11g, Impala, Scala, Talend studio, Tableau.
Confidential, Dearborn, Michigan
- Created end-to-end Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities on user behavioral data.
- Developed a custom input adaptor utilizing the HDFS File System API to ingest clickstream log files from an FTP server into HDFS.
- Developed end-to-end data pipeline using FTP Adaptor, Spark, Hive and Impala.
- Implemented Spark and utilized SparkSQL heavily for faster development, and processing of data.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark with Scala.
- Used Scala collection framework to store and process the complex consumer information.
- Implemented a prototype to perform real-time streaming of data using Spark Streaming with Kafka.
- Handled importing other enterprise data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HBase tables.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Analyzed the data by performing Hive queries (Hive QL) and running Pig scripts (Pig Latin) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created components like Hive UDFs for missing functionality in HIVE for analytics.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Created, validated and maintained scripts to load data manually using Sqoop.
- Created Oozie workflows and coordinators to automate Sqoop jobs weekly and monthly.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and scheduling the workflows.
- Continuously monitored and managed the Hadoop cluster.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Experience with data wrangling and creating workable datasets.
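The Hive bucketing used for the optimizations above can be sketched in plain Python (illustrative only; the keys and bucket count are hypothetical): rows are assigned to buckets by a deterministic hash of the clustering key modulo the bucket count, which keeps matching keys co-located for bucketed joins.

```python
import zlib

# Sketch of Hive-style bucketing: deterministic hash mod bucket count
# (keys and bucket count are hypothetical).

def bucket_for(key, num_buckets):
    """Return the stable bucket id for a string key."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets

rows = ["user_1", "user_2", "user_3"]
buckets = {key: bucket_for(key, 4) for key in rows}
# every key maps to a stable bucket id in the range 0..3
```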
Environment: HDFS, Pig, Hive, Sqoop, Flume, Spark, Scala, MapReduce, Oozie, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology
Java/ J2EE Developer
- Involved in requirements analysis, design, development and testing.
- Involved in setting up the different roles and maintained authentication to the application.
- Designed, deployed and tested a multi-tier application using Java technologies.
- Involved in front-end development using JSP, HTML and CSS.
- Implemented the application using Servlets.
- Deployed the application on Oracle WebLogic Server.
- Implemented multithreading concepts in Java classes to avoid deadlocks.
- Used MySQL database to store data and execute SQL queries on the backend.
- Prepared and Maintained test environment.
- Tested the application before going live to production.
- Documented and communicated test results to the team lead on a daily basis.
- Involved in weekly meeting with team leads and managers to discuss the issues and status of the projects.
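The deadlock avoidance applied in the multithreaded classes above can be sketched in Python (an illustrative stand-in for the Java code; thread names are hypothetical): when every thread acquires shared locks in the same global order, no thread can hold one lock while waiting for a lock another thread holds.

```python
import threading

# Sketch of deadlock avoidance via consistent lock ordering.

lock_a = threading.Lock()
lock_b = threading.Lock()
results = []

def transfer(name):
    # Both threads acquire locks in the SAME order (a before b),
    # so neither can hold one lock while waiting on the other.
    with lock_a:
        with lock_b:
            results.append(name)

threads = [threading.Thread(target=transfer, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# results -> both thread names, in some order, with no deadlock
```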
Environment: J2EE (Java, JSP, JDBC, Multithreading), HTML, Oracle WebLogic Server, Eclipse, MySQL, JUnit.
Java/ J2EE Developer
- Used JSP pages through a Servlet controller for the client-side view.
- Used Java/J2EE best practices to minimize unnecessary object creation.
- Implemented RESTful web services with the Struts framework.
- Verified them with the JUnit testing framework.
- Working experience using an Oracle 10g backend database.
- Used JMS queues to develop an internal messaging system.
- Developed UML use case, activity, sequence and class diagrams using Rational Rose.
- Developed Java, JDBC and JavaBeans components using the JBuilder IDE.
- Developed JSP pages and Servlets for customer maintenance.
- Apache Tomcat Server was used to deploy the application.
- Involved in building the modules in a Linux environment with Ant scripts.
- Used Resource Manager to schedule jobs on the Unix server.
- Performed Unit testing, Integration testing for all the modules of the system.
- Developed JavaBean components utilizing AWT and Swing classes.
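The point-to-point messaging pattern built on JMS queues above can be sketched in Python (an illustrative stand-in for the Java JMS code; message payloads and the shutdown sentinel are hypothetical): a producer puts messages on a queue and a consumer thread takes them off in FIFO order.

```python
import queue
import threading

# Sketch of point-to-point queue messaging, as with a JMS queue
# (payloads are hypothetical).

messages = queue.Queue()
received = []

def consumer():
    while True:
        msg = messages.get()   # blocks until a message arrives
        if msg is None:        # sentinel: shut the consumer down
            break
        received.append(msg)

worker = threading.Thread(target=consumer)
worker.start()
for payload in ("order-created", "order-shipped"):
    messages.put(payload)      # producer sends messages
messages.put(None)             # signal end of stream
worker.join()
# received -> ["order-created", "order-shipped"]
```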