
Sr. Hadoop Developer Resume

College Park, MD

SUMMARY:

  • IT professional with 8+ years of experience in the Big Data ecosystem and Java-related technologies.
  • Strong technical, administration, and mentoring experience in Linux and Big Data/Hadoop technologies.
  • Hands-on experience with major components of the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, Pentaho, HBase, Zookeeper, Sqoop, Oozie, Cassandra, Flume and Avro.
  • Solid understanding of RDD operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, broadcast variables and optimizing broadcasts.
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages and tasks.
  • Experience in exposing Apache Spark as web services.
  • Good understanding of the Driver, Executors and the Spark web UI.
  • Experience in submitting Apache Spark and MapReduce jobs to YARN.
  • Experience in real-time processing using Apache Spark and Kafka.
  • Migrated Python machine learning modules to scalable, high-performance and fault-tolerant distributed systems like Apache Spark.
  • Strong experience in Spark SQL UDFs, Hive UDFs and Spark SQL performance tuning. Hands-on experience working with input file formats like ORC, Parquet, JSON and Avro.
  • Good expertise in coding in Python, Scala and Java.
  • Experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Experience working with columnar NoSQL databases like HBase.
  • Extensive experience in writing Pig and Hive scripts for processing and analyzing large volumes of structured data.
  • Experience in writing MapReduce programs and using Apache Hadoop API for analyzing the data.
  • Expertise in Spark framework for batch and real-time data processing
  • Experience in converting MapReduce applications to Spark
  • Working knowledge in Spark Streaming and Spark SQL.
  • Good experience with RDBMS technologies like Oracle, SQL Server and MySQL
  • Expertise in Core Java, data structures, algorithms, Object Oriented Design (OOD) and Java concepts such as OOP Concepts, Collections Framework, Exception Handling and I/O System.
  • Hands-on experience in J2EE technologies such as Servlets, JSP, EJB, JDBC and developing Web Services providers and consumers using SOAP, REST.
  • Hands-on experience in developing web applications using MVC (Model View Controller) architecture including Spring MVC, Struts, and Servlets.
  • Able to work independently and collaboratively to solve problems and deliver high quality results in a fast-paced, unstructured environment.
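The RDD transformation/action distinction called out above can be illustrated with a small Python sketch: like RDD transformations, generators are lazy and only build up a lineage, and nothing executes until an "action" forces evaluation. This is a conceptual analogy only, not Spark API code:

```python
# Toy analogy for Spark's lazy RDD model: "transformations" build a
# pipeline of generators; an "action" (list, sum, ...) forces evaluation.

def transform_map(data, fn):
    # Lazy, like rdd.map(fn): returns a generator, computes nothing yet.
    return (fn(x) for x in data)

def transform_filter(data, pred):
    # Lazy, like rdd.filter(pred).
    return (x for x in data if pred(x))

# Build a pipeline over raw numbers (lineage: source -> filter -> map).
source = range(10)
evens = transform_filter(source, lambda x: x % 2 == 0)
squared = transform_map(evens, lambda x: x * x)

# "Action": forces the whole lineage to execute in a single pass.
result = list(squared)
print(result)  # [0, 4, 16, 36, 64]
```

In Spark itself the equivalent pipeline would be roughly `sc.parallelize(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()`.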

TECHNICAL COMPETENCIES:

Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, MongoDB

Big Data Distributions: Cloudera, Amazon EMR

Programming languages: Core Java, Scala, Python, SQL, Shell Scripting

Operating Systems: Windows, Linux (Ubuntu)

Databases: Oracle, SQL Server

Development Tools: Eclipse

Java Technologies: JSP, Servlets, Junit, Spring, Hibernate

Web Technologies: XML, HTML, JavaScript, jQuery, JSON

Linux Experience: System Administration Tools, Puppet, Apache

Web Services: Web Service (RESTful and SOAP)

Frameworks: Jakarta Struts 1.x, Spring 2.x

Development methodologies: Agile, Waterfall

Logging Tools: Log4j

Application / Web Servers: CherryPy, Apache Tomcat, WebSphere

Messaging Services: ActiveMQ, Kafka, JMS

Version Tools: Git, SVN and CVS

Analytics: Tableau, SPSS, SAS EM and SAS JMP

PROFESSIONAL EXPERIENCE:

Confidential, College Park, MD

Sr. Hadoop Developer

Responsibilities:

  • Migrated the existing data from Oracle to HDFS using Sqoop for processing the data.
  • Loaded the recent transactions data from Oracle to HBase
  • Used Apache Avro to de-serialize data from a compact binary format to JSON format
  • Designed HBase RowKey and column family structure
  • Combined and processed data from HBase and HDFS using Spark Streaming according to the requests of the BI team
  • Used Apache Spark and Scala language to find patients with similar symptoms in the past and medications used for them to achieve best results.
  • Pushed log data from web servers across the environments into the associated Kafka topic partitions, and used Spark SQL to calculate the most prevalent diseases in each city from this data.
  • Created multiple scripts in Pig Latin or Hive to perform MapReduce jobs for data transformation and cleaning
  • Analyzed web logs using Hadoop tools for operational and security-related activities.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Involved in loading data from UNIX file system to HDFS.
  • Involved in creating Hive tables, loading the data using it and in writing Hive queries to analyze the data.
  • Development and ETL Design in Hadoop
  • Developed ETL processes for data warehouse and/or Hadoop environment
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, tuple stores, NoSQL, Hadoop, Pig, MySQL and Oracle databases
  • Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by using Gzip, LZO, Snappy compression techniques.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Worked on loading and transforming huge sets of structured, semi structured and unstructured data.
  • Worked on different file formats like XML files, Sequence files, JSON, CSV and Map files using Map Reduce Programs.
  • Continuously monitored and managed Hadoop cluster using Cloudera Manager.
  • Performed POCs using newer technologies such as Spark, Kafka and Scala.
  • Responsible for loading data from Teradata database into a Hadoop Hive data warehousing layer, and performing data transformations using Hive
  • Wrote custom MapReduce codes, generated JAR files for user defined functions and integrated with Hive to help the analysis team with the statistical analysis.
  • Worked extensively on creating Oozie workflows for scheduling different jobs of hive, map reduce and shell scripts.
  • Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
  • Worked on migrating tables in SQL to Hive using Sqoop.
  • Implemented Kafka/RabbitMQ messaging services to stream large data and insert into database.
  • Developed MapReduce programs to analyze third-party files, i.e., funds sold by a parent company to a subsequent chain of companies.
  • Used Spark SQL to query data from DB2 and Oracle using the respective connectors available.
  • Involved in data ingestion into HDFS using Sqoop and Flume from variety of sources.
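One common pattern behind HBase RowKey design of the kind described above is salting: prefixing a deterministic hash bucket so monotonically increasing keys don't hotspot a single region, and reversing the timestamp so the newest row sorts first. A minimal sketch (the bucket count, key layout and field names are illustrative assumptions, not the actual schema used):

```python
import hashlib

NUM_SALT_BUCKETS = 16  # assumed bucket count, purely for illustration

def salted_rowkey(patient_id: str, timestamp: int) -> bytes:
    """Build a salted HBase-style row key: a deterministic salt prefix
    spreads sequential writes across regions, and the reversed timestamp
    makes the newest row for a given id sort first."""
    salt = int(hashlib.md5(patient_id.encode()).hexdigest(), 16) % NUM_SALT_BUCKETS
    reversed_ts = 2**63 - 1 - timestamp  # newest-first ordering
    return f"{salt:02d}|{patient_id}|{reversed_ts}".encode()

key = salted_rowkey("patient-42", 1_700_000_000)
print(key)
```

The usual trade-off is on the read side: a full scan must fan out one scan per salt bucket and merge the results.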

Environment: Cloudera 5.x Hadoop, Linux, IBM DB2, HDFS, Yarn, Impala, Pig, Hive, Sqoop, Spark, Scala, HBase, MapReduce, Hadoop Datalake, Informatica BDM 10

Confidential, Little Rock, AR

Sr. Hadoop Developer

Responsibilities:

  • Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
  • Gathered requirements from the business and performed requirement analysis on the various data points the business was interested in viewing as part of the output
  • Supported all business areas of ADAC with critical data analysis that helped team members make profitable decisions, serving as a forecast expert and business analyst and using tools for business optimization and analytics.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Involved in design decision to pick appropriate Maps and Reducers to implement the algorithms.
  • Developed data preparation logic to pull daily sales data using Sqoop and transformation through Hive QL and MapReduce Jobs
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Configured Apache Sentry server to provide authentication to various users and provide authorization for accessing services that they were configured to use.
  • Used Pig as ETL tool to do transformations, joins and pre-aggregations before loading data onto HDFS.
  • Involved in designing tables in Hive and developing Hive queries using HQL for batch analysis to monitor Key Performance Indicators.
  • Scheduling batch jobs in Oozie.
  • Implemented JDBC call to Hive Warehouse and batch framework to invoke the MapReduce Jobs on Scheduled basis
  • Reviewing peer table creation in Hive, data loading and queries.
  • Involved in analyzing system failures, identifying root causes and recommended course of actions.
  • Developed Scala programs to perform data scrubbing for unstructured data
  • Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats to text files.
  • Worked with Avro Data Serialization system to work with JSON data formats.
  • Exported the result set from Hive to Oracle DB using Sqoop after processing the data.
  • Assisted in designing, building, and maintaining database to analyze Confidential cycle of claim processing and transactions.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Monitored System health and logs and responded accordingly to any warning or failure conditions through the Cloudera Manager.
  • Job Scheduling using Oozie and tracking progress.
  • Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Wrote MapReduce Programs for different types of input formats like JSON, XML and CSV formats.
  • Extracted data from MySQL into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior
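The MapReduce transformation-and-analysis jobs described above follow the standard map/shuffle/reduce shape. A minimal Hadoop Streaming-style sketch in plain Python (the CSV layout and the per-city aggregation are illustrative assumptions, not the actual job logic):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line: str):
    """Emit (city, 1) per CSV record; assumes city is the 2nd field."""
    fields = line.strip().split(",")
    if len(fields) >= 2:
        yield fields[1], 1

def reducer(pairs):
    """Sum counts per key, as the reduce phase would after the shuffle
    (sorting stands in for the shuffle here)."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(count for _, count in group)

lines = ["1,Austin,flu", "2,Dallas,cold", "3,Austin,flu"]
mapped = [kv for line in lines for kv in mapper(line)]
print(dict(reducer(mapped)))  # {'Austin': 2, 'Dallas': 1}
```

In an actual Hadoop Streaming job, `mapper` and `reducer` would live in separate scripts reading stdin and writing tab-separated key/value pairs to stdout.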

Environment: HortonWorks 2.2 Hadoop, Linux, HDFS, Yarn, Pig, Hive, Sqoop, Spark, Scala, Flume, MapReduce, Oracle DB, Java

Confidential, Mayfield Village, OH

Hadoop Developer

Responsibilities:

  • Used Kafka as the messaging system to collect data sent by the sensor in the cars.
  • Implemented Big Data platforms using Cloudera CDH4 as data storage, retrieval and processing systems.
  • Configured Apache Sentry server to provide authentication to various users and provide authorization for accessing services that they were configured to use.
  • Worked on Kafka while dealing with raw data, by transforming into new Kafka topics for further consumption.
  • Installed and configured Hortonworks Sandbox as part of POC involving Kafka-Storm-HDFS data flow.
  • Wrote Java code to de-serialize data from protocol buffer format to JSON
  • Loaded unstructured and semi-structured data into Hadoop cluster coming from different sources using Flume.
  • Used Sqoop to import data from Oracle to HDFS.
  • Designed HBase RowKey and column family structure
  • Used the HBase Java API to migrate data between the HDFS and HBase
  • Wrote the MapReduce programs to process the data according to requests by BI Teams
  • Wrote Hive custom UDF to analyze data by given schema.
  • Worked with BI teams and developed Pig scripts for ad hoc queries.
  • Implemented Virtualization of data sources using Spark by connecting to DB2, Oracle using Spark connectors.
  • Used Spark SQL to query data from DB2 and Oracle using the respective connectors available.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala
  • Analyzed various RDDs using Scala and Python with Spark.
  • Worked on the conversion of existing MapReduce batch applications to Spark for better performance.
  • Worked on different file formats (ORCFILE, RCFILE, SEQUENCEFILE, TEXTFILE) and different Compression Codecs (GZIP, SNAPPY, LZO)
  • Exported the business required information to RDBMS using Sqoop to make the data available for BI team to generate reports based on data
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie
  • Responsible for troubleshooting Spark/MapReduce jobs by reviewing the log files
  • Wrote Pig scripts and executed them using the Grunt shell.
  • Performed big data analysis using Pig and user-defined functions (UDFs)
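Routing records to Kafka topic partitions, as in the sensor pipeline above, normally hashes the record key modulo the partition count, so all events for one key stay ordered on a single partition. A minimal sketch of that default-partitioner idea (Kafka's real client uses murmur2; CRC32 and the partition count here are purely illustrative):

```python
import zlib

NUM_PARTITIONS = 6  # assumed partition count for the topic

def assign_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a record key to a partition, so all events
    for one car/sensor land on the same partition, preserving order."""
    return zlib.crc32(key) % num_partitions

# All readings for the same sensor go to one partition.
p1 = assign_partition(b"sensor-17")
p2 = assign_partition(b"sensor-17")
print(p1 == p2)  # True
```

Keyed partitioning is what lets a downstream Spark Streaming consumer process each sensor's readings in arrival order without cross-partition coordination.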

Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Kafka, Oracle 11g, Linux, Java 7, Eclipse

Confidential, Bentonville, AR

Hadoop Developer

Responsibilities:

  • Imported large data sets from DB2 into Hive tables using Sqoop.
  • Created Hive Managed and External Tables as per the requirements
  • Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
  • Installed and configured Hive and wrote Hive UDFs in Java and Python
  • Integrated the Hive warehouse with HBase
  • Migrating the needed data from MySQL into HDFS using Sqoop and importing various formats of flat files into HDFS.
  • Defined workflows using Oozie.
  • Used Hive to create partitions on Hive tables and analyzed this data to compute various metrics for reporting.
  • Created Data model for Hive tables
  • Loaded data into HBase tables for the UI web application
  • Designing and developing tables in HBase and storing aggregating data from Hive
  • Developing Hive Scripts for data aggregating and processing as per the Use Case.
  • Writing Java Custom UDF's for processing data in Hive.
  • Developing and maintaining Workflow Scheduling Jobs in Oozie for importing data from RDBMS to Hive.
  • Created Hive Tables as per the requirement and defined external tables with appropriate static and dynamic partitions, intended for efficiency.
  • Implemented Partitioning, Bucketing in Hive for better organization of the data
  • Optimized Hive queries for performance tuning.
  • Worked with the team on fetching live-stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Supported MapReduce programs running on the cluster
  • Monitored system health and logs and responded accordingly to any warning or failure conditions.
  • Managed and reviewed application log files.
  • Ingested application logs into HDFS and processed them using MapReduce jobs
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
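Hive UDFs written in Python, like those mentioned above, are typically wired in through Hive's TRANSFORM clause: Hive streams tab-separated rows to the script's stdin and reads transformed rows back from stdout. A minimal sketch (the two-column layout and the normalization rule are illustrative assumptions):

```python
import sys

def normalize(row: str) -> str:
    """Upper-case the category column of a tab-separated (id, category)
    row, as a Hive TRANSFORM script would do for each input line."""
    record_id, category = row.rstrip("\n").split("\t")
    return f"{record_id}\t{category.strip().upper()}"

if __name__ == "__main__":
    # Hive streams rows on stdin and reads transformed rows from stdout.
    for line in sys.stdin:
        print(normalize(line))
```

In HiveQL this would be invoked along the lines of `ADD FILE normalize.py; SELECT TRANSFORM (id, category) USING 'python normalize.py' AS (id, category) FROM src;`.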

Environment: Hadoop v2.6.0, HDFS, CDH 5.3.x, MapReduce, HBase, Sqoop, Core Java, Hive, Oozie, DB2, Spark Streaming and Apache Kafka

Confidential, Eden Prairie, MN

Java Developer

Responsibilities:

  • Involved in development, testing and maintenance process of the application
  • Used Spring MVC framework to implement the MVC architecture.
  • Developed Stored Procedures, Triggers and Functions in Oracle.
  • Developed spring services, DAO's and performed object relation mappings using Hibernate.
  • Involved in understanding the business processes and defining the requirements.
  • Involved in designing, developing and deploying reports in MS SQL Server environment using SSRS-2008 and SSIS in Business Intelligence Development Studio (BIDS).
  • Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming and loading data into data warehouse.
  • Worked with Cassandra Query Language (CQL) to execute queries on the data persisting in the Cassandra cluster.
  • Developed database objects in SQL Server 2005 and used SQL to interact with the database to troubleshoot issues.
  • Updated and saved the required data in the DB2 database using JDBC, corresponding to actions performed in the struts class.
  • Involved in bug fixing and resolving issues with the QA.
  • Developed SQL scripts to store data validation rules in Oracle database.
  • Build test cases and performed unit testing.
  • Logging done using Log4j.
  • Used CVS for version control.
  • Responsible for development of Business Services.
  • Developed Business Rules for the project using Java.
  • Developed portal screens using JSP, Servlets, and Struts framework.
  • Developed the test plans and involved in testing the application.
  • Implementing the Design Patterns like MVC-2, Front Controller, Composite view and all Struts framework design patterns to improve the performance.
  • Re-engineered the OMT Wholesale Internet Service Engine (WISE) using an “n”-tiered architecture involving technologies such as EJB, CORBA, XML and Java.
  • Involved in Java application testing and maintenance in development and production.
  • Involved in developing the customer form data tables. Maintaining the customer support and customer data from database tables in MySQL database.
  • Involved in mentoring specific projects in application of the new SDLC based on the Agile Unified Process, especially from the project management, requirements and architecture perspectives.
  • Designed and developed Views, Model and Controller components implementing MVC Framework

Environment: Java 1.6, J2EE 1.6, Servlets, JDBC, Spring, Hibernate 3.0, JSTL, JSP 2, JMS, Oracle 10g, Web Services, SOAP, RESTful, Maven, Apache AXIS, SOAP UI, XML 1.0, JAXB 2.1, JAXP, HTML, JavaScript, CSS3, AJAX, JUnit, Eclipse, WebLogic 10.3, SVN, Shell Script

Confidential, Herndon, VA

Java/J2EE Programmer

Responsibilities:

  • Involved in full Confidential cycle including requirements analysis, high level design, detailed design, UMLs, data model design, coding, testing and creation of functional and technical design documentation.
  • Used Spring Framework for MVC architecture with Hibernate to implement DAO code, and used Web Services to interact with other modules and in integration testing.
  • Developed and implemented GUI functionality using JSP, JSTL, Tiles and AJAX.
  • Designed database and involved in developing SQL Scripts.
  • Used SQL navigator as a tool to interact with DB Oracle 10g.
  • Developed portal screens using JSP, Servlets, and Struts framework.
  • Developed the test plans and involved in testing the application.
  • Using RUP and Rational Rose, developed Use Cases, created Class, Sequence and UML diagrams.
  • Application Modeling, developing Class diagrams, Sequence Diagrams, Architecture / Deployment diagrams using IBM Rational Software Modeler and publishing them to web perspective with Java Doc.
  • Participated in design review sessions for development/implementation discussions.
  • Designed & coded Presentation (GUI) JSP’s with Struts tag libraries for Creating Product Service Components (Health Care Codes) using RAD.
  • Developing Test Cases and unit testing using JUnit
  • Extensive use of AJAX and JavaScript for front-end validations, and JavaScript based component development using EXT JS Framework with cross browser support.
  • Appropriate use of Session handling, data Scope levels within the application.
  • Designed and developed DAO layer with Hibernate3.0 standards, to access data from IBM DB2 database through JPA (Java Persistence API) layer creating Object-Relational Mappings and writing PL/SQL procedures and functions
  • Integrating Spring injections for DAOs to achieve Inversion of Control, updating Spring Configurations for managing Java objects using callbacks
  • Application integration with Spring Web Services to fetch data from external Benefits application using SOA architecture, configuring WSDL based on SOAP specifications and marshalling and un-marshalling using JAXB
  • Prepared and executed JUNIT test cases to test the application service layer operations before DAO integration
  • Created test environments with WAS for local testing using test profiles, and interacted with the Software Quality Assurance (SQA) team to report and fix defects using Rational ClearQuest.
  • Implementing the Design Patterns like MVC-2, Front Controller, Composite view and all Struts framework design patterns to improve the performance.
  • Used Clear case, and also subversion for maintaining the source version control.
  • Wrote Ant scripts to automate the builds and installation of modules.
  • Involved in writing Test plans and conducted Unit Tests using JUnit.
  • Used Log4j for logging statements during development.
  • Designed and implemented the log data indexing and search module, optimizing for performance and accuracy.
  • Provided full-text search capability for archived log data using the Apache Lucene library.
  • Involved in the testing and integrating of the program at the module level.
  • Worked with production support team in debugging and fixing various production issues.

Environment: JDK 1.5, JSP, JSP Custom Tag libraries, JavaScript, EXT JS, AJAX, XSLT, XML, DOM4J 1.6, EJB, DHTML, Web Services, SOA, WSDL, SOAP, JAXB, IBM RAD, IBM WebSphere Application server, IBM DB2 8.1, UNIX, UML, IBM Rational ClearCase, JMS, Spring Framework, Hibernate 3.0, PL/SQL, JUNIT 3.8, log4j 1.2, Ant 2.7

Confidential

Java Developer

Responsibilities:

  • Coordinated with business analysts and project managers to analyze newly proposed ideas/requirements, designed the integrated tool, and developed and implemented all the modules.
  • Designed database and involved in developing SQL Scripts.
  • Used Case Studio for developing the DB design and generating SQL files for various databases.
  • Contributed significantly in designing the Object Model for the project as senior developer and Architect.
  • Responsible for development of Business Services.
  • Developed Business Rules for the project using Java.
  • Developed portal screens using JSP, Servlets, and Struts framework.
  • Developed the test plans and involved in testing the application.
  • Implementing the Design Patterns like MVC-2, Front Controller, Composite view and all Struts framework design patterns to improve the performance.
  • Re-engineered the OMT Wholesale Internet Service Engine (WISE) using an “n”-tiered architecture involving technologies such as EJB, XML and Java.
  • Used CVS for maintaining the source version control.
  • Used Log4j for logging statements during development.
  • Designed and implemented the log data indexing and search module, optimizing for performance and accuracy; provided full-text search capability for archived log data using the Apache Lucene library.
  • Used Spring Framework for MVC architecture with Hibernate to implement DAO code, and used Web Services to interact with other modules and in integration testing.
  • Developed and implemented GUI functionality using JSP, JSTL, Tiles and AJAX.
  • Used SQL navigator as a tool to interact with DB Oracle 10g.
  • Involved in writing Test plans and conducted Unit Tests using JUnit.

Environment: Java, J2EE, Struts 1.2/2.0, JDK, JSP, Servlets, EJB 3.0, Java Beans, JavaScript, HTML XML, Eclipse, CORBA SSL, JUnit, Log4j, CVS, Deployment in WebLogic
