Sr. Hadoop Developer Resume

Sterling, VA

PROFESSIONAL SUMMARY:

  • 8+ years of professional IT experience, including experience in the Hadoop Big Data ecosystem and Java/J2EE related technologies.
  • Hands-on experience in using Hadoop ecosystem components like Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and YARN.
  • Hands-on experience in using various Hadoop distributions (Cloudera, Hortonworks).
  • Good understanding of the architecture and components of Spark, and proficient in working with Spark Core, Spark SQL and Spark Streaming.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Good experience working with both batch and real-time processing using the Spark framework.
  • Implemented Spark applications in Scala and Java, using DataFrames and the Spark SQL API for faster processing of data.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Experience in implementing advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data in HDFS using Scala.
  • Experience in configuring, designing, implementing and monitoring Kafka clusters and connectors.
  • Good knowledge of Kafka, including handling thousands of megabytes of reads and writes per second on streaming data.
  • Experience in loading streaming data into HDFS and in performing streaming analytics using stream processing platforms like Flume and the Apache Kafka messaging system.
  • Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
  • Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
  • Involved in designing the Cassandra data model; used CQL (Cassandra Query Language) to perform CRUD operations on Cassandra.
  • Good knowledge in querying data from Cassandra for searching, grouping and sorting.
  • Good knowledge in performance troubleshooting and tuning of Cassandra clusters, and understanding of Cassandra data modeling based on applications.
  • Experience in using MongoDB for storing large data objects, real-time analytics, logging and full-text search.
  • Worked on HBase to load and retrieve data for real-time processing using the REST API.
  • Experienced in designing different time-driven and data-driven automated workflows using Oozie.
  • Developed ETL workflows, scheduled with Oozie, to process data obtained using Python in HDFS and HBase.
  • Experience in configuring ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
  • Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as the storage mechanism.
  • Capable of using AWS utilities such as EMR, S3 and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
  • Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
  • Experience in running queries using Impala and using BI tools to run ad-hoc queries directly on Hadoop.
  • Experienced in developing UDFs for Pig and Hive using Java.
  • Very good experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Good experience working with different file formats like text, JSON, Avro and ORC for Hive querying and processing.
  • Good knowledge in using Apache NiFi to automate the data movement between different Hadoop systems.
  • Experienced in using Pig scripts to do transformations, event joins, filters and some pre-aggregations before storing the data onto HDFS.
  • Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
  • Experience with Informatica ETL for data movement, applying data transformations and data loads.
  • Good experience with Talend Open Studio for designing ETL jobs for processing of data.
  • Good Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
  • Experience in relational databases like Oracle, MySQL and SQL Server.
  • Very good experience in the complete project life cycle (design, development, testing and implementation) of client-server and web applications.
  • Experience in using various IDEs (Eclipse, NetBeans) and repositories (SVN, Git).
  • Experience in using build tools such as Ant and Maven.
  • Experience in Java GUI development using JFC, Swing, JavaBeans and AWT.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
  • Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
  • Ability to adapt to evolving technology, with a keen sense of responsibility and accomplishment.

TECHNICAL SKILLS:

Platforms: Windows XP/7/8/10, Ubuntu, Windows Server R2 2008 & 2012, UNIX, Red Hat Linux

Big Data Ecosystems: HDFS, Sqoop, Hive, Pig, MapReduce, Apache Spark, Oozie, HBase, ZooKeeper, AWS, Cassandra, MongoDB, Kafka

Languages: Java/J2EE, Python, Scala, HTML, XML

Scripting Languages: JSP & Servlets, JavaScript and Shell

Databases: MSSQL, MySQL, Oracle

IDE Tools: Eclipse, NetBeans

Application Servers: Apache Tomcat, WebSphere

Methodologies: Agile, UML, Design Patterns

PROFESSIONAL EXPERIENCE:

Confidential, Sterling, VA

Sr. Hadoop Developer

Responsibilities:

  • Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
  • Worked on troubleshooting Spark applications to make them more fault tolerant.
  • Experienced in Scala programming as part of Spark batch and streaming data pipeline development.
  • Used Spark DataFrame operations to perform required validations on the data and to perform analytics on the Hive data.
  • Developed Scala scripts and UDAFs with DataFrames in Spark for data aggregation queries.
  • Used Spark SQL with Python to create DataFrames and performed transformations on them, such as manually adding a schema, casting and joining DataFrames before storing them.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
  • Experience in creating Kafka producers and Kafka consumers for Spark Streaming.
  • Performed Data Ingestion from multiple internal clients using Apache Kafka.
  • Used DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
  • Tested cluster performance using the cassandra-stress tool to measure and improve reads and writes.
  • Used Apache Kafka to gather log data and feed it into HDFS.
  • Implemented a Python script to call the Cassandra REST API, performed transformations and loaded the data into Hive.
  • Worked with Amazon Web Services (AWS) using EC2 for hosting and Elastic MapReduce (EMR) for data processing, with S3 as the storage mechanism.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Responsible for continuous monitoring and management of the Elastic MapReduce cluster through the AWS console.
  • Used ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
  • Involved in developing a pipeline to load the data into tables using Spark Streaming and Kafka, which is integrated with ZooKeeper.
  • Created various kinds of reports using Tableau based on the client's needs.
  • Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Started using Apache NiFi to copy data from the local file system to HDP.
  • Experience in tracking data flow in real time using NiFi.
  • Tested Apache Tez, an extensible framework for building high performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Used Apache Tez for highly optimized data processing.
  • Used the ELK stack (Elasticsearch, Logstash and Kibana) to implement name-search patterns for a customer.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Responsible for developing Pig Latin scripts enabling the extraction of data from the web server output files to load into HDFS.
  • Used Pig in three distinct workloads: pipelines, iterative processing and research.
  • Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
  • Used Hive to do transformations, joins, filters and some pre-aggregations after storing the data in HDFS.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Worked on partitioning Hive tables and running the scripts in parallel to reduce the run time of the scripts.
  • Developed Sqoop scripts for importing and exporting data into HDFS and Hive.
  • Worked in Agile methodology and used JIRA to maintain the user stories for the project.
  • Experience in using version control tools like GitHub to share code snippets among the team members.

Environment: Hadoop, MapReduce, HDFS, AWS, Hive, Java, Eclipse, Hortonworks, Apache Kafka, Pig, Cassandra, Scala, NiFi, Spark, Tableau, ELK, Spark Streaming, Sqoop, Agile, Python, Apache Tez

Confidential, Santa Clara, CA

Hadoop Developer

Responsibilities:

  • Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
  • Used Spark RDDs for faster data sharing.
  • Experienced in querying data using Spark SQL on top of the Spark engine for faster data set processing.
  • Worked on ad hoc queries, indexing, replication, load balancing and aggregation in MongoDB.
  • Extracted and restructured the data into MongoDB using the import and export command-line utilities.
  • Worked on the large-scale Hadoop YARN cluster for distributed data processing and analysis using Hive and MongoDB.
  • Wrote XML scripts to build Oozie functionality.
  • Experience with the Oozie workflow scheduler to manage and schedule jobs on the Hadoop cluster for generating reports on a daily and weekly basis.
  • Used Flume to collect, aggregate and store web log data from various sources like web servers, mobile and network devices, and pushed it to HDFS.
  • Implemented a custom serializer, interceptor, source and sink in Flume to ingest data from multiple sources.
  • Involved in writing queries using Impala for better and faster processing of data.
  • Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on data.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud.
  • Worked on partitioning the Hive table and running the scripts in parallel to reduce the run time of the scripts.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
  • Programmed Pig scripts with complex joins, such as replicated and skewed joins, to achieve better performance.
  • Developed a data pipeline using Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Designed and created ETL jobs through Talend to load huge volumes of data into MongoDB, the Hadoop ecosystem and relational databases.
  • Created Talend jobs to connect to Quality Stage using an FTP connection and process data received from Quality Stage.
  • Migrated data from MySQL server to Hadoop using Sqoop for processing.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Experienced in developing Shell scripts and Python scripts for system management.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.

Environment: CDH 3.x and 4.x, Java, Hadoop, Python, MapReduce, Hive, Pig, Impala, Flume, MongoDB, Sqoop, Talend, Spark, MySQL, AWS.

Confidential - Birmingham, AL

Hadoop Developer

Responsibilities:

  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Developed shell scripts to automate loading of tables from database and performed the incremental loads using Sqoop.
  • Managed and reviewed the Hadoop log files.
  • Extracted files from HBase through Sqoop, placed them in HDFS and processed them.
  • Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase.
  • Supported MapReduce programs running on the cluster.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Wrote MapReduce jobs using the Java API.
  • Used Pig to convert fixed-width files to delimited files.
  • Extensively used Pig for data cleansing.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF).
  • Transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive and Pig scripts.
  • Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
  • Implemented testing scripts to support test-driven development and continuous integration.

Environment: Hadoop, MapReduce, HDFS, Hive, Java JDK 1.7, Pig, Linux, XML, HBase, MySQL, MySQL Workbench.

Confidential - Baton Rouge, LA

Java/Hadoop Developer

Responsibilities:

  • Involved in the installation and configuration of JDK, Hadoop, Pig, Sqoop, Hive and HBase in a Linux environment. Assisted with performance tuning and monitoring.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Worked on creating MapReduce programs to parse the data for claim report generation and running the JARs in Hadoop. Coordinated with the Java team in creating MapReduce programs.
  • Worked on creating Pig scripts for most modules to provide a comparative effort estimate for code development.
  • Collaborated with BI teams to ensure data quality and availability with live visualization.
  • Created Hive queries to process large sets of structured, semi-structured and unstructured data and store them in managed and external tables.
  • Created HBase tables to store variable data formats coming from different portfolios. Performed real-time analytics on HBase using the Java API and REST API.
  • Performed test run of the module components to understand the productivity.
  • Wrote a Java program to retrieve data from HDFS and provide REST services.
  • Shared responsibility for, and assisted with, administration of Hadoop, Hive, Sqoop, HBase and Pig within the team.
  • Shared the knowledge of Hadoop concepts with team members.
  • Used JUnit for unit testing and Continuum for integration testing.

Environment: Cloudera, Hadoop, Pig, Sqoop, Hive, HBase, Java, Eclipse, MySQL, MapReduce

Confidential

Java Developer

Responsibilities:

  • Involved in various phases of Software Development Life Cycle, such as requirements gathering, modeling, analysis, design and development.
  • Designed and implemented the application using JSP, Spring MVC, Spring IoC, Spring Annotations, Spring Batch, Hibernate, Oracle and WebLogic server.
  • Implemented and designed user interface for web-based customer application.
  • Designed business applications using web technologies like HTML, XHTML, and CSS based on the W3C standards.
  • Involved in writing application level code to interact with APIs, Web Services using AJAX, JSON and XML.
  • Created SQL queries, sequences and views for the backend Oracle database.
  • Used the Hibernate framework for object-relational mapping and persistence.
  • Wrote SQL statements of varied complexity for retrieving and updating data.
  • Involved in writing JUnit test cases and suites using the Eclipse IDE.
  • Used Maven for build and deployment purposes.
  • Provided technical support to the client, understanding the issues and providing quick solutions.
  • Implemented clustering of Oracle and WebLogic server to achieve High availability and Load balancing.
  • Used the Log4j package for debug, info and error tracing.
  • Extensively used Bugzilla as an issue tracking and bug-reporting tool.

Environment: Java, Spring, SOAP/REST web services, JUnit, SVN, Maven, JavaScript, jQuery, AngularJS, HTML, CSS, AJAX, Oracle, Agile, Scrum

Confidential

Java Developer

Responsibilities:

  • Documented functional and technical requirements, wrote Technical Design Documents.
  • Developed analysis level documentation such as Use Case, Business Domain Model, Activity & Sequence and Class Diagrams.
  • Developed presentation layer components comprising JSP, AJAX, Servlets and JavaBeans using the Struts framework.
  • Implemented MVC (Model View Controller) architecture.
  • Developed XML configuration and data description using Hibernate.
  • Developed Web services using CXF to interact with Mainframe applications.
  • Responsible for the deployment of the application in the development environment using BEA WebLogic 9.0 application server.
  • Participated in the configuration of BEA WebLogic application server.
  • Designed and developed front end user interface using HTML and Java Server Pages (JSP) for customer profile setup.
  • Developed Ant scripts to compile the Java files and to build the JARs and WARs.
  • Responsible for analysis, coding, unit testing and production support.
  • Used JUnit for testing Modules.

Environment: Java 1.6, J2EE, JDBC, Struts Framework, Hibernate, Servlets, MVC, JSP, Web Services, CXF, SOAP, BEA WebLogic 9, Oracle 9i, JavaScript, XML, HTML, Ant, JUnit, SVN, MyEclipse
