
Sr. Hadoop Developer Resume


Dallas, TX

PROFESSIONAL SUMMARY:

  • 8 years of professional Java development experience, including around 4 years of solid understanding and hands-on experience in Big Data development on Hadoop and Spark, covering data ingestion, data cleansing and implementation of complete Hadoop solutions.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Data Node, Name Node and Map-Reduce concepts.
  • Very good understanding of Hadoop ecosystem components like Pig, Hive, HDFS, MapReduce, HBase, Sqoop, Impala, Kafka, Spark, Scala, Oozie, and Zookeeper.
  • Experience in Data Analysis, Data Validation, Data Verification, Data Cleansing, Data Completeness and identifying data mismatch.
  • Experience in installation, configuration, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4, CDH5) distributions and on Amazon Web Services (AWS).
  • Exposure to administrative tasks such as installing Hadoop and its ecosystem components.
  • Hands-on experience with Amazon AWS cloud services: EC2, S3, Data Pipeline and EMR.
  • Experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster.
  • Experience in analyzing data using HiveQL, Pig Latin, Spark (with Python and Scala) and custom MapReduce programs in Java.
  • Worked on performance tuning of Hadoop jobs by applying techniques such as map-side joins, partitioning and bucketing. Good knowledge of NoSQL databases like HBase, MongoDB and Cassandra.
  • Experience in working with MapReduce, Pig scripts and HiveQL, and in extending Hive and Pig core functionality by writing custom UDFs (a hedged sketch of one such UDF follows this list).
  • Imported and exported data using Sqoop between HDFS and relational database systems/mainframes.
  • Successfully loaded files to Hive and HDFS by using Flume and extensively worked on collecting, aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Worked with Cassandra for non-relational data storage and retrieval on enterprise use cases.
  • Excellent understanding of Spark architecture and components such as Spark Streaming, Spark SQL and the SparkR programming paradigm.
  • Extensive experience in using message-oriented middleware (MOM) with ActiveMQ and Apache Kafka, along with Oozie and ZooKeeper.
  • Experience in cluster automation using Puppet and shell scripting.
  • Wrote Apache Spark Streaming applications on Big Data distributions in an active cluster environment.
  • Expertise in n-tier and three-tier Client/Server development architecture and Distributed Computing Architecture.
  • Profound knowledge of data warehousing (DW) principles using fact tables, dimension tables, star schema and snowflake schema modeling.
  • Worked on HDFS commands, Impala, Hive, Sqoop and Pig, and developed applications using Scala and Spark SQL programming.
  • Solid experience in developing workflow using Oozie for running Map Reduce jobs and Hive Queries.
  • Involved in migrating MongoDB version 2.4 to 2.6 and implementing new security features.
  • Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
  • Experience in writing and implementing CQL queries against Cassandra through the CQL shell for analytics.
  • Developed scripts for generating serialization table using Cassandra and invoked them using CQLSH.
  • Experience in analyzing and deploying middleware services in a WebLogic container.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Design and development of web-based applications using different web and application servers such as Apache Tomcat, WebSphere, JBoss and WebLogic.
  • Involved in data modeling and sharding and replication strategies in MongoDB.
  • Designed and developed business components using Spring Core, with navigation from the presentation layer using Spring MVC.
  • Developed applications using Java/J2EE technologies such as Servlets, JSP, EJB, JDBC, JNDI and JMS.
  • Proficient in programming with Java/J2EE and strong experience in technologies such as JSP, Servlets, Struts, Spring, Hibernate, EJBs, Session Beans, JDBC, JavaScript, HTML, JavaScript libraries and Web Services.
  • Efficient in packaging and deploying J2EE applications using ANT, Maven and CruiseControl on WebLogic, WebSphere and JBoss. Worked on performance and load test tools like JProfiler and JMeter.
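
For illustration of the custom Hive UDF work mentioned above (see the bullet on extending Hive and Pig core functionality), below is a minimal sketch in Java; the class name and normalization logic are hypothetical placeholders rather than project-specific code.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical example of a simple Hive UDF: normalizes free-text codes so they
    // can be joined cleanly against reference tables. Hive calls evaluate() per row.
    public final class NormalizeCode extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;                 // pass NULLs through, like built-in UDFs
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Once packaged into a JAR, a UDF of this form would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.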

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Spark SQL, HBase, Apache Crunch, Solr, Flume, Sqoop, Spark, Spark Streaming, Scala, Oozie, ZooKeeper, Hue, Kafka, Avro and JSON

Web Technologies: Core Java, J2EE, Servlets, JSP, JDBC, XML, AJAX, SOAP, WSDL

Methodologies: Agile, Waterfall Model, UML

Frameworks: Hibernate, Spring, Apache Maven, Struts, MVC, MAPR

Programming Languages: C, C++, Java, Python, Scala, Unix Shell Scripting

Databases: MySQL, Oracle, SQL Server, DB2

NoSQL: HBase, Cassandra, MongoDB

Web/Application Servers: WebLogic, Apache Tomcat, WebSphere

Monitoring & Reporting Tools: Nagios, Ganglia, Cloudera Manager

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Sr. Hadoop Developer

Responsibilities:

  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW (a minimal mapper sketch follows this list).
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
  • Built wrapper shell scripts to hold this Oozie workflow.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Involved in creating Hadoop streaming jobs using Python.
  • Provided ad-hoc queries and data metrics to the Business Users using Hive, Pig.
  • Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Worked on MapReduce joins to query multiple semi-structured datasets as per analytic needs.
  • Used Sqoop to import data from the MySQL relational database into HDFS for processing and to export data back to the RDBMS.
  • Created many Java UDFs and UDAFs in Hive for functions that were not pre-existing in Hive, such as rank and cumulative sum.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Developed POC for Apache Kafka.
  • Gained knowledge on building Apache Spark applications using Scala.
  • Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Configured Spark Streaming to receive real time data from the Kafka and store the stream data to HDFS.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Stored and loaded data between HDFS and Amazon S3, and backed up namespace data to NFS filers.
  • Created concurrent access for hive tables with shared and exclusive locking that can be enabled in hive with the help of Zookeeper implementation in the cluster.
  • Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Extracted files from NoSQL databases such as CouchDB and Cassandra using Sqoop.
  • Wrote shell scripts to automate rolling day-to-day processes.
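
As a rough illustration of the MapReduce parsing work noted in the first bullet of this list, here is a minimal mapper sketch; the pipe delimiter, field positions and counter names are assumptions rather than details of the actual feeds.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: parses pipe-delimited raw feed lines and emits cleaned,
    // tab-delimited records suitable for loading into partitioned staging tables.
    public class RawFeedParserMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|", -1);
            if (fields.length < 3 || fields[0].trim().isEmpty()) {
                context.getCounter("parse", "bad_records").increment(1);
                return;                                   // skip malformed rows
            }
            String cleaned = fields[0].trim() + "\t" + fields[1].trim() + "\t" + fields[2].trim();
            context.write(NullWritable.get(), new Text(cleaned));
        }
    }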

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, MongoDB, Cassandra, CouchDB, Oracle 11, NoSQL, Java/J2EE, JDBC, Agile, SVN, Git, Eclipse, Unix/Linux, Spark, Kafka, Amazon Web Services.

Confidential, Austin, TX

Sr. Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive and Map Reduce.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting; POC work using Spark and Kafka for real-time processing was in progress.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables, and worked with Storm and Kafka.
  • Worked with Kafka on a proof of concept for carrying out log processing on a distributed system (a minimal producer sketch follows this list).
  • Involved in loading data from the UNIX file system to HDFS; installed and configured Hive, wrote Hive UDFs, and used ZooKeeper for cluster coordination services.
  • Ingested data using Sqoop and HDFS put/copyFromLocal commands, and ran MapReduce jobs.
  • Developed Hadoop streaming MapReduce jobs using Python.
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
  • Extracted files from NoSQL databases such as CouchDB and Cassandra using Sqoop.
  • Planned, deployed, monitored and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required by the environment.
  • Performed AWS data migration between different database platforms, such as SQL Server to Amazon Aurora, using RDS tooling.
  • Developed Sqoop scripts to move data between HDFS and RDBMS (Oracle, MySQL).
  • Supported MapReduce programs running on the cluster.
  • Gained experience in managing and reviewing Hadoop log files.
  • Involved in scheduling the Oozie workflow engine to run multiple Pig jobs.
  • Experienced in running Hadoop streaming jobs to process terabytes of data in XML format.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Responsible for managing data coming from different sources.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Responsible for developing data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
  • Administered Tableau Server, backing up reports and providing privileges to users.
  • Presented the retrieved results through Tableau.
  • Used Eclipse and Ant to build the application, with Cassandra as the NoSQL database.
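
A minimal sketch of the kind of producer used in the Kafka log-processing proof of concept mentioned above; the broker addresses, topic name and sample log line are placeholders, not values from the actual environment.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Hypothetical POC producer: ships application log lines to a Kafka topic so that
    // downstream consumers can process them on the cluster.
    public class LogEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092");   // assumed brokers
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                String logLine = "2016-03-01 10:15:32 WARN checkout latency=2300ms"; // sample
                producer.send(new ProducerRecord<>("app-logs", "checkout", logLine));
            }
        }
    }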

Environment: Hadoop, HDFS, Pig, Hive, Map Reduce, Sqoop, Spark, AWS EC2, S3, RDS, Kafka, Solr, LINUX, Cloudera, Big Data, Java APIs, Java collection, Python, SQL, NoSQL, Cassandra, Tableau, HBase.

Confidential, Bronx, NY

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in gathering requirements and participating in Agile planning meetings in order to finalize the scope of each development.
  • Involved in migrating ETL processes from Oracle to Hive to enable easier data manipulation.
  • Designed, implemented and administered Linux web and MySQL database servers.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Formulated procedures for installation of Hadoop patches, updates and version upgrades.
  • Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals and testing HDFS and Hive access.
  • Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms (a minimal driver configuration sketch follows this list).
  • Responsible for migrating tables from traditional RDBMS into Hive tables using Sqoop and later generate required visualizations and dashboards using Tableau.
  • Responsible for loading, aggregating and moving large amounts of log data using Flume.
  • Developed Sqoop scripts to move data between HDFS and RDBMS (Oracle, MySQL).
  • Involved in loading data from UNIX file system to HDFS.
  • Worked on loading and transforming large datasets of structured, semi-structured and unstructured data into the Hadoop ecosystem.
  • Extracted files from NoSQL databases such as CouchDB and Cassandra using Sqoop.
  • Responsible to manage data coming from different data sources.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Used Hive and created Hive tables and involved in data loading and writing custom Hive UDF's.
  • Created Partitions, Dynamic Partitions and Buckets for granularity and optimization using HiveQL.
  • Involved in identifying job dependencies to design workflow for Oozie and resource management for YARN.
  • Used Cassandra to store the analyzed and processed data for scalability.
  • Responsible for maintaining and implementing code versions using CVS for the entire project.
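
As an illustration of the compression-based HDFS optimizations mentioned above, below is a minimal job-driver sketch; the job name, codec choice and output-path handling are assumptions rather than the project's actual configuration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Hypothetical driver fragment: compresses intermediate map output (to cut shuffle
    // traffic) and the final job output (to save HDFS space).
    public class CompressedJobDriver {
        public static Job configure(Configuration conf, Path outputPath) throws Exception {
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.setClass("mapreduce.map.output.compress.codec",
                    SnappyCodec.class, CompressionCodec.class);

            Job job = Job.getInstance(conf, "compressed-etl");
            FileOutputFormat.setOutputPath(job, outputPath);
            FileOutputFormat.setCompressOutput(job, true);                // final output
            FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
            return job;
        }
    }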

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Impala, Cassandra, Cloudera Manager, ETL, Sqoop, Flume, Oozie, ZooKeeper, Java (jdk 1.6), MySQL, Eclipse, Tableau.

Confidential, Windsor, CT

Java/ Hadoop Developer

Responsibilities:

  • Developed an end-to-end vertical slice for a JEE-based application using popular frameworks Spring, Hibernate, JSF, Facelets, XHTML, Maven and AJAX, applying OO design concepts, JEE and GoF design patterns, hosted on the AWS cloud.
  • Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i database.
  • Tuned SQL statements, Hibernate mappings and the WebSphere application server to improve performance and consequently meet the SLAs.
  • Worked with Struts, Tiles and AJAX for developing the application.
  • Developed the application based on MVC architecture, integrating JSP with the Hibernate and Struts frameworks.
  • Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
  • Developed Web Services for data transfer from client to server and vice versa using Apache Axis, SOAP, WSDL, and UDDI.
  • Designed the user interface using Java Server Pages (JSP), Cascading Style Sheets (CSS) and XML.
  • Collected business requirements and wrote functional specifications and detailed design documents.
  • Implemented various J2EE Design patterns like Singleton, Service Locator, Business Delegate, DAO, Transfer Object, and SOA.
  • Detected and fixed transactional issues due to wrong exception handling and concurrency issues because of unsynchronized block of code.
  • Worked on AJAX to develop an interactive Web Application and JavaScript for Data Validations.
  • Used Subversion to implement version control System.
  • Build ANT Script for the application and used Log4J for debugging.
  • Used the Struts MVC framework for application design.
  • Assisted in designing, building, and maintaining database to analyze life cycle of checking and debit transactions.
  • Implemented a custom file loader for Pig so that we could query directly on large data files such as build logs (a hedged loader sketch follows this list).
  • Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top of them.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Used WebSphere to develop JAX-RPC web services.
  • Developed Unit Test Cases, and used JUNIT for Unit Testing of the application.
  • Involved in the design team for designing the Java Process Flow architecture.
  • Worked with QA, Business and Architects to resolve various defects and meet deadlines.
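
A hedged sketch of the custom Pig loader mentioned in this list; the class name and the pipe-delimited build-log format are assumptions, and the actual loader likely handled more fields and error cases.

    import java.io.IOException;
    import java.util.Arrays;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.pig.LoadFunc;
    import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    // Hypothetical custom loader: turns pipe-delimited build-log lines into Pig tuples
    // so the logs can be queried directly with Pig Latin.
    public class BuildLogLoader extends LoadFunc {
        private RecordReader<?, ?> reader;
        private final TupleFactory tupleFactory = TupleFactory.getInstance();

        @Override
        public void setLocation(String location, Job job) throws IOException {
            FileInputFormat.setInputPaths(job, location);
        }

        @Override
        public InputFormat getInputFormat() throws IOException {
            return new TextInputFormat();              // read plain text splits
        }

        @Override
        public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
            this.reader = reader;
        }

        @Override
        public Tuple getNext() throws IOException {
            try {
                if (!reader.nextKeyValue()) {
                    return null;                        // end of this split
                }
                String line = ((Text) reader.getCurrentValue()).toString();
                return tupleFactory.newTuple(Arrays.asList(line.split("\\|", -1)));
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
        }
    }

In Pig Latin such a loader would be used along the lines of: logs = LOAD '/data/buildlogs' USING BuildLogLoader();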

Environment: Java J2EE, JNDI, Spring, Hibernate, Struts, MVC, AJAX, WebSphere, Maven, JavaScript, JUnit, XHTML, HTML, XML, CSS, Subversion, Web Services, DB2, SQL, UML, Oracle, Eclipse, Windows, Hadoop, HDFS, Hive, Pig, MapReduce.

Confidential

Java J2EE Developer

Responsibilities:

  • Involved in the analysis, design, development and testing phases of the Software Development Life Cycle (SDLC).
  • Developed test cases and performed unit testing using JUnit.
  • Designed and developed framework components, involved in designing MVC pattern using Struts and Spring framework.
  • Used Ant build scripts to create EAR files and deployed the application on the WebLogic app server.
  • Coded HTML pages using CSS for static content generation with JavaScript for validations.
  • Implemented Java Naming/Directory Interface (JNDI) to support transparent access to distributed components, directories and services.
  • Used the JDBC API to connect to the database and carry out database operations (a minimal sketch follows this list). Coded Java Server Pages for dynamic front-end content that uses Servlets and EJBs.
  • Designed UI screens using JSP and HTML.
  • Used JSP and JSTL Tag Libraries for developing User Interface components.
  • Used JUnit framework for unit testing and ANT to build and deploy the application on WebLogic Server.
  • Used SOAP for exchanging XML based messages.
  • Responsible for developing Use case, Class diagrams and Sequence diagrams for the modules using UML and Rational Rose.
  • Actively involved in designing and implementing Factory method, Singleton, MVC and Data Access Object design patterns.
  • Used web services for sending and receiving data between different applications via SOAP messages, and a DOM XML parser for data retrieval.
  • Developed Custom Tags to simplify the JSP code.
  • Developed the user interface using JSP and Java Script to view all online trading transactions.
  • Used application servers such as WebSphere and Tomcat.
  • Developed the Action Classes, Action Form Classes, created JSPs using Struts tag libraries and configured in Struts-config.xml, Web.xml files.
  • Involved in deploying and configuring applications on WebLogic Server.
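
A minimal sketch of the JDBC usage mentioned above; the connection URL, credentials, table and column names are placeholders for illustration only.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Hypothetical DAO-style lookup using plain JDBC with a parameterized query.
    public class AccountDao {
        private static final String URL = "jdbc:oracle:thin:@dbhost:1521:ORCL"; // placeholder

        public String findAccountName(long accountId) throws SQLException {
            try (Connection con = DriverManager.getConnection(URL, "appuser", "secret");
                 PreparedStatement ps = con.prepareStatement(
                         "SELECT name FROM accounts WHERE account_id = ?")) {
                ps.setLong(1, accountId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString("name") : null;
                }
            }
        }
    }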

Environment: Java J2EE, JNDI, Spring, Struts, MVC, WebSphere, WebLogic, SOAP, JDBC, JSP, JSTL, JavaScript, JUnit, XML, UML, ANT, Oracle, VSAM, HTML, CSS, Eclipse.

Confidential

Java Developer

Responsibilities:

  • Involved in various phases of the Software Development Life Cycle (SDLC) such as design, development and unit testing.
  • Developed and deployed UI layer logics of sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
  • CSS and JavaScript were used to build rich internet pages.
  • Agile Scrum methodology was followed for the development process.
  • Designed different design specifications for application development that includes front-end, back-end using design patterns.
  • Developed prototype test screens in HTML and JavaScript.
  • Involved in developing JSPs for client data presentation and data validation on the client side within the forms.
  • Developed the application using the Spring MVC framework (a minimal controller sketch follows this list).
  • Used the Collections framework to transfer objects between the different layers of the application.
  • Developed data mappings to create a communication bridge between various application interfaces using XML and XSL.
  • Used Spring IoC to inject values for dynamic parameters.
  • Developed JUnit tests for unit-level testing.
  • Actively involved in code review and bug fixing for improving the performance.
  • Documented application for its functionality and its enhanced features.
  • Created connection through JDBC and used JDBC statements to call stored procedures.
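
A minimal sketch of a Spring MVC handler along the lines of the bullet above; the controller name, URL mapping and view name are hypothetical and only illustrate the request-to-JSP flow.

    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestParam;

    // Hypothetical controller: maps a request, populates the model and hands off to a
    // JSP view resolved by the configured ViewResolver.
    @Controller
    public class TransactionController {

        @RequestMapping("/transactions")
        public String listTransactions(@RequestParam("accountId") String accountId, Model model) {
            model.addAttribute("accountId", accountId);   // data rendered by the JSP
            return "transactionList";                     // logical view name -> JSP
        }
    }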

Environment: Java, Spring MVC, Oracle 11g, J2EE, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, SQL Server 2008.
