
Hadoop Developer Resume


Framingham, MA

SUMMARY:

  • 8 years of experience in developing, implementing, testing, and maintaining web applications using Java, including 4 years of experience across all phases of Hadoop and Big Data development
  • Experience in Software Development Life Cycle (SDLC) methodologies such as Agile, Scrum, and Waterfall
  • Experience in using Hive to analyze partitioned and bucketed data and compute various metrics for reporting
  • Experience in using Pig as an ETL tool to perform testing on transformations, event joins, and some pre-aggregations before storing the data in HDFS
  • Good experience with MapReduce (MR), Hive, Pig, HBase, Sqoop, Spark, and Scala for data extraction, processing, storage, and analysis
  • Experience writing HiveQL queries and Pig Latin scripts for ETL
  • Expertise in processing and analyzing archived and real-time data using Spark Core, Spark SQL, and Spark Streaming (see the Spark SQL sketch after this list)
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics workloads
  • Expertise in data development on the Hortonworks HDP platform and Hadoop ecosystem tools such as HDFS, Spark, Zeppelin, Hive, HBase, Sqoop, Flume, Atlas, Solr, Pig, Falcon, Oozie, Hue, Tez, Apache NiFi, and Kafka
  • Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX; developed core modules in large cross-platform applications using Java, JSP, Servlets, JDBC, JavaScript, XML, and HTML
  • Experience with multiple Hadoop distributions such as Cloudera, Hortonworks and AWS
  • Experience with VMWare, VirtualBox, Docker and Vagrant
  • Experience with Java SE 8 and Java EE frameworks such as Spring and Spring MVC 4.0
  • In-depth understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Spark MLlib
  • Hands on experience in application development using Java, RDBMS, and UNIX shell scripting
  • Experience working with BI teams to translate big data requirements into Hadoop-centric solutions
  • Hands-on experience on development tools like Eclipse, IntelliJ, RAD, MyEclipse
  • Good knowledge of Hadoop architecture and its components, such as YARN, HDFS, NodeManager, ResourceManager, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts
  • Experienced with scripting languages such as Python and shell scripts
  • Good data warehouse experience with MS SQL
  • Solid SQL skills; able to write complex SQL queries, functions, triggers, and stored procedures for backend, database, and end-to-end testing
  • Good experience in Linux, UNIX, Windows, and macOS environments
  • Experience working with both small and large groups, meeting new technical challenges, and finding solutions that meet customer needs
  • Strong understanding of Agile and Waterfall SDLC methodologies
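
As a brief illustration of the Spark SQL work summarized above, the following is a minimal PySpark sketch that reads a partitioned Hive table and computes a simple reporting metric. The table and column names (sales, order_date, region, amount) are hypothetical placeholders, and a Spark installation with Hive support enabled is assumed.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Assumes the cluster's Spark build is configured with Hive support.
    spark = (SparkSession.builder
             .appName("partitioned-metrics")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical Hive table "sales", partitioned by order_date; the filter
    # on the partition column lets Spark prune partitions.
    daily_totals = (spark.table("sales")
                    .where(F.col("order_date") >= "2018-01-01")
                    .groupBy("order_date", "region")
                    .agg(F.sum("amount").alias("total_amount"),
                         F.count("*").alias("order_count")))

    daily_totals.show()
    spark.stop()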

PROFESSIONAL EXPERIENCE:

Confidential, Framingham, MA

Hadoop Developer

Responsibilities:

  • Worked on Hadoop ecosystem components including Hive, Sqoop, and Kafka on the Hortonworks Data Platform (HDP)
  • Involved in the end-to-end lifecycle of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, and shell scripts (for job scheduling)
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded the data into HDFS
  • Extracted files from CouchDB, placed them into HDFS using Sqoop, and pre-processed the data for analysis
  • Handled importing of data from various data sources like MySQL, Oracle, DB2
  • Exported the result set from Hive to MySQL using Sqoop after processing the data
  • Participated in managing and reviewing Hadoop log files
  • Developed Python code for testing and QA of ETL jobs
  • Unit tested and tuned SQL and ETL code for better performance
  • Developed Python scripts to find vulnerabilities in SQL queries through SQL injection testing
  • Used Python for pattern matching in build logs to format warnings and errors (see the log-scan sketch after this list)
  • Ran multiple MapReduce jobs through Pig and Hive for data cleaning and pre-processing
  • Involved in HDFS maintenance and loading of structured and unstructured data
  • Created custom Python/shell scripts to import data from Oracle databases via Sqoop (see the Sqoop sketch after this list)
  • Used Python scripts to update content in the database and manipulate files
  • Utilized standard and third-party Python modules such as csv, ConfigParser, ibm_db, and cx_Oracle
  • Developed test plans, test scripts, and test procedures from the specification document in Python and automated them to run in the real-time environment
  • Developed and designed an automation framework using Python and shell scripting
  • Developed the project in a Linux environment
  • Knowledge of JSON and simplejson-based web services
  • Utilized the Agile Scrum methodology to coordinate with developers, held regular code review sessions, and set up a high-availability cluster to integrate Hive with existing applications
  • Imported data from and exported the analyzed data to relational databases using Sqoop
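
The build-log pattern matching mentioned above (formatting warnings and errors with Python) can be sketched as follows; the log file name and the warning/error message formats are hypothetical placeholders.

    import re

    # Hypothetical patterns for warning and error lines in a build log.
    WARN_RE = re.compile(r"\b(warning|warn)\b[:\s]+(.*)", re.IGNORECASE)
    ERROR_RE = re.compile(r"\b(error|fatal)\b[:\s]+(.*)", re.IGNORECASE)

    def summarize_build_log(path):
        """Scan a build log and print formatted warning/error lines."""
        warnings, errors = [], []
        with open(path) as log:
            for lineno, line in enumerate(log, start=1):
                if ERROR_RE.search(line):
                    errors.append((lineno, line.strip()))
                elif WARN_RE.search(line):
                    warnings.append((lineno, line.strip()))
        for lineno, msg in errors:
            print("ERROR line %5d: %s" % (lineno, msg))
        for lineno, msg in warnings:
            print("WARN  line %5d: %s" % (lineno, msg))
        return len(errors), len(warnings)

    if __name__ == "__main__":
        summarize_build_log("build.log")  # hypothetical log file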
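
The custom Python/Sqoop import scripts followed a pattern along these lines. This is a minimal sketch, assuming the sqoop client is installed on the node running the script; the JDBC URL, credentials file, table name, and HDFS target directory are hypothetical placeholders.

    import subprocess

    def sqoop_import(table, target_dir):
        """Build and run a Sqoop import from Oracle into HDFS (sketch)."""
        cmd = [
            "sqoop", "import",
            "--connect", "jdbc:oracle:thin:@//db-host:1521/ORCL",  # hypothetical
            "--username", "etl_user",                              # hypothetical
            "--password-file", "/user/etl/.oracle_pwd",            # hypothetical
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", "4",
            "--as-textfile",
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError("Sqoop import failed: " + result.stderr)
        return result.stdout

    if __name__ == "__main__":
        sqoop_import("CUSTOMERS", "/data/raw/customers")  # hypothetical names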

Confidential, Minneapolis, MN

Hadoop Developer

Responsibilities:

  • Followed the Agile methodology, specifically the Scrum software development process, throughout the project
  • Involved in bi-weekly sprint meetings with business analysts and business managers to drive testing efforts and implement elegant solutions to the tasks
  • Worked with Hadoop ecosystem components such as HDFS, HBase, Sqoop, Hive, Spark (Scala), and Pig on the Hortonworks Hadoop distribution
  • Prepared positive and negative test cases by understanding user stories/requirements for different interfaces
  • Prepared the system test plan covering testing scope, requirements, environment, approach, risks, and issues
  • Involved in designing test plans and test cases and in the overall unit and integration testing of the system
  • Performed comparisons between DDLs and table structures
  • Handled importing of data from various data sources such as SQL, mainframes, and Oracle DB2 (AAH)
  • Validated the data between the data lake tables and the target tables in Teradata, Oracle DB2 (AAH), and SQL
  • Performed data validation between source and target tables using Hive and Pig (see the validation sketch after this list)
  • Involved in creating Hive tables and loading data into dynamically partitioned tables
  • Gained hands-on experience with Hive CDC and SCD logic
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Used Pig as an ETL tool to perform testing on transformations, event joins, and some pre-aggregations before storing the data in HDFS
  • Insert-overwrote the Hive data with HBase data daily to get fresh data every day
  • Used the Zena component to trigger files from the source and load the data into the target tables
  • Monitored and verified Hadoop Zena jobs, analyzed the data, and generated reports to meet business requirements
  • As part of audit testing, performed duplicate-file and zero-byte-file checks through Zena
  • Performed data quality checks and data quality threshold checks on the HDFS files received from the source
  • Wrote test scripts in HiveQL and Pig Latin to validate the data in tables
  • Involved in the defect management process: logged bugs in JIRA and ALM for development and business review
  • Involved in defect triage meetings with business and developers, explaining the severity and risk of the issues found
  • Worked on Sqoop to import data from various relational data sources
  • Worked on strategizing Sqoop jobs to parallelize data loads from source systems
  • Participated in daily stand-up calls and updated progress to all stakeholders
  • Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit
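
The source-to-target validation described above can be illustrated with a short Python sketch that compares row counts between a data lake table and its target table over HiveServer2. The PyHive dependency, host name, and table names are assumptions for illustration; in the project the equivalent checks were also written directly in HiveQL and Pig Latin.

    from pyhive import hive  # assumes PyHive is installed on the edge node

    def row_count(cursor, table):
        """Return the row count of a fully qualified Hive table."""
        cursor.execute("SELECT COUNT(*) FROM {0}".format(table))
        return cursor.fetchone()[0]

    def validate_counts(source_table, target_table):
        """Compare source and target row counts; True when they match."""
        conn = hive.Connection(host="hive-gateway", port=10000,   # hypothetical
                               username="qa_user", database="default")
        cursor = conn.cursor()
        try:
            src = row_count(cursor, source_table)
            tgt = row_count(cursor, target_table)
            print("source=%d target=%d match=%s" % (src, tgt, src == tgt))
            return src == tgt
        finally:
            cursor.close()
            conn.close()

    if __name__ == "__main__":
        validate_counts("datalake.claims_raw", "edw.claims")  # hypothetical tables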

Confidential, Birmingham, AL

Hadoop Developer

Responsibilities:

  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
  • Involved in low-level design for MR, Hive, Impala, and shell scripts to process data
  • Involved in the complete big data flow of the application, from upstream data ingestion into HDFS to processing and analyzing the data in HDFS
  • Knowledge of handling Hive queries using Spark SQL, integrated with the Spark environment and implemented in Scala
  • Used the Spark Streaming API with Kafka to build live dashboards; worked on transformations and actions on RDDs, Spark Streaming, pair RDD operations, checkpointing, and SBT (a Python illustration follows this list)
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using the Scala IDE for Eclipse
  • Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team
  • Installed and configured Hive, Sqoop, Flume, and Oozie on the Hadoop clusters
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
  • Developed a process for batch ingestion of CSV files and Sqoop loads from different sources, and generated views on the data sources using shell scripting and Python
  • Integrated a shell script to create collections/morphlines and Solr indexes on top of table directories using the MapReduce Indexer Tool within the batch ingestion framework
  • Implemented partitioning, dynamic partitions, and buckets in Hive
  • Developed Hive Scripts to create the views and apply transformation logic in the Target Database
  • Involved in the design of Data Mart and Data Lake to provide faster insight into the Data
  • Used the StreamSets Data Collector tool and created data flows for one of the streaming applications
  • Used Kafka as a data pipeline between JMS (producer) and the Spark Streaming application (consumer)
  • Involved in developing a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations
  • Developed a Scala script to read all the Parquet tables in a database and write them out as JSON files, and another script to expose them as structured tables in Hive
  • Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster
  • Configured ZooKeeper for cluster coordination services
  • Developed a unit test script to read a Parquet file for testing PySpark on the cluster (a minimal test sketch follows this list)
  • Involved in exploring new technologies such as AWS, Apache Flink, and Apache NiFi that can increase business value
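
The Kafka-to-dashboard pipeline above was built in Scala on the DStream-based Spark Streaming API. As a rough Python illustration of the same idea, using the newer Structured Streaming API instead, the sketch below counts events per minute from a Kafka topic; the broker address and topic name are hypothetical, and the spark-sql-kafka connector package is assumed to be available on the cluster.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("live-dashboard-feed")
             .getOrCreate())

    # Hypothetical Kafka source; the Kafka reader provides a "timestamp" column.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "dashboard-events")
              .load())

    # Events-per-minute metric; the console sink stands in for the dashboard feed.
    per_minute = (events
                  .withWatermark("timestamp", "2 minutes")
                  .groupBy(F.window("timestamp", "1 minute"))
                  .agg(F.count("*").alias("events_per_minute")))

    query = (per_minute.writeStream
             .outputMode("update")
             .format("console")
             .option("truncate", "false")
             .start())
    query.awaitTermination()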
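
The PySpark unit test mentioned above was essentially a smoke test that a Parquet data set on the cluster can be read. A minimal sketch follows; the HDFS path and the expected column name are hypothetical placeholders.

    import unittest
    from pyspark.sql import SparkSession

    class ParquetReadTest(unittest.TestCase):
        """Smoke test: a Parquet data set is readable through PySpark."""

        @classmethod
        def setUpClass(cls):
            cls.spark = (SparkSession.builder
                         .appName("parquet-read-test")
                         .getOrCreate())

        @classmethod
        def tearDownClass(cls):
            cls.spark.stop()

        def test_parquet_is_readable(self):
            # Hypothetical HDFS location of one of the ingested Parquet tables.
            df = self.spark.read.parquet("/data/curated/events/")
            self.assertGreater(df.count(), 0)
            self.assertIn("event_id", df.columns)  # hypothetical column

    if __name__ == "__main__":
        unittest.main()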

Confidential

Java Developer

Responsibilities:

  • Attended various meetings with users to go through and understand the client requirements
  • Developed the application on the Spring 4.x framework, utilizing features such as Spring dependency injection, Spring Beans, Spring JDBC, and Spring Web Flow with Spring MVC
  • Worked on a Spring MVC application with XML configurations and annotations
  • Used the DispatcherServlet to route incoming requests, controllers to handle requests, and the model to send values to the user interface
  • Used Agile principles to implement the projects, with two-week sprints, planning meetings, daily stand-ups, grooming, estimation, and retrospectives
  • Developed a portal application from scratch to interact with a third-party application via a token-exchange model for authentication and retrieve the needed data, using Spring MVC to handle incoming requests and RESTful web services (implementing the JAX-RS API) with the Jackson parser to send data in JSON format on web service calls
  • Participated in Scrum meetings and project planning and coordinated the status sessions
  • Developed the presentation layer using Servlets, HTML5, CSS3, JavaScript, JSPs, JSON, and XML
  • Developed Data Access Layer using Hibernate ORM framework
  • Used the Hibernate named queries concept to retrieve data from the database and integrated with Spring MVC to interact with the backend persistence system (Oracle 11g)
  • Extensively involved in creating complex SQL queries and calling Stored Procedures
  • Maintained high-quality RESTful services and implemented REST services using Spring MVC and JAX-RS
  • Used Maven to build the application and deploy it onto the JBoss Application Server
  • Used the JIRA tracking tool to manage and track issues reported by QA, prioritizing and acting on them based on severity
  • Used GitHub extensively as a version control tool and Maven for automated project builds
  • Involved in analyzing and identifying performance issues in DAO classes
  • Extensively used Log4j to log regular debug and exception statements, and was involved in design, analysis, and architecture meetings
  • Implemented unit testing using JUnit and was involved in integration testing with the database layer.

Confidential

Java Developer

Responsibilities:

  • Involved in SDLC requirements gathering, analysis, design, development, and testing of an application developed using the Agile methodology
  • Developed the application using the Spring Framework, which leverages the classical Model-View-Controller (MVC) architecture
  • Worked on the different parts of the MVC pattern, such as the DispatcherServlet, handler mappings, controllers, models, and views
  • Used Spring Core for the business layer
  • Used Hibernate in conjunction with Spring functionality to implement object-relational mapping in the persistence layer
  • Created and consumed Web Services using REST and SOAP
  • Created web pages using HTML5, CSS3, and JavaScript
  • Made asynchronous calls and preloaded data using AJAX
  • Worked on Complex SQL queries and created stored procedures for different business functionalities
  • Used the Sonar tool to maintain code quality compliance
  • Performed unit testing for various modules using JUnit
  • Used Splunk to retrieve debug logs
