Sr Hadoop Developer Resume

Syosset, NY


  • 8+ years of experience in the IT industry, with extensive experience in Java, J2EE, and Big Data technologies.
  • 3+ years of experience working exclusively on Big Data technologies and the Hadoop stack.
  • Strong experience working with HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Oozie, and HBase.
  • Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
  • More than one year of hands-on experience using the Spark framework with Scala.
  • Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
  • Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files.
  • Good understanding of compression techniques used in Hadoop processing, such as Gzip, Snappy, and LZO.
  • Expertise in importing and exporting data from/to traditional RDBMS using Apache Sqoop.
  • Tuned Pig and Hive scripts by understanding the joins, grouping, and aggregations in them.
  • Worked extensively with HiveQL and join operations, wrote custom UDFs, and have good experience optimizing Hive queries.
  • Worked with various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS).
  • Mastered the use of different columnar file formats such as RCFile, ORC, and Parquet.
  • Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Hands-on experience installing, configuring, and deploying Hadoop distributions in cloud environments (Amazon Web Services).
  • Good experience optimizing MapReduce algorithms using combiners and custom partitioners.
  • Hands-on experience with NoSQL databases such as HBase and MongoDB.
  • Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS), and Java Database Connectivity (JDBC).
  • Experience includes application development in Java (client/server), JSP, Servlet programming, Enterprise JavaBeans, Struts, JSF, JDBC, Spring, Spring Integration, and Hibernate.
  • Very good understanding of the Agile Scrum process.
  • Experience using version control tools such as Bitbucket and SVN.
  • Good knowledge of Oracle 8i, 9i, and 10g databases, and excellent at writing SQL queries.
  • Performed performance tuning and productivity improvement activities.
  • Extensive use of use case diagrams, use case models, and sequence diagrams using Rational Rose.
  • Proactive in time management and problem solving; self-motivated, with good analytical skills.
  • Strong organizational skills, with the ability to multitask and meet deadlines.
  • Excellent interpersonal skills in areas such as teamwork, communication and presentation to business users or management teams.


Big Data Ecosystems: Hadoop, Teradata, MapReduce, Spark, HDFS, HBase, Pig, Hive, Sqoop, Oozie, Storm, Kafka, and Flume.

Streaming Technologies: Spark Streaming, Storm

Scripting Languages: Python, Bash, JavaScript, HTML5, CSS3

Programming Languages: Java, Scala, SQL, PL/SQL

Databases: RDBMS, NoSQL, Oracle.

Java/J2EE Technologies: Servlets, JSP (EL, JSTL, Custom Tags), JSF, Apache Struts, JUnit, Hibernate 3.x, Log4j, JavaBeans, EJB 2.0/3.0, JDBC, RMI, JMS, JNDI.

Tools: Eclipse, Maven, Ant, MS Visual Studio, NetBeans

Methodologies: Agile, Waterfall


Confidential, Syosset, NY

Sr Hadoop Developer


  • Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization on user behavioral data.
  • Developed custom FTP adaptors to pull clickstream data from FTP servers directly into HDFS using the HDFS FileSystem API.
  • Used Spark SQL and the DataFrame API extensively to build Spark applications.
  • Used Spark SQL for data analysis and provided the results to the data scientists for further analysis.
  • Performed streaming data ingestion into the Spark environment using Kafka.
  • Built a prototype for real-time analysis using Spark Streaming and Kafka.
  • Worked closely with the data science team to build Spark MLlib applications for various predictive models.
  • Partitioned and bucketed Hive tables, performed joins on them, and used Hive SerDes such as Regex, JSON, and Avro.
  • Developed and integrated Java programs to move flat files from Linux systems into the Hadoop ecosystem and to validate the files before loading them into Hive tables.
  • Created Hive tables, dynamic partitions, and buckets for sampling, and worked with them using HiveQL.
  • Optimized Hive analytics SQL queries, created tables/views, and wrote custom UDFs and Hive-based exception processing.
  • Involved in transferring relational database and legacy data to HDFS and HBase tables, and vice versa, using Sqoop.
  • Wrote Sqoop scripts to import and export data to and from HDFS, and validated the data before loading to check for duplicates.
  • Created HBase tables to store data in variable formats coming from different portfolios.
  • Used Sqoop jobs to import data from RDBMS using incremental import. Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
  • Worked with different compression techniques, such as LZO and Snappy, to save storage and optimize data transfer over the network.

Environment: Hadoop, HDFS, Hive, Sqoop, Spark, Scala, MapReduce, Cloudera, Kafka, ZooKeeper, HBase, AWS, UNIX Shell Scripting.

Confidential, NYC, NY

Sr Hadoop Developer


  • Integrated Kafka with Spark Streaming for real-time data processing.
  • Stored the processed data by using low-level Java APIs to ingest data directly into HBase and HDFS.
  • Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations.
  • Imported data from different sources into Spark RDDs for processing.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slots configuration.
  • Developed Spark applications for the entire batch processing using Scala.
  • Used the Spark DataFrame and Spark SQL APIs extensively for all processing.
  • Managed and reviewed Hadoop log files.
  • Partitioned and bucketed Hive tables, performed joins on them, and used Hive SerDes such as Regex, JSON, and Avro.
  • Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
  • Executed cluster upgrade tasks on the staging platform before performing them on the production cluster.
  • Performed maintenance, monitoring, deployments, and upgrades across the infrastructure supporting all Hadoop clusters.
  • Installed and configured various components of the Hadoop ecosystem.
  • Optimized Hive analytics SQL queries, created tables/views, and wrote custom UDFs and Hive-based exception processing.
  • Involved in transferring relational database and legacy data to HDFS and HBase tables, and vice versa, using Sqoop.
  • Replaced Hive's default Derby metadata store with MySQL.
  • Supported setting up the QA environment and updating configurations for implementing Pig scripts.
  • Configured the Fair Scheduler to share resources fairly among all applications across the cluster.

Environment: Cloudera 5.4, Cloudera Manager, Hue, Spark, Kafka, HBase, HDFS, Hive, Pig, Sqoop, MapReduce, DataStax, IBM DataStage 8.1 (Designer, Director, Administrator), Flat files, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting.


Hadoop Developer


  • Performed aggregations and analysis on large sets of log data; collected the log data using custom-built input adapters and Sqoop.
  • Developed MapReduce jobs using Java for data transformations.
  • Worked extensively on performance tuning of Hive scripts.
  • Used Sqoop to export the data back to relational databases for business reporting.
  • Partitioned and bucketed the log file data to separate it on a daily basis, and aggregated it based on business requirements.
  • Responsible for developing a data pipeline using Sqoop, MapReduce, and Hive to extract data from weblogs and store the results for downstream consumption.
  • Developed internal and external tables; good exposure to Hive DDL for creating, altering, and dropping tables.
  • Stored non-relational data in HBase and worked with it extensively.
  • Worked intensively on project documentation and maintained technical documentation for the Hive queries and Pig scripts we created.
  • Created and developed UNIX shell scripts for generating reports from Hive data.
  • Wrote Apache Pig scripts to process HDFS data.
  • Developed Sqoop scripts to enable the interaction between Pig and the MySQL database.
  • Extensive experience in resource allocation, scheduling project tasks, and risk analysis activities.
  • Successfully loaded files to HDFS from Teradata, and loaded from HDFS to Hive.
  • Experience in using ZooKeeper and Oozie for coordinating the cluster and scheduling workflows.
  • Involved in loading data from UNIX file system to HDFS.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS.
  • Managed Hadoop log files.
  • Analyzed the web log data using HiveQL.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, MySQL, SVN, PuTTY, ZooKeeper, UNIX, Shell scripting, HiveQL, NoSQL database (HBase), RDBMS, Eclipse, Oracle 11g.

Confidential, Dallas, TX

Hadoop Developer


  • Expertise in data loading tools such as Sqoop and Flume. Transformed data using Hive and Pig according to business requirements and stored it in HDFS for aggregations.
  • Developed MapReduce programs of medium to high complexity from scratch.
  • Worked closely with the business, translating business requirements into technical requirements.
  • Took part in design reviews and daily project scrums.
  • Hands-on experience with Pig, joining raw data using Pig scripts.
  • Wrote custom UDFs in Pig and Hive according to business requirements.
  • Hands-on experience working with Cloudera distributions.
  • Hands-on experience extracting data from different databases and scheduling Oozie workflows to execute the job daily.
  • Successfully loaded files to HDFS from Teradata, and loaded from HDFS to Hive.
  • Experience in using ZooKeeper and Oozie for coordinating the cluster and scheduling workflows.
  • Involved in loading data from UNIX file system to HDFS.
  • Worked extensively with Hive and created numerous internal and external tables.
  • Partitioned and bucketed the Hive tables, added data daily, and performed aggregations.
  • Knowledge of installing and configuring Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Held daily scrum calls with business users on the status of deliverables.
  • Communicated deliverable status to users, stakeholders, and the client, and drove periodic review meetings.
  • Completed tasks and the project on time and per quality goals.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, MySQL, SVN, PuTTY, ZooKeeper, UNIX, Shell scripting, JSP & Servlets, PHP, JavaScript, XML, HTML, Python, and Bash

Confidential, Dover, NH

Java/ J2EE Developer


  • Involved in developing the application using the Java/J2EE platform. Implemented the Model-View-Controller (MVC) structure using Struts.
  • Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java, and CSS per the requirements, and for providing client-side JavaScript validations and server-side validation with the Bean Validation Framework (JSR 303).
  • Used Spring Core Annotations for Dependency Injection.
  • Used Hibernate as the persistence framework, mapping objects to tables using Hibernate annotations.
  • Responsible for writing the service classes and utility APIs used across the framework.
  • Used Axis to implement web services for integration of different systems.
  • Developed Web services component using XML, WSDL, and SOAP with DOM parser to transfer and transform data between applications.
  • Exposed various capabilities as Web Services using SOAP/WSDL.
  • Used SoapUI for testing the web services by sending SOAP requests.
  • Used AJAX for server communication and a seamless user experience.
  • Created a test framework on Selenium and executed web testing in Chrome, IE, and Mozilla Firefox through WebDriver.
  • Used client-side JavaScript (jQuery) for designing tabs and dialog boxes.
  • Created UNIX shell scripts to automate the build process and to perform regular jobs such as file transfers between hosts.
  • Used Log4j for logging output to files.
  • Used JUnit/Eclipse for the unit testing of various modules.
  • Involved in production support: monitoring server and error logs, foreseeing potential issues, and escalating them to higher levels.

Environment: Java, J2EE, JSP, Servlets, Spring, Custom Tags, JavaBeans, JMS, Hibernate, IBM MQ Series, Ajax, JUnit, Log4j, JNDI, Oracle, XML, SAX, Rational Rose, UML


Java Developer


  • Involved in development of General Ledger module, which streamlines analysis, reporting and recording of accounting information. General Ledger automatically integrates with a powerful spreadsheet solution for budgeting, comparative analysis and tracking facility information for flexible reporting.
  • Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
  • Designed the user interface and implemented validation checks using JavaScript.
  • Managed connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures.
  • Developed various EJBs for handling business logic and data manipulations from the database.
  • Involved in the design of JSPs and Servlets for navigation among the modules.
  • Designed cascading style sheets and the XML part of the Order Entry and Product Search modules, and performed client-side validations with JavaScript.

Environment: JAVA 1.5, J2EE, JSP, Servlets, JSTL, JDBC, Struts, ANT, XML, HTML, JavaScript, SQL, Oracle 9i, Spring 2.0, Hibernate 2.0, Log4j, WebLogic 8.1, Unix