
Sr. Integration Developer (Hadoop and ETL) Resume

White Plains, NY


  • 7+ years of experience in the IT industry, with extensive experience in the Hadoop stack, big data technologies, AWS, Java, Scala, RDBMS and ETL.
  • More than 2 years of hands-on experience using the Spark framework with Scala.
  • 3+ years of experience in ETL development (1+ years with Pentaho Data Integration).
  • Strong experience working with HDFS, MapReduce, Spark, AWS, Hive, Impala, Pig, Sqoop, Flume, Kafka, NiFi, Oozie, HBase, MSSQL and Oracle.
  • Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
  • Built Hadoop clusters on both Cloudera and Hortonworks.
  • Experience using Tableau for data analytics and data visualization.
  • Good exposure to performance tuning of Hive queries, MapReduce jobs and Spark jobs.
  • Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files and XML files.
  • Good understanding of various compression techniques used in Hadoop processing, such as Gzip, Snappy and LZO.
  • Expertise in importing and exporting data from/to traditional RDBMS using Apache Sqoop.
  • Tuned Pig and Hive scripts by understanding the joins, grouping and aggregation between them.
  • Worked extensively on HiveQL, join operations and writing custom UDFs, with good experience in optimizing Hive queries.
  • Worked with various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS).
  • Well versed in columnar file formats such as RCFile, ORC and Parquet.
  • Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
  • Hands-on experience in installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
  • Good experience in optimizing MapReduce algorithms by using Combiners and custom Partitioners.
  • Hands-on experience with NoSQL databases such as HBase, Cassandra and MongoDB.
  • Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS) and Java Database Connectivity (JDBC).
  • Very good understanding of the Agile Scrum process.
  • Experience in using version control tools such as Bitbucket, Git and SVN.
  • Good knowledge of Oracle and MSSQL, and skilled in writing SQL queries.
  • Performed performance tuning and productivity improvement activities.
  • Extensive use of use case diagrams, use case models and sequence diagrams in Rational Rose.
  • Proactive in time management and problem solving; self-motivated with good analytical skills.
  • Analytical and organizational skills with the ability to multitask and meet deadlines.
  • Excellent interpersonal skills in areas such as teamwork, communication and presentation to business users or management teams.


Big Data Ecosystems: Hadoop, MapReduce, Spark, AWS, NiFi, HDFS, HBase, Pig, Impala, Hive, Sqoop, Oozie, Kafka, Flume and Tableau

ETL: Pentaho Data Integration, Talend, Informatica.

Streaming Technologies: Spark Streaming, Storm

Scripting Languages: Python, Bash, JavaScript.

Programming Languages: Java, Scala, SQL, PL/SQL

Databases: RDBMS, NoSQL, Oracle, MSSQL, MySQL

Tools: Eclipse, IntelliJ, Git, JIRA, MS Visual Studio, NetBeans, Tableau, Pentaho PDI, Talend, Informatica

Methodologies: Agile, Waterfall


Confidential, White Plains, NY

Sr. Integration Developer (Hadoop and ETL)


  • Integrated Kafka with Spark Streaming for real-time data processing.
  • Created end-to-end Spark applications to perform data cleansing, validation, transformation and summarization on user behavioral data.
  • Used Spark SQL and the DataFrame API extensively to build Spark applications.
  • Created custom FTP adaptors to pull sensor data from FTP servers directly into HDFS using the HDFS FileSystem API.
  • Developed a scalable distributed data solution using Hadoop on a 30-node cluster in the AWS cloud to analyze 25+ terabytes of customer usage data.
  • Developed workflow jobs in Sqoop and Flume to extract/export data from IBM MQ and MySQL.
  • Worked extensively on building Hadoop clusters.
  • Created Kafka streams to capture and broadcast data, and built live stream transformations to normalize the data and store it in HDFS.
  • Developed multiple extract, transform and load (ETL) functionalities with the Pentaho Data Integration tool.
  • Used Tableau for data analytics and data visualization.
  • Stored the processed data using low-level Java APIs to ingest data directly into HBase and HDFS.
  • Wrote Spark applications for data validation, cleansing, transformation and custom aggregation.
  • Imported data from different sources into Spark RDDs for processing.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Installed clusters, commissioned and decommissioned DataNodes, and handled NameNode high availability, capacity planning and slots configuration.
  • Developed Spark applications in Scala for all batch processing.
  • Used the Spark DataFrame and Spark SQL APIs extensively for all processing.
  • Managed and reviewed Hadoop log files.
  • Worked with Hive partitioning and bucketing, performed joins on Hive tables, and used Hive SerDes such as Regex, JSON and Avro.
  • Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
  • Executed cluster upgrade tasks on the staging platform before applying them to the production cluster.
  • Performed maintenance, monitoring, deployments and upgrades across the infrastructure that supports all Hadoop clusters.
  • Installed and configured various components of the Hadoop ecosystem.
  • Optimized Hive analytics SQL queries, created tables and views, and wrote custom UDFs and Hive-based exception processing.
  • Involved in transferring data from relational databases and legacy tables to HDFS and HBase tables using Sqoop, and vice versa.
  • Replaced the default Derby metadata storage for Hive with MySQL.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig.
  • Configured Fair Scheduler to provide fair resources to all the applications across the cluster.

Environment: Cloudera 5.13, Cloudera Manager, Ambari, Hortonworks, AWS S3 cloud, Hue, Spark, Kafka, HBase, HDFS, Hive, Pig, Sqoop, MapReduce, DataStax, IBM DataStage 8.1 (Designer, Director, Administrator), Flat files, Oracle 11g/10g, Windows NT, UNIX Shell Scripting, PDI (Pentaho Data Integration), Microsoft SQL Server Management Studio, Git, JIRA, etc.

Confidential, Daytona Beach, FL

Hadoop Developer


  • Developed simple to complex MapReduce jobs in Java for processing and validating data.
  • Developed a data pipeline using Sqoop, Spark, MapReduce and Hive to ingest, transform and analyze operational data.
  • Developed MapReduce and Spark jobs to summarize and transform raw data.
  • Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for large volumes of data.
  • Streamed data in real time using Spark with Kafka.
  • Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
  • Exported the analyzed data to relational databases using Sqoop for further visualization and to generate reports for the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
  • Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Created HBase tables and column families to store the user event data.
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
  • Used Impala to read, write and query the Hadoop data in Hive.

Environment: Hortonworks Hadoop, Ambari, HDFS, HBase, Pig, Hive, MapReduce, Sqoop, Flume, ETL, REST, Java, Scala, PL/SQL, Oracle 11g, Unix/Linux, Git, JIRA.

Confidential, Daytona Beach, FL

Hadoop Developer


  • Developed MapReduce jobs in Java for processing and validating data.
  • Developed workflow jobs in Sqoop and Flume to extract/export data from MySQL.
  • Involved in system analysis, design, coding, data conversion, development and implementation.
  • Involved in enhancing the existing application and creating new applications.
  • Created jobs to retrieve data from HDFS according to requirements by writing scripts in Java (MapReduce), Pig and Hive.
  • Used Java extensively for writing MapReduce programs in Hadoop.
  • Bulk loaded data from several databases into Hadoop using Sqoop.
  • Exported the analyzed data to relational databases.

Environment: Microsoft SQL Server Management Studio, Oracle.


Java Developer


  • Involved in the design and implementation of the project architecture using OOAD, UML and design patterns.
  • Involved in the design and development of the server-side layer using XML, JSP, JDBC, JNDI, EJB and DAO patterns in the Eclipse IDE.
  • Used HTML, CSS, JavaScript and Ajax extensively for client-side development and validation.
  • Used parsers to convert XML files to Java objects and vice versa.
  • Developed screens using XML documents and XSL.
  • Developed client programs using Apache Axis to consume the web services published by the Country Defaults Department, which keeps track of information such as life span, inflation rates and retirement age.
  • Developed Java beans and JSPs using Spring and JSTL tag libraries for supplements.
  • Developed EJBs, Servlets and JSP files to implement business rules and security options using IBM WebSphere.
  • Exported the analyzed data to the relational databases.

Environment: Microsoft SQL Server Management Studio, Oracle.
