Big Data and ETL Developer Resume

Minnetonka, MN


  • 8+ years of overall experience building and developing Hadoop MapReduce solutions.
  • Strong experience with Big Data and Hadoop technologies, with excellent knowledge of the Hadoop ecosystem: Hive, Spark, Sqoop, Impala, Pig, HBase, Kafka, Flume, Storm, ZooKeeper, Oozie.
  • Experienced in transferring data from different sources into HDFS using Kafka producers, consumers, and brokers.
  • Experience with code migration, data migration, and extraction/transformation/loading (ETL) with Teradata and HDFS files on UNIX and Windows.
  • Expert in designing and creating data ingestion pipelines using technologies such as Apache Storm and Kafka.
  • Good knowledge of building data processing pipelines using Kafka and Spark Streaming.
  • Experienced in data ingestion using Sqoop, Storm, Kafka and Apache Flume.
  • Experience using Apache Flume to efficiently collect, aggregate, and move large amounts of log data.
  • Knowledge of streaming data to HDFS using Flume.
  • Hands-on experience using Flume for real-time log processing for attribution reports.
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems/mainframes.
  • Experience with the Apache Spark ecosystem using Spark SQL, DataFrames, and RDDs, and knowledge of Spark MLlib.
  • Experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in Apache Spark, Spark SQL, and Spark Streaming.
  • Worked with the Spark engine to process large-scale data; experienced in creating Spark RDDs.
  • Knowledge of developing Spark Streaming jobs using RDDs and leveraging the Spark shell.
  • Expertise in the Talend Big Data tool, including the architectural design and development of ingestion and extraction jobs in Big Data and Spark Streaming.
  • Experience with RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
  • Hands-on experience writing Apache Spark jobs in Scala in a test environment for faster data processing, using Spark SQL for querying.
  • Knowledge of Spark and Spark SQL for testing and processing data using Scala.
  • Knowledge of Amazon Web Services (AWS) cloud services.
  • Skilled in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Expertise in using Hive and Pig (executed as MapReduce jobs) to validate and cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for analysis.
  • Skilled in writing Hive and Impala queries to load and process data in the Hadoop Distributed File System (HDFS).
  • Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
  • Hands-on experience with a data lake cluster managed by Hortonworks Ambari on AWS using EC2 and S3.
  • Strong knowledge of MongoDB concepts, including CRUD operations, the aggregation framework, and document schema design.
  • Experience in maintenance and bug fixing of web-based applications on various platforms.
  • Experience managing the MongoDB life cycle, including sizing, automation, monitoring, and tuning.
  • Implemented proofs of concept on the Hadoop stack and various big data analytic tools, including migration from databases such as Teradata, Oracle, and MySQL to Hadoop.
  • Experience storing and processing unstructured data using NoSQL databases such as HBase.
  • Skilled in developing REST web services and using the HBase native API client to query data from HBase.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions and data warehouse tools for reporting and data analysis.
  • Experienced with job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
  • Experience with the Oozie workflow engine, running workflow jobs with actions that execute Java MapReduce and Pig jobs.
  • Good experience with both JobTracker (MapReduce 1) and YARN (MapReduce 2).
  • Experience managing and reviewing Hadoop log files generated through YARN.
  • Experience using Apache Solr for search applications.
  • Experienced in Java, Spring Boot, Apache Tomcat, Maven, Gradle, Hibernate, and other open-source frameworks and software.
  • Prepared dashboards using Tableau.
  • Experience with all stages of the SDLC and the Agile development model, from requirements gathering to deployment and production support.
  • Proficient in test-driven development (TDD) and Agile practices to produce high-quality deliverables.
  • Hands-on experience with Agile (Scrum) and Waterfall models, along with automation and enterprise tools such as Jenkins, Chef, JIRA, and Confluence, and Git for version control.


Big Data: Hadoop HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Spark, Scala, Storm, Flume, Kafka, and Avro.

Web Technologies: Core Java, J2EE, Servlets, JSP, JDBC, JBoss, JSF, XML, AJAX, SOAP, WSDL.

Methodologies: Agile, SDLC, Waterfall model, UML, Design Patterns, Scrum.

Frameworks: ASP.NET, Java EE, MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2.

Programming Languages: Core Java, C, C++, SQL, Python, Scala, XML, Unix Shell scripting, HTML, CSS, JavaScript.

Databases: Oracle 11g/10g, IBM DB2, MongoDB, Microsoft SQL Server, MySQL, MS Access.

Apache Tomcat, JSON-RPC, WebLogic, WebSphere.

NoSQL: Cassandra, MongoDB, HBase.

Monitoring, Reporting tools: Ganglia, Nagios, Custom Shell Scripts.


Confidential, Minnetonka, MN

Big Data and ETL Developer

Roles and Responsibilities:

  • Worked on building a data lake on a MapR cluster, using Sqoop as the data integration component and Hive as the data access component.
  • Installed and configured a development tenant on the MapR cluster for application development and for troubleshooting/reproducing production issues.
  • Developed UDFs in Scala, used in Spark DataFrames/SQL and RDDs for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
  • Gained exposure to Spark Structured Streaming while working on an Agile spike to evaluate whether its functionality suited the business requirements.
  • Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Wrote Spark SQL queries using Scala.
  • Used Apache Spark to create RDDs and DataFrames, apply transformations and actions, and convert RDDs to DataFrames.
  • Used the Spark API over Hadoop YARN to perform analytics on Hive data with Spark SQL.
  • Created Sqoop commands to extract data and test it against various database tables and views.
  • Resolved job failures and developed shell scripts to automate Sqoop extraction and Spark jobs using TWS jobs in ServiceNow.
  • Used Sqoop to transfer large datasets to Hadoop from Oracle and Teradata databases.
  • Used Hive and HBase to read, write, and query data in the MapR Hive warehouse using the Hive metastore.
  • Proficient in SQL on Oracle, DB2, and SQL Server; also experienced with the MicroStrategy reporting BI tool.
  • Developed merge jobs in Python to extract and load data into a MySQL database, using a test-driven approach.
  • Created multiple Hive tables and wrote HiveQL queries to load them with Parquet data.
  • Handled importing data from Teradata and loading it into the MapR cluster's Hive warehouse.
  • Performed advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities in Scala.
  • Developed the company's internal CI system, providing a comprehensive API for CI/CD.
  • Loaded data from the Linux file system (LFS) to HDFS and vice versa.
  • Coordinated with the team on sprint planning (Agile methodology).
  • Worked with Business Systems Analysts to analyze application requirements.
  • Responsible for handling data formats such as Parquet, CSV, and plain text.
  • Improved and tuned Spark jobs written in Python and Scala.
  • Responsible for monitoring the MapR cluster (YARN applications) and TWS jobs on IBM Workload Scheduler.
  • Troubleshot post-deployment issues and came up with the solutions required to drive production implementation and delivery.
  • Prepared Avro schema files for generating Hive tables.
  • Worked with various Teradata 15 tools and utilities, such as Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, and BTEQ.
  • Worked with business users on requirements gathering through workshops and meetings, and delivered data to third-party clients.
  • Created architectural data flow diagrams using Lucidchart.
  • Implemented test scripts to support test-driven development and integration.
  • Developed multiple MapReduce jobs in Java to clean datasets.
  • Gathered the necessary details from each source team, such as the nature of the data, ingestion type, test instance, and prod instance, to enable smooth ingestion of the systems.
  • Worked with the applications team to install Hadoop updates and upgrades as required.
  • Worked on a project involving the migration of data from different sources, including Teradata, to the HDFS data lake, and created reports by performing transformations on the data in the data lake.

Environment: Spark 2.2.1, Spark SQL, Scala 2.11.8, MapR 2.7, YARN, Teradata, HBase 1.1.8, HDFS, Oracle, MapReduce, Data Lake, Hive 2.1.1, Sqoop, ETL, Hadoop 2.7, NoSQL, Talend, Flat Files, UNIX, Shell Scripting, RDBMS, Linux, Maven, Eclipse IDE, Java 1.8, Agile, TWS, Teradata SQL Assistant.

Confidential, Lexington, KY.

Hadoop Developer

Roles and Responsibilities:

  • Responsible for the design and development of analytic models, applications, and supporting tools that enable developers to create algorithms/models in a big data ecosystem.
  • Ensured appropriate data testing was completed and met test plan requirements.
  • Engaged with business stakeholders to design and own end-to-end solutions that empower data-driven decision making.
  • Developed code to write canonical-model JSON records from numerous input sources to Kafka topics.
  • Extracted data from various locations and loaded it into Oracle tables using SQL*Loader.
  • Used Scala to develop Spark batch and streaming data pipelines.
  • Used Sqoop to transfer large datasets between HDFS and relational databases such as DB2 and Oracle.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze logs produced by the Spark cluster.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Helped set up the QA environment by implementing Pig and Sqoop scripts.
  • Developed Pig Latin scripts to sort, join, and filter enterprise data.
  • Wrote Storm topologies to accept events from Kafka producers and emit them to Cassandra.
  • Developed Oozie workflows for scheduling ETL processes and Hive scripts.
  • Hands-on experience installing and configuring Cloudera.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Gained good experience with NoSQL databases such as Cassandra.
  • Analyzed Cassandra and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
  • Developed, maintained, and supported a Continuous Integration (CI) framework based on Jenkins.
  • Designed and implemented a prototype for a functional comparison of Cassandra vs. Solr, then shared knowledge of Cassandra with other team members.
  • Configured FTP for the Informatica server to access source files.
  • Worked on the Hadoop stack and various big data analytic tools, including migration from databases such as SQL Server 2008 R2, Oracle, and MySQL to Hadoop.
  • Worked on the Hadoop cluster and with data querying tools such as Hive to store and retrieve data.
  • Loaded SSTable files into Scylla using the Scylla sstableloader tool and validated the data.
  • Built a POC enabling member and suspect search using Solr.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions and data warehouse tools for reporting and data analysis.
  • Developed an autonomous continuous integration system using Git, Gerrit, Jenkins, MySQL, and custom tools developed in Python and Bash.
  • Designed, developed, implemented, and maintained solutions using Docker, Jenkins, Git, and Puppet for microservices and continuous deployment.
  • Implemented unit tests using the Python unittest framework.
  • Used CSVExcelStorage in Pig to parse files with different delimiters.
  • Modified reports and Talend ETL jobs based on feedback from QA testers and users in development and staging environments.
  • Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes.
  • Searched and fetched the corresponding records using Solr queries.

Environment: Hortonworks, ETL, YARN, HDFS, Pig, Hive, Kafka, Oracle, Talend, Cassandra, Eclipse, Java, Sqoop, Avro, Spark, Spark API, Spark SQL, Spark Streaming, Scala, Solr, Linux file system, Linux Shell Scripting, Oozie, Agile.

Confidential, Minneapolis, MN

Hadoop Consultant

Roles and Responsibilities:

  • Collected and aggregated large amounts of weblog data from sources such as web servers, mobile devices, and network devices using Apache Flume, and stored the data in HDFS for analysis.
  • Set up a fan-in (consolidation) flow in Flume, a V-shaped architecture that takes data from many sources and ingests it into a single sink.
  • Managed and reviewed Hadoop log files, collecting them from multiple machines using Flume.
  • Performed real-time processing of raw data stored in Kafka and stored the processed data in Hadoop using Spark Streaming (DStreams).
  • In the pre-processing phase, used Spark RDD transformations to remove missing data and create new features.
  • Developed Spark SQL queries for statistical summaries and filtering operations for specific use cases, working with Spark RDDs on a distributed cluster running Apache Spark.
  • Converted SQL queries into Apache Spark transformations using Spark DataFrames.
  • Worked with Spark to create structured data from the pool of unstructured data received.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Developed validation scripts in PL/SQL for importing invoices from the legacy system into Oracle Payables.
  • Installed and configured Hive and wrote Hive UDFs and UDAFs.
  • Analyzed the Hadoop cluster and different big data analytic tools, including Pig, Hive, and MongoDB.
  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Used Impala to create views for business use-case requirements on top of Hive tables.
  • Created AngularJS directives, factories and services for developing single page web applications.
  • Processed web server logs by developing multi-hop Flume agents using Avro sinks and loaded them into MongoDB for further analysis; extracted files from MongoDB through Flume; and loaded data from the UNIX file system to HDFS.
  • Extracted and restructured data into MongoDB using the mongoimport and mongoexport command-line utilities.
  • Maintained source definitions, transformation rules, and target definitions using Informatica Repository Manager.
  • Used MongoDB as part of a POC and migrated a few SQL stored procedures to MongoDB.
  • Worked with NoSQL databases (MongoDB) and hybrid implementations.
  • Monitored document growth and estimated storage size for large MongoDB clusters.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data, managing data from different sources.
  • Developed interactive dashboards using Tableau connected to Impala.
  • Supported MapReduce programs running on the cluster.
  • Helped set up the QA environment by implementing Pig and Sqoop scripts.

Environment: Linux, MongoDB, Hadoop, Spark, HBase, Oracle, Sqoop, Pig, Impala, Hive, Kafka, Flume, Cloudera, Design Patterns, Apache Tomcat, My SQL Server 2008.

Confidential, Tampa, FL.

Hadoop Developer

Roles and Responsibilities:

  • Integrated Hive and HBase for effective usage and performed MRUnit testing for the MapReduce jobs.
  • Created BI reports (Tableau) and dashboards from HDFS data using Hive.
  • Imported and exported data between relational database systems and HDFS using Sqoop.
  • Developed a common framework to import data from Teradata to HDFS and export it to Teradata using Sqoop.
  • Imported log data from different servers into HDFS using Flume and developed MapReduce programs to analyze the data.
  • Used Flume to handle real-time log processing for attribution reports.
  • Tuned the performance of Pig queries.
  • Built the database model, APIs, and views in Python to deliver an interactive web-based solution.
  • Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were the primary data sources for customers and the internal customer service team.
  • Loaded data from the UNIX file system to HDFS.
  • Applied the partitioning pattern in MapReduce to route records into different categories.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Developed templates and screens in HTML and JavaScript.
  • Created HBase tables to load large data sets coming from UNIX and NoSQL sources.
  • Implemented the web service client for login authentication, credit reports, and applicant information using Apache Axis2 Web Services.
  • Designed, developed, tested, implemented, and supported data warehousing ETL using Talend and Hadoop technologies.
  • Built and deployed Java applications into multiple Unix based environments and produced both unit and functional test results along with release notes

Environment: WebSphere 6.1, HTML, XML, ANT 1.6, MapReduce, Sqoop, UNIX, NoSQL, Java, JavaScript, MR Unit, Teradata, Node.js, JUnit 3.8, ETL, Talend, HDFS, Hive, HBase.

Confidential, Dallas, TX

Hadoop/Java Developer

Roles and Responsibilities:

  • Experience with the Hadoop ecosystem (MapReduce, Pig, Hive, HBase) and NoSQL.
  • Analyzed the relationship of input keys to output keys in terms of both type and number, and identified the number, type, and value of keys and values emitted by the mappers and reducers, as well as the number and contents of the output files.
  • Developed MapReduce pipeline jobs to process data, create the necessary HFiles, and load the HFiles into HBase for faster access without a performance hit.
  • Designed and developed UI screens with Spring (MVC), HTML5, CSS, JavaScript, and AngularJS to provide interactive screens to display data.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, and CSV.
  • Analyzed MapReduce job requirements to determine the correct InputFormat and OutputFormat.
  • Fetched records for SOAP and RESTful requests from Oracle DB using Solr search.
  • Built AngularJS modules, controllers, pop-up modals, and file uploaders.
  • Created database tables and wrote T-SQL queries and stored procedures to create complex join tables and perform CRUD operations.
  • Used MapReduce jobs on AWS to process data stored in AWS.
  • Used Jenkins for automated builds and continuous integration, and JUnit for test-driven unit testing.
  • Implemented login-based user access control using HTML, with jQuery for validations.
  • Created the web application using HTML, CSS, jQuery, and JavaScript.
  • Built test automation frameworks in Python.
  • Experience in Bash shell scripting, SQL, and full-stack web development using Java and Python.
  • Wrote Python scripts to parse JSON files and load the data into Consul.
  • Worked with Python OpenStack APIs and used NumPy for numerical analysis.
  • Used Eclipse as the IDE for developing the application.
  • Loaded flat-file data into the staging area using Informatica.
  • Used UI-Router in AngularJS to build a single-page application.
  • Developed unit/assembly test cases and UNIX shell scripts to run along with daily/weekly/monthly batches to reduce or eliminate manual testing effort.
  • Developed mappings in Informatica to load the data including facts and dimensions from various sources into the Data Warehouse, using different transformations like Source Qualifier, Java, Expression, Lookup, Aggregate, Update Strategy and Joiner.

Environment: Windows XP/NT, Java, MapReduce, Pig, Hive, Hbase, NoSQL, AWS, Jenkins, HTML, CSS, T-SQL, AngularJS, UI, jQuery, Korn Shell, Quality Center 10.


Java Developer

Roles and Responsibilities:

  • Played an active role in the team by interacting with welfare business analysts/program specialists and converting business requirements into system requirements.
  • Implemented J2EE standards and MVC architecture using the Struts framework. Implemented Servlets, JSP, and Ajax to design the user interface.
  • Developed and deployed UI-layer logic using JSP and dynamic JSP pages with Struts.
  • Worked with Struts MVC objects such as ActionServlet, controllers, validators, web application context, handler mapping, message resource bundles, and JNDI lookup for J2EE components.
  • Automated the process for downloading and installing packages, copying source files and executing code.
  • Redesigned several web pages with better interface and features.
  • Developed application based on Software Development Life Cycle (SDLC).
  • Developed the XML data object to generate the PDF documents and other reports.
  • Used Hibernate, DAO, and JDBC for data retrieval and modifications from the database.
  • Implemented web service messaging and interaction using SOAP and REST.
  • Developed JUnit test cases for unit, system, and user test scenarios, and was involved in unit testing, user acceptance testing, and bug fixing.

Environment: J2EE, JDBC, Java, Servlets, JSP, Hibernate, Web services, REST, SOAP, Design Patterns, SDLC, MVC, HTML, JavaScript, WebLogic 8.0, XML, Junit, Oracle 10g, Eclipse.


Intern/Java Developer


  • Developed the application under the JEE architecture and designed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.
  • Deployed and maintained the JSP and Servlet components on WebLogic 8.0.
  • Developed Maven scripts to build and deploy the application onto WebLogic Application Server, ran UNIX shell scripts, and implemented an automated deployment process.
  • Developed the application server's persistence layer using JDBC, SQL, and Hibernate.
  • Used Hibernate 3.0 in the data access layer to access and update information in the database.
  • Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
  • Used JDBC to connect the web applications to databases.
  • Used the Log4j framework for application logging, tracking, and debugging.
  • Developed and utilized J2EE services and JMS components for messaging communication in WebLogic.
  • Configured the development environment using the WebLogic application server for testing.

Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, Hibernate 3.0, Maven, UNIX, Log4j, AJAX, JavaScript, WebLogic 8.0, HTML, JDBC 3.0.

Education: Bachelor of Science in Information Technology, India, 2009