
Hadoop/Spark/Scala Developer Resume

St. Louis, Missouri


  • 9+ years of IT experience in Design, Development, Deployment, Maintenance and Support of Java/J2EE applications. Focused on quality and efficiency.
  • 4 years of experience with the Hadoop Distributed File System (HDFS), Impala, Hive, HBase, Spark, Hue, the MapReduce framework and Sqoop.
  • Experienced in Hadoop, with expertise in providing end-to-end solutions for real-time big data problems by implementing distributed processing concepts such as MapReduce on HDFS and other Hadoop ecosystem components.
  • Experienced with NoSQL databases like HBase, MongoDB and Cassandra, and with Apache Mahout.
  • Experience working on large-scale big data implementations in production environments.
  • Experience using the Cloudera and Hortonworks platforms and their ecosystems. Experience in installing, configuring and using ecosystem components like Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop and Flume.
  • Hands-on experience in data migration from relational databases to the Hadoop platform using Sqoop.
  • Extensively used Apache Flume to collect logs and error messages across the cluster.
  • Experienced in providing technical solutions to the business on applications developed on Hadoop and its ecosystem. Experience with cloud platforms like AWS and Azure.
  • Experienced in using Pig scripts to do transformations, event joins, filters and some pre-aggregations before storing the data onto HDFS.
  • Around 1 year of experience with Spark and Scala.
  • Developed analytical components using Scala, Spark, Apache Mesos and Spark Streaming.
  • Hands-on experience with AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (EMR).
  • Experience in Complete Software Development Life Cycle (SDLC) which includes Requirement Analysis, Design, Coding, Testing and Implementation using Agile (Scrum), TDD and other development methodologies.
  • Expertise in developing both Front End and Back End applications using Java, Servlets, JSP, Web Services, JavaScript, HTML, Spring, Hibernate, JDBC, XML, JSON.
  • Worked on WebLogic and Tomcat web servers for development and deployment of Java/J2EE applications.
  • Experience in writing MapReduce jobs, Pig scripts and Hive queries, and in using YARN, Apache Kafka and Storm for analyzing data.
  • Good experience in Spring & Hibernate and Expertise in developing Java Beans.
  • Working knowledge of WebLogic server clustering.
  • Developed POC for Apache Kafka.
  • Knowledge of event processing using Storm.
  • Proficient in various web based technologies like HTML, XML, XSLT, and JavaScript.
  • Expertise in unit testing using JUnit.
  • Experience in error logging and debugging using Log4J.
  • Strong knowledge of creating and reviewing data models in RDBMSs like Oracle 10g and MySQL.
  • Responsible for the formation and direction of Business Intelligence, Data Governance, Enterprise Data Warehouse (EDW) and Enterprise Data Management (EDM), using Oracle Appliance (11g), Informatica 9, Business Objects XI 3.1 and Erwin.
  • Worked with operating systems like Linux, UNIX, Solaris, and Windows 2000/XP/Vista/7.
  • Experience working with version control tools like Git, CVS and ClearCase.
  • Good knowledge of cloud integration with Amazon Simple Storage Service (S3), Amazon Elastic MapReduce (EMR), Amazon Elastic Compute Cloud (EC2) and Microsoft Azure HDInsight.
  • Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
  • Goal-oriented, organized team player with good interpersonal skills; works well both in a group and individually.
  • Strong business and application analysis skills with excellent communication and professional abilities.


Languages: Java, PL/SQL, Scala

Big Data: Apache Hadoop, Hive, HDFS, Spark, MapReduce, Sqoop, Storm.

RDBMS: Oracle, SQL Server, Teradata

Scripting Languages: UNIX Shell script, JavaScript, Python

Web Servers: Tomcat 7.x.

Tools and Utilities: MS Team Foundation Server, SVN, Maven, Gradle

Development Tools: Eclipse, IntelliJ IDEA

Operating systems: Windows NT/2000/XP, UNIX, Linux

Methodology: Waterfall, Agile.


Confidential, St. Louis, Missouri

Hadoop/Spark/Scala Developer


  • Developed scripts to perform business transformations on the data using Hive and Pig.
  • Developed UDFs in Java for Hive and Pig.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm and webMethods technologies.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Streamed data in real time using Spark with Kafka.
  • Imported data from different sources like HDFS and HBase into Spark RDDs. Developed a data pipeline using Kafka and Storm to store data in HDFS, and performed real-time analysis on the incoming data.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL/Teradata.
  • Built a Kafka REST API to collect events from the front end.
  • Wrote a Storm topology to accept events from the Kafka producer and emit them into Cassandra.
  • Designed and deployed a Hadoop cluster and different big data analytic tools including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Built real time pipeline for streaming data using Kafka and Spark Streaming.
  • Performed data analysis using Pig, MapReduce and Hive.
  • Designed and developed the data ingestion component.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Provided cluster coordination services through ZooKeeper.
  • Imported data from Oracle to HDFS using Sqoop.
  • Worked with Agile Scrum methodologies.
  • Imported and exported data between HDFS and the Teradata relational database using Sqoop.
  • Developed a POC on Apache Spark and Kafka.
  • Worked on the implementation of a toolkit that abstracted Solr and Elasticsearch.
  • Implemented Flume, Spark and Spark Streaming frameworks for real-time data processing.
  • Hands-on experience in installing, configuring and using ecosystem components like Hadoop MapReduce, HDFS, HBase, Pig, Flume, Hive and Sqoop.
  • Developed an analytical component using Scala, Spark and Spark Streaming.

Environment: Java, Scala, Python, J2EE, Cloudera, Hadoop, Kafka, Spark, Elasticsearch, Cassandra, HBase, Hive, Agile, Pig, Sqoop, MySQL, Teradata, GitHub

Confidential, Charlotte, NC

Hadoop/Spark Developer


  • Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Developed and executed shell scripts to automate the jobs
  • Wrote complex Hive queries and UDFs.
  • Worked on reading multiple data formats on HDFS using PySpark
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL/Teradata.
  • Worked on developing applications with Hadoop big data technologies: Pig, Hive, MapReduce, Oozie, Flume and Kafka.
  • Involved in loading data from UNIX file system to HDFS
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts.
  • Built complaint/claims data visualization and flexible search using Elasticsearch and Kibana.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, Spark and loaded data into HDFS.
  • Hands-on experience with cloud services like Amazon Web Services (AWS).
  • Followed agile methodology; interacted directly with the client to give and take feedback on features, suggested and implemented optimal solutions, and tailored the application to customer needs.
  • Managed and reviewed Hadoop log files.
  • Involved in analysis, design, testing phases and responsible for documenting technical specifications
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • As Scrum Master for the Shared Risk Platform team within the Operational Risk technology area, established the Scrum process, coached the team on Agile principles, values and practices, introduced JIRA as the Agile tool and led the team as a servant leader to deliver multiple increments.
  • Worked on the core and Spark SQL modules of Spark extensively.
  • Experienced in running Hadoop streaming jobs to process terabytes of data.
  • Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily imports.

Environment: Hadoop, HDFS, Hive, Python, Scala, Spark, Elasticsearch, SQL, Agile, Teradata, UNIX Shell Scripting, Cloudera.

Confidential, Bellevue, WA

Hadoop Developer


  • Installed and configured Hadoop on a cluster.
  • Developed multiple MapReduce Jobs in java for data cleaning and pre-processing
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Analyzed large data sets by running Hive queries and Pig scripts
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Acted as Scrum Master for Product teams with a focus on guiding the teams towards improving the way they work.
  • Experienced in defining job flows using Oozie
  • Experienced in managing and reviewing Hadoop log files
  • Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Responsible for managing data coming from different sources and applications.
  • Working knowledge of NoSQL databases like HBase and Cassandra, and of Apache Mahout.
  • Good Knowledge of analyzing data in HBase using Hive and Pig.
  • Involved in Unit level and Integration level testing.
  • Prepared design documents and functional documents.
  • Added extra nodes to the cluster, based on requirements, to make it scalable.
  • Involved in running Hadoop jobs for processing millions of records of text data
  • Involved in loading data from the local file system (Linux) to HDFS.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Responsible for managing data coming from different sources.
  • Assisted in exporting analyzed data to relational databases using Sqoop
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing
  • Submitted a detailed weekly report on daily activities.

Environment: Hadoop-HDFS, Pig, Sqoop, HBase, Hive, Flume, MapReduce, Cassandra, Oozie and MySQL

Confidential, Madison, WI

Hadoop Developer


  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Cassandra, Mahout, Zookeeper and Sqoop.
  • Involved with Business Analysts in gathering requirements.
  • Involved in designing Logical/Physical Data Models.
  • Deployed Hadoop Cluster in Pseudo-distributed and Fully Distributed modes.
  • Involved in running ad hoc queries using Pig Latin, Hive or Java MapReduce.
  • Created complex mappings using different transformations like Filter, Router, Connected & Unconnected lookups, Stored Procedure, Joiner, Update Strategy, Union, Expression and Aggregator transformations to pipeline data to DataMart. Also, made use of variables and parameters.
  • Developed PowerCenter mappings to extract data from various databases and flat files and load it into the DataMart using Informatica 8.6.1.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Involved in Big data analysis using Pig and User defined functions (UDF).
  • Managed and scheduled Jobs on a Hadoop cluster.
  • Involved in log file management, where logs more than 7 days old were removed from the log folder, loaded into HDFS and stored for 3 months.
  • Implemented NameNode backup using NFS for high availability.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Set up standards and processes for Hadoop based application design and implementation.
  • Created Hive external tables, loaded data into them and queried the data using HQL.
  • Implemented various Performance Tuning techniques on Sources, Targets, Mappings, and Workflows.
  • Wrote UNIX shell scripts to execute the workflow in a loop to process 'n' files, and FTP scripts to pull files from the FTP server to the Linux server.
  • Developed and followed an agile project management plan (Agile ceremonies). Facilitated building the requirements log (product backlog) with cost estimates and priorities.
  • Conducted daily Scrum standup, product backlog, sprint planning, sprint review and sprint retrospective meetings.
  • Determined the team capacity (velocity) from historical data. Created the work breakdown structure (user stories) and corresponding activities (tasks).
  • Worked on Hadoop Backup Recovery and Upgrade.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Worked with the reporting team to generate reports from the DataMart using Cognos.

Environment: Apache Hadoop, EDW, EDM, Informatica PowerCenter 8.6/8.1, SQL Server 2005, TOAD, Rapid SQL, Oracle 10g (RAC), HDFS, MapReduce, MongoDB, Java, VMware, Hive, Eclipse, Pig, HBase, Sqoop, Flume, Linux, UNIX, DB2.


Software Engineer


  • Involved in different phases of Software Development Lifecycle (SDLC) like Requirements gathering, Analysis, Design and Development of the application.
  • Wrote several Action Classes and Action Forms to capture user input and created different web pages using JSTL, JSP, HTML, Custom Tags and Struts Tags.
  • Designed and developed Message Flows, Message Sets and other service components to expose Mainframe applications to enterprise J2EE applications.
  • Used standard data access technologies like JDBC and ORM tools like Hibernate.
  • Worked on various client websites that used Struts 1 framework and Hibernate
  • Wrote test cases using JUnit testing framework and configured applications on WebLogic Server
  • Involved in writing stored procedures, views, user-defined functions and triggers in SQL Server database for Reports module.

Environment: Java, JSP, JUnit, Eclipse, JIRA, JDBC, Struts 1, Hibernate, Visual Source Safe (VSS), WebLogic, Oracle 9i.


Java Developer


  • Developed Web interface using JSP, Standard Tag Libraries (JSTL), and Struts Framework.
  • Used Struts as MVC framework for designing the complete Web tier.
  • Developed GUI screens as JSPs using HTML, DHTML and CSS, designing the pages according to Client Experience Workbench Standards.
  • Validated the user input using Struts Validation Framework.
  • Client side validations were implemented using JavaScript.
  • Implemented the mechanism of logging and debugging with Log4j.
  • Maintained version control of the code and configuration files using CVS.
  • Developed PL/SQL packages and triggers.
  • Developed test cases for Unit testing and performed integration and system testing.

Environment: J2EE, WebLogic, Eclipse, Struts 1.0, JDBC, JavaScript, CSS, XML, ANT, Log4J, VSS, PL/SQL and Oracle 8i.
