Hadoop/Spark Developer Resume

Chicago, IL


  • Around 6 years of professional IT experience in analysis, development, integration, and maintenance of web-based and client/server applications using Java and Big Data technologies.
  • 5 years of relevant experience in Hadoop Ecosystem and architecture (HDFS, Apache Spark, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie).
  • Experience in all phases of the software development life cycle (SDLC), including user interaction, business analysis/modelling, design/architecture, development, implementation, integration, documentation, testing, and deployment.
  • Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, Hue, JSON.
  • Experience reading data from the file system into an Apache Spark RDD.
  • Good understanding of real-time data processing using Apache Spark.
  • Ingested data using Sqoop from various RDBMSs such as Oracle, MySQL, and Microsoft SQL Server into Hadoop HDFS.
  • Integration of OBIEE, ODI, Tableau with Hive.
  • Experienced in WAMP (Windows, Apache, MySQL, and Python/PHP) and LAMP (Linux, Apache, MySQL, and Python/PHP) architecture.
  • Good experience developing web applications that implement the Model-View-Controller architecture using the Django, Flask, Pyramid, and Zope Python web application frameworks.
  • Experience implementing open-source frameworks such as Spring, Hibernate, and Web Services.
  • Experience in Continuous Integration and Continuous Deployment using tools such as Jenkins.
  • Experience streaming data to clusters through Kafka and Apache Spark Streaming.
  • Experience with databases such as Oracle 9i, PostgreSQL, and MySQL Server with cluster setup, and with writing SQL queries, triggers, and stored procedures.
  • Very good understanding and working knowledge of Object-Oriented Programming (OOP), Python, and Scala.
  • Experienced in using Apache Spark to improve the performance of and optimize existing Hadoop algorithms, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Proficient in working with NoSQL databases such as MongoDB, Cassandra, and HBase.
  • Good knowledge of the NoSQL database HBase (column-family DB).
  • Good knowledge of Hadoop MRv1 and Hadoop MRv2 (YARN) architecture.
  • Communicated with diverse client communities, offshore and onshore, with a focus on client satisfaction and quality outcomes. Extensive experience coordinating offshore development activities.
  • Highly organized and dedicated, with a positive attitude, good time management and organizational skills, and the ability to handle multiple tasks.
  • Experience working across multiple industries with Fortune 500 customers and government agencies.


Big Data components:

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Cassandra, Hive, Scala, Sqoop, Oozie, Kettle, Kafka, Zookeeper, MongoDB

Programming Languages: Java (J2SE, J2EE), C, C#, PL/SQL, Swift, SQL+, ASP.NET, JDBC, Python

Mobile Development: Android, iOS application development with Swift, Objective-C

Web Development: JavaScript, jQuery, HTML 5.0, CSS 3.0, AJAX, JSON

Testing: JUnit, HP Unified Functional Testing, HP Performance Center, Selenium, WinRunner, LoadRunner, QTP

UNIX Tools: Apache, Yum, RPM

Operating Systems: Windows, Linux, Ubuntu, Mac OS, Red Hat Linux

Protocols: TCP/IP, HTTP and HTTPS

Web Servers: Apache Tomcat

Cluster Management Tools: Cloudera Manager, Hortonworks, Ambari

Methodologies: Agile, V-model, Waterfall model

Databases: HBase, MongoDB, Cassandra, Oracle 10g, MySQL, Couch, MS SQL Server

Encryption Tools: VeraCrypt, AxCrypt, BitLocker, GNU Privacy Guard


Hadoop/Spark Developer

Confidential - Chicago, IL


  • The main aim of the project was to tune the performance of the existing Hive queries.
  • Implemented Spark applications in Scala and Java, using DataFrames and the Spark SQL API for faster data processing.
  • Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization activities according to requirements.
  • Developed data pipeline using Spark, Hive and Sqoop, to ingest, transform and analyze operational data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for large volumes of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Analyzed the SQL Scripts and designed the solution to implement using Scala.
  • Streamed data in real time using Spark with Kafka.
  • Created Kafka-based applications that monitor consumer lag within Apache Kafka clusters; these are used in production by multiple companies.
  • Developed Python scripts to copy large amounts of data between clusters quickly.
  • Ingested syslog messages, parsed them, and streamed the data to Apache Kafka.
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Analyzed the data by running Hive queries (HiveQL) to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Created HBase tables and column families to store the user event data.
  • Scheduled and executed workflows in Oozie to run Hive jobs.

Environment: Hadoop, Hortonworks, HDFS, Hive, Impala, Spark, Kafka, Sqoop, Pig, Java, Scala, Eclipse, Teradata, UNIX, Maven, Azure, analytic tools.

Hadoop/Spark/Scala Developer

Confidential - Cuyahoga Falls, OH


  • Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Created end-to-end Apache Spark-Solr applications using Scala to perform data cleansing, validation, transformation, and summarization activities according to requirements.
  • Used Flume, Sqoop, Hadoop, Apache Spark, and Oozie to build data pipelines.
  • Good knowledge of the Apache Spark ecosystem and Apache Spark architecture.
  • Provided cluster coordination services through ZooKeeper.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Automated all jobs that pull data from the FTP server and load it into Hive tables, using Oozie workflows.
  • Implemented Apache Spark applications using Scala and Spark SQL for faster testing and processing of data.
  • Resolved performance issues in Hive and Pig scripts through an understanding of how joins, grouping, and aggregation translate to MapReduce jobs.
  • Involved in loading data from the on-premises data warehouse to the AWS cloud using approaches such as Sqoop, Spark, Storm, and AWS services.
  • Experience bootstrapping and maintaining AWS using Chef on complex hybrid IT infrastructure nodes through VPN and jump servers.
  • Contributed to developing a data pipeline to load data from sources such as web, RDBMS, and NoSQL into an Apache Kafka or Apache Spark cluster.
  • Migrated data from Apache Spark RDDs into HDFS and NoSQL stores such as Cassandra and HBase.
  • Worked on reading multiple data formats on HDFS using PySpark.
  • Expertise in developing web-based open stack applications using Python and Django for large dataset analysis.
  • Wrote Pig Latin scripts to manipulate the ETL process and aggregate data in Hortonworks.
  • Created an ODBC connection through Sqoop between Hortonworks and SQL Server.
  • Configured Flume to stream live Twitter data into Hortonworks nodes.
  • Strong programming skills in the development and implementation of multi-layer applications using Java, Java 8, J2EE, JDBC, SQL, JSP, HTML, Struts, Spring, JavaScript, Velocity, XML, JAXB, Play, Akka, and Cassandra.
  • Developed REST APIs using Java, the Play framework, and Akka.
  • Developed Kafka producers and consumers, HBase clients, and Apache Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Worked extensively on the Core and Spark SQL modules of Apache Spark.

Environment: Hadoop, HDFS, Hive, Scala, Apache Spark, SQL, Teradata, UNIX Shell Scripting, Big Data, MapReduce, Akka, Sqoop, Oozie, Pig, ZooKeeper, Flume, Linux, Java, Eclipse, Python 2.7

Hadoop/Scala Developer



  • Create, validate and maintain scripts to load data using Sqoop manually.
  • Create Oozie workflows and coordinators to automate Sqoop jobs weekly and monthly.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Apache Spark transformations using Apache Spark RDDs and Scala.
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Apache Spark with Hive and SQL/Teradata.
  • Sound knowledge in Spring, Hibernate, RDBMS, Web, XML, Ant/Maven, HTML, Shell Scripting, Python.
  • Analyzed the SQL scripts and designed the solution to implement using Scala
  • Develop, validate, and maintain HiveQL queries.
  • Fetch data to and from HBase using MapReduce jobs.
  • Planned future state to include an Azure Machine Learning decision tree to automatically categorize and analyze data more quickly and add multiple data attributes.
  • A strong fan of NASCAR racing, using Scala, Bash, tmux, Hive, Zeppelin, Apache Spark, Hadoop HDFS, Cloudera, IntelliJ, SBT.
  • Designed Hive tables to load data to and from external tables.
  • Run executive reports using Hive and QlikView.
  • Load and transform large sets of unstructured data from the UNIX system to HDFS.
  • Use Apache Sqoop to dump user data into HDFS on a weekly basis.
  • Created production jobs using Oozie workflows that integrated different actions such as MapReduce, Sqoop, and Hive.
  • Used the Scala collections framework to store and process complex employer information. Based on the offers set up for each client, requests were post-processed and assigned offers.
  • Successfully migrated Django database from SQLite to MySQL with complete data integrity.
  • Involved in developing a linear regression model, using Apache Spark with the Scala API, to predict a continuous measurement and improve observations on wind turbine data.
  • Good knowledge of writing Apache Spark applications using Python and Scala.
  • Rewrite existing Python/Django modules to deliver data in a specified format.
  • Upgraded Python 2.3 to Python 2.5 on a RHEL 4 server, which required recompiling mod_python to use Python 2.5. This upgrade was necessary because inlined models with UTF-8
  • Used the API of Apache Giraph, a project that implements graph algorithms and runs on Hadoop, for large-scale graph processing.

Environment: Hadoop Hortonworks, Hadoop Stack (Hive, Pig, HCatalog, Sqoop, Oozie), QlikView, Windows 8, SQL Server 2010, Bitbucket, Scala, Python Django, Unix, Apache Spark MLlib, Spark SQL, Spark Streaming, RDD

Hadoop Developer

Confidential - IN


  • Create, validate and maintain scripts to load data from and into tables in Oracle PL/SQL and in SQL Server 2008 R2.
  • Wrote stored procedures and triggers.
  • Converted, tested, and validated Oracle scripts for SQL Server.
  • Developed Kafka producer and consumers, HBase clients, Apache Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS.
  • Used Solr for database integration from IBM Maximo to SQL Server.
  • Upgraded IBM Maximo database from 5.2 to 7.5.
  • Analyze, validate and document the changed records for IBM Maximo web application.
  • Imported data from a MySQL database into Hive using Sqoop.
  • Wrote MapReduce jobs.
  • Developed, validated, and maintained HiveQL queries.
  • Ran reports using Pig and Hive queries.
  • Wrote and implemented Apache Pig scripts to load data from and store data into Hive.
  • Installed and configured Hue.
  • Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as IBM uDeploy, Puppet, or custom-built tooling; designed cloud-hosted solutions, with specific AWS product suite experience.
  • Used JUnit for unit testing.
  • Conduct data mining, data modelling, statistical analysis, business intelligence gathering, trending and benchmarking by using Datameer.
  • Used Tableau for visualization and to generate reports for financial data consolidation, reconciliation, and segmentation.
  • Designed and developed scripts to transfer files between servers using FTP/SFTP according to business requirements.
  • Implemented machine learning techniques such as clustering and regression in Tableau and created interactive dashboards.
  • Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4), and YARN distributions.
  • Supported the full testing cycle for ETL processes, including bug fixes. Used the data integration tool Pentaho to design ETL jobs while building data warehouses and data marts.
  • Performed upgrades, package administration and support for over 200 Linux servers.
  • Performed automated installation of CentOS operating system using kickstart.

Environment: HDFS, Hive, Pig, Sqoop, Zookeeper, Oozie, ETL, Pentaho BI 5.0.1, AWS, Tableau, Hive Query, CentOS, Cloudera
