Sr. Hadoop/Spark Developer Resume

New York City, NY


  • 7+ years of IT experience across the Software Development Life Cycle (analysis, design, development, testing, deployment and support) using Waterfall and Agile methodologies, including 3+ years of data analysis experience using Hadoop ecosystem components (Spark, HDFS, MapReduce, Pig, Sqoop, Kafka, Hive, Cassandra and HBase) in the financial, retail and healthcare sectors.
  • Experience in Hadoop components like HDFS, MapReduce, JobTracker, NameNode, DataNode, TaskTracker and Apache Spark.
  • Experience using Sqoop to import data from existing relational databases (Oracle, MySQL and Teradata) that provide SQL interfaces.
  • Hands-on experience with Avro, Parquet and RC file formats, and with combiners, counters, dynamic partitions and bucketing for performance improvement.
  • Experience in developing MapReduce programs using the Java API and using Hive and Pig to perform data analysis, data cleaning and data transformation.
  • Designed Hive queries and Pig scripts to perform data analysis, data transfer and table design to load data into the Hadoop environment.
  • Expertise in writing Hive UDFs and Generic UDFs to incorporate complex business logic into Hive queries.
  • Implemented Sqoop scripts for large dataset transfer between Hadoop and RDBMS.
  • Expertise in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Worked on different file formats (ORCFILE, TEXTFILE) and different compression codecs (GZIP, SNAPPY, BZIP).
  • Experience in composing shell scripts to dump the shared information from MySQL servers to HDFS.
  • Performed ETL operations using Pig to join, clean, aggregate and analyze data.
  • Worked with Maven for the build process.
  • Extensive experience importing and exporting data using stream processing platforms like Flume and Kafka.
  • Experience with ZooKeeper and the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Knowledge of creating Impala views on top of Hive tables for faster access when analyzing data.
  • Integrated BI tools like Tableau with Impala and analyzed the data.
  • Experience with NoSQL databases like HBase, Cassandra and MongoDB.
  • Hands-on experience designing and developing Spark applications in Scala to compare the performance of Spark with Hive.
  • Experience collecting log data from different sources (web servers and social media) using Flume and Kafka and storing it in HDFS for MapReduce jobs.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data.
  • Exposure to working with DataFrames in Spark.
  • Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to an output directory in HDFS.
  • Profound experience in working with Cloudera (CDH4 & CDH5), Hortonworks and Amazon EMR Hadoop distributions on multi-node clusters.
  • Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR and Amazon Elastic Compute Cloud (Amazon EC2).
  • Knowledge of creating dashboards with business intelligence tools such as Tableau.
  • Very good understanding and working knowledge of Object Oriented Programming (OOP), J2SE, multithreading in Core Java, HTML, Servlets, JSP and JDBC.
  • Experience in working with different relational databases like MySQL and Oracle.
  • Working knowledge in database design, writing complex SQL Queries and Stored Procedures.
  • Capable of using AWS utilities such as EMR, S3 and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
  • Knowledge of using PyCharm and the Python shell to develop Spark-based applications in Python.
  • Expertise in various phases of software development, including analysis, design, development and deployment of applications using Servlets, JSP, Java Beans, Struts, Spring Framework and JDBC.
  • Experience with development environments like Eclipse and NetBeans.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Good analytical, communication and problem-solving skills; enjoy learning new technical and functional skills.


Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, Hue, Oozie, ZooKeeper, Solr, Apache Spark, Apache Storm, Apache Kafka, Sqoop, Flume.

NoSQL Databases: HBase, Cassandra, and MongoDB

Hadoop Distributions: Cloudera, Hortonworks

Programming Languages: Java, C, Scala, Pig Latin, HiveQL.

Scripting Languages: Shell Scripting

Databases: MySQL, Oracle, Teradata, DB2

Build Tools: Maven, Ant, sbt

Reporting Tool: Tableau

Version control Tools: SVN, Git, GitHub

Cloud: AWS, Azure

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

Web Design Tools: HTML, AJAX, JavaScript, JQuery, CSS and JSON.

Operating Systems: Windows 10/8/Vista/XP

Development IDEs: NetBeans, Eclipse IDE, Python (IDLE)

Packages: Microsoft Office, PuTTY, MS Visual Studio


Confidential, New York City, NY

Sr. Hadoop/Spark Developer


  • Developed a data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Developed Sqoop scripts for importing and exporting data into HDFS and Hive.
  • Developed design documents considering all possible approaches and identifying the best of them.
  • Responsible for managing data coming from different sources.
  • Developed business logic using Scala.
  • Responsible for loading data from UNIX file systems to HDFS.
  • Involved in creating Hive tables, loading them with data and writing Hive queries which invoke and run MapReduce jobs in the backend.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Developed scripts to automate end-to-end data management and synchronization between all the clusters.
  • Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Developed functional programs in Scala for connecting the streaming data application and gathering web data.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Configured the connection between Hive and Tableau using Impala for BI development.
  • Worked in Agile methodology and used JIRA to maintain the project's stories.
  • Experience automating database activities with Unix shell scripts.
  • Working experience with Linux distributions like Red Hat and CentOS.
  • Good analytical, communication and problem-solving skills; enjoy learning new technical and functional skills.

Environment: Hadoop, MapReduce, Hive, Java, Maven, Impala, Pig, Spark, Oozie, Oracle, YARN, GitHub, Tableau, Unix, Cloudera, Kafka, Sqoop, Scala, HBase.

Confidential, Rocky Hills, CT

Hadoop Developer


  • Created Hive queries for extracting data and sending it to clients.
  • Created Scala programs to develop reports for business users.
  • Created Hive UDFs in Scala for formatting data.
  • Performed distributed programming with Spark, specifically in Scala.
  • Performed transformation and analysis in Hive/Pig and parsed raw data using MapReduce and Spark.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Worked on capturing transactional changes in the data using MapReduce and HBase.
  • Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Spark, Sqoop and Pig Latin.
  • Familiar with AWS components like EC2 and S3.
  • Worked with Sqoop import and export functionality to handle large data set transfers between a DB2 database and HDFS.
  • Worked on ingesting data from different sources.
  • Supported multiple application extracts coming out of Big Data Platform.
  • Followed agile methodology during project delivery.
  • Knowledge of CodeHub and Git.
  • Worked and coordinated with the offshore team to complete tasks.
  • Understanding of the ServiceNow tool for submitting change requests and incidents for application deployments.

Environment: MapR, Hive, Pig, Spark, Scala, MapReduce, UNIX scripting, HBase, Talend.

Confidential, Cranston, RI

Hadoop Developer


  • Implemented technical architecture and developed various Big Data workflows using custom MapReduce, Pig, Hive, Cassandra and Sqoop.
  • Deployed an on-premise cluster and tuned it for optimal performance when executing jobs and processing large data sets.
  • Built reusable Hive UDF libraries for business requirements, which enabled various business analysts to use these UDFs in Hive queries.
  • Used Kafka to dump the application server logs into HDFS.
  • Analyzed the logs stored on HDFS and imported the cleaned data into the Hive warehouse, which enabled business analysts to write Hive queries.
  • Configured various big data workflows to run on top of Hadoop using Oozie; these workflows comprise heterogeneous jobs like Pig, Hive, Sqoop and MapReduce.
  • Worked with the NoSQL database HBase for real-time data analytics.
  • Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
  • Assigned the tasks of resolving defects found in testing the new application and existing applications.
  • Analyzed requirements and designed and developed solutions.
  • Managed the project team in achieving project goals, including resource allocation, resolving technical issues and mentoring team members.
  • Used a Linux (Ubuntu) machine for designing, developing and deploying Java modules.

Environment: MapReduce, Pig, Hive, Sqoop, Kafka, Flume, HBase, JDK 1.6, Maven, Linux

Confidential - Rolling Meadows, IL

Hadoop Developer


  • Designed docs and specs for the near real-time data analytics using Hadoop and HBase.
  • Installed Cloudera Manager on the clusters.
  • Used a 15-node cluster with Cloudera Hadoop distribution on Amazon EC2.
  • Developed ad-click-based data analytics for keyword analysis and insights.
  • Crawled public Facebook posts and tweets.
  • Used Flume and Kafka to get the streaming data from Twitter and Facebook.
  • Worked on MapReduce jobs with the Data Science team to analyze this data.
  • Converted output to structured data and imported it into Tableau with the analytics team.
  • Defined problems to identify the right data and analyzed results to make room for new projects.

Environment: Hadoop, HBase, HDFS, MapReduce, Flume, Java, Tableau, Cloudera Manager, Amazon EC2.


Java Developer


  • Interacted with the business team for detailed specifications on requirements and issue resolution.
  • Developed user interfaces using HTML, XML, CSS, JSP, JavaScript and Struts Tag Libraries and defined common page layouts using custom tags.
  • Developed client-side validations using JavaScript.
  • Implemented Struts MVC Paradigm components such as Action Mapping, Action class, Action Form, Validation Framework, Struts Tiles and Struts Tag Libraries.
  • Involved in the development of the front end of the application using Struts framework and interaction with controller java classes.
  • Provided development support for System Testing, User Acceptance Testing and Production and deployed application on JBoss Application Server.
  • Wrote and executed efficient SQL queries (CRUD operations), JOINs on multiple tables, to create and test sample test data in Oracle Database using Oracle SQL Developer.
  • Used CVS for check-in, check-out of files to control versions of files.
  • Used Eclipse as an IDE.
  • Used HP Quality Center to track activities and defects.
  • Implemented logging with Log4j.
  • Used Maven to compile and build the project.
  • Developed style sheets to make the pages dynamic; extensively involved in unit and system testing using JUnit and in critical bug fixing.
  • Utilized the base UML methodologies and Use cases modeled by architects to develop the front-end interface. The class, sequence and state diagrams were developed using Visio.

Environment: Java, Struts 1.2, Hibernate 3.0, JSP, JavaScript, HTML, XML, Oracle, Eclipse, JBoss Application Server, ANT, CVS, and SQL Developer.


SQL Developer


  • Involved in installation and configuration of SQL Server 2005 on database servers.
  • Developed database objects like Tables, Views, User-defined Functions and Triggers to handle complex business rules, history data and audit analysis.
  • Worked with complex T-SQL queries, subqueries, correlated subqueries and joins to fetch data as per the functional requirements.
  • Used Common Table Expressions for hierarchical data and complex stored procedures.
  • Created various integrity constraints like Primary Key, Foreign Keys, Unique and Check to support application functionality.
  • Worked with the command shell to invoke executables in SQL stored procedures.
  • Actively participated in gathering user requirements and system specifications.
  • Maintained User account administration for Different domains.
  • Involved in creating SQL reports and generating emails through DB Mail.
  • Loaded data from Excel using OPENROWSET commands.
  • Created and maintained indexes for a fast and efficient reporting process.
  • Created SSIS packages to load data from flat files and DB2 using Lookup, Derived Column, Data Conversion and Conditional Split transformations.
  • Maintained the physical Databases by monitoring Performance, space utilization and physical integrity.
  • Generated reports as per requirements using SSRS.

Environment: MS SQL Server 2005, Microsoft Visual Studio 2005, SSIS, SSRS, DB2, Windows Server 2005, Performance Monitor and MS Office.