
Sr. Hadoop Developer Resume

Philadelphia, PA


  • IT professional with 8+ years of verifiable experience with distributed storage systems such as HDFS and HBase in Big Data environments.
  • Excellent understanding of the complexities associated with Big Data, with expertise in developing modules and code in MapReduce, Hive, Pig, Sqoop, Apache Flume and Apache Spark to address those complexities.
  • Highly skilled in the analysis, design and development of Big Data solutions in Scala, Spark, Hadoop, Pig and HDFS environments, with experience in Java and J2EE.
  • Experience in using HCatalog for Hive, Pig and HBase. Experienced with NoSQL databases such as HBase and Cassandra.
  • Good experience in installing, configuring and administering Hadoop clusters on the major Hadoop distributions, Hortonworks and Cloudera.
  • Strong work experience with Kafka streaming to fetch data in real time or near real time.
  • Expert in data processing tasks such as collecting, aggregating and moving data from various sources using Kafka.
  • Good experience in developing solutions to analyze large datasets efficiently.
  • Experience in setting up clusters on Amazon EC2 and S3, including automating the setup and extension of clusters in the AWS cloud.
  • Familiar with various relational databases, including MS SQL Server and Teradata.
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Impala, Hadoop MapReduce and Pig jobs.
  • Hands-on experience in importing and exporting data using the Hadoop data management tool Sqoop.
  • Good experience with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service) and setting up EMR (Elastic MapReduce).
  • Comprehensive knowledge of debugging, optimizing and performance tuning of DB2, Oracle and MySQL databases.


Languages & Hadoop Components: HDFS, Sqoop, Flume, Hive, Pig, MapReduce, YARN, Oozie, Kafka, Spark, Impala, Storm, Hue, Zookeeper, Java, SQL.

Big Data Platforms: Hortonworks, Cloudera, Amazon

Databases & NoSQL Databases: Oracle, MySQL, Microsoft SQL Server, HBase and Cassandra

Operating Systems: Linux, UNIX, Windows

Development Methodologies: Agile/Scrum, Waterfall

IDEs & Tools: Eclipse, NetBeans, GitHub, Jenkins, Maven, IntelliJ, Ambari

Programming Languages: C, C++, JSE, XML, JSP/Servlets, Spring, HTML, JavaScript, jQuery, Web services, Python, Scala, PL/SQL & Shell Scripting


Confidential - Philadelphia, PA

Sr. Hadoop Developer


  • Applied transformations to data loaded into Spark DataFrames and performed in-memory computation to generate the output response.
  • Developed multiple POCs using Spark with Scala, deployed them on the YARN cluster and compared the performance of Spark with that of Hive and SQL.
  • Migrated MapReduce programs to Spark transformations using Scala; the initial implementation was done in Python (PySpark).
  • Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries and writing data back into the OLTP system through Sqoop.
  • Used Hive to analyze the partitioned data and compute various metrics for reporting.
  • Imported data from different sources such as HDFS into Spark DataFrames.
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
  • Worked extensively with SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Reduced the latency of Spark jobs by tuning Spark configurations and applying other performance optimization techniques.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Pig.
  • Used Hive and Spark SQL connections to generate Tableau BI reports.
  • Created partitions and buckets based on state for further processing using bucket-based Hive joins.
  • Created Hive generic UDFs to process business logic that varies by policy.
  • Developed various data connections from data sources to SSIS and Tableau Server for report and dashboard development.
  • Developed solutions utilizing Hadoop ecosystem components such as Hadoop, Spark, Hive, HBase, Pig, Sqoop, Oozie, Ambari and ZooKeeper.
  • Wrote MapReduce programs with the Java API to cleanse structured and unstructured data.
  • Loaded data from MySQL and Teradata into HBase, where necessary, using Sqoop.

Environment: Scala, Spark, Kafka, Hive, Hortonworks, Oozie, Play framework, Akka, Git, Elasticsearch, Logstash, Kibana, Kerberos


Sr. Hadoop Developer/Admin


  • Installed, configured, upgraded, and applied patches and bug fixes for Prod, Lab and Dev Servers.
  • Installed, configured and administered HDFS, Hive, Ranger, Pig, HBase, Oozie, Sqoop, Spark and YARN.
  • Involved in upgrading Cloudera Manager from 5.5 to 5.6.
  • Involved in capacity planning, load balancing and design of Hadoop clusters.
  • Involved in setting up alerts in Cloudera Manager for monitoring the health and performance of Hadoop clusters.
  • Involved in installing and configuring security authentication using Kerberos.
  • Created and dropped users, and granted and revoked permissions for users and policies as required, using Ranger.
  • Commissioned and decommissioned data nodes in the cluster.
  • Wrote and modified UNIX shell scripts to manage HDP environments.
  • Involved in installing and configuring Apache Flume, Hive, Sqoop and Oozie on the Hadoop cluster.
  • Created directories and set up appropriate permissions for different applications and users.
  • Backed up HBase tables to HDFS using the export utility.
  • Involved in creating users and user groups, assigning roles to users and creating their home directories.
  • Installed, configured and administered HDP on Red Hat Enterprise Linux 6.6.
  • Used Sqoop to import data into HDFS from Oracle database.
  • Performed detailed analysis of system and application architecture components per functional requirements.
  • Reviewed and monitored system and instance resources to ensure continuous operations (i.e., database storage, memory, CPU, network usage, and I/O contention).
  • Provided 24x7 on-call support for production job failures and resolved issues in a timely manner.
  • Developed UNIX scripts for scheduling the delta loads and master loads using the AutoSys scheduler.
  • Troubleshot problems with the databases, applications and development tools.

Environment: Hadoop, HDFS, Hive, Cloudera Manager, Sqoop, Flume, Oozie, CDH5, MongoDB, Cassandra, HBase, Hue, Kerberos and Unix/Linux

Confidential - Dallas, TX

Java/Hadoop Developer


  • Imported all fact and dimension tables from SQL Server into Hadoop using Sqoop.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
  • Involved in extracting the customer’s Big Data from various data sources into HDFS, including data from mainframes, databases and server logs.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Developed Tableau workbooks from multiple data sources using Data Blending.
  • Involved in managing and reviewing Hadoop log files.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Created Hive tables as required, either managed or external, defined with appropriate static and dynamic partitions for efficiency.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Developed Python UDFs in Pig and Hive.
  • Used Apache Kafka to gather log data and feed it into HDFS.
  • Ingested data using Sqoop from various sources such as Informatica and Oracle.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
  • Installed and configured various components of Hadoop ecosystem and maintained their integrity.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Developed Java MapReduce programs to encapsulate transformations.
  • Participated in performance tuning at the database, transformation and job levels.

Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, HBase, CDH4, Cloudera Manager, MySQL, Eclipse

Confidential - Roseville, CA

Hadoop Developer


  • Responsible for building data solutions in Hadoop using Cascading frameworks.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Worked hands-on with the ETL process.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Upgraded the Hadoop cluster from CDH3 to CDH4 and integrated Hive with existing applications.
  • Configured Ethernet bonding for all Nodes to double the network bandwidth.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
  • Used Python and shell scripts to automate the end-to-end ELT process.
  • Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Performed data quality checks on data as per the business requirement.
  • Performed data validation on the target table by comparing it with the source table.
  • Achieved high throughput and low latency for ingestion jobs by leveraging Sqoop.
  • Transformed the raw data and loaded it into stage and target tables.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Teradata, Cloudera Manager, Pig, Sqoop, Oozie, Python


Java/J2EE Developer


  • Involved in designing and developing modules at both Client and Server Side.
  • Developed the UI using JSP, JavaScript and HTML.
  • Responsible for validating the data at the client side using JavaScript.
  • Interacted with external services to get user information using SOAP web service calls.
  • Developed web components using JSP, Servlets and JDBC.
  • Performed technical analysis, design, development and documentation with a focus on implementation and agile development.
  • Developed a web-based reporting system with JSP, DAO and the Apache Struts Validator using the Struts framework.
  • Designed the controller using Servlets.
  • Accessed the backend Oracle database using JDBC.
  • Developed UNIX shell scripts to automate various tasks.
  • Developed user and technical documentation.
  • Developed business objects, request handlers and JSPs for this project using Java Servlets and XML.
  • Developed core Spring components for some of the modules and integrated them with the existing Struts framework.
  • Actively participated in testing and designed the user interface using HTML and JSPs.
  • Implemented the database connectivity to Oracle using JDBC, and designed and created tables using SQL.
  • Implemented the server-side processing using Java Servlets.
  • Installed and configured the Apache web server and deployed JSPs and Servlets on the Tomcat server.

Environment: Java, Servlets, JSP, JavaScript, JDBC, Unix Shell scripting, HTML, Eclipse, Oracle 8i, WebLogic.
