
Hadoop Developer Resume



  • Experienced Hadoop developer with a strong foundation in distributed storage systems such as HDFS and HBase in big data environments. Excellent understanding of the complexities associated with big data, with experience developing modules and code in MapReduce, Hive, Pig, Sqoop, Apache Flume and Apache Spark to address those complexities.
  • 8+ years of IT experience in analysis, design and development, including big data work in Scala, Spark, Hadoop, Pig and HDFS environments, plus experience in Java and J2EE.
  • Experience using HCatalog with Hive, Pig and HBase. Experienced with NoSQL databases such as HBase and Cassandra.
  • Good experience installing, configuring and administering Hadoop clusters on major Hadoop distributions (Hortonworks, Cloudera).
  • Strong work experience with Kafka streaming to fetch data in real time or near real time.
  • Experience in data processing such as collecting, aggregating and moving data from various sources using Kafka.
  • Good experience developing solutions to analyze large datasets efficiently.
  • Experience setting up clusters on Amazon EC2 and S3, including automating the setup and extension of clusters in the AWS cloud.
  • Familiar with various relational databases, including MS SQL and Teradata.
  • Experienced with the Oozie workflow engine in running workflow jobs with actions that run Impala, Hadoop MapReduce and Pig jobs.
  • Hands-on experience importing and exporting data using the Hadoop data management tool Sqoop.
  • Experienced working on EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service) and setting up EMR (Elastic MapReduce).
  • Comprehensive knowledge of debugging, optimizing and performance tuning DB2, Oracle and MySQL databases.


Languages & Hadoop Components: HDFS, Sqoop, Flume, Hive, Pig, MapReduce, YARN, Oozie, Kafka, Spark, Impala, Storm, Hue, Zookeeper, Java, SQL.

Big Data Platforms: Hortonworks, Cloudera, Amazon

Databases & NoSQL Databases: Oracle, MySQL, Microsoft SQL Server, HBase and Cassandra.

Operating Systems: Linux, UNIX, Windows.

Development Methodologies: Agile/Scrum, Waterfall.

IDEs & Tools: Eclipse, NetBeans, GitHub, Jenkins, Maven, IntelliJ, Ambari.

Programming Languages: C, C++, JSE, XML, JSP/Servlets, Spring, HTML, JavaScript, jQuery, Web services, Python, Scala, PL/SQL & Shell Scripting.


Confidential, Philadelphia

Hadoop Developer


  • Applied transformations to data loaded into Spark DataFrames and performed in-memory computation to generate the output response.
  • Developed multiple POCs using Spark with Scala, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala; the initial work was done in Python (PySpark).
  • Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Used Hive to analyze the partitioned data and compute various metrics for reporting.
  • Imported data from different sources, such as HDFS, into Spark DataFrames.
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
  • Experienced with SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Reduced the latency of Spark jobs by tuning Spark configurations and applying other performance optimization techniques.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Pig.
  • Used Hive and Spark SQL connections to generate Tableau BI reports.
  • Created partitions and buckets based on state for further processing with bucket-based Hive joins.
  • Created Hive generic UDFs to process business logic that varies by policy.
  • Developed various data connections from data sources to SSIS and Tableau Server for report and dashboard development.
  • Developed solutions using the Hadoop ecosystem, including Hadoop, Spark, Hive, HBase, Pig, Sqoop, Oozie, Ambari and ZooKeeper.
  • Wrote MapReduce programs with the Java API to cleanse structured and unstructured data.
  • Experience with RDBMSs such as Oracle and Teradata.
  • Loaded data from MySQL and Teradata into HBase where necessary using Sqoop.

Environment: Scala, Spark, Kafka, Hive, Hortonworks, Oozie, Play Framework, Akka, Git, Elasticsearch, Logstash, Kibana, Kerberos.

Confidential, Kokomo

Hadoop Developer/Admin


  • Installed, configured, upgraded, and applied patches and bug fixes for Prod, Lab and Dev servers.
  • Installed, configured and administered HDFS, Hive, Ranger, Pig, HBase, Oozie, Sqoop, Spark and YARN.
  • Involved in upgrading Cloudera Manager from 5.5 to 5.6.
  • Involved in capacity planning, load balancing and design of Hadoop clusters.
  • Involved in setting up alerts in Cloudera Manager to monitor the health and performance of Hadoop clusters.
  • Involved in installing and configuring security authentication using Kerberos.
  • Created and dropped users, and granted and revoked permissions to users and policies as required using Ranger.
  • Commissioned and decommissioned data nodes in the cluster.
  • Wrote and modified UNIX shell scripts to manage HDP environments.
  • Involved in installing and configuring Apache Flume, Hive, Sqoop and Oozie on the Hadoop cluster.
  • Created directories and set up appropriate permissions for different applications and users.
  • Backed up HBase tables to HDFS using the export utility.
  • Involved in creating users and user groups, assigning roles to users and creating home directories for them.
  • Installed, configured and administered HDP on Red Hat Enterprise Linux 6.6.
  • Used Sqoop to import data into HDFS from Oracle database.
  • Performed detailed analysis of system and application architecture components per functional requirements.
  • Reviewed and monitored system and instance resources to ensure continuous operations (i.e., database storage, memory, CPU, network usage, and I/O contention).
  • Provided 24x7 on-call support for production job failures and resolved issues in a timely manner.
  • Developed UNIX scripts for scheduling delta loads and master loads using the AutoSys scheduler.
  • Deep and thorough understanding of ETL tools and how they can be applied in a big data environment.
  • Troubleshot problems with databases, applications and development tools.

Environment: Hadoop, HDFS, Hive, Cloudera Manager, Sqoop, Flume, Oozie, CDH5, MongoDB, Cassandra, HBase, Hue, Kerberos and Unix/Linux.

Confidential, Dallas, TX

Java/Hadoop Developer


  • Imported all fact and dimension tables from SQL Server into Hadoop using Sqoop.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream log data from servers.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Involved in extracting the customer’s big data from various data sources into HDFS. This included data from mainframes and databases, as well as log data from servers.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Developed Tableau workbooks from multiple data sources using Data Blending.
  • Experience in managing and reviewing Hadoop log files.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • The Hive tables created per requirements were managed or external tables defined with appropriate static and dynamic partitions for efficiency.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Developed Python UDFs in Pig and Hive.
  • Used Apache Kafka to gather log data and feed it into HDFS.
  • Ingested data using Sqoop from various sources such as Informatica and Oracle.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
  • Installed and configured various components of the Hadoop ecosystem and maintained their integrity.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
  • Developed Java MapReduce programs to encapsulate transformations.
  • Participated in performance tuning at the database, transformation and job levels.

Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, HBase, CDH4, Cloudera Manager, MySQL, Eclipse

Confidential, Roseville, CA

Hadoop Developer


  • Responsible for building data solutions in Hadoop using the Cascading framework.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Worked hands-on with the ETL process.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Imported data from different sources, such as HDFS and HBase, into Spark RDDs.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Upgraded the Hadoop cluster from CDH3 to CDH4. Integrated Hive with existing applications.
  • Configured Ethernet bonding for all nodes to double the network bandwidth.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
  • Used Python and shell scripts to automate the end-to-end ELT process.
  • Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Performed data quality checks on data per the business requirements.
  • Performed data validation on the target table against the source table.
  • Achieved high throughput and low latency for ingestion jobs by leveraging Sqoop.
  • Transformed the raw data and loaded it into stage and target tables.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Teradata, Cloudera Manager, Pig, Sqoop, Oozie, Python.


Java/J2EE Developer


  • Involved in designing and developing modules on both the client and server side.
  • Developed the UI using JSP, JavaScript and HTML.
  • Responsible for validating the data on the client side using JavaScript.
  • Interacted with external services to get user information using SOAP web service calls.
  • Developed web components using JSP, Servlets and JDBC.
  • Performed technical analysis, design, development and documentation with a focus on implementation and agile development.
  • Developed a web-based reporting system with JSP, DAO and Apache Struts Validator using the Struts framework.
  • Designed the controller using Servlets.
  • Accessed the Oracle backend database using JDBC.
  • Developed UNIX shell scripts to automate various tasks.
  • Developed user and technical documentation.
  • Developed business objects, request handlers and JSPs for this project using Java Servlets and XML.
  • Developed core Spring components for some of the modules and integrated them with the existing Struts framework.
  • Actively participated in testing and designed the user interface using HTML and JSPs.
  • Implemented the database connectivity to Oracle using JDBC, and designed and created tables using SQL.
  • Implemented the server-side processing using Java Servlets.
  • Installed and configured the Apache web server and deployed JSPs and Servlets on Tomcat Server.

Environment: Java, Servlets, JSP, JavaScript, JDBC, Unix Shell scripting, HTML, Eclipse, Oracle 8i, WebLogic.
