
Hadoop Developer Resume


Overland Park, KS

SUMMARY:

  • 8 years of experience in developing, implementing and configuring the Hadoop ecosystem and in developing various web applications using Java and J2EE.
  • 4 years of experience in Big Data analytics using Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, YARN, Spark, Scala, Oozie, Kafka and Flume.
  • Work experience with different Hadoop distributions such as Hortonworks and Cloudera.
  • Excellent understanding of the Hadoop Distributed File System and experienced in developing efficient MapReduce and YARN programs to process large datasets.
  • Good working knowledge of Sqoop for transferring bulk data between relational databases and HDFS, and of Flume for ingesting streaming data into HDFS.
  • Good knowledge of Apache NiFi for automating data movement between Hadoop systems.
  • Implemented Talend jobs to load data from different sources and integrated them with Kafka.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
  • Very good at loading data into Spark schema RDDs and querying them using Spark SQL (see the sketch after this list).
  • Good at writing custom RDDs in Scala for data-specific transformations and at applying design patterns to improve performance.
  • Experienced in using Apache Hue and Ambari to manage and monitor Hadoop clusters.
  • Experience in analyzing large amounts of data using Pig Latin scripts and Hive Query Language, and in assisting with performance tuning.
  • Mastery of writing custom UDFs in Java to extend Pig and Hive functionality.
  • Sound knowledge of Apache Solr for searching structured and unstructured data.
  • Worked with the Azkaban and Oozie workflow schedulers to run recurring Hadoop jobs.
  • Experience in implementing the Kerberos authentication protocol in Hadoop for data security.
  • Experience in creating dashboards and generating reports in Tableau by connecting to tables in Hive and HBase.
  • Experience in using SequenceFile, RC, ORC and Avro file formats and compression techniques.
  • Worked on NoSQL databases such as HBase and Cassandra to store structured and unstructured data.
  • Good knowledge of cloud integration with Amazon Elastic MapReduce (EMR), Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3) and Microsoft Azure.
  • Hands-on experience with UNIX environments and shell scripting.
  • Experienced in using version control systems such as SVN and Git, the build tool Maven, and the continuous integration tool Jenkins.
  • Expertise in developing web applications using J2EE technologies such as Servlets, JSP, web services, Spring, Hibernate, HTML, jQuery and Ajax.
  • Implemented design patterns to improve the quality and performance of applications.
  • Used JUnit to test the functionality of Java methods and Python for automation.
  • Good experience with the relational databases Oracle, SQL Server and PostgreSQL.
  • Worked with Agile, Scrum and Confidential software development frameworks for managing product development.
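
To illustrate the Spark SQL usage noted in the list above, here is a minimal sketch, assuming comma-delimited input; the HDFS path, record layout and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record layout used only for this example.
case class Order(orderId: String, orderDate: String, amount: Double)

object OrdersQuery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("OrdersQuery").getOrCreate()
    import spark.implicits._

    // Load raw text from HDFS, map each line onto the schema, and convert to a DataFrame.
    val orders = spark.sparkContext
      .textFile("hdfs:///data/orders")                 // hypothetical path
      .map(_.split(","))
      .map(f => Order(f(0), f(1), f(2).toDouble))
      .toDF()

    // Register a temporary view and query it with Spark SQL.
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT orderDate, COUNT(*) AS order_count FROM orders GROUP BY orderDate").show()

    spark.stop()
  }
}
```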

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Storm, Talend, Spark, NiFi, Impala, Solr, Avro and Crunch.

Programming languages: Java, Python, Scala.

NoSQL Databases: HBase, Cassandra, MongoDB

Databases: Oracle, SQL Server, PostgreSQL

Web Technologies: HTML, jQuery, Ajax, CSS, JavaScript, JSON, XML.

Business Intelligence Tools: Tableau, Jasper Reports.

Testing: Hadoop Testing, Hive Testing, MRUnit.

Operating Systems: Linux (Red Hat/Ubuntu/CentOS), Windows 10/8.1/7/XP.

Hadoop Distributions: Cloudera Enterprise, Hortonworks.

Technologies and Tools: Servlets, JSP, Spring, Web Services, Hibernate, Maven, GitHub.

Application Servers: Tomcat, JBoss

IDEs: Eclipse, NetBeans, IntelliJ

PROFESSIONAL EXPERIENCE:

Confidential, Overland Park, KS

Hadoop Developer

Responsibilities: 

  • Responsible for building scalable distributed data solutions on a 110-data-node Hadoop cluster running the Hortonworks distribution.
  • Worked with the Kafka REST API to collect and load data into the Hadoop file system, and used Sqoop to load data from relational databases.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka and persisted the results into Cassandra (see the sketch after this list).
  • Used Apache NiFi to copy data from the local file system to HDFS.
  • Developed Spark scripts by writing custom RDDs in Scala for data transformations and actions.
  • Worked with the Avro and ORC file formats and compression techniques such as LZO.
  • Used Hive as an abstraction over structured data residing in HDFS and implemented partitions, dynamic partitions and buckets on Hive tables.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Migrated MapReduce programs to Spark transformations using Scala.
  • Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
  • Used the job scheduler Apache Oozie to execute workflows.
  • Used Ambari to monitor node health and job status in Hadoop clusters.
  • Worked on Tableau to build customized interactive reports, worksheets and dashboards.
  • Implemented Kerberos for strong authentication to provide data security.
  • Worked on Apache Solr for indexing and load-balanced querying to search for specific data in large datasets.
  • Involved in performance tuning of Spark jobs by caching data and taking full advantage of the cluster environment.
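
A minimal sketch of the Kafka-to-Cassandra streaming path described in the list above, assuming the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries; the broker, topic, keyspace, table and row layout are hypothetical.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import com.datastax.spark.connector._   // enables saveToCassandra on RDDs

// Hypothetical row layout for the target Cassandra table.
case class Event(eventId: String, payload: String)

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaToCassandra")
      .set("spark.cassandra.connection.host", "cassandra-host")   // hypothetical host
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",                     // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "event-ingest")

    // Direct stream from the Kafka topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Transform each record and persist every micro-batch into Cassandra.
    stream.map(r => Event(Option(r.key).getOrElse(""), r.value))
          .foreachRDD(_.saveToCassandra("analytics", "events"))   // hypothetical keyspace/table

    ssc.start()
    ssc.awaitTermination()
  }
}
```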

Environment: Hadoop, HDFS, Spark, Scala, Kafka, Hive, Sqoop, Ambari, Solr, Oozie, Cassandra, Tableau, Jenkins, Bitbucket, Hortonworks and Red Hat Linux.

Confidential, Kansas City, MO 

Hadoop Developer

Responsibilities:

  • Designed and developed analytic systems to extract meaningful information from large-scale structured and unstructured health data.
  • Created Sqoop jobs to populate Hive tables with data from relational databases.
  • Developed UDFs in Java to extend the functionality of Pig and Hive scripts.
  • Solved performance issues in Pig and Hive scripts through a deep understanding of joins, groups and aggregations and of how these jobs translate into MapReduce jobs.
  • Involved in creating Hive external tables, loading data, and writing Hive queries.
  • Stored the processed data in HBase for faster querying and random access (see the sketch after this list).
  • Defined job flows using the Azkaban scheduler to automate Hadoop jobs and installed ZooKeeper for automatic node failover.
  • Managed and reviewed Hadoop log files to find the source of job failures and debugged scripts for code optimization.
  • Developed complex MapReduce programs to analyze data residing on the cluster.
  • Developed processes to load data from server logs into HDFS using Flume, and to load data from the UNIX file system into HDFS.
  • Built a platform to query and display analysis results in dashboards using Tableau.
  • Used the Apache Hue web interface to monitor the Hadoop cluster and run jobs.
  • Implemented Apache Sentry for role-based authorization of data access.
  • Developed shell scripts to automate routine DBA tasks (e.g. data refreshes, backups).
  • Involved in performance tuning of Pig scripts and Hive queries.
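
A minimal sketch of storing a processed record in HBase and reading it back for random access, as mentioned in the list above, using the standard HBase client API; the table, column family, qualifier and row-key layout are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object PatientMetricsStore {
  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("patient_metrics"))   // hypothetical table

    try {
      // Write one processed record keyed by a hypothetical patient id.
      val put = new Put(Bytes.toBytes("patient-1001"))
      put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("avg_glucose"), Bytes.toBytes("104.2"))
      table.put(put)

      // Random read of the same row.
      val result = table.get(new Get(Bytes.toBytes("patient-1001")))
      val value = Bytes.toString(result.getValue(Bytes.toBytes("m"), Bytes.toBytes("avg_glucose")))
      println(s"avg_glucose = $value")
    } finally {
      table.close()
      connection.close()
    }
  }
}
```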

Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Azkaban, Tableau, Java, Maven, Git, Cloudera, Eclipse and Shell Scripting.

Confidential, Sunnyvale, CA

Hadoop Developer

Responsibilities:

  • Worked on a 50-data-node Cloudera Hadoop cluster running Red Hat Enterprise Linux.
  • Involved in loading data from the UNIX file system to HDFS using shell scripting.
  • Imported and exported data between HDFS and an Oracle 10.2 database using Sqoop.
  • Developed ETL processes to load data from multiple data sources into HDFS using Sqoop, and analyzed the data using MapReduce, Hive and Pig Latin.
  • Developed custom UDFs for Pig scripts to clean unstructured data, and used the appropriate joins and groups to optimize the Pig scripts (see the sketch after this list).
  • Created Hive external tables on top of the processed data to easily manage and query it using HiveQL.
  • Involved in performance tuning of Hive queries by implementing dynamic partitions and buckets in Hive.
  • Integrated MapReduce with HBase to import bulk data using MR programs.
  • Used Flume to collect and aggregate web log data from sources such as web servers and push it to HDFS.
  • Wrote MapReduce jobs in Java to parse the web logs stored in HDFS, and used MRUnit to test and debug the MapReduce programs.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Coordinated with the team to resolve issues both technically and functionally.
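
A minimal sketch of a Pig UDF for cleaning a raw field, as referenced in the list above; the class name and normalization rule are hypothetical.

```scala
import java.io.IOException

import org.apache.pig.EvalFunc
import org.apache.pig.data.Tuple

// Trims, lower-cases and strips non-alphanumeric noise from a raw log field,
// so downstream joins and groups operate on a normalized value.
class CleanField extends EvalFunc[String] {
  @throws[IOException]
  override def exec(input: Tuple): String = {
    if (input == null || input.size() == 0 || input.get(0) == null) {
      null
    } else {
      input.get(0).toString.trim.toLowerCase.replaceAll("[^a-z0-9 ]", "")
    }
  }
}
```

In a Pig script the compiled jar would be registered with REGISTER and the function invoked by its class name inside a GENERATE clause.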

Environment: Cloudera, HDFS, MapReduce, Pig, Hive, Flume, Sqoop, HBase, Oozie, Maven, Git, Java, Python and Linux.

Confidential

Java Developer

Responsibilities:

  • Involved in the design and development of the project using Java and J2EE technologies, following an MVC architecture with JSPs as views and Servlets as controllers.
  • Designed network and use-case diagrams in StarUML to track the workflow.
  • Wrote server-side programs using RESTful web services to handle requests from different types of devices, such as iOS.
  • Implemented design patterns such as CacheManager and Factory classes to improve the performance of the application.
  • Used the Hibernate ORM tool to store and retrieve data from a PostgreSQL database.
  • Involved in writing test cases for the application using JUnit.
  • Followed the Agile software development process for this project to achieve fast delivery.

Environment: JSP, Servlets, Ajax, RESTful web services, Hibernate, Design Patterns, StarUML, Eclipse and PostgreSQL.

Confidential

Java Developer

Responsibilities:

  • Designed and implemented the training and reports modules of the application using Servlets, JSP and Ajax.
  • Developed custom JSP tags for the application.
  • Wrote queries for fetching and manipulating data using the iBATIS ORM framework.
  • Used the Quartz scheduler to run jobs sequentially at scheduled times.
  • Implemented design patterns such as Filter, CacheManager and Singleton to improve the performance of the application.
  • Implemented the reports module of the application using Jasper Reports to display dynamically generated reports for business intelligence.
  • Deployed the application at the client's site on a Tomcat server.

Environment: HTML, JavaScript, Ajax, Servlets, JSP, iBATIS, Tomcat, PostgreSQL, Jasper Reports.
