
Hadoop Developer Resume


Washington, D.C.

SUMMARY

  • Overall 5 years of professional experience in the IT industry, including 2 years of experience in Hadoop development, solving problems and delivering high-quality results in a fast-paced environment, and 2 years of Core Java based programming.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, MapReduce, Kafka, Sqoop, Hive, Pig, Spark SQL, YARN, Hue, and HCatalog.
  • Good knowledge of the Cloudera distribution of Apache Hadoop.
  • Worked independently with Cloudera support on issues and concerns with the Hadoop cluster.
  • Hands-on experience with messaging and ingestion services such as JMS, Kafka, and Flume.
  • Experience with NoSQL databases such as MongoDB, HBase, and Elasticsearch.
  • Hands-on experience in developing Hive UDFs and UDAFs as well as Pig macros and UDFs (a minimal Java UDF sketch follows this summary).
  • Extensive experience in validating and cleansing data using Hive queries and Pig statements.
  • Experience in writing robust, reusable Hive queries for processing and analyzing large volumes of data.
  • Read, processed, and stored data in parallel using the Hive Query Language.
  • Good knowledge of systems built with Spark and Java.
  • Strong analytical, problem-solving, and communication skills with the ability to work in a group or independently.
  • Used Kafka & Spark Streaming for real-time stream processing.
  • Supported MapReduce programs running on the cluster.
  • Experience in extracting source data from sequence files, XML, JSON, and other file formats, transforming it, and loading it into the target data warehouse using Sqoop with Bash scripts.
  • Experience in collecting, aggregating, and moving data from various sources using Kafka.
  • Experience in using Spark for data manipulation, preparation, and cleansing.
  • Hands on experience with Spark Core, Spark SQL, Spark Streaming using PySpark.
  • Used Spark-SQL to perform transformations and actions on data residing in Hive and MongoDB.
  • Worked on a Kerberized Hadoop cluster with 250 nodes on Cloudera distribution 5.4.5.
  • Migrated existing data from RDBMSs (MySQL, SQL Server, and Oracle) to Hadoop using Sqoop.
  • Worked on external Hive tables with proper partitions for efficiency and loaded the structured data into them.
  • Experience in managing and reviewing Hadoop Log files.
  • Responsible for 250+ RHEL servers in an enterprise environment: supported hardware and software issues in production, installed and configured software, applied patches, and troubleshot performance issues.
  • Fine-tuned Linux systems for better performance and modified kernel parameters to achieve optimal system performance.
  • Used JMS and created MDBs, sender and receiver code, and test servlets to check program results.
  • Experienced in using monitoring tools such as top, sar, vmstat, iostat, and netstat to identify resource issues with Linux servers and provide recommendations.
  • Experience in development of Java applications.
  • Six years of work experience developing programs in Core Java.
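
A minimal sketch of a Hive UDF in Java, illustrating the kind of function referenced in the summary above; the class name and masking logic are hypothetical examples, not code from any of the projects below.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: masks all but the last four characters of a string column.
    public final class MaskValueUDF extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;
            }
            String value = input.toString();
            if (value.length() <= 4) {
                return new Text(value);
            }
            return new Text("****" + value.substring(value.length() - 4));
        }
    }

Once packaged into a jar, such a UDF would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in function in a query.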

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, HDFS, YARN, Hue, HCatalog.

Ingestion Tools: Sqoop, Kafka.

Databases: HBase, MongoDB, MySQL, Oracle.

Programming Languages & Frameworks: Java, Scala, Spark.

Scripting / Markup Languages: HiveQL, Pig Latin, Bash, XML, HTML, CSS.

Web / Application Servers: Apache HTTP Server, Apache Tomcat.

Operating Systems: Linux (Red Hat Enterprise Linux, CentOS).

Virtualization: VMware vSphere, vCenter.

System Monitoring Tools: sar, vmstat, iostat, top, tcpdump, ps.

Cloud Technologies: Amazon Web Services (AWS) - EC2, EMR, VPC, RDS, Auto Scaling, S3, AWS Import/Export.

PROFESSIONAL EXPERIENCE

Confidential - Washington D.C.

Hadoop Developer

Responsibilities:

  • Involved in choosing the right configurations for Hadoop.
  • Requirement gathering from the Business Partners and Subject Matter Experts.
  • Played a major role in Hadoop cluster installation, configuration, and monitoring.
  • Developed a data pipeline using Kafka, Spark, and HBase to ingest, process, and store data.
  • Selected HBase as the data store since the data was best suited to a NoSQL database.
  • Wrote Kafka configuration files for importing streamed log data into HBase.
  • Analyzed and defined the researchers' strategy and determined the system architecture and requirements to achieve goals.
  • Developed multiple Kafka producers and consumers and configured ZooKeeper to maintain smooth data flow as per the software requirement specifications.
  • Wrote a Java program to connect Kafka with Spark Streaming, using Eclipse on the Cloudera distribution (a minimal sketch follows this list).
  • Configured Spark Streaming to receive ongoing data from Kafka and store the streamed data in HBase.
  • Wrote a Scala program to verify that data was reaching HBase.
  • Used various Spark Transformations and Actions for cleansing the input data.
  • Developed shell scripts to generate Hive CREATE TABLE statements from the data and load the data into the tables.
  • Wrote MapReduce jobs using the Java API and Pig Latin.
  • Optimized HiveQL and Pig scripts by using Spark as the execution engine.
  • Involved in writing custom MapReduce programs using the Java API for data processing.
  • Involved in developing a linear regression model, built with Spark and the Scala API, to predict a continuous measurement and improve observations on the data.
  • Worked on developing data analysis in Spark.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.
  • Created Hive tables as per requirements, as internal or external tables defined with appropriate static or dynamic partitions and bucketing for efficiency.
  • Loaded and transformed large sets of structured and semi-structured data using Hive.
  • Wrote Spark jobs in Scala to analyze customer data and sales history.
  • Involved in designing HBase row keys and creating schemas for HBase tables to store text, JSON, Parquet, and Avro format files.
  • Used Spark and Spark SQL with the Scala API to read the Parquet data and create the tables in Hive.
  • Developed Hive queries for the analysts.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Implemented Spark jobs using Scala, utilizing DataFrames and the Spark SQL API, for faster testing and processing of data.
  • Involved in performing analytics and visualization on the log data, estimating the error rate, and studying the probability of future errors using regression models.
  • Used Kafka to build a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
  • Exported the analyzed data from HBase to Oracle using Sqoop for visualization and to generate reports for the BI team.
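
A minimal sketch, in Java, of how the Kafka-to-Spark Streaming connection described above might look; it assumes Java 8, a Spark 1.6-or-later streaming context, and the receiver-based spark-streaming-kafka (0.8) integration. The topic name, ZooKeeper quorum, and consumer group are placeholders, and the HBase write step is reduced to a comment to keep the sketch short.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class KafkaToHBaseStreamingJob {
        public static void main(String[] args) throws Exception {
            // Master and other runtime settings are supplied by spark-submit.
            SparkConf conf = new SparkConf().setAppName("KafkaToHBase");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            // Placeholder topic, ZooKeeper quorum, and consumer group.
            Map<String, Integer> topics = new HashMap<>();
            topics.put("clickstream", 1);
            JavaPairReceiverInputDStream<String, String> messages =
                    KafkaUtils.createStream(jssc, "zk-host:2181", "spark-consumer-group", topics);

            // For each micro-batch, persist the records (the real job would use the HBase client API here).
            messages.foreachRDD(rdd -> rdd.foreach(record -> {
                String key = record._1();
                String value = record._2();
                System.out.println(key + " -> " + value); // stand-in for an HBase Put
            }));

            jssc.start();
            jssc.awaitTermination();
        }
    }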

Environment: Hadoop, Cloudera, HDFS, Pig, Hive, Kafka, Sqoop, Spark, Scala, HBase, MySQL, Oozie, Shell Scripting, Red Hat Linux, Java.

Confidential - Durham, NC.

Jr. Hadoop Developer

Responsibilities:

  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Created a Twitter application with Flume to fetch data from Twitter. Played a major role in implementing complex MapReduce programs to perform map-side joins, using the distributed cache, as well as reduce-side joins.
  • Experienced in developing complex MapReduce programs against structured and unstructured data.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Converted existing SQL queries into Hive QL queries.
  • Loaded data into Hive and accessed the data from Hive.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON.
  • Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed formats (a minimal aggregation sketch follows this list).
  • Refined the website clickstream data from Omniture logs and moved it into Hive.
  • Developed programs using scripting languages such as Pig to manipulate the data.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Developed Pig UDFs to manipulate the data according to business requirements and also worked on developing custom Pig loaders.
  • A Pig UDF was required to extract area information from the large volume of data received from the sensors.
  • Maintained the tracking records of the project.
  • Created Hive tables according to the company requirements.
  • Experience in working with very large data sets.
  • Built programs that leverage the parallel capabilities of Hadoop and MPP platforms.
  • Involved in the design, integration, and implementation of the MongoDB NoSQL database.
  • Loaded data into the MongoDB NoSQL database.
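
A minimal sketch of a MapReduce aggregation job in Java, of the kind described above; the input layout (CSV click logs with the page URL in the third field), class names, and paths are illustrative assumptions rather than actual project code.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PageHitCount {

        // Mapper: emits (pageUrl, 1) for every CSV row of the form timestamp,userId,pageUrl,...
        public static class HitMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text page = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 2) {
                    page.set(fields[2]);
                    context.write(page, ONE);
                }
            }
        }

        // Reducer (also used as combiner): sums the counts per page.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "page-hit-count");
            job.setJarByClass(PageHitCount.class);
            job.setMapperClass(HitMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }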

Environment: Flume, Pig, Hive, MongoDB, Sqoop, Cloudera Manager.

Confidential

Hadoop Administrator

Responsibilities:

  • Collaborated with teams in Hadoop development on cluster planning, hardware requirements, server configurations, and network equipment to implement clusters on the Cloudera Distribution of Hadoop.
  • Involved in the development and ongoing administration of the Hadoop infrastructure.
  • Implemented commissioning and decommissioning of data nodes, updated the name node metadata, killed unresponsive task trackers, and dealt with blacklisted task trackers.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from Oracle, NoSQL sources, and various portfolios.
  • Created a Derby database to store the log files generated by Hive.
  • Resolved tickets submitted by users and troubleshot and resolved the documented errors.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
  • Created workflow using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Assisted in importing data to HDFS and exporting analyzed data to relational databases using Sqoop.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Automated scripts to monitor HDFS and HBase through cron jobs (a minimal HDFS capacity-check sketch follows this list).
  • Supported code/design analysis, strategy development, and project planning.
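
The monitoring itself was done with cron-scheduled scripts; as a minimal Java sketch of an equivalent check, the snippet below reports HDFS capacity usage through the FileSystem API and exits non-zero past an illustrative 80% threshold so a cron wrapper could alert on it. The class name and threshold are assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;

    public class HdfsCapacityCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            try (FileSystem fs = FileSystem.get(conf)) {
                FsStatus status = fs.getStatus();
                long capacity = status.getCapacity();
                long used = status.getUsed();
                double pctUsed = 100.0 * used / capacity;
                System.out.printf("HDFS used: %.1f%% (%d of %d bytes)%n", pctUsed, used, capacity);
                // Exit code 2 signals the cron wrapper to raise an alert.
                System.exit(pctUsed > 80.0 ? 2 : 0);
            }
        }
    }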

Environment: Oozie, Sqoop, Pig Latin, HBase, Oracle.

Confidential 

Java Developer

Responsibilities:

  • Participated in sprint planning and collaborated with product owners to identify and prioritize product and technical requirements.
  • Used various Core Java techniques such as exception handling, data structures, and collections to implement various features and enhancements.
  • Provide architectural solutions as needed across applications involved in the development.
  • Coordinated multiple development teams to complete features.
  • Developed new projects and enhancements and maintained the existing program to support the online application.
  • Periodically communicated project status to stakeholders.
  • Worked on design patterns and was involved in design decisions.
  • Used JMS to connect the application in India with the regional services in the USA.
  • Created sender and receiver code in Java (a minimal sender sketch follows this list).
  • Developed a Message Driven Bean so that when the customer in India received the courier, a message was sent to management.
  • Used the Hibernate framework to simplify the development of the Java application's interaction with databases such as Oracle.
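
A minimal sketch of a JMS queue sender in Java, of the kind described above; the JNDI names and class name are placeholders that would depend on the application server configuration. On the receiving side, a Message Driven Bean annotated with @MessageDriven would consume from the same queue.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Destination;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    public class CourierStatusSender {
        public static void send(String payload) throws Exception {
            InitialContext ctx = new InitialContext();
            // Placeholder JNDI names; the real names come from the application server setup.
            ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
            Destination queue = (Destination) ctx.lookup("jms/CourierStatusQueue");

            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                TextMessage message = session.createTextMessage(payload);
                producer.send(message);
            } finally {
                connection.close();
            }
        }
    }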

Environment: Core Java, JMS, Hibernate Framework
