
Hadoop/Big Data Developer Resume


SUMMARY:

  • Around 8 years of IT experience as a developer, designer and quality reviewer, with cross-platform integration experience using Hadoop and Python.
  • Good understanding of Hadoop architecture and its various components, such as HDFS, JobTracker, TaskTracker, NameNode and DataNode.
  • Strong understanding of Hadoop daemons and MapReduce concepts.
  • Hands-on experience in installing, configuring and using Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Flume and Oozie.
  • Good knowledge of loading data from Oracle and MySQL databases into HDFS using Sqoop (structured data) and Flume (log files and XML).
  • Extensive experience in developing Pig Latin scripts and using Hive Query Language (HiveQL) for data analytics.
  • Experienced in writing custom Hive UDFs to incorporate business logic into Hive queries (see the Java sketch after this list).
  • Knowledge on analyzing data interactively using Apache Spark and Apache Zeppelin.
  • Good knowledge of Apache Storm and Kafka pipelines.
  • Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets.
  • Extensive experience in working with application servers like WebSphere, WebLogic and Tomcat.
  • Strong understanding of NoSQL databases like Cassandra, HBase and MongoDB.
  • Good knowledge of job/workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Extensive experience in the design, development and support of Model-View-Controller (MVC) applications using the Struts and Spring frameworks.
  • Hands-on experience in application development using core Scala, RDBMS and Linux shell scripting; developed UNIX shell scripts to automate various processes.
  • Proficiency in using BI tools like Tableau/Pentaho.
  • Experience in understanding Hadoop security requirements and integrating with the Kerberos Key Distribution Center (KDC).
  • Extensive experience with RDBMS database applications in Oracle, MS Access and SQL Server.
  • Detailed understanding of Software Development Life Cycle (SDLC) and sound knowledge of project implementation methodologies including Scrum, Waterfall and Agile.
  • Well experienced in testing large, complex databases and in reporting and ETL tools like Informatica and DataStage.
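
For illustration only, a minimal custom Hive UDF of the kind described above might look like the following Java sketch; the class name and the business rule it applies are hypothetical:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: normalizes a customer ID before it is used in a Hive query.
    public class NormalizeCustomerId extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null; // Hive passes NULL columns as null
            }
            // Illustrative business rule: trim whitespace and upper-case the ID.
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Packaged into a JAR, such a function would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called in a query.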

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Solr, Flume, Oozie, ZooKeeper, Kafka.

NoSQL Databases: HBase, MongoDB, Cassandra

Languages: C, Python, Pig Latin, Scala, HiveQL, Perl, Unix shell scripts

Frameworks: Struts, Spring, Spring XD, Hibernate

Operating Systems: Ubuntu Linux, Windows XP/Vista/7/10, macOS

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, WebSphere

Databases: Oracle, MySQL, PL/SQL, PostgreSQL

Tools and IDEs: Eclipse, Anaconda, Spyder

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Development Methodologies: Agile, Scrum, Waterfall

Highest Qualification: Bachelors

University: Vellore Institute of Technology

PROFESSIONAL EXPERIENCE:

Confidential

Hadoop/Big Data Developer

Responsibilities:

  • Worked on unstructured and semi-structured data of 100 TB; with a replication factor of 3, the total stored size was 300 TB.
  • Collected and aggregated a large amount of log data using Apache Flume and staged data in HDFS for further analysis.
  • Used Pig as an ETL tool for transforming, filtering, joining events and performing aggregations.
  • Wrote custom UDFs and UDAFs for Hive.
  • Populated HDFS and Cassandra with large volumes of data using Apache Kafka (a producer sketch follows this project's Environment line).
  • Worked on Spark stream processing to bring data into memory, implementing RDD transformations and actions to process it in units (see the sketch after this list).
  • Developed scripts and Batch job to schedule various Hadoop Programs.
  • Developed Hive Queries for creating foundation tables from stage data.
  • Used DML statements to perform different operations on Hive tables.
  • Developed job flows to automate the workflow for PIG and HIVE jobs.
  • Worked on Apache Crunch library to write, test and run Hadoop Map Reduce Pipeline jobs.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Cluster coordination services through Zookeeper.
  • Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Adjusted the minimum share of the maps and reducers for all the queues.
  • Used Tableau for visualizing the data reported from Hive tables.
  • Worked with Sequence file, RCFile, Avro and HAR file formats.
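
As a rough illustration of the Spark RDD work in the bullets above, the following Java sketch loads staged log lines from HDFS, applies transformations, and triggers an action; the paths and the comma-delimited record layout are assumptions:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class EventCounts {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("EventCounts");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Staged log data (hypothetical path; see the Flume bullet above).
            JavaRDD<String> lines = sc.textFile("hdfs:///staging/flume/events");

            // Transformations: key each record by its first field, then count occurrences.
            JavaPairRDD<String, Integer> counts = lines
                    .mapToPair(line -> new Tuple2<>(line.split(",")[0], 1))
                    .reduceByKey(Integer::sum);

            // Action: materialize the result back to HDFS.
            counts.saveAsTextFile("hdfs:///analytics/event-counts");
            sc.stop();
        }
    }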

Environment: Hadoop, HDFS, Apache Crunch, MapReduce, Hive, Flume, Sqoop, ZooKeeper, Kafka, Storm, Cassandra, Spark, Puppet, Linux.
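
The Kafka ingestion mentioned in the responsibilities above can be pictured with a minimal Java producer sketch; the broker address, topic name and record contents are hypothetical:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LogProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // Each log line is published to a topic that downstream consumers
            // drain into HDFS and Cassandra.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("server-logs", "host01", "sample log line"));
            }
        }
    }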

Confidential

Hadoop Developer

Responsibilities:

  • Worked on Kafka-Storm on the HDP 2.2 platform for real-time analysis.
  • Created a PoC to store server log data in MongoDB to identify system alert metrics.
  • Implemented Hadoop framework to capture user navigation across the application to validate the user interface and provide analytic feedback/result to the UI team.
  • Developed MapReduce jobs using the Java API and Pig Latin (a sketch of such a job follows this list).
  • Worked on loading data into the cluster from dynamically generated files using Flume, and on sending data from the cluster to relational database management systems using Sqoop.
  • Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
  • Worked on Oozie to define and schedule jobs that manage Apache Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Involved in creating Hive tables, working on them using HiveQL, and performing data analysis with Hive and Pig.
  • Responsible for managing data from multiple sources.
  • Wrote Pig scripts to run ETL jobs on the data in HDFS for future testing.
  • Used Hive to analyze the data and checked for correlation.
  • Imported data using Sqoop to load data from MySQL into HDFS and Hive on a regular basis.
  • Automated the recurring Sqoop imports into Hive partitions using Apache Oozie.
  • Supported MapReduce programs running on the cluster.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Agile methodology in developing the application, including iterative application development, weekly status reports and stand-up meetings.
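
As a sketch of the MapReduce work referenced above, the following Java job counts page hits per URL from navigation logs; the tab-delimited layout with the URL in the first field is an assumption:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PageHitCount {

        // Map: emit (url, 1) for every navigation log record.
        public static class HitMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                context.write(new Text(fields[0]), ONE);
            }
        }

        // Reduce (also used as combiner): sum the counts per URL.
        public static class HitReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "page-hit-count");
            job.setJarByClass(PageHitCount.class);
            job.setMapperClass(HitMapper.class);
            job.setCombinerClass(HitReducer.class);
            job.setReducerClass(HitReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }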

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Flume, ZooKeeper, Cloudera Manager, Oozie, MySQL, SQL, Linux, Agile

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for building scalable, distributed data solutions using Hadoop.
  • Moved data from Oracle to HDFS and HDFS to Oracle using SQOOP.
  • Worked on loading and transforming large sets of semi-structured data using Pig Latin operations.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning and failure conditions.
  • Imported and exported data into HDFS and HIVE using SQOOP.
  • Wrote the Apache PIG scripts to process the HDFS data.
  • Clustered customers into categories based on offers using Apache Hive.
  • Performed grouping, aggregation and sorting using Pig and Hive, which are higher-level abstractions over MapReduce.
  • Wrote Pig UDFs for pre-processing the data for analysis.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Extensive experience in performance tuning of Oracle queries.
  • Tested and validated Hadoop Log files.
  • Created data models for customer data using Cassandra Query Language (CQL) (see the sketch after this list).
  • Worked in monitoring, managing and troubleshooting the Hadoop Log files.
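
A minimal sketch of the kind of CQL data model mentioned above, issued through the DataStax Java driver listed in this project's environment; the keyspace, table and contact point are hypothetical:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CustomerSchema {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {

                session.execute("CREATE KEYSPACE IF NOT EXISTS crm WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 3}");

                // Partition by customer_id so all of a customer's offer rows
                // are stored together and served by a single-partition read.
                session.execute("CREATE TABLE IF NOT EXISTS crm.customer_offers ("
                        + "customer_id uuid, offer_id uuid, accepted_at timestamp, "
                        + "PRIMARY KEY (customer_id, offer_id))");
            }
        }
    }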

Environment: Apache Hadoop, Hive, Cassandra, DataStax, Oracle 11g/10g, MySQL, UNIX, Oozie

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in analysis and design of the application.
  • Involved in preparing the detailed design document for the project.
  • Developed the application using J2EE architecture.
  • Involved in developing JSP forms.
  • Designed and developed web pages using HTML and JSP.
  • Designed various applets using JBuilder.
  • Designed and developed Servlets to communicate between presentation and business layer.
  • Used EJB as a middleware in developing a three-tier distributed application.
  • Developed Session Beans and Entity Beans for business and data processing.
  • Used JMS in the project for sending and receiving the messages on the queue.
  • Developed the Servlets for processing the data on the server.
  • Transferred the processed data to the database through Entity Beans.
  • Used JDBC for database connectivity with MySQL Server (a sketch follows this list).
  • Used CVS for version control.
  • Involved in unit testing using JUnit.
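
A minimal sketch of the JDBC connectivity described above; the connection URL, credentials and table are hypothetical, and the MySQL driver is assumed to be on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class OrderDao {
        private static final String URL = "jdbc:mysql://localhost:3306/orders_db";

        // Counts a customer's orders; PreparedStatement guards against SQL injection.
        public int countOrders(String customerId) throws Exception {
            try (Connection con = DriverManager.getConnection(URL, "appuser", "secret");
                 PreparedStatement ps = con.prepareStatement(
                         "SELECT COUNT(*) FROM orders WHERE customer_id = ?")) {
                ps.setString(1, customerId);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    return rs.getInt(1);
                }
            }
        }
    }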

Environment: Core Java, J2EE, JSP, Servlets, XML, XSLT, EJB, JDBC, JBuilder 8.0, JBoss, Swing, JavaScript, JMS, HTML, CSS, MySQL Server, CVS, Windows 2000
