
Hadoop Developer/Spark Developer Resume


SUMMARY

  • Over six years of hands-on IT industry experience, including development with Big Data/Hadoop ecosystem tools, databases, and Java/J2EE technologies.
  • Experience with the Hadoop 2.0 YARN architecture and with developing YARN applications.
  • Good experience in processing unstructured, semi-structured, and structured data.
  • Thorough understanding of HDFS and the MapReduce framework, with extensive experience developing MapReduce jobs.
  • Expertise in Hadoop core components and environment administration, plus Hive, Pig, Sqoop, Oozie, Flume, and Hue.
  • Experienced in building highly scalable Big Data solutions using Hadoop across multiple distributions (Cloudera, Hortonworks) and NoSQL platforms.
  • Good exposure to MapReduce programming in Java, Pig Latin scripting, distributed applications, and HDFS.
  • Experience in installing, configuring, supporting, and managing Hadoop clusters using the Cloudera and Hortonworks distributions and Amazon Web Services (AWS).
  • Hands-on experience with major Hadoop ecosystem components, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, Flume, and Kafka, and knowledge of the MapReduce/HDFS framework.
  • Experience loading tuple-shaped data into Pig and converting plain data into tuples. Able to build user-defined functions (UDFs) for functionality not available in core Hadoop (see the sketch after this list).
  • Able to move data between Hadoop, RDBMS, NoSQL, and UNIX systems using Sqoop and other traditional data-movement technologies.
  • Good experience with HBase schema design.
  • Experience with Hadoop distributions (Cloudera, Hortonworks, MapR, Windows Azure) and with Impala.
  • Maintained, audited, and built new clusters for testing purposes using Cloudera Manager.
  • Implemented clusters for the NoSQL tools Cassandra and MongoDB as part of a POC to address HBase limitations.
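
As one minimal illustration of the UDF point above (shown here with Spark SQL rather than Pig, to keep all sketches in this document in Scala): the sketch below registers a custom string-normalization function. The function name, logic, and sample data are hypothetical, chosen only to show the registration pattern.

    import org.apache.spark.sql.SparkSession

    object UdfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("udf-sketch").getOrCreate()
        import spark.implicits._

        // Register a function that is not available out of the box.
        spark.udf.register("normalize_name", (s: String) =>
          if (s == null) null else s.trim.toLowerCase)

        // Hypothetical sample data, just to exercise the UDF.
        Seq("  Alice ", "BOB").toDF("name").createOrReplaceTempView("people")
        spark.sql("SELECT normalize_name(name) AS name FROM people").show()
      }
    }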

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Storm, Kafka, Oozie, MongoDB, Cassandra

Languages: C, Core Java, UNIX shell scripting, SQL, Python, C#, Scala

J2EE Technologies: Servlets, JSP, JDBC, Java Beans.

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE).

NoSQL Technologies: Cassandra, MongoDB, HBase

Operating Systems: Windows XP/10, Linux, Sandbox.

Software Package: MS Office 2010.

Tools & Utilities: Eclipse, NetBeans, MyEclipse, SVN, Git, Maven, SoapUI, JMX Explorer, XMLSpy, QC, QTP, JIRA

Web Servers: WebLogic, WebSphere, Apache Tomcat.

Web Technologies: HTML, XML, JavaScript, jQuery, AJAX, SOAP, and WSDL.

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Developer/Spark Developer

Responsibilities:

  • Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Databricks connectors, Spark Core, Spark SQL, Sqoop, Pig, Hive, Impala, and NoSQL databases.
  • Designed and implemented data imports into HDFS from different RDBMS servers using Sqoop.
  • Used incremental and delta imports on Teradata tables lacking primary keys, loading them into Hive for transformations and aggregations.
  • Exported data back to RDBMS servers using Sqoop and processed it in ETL operations.
  • Developed and implemented custom Hive UDFs for date handling.
  • Used Impala for querying the HDFS data.
  • Applied partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, cutting execution time from days to hours.
  • Implemented Pig scripts and used skewed, replicated, and merge joins for performance improvements.
  • Transformed data from HBase to Hive in bulk operations.
  • Developed Oozie workflows and sub-workflows orchestrating hundreds of Sqoop queries, MapReduce jobs, Pig scripts, and Hive queries.
  • Handled different data formats, including Avro, Parquet, and ORC.
  • Implemented Spark scripts in Scala, using Spark SQL to read Hive tables into Spark for faster processing (see the first sketch after this list).
  • Active member of a POC on streaming data using Apache Kafka and Spark Streaming (see the second sketch after this list).
  • Used the Autosys scheduler to automate jobs.
  • Worked in an Agile environment with two-week sprint cycles, dividing and organizing tasks; participated in daily scrums and other design-related meetings.
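
First sketch: a minimal Scala job for the Spark SQL work above, assuming a hypothetical Hive table sales.orders. enableHiveSupport() points Spark SQL at the Hive metastore, so tables can be read and written in place rather than exported first.

    import org.apache.spark.sql.SparkSession

    object HiveToSpark {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-spark")
          .enableHiveSupport()   // read Hive tables through the metastore
          .getOrCreate()

        // Hypothetical table and column names, for illustration only.
        val totals = spark.sql(
          """SELECT customer_id, SUM(amount) AS total
            |FROM sales.orders
            |GROUP BY customer_id""".stripMargin)
        totals.write.mode("overwrite").saveAsTable("sales.customer_totals")

        spark.stop()
      }
    }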
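
Second sketch: a minimal direct-stream consumer for the Kafka/Spark Streaming POC, using the spark-streaming-kafka-0-10 integration. The broker address, consumer group, and topic name are placeholders.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object StreamingPoc {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("kafka-streaming-poc"), Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker:9092",           // placeholder broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "poc-consumer",                   // placeholder group
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Count records per micro-batch as a trivial sanity check.
        stream.map(_.value).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }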

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Spark, Kafka, Scala, Impala, HBase, Oracle, Cloudera Distribution, Autosys.

Confidential

Hadoop Developer

Responsibilities:

  • Well versed in Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and MapReduce programming.
  • Developed MapReduce programs to remove irregularities from the data and aggregate it.
  • Implemented Hive UDFs and performed tuning for better query performance.
  • Developed Pig Latin scripts to extract data from log files and store it in HDFS, and created user-defined functions (UDFs) to pre-process data for analysis.
  • Implemented optimized map joins to bring together data from different sources and clean it before applying algorithms.
  • Used Sqoop to import and export data between Oracle and HDFS/Hive.
  • Implemented CRUD operations on HBase data through the Thrift API for real-time insights.
  • Developed Oozie workflows to manage and schedule jobs on the Hadoop cluster, generating reports on a nightly, weekly, and monthly basis.
  • Used various compression codecs to compress the data in HDFS effectively.
  • Used Avro SerDes for serialization and deserialization, and implemented custom Hive UDFs for date handling.
  • Troubleshot MapReduce job failures by inspecting and reviewing log files.
  • Implemented a POC migrating MapReduce jobs to Spark RDD transformations (see the sketch after this list).
  • Worked in an Agile environment with two-week sprint cycles, dividing and organizing tasks; participated in daily scrums and other design-related meetings.
  • Created final reports on the analyzed data using Apache Hue and the Hive browser, and generated graphs for study by the data analytics team.
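
A minimal sketch of the migration POC mentioned above: a classic tokenize-and-count mapper/reducer pair collapses into a few RDD transformations, with reduceByKey standing in for the reduce phase. Input and output paths are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}

    object MrToSparkPoc {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("mr-migration-poc"))

        val counts = sc.textFile("hdfs:///data/input")   // hypothetical path
          .flatMap(_.split("\\s+"))                      // former map phase
          .map(word => (word, 1L))
          .reduceByKey(_ + _)                            // former reduce phase

        counts.saveAsTextFile("hdfs:///data/output")     // hypothetical path
        sc.stop()
      }
    }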

Environment: Hadoop, CDH 5.5, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Java, Spark, Oozie, Linux, UNIX.

Confidential

Java Developer

Responsibilities:

  • Key responsibilities included requirements gathering and designing and developing the Java application.
  • Identified and fixed transactional issues caused by incorrect exception handling, and concurrency issues caused by unsynchronized blocks of code.
  • Created a Java application module that authenticates users of the application and synchronizes handsets with the Exchange server.
  • Performed unit testing, system testing, and user acceptance testing.
  • Built web applications using the Struts MVC framework.
  • Gathered specifications for the library site from different departments and users of the services.
  • Developed stored procedures and triggers in PL/SQL, and wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures, and triggers.
  • Designed and implemented the UI using HTML and Java.
  • Worked on the database interaction layer for insert, update, and retrieval operations (see the sketch after this list).
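
Illustrative only: the project itself was Java, but to keep all examples in this document in one language, here is a minimal Scala sketch of such a database interaction layer over the standard java.sql JDBC API. The connection URL, credentials, and books table are hypothetical, and the JDBC driver is assumed to be on the classpath.

    import java.sql.DriverManager

    object BookDao {
      def main(args: Array[String]): Unit = {
        // Hypothetical connection details.
        val conn = DriverManager.getConnection(
          "jdbc:oracle:thin:@localhost:1521:XE", "library", "secret")
        try {
          // Insert via a parameterized statement to avoid SQL injection.
          val insert = conn.prepareStatement(
            "INSERT INTO books (id, title) VALUES (?, ?)")
          insert.setInt(1, 42)
          insert.setString(2, "Effective Java")
          insert.executeUpdate()

          // Retrieve and print the stored rows.
          val rs = conn.createStatement().executeQuery("SELECT id, title FROM books")
          while (rs.next())
            println(s"${rs.getInt("id")}: ${rs.getString("title")}")
        } finally conn.close()
      }
    }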

Environment: Core Java, JDBC, Struts, HTML, SQL, Oracle 10g, PL/SQL, IBM Rational, Eclipse IDE.

Confidential

Programmer Analyst/ SQL Developer

Responsibilities:

  • Developed SQL scripts using joins, subqueries, and nested queries to insert, update, and delete data in MS SQL database tables.
  • Wrote PL/SQL and developed and implemented stored procedures, packages, and triggers.
  • Applied modeling principles, database design, and programming, creating E-R diagrams and data relationships to design the database.
  • Designed advanced SQL queries, procedures, cursors, and triggers.
  • Built data connections to the database using MS SQL Server.
  • Worked on a project extracting data from XML files into SQL tables and generating data-file reports using SQL Server 2008 (see the sketch after this list).
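
A rough sketch of the XML-to-SQL extraction, assuming a hypothetical file layout like <rows><row><id>1</id><name>a</name></row></rows> and a staging table. It uses scala.xml (bundled with older Scala releases, a separate module since 2.11) and JDBC batch inserts; the connection details are placeholders.

    import java.sql.DriverManager
    import scala.xml.XML

    object XmlToSql {
      def main(args: Array[String]): Unit = {
        val doc = XML.loadFile("data.xml")   // hypothetical input file

        val conn = DriverManager.getConnection(
          "jdbc:sqlserver://localhost;databaseName=reports", "user", "pass")
        try {
          val stmt = conn.prepareStatement(
            "INSERT INTO staging (id, name) VALUES (?, ?)")
          for (row <- doc \ "row") {
            stmt.setInt(1, (row \ "id").text.toInt)
            stmt.setString(2, (row \ "name").text)
            stmt.addBatch()                  // accumulate, then send in one round trip
          }
          stmt.executeBatch()
        } finally conn.close()
      }
    }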

Environment: PL/SQL, MySQL, SQL Server 2008 (SSRS & SSIS), Visual Studio 2000/2005, MS Excel.
