
Hadoop/Spark Developer Resume


PROFESSIONAL SUMMARY:

  • IBM Certified Hadoop Level-2 and Spark Level-1 developer with 5 years of experience in Information Technology.
  • Experienced in working with Spark, Hadoop, and big data ecosystem components such as Spark SQL, HDFS, MapReduce, Hive, Pig, Sqoop, and Impala for high-performance computing.
  • In-depth understanding and knowledge of Hadoop architecture.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Experience in writing complex SQL queries involving multiple tables with inner and outer joins.
  • Experience in performing ETL operations using Spark and Scala.
  • Flexible with Unix/Linux and Windows environments.
  • Hands-on experience in Spark: writing Spark Streaming jobs and creating Datasets and DataFrames from existing data to perform actions on different types of data (a brief sketch follows this list).
  • Good understanding of NoSQL databases such as HBase and MongoDB.
  • Hands-on experience with Scala, Python, SQL, and PL/SQL.
  • Capable of processing large volumes of structured, semi-structured, and unstructured data.
  • Skilled in migrating data from different databases to Hadoop HDFS and Hive using Sqoop.
  • Strong programming experience in creating packages, procedures, functions, and triggers using SQL and PL/SQL.
  • Extensive technical experience in Oracle Applications R12/11i.
  • Familiarity with Agile methodology.
  • Strong SQL, ETL, and data analysis skills.
  • Excellent team player with a strong problem-solving approach, good communication and leadership skills, and the ability to work in time-constrained environments.
  • Knowledge of unsupervised machine learning: k-means clustering.
  • Knowledge of supervised machine learning: linear regression, logistic regression, k-nearest neighbors, and decision trees.
  • Knowledge of real-time data processing using Spark Streaming and Kafka (a streaming sketch also follows this list).
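
As a quick illustration of the Spark work described above, the following is a minimal Scala sketch of loading an existing dataset into a DataFrame, registering it as a view, and querying it with Spark SQL. The file path, view name, and column names (customers, country, revenue) are illustrative placeholders, not taken from any project on this resume.

```scala
// Minimal sketch only: placeholder path, view, and column names.
import org.apache.spark.sql.SparkSession

object DataFrameSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-sql-sketch")
      .getOrCreate()

    // Load an existing dataset from HDFS into a DataFrame
    val customers = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/customers.csv")   // placeholder path

    // Register a temporary view and query it with Spark SQL
    customers.createOrReplaceTempView("customers")
    val byCountry = spark.sql(
      """SELECT country, SUM(revenue) AS total_revenue
        |FROM customers
        |GROUP BY country""".stripMargin)

    byCountry.show()   // action: triggers the computation
    spark.stop()
  }
}
```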
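
For the real-time processing item, below is a minimal sketch of consuming a Kafka topic from Spark. It uses the Structured Streaming Kafka source (a newer API than the DStream-based Spark Streaming named above); the broker address and topic name are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
// Minimal sketch only: placeholder broker and topic; requires the
// spark-sql-kafka-0-10 connector on the classpath.
import org.apache.spark.sql.SparkSession

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .getOrCreate()

    // Subscribe to a Kafka topic and treat each record value as a string
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")   // placeholder broker
      .option("subscribe", "events")                       // placeholder topic
      .load()
      .selectExpr("CAST(value AS STRING) AS message")

    // Write incoming messages to the console as they arrive
    val query = events.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```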

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop, HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Spark, Kafka, Oozie, Hue, Impala, ezFlow

Shell Scripting/Programming Languages: SQL, Pig Latin, HiveQL, Python, Scala, Java

Web Technologies: HTML, XML, JSON

Databases/NoSQL Databases: Oracle 9i/10g, MongoDB

Database Tools: TOAD Data Point, SQL Developer

IDE Tools: IntelliJ IDEA, Jupyter Notebook, PyCharm

Operating Systems: UNIX/Linux, Windows

Code Repositories: SVN, Git, Bitbucket

Build Tools: Maven, Gradle, Ant

Scheduling & Management System Tools: Autosys

ERP skills: Oracle Applications R12 / 11i, Accounts Receivables (AR), Accounts Payables (AP)

PROFESSIONAL EXPERIENCE:

Confidential

Hadoop/Spark Developer

Responsibilities:

  • Understood and assessed the needs of the business and recommended the most appropriate solutions to support business functions.
  • Documented solutions and architecture through supporting artifacts.
  • Prepared data flow diagrams and design documents and presented them to key stakeholders where necessary.
  • Worked with the various business and functional stakeholders to understand their business strategies and requirements and produced corresponding IT roadmaps.
  • Produced Current State, Future State, and Transitional Architecture models.
  • Ensured solution designs were strategically aligned to the Group Future State Architecture and Enterprise Function Model.
  • Actively participated in Group solution design and review sessions to guide systems teams and ensure adherence to the agreed solution architecture.
  • Created a parallel branch to load the data to Hadoop using Sqoop utilities.
  • Responsible for delivering data to GRA models from various systems like Oracle, Netezza and Teradata.
  • Designed the Asset-Based Lending (ABL) and Current Expected Credit Loss (CECL) data deliveries to the Global Risk Analytics (GRA) team, which runs the business models.
  • Developed Python scripts in Spark for ETL tasks.
  • Developed ETL pipelines using ezFlow as data processing framework.
  • Worked in an agile environment and built a robust system using code reviews.
  • Implemented partitioning and bucketing in Hive for better organization of the data (a DDL sketch follows this list).
  • Performed advanced analytics, feature selection/extraction using Apache Spark in Scala.
  • Involved in developing shell scripts to orchestrate the execution of all other scripts (Hive and MapReduce) and to move data files into and out of HDFS.
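
The partitioning and bucketing item above could look roughly like the following Hive DDL, issued here through Spark with Hive support enabled. The database, table, column names, and HDFS location are illustrative only; bucketing is noted in a comment because support for creating bucketed Hive tables through spark.sql varies by Spark version.

```scala
// Minimal sketch only: placeholder database, table, columns, and location.
import org.apache.spark.sql.SparkSession

object HiveLayoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-layout-sketch")
      .enableHiveSupport()   // assumes a Hive metastore is available
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS risk")

    // External table over data already landed in HDFS, partitioned by load date.
    // Bucketing would add "CLUSTERED BY (account_id) INTO 32 BUCKETS" to this DDL,
    // typically run from the Hive CLI, since Spark's support for creating
    // bucketed Hive tables varies by version.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS risk.exposures (
        |  account_id BIGINT,
        |  exposure   DOUBLE
        |)
        |PARTITIONED BY (load_dt STRING)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/risk/exposures'""".stripMargin)

    spark.stop()
  }
}
```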

Environment: Spark, ezFlow, Hadoop, HDFS, YARN, Hive, Impala, Oracle, Netezza, Sqoop, UNIX, Python, Autosys, Teradata

Confidential

Hadoop/Spark Developer

Responsibilities:

  • Ingested data into HDFS from various RDBMS sources and CSV files using Sqoop.
  • Performed data cleansing and transformation tasks using Spark with Scala.
  • Implemented data consolidation using Spark and Hive to generate data in the required formats.
  • Performed ETL tasks for data repair, data massaging to identify sources for audit purposes, and data filtering, storing the results back to HDFS.
  • Used Spark/Scala RDDs to transform and filter log lines containing “ERROR”, “FAILURE”, or “WARNING” and stored the results in HDFS (a sketch follows this list).
  • Worked with different data formats such as Parquet, Avro, and SequenceFile (a sketch follows this list).
  • Uploaded data to Hadoop Hive and combined new tables with existing databases.
  • Worked on writing Scala programs using Spark on Yarn for analyzing data.
  • Worked on the use case involving NoSQL/MongoDB for faster data ingestion/update and retrieval of data.
  • Worked on writing queries to retrieve data from MongoDB.
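
A minimal sketch of the RDD-based log filtering described above: keep only lines containing ERROR, FAILURE, or WARNING and write them back to HDFS. Input and output paths are placeholders.

```scala
// Minimal sketch only: placeholder input and output paths.
import org.apache.spark.sql.SparkSession

object LogFilterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("log-filter-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    val flags = Seq("ERROR", "FAILURE", "WARNING")

    // Load raw log lines as an RDD and keep only the flagged lines
    val logs    = sc.textFile("hdfs:///logs/app/*.log")
    val flagged = logs.filter(line => flags.exists(flag => line.contains(flag)))

    // Store the filtered lines back into HDFS
    flagged.saveAsTextFile("hdfs:///logs/app-flagged")
    spark.stop()
  }
}
```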
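
And a small sketch of reading and converting between the file formats mentioned above. Paths are placeholders; the Avro read assumes the external spark-avro package is available (format name "avro" in Spark 2.4+, "com.databricks.spark.avro" in earlier versions).

```scala
// Minimal sketch only: placeholder paths; Avro needs the external spark-avro
// package (format "avro" in Spark 2.4+, "com.databricks.spark.avro" earlier).
import org.apache.spark.sql.SparkSession

object FileFormatSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("file-format-sketch")
      .getOrCreate()

    // Parquet is supported out of the box
    val parquetDf = spark.read.parquet("hdfs:///data/events_parquet")

    // Avro is read through the external spark-avro data source
    val avroDf = spark.read.format("avro").load("hdfs:///data/events_avro")

    // Convert the Avro data to Parquet
    avroDf.write.mode("overwrite").parquet("hdfs:///data/events_converted")

    parquetDf.printSchema()
    spark.stop()
  }
}
```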

Environment: Spark, Hadoop, HDFS, YARN, Hive, Impala, Pig, Oozie, MongoDB, Sqoop, Scala, Linux.

Confidential

Big data Engineer

Responsibilities:

  • Designed and developed Big Data analytics platform for processing data using Hadoop, Hive and Pig.
  • Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured data.
  • Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
  • Worked hands-on with the ETL process and was involved in developing Hive scripts for extraction, transformation, and loading of data into other data warehouses.
  • Loaded aggregate data into a relational database for reporting, dashboarding, and ad-hoc analysis.
  • Applied partitioning and bucketing techniques in Hive to improve performance.
  • Designed data models in Hive and optimized Hive queries.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.

Environment: Hadoop, Hive, Pig, Sqoop, Zookeeper, HDFS

Confidential

Oracle Application Technical Developer

Responsibilities:

  • Involved as a developer in the Custodial project.
  • The Custodial project performs disbursement processing and reconciliation activities for clients.
  • Developed Technical Design (MD070) documents, based on the Functional Specifications (MD050), for all the enhancements I was involved in.
  • Created an inbound interface to load check request details into Oracle AP.
  • Created a trigger that fires when a payment batch is confirmed.
  • Created an outbound interface to send payment batch details.
  • Created an XML Publisher report to send outstanding check details.
  • Involved as a developer in the R12 upgrade project (Refunds).

Environment: Oracle Applications R12, Oracle Payables, PL/SQL, XML Publisher, TOAD.

Confidential

Software Developer Intern

Responsibilities:

  • Designed and developed quick solutions using design patterns.
  • Analyzed and developed solutions for legacy projects by debugging issues.
  • Supported the production team by analyzing and providing requested information using PL/SQL.
  • Gathered requirements by interacting with various team members to integrate services.
  • Created parsers for quick analysis of data and data extraction (XML & JSON).

Environment: Java 1.6, IntelliJ, Oracle 11i, SQL, PL/SQL
