Hadoop/Spark Developer Resume
PROFESSIONAL SUMMARY:
- IBM Certified Hadoop Level 2 and Spark Level 1 developer with 5 years of experience in Information Technology.
- Experienced in working with Spark, Hadoop, and big data ecosystem components such as Spark SQL, HDFS, MapReduce, Hive, Pig, Sqoop, and Impala for high-performance computing.
- In-depth understanding of Hadoop architecture.
- Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Experience in writing complex SQL queries involving multiple tables with inner and outer joins.
- Experience in performing ETL operations using Spark/Scala.
- Flexible with Unix/Linux and Windows environments.
- Hands-on experience with Spark, writing Spark Streaming jobs and creating Datasets and DataFrames from existing data sets to perform actions on different types of data.
- Good understanding of NoSQL databases such as HBase and MongoDB.
- Hands-on experience with Scala, Python, SQL, and PL/SQL.
- Capable of processing large volumes of structured, semi-structured, and unstructured data.
- Skilled in migrating data from different databases to HDFS and Hive using Sqoop.
- Strong programming experience in creating packages, procedures, functions, and triggers using SQL and PL/SQL.
- Extensive technical experience in Oracle Applications R12/11i.
- Familiarity with Agile methodology.
- Strong SQL, ETL and data analysis skills.
- Excellent team player with a strong problem-solving approach, good communication and leadership skills, and the ability to work in time-constrained environments.
- Knowledge of unsupervised machine learning: k-means clustering.
- Knowledge of supervised machine learning: linear regression, logistic regression, k-nearest neighbors, and decision trees.
- Knowledge of real-time data processing using Spark Streaming and Kafka (see the streaming sketch after this list).
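As an illustration of the real-time processing skill noted above, below is a minimal Spark Structured Streaming sketch in Scala that reads records from a Kafka topic and lands them in HDFS as Parquet. The broker address, topic name, and output paths are placeholders, and the job assumes the spark-sql-kafka connector is available; it is a sketch of the technique, not a specific production job from this resume.

```scala
import org.apache.spark.sql.SparkSession

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamSketch")
      .getOrCreate()

    // Requires the spark-sql-kafka connector on the classpath.
    // Broker address and topic name are illustrative placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers keys/values as bytes; cast the value to a string column.
    val lines = raw.selectExpr("CAST(value AS STRING) AS line")

    // Continuously append records to HDFS as Parquet, with checkpointing
    // so the stream can recover from failures. Paths are placeholders.
    val query = lines.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/events")
      .option("checkpointLocation", "hdfs:///data/streaming/_checkpoints")
      .start()

    query.awaitTermination()
  }
}
```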
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop, HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Spark, Kafka, Oozie, Hue, Impala, ezFlow
Shell Scripting/Programming Languages: SQL, Pig Latin, HiveQL, Python, Scala, Java
Web Technologies: HTML, XML, JSON
Databases/NoSQL Databases: Oracle 9i/10g, MongoDB
Database Tools: TOAD Data Point, SQL Developer
IDE Tools: IntelliJ, Jupyter Notebook, PyCharm
Operating Systems: Unix/Linux, Windows
Code Repositories: SVN, Git, Bitbucket
Build Tools: Maven, Gradle, Ant
Scheduling & Management System Tools: Autosys
ERP skills: Oracle Applications R12 / 11i, Accounts Receivables (AR), Accounts Payables (AP)
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop/Spark Developer
Responsibilities:
- Understood and assessed the needs of the business and recommended the most appropriate solutions to support business functions.
- Documented solutions and architecture through supporting artifacts.
- Prepared data flow diagrams and design documents and presented them to key stakeholders where necessary.
- Worked with business and functional stakeholders to understand their business strategies and requirements and produced corresponding IT roadmaps.
- Produced Current State, Future State, and Transitional Architecture models.
- Ensured solution designs were strategically aligned with the Group Future State Architecture and Enterprise Function Model.
- Actively participated in Group solution design and review sessions to guide systems teams and ensure adherence to the agreed solution architecture.
- Created a parallel branch to load data into Hadoop using Sqoop utilities.
- Delivered data to Global Risk Analytics (GRA) models from source systems such as Oracle, Netezza, and Teradata.
- Designed the Asset Based Lending (ABL) and Current Expected Credit Loss (CECL) systems, which deliver data to the GRA team that runs the business models.
- Developed Python scripts in Spark for ETL tasks.
- Developed ETL pipelines using ezFlow as the data processing framework.
- Worked in an Agile environment and helped build a robust system through regular code reviews.
- Implemented partitioning and bucketing in Hive for better organization of the data (see the sketch at the end of this role).
- Performed advanced analytics and feature selection/extraction using Apache Spark in Scala.
- Developed shell scripts to orchestrate the execution of other scripts (Hive and MapReduce) and to move data files into and out of HDFS.
Environment: Spark, ezFlow, Hadoop, HDFS, YARN, Hive, Impala, Oracle, Netezza, Sqoop, UNIX, Python, Autosys, Teradata
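The partitioning and bucketing work mentioned above can be illustrated with a minimal Spark/Scala sketch. The table, column, path, and bucket-count values are illustrative placeholders, and bucketBy here uses Spark's own bucketing metadata rather than Hive's native bucketing scheme, so this is a sketch of the concept rather than the exact production job.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitionBucketSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Source table and column names are illustrative placeholders.
    val exposures = spark.table("staging.loan_exposures_raw")

    // Partition by load date so queries filtering on load_date prune partitions,
    // and bucket by account id so joins on account_id shuffle less data.
    exposures.write
      .partitionBy("load_date")
      .bucketBy(32, "account_id")
      .sortBy("account_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("risk.loan_exposures")

    spark.stop()
  }
}
```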
Confidential
Hadoop/Spark Developer
Responsibilities:
- Ingested data into HDFS from various RDBMS sources and CSV files using Sqoop.
- Performed data cleansing and transformation tasks using Spark with Scala.
- Implemented data consolidation using Spark and Hive to generate data in the required formats.
- Performed ETL tasks for data repair, data massaging to identify source records for audit purposes, and data filtering, storing the results back to HDFS.
- Used Spark RDDs in Scala to transform and filter log lines containing “ERROR”, “FAILURE”, or “WARNING”, then stored the results in HDFS (see the sketch at the end of this role).
- Worked with different data formats such as Parquet, Avro, and SequenceFile.
- Loaded data into Hive and combined new tables with existing databases.
- Wrote Scala programs using Spark on YARN to analyze data.
- Worked on a use case involving MongoDB (NoSQL) for faster data ingestion, updates, and retrieval.
- Wrote SQL queries to retrieve data from MongoDB.
Environment: Spark, Hadoop, HDFS, YARN, Hive, Impala, Pig, Oozie, MongoDB, Sqoop, Scala, Linux.
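A minimal Spark/Scala sketch of the log-filtering work described above: the keyword list mirrors the bullet, while the HDFS paths and object name are placeholders assumed for illustration.

```scala
import org.apache.spark.sql.SparkSession

object LogFilterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LogFilterSketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Input and output HDFS paths are illustrative placeholders.
    val logLines = sc.textFile("hdfs:///data/raw/app_logs/*.log")

    // Keep only lines that flag a problem, normalizing whitespace first.
    val flagged = Seq("ERROR", "FAILURE", "WARNING")
    val problemLines = logLines
      .map(_.trim)
      .filter(line => flagged.exists(keyword => line.contains(keyword)))

    // Persist the filtered lines back to HDFS for downstream auditing.
    problemLines.saveAsTextFile("hdfs:///data/curated/problem_logs")

    spark.stop()
  }
}
```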
Confidential
Big data Engineer
Responsibilities:
- Designed and developed a big data analytics platform for processing data using Hadoop, Hive, and Pig.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured data.
- Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
- Worked hands-on with the ETL process and developed Hive scripts for extraction, transformation, and loading of data into other data warehouses.
- Loaded aggregate data into a relational database for reporting, dashboarding, and ad hoc analyses.
- Applied partitioning and bucketing techniques in Hive to improve performance.
- Designed data models in Hive and optimized Hive queries.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
Environment: Hadoop, Hive, Pig, Sqoop, Zookeeper, HDFS
Confidential
Oracle Application Technical Developer
Responsibilities:
- Worked as a developer on the Custodial project.
- The Custodial project performs disbursement processing and reconciliation activities for clients.
- Developed Technical Design (MD070) documents, based on Functional Specifications (MD050), for all enhancements I was involved in.
- Created an inbound interface to load check request details into Oracle AP.
- Created a trigger that fires when a payment batch is confirmed.
- Created an outbound interface to send payment batch details.
- Created an XML Publisher report to send outstanding check details.
- Worked as a developer on the R12 upgrade project (Refunds).
Environment: Oracle Applications R12, Oracle Payables, PL/SQL, XML Publisher, TOAD.
Confidential
Software Developer Intern
Responsibilities:
- Designed and developed quick solutions using design patterns.
- Analyzed and developed solutions for legacy projects by debugging issues.
- Supported the production team by analyzing and providing requested information using PL/SQL.
- Gathered requirements by interacting with various team members to integrate services.
- Created parsers for quick analysis of data and data extraction (XML & JSON).
Environment: Java 1.6, IntelliJ, Oracle 11i, SQL, PL/SQL