Hadoop/Spark Developer Resume
PROFESSIONAL SUMMARY:
- IBM Certified Hadoop Level 2 and Spark Level 1 developer with 5 years of experience in Information Technology.
- Experienced in working with Spark, Hadoop, and big data ecosystem components such as Spark SQL, HDFS, MapReduce, Hive, Pig, Sqoop, and Impala for high-performance computing.
- In-depth understanding of Hadoop architecture.
- Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Experience in writing complex SQL queries involving multiple tables with inner and outer joins.
- Experience in performing ETL operations using Spark/Scala.
- Flexible with Unix/Linux and Windows environments.
- Hands-on experience with Spark, writing Spark Streaming jobs and creating Datasets and DataFrames from existing data sets to perform actions on different types of data.
- Good understanding of NoSQL databases such as HBase and MongoDB.
- Hands-on experience with Scala, Python, SQL, and PL/SQL.
- Capable of processing large volumes of structured, semi-structured, and unstructured data.
- Skilled in migrating data from different databases to HDFS and Hive using Sqoop.
- Strong programming experience in creating packages, procedures, functions, and triggers using SQL and PL/SQL.
- Extensive technical experience in Oracle Applications R12/11i.
- Familiarity with Agile methodology.
- Strong SQL, ETL and data analysis skills.
- Excellent team player with a strong problem-solving approach, good communication and leadership skills, and the ability to work in time-constrained environments.
- Knowledge of unsupervised machine learning: k-means clustering.
- Knowledge of supervised machine learning: linear regression, logistic regression, k-nearest neighbors, and decision trees.
- Knowledge of real-time data processing using Spark Streaming and Kafka (see the streaming sketch after this list).
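As an illustration of the real-time processing skill noted above, below is a minimal Spark Structured Streaming sketch in Scala that reads records from a Kafka topic and lands them in HDFS as Parquet. The broker address, topic name, and output paths are placeholders, and the job assumes the spark-sql-kafka connector is available; it is a sketch of the technique, not a specific production job from this resume.

```scala
import org.apache.spark.sql.SparkSession

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamSketch")
      .getOrCreate()

    // Requires the spark-sql-kafka connector on the classpath.
    // Broker address and topic name are illustrative placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers keys/values as bytes; cast the value to a string column.
    val lines = raw.selectExpr("CAST(value AS STRING) AS line")

    // Continuously append records to HDFS as Parquet, with checkpointing
    // so the stream can recover from failures. Paths are placeholders.
    val query = lines.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streaming/events")
      .option("checkpointLocation", "hdfs:///data/streaming/_checkpoints")
      .start()

    query.awaitTermination()
  }
}
```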
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop, HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Spark, Kafka, Oozie, Hue, Impala, ezFlow
Shell Scripting/Programming Languages: SQL, Pig Latin, HiveQL, Python, Scala, Java
Web Technologies: HTML, XML, JSON
Databases/NoSQL Databases: Oracle 9i/10g, MongoDB
Database Tools: TOAD Data Point, SQL Developer
IDE Tools: IntelliJ, Jupyter Notebook, PyCharm
Operating Systems: Unix/Linux, Windows
Code Repositories: SVN, Git, Bitbucket
Build Tools: Maven, Gradle, Ant
Scheduling & Management System Tools: Autosys
ERP skills: Oracle Applications R12 / 11i, Accounts Receivables (AR), Accounts Payables (AP)
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop/Spark Developer
Responsibilities:
- Understood and assessed the needs of the business and recommended the most appropriate solutions to support business functions.
- Documented solutions and architecture through supporting artifacts.
- Prepared data flow diagrams and design documents and presented them to key stakeholders where necessary.
- Worked with business and functional stakeholders to understand their business strategies and requirements and produced corresponding IT roadmaps.
- Produced Current State, Future State, and Transitional Architecture models.
- Ensured solution designs were strategically aligned with the Group Future State Architecture and Enterprise Function Model.
- Actively participated in Group solution design and review sessions to guide systems teams and ensure adherence to the agreed solution architecture.
- Created a parallel branch to load data into Hadoop using Sqoop utilities.
- Delivered data to Global Risk Analytics (GRA) models from source systems such as Oracle, Netezza, and Teradata.
- Designed the Asset Based Lending (ABL) and Current Expected Credit Loss (CECL) systems, which deliver data to the GRA team that runs the business models.
- Developed Python scripts in Spark for ETL tasks.
- Developed ETL pipelines using ezFlow as the data processing framework.
- Worked in an Agile environment and helped build a robust system through regular code reviews.
- Implemented partitioning and bucketing in Hive for better organization of the data (see the sketch at the end of this role).
- Performed advanced analytics and feature selection/extraction using Apache Spark in Scala.
- Developed shell scripts to orchestrate the execution of other scripts (Hive and MapReduce) and to move data files into and out of HDFS.
Environment: Spark, ezFlow, Hadoop, HDFS, YARN, Hive, Impala, Oracle, Netezza, Sqoop, UNIX, Python, Autosys, Teradata
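The partitioning and bucketing work mentioned above can be illustrated with a minimal Spark/Scala sketch. The table, column, path, and bucket-count values are illustrative placeholders, and bucketBy here uses Spark's own bucketing metadata rather than Hive's native bucketing scheme, so this is a sketch of the concept rather than the exact production job.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitionBucketSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Source table and column names are illustrative placeholders.
    val exposures = spark.table("staging.loan_exposures_raw")

    // Partition by load date so queries filtering on load_date prune partitions,
    // and bucket by account id so joins on account_id shuffle less data.
    exposures.write
      .partitionBy("load_date")
      .bucketBy(32, "account_id")
      .sortBy("account_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("risk.loan_exposures")

    spark.stop()
  }
}
```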
Confidential
Hadoop/Spark Developer
Responsibilities:
- Ingested data into HDFS from various RDBMS sources and CSV files using Sqoop.
- Performed data cleansing and transformation tasks using Spark with Scala.
- Implemented data consolidation using Spark and Hive to generate data in the required formats.
- Performed ETL tasks for data repair, data massaging to identify source records for audit purposes, and data filtering, storing the results back to HDFS.
- Used Spark RDDs in Scala to transform and filter log lines containing “ERROR”, “FAILURE”, or “WARNING”, then stored the results in HDFS (see the sketch at the end of this role).
- Worked with different data formats such as Parquet, Avro, and SequenceFile.
- Loaded data into Hive and combined new tables with existing databases.
- Wrote Scala programs using Spark on YARN to analyze data.
- Worked on a use case involving MongoDB (NoSQL) for faster data ingestion, updates, and retrieval.
- Wrote SQL queries to retrieve data from MongoDB.
Environment: Spark, Hadoop, HDFS, YARN, Hive, Impala, Pig, Oozie, MongoDB, Sqoop, Scala, Linux.
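A minimal Spark/Scala sketch of the log-filtering work described above: the keyword list mirrors the bullet, while the HDFS paths and object name are placeholders assumed for illustration.

```scala
import org.apache.spark.sql.SparkSession

object LogFilterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LogFilterSketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Input and output HDFS paths are illustrative placeholders.
    val logLines = sc.textFile("hdfs:///data/raw/app_logs/*.log")

    // Keep only lines that flag a problem, normalizing whitespace first.
    val flagged = Seq("ERROR", "FAILURE", "WARNING")
    val problemLines = logLines
      .map(_.trim)
      .filter(line => flagged.exists(keyword => line.contains(keyword)))

    // Persist the filtered lines back to HDFS for downstream auditing.
    problemLines.saveAsTextFile("hdfs:///data/curated/problem_logs")

    spark.stop()
  }
}
```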
Confidential
Big data Engineer
Responsibilities:
- Designed and developed a big data analytics platform for processing data using Hadoop, Hive, and Pig.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured data.
- Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
- Worked hands-on with the ETL process and developed Hive scripts for extraction, transformation, and loading of data into other data warehouses.
- Loaded aggregate data into a relational database for reporting, dashboarding, and ad hoc analyses.
- Applied partitioning and bucketing techniques in Hive to improve performance.
- Designed data models in Hive and optimized Hive queries.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
Environment: Hadoop, Hive, Pig, Sqoop, Zookeeper, HDFS
Confidential
Oracle Application Technical Developer
Responsibilities:
- Worked as a developer on the Custodial project.
- The Custodial project performs disbursement processing and reconciliation activities for clients.
- Developed Technical Design (MD070) documents, based on Functional Specifications (MD050), for all enhancements I was involved in.
- Created an inbound interface to load check request details into Oracle AP.
- Created a trigger that fires when a payment batch is confirmed.
- Created an outbound interface to send payment batch details.
- Created an XML Publisher report to send outstanding check details.
- Worked as a developer on the R12 upgrade project (Refunds).
Environment: Oracle Applications R12, Oracle Payables, PL/SQL, XML Publisher, TOAD.
Confidential
Software Developer Intern
Responsibilities:
- Designed and developed quick solutions using design patterns.
- Analyzed and developed solutions for legacy projects by debugging issues.
- Supported the production team by analyzing and providing requested information using PL/SQL.
- Gathered requirements by interacting with various team members to integrate services.
- Created parsers for quick analysis of data and data extraction (XML & JSON).
Environment: Java 1.6, IntelliJ, Oracle 11i, SQL, PL/SQL