Hadoop/Spark Developer Resume
OH
PROFESSIONAL SUMMARY:
- IBM Certified Hadoop Level 2 and Spark Level 1 developer with 4 years of experience in Information Technology.
- Experienced in working with Big Data, Spark, Hadoop, and ecosystem components such as Spark Streaming, Spark SQL, HDFS, MapReduce, Hive, Pig, Sqoop, and Kafka for high-performance computing.
- In-depth understanding of Hadoop architecture.
- Strong understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Experience in writing complex SQL queries involving multiple tables with inner and outer joins.
- Experience in performing ETL operations using Spark and Scala.
- Flexible with Unix/Linux and Windows environments.
- Experience in Spark: writing Spark Streaming jobs and creating Datasets and DataFrames from existing data to perform actions on different types of data (see the sketch following this summary).
- Good understanding of NoSQL databases such as HBase and MongoDB.
- Hands-on experience with Scala, Python, SQL, and PL/SQL.
- Capable of processing large volumes of structured, semi-structured, and unstructured data.
- Skilled in migrating data from different databases to Hadoop HDFS and Hive using Sqoop.
- Strong programming experience in creating packages, procedures, functions, and triggers using SQL and PL/SQL.
- Extensive technical experience in Oracle Applications R12/11i.
- Familiarity with Agile methodology.
- Strong SQL, ETL and data analysis skills.
- Excellent team player with a strong problem-solving approach, good communication and leadership skills, and the ability to work in time-constrained environments.
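The sketch below illustrates the kind of Dataset/DataFrame creation and multi-table join work described above. It is a minimal example only: the SparkSession setup is standard, while the table names, columns, and join key are hypothetical placeholders, not tables from any actual project.

```scala
// Minimal Spark/Scala sketch: create DataFrames from existing Hive tables and
// join them. Table and column names (orders, customers, customer_id) are
// hypothetical placeholders used purely for illustration.
import org.apache.spark.sql.SparkSession

object JoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-example")
      .enableHiveSupport()
      .getOrCreate()

    val orders    = spark.table("orders")     // hypothetical Hive table
    val customers = spark.table("customers")  // hypothetical Hive table

    // Left outer join keeps customers with no matching orders.
    val joined = customers.join(orders, Seq("customer_id"), "left_outer")

    joined.show(20, truncate = false)
    spark.stop()
  }
}
```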
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop, HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Spark, Kafka, Oozie
Shell Scripting/Programming Languages: SQL, Pig Latin, HiveQL, Python, Scala, Java
Web Technologies: HTML, XML, JSON
Databases/NoSQL Databases: Oracle 9i/10g, PostgreSQL, MongoDB
Database Tools: TOAD, SQL Developer
IDE Tools: IntelliJ, Jupyter Notebook
Operating Systems: Unix/Linux, Windows
Code Repositories: SVN, GIT
Build Tools: Maven, Gradle, Ant
ERP skills: Oracle Applications R12 / 11i, Accounts Receivables (AR), Accounts Payables (AP)
PROFESSIONAL EXPERIENCE:
Confidential, OH
Hadoop/Spark Developer
Responsibilities:
- Ingested data from various sources into Hadoop HDFS/Hive tables and managed data pipelines providing DaaS (Data as a Service) to business users and data scientists for analytics.
- Ingested data into HDFS using Sqoop from various RDBMS sources and CSV files.
- Performed data cleansing and transformation tasks using Spark with Scala.
- Implemented data consolidation using Spark and Hive to generate data in the required formats.
- Performed ETL tasks for data repair, data massaging to identify sources for audit purposes, and data filtering, storing the results back to HDFS.
- Worked on real-time data processing using Spark Streaming and Kafka with Scala.
- Used Spark/Scala RDDs to transform and filter log lines containing "ERROR", "FAILURE", or "WARNING" and stored the results in HDFS (see the sketch after this job entry).
- Worked with different data formats such as Parquet, Avro, SequenceFile, and MapFile.
- Loaded data into Hadoop Hive and combined new tables with existing databases.
- Wrote Scala programs using Spark on YARN to analyze data.
- Worked on a use case involving MongoDB (NoSQL) for faster data ingestion, updates, and retrieval.
- Wrote queries to retrieve data from MongoDB.
Environment: Spark, Kafka, Hadoop, HDFS, YARN, Hive, Impala, Pig, Flume, Oozie, MongoDB, Sqoop, Scala, Linux, PySpark.
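A minimal Spark/Scala sketch of the log-filtering step described in this role: keep only log lines containing "ERROR", "FAILURE", or "WARNING" and write them back to HDFS. The input and output HDFS paths are placeholders, not actual project locations.

```scala
// Illustrative-only sketch of RDD-based log filtering; paths are placeholders.
import org.apache.spark.sql.SparkSession

object LogFilter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("log-filter").getOrCreate()
    val sc = spark.sparkContext

    val keywords = Seq("ERROR", "FAILURE", "WARNING")

    // Read raw log lines from HDFS, keep only lines containing a keyword,
    // and store the filtered set back to HDFS.
    sc.textFile("hdfs:///data/raw/app-logs/*")            // placeholder input path
      .filter(line => keywords.exists(k => line.contains(k)))
      .saveAsTextFile("hdfs:///data/curated/error-logs")  // placeholder output path

    spark.stop()
  }
}
```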
Confidential, CA
Big data Engineer
Responsibilities:
- Designed and developed Big Data analytics platform for processing data using Hadoop, Hive and Pig.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured data.
- Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
- Worked hands-on with the ETL process and developed Hive scripts for extraction, transformation, and loading of data into other data warehouses.
- Loaded aggregate data into a relational database for reporting, dashboarding, and ad-hoc analyses.
- Applied partitioning and bucketing techniques in Hive to improve performance (see the sketch after this job entry).
- Designed the data model on Hive and optimized Hive queries.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
Environment: Hadoop, Hive, Pig, Sqoop, Zookeeper, HDFS
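A hedged sketch of the Hive partitioning and bucketing approach mentioned above, written with Spark's DataFrameWriter API. The source table, partition column, bucket column, and bucket count are hypothetical, chosen only for illustration.

```scala
// Sketch only: write a staging table as a partitioned, bucketed table so that
// date filters prune partitions and joins on customer_id can exploit buckets.
// sales_raw, sales_curated, load_date, and customer_id are hypothetical names.
import org.apache.spark.sql.SparkSession

object HivePartitionBucketExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-bucket-example")
      .enableHiveSupport()
      .getOrCreate()

    spark.table("sales_raw")          // hypothetical staging table
      .write
      .partitionBy("load_date")       // hypothetical partition column
      .bucketBy(32, "customer_id")    // hypothetical bucket column and count
      .sortBy("customer_id")
      .format("parquet")
      .saveAsTable("sales_curated")   // hypothetical target table

    spark.stop()
  }
}
```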
Confidential, CA
Oracle Application Technical Developer
Responsibilities:
- Involved as a developer in the Custodial project.
- The Custodial project performs disbursement processing and reconciliation activities for clients.
- Developed Technical Design (MD070) documents, based on the Functional Specifications (MD050), for all enhancements I was involved in.
- Created an inbound interface to load check request details into Oracle AP.
- Created a trigger that fires when the payment batch is confirmed.
- Created an outbound interface to send payment batch details.
- Created an XML Publisher report to send outstanding check details.
- Involved as a developer in the R12 Upgrade project (Refunds).
Environment: Oracle Applications R12, Oracle Payables, PL/SQL, XML Publisher, TOAD.
Confidential
Software Developer
Responsibilities:
- Designed and developed quick solutions using design patterns.
- Analyzed and developed solutions for legacy projects by debugging issues.
- Supported the production team by analyzing and providing requested information using PL/SQL.
- Gathered requirements by interacting with various team members to integrate services.
- Created parsers for quick analysis of data and data extraction (XML & JSON).
Environment: Java 1.6, IntelliJ, Oracle 11i, SQL, PL/SQL
Confidential
Software Developer Intern
Responsibilities:
- Involved as a software developer in the Cloud Connector Framework project.
- Developed a low-cost Zigbee network that monitors temperature.
- Developed a Zigbee-based home automation system through a gateway.
- Displayed all monitored parameters at a common point.
Environment: XCTU tool, MATLAB