Hadoop Developer Resume

Redlands, CA

PROFESSIONAL SUMMARY:

  • Accomplished IT professional with 6+ years of experience, specializing in Big Data engineering and testing techniques.
  • Enthusiastic about exploring how big data analytics benefits different industry verticals - banking, insurance, healthcare, retail, manufacturing, transportation, etc.
  • Knowledgeable in automating data flow between components using Apache NiFi.
  • In-depth understanding of key big data concepts - distributed file systems, parallel processing, high availability, fault tolerance, and scalability.
  • Worked on most of the significant big data ecosystem tools/frameworks - Sqoop, Kafka, Hadoop, Spark, Hive, HBase, ZooKeeper, and Tableau.
  • Acquired profound knowledge of Spark architecture and its key components - Spark Core, Spark SQL, DataFrames, and Spark Streaming.
  • Considerable experience in testing, integrating, and moving data from several source/production systems to an Enterprise Data Warehouse by leveraging Informatica PowerCenter.
  • Solid understanding of OLAP concepts and challenges, especially with large data sets.
  • Well versed in OLTP Data Modeling, Data warehousing concepts.
  • Strong expertise in data analysis, data validation, data verification, data cleansing, data completeness, data integrity, and data mismatch identification.
  • Well acquainted with various Software Development Life Cycle (SDLC) models and Software Testing Life Cycle (STLC).
  • Proficiency in Smoke Testing, Functional Testing, System Integration Testing, User Acceptance Testing and Regression Testing.
  • Test Lead with expertise in Test Planning, Test Design, Test Execution and Test Summary Reporting activities.
  • Expertise in defect tracking tools like JIRA and HP Quality Center/Application Lifecycle Management (ALM).
  • Completed the ISTQB Basic Level and the Banking and Finance L1 certification exams.
  • Possess excellent people skills; rational in thinking and decision making; a quick learner and hard worker committed to delivering quality results.

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, YARN, Sqoop, Kafka, MapReduce, Spark, Hive, HBase, ZooKeeper

Languages: Python, SQL, PL/SQL, XML, Shell Scripting, HTML, CSS

Databases: Oracle, SQL Server, Teradata, HBase

Methodologies: Agile, Waterfall, Incremental Waterfall

Operating Systems: LINUX, UNIX, Windows, CentOS

Tools: IntelliJ, Eclipse, Putty, HP ALM

PROFESSIONAL EXPERIENCE:

Confidential, Redlands, CA

Hadoop Developer

Responsibilities:

  • Involved in scenario conversion from PL/SQL to HQL for EAP (Hive).
  • Worked on ETL transformations using Hive Query Language (HQL).
  • Designed dynamic solutions around merge views (views built on top of other views) to mitigate cluster space issues, along with a dynamic process to copy data from the L4 source to the SMV layer (see the sketch after this list).
  • Automated all the transformations using shell scripting.
  • Worked on data cleansing and data quality jobs.
  • Created REST services for the UI application and successfully integrated all the components.
  • Worked on code deployment tools like RLM and RTC.
  • Coordinated with the offshore team on a regular basis and assigned work accordingly.
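
The merge-view approach above can be sketched minimally with Spark SQL from PySpark. The database, table, and view names (eap, smv, l4_orders, etc.) are hypothetical placeholders, not the actual project objects:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("merge-view-sketch")
             .enableHiveSupport()  # so the views persist in the Hive metastore
             .getOrCreate())

    # Base view over an L4 source table (names are placeholders).
    spark.sql("""
        CREATE OR REPLACE VIEW eap.l4_orders_v AS
        SELECT order_id, customer_id, amount, load_date
        FROM eap.l4_orders
    """)

    # "Merge view": a view stacked on top of other views, exposing the SMV
    # layer without physically copying data and consuming cluster space.
    # eap.l4_customers_v is assumed to be defined the same way as above.
    spark.sql("""
        CREATE OR REPLACE VIEW smv.orders_merged_v AS
        SELECT o.order_id, o.amount, o.load_date, c.segment
        FROM eap.l4_orders_v o
        JOIN eap.l4_customers_v c
          ON o.customer_id = c.customer_id
    """)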

Environment: Cloudera Hadoop platform

Confidential, Pittsburgh

Hadoop Developer

Responsibilities:

  • Involved in collecting business requirements, designing multiple data pipelines, and monitoring the data flow in the Cloudera Hue UI.
  • Imported and exported data between RDBMS systems and HDFS/HBase using Sqoop incremental jobs.
  • Stored data in Parquet file format since it uses less space and supports a high ingestion rate (see the first sketch after this list).
  • Worked on data cleansing activities like eliminating null values and duplicates.
  • Worked on Spark Streaming and Spark Structured Streaming using Apache Kafka for real-time data processing (see the streaming sketch after this list).
  • Used Gzip and Snappy compression codecs to compress files for efficient storage and processing.
  • Created external tables (both transactional and non-transactional) from compressed files in Hive.
  • Performed ad-hoc queries on structured data using HiveQL and used partitioning, bucketing, and joins in Hive for faster data access.
  • Used the Spark API over Cloudera Hadoop YARN to process data in Hive.
  • Developed PySpark applications using Python, utilizing the DataFrame and Spark SQL APIs for faster data processing.
  • Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
  • Developed PySpark scripts and UDFs using DataFrames/SQL/Datasets for data aggregation and queries.
  • Built a data pipeline consisting of Spark, Hive, Kafka, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
  • Designed and developed jobs to validate data post-migration, such as reconciling reporting fields between source and destination systems using Spark SQL, RDDs, and DataFrames/Datasets.
  • Tuned query performance using aggregation and other optimization techniques.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory settings.
  • Coordinated with the TMS team to gather data from the Kafka producer teams and wrote Spark Core jobs to meet business requirements.
  • Coordinated with the offshore team daily via teleconference to discuss roadblocks, issues, and progress.
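
The resume performs the RDBMS-to-HDFS import with Sqoop, a separate CLI tool; as a minimal stand-in, the same Parquet/Snappy/partitioning pattern can be sketched with Spark's JDBC reader. The connection details, table, and column names here are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("rdbms-to-parquet-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Pull a source table over JDBC (URL and credentials are placeholders;
    # the matching JDBC driver must be on the classpath).
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
              .option("dbtable", "SALES.ORDERS")
              .option("user", "etl_user")
              .option("password", "********")
              .load())

    # Land it as Snappy-compressed Parquet, partitioned by load date and
    # registered as a Hive table so HiveQL queries get partition pruning.
    (orders.write
           .mode("overwrite")
           .partitionBy("LOAD_DATE")
           .option("compression", "snappy")
           .format("parquet")
           .saveAsTable("staging.orders"))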
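
A minimal Spark Structured Streaming sketch of the Kafka-based real-time processing mentioned above; the broker address, topic, schema, and paths are hypothetical, and the spark-sql-kafka package must be on the classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructType

    spark = (SparkSession.builder
             .appName("kafka-streaming-sketch")
             .getOrCreate())

    # Hypothetical event schema, purely for illustration.
    schema = (StructType()
              .add("event_id", StringType())
              .add("amount", DoubleType()))

    # Read the topic as an unbounded DataFrame and parse the JSON payload.
    events = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "operational-events")
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Land the parsed stream as Parquet; the checkpoint gives fault tolerance.
    query = (events.writeStream
             .format("parquet")
             .option("path", "/data/streams/operational_events")
             .option("checkpointLocation", "/checkpoints/operational_events")
             .outputMode("append")
             .start())
    query.awaitTermination()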

Environment: Cloudera Hadoop platform with 65 nodes equaling 866 TB (3-way replication)

Confidential

Hadoop Developer

Responsibilities:

  • Involved in requirement analysis - understood the source systems and the complete architecture of the Hadoop cluster.
  • Performed data profiling to measure the accuracy, validity, and completeness of data.
  • Developed Sqoop jobs with the incremental import feature to ingest data into HDFS and Hive.
  • Ingested data from multiple Oracle DB servers.
  • Used Snappy to compress the ingested data.
  • Involved in loading data from the UNIX file system and FTP to HDFS.
  • Responsible for managing data coming from different sources.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
  • Built data cleansing and data transformation rules in Python with PySpark DataFrames (see the cleansing sketch after this list).
  • Installed and configured Hive and wrote Hive UDFs.
  • Used Hive for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Worked with Apache Spark, which provides a fast and general engine for large-scale data processing, integrated with the functional programming language Scala.
  • Imported data from different source systems into Spark RDDs.
  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the query-conversion sketch after this list).
  • Developed ETL processes using PySpark, Python, Hive, and HBase.
  • Created HBase tables to store data in variable formats coming from different legacy systems.
  • Developed Oozie coordinators to schedule Hive scripts and create data pipelines.
  • Performed troubleshooting of Spark jobs by analyzing and reviewing log files.
  • Worked with network, database, application, and BI teams to ensure data quality and availability.
  • Participated in setting up the schema in Hive and setting up the processing framework (Spark jobs).
  • Performed history data loads for more than 200 tables (20 TB) from an Oracle DB server to Hive.
  • Worked in a team environment supporting data ingestion, processing, and reporting from Hadoop.
  • Performed reporting in Tableau with the Impala query engine.
  • Worked on NoSQL databases.
  • Participated in Agile development during the end-to-end implementation of the project.
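
A minimal sketch of the kind of PySpark DataFrame cleansing and transformation rules referred to above; the paths, columns, and rules are hypothetical examples rather than the actual project logic:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, trim, upper

    spark = SparkSession.builder.appName("cleansing-sketch").getOrCreate()

    raw = spark.read.parquet("/data/staging/customers")  # placeholder path

    cleaned = (raw
               .dropDuplicates(["customer_id"])      # remove duplicate keys
               .na.drop(subset=["customer_id"])      # drop rows missing the key
               .na.fill({"country": "UNKNOWN"})      # default missing values
               # normalize free-text values before downstream joins
               .withColumn("country", upper(trim(col("country")))))

    cleaned.write.mode("overwrite").parquet("/data/curated/customers")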
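
The Hive-to-Spark query conversion above was done in Scala; for consistency with the other sketches, the same idea is shown here in PySpark, with a hypothetical table and query:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Original HiveQL (hypothetical):
    #   SELECT region, SUM(amount) AS total FROM sales.orders GROUP BY region;

    # Equivalent DataFrame transformation executed by Spark:
    totals = (spark.table("sales.orders")
              .groupBy("region")
              .agg(F.sum("amount").alias("total")))

    totals.show()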

Environment: Cloudera Hadoop platform with 24 nodes equaling 254TB (3-way replication)

Confidential

Hadoop Developer

Responsibilities:

  • Involved in collecting business requirements, designing multiple data pipelines, and monitoring the data flow in the Cloudera Hue UI.
  • Imported and exported data between RDBMS systems and HDFS/HBase using Sqoop incremental jobs.
  • Stored data in Parquet file format since it uses less space and supports a high ingestion rate.
  • Worked on data cleansing activities like eliminating null values and duplicates.
  • Worked on Spark Streaming and Spark Structured Streaming using Apache Kafka for real-time data processing.
  • Used Gzip and Snappy compression codecs to compress files for efficient storage and processing.
  • Created external tables (both transactional and non-transactional) from compressed files in Hive (see the DDL sketch after this list).
  • Performed ad-hoc queries on structured data using HiveQL and used partitioning, bucketing, and joins in Hive for faster data access.
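
A minimal sketch of a Hive external table over compressed files, issued through Spark SQL to stay consistent with the other sketches; the schema and HDFS location are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("external-table-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # External table over Gzip-compressed delimited files already in HDFS;
    # Hive decompresses .gz text files transparently at read time.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS staging.transactions (
            txn_id   STRING,
            account  STRING,
            amount   DOUBLE,
            txn_date STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
        LOCATION '/data/landing/transactions'
    """)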

Confidential

QA Tester

Responsibilities:

  • Analyzed functional requirement and business requirement documents to better understand the system from both functional and business perspectives.
  • Involved in the effective implementation of test plans and test procedures.
  • Involved in preparing test scenarios and test cases based on business requirement documents.
  • Studied change requests and prepared test cases.
  • Performed extensive manual testing using HP Quality Center to develop and execute test scenarios/test cases and log defects.
  • Prepared a Requirement Traceability Matrix (RTM) to trace test cases to functional requirements.
  • Created SQL queries to retrieve data from the database to validate the input data (see the sketch after this list).
  • Prepared test data for the inputs of the test cases.
  • Prepared suggestion documents to improve the quality of the application.
  • Communicated with the Test Lead/Test Manager.
  • Conducted review meetings within the team.
  • Involved in regression testing after each build of the application.
  • Responsible for updating and maintaining Quality Center for all defects found during functional and regression testing, and for following up on the bug life cycle.
  • Analyzed performance based on the reports generated.
  • Attended daily stand-up meetings.
  • Re-tested the application for every new build and attended client calls and status meetings.
  • Provided inputs in retrospective meetings.
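
A minimal sketch of the SQL-based input validation mentioned above, using Python's built-in sqlite3 as a stand-in for the project database; the tables and the reconciliation rule are hypothetical:

    import sqlite3

    # Stand-in connection; the real project ran against a production RDBMS.
    conn = sqlite3.connect("app.db")

    # Hypothetical check: every row loaded into the reporting table must
    # match its source row on amount.
    mismatches = conn.execute("""
        SELECT s.order_id, s.amount AS source_amount, r.amount AS report_amount
        FROM source_orders s
        JOIN report_orders r ON r.order_id = s.order_id
        WHERE s.amount <> r.amount
    """).fetchall()

    assert not mismatches, f"{len(mismatches)} rows failed amount validation"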
