Hadoop Developer Resume
Redlands, CA
PROFESSIONAL SUMMARY:
- Accomplished IT professional with 6+ years of experience, specializing in Big Data and testing techniques.
- Enthusiastic about exploring how big data analytics benefits different industry verticals: Banking, Insurance, Healthcare, Retail, Manufacturing, Transportation, etc.
- Knowledgeable in automating data flow between components using Apache NiFi.
- Have an in-depth understanding of key big data concepts: distributed file systems, parallel processing, high availability, fault tolerance, and scalability.
- Worked on most of the significant big data ecosystem tools/frameworks: Sqoop, Kafka, Hadoop, Spark, Hive, HBase, ZooKeeper, and Tableau.
- Acquired profound knowledge of Spark architecture and its key components: Spark Core, Spark SQL, DataFrames, and Spark Streaming.
- Considerable experience in testing, integrating, and moving data from several sources/production systems to an Enterprise Data Warehouse by leveraging Informatica PowerCenter.
- Solid understanding of OLAP concepts and challenges, especially with large data sets.
- Well versed in OLTP Data Modeling, Data warehousing concepts.
- Strong expertise in data analysis, data validation, data verification, data cleansing, data completeness, data integrity, and data mismatch identification.
- Well acquainted with various Software Development Life Cycle (SDLC) models and Software Testing Life Cycle (STLC).
- Proficiency in Smoke Testing, Functional Testing, System Integration Testing, User Acceptance Testing and Regression Testing.
- Test Lead with expertise in Test Planning, Test Design, Test Execution and Test Summary Reporting activities.
- Expertise in defect tracking tools like JIRA and HP Quality Center/Application Lifecycle Management (ALM).
- Completed ISTQB Basic level and Banking and Finance L1 Certification Exam.
- Possess excellent people skills and rational thinking in decision making; a quick learner and hard worker committed to delivering quality results.
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, YARN, Sqoop, Kafka, MapReduce, Spark, Hive, HBase, ZooKeeper
Languages: Python, SQL, PL/SQL, XML, Shell Scripting, HTML, CSS
Databases: Oracle, SQL Server, Teradata, HBase
Methodologies: Agile, Waterfall, Incremental Waterfall
Operating Systems: Linux, UNIX, Windows, CentOS
Tools: IntelliJ, Eclipse, PuTTY, HP ALM
PROFESSIONAL EXPERIENCE:
Confidential, Redlands, CA
Hadoop Developer
Responsibilities:
- Involved in scenario conversion from PL/SQL to HQL for EAP (Hive).
- Worked on ETL transformations using Hive Query Language.
- Involved in creating dynamic solutions such as merge views (a view on top of another view), which solve cluster space issues and provide a dynamic way to copy data from the L4 source to the SMV layer (a sketch follows this section).
- Created an automated solution for all the transformations using shell scripting.
- Worked on data cleansing and data quality jobs.
- Involved in creating REST services for the UI application and successfully integrated all the components.
- Worked on code deployment tools like RLM and RTC.
- Coordinated with the offshore team on a regular basis and assigned work accordingly.
Environment: Cloudera Hadoop platform
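Below is a minimal PySpark sketch of the merge-view pattern described above: a Hive view stacked on another view so downstream layers read the L4 source without duplicating data on the cluster. The database, table, and column names (eap, l4_source, customer, etc.) are hypothetical placeholders, not the project's actual schema.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("merge-view-sketch")
    .enableHiveSupport()   # required so CREATE VIEW hits the Hive metastore
    .getOrCreate()
)

# Base view over the L4 source layer (names are placeholders)
spark.sql("""
    CREATE OR REPLACE VIEW eap.customer_l4_vw AS
    SELECT cust_id, cust_name, updated_ts
    FROM l4_source.customer
""")

# Merge view stacked on the base view and exposed to the SMV layer;
# no data is copied, so no extra cluster space is consumed
spark.sql("""
    CREATE OR REPLACE VIEW eap.customer_smv_vw AS
    SELECT cust_id, cust_name, updated_ts
    FROM eap.customer_l4_vw
    WHERE updated_ts >= date_sub(current_date(), 30)
""")
```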
Confidential, Pittsburgh
Hadoop Developer
Responsibilities:
- Involved in collecting business requirements, designing multiple data pipelines, and monitoring data flow in the Cloudera Hue UI.
- Imported and exported data between RDBMS systems and HDFS/HBase using Sqoop incremental jobs.
- Stored data in the Parquet file format since it uses less space and supports a high ingestion rate.
- Worked on data cleansing activities such as eliminating null values and duplicates.
- Worked on Spark Streaming and Spark Structured Streaming with Apache Kafka for real-time data processing (see the sketch after this section).
- Used the Gzip and Snappy compression codecs to compress files for efficient storage and processing.
- Created external tables (both transactional and non-transactional) from compressed files in Hive.
- Performed ad-hoc queries on structured data using HiveQL, and used partitioning, bucketing, and join techniques in Hive for faster data access.
- Used the Spark API over Cloudera Hadoop YARN to process data in Hive.
- Developed PySpark applications utilizing DataFrames and the Spark SQL API for faster data processing.
- Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
- Developed PySpark scripts and UDFs using DataFrames/SQL/Datasets for data aggregation and queries.
- Built a data pipeline consisting of Spark, Hive, Kafka, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Designed and developed jobs to validate data post-migration, such as comparing reporting fields between source and destination systems using Spark SQL, RDDs, and DataFrames/Datasets.
- Worked on query performance and optimized it using aggregation and other optimization techniques.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory settings.
- Coordinated with the TMS team to gather data from the Kafka producers team and wrote Spark Core jobs to meet the business requirements.
- Coordinated with the offshore team on a daily basis through teleconferences to discuss roadblocks, issues, and progress.
Environment: Cloudera Hadoop platform with 65 nodes equaling 866 TB (3-way replication)
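The sketch below illustrates the Kafka-to-Spark Structured Streaming path referenced above, landing the stream as Snappy-compressed Parquet in line with the storage choices on this project. The broker addresses, topic name, and output/checkpoint paths are assumptions for illustration only.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Subscribe to the raw event stream (requires the spark-sql-kafka package)
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholder brokers
    .option("subscribe", "ops_events")                               # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary, so cast the payload before processing
parsed = events.select(col("value").cast("string").alias("payload"))

# Land the stream as Parquet (Snappy-compressed by default in Spark)
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "/data/streams/ops_events")               # placeholder path
    .option("checkpointLocation", "/checkpoints/ops_events")  # placeholder path
    .start()
)
query.awaitTermination()
```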
Confidential
Hadoop Developer
Responsibilities:
- Involved in requirement analysis: understood the source systems and the complete architecture of the Hadoop cluster.
- Performed data profiling to measure the accuracy, validity, and completeness of data.
- Developed Sqoop jobs with the incremental import feature to ingest data into HDFS and Hive.
- Ingested data from multiple Oracle DB servers.
- Used Snappy to compress the ingested data.
- Involved in loading data from UNIX file system and FTP to HDFS.
- Responsible for managing data coming from different sources.
- Developed interactive shell scripts for scheduling various data cleansing and data loading process.
- Built data cleansing and data transformation rules in Python with PySpark DataFrames (see the sketch after this section).
- Installed and configured Hive and wrote Hive UDFs.
- Used Hive for transformations, event joins, and some pre-aggregations before storing the data on HDFS.
- Worked with Apache Spark, which provides a fast, general engine for large-scale data processing, integrated with the functional programming language Scala.
- Imported the data from different source systems into Spark RDDs.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed ETL processes using PySpark, Python, Hive, and HBase.
- Created HBase tables to store variable data formats coming from different legacy systems.
- Developed Oozie coordinators to schedule Hive scripts to create Data pipelines.
- Performed troubleshooting of Spark jobs by analyzing and reviewing log files.
- Worked with Network, database, application and BI teams to ensure data quality and availability.
- Participated in setting up the schema in Hive and setting up the processing framework (Spark jobs).
- Performed history data loads for more than 200 tables (20 TB) from an Oracle DB server to Hive.
- Worked in a team environment supporting data ingestion, processing, and reporting from Hadoop.
- Performed reporting in Tableau with the Impala query engine.
- Worked on NoSQL databases.
- Participated in Agile development during the end-to-end implementation of the project.
Environment: Cloudera Hadoop platform with 24 nodes equaling 254TB (3-way replication)
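A minimal sketch of the kind of cleansing and transformation rules mentioned above, written with PySpark DataFrames. The staging/curated table names and the key columns are hypothetical, standing in for the project's actual schema.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cleansing-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the raw layer landed by the Sqoop incremental jobs (placeholder table)
raw = spark.table("staging.customer_raw")

cleansed = (
    raw.dropna(subset=["cust_id"])                 # eliminate rows with null business keys
       .dropDuplicates(["cust_id", "updated_ts"])  # remove duplicate records
)

# Overwrite the curated Hive table consumed by downstream Spark SQL jobs
cleansed.write.mode("overwrite").saveAsTable("curated.customer")
```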
Confidential
Hadoop Developer
Responsibilities:
- Involved in collecting business requirements, designing multiple data pipelines, and monitoring data flow in the Cloudera Hue UI.
- Imported and exported data between RDBMS systems and HDFS/HBase using Sqoop incremental jobs.
- Stored data in the Parquet file format since it uses less space and supports a high ingestion rate.
- Worked on data cleansing activities such as eliminating null values and duplicates.
- Worked on Spark Streaming and Spark Structured Streaming with Apache Kafka for real-time data processing.
- Used the Gzip and Snappy compression codecs to compress files for efficient storage and processing.
- Created external tables (both transactional and non-transactional) from compressed files in Hive.
- Performed ad-hoc queries on structured data using HiveQL, and used partitioning, bucketing, and join techniques in Hive for faster data access (see the sketch after this section).
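As a rough illustration of the partitioning and bucketing techniques listed above, the sketch below creates a date-partitioned Hive table through PySpark and writes a Spark-bucketed copy keyed for joins. All table and column names are assumptions for illustration.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-layout-sketch")
    .enableHiveSupport()   # needed for Hive metastore DDL
    .getOrCreate()
)

# Date-partitioned Hive table: queries filtering on load_dt prune partitions
spark.sql("""
    CREATE TABLE IF NOT EXISTS curated.txn (
        txn_id BIGINT,
        cust_id BIGINT,
        amount DECIMAL(12,2)
    )
    PARTITIONED BY (load_dt STRING)
    STORED AS PARQUET
""")

# Spark-managed bucketing: cluster rows by the join key to cut shuffle cost
raw = spark.table("staging.txn_raw")   # placeholder source table
(raw.write
    .partitionBy("load_dt")
    .bucketBy(32, "cust_id")
    .sortBy("cust_id")
    .mode("overwrite")
    .saveAsTable("curated.txn_bucketed"))

# Ad-hoc HiveQL that scans only one partition instead of the full table
spark.sql("""
    SELECT cust_id, SUM(amount) AS total_amount
    FROM curated.txn
    WHERE load_dt = '2019-12-31'
    GROUP BY cust_id
""").show()
```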
Confidential
QA Tester
Responsibilities:
- Analyzed functional requirement and business requirement documents to better understand the system from both functional and business perspectives.
- Involved in effective implementation of Test plan and Test Procedures.
- Involved in preparing Test Scenarios and Test Cases based on business requirement documents.
- Studied change requests and prepared test cases.
- Performed extensive manual testing using HP Quality Center to develop and execute test scenarios/test cases and log defects.
- Prepared Requirement Traceability Matrix (RTM) to trace test cases and functional requirements.
- Created SQL queries to retrieve data from database to validate the input data.
- Prepared Test data for the inputs of the test cases.
- Prepared suggestion documents to improve the quality of the application.
- Communicated with the Test Lead/Test Manager.
- Conducted review meetings within the team.
- Involved in Regression Testing after each build of the application.
- Responsible for updating and maintaining Quality Center for all defects found during functional and regression testing and following up on the bug life cycle.
- Analyzed the performance based on the reports generated.
- Attended daily stand-up meetings.
- Re-tested the application for every new build and attended client calls and status meetings.
- Provided inputs in retrospective meetings.
