Hadoop Developer Resume
Redlands, CA
PROFESSIONAL SUMMARY:
- Accomplished IT professional with 6+ years of experience, specializing in Big Data and testing techniques.
- Enthusiastic about exploring how big data analytics benefits different industry verticals: Banking, Insurance, Healthcare, Retail, Manufacturing, Transportation, etc.
- Knowledgeable in automating data flow between components using Apache NiFi.
- Have an in-depth understanding of key big data concepts: distributed file systems, parallel processing, high availability, fault tolerance, and scalability.
- Worked on most of the significant big data ecosystem tools/frameworks: Sqoop, Kafka, Hadoop, Spark, Hive, HBase, ZooKeeper, and Tableau.
- Acquired profound knowledge of Spark architecture and its key components: Spark Core, Spark SQL, DataFrames, and Spark Streaming.
- Considerable experience in testing, integrating, and moving data from several sources/production systems to an Enterprise Data Warehouse by leveraging Informatica PowerCenter.
- Solid understanding of OLAP concepts and challenges, especially with large data sets.
- Well versed in OLTP Data Modeling, Data warehousing concepts.
- Strong expertise in data analysis, data validation, data verification, data cleansing, data completeness, data integrity, and data mismatch identification.
- Well acquainted with various Software Development Life Cycle (SDLC) models and Software Testing Life Cycle (STLC).
- Proficiency in Smoke Testing, Functional Testing, System Integration Testing, User Acceptance Testing and Regression Testing.
- Test Lead with expertise in Test Planning, Test Design, Test Execution and Test Summary Reporting activities.
- Expertise in defect tracking tools like JIRA and HP Quality Center/Application Lifecycle Management (ALM).
- Completed ISTQB Basic level and Banking and Finance L1 Certification Exam.
- Possess excellent people skills and rational thinking in decision making; a quick learner and hard worker committed to delivering quality results.
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, YARN, Sqoop, Kafka, MapReduce, Spark, Hive, HBase, ZooKeeper
Languages: Python, SQL, PL/SQL, XML, Shell Scripting, HTML, CSS
Databases: Oracle, SQL Server, Teradata, HBase
Methodologies: Agile, Waterfall, Incremental Waterfall
Operating Systems: Linux, UNIX, Windows, CentOS
Tools: IntelliJ, Eclipse, PuTTY, HP ALM
PROFESSIONAL EXPERIENCE:
Confidential, Redlands, CA
Hadoop Developer
Responsibilities:
- Involved in scenario conversion from PL/SQL to HQL for EAP (Hive).
- Worked on ETL transformations using Hive Query Language.
- Involved in creating dynamic solutions such as merge views (a view on top of another view), which solve cluster space issues and provide a dynamic way to copy data from the L4 source to the SMV layer (a sketch follows this section).
- Created an automated solution for all the transformations using shell scripting.
- Worked on data cleansing and data quality jobs.
- Involved in creating REST services for the UI application and successfully integrated all the components.
- Worked on code deployment tools like RLM and RTC.
- Coordinated with the offshore team on a regular basis and assigned work accordingly.
Environment: Cloudera Hadoop platform
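Below is a minimal PySpark sketch of the merge-view pattern described above: a Hive view stacked on another view so downstream layers read the L4 source without duplicating data on the cluster. The database, table, and column names (eap, l4_source, customer, etc.) are hypothetical placeholders, not the project's actual schema.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("merge-view-sketch")
    .enableHiveSupport()   # required so CREATE VIEW hits the Hive metastore
    .getOrCreate()
)

# Base view over the L4 source layer (names are placeholders)
spark.sql("""
    CREATE OR REPLACE VIEW eap.customer_l4_vw AS
    SELECT cust_id, cust_name, updated_ts
    FROM l4_source.customer
""")

# Merge view stacked on the base view and exposed to the SMV layer;
# no data is copied, so no extra cluster space is consumed
spark.sql("""
    CREATE OR REPLACE VIEW eap.customer_smv_vw AS
    SELECT cust_id, cust_name, updated_ts
    FROM eap.customer_l4_vw
    WHERE updated_ts >= date_sub(current_date(), 30)
""")
```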
Confidential, Pittsburgh
Hadoop Developer
Responsibilities:
- Involved in collecting business requirements, designing multiple data pipelines, and monitoring data flow in the Cloudera Hue UI.
- Imported and exported data between RDBMS systems and HDFS/HBase using Sqoop incremental jobs.
- Stored data in the Parquet file format since it uses less space and supports a high ingestion rate.
- Worked on data cleansing activities such as eliminating null values and duplicates.
- Worked on Spark Streaming and Spark Structured Streaming with Apache Kafka for real-time data processing (see the sketch after this section).
- Used the Gzip and Snappy compression codecs to compress files for efficient storage and processing.
- Created external tables (both transactional and non-transactional) from compressed files in Hive.
- Performed ad-hoc queries on structured data using HiveQL, and used partitioning, bucketing, and join techniques in Hive for faster data access.
- Used the Spark API over Cloudera Hadoop YARN to process data in Hive.
- Developed PySpark applications utilizing DataFrames and the Spark SQL API for faster data processing.
- Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
- Developed PySpark scripts and UDFs using DataFrames/SQL/Datasets for data aggregation and queries.
- Built a data pipeline consisting of Spark, Hive, Kafka, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Designed and developed jobs to validate data post-migration, such as comparing reporting fields between source and destination systems using Spark SQL, RDDs, and DataFrames/Datasets.
- Worked on query performance and optimized it using aggregation and other optimization techniques.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory settings.
- Coordinated with the TMS team to gather data from the Kafka producers team and wrote Spark Core jobs to meet the business requirements.
- Coordinated with the offshore team on a daily basis through teleconferences to discuss roadblocks, issues, and progress.
Environment: Cloudera Hadoop platform with 65 nodes equaling 866 TB (3-way replication)
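The sketch below illustrates the Kafka-to-Spark Structured Streaming path referenced above, landing the stream as Snappy-compressed Parquet in line with the storage choices on this project. The broker addresses, topic name, and output/checkpoint paths are assumptions for illustration only.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Subscribe to the raw event stream (requires the spark-sql-kafka package)
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholder brokers
    .option("subscribe", "ops_events")                               # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary, so cast the payload before processing
parsed = events.select(col("value").cast("string").alias("payload"))

# Land the stream as Parquet (Snappy-compressed by default in Spark)
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "/data/streams/ops_events")               # placeholder path
    .option("checkpointLocation", "/checkpoints/ops_events")  # placeholder path
    .start()
)
query.awaitTermination()
```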
Confidential
Hadoop Developer
Responsibilities:
- Involved in requirement analysis: understood the source systems and the complete architecture of the Hadoop cluster.
- Performed data profiling to measure the accuracy, validity, and completeness of data.
- Developed Sqoop jobs with the incremental import feature to ingest data into HDFS and Hive.
- Ingested data from multiple Oracle DB servers.
- Used Snappy to compress the ingested data.
- Involved in loading data from UNIX file system and FTP to HDFS.
- Responsible for managing data coming from different sources.
- Developed interactive shell scripts for scheduling various data cleansing and data loading process.
- Built data cleansing and data transformation rules in Python with PySpark DataFrames (see the sketch after this section).
- Installed and configured Hive and wrote Hive UDFs.
- Used Hive for transformations, event joins, and some pre-aggregations before storing the data on HDFS.
- Worked with Apache Spark, which provides a fast, general engine for large-scale data processing, integrated with the functional programming language Scala.
- Imported the data from different source systems into Spark RDDs.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed ETL processes using PySpark, Python, Hive, and HBase.
- Created HBase tables to store variable data formats coming from different legacy systems.
- Developed Oozie coordinators to schedule Hive scripts to create Data pipelines.
- Performed troubleshooting of Spark jobs by analyzing and reviewing log files.
- Worked with Network, database, application and BI teams to ensure data quality and availability.
- Participated in setting up the schema in Hive and setting up the processing framework (Spark jobs).
- Performed history data loads for more than 200 tables (20 TB) from an Oracle DB server to Hive.
- Worked in a team environment supporting data ingestion, processing, and reporting from Hadoop.
- Performed reporting in Tableau with the Impala query engine.
- Worked on NoSQL databases.
- Participated in Agile development during the end-to-end implementation of the project.
Environment: Cloudera Hadoop platform with 24 nodes equaling 254TB (3-way replication)
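A minimal sketch of the kind of cleansing and transformation rules mentioned above, written with PySpark DataFrames. The staging/curated table names and the key columns are hypothetical, standing in for the project's actual schema.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cleansing-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the raw layer landed by the Sqoop incremental jobs (placeholder table)
raw = spark.table("staging.customer_raw")

cleansed = (
    raw.dropna(subset=["cust_id"])                 # eliminate rows with null business keys
       .dropDuplicates(["cust_id", "updated_ts"])  # remove duplicate records
)

# Overwrite the curated Hive table consumed by downstream Spark SQL jobs
cleansed.write.mode("overwrite").saveAsTable("curated.customer")
```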
Confidential
Hadoop Developer
Responsibilities:
- Involved in collecting business requirements, designing multiple data pipelines, and monitoring data flow in the Cloudera Hue UI.
- Imported and exported data between RDBMS systems and HDFS/HBase using Sqoop incremental jobs.
- Stored data in the Parquet file format since it uses less space and supports a high ingestion rate.
- Worked on data cleansing activities such as eliminating null values and duplicates.
- Worked on Spark Streaming and Spark Structured Streaming with Apache Kafka for real-time data processing.
- Used the Gzip and Snappy compression codecs to compress files for efficient storage and processing.
- Created external tables (both transactional and non-transactional) from compressed files in Hive.
- Performed ad-hoc queries on structured data using HiveQL, and used partitioning, bucketing, and join techniques in Hive for faster data access (see the sketch after this section).
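As a rough illustration of the partitioning and bucketing techniques listed above, the sketch below creates a date-partitioned Hive table through PySpark and writes a Spark-bucketed copy keyed for joins. All table and column names are assumptions for illustration.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-layout-sketch")
    .enableHiveSupport()   # needed for Hive metastore DDL
    .getOrCreate()
)

# Date-partitioned Hive table: queries filtering on load_dt prune partitions
spark.sql("""
    CREATE TABLE IF NOT EXISTS curated.txn (
        txn_id BIGINT,
        cust_id BIGINT,
        amount DECIMAL(12,2)
    )
    PARTITIONED BY (load_dt STRING)
    STORED AS PARQUET
""")

# Spark-managed bucketing: cluster rows by the join key to cut shuffle cost
raw = spark.table("staging.txn_raw")   # placeholder source table
(raw.write
    .partitionBy("load_dt")
    .bucketBy(32, "cust_id")
    .sortBy("cust_id")
    .mode("overwrite")
    .saveAsTable("curated.txn_bucketed"))

# Ad-hoc HiveQL that scans only one partition instead of the full table
spark.sql("""
    SELECT cust_id, SUM(amount) AS total_amount
    FROM curated.txn
    WHERE load_dt = '2019-12-31'
    GROUP BY cust_id
""").show()
```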
Confidential
QA Tester
Responsibilities:
- Analyzed functional requirement and business requirement documents to better understand the system from both functional and business perspectives.
- Involved in effective implementation of Test plan and Test Procedures.
- Involved in preparing Test Scenarios and Test Cases based on business requirement documents.
- Studied change requests and prepared test cases.
- Performed extensive manual testing using HP Quality Center to develop and execute test scenarios/test cases and log defects.
- Prepared Requirement Traceability Matrix (RTM) to trace test cases and functional requirements.
- Created SQL queries to retrieve data from database to validate the input data.
- Prepared Test data for the inputs of the test cases.
- Prepared suggestion documents to improve the quality of the application.
- Communicated with the Test Lead/Test Manager.
- Conducted review meetings within the team.
- Involved in Regression Testing after each build of the application.
- Responsible for updating and maintaining Quality Center for all defects found during functional and regression testing and following up on the bug life cycle.
- Analyzed the performance based on the reports generated.
- Attended daily stand-up meetings.
- Re-tested the application for every new build and attended client calls and status meetings.
- Provided inputs in retrospective meetings.
