Bigdata Developer Resume

Dallas, Texas

SUMMARY

  • 12 years of experience with the Hadoop ecosystem, Big Data, PySpark, Spark, Hive, Impala, Sqoop, AWS, GCP, Dataproc, BigQuery, data warehousing, ETL tools, business intelligence, data analytics, data integration, and the implementation and maintenance of business intelligence and related database platforms.
  • Expertise in Hadoop architecture and its components, such as HDFS, YARN, High Availability, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • 5+ years of experience with Apache Hadoop technologies such as the Hadoop Distributed File System (HDFS), the MapReduce framework, Hive, Sqoop, NiFi, Oozie, HBase, Spark, and Python.
  • Hands-on experience with version control using Bitbucket and GitHub.
  • Extensive experience in migrating on-premises Hadoop platforms to cloud solutions on AWS and GCP.
  • Good experience in writing Python ETL frameworks and PySpark jobs to process large volumes of data daily, and in implementing Spark with Python and Spark SQL for faster data processing.
  • Strong experience in extracting and loading data from different sources using complex business logic in Hive, and in building ETL pipelines that process terabytes of data daily.
  • Experienced in Hadoop stack and storage technologies: HDFS, MapReduce, YARN, Hive, Sqoop, Impala, Spark, Flume, Kafka, and Oozie.
  • Hands-on experience in application deployment using CI/CD pipelines.
  • Hands-on experience with Snowflake and with streaming data using Kafka and the Spark Streaming API.
  • Hands-on experience importing and exporting data between relational databases and HDFS or Hive using Spark.
  • Extracted the data from Oracle into HDFS using Sqoop and loaded into Hive.
  • Handled importing of data from various data sources, performed transformations using Spark and loaded data into HDFS and Hive.
  • Experienced in transporting and processing real-time event streams using Kafka and Spark Streaming.
  • Worked with cloud data warehouses (AWS Redshift, SQL Data Warehouse) and cloud data integration solutions to improve performance, productivity, and connectivity to cloud and on-premises sources.
  • AWS Certified Cloud Practitioner. Extensive knowledge of AWS services such as S3, Glue ETL, Redshift, EC2, RDS, and DynamoDB.
  • Used GCP Dataproc with the Apache Spark framework and PySpark; deployed PySpark code on a Dataproc cluster to run a cloud data warehouse migration project.
  • AWS Lambda application development using SNS, SQS, and S3.
  • Extensive knowledge of Spark SQL and Spark Streaming for structured and real-time data processing.
  • Built multiple automations using Unix shell scripting and Python for file validation, such as empty-file checks, count checks, column mismatch checks, and column value checks.
  • Excellent understanding of data warehousing concepts: dimensional modelling, dimension and fact tables, and star schema.
  • Good Experience in support, maintenance and development projects.
  • A quick learner, committed, smart worker, self-motivated and enthusiastic team player, and good communicator with a positive attitude and an open mind towards learning.
  • Worked with testing teams to identify potential problems and appropriate solutions.
  • Excellent understanding of project issues, tracking of issues, solving issues and closing issues.
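
The file validation checks named in the summary (empty-file, count, column mismatch, and column value checks) could look roughly like the sketch below. It is illustrative only, not the scripts actually used: it assumes comma-delimited files with a header row, and all function names are hypothetical.

```python
import csv
import os


def is_empty_file(path):
    """Empty-file check: True when the file has zero bytes."""
    return os.path.getsize(path) == 0


def read_rows(path, delimiter=","):
    """Read a delimited file and return (header, data_rows)."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter=delimiter))
    if not rows:
        return [], []
    return rows[0], rows[1:]


def count_matches(data_rows, expected_count):
    """Count check: the number of data rows equals the expected count."""
    return len(data_rows) == expected_count


def mismatched_columns(header, expected_columns):
    """Column mismatch check: columns missing from or unexpected in the header."""
    missing = [c for c in expected_columns if c not in header]
    unexpected = [c for c in header if c not in expected_columns]
    return missing, unexpected


def invalid_values(header, data_rows, column, allowed_values):
    """Column value check: 1-based data-row numbers whose value is not allowed."""
    index = header.index(column)
    return [
        line_no
        for line_no, row in enumerate(data_rows, start=1)
        if row[index] not in allowed_values
    ]
```

In practice the expected count would usually come from a control or trailer file delivered with the feed; here it is simply passed in as an argument.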

TECHNICAL SKILLS

  • PySpark
  • AWS
  • AWS EMR
  • Hadoop
  • Big Data
  • Spark
  • Cloudera
  • MapR
  • HDFS
  • MapReduce
  • YARN
  • Hive
  • Sqoop
  • Impala
  • GCP
  • BigQuery
  • Dataproc
  • Control-M
  • AutoSys
  • Netezza
  • Teradata
  • Oracle
  • DB2
  • Flume
  • Kafka
  • Oozie
  • Python
  • ETL DataStage
  • Snowflake
  • ETL E3
  • Hera
  • Linux
  • Windows Server
  • VMware
  • Scala
  • C
  • Java
  • JavaScript
  • Shell Script
  • AWS Redshift
  • SQL Server
  • PL/SQL
  • SQL Developer
  • DynamoDB
  • AWS RDS

PROFESSIONAL EXPERIENCE

Confidential, Dallas, Texas

Bigdata Developer

Responsibilities:

  • Cloud Data Warehouse (CDW): Worked on a Cloud Data Warehouse migration project to move a large volume of data from on-premises Hadoop Hive to GCP BigQuery. In this project, data is fetched from Hive tables, copied to a GCS bucket using gsutil cp, and loaded into a stage table. Transformations are applied and a few business columns are added to create the BigQuery target table. A validation script compares the Hive and BigQuery tables using record-count, row-level, and column-level checks.
  • Carried out the following activities:
  • Understood business requirements and created source-to-target mappings to design data pipelines in PySpark and Python that fetch data from Hive and HDFS.
  • Created a data lake for the CDW application that pulls data from the Teradata database using Sqoop import and loads it into Hive tables using PySpark, Hive, and the Hadoop ecosystem.
  • Created internal and external Hive tables for the historical data lake.
  • Worked with Hadoop architecture and its components, such as HDFS, YARN, High Availability, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Worked with Big Data cloud platforms, including Google Cloud (GCP), BigQuery, Bigtable, HBase, Azure, Databricks, HDInsight (Hortonworks), and Kubernetes, on the administration, configuration management, monitoring, debugging, and performance tuning of Hadoop applications.
  • A Control-M job invokes a shell script that runs Hadoop DistCp to copy data from MapRFS to a GCS bucket.
  • Used GCP Interconnect to facilitate data movement from on-premises systems to Google Cloud.
  • MapRFS data files are staged in GCS object storage, which is versionable and encrypted. Table objects are pushed into GCS with the same partition structure as in Hive.
  • A Spark job picks up the GCS files, performs transformations, and pushes the results to a BigQuery stage table; from there the data is type-cast and loaded into the BigQuery target tables.
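
The Hive-to-BigQuery validation described above (record-count, row-level, and column-level checks) can be illustrated with the sketch below. It is not the project's validation script: it assumes both tables have already been read into memory as lists of dicts sharing a unique key column, and the function names are hypothetical. For tables of the size described, the same comparisons would more likely be run as aggregate queries on each side.

```python
import hashlib


def row_fingerprint(row, columns):
    """Hash one row's values in a fixed column order (None is treated as empty)."""
    joined = "\x1f".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def validate_tables(source_rows, target_rows, key, columns):
    """Compare two row sets (lists of dicts) and return a report dict."""
    source_by_key = {r[key]: r for r in source_rows}
    target_by_key = {r[key]: r for r in target_rows}

    report = {
        # Record-count check
        "count_match": len(source_rows) == len(target_rows),
        # Keys present on only one side
        "missing_in_target": sorted(set(source_by_key) - set(target_by_key)),
        "extra_in_target": sorted(set(target_by_key) - set(source_by_key)),
        # Row-level check: keys whose row fingerprints differ
        "row_mismatches": [],
        # Column-level check: column name -> keys where that column differs
        "column_mismatches": {},
    }

    for k in sorted(set(source_by_key) & set(target_by_key)):
        src, tgt = source_by_key[k], target_by_key[k]
        if row_fingerprint(src, columns) != row_fingerprint(tgt, columns):
            report["row_mismatches"].append(k)
            for c in columns:
                if src[c] != tgt[c]:
                    report["column_mismatches"].setdefault(c, []).append(k)
    return report
```

Fingerprinting each row first keeps the row-level pass cheap; the per-column comparison runs only for rows already known to differ.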

Environment: PySpark, Hadoop, Hive, Big Data, GCP, BigQuery, Sqoop, Control-M, Python, Teradata, Unix.

Confidential, Pennington, New Jersey

Bigdata Developer

Responsibilities:

  • Securities Pricing Platform (SPP): The application is designed to obtain pricing details for the majority of the firm’s brokerage systems (GWIM & GBAM) at every end of day by running the Merrill Lynch nightly batch jobs. AutoSys and ETL DataStage are used to execute the batch jobs. SPP receives, validates, and consolidates pricing from over 150 domestic and international sources, both external and internal. The purpose of this project is to decommission the current Securities Pricing System (SPS) application, which runs on a legacy (“end of life”) architecture with a high risk profile (VSE/VSAM for z/VM, PL/1 NDM). The project re-engineers the existing mainframe application using a distributed architecture and components (PySpark, Hadoop, Hive, Data Lake, Unix, AutoSys, Spark); the architecture is approved by EARC for scalability.
  • Carried out the following activities:
  • Served as ETL and Hadoop lead in gathering requirements, analyzing, and developing end-to-end SPP applications.
  • Understood business requirements and created source-to-target mappings to design data pipelines in ETL DataStage and PySpark.
  • Developed batch jobs in ETL DataStage to load data into the Oracle database and automated the process using AutoSys jobs.
  • Created a data lake for the SPP application that pulls data from the Oracle database using Sqoop import and loads it into Hive tables using PySpark, Hive, and the Hadoop ecosystem.
  • Deployed PySpark application code on an EMR cluster to run the SPP application in UAT and production.
  • Unloaded data from the Oracle SPP database into an S3 bucket and processed those files, along with supporting files, on an AWS EMR cluster using the PySpark code in the SPP application.
  • Attended stand-up meetings to interact with the business, the offshore team, and the Scrum master and to understand daily work.
  • Good knowledge of PySpark, AWS, Snowflake, Hive, Glue ETL, data warehousing concepts, and Sqoop, as well as Python for advanced backend integrations.
  • Automated reports and alerts to monitor the applications, tools and services proactively.
  • Worked on production issues, fixed bugs, and provided permanent resolutions.

Environment: PySpark, AWS, AWS EMR, Hadoop, Big Data, Hive, Kafka, Sqoop, DataStage, AutoSys, Python, Oracle, DB2, Unix.

Confidential, Plano, Texas

Bigdata Developer

Responsibilities:

  • Analyzed requirements and business rules based on the given documentation and worked closely with tech leads and business analysts to understand the current system.
  • Analyzed data coming from different sources to understand its schema and functionality.
  • Used Sqoop scripts to ingest data from different RDBMS sources into the Hadoop cluster (HDFS), created Hive tables and partitions, and loaded data into the Hive tables.
  • Worked with several Python library functions to build Spark applications using Spark SQL, RDD transformations and actions, and DataFrames, and pushed the results to HDFS and Hive tables.
  • Created Hive queries that helped marketing analysts spot emerging trends by comparing fresh data with GPM reference tables and historic metrics.
  • Provided regular support and guidance to ETL DataStage project teams on complex solutions and issue resolution.
  • Understood business requirements, performed data profiling, and prepared source-to-target mappings for data modelling.
  • Skills: Hadoop ecosystem, Hive, Spark, Sqoop, Python, Unix shell scripting, ETL E3 framework, data modelling, Control-M, Netezza, DataStage 11.5.
  • LTB Real-Estate Mortgage Project: This project extracts data for all loan borrowers from SQL Server as the source, loads it into the data warehouse, and then loads it from the data warehouse into data marts based on business requirements and subject area. It also loads the data into Hadoop (Hive DB) through the ETL E3 framework. Technologies used: DataStage 9.1, Unix shell scripting, Python, Oracle, Netezza, DB2, ETL E3 framework, Hera framework, Hadoop (Hive DB, Sqoop), Spark, Hive, GitLab, CD.
  • Carried out the following activities:
  • Provided regular support and guidance to ETL DataStage project teams on complex solutions and issue resolution.
  • Served as ETL lead in gathering requirements, analyzing, and developing end-to-end applications.
  • Understood business requirements, performed data profiling, and prepared source-to-target mappings for data modelling.
  • Developed DataStage jobs through the E3 framework and automated the process using Control-M jobs.
  • Loaded data into Hive tables to maintain the warehouse.
  • Created DataStage jobs to load data into dimension and fact tables in the warehouse.
  • Helped team members with technical solutions and business understanding.
  • Interacted with the offshore team and defined processes to handle the various problems reported by users in DW applications and to get the project work done.
  • Created a data lake for the LTB application that pulls data from the Oracle database using Sqoop import and loads it into Hive tables using PySpark, Hive, and the Hadoop ecosystem.
  • Good knowledge of PySpark, Spark, AWS, Snowflake, Hive, Glue ETL, data warehousing concepts, and Sqoop, as well as Python for advanced backend integrations.
  • Attended stand-up meetings to interact with the business and the Scrum master and to understand daily work.
  • Helped the team on-board data and create various knowledge objects; good knowledge of ETL, data warehousing concepts, Hive, and Sqoop, as well as Python for advanced backend integrations.
  • Worked on production issues, fixed bugs, and provided permanent resolutions.

Environment: Hadoop, Big Data, Hive, Kafka, Sqoop, DataStage 11.5, Python, Oracle, DB2, Netezza, Unix shell scripting, Hera framework, ETL E3 framework, Control-M.

Confidential

Senior DataStage Developer

Responsibilities:

  • FDW (Financial Data Warehouse) is the single source of data for SAP. The data in the FDW is used to calculate revenue and remittance amounts in SAP. The FDW integrates data from multiple sources and stores it in the form of a star schema. FDW is a project for the eBay Enterprise finance team, which uses SAP billing to invoice and charge clients for all services rendered to them.
  • Carried out the following activities:
  • Provided regular support and guidance to ETL DataStage project teams on complex solutions and issue resolution.
  • Served as ETL lead in gathering requirements, analyzing, and developing end-to-end applications.
  • Understood business requirements, performed data profiling, and prepared source-to-target mappings for data modelling.
  • Used most DataStage stages, such as transformer, join, merge, lookup, unstructured, XML, and DB connector stages.
  • Created DataStage jobs to load data into dimension and fact tables in the warehouse.
  • Helped team members with technical solutions and business understanding.
  • Interacted with the onshore team and defined processes to handle the various problems reported by users in DW applications.

Environment: DataStage 9.1, AutoSys, Netezza, Oracle 11g, Unix shell scripting, SQL Developer

Confidential

Senior DataStage Developer

Responsibilities:

  • AIG Insurance Data Warehouse (IDW) is a consolidation point for regions, parties, policies, and objects (vehicles, vessels) that holds the atomic-level transactions relating to the party, policy, and claim domains. The IDW is an enhanced and extended version of AIG-ODS; it also holds the claim, policy, object (e.g., vehicle, vessel), and other issues of existing parties, as well as a common set of reference data loaded through DataStage ETL.
  • Carried out the following activities:
  • Served as ETL lead in gathering requirements, analyzing, and developing end-to-end applications.
  • Understood business requirements, performed data profiling, and prepared source-to-target mappings for data modelling.
  • Created DataStage jobs to load data into dimension and fact tables in the warehouse.
  • Helped team members with technical solutions and business understanding.
  • Helped the team on-board data and create various knowledge objects; good knowledge of ETL, data warehousing concepts, and Unix shell scripting for advanced backend integrations.
  • Used most DataStage stages, such as transformer, join, merge, lookup, unstructured, XML, and DB connector stages.
  • Very good understanding of the software development life cycle (SDLC) process; followed Agile Scrum with feature, story, and task maps for development tracking.
  • Worked on production issues, fixed bugs, and provided permanent resolutions.

Environment: DataStage 8.7, Unix shell scripting, DB2, SQL Developer
