
Sr. Big Data Developer Resume

SUMMARY

  • Big Data Developer with over 8 years of professional IT industry experience comprising build/release management, software configuration, design, development, and cloud implementation.
  • Excellent knowledge of Hadoop architecture, HDFS, YARN, and MapReduce.
  • Hands-on experience writing MapReduce jobs in the Hadoop ecosystem, including Hive and Pig.
  • Over 3 years of experience with AWS services, along with a broad and in-depth understanding of each of them.
  • Good knowledge of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Knowledgeable about designing, deploying, and operating highly available, scalable, and fault-tolerant systems using Amazon Web Services (AWS).
  • Experienced with event-driven and scheduled AWS Lambda functions that trigger various AWS resources.
  • Experienced with installing the AWS CLI to control various AWS services through shell/Bash scripting.
  • Good knowledge of creating and launching EC2 instances using Linux and Ubuntu AMIs; wrote shell scripts to bootstrap instances.
  • Configured S3 buckets with lifecycle policies to archive infrequently accessed data to appropriate storage classes based on requirements (see the first sketch after this list).
  • Experience implementing Spark with Scala and Python, using Spark SQL for faster analysis and processing of data.
  • Developed a Spark application to convert complex nested JSON to Avro format and store it in HDFS and S3 (see the second sketch after this list).
  • Experienced in writing complex SQL queries and scheduling tasks using the Maestro scheduler, Autosys, and cron jobs.
  • Worked extensively on Apache NiFi to design and develop data pipelines that process large data sets; configured lookups for data validation and integrity in HDFS and the AWS cloud.
  • Worked with file formats such as JSON, Avro, and Parquet and compression techniques such as Snappy within the NiFi ecosystem.
  • Experience pushing data from S3 to Snowflake and Redshift databases.
  • Developed a Spark Streaming application to consume real-time messages from a Kafka topic, convert the input JSON messages to Avro format, and push them to S3 and then to Snowflake and Redshift.
  • Experienced in backend database programming using SQL, PL/SQL, stored procedures, functions, macros, indexes, joins, views, packages, and database triggers.
  • Hands-on experience in application development using Java, Python, RDBMS, Linux shell scripting, and Python scripting.
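
The S3 lifecycle archiving mentioned above can be expressed as a short boto3 call. This is a minimal sketch only; the bucket name, prefix, and transition thresholds are hypothetical placeholders, not values from any actual project.

    # Minimal boto3 sketch: transition infrequently accessed objects to cheaper
    # storage classes. Bucket, prefix, and day thresholds are hypothetical.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-archive-bucket",           # hypothetical bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-cold-data",
                    "Filter": {"Prefix": "raw/"},  # hypothetical prefix
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"},
                    ],
                }
            ]
        },
    )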
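
The nested-JSON-to-Avro conversion could look roughly like the following PySpark sketch, assuming the spark-avro package is on the classpath and the s3a connector is configured; the paths, column names, and flattening logic are illustrative assumptions.

    # PySpark sketch: read nested JSON, flatten a few fields, write Avro to HDFS and S3.
    # Requires spark-avro (e.g. --packages org.apache.spark:spark-avro_2.12:<version>).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("json-to-avro").getOrCreate()

    # multiLine handles JSON records that span multiple lines.
    raw = spark.read.option("multiLine", "true").json("hdfs:///data/raw/events/")

    # Flatten nested fields into top-level columns (illustrative schema).
    flat = raw.select(
        col("id"),
        col("payload.customer.name").alias("customer_name"),
        col("payload.amount").alias("amount"),
    )

    flat.write.format("avro").mode("overwrite").save("hdfs:///data/curated/events_avro/")
    flat.write.format("avro").mode("overwrite").save("s3a://example-bucket/curated/events_avro/")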

TECHNICAL SKILLS

Programming Languages: Java, Python, Dart, R, SQL, and PL/SQL.

Hadoop Ecosystem: NiFi, HDFS, MapReduce, Hive, AWS, Terraform, Kibana, Spark, PySpark, Kafka, Storm, Scala, Zookeeper, Oozie, YARN, Flume.

Frameworks: Hadoop, Spring, TestNG, and JUnit.

Web Development: Flutter, AngularJS, HTML5, CSS, JavaScript, jQuery, JSP, Groovy, and Grails

ETL: Ab Initio, Talend, NiFi, etc.

IDEs: IntelliJ IDEA, Visual Studio, PyCharm, Spring Tool Suite (STS), Android Studio

Build Tools: Jenkins, Maven, Ant, Git.

Web/Application Servers: IIS, Tomcat, Apache, JBoss.

DBMS / NoSQL: Snowflake, MariaDB, Teradata, Redshift, Hive, Oracle, MySQL, HBase, Cassandra, MongoDB

PROFESSIONAL EXPERIENCE

Confidential

Sr. Big data Developer

Responsibilities:

  • Develop and analyze large-scale, high-speed, low-latency data solutions using big data technologies including Apache Hive, Apache Spark, and Scala.
  • Implement a Java Spring Boot application to consume real-time messages and write them to Postgres.
  • Created a latency tracker for Kafka topics to check consumer lag and raise alerts (see the first sketch after this list).
  • Migrate data from RDBMS sources into Apache Hive and HBase using Apache Sqoop and PL/SQL.
  • Design and develop ETL data flows on local machines using Hive, Pig, Spark, and Scala.
  • Developing a generic framework to ingest data into Hadoop and AWS from different servers.
  • Developed a distributed, data-intensive ETL solution that performs data ingestion, enrichment, and reporting within a critical business-level SLA, resulting in quicker yet accurate matching of claims and rebates.
  • Developed a daily serverless analytics workflow in the AWS cloud that processes data at scale using AWS Lambda, the Glue Data Catalog, Cloudera Altus, and AWS Step Functions.
  • Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for massive data volumes.
  • Implemented complex business logic in Spark with Scala and created test cases for the scenarios developed.
  • Connected Spark to various databases such as Postgres and DB2 to read and write data.
  • Used Spark to read data from AWS S3, with the boto3 library for object listing (see the second sketch after this list).
  • Worked with various formats including Avro, JSON, Parquet, and ORC.
  • Worked with the QA team and helped automate the testing process using Spark and Scala.
  • Implement a generic framework to ingest data into Hadoop and AWS from on-prem servers and databases.
  • Implement a Kafka producer to stream data from downstream systems.
  • Implement a utility to track latency and job instrumentation.
  • Implement Jenkins scripts and jobs to push code from Git to Kubernetes.
  • Automate job deployments by building Docker images and running them in the Kubernetes cloud environment.
  • Analyze business requirements by understanding existing processes and prepare design documents for applications.
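
A minimal sketch of such a lag checker using the kafka-python client is shown below; the broker address, topic, group id, and alert threshold are placeholder assumptions, and the real tracker may well have used a different client or monitoring stack.

    # Hedged sketch: compute total consumer lag for a topic and raise an alert.
    from kafka import KafkaConsumer, TopicPartition

    BOOTSTRAP = "broker:9092"      # hypothetical broker
    TOPIC = "claims-events"        # hypothetical topic
    GROUP_ID = "claims-consumer"   # hypothetical consumer group
    LAG_THRESHOLD = 10000          # hypothetical alert threshold

    consumer = KafkaConsumer(bootstrap_servers=BOOTSTRAP, group_id=GROUP_ID,
                             enable_auto_commit=False)

    partitions = [TopicPartition(TOPIC, p) for p in consumer.partitions_for_topic(TOPIC)]
    end_offsets = consumer.end_offsets(partitions)

    total_lag = 0
    for tp in partitions:
        committed = consumer.committed(tp) or 0   # None if nothing committed yet
        total_lag += end_offsets[tp] - committed

    if total_lag > LAG_THRESHOLD:
        # The real job would page or post to an alerting channel instead.
        print(f"ALERT: consumer lag {total_lag} exceeds threshold {LAG_THRESHOLD}")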
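
One plausible shape for the boto3-assisted S3 read is sketched below: boto3 lists the objects under a prefix and Spark reads the resulting s3a paths. The bucket, prefix, and file format are assumptions for illustration.

    # Sketch: list objects with boto3, read them with Spark over s3a.
    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-read").getOrCreate()
    s3 = boto3.client("s3")

    bucket, prefix = "example-bucket", "landing/claims/"   # hypothetical location
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    keys = [obj["Key"] for obj in resp.get("Contents", []) if obj["Key"].endswith(".parquet")]

    paths = [f"s3a://{bucket}/{key}" for key in keys]
    df = spark.read.parquet(*paths)
    df.show(5)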

Confidential, Riverwoods, IL

Big data Developer

Responsibilities:

  • Worked on creating NiFi templates using various NiFi processors.
  • Created a generic Spark converter able to transform complex nested data into structured JSON and Avro.
  • Created a Pig script capable of handling fixed-width/delimited data.
  • Designed and developed a generic real-time NiFi pattern that captures messages from MQ (event bus) and pushes the data to Kafka, the HDFS raw layer, and HBase.
  • Used Scala and Spark extensively for the Spark converter.
  • Pushed near-real-time data to Hive tables using mini-batching.
  • Built the balance transfer data mart table used as part of the offline scanner project.
  • Converted complex SQL queries into Spark transformations while building the data mart table (see the second sketch after this list).
  • Optimized the data mart queries and ran them in Spark, reducing the data mart load time from 12 hours to 30 minutes.
  • Designed and developed a real-time NiFi process that subscribes to a Kafka topic, splits the incoming JSON into multiple messages, converts them to Avro, and pushes them to Hive tables in real time.
  • Used a Python script, invoked by NiFi, to add new attributes to the incoming real-time JSON messages from the Kafka topic as required.
  • Pushed data to AWS MariaDB by pulling it from on-prem Hive tables.
  • Set up Vault connectivity from on-prem to AWS MariaDB.
  • Pushed data to AWS S3 and Snowflake.
  • Worked with the Ab Initio ETL tool to catalogue files and load Teradata tables.
  • Developed a Python API that subscribes to Kafka and converts the incoming Avro message schema to a static schema, pushing the data to S3 and then to a Snowflake table with VARIANT data types (see the first sketch after this list).
  • Involved in designing and deploying multiple applications utilizing much of the AWS stack (including EC2, S3, RDS, SNS, SQS, and IAM), focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation.
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
  • Implemented Hive SerDes such as RegEx, JSON, Avro, and Parquet.
  • Experience writing Spark applications using spark-shell, PySpark, spark-submit, etc.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Set up and configured AWS EMR clusters and used Amazon IAM to grant users fine-grained access to AWS resources.
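
A minimal Structured Streaming sketch of the Kafka-to-Avro-to-S3 leg of that flow is shown below, assuming the spark-sql-kafka and spark-avro packages are available; the broker, topic, message schema, and paths are illustrative placeholders, and the final S3-to-Snowflake load would typically happen as a separate step.

    # Sketch: consume JSON from Kafka, apply a static schema, write Avro to S3.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("kafka-json-to-avro").getOrCreate()

    schema = StructType([                       # hypothetical message schema
        StructField("account_id", StringType()),
        StructField("event_type", StringType()),
        StructField("amount", DoubleType()),
    ])

    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical
              .option("subscribe", "events-topic")                # hypothetical
              .load())

    parsed = (stream.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("msg"))
              .select("msg.*"))

    query = (parsed.writeStream.format("avro")
             .option("path", "s3a://example-bucket/stream/events_avro/")
             .option("checkpointLocation", "s3a://example-bucket/stream/_checkpoints/")
             .start())
    query.awaitTermination()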
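
The SQL-to-Spark conversion work has the general shape below; the tables, columns, and query are invented for illustration and are not the actual data mart logic.

    # Sketch: a SQL join/aggregation rewritten as DataFrame transformations.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.enableHiveSupport().appName("sql-to-spark").getOrCreate()

    accounts = spark.table("warehouse.accounts")            # hypothetical tables
    transfers = spark.table("warehouse.balance_transfers")

    # Equivalent of:
    #   SELECT a.account_id, SUM(t.amount) AS total_transferred
    #   FROM accounts a JOIN balance_transfers t ON a.account_id = t.account_id
    #   WHERE t.status = 'POSTED'
    #   GROUP BY a.account_id
    result = (transfers.filter(F.col("status") == "POSTED")
              .join(accounts, "account_id")
              .groupBy("account_id")
              .agg(F.sum("amount").alias("total_transferred")))

    result.write.mode("overwrite").saveAsTable("datamart.balance_transfer_summary")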

Confidential

Hadoop Developer

Responsibilities:

  • Experienced in decommissioning legacy systems.
  • Worked extensively with Hive, Sqoop, shell scripting, Pig, and Python.
  • Responsible for adding new ecosystem components, such as Spark, Storm, Flume, and Knox, with the required custom configurations.
  • Installed and configured a Hadoop cluster using YARN; good experience with Hadoop components including MapReduce, YARN, HDFS, HBase, Hive, Pig, Flume, Sqoop, and Cassandra.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Set up and configured AWS EMR clusters and used Amazon IAM to grant users fine-grained access to AWS resources.
  • Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
  • Involved in designing and deploying multiple applications utilizing much of the AWS stack (including EC2, S3, RDS, SNS, SQS, and IAM), focusing on high availability, fault tolerance, and auto-scaling via AWS CloudFormation.
  • Installed Spark and Kafka and configured them per requirements.
  • Experience writing Spark applications using spark-shell, PySpark, spark-submit, etc.
  • Used Pig built-in functions to convert fixed-width files to delimited files (see the sketch after this list).
  • Used Hive join queries to combine multiple source-system tables and load them into Elasticsearch tables.
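
The original conversion used Pig built-ins; purely as an illustration of the same fixed-width-to-delimited idea in this document's other language, here is a PySpark sketch with a made-up field layout.

    # Sketch: slice fixed-width lines into fields, write a delimited file.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, substring, trim

    spark = SparkSession.builder.appName("fixed-width-to-delimited").getOrCreate()

    lines = spark.read.text("hdfs:///data/fixed_width/input/")

    # (start, length) for each field -- hypothetical layout.
    layout = {"cust_id": (1, 10), "name": (11, 30), "balance": (41, 12)}

    parsed = lines.select(
        *[trim(substring(col("value"), start, length)).alias(name)
          for name, (start, length) in layout.items()]
    )

    parsed.write.option("sep", "|").mode("overwrite").csv("hdfs:///data/delimited/output/")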

Environment: Hive, Pig, Apache Hadoop, Cassandra, Sqoop, Big Data, HBase, Zookeeper, YARN, Cloudera, CentOS, NoSQL, AWS, AWS S3, Unix, Windows.

Confidential

Hadoop Developer

Responsibilities:

  • Worked with application teams to install Hadoop updates, patches, and version upgrades as required.
  • Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive (see the sketch after this list).
  • Created HBase tables to store data in variable formats coming from different portfolios.
  • Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment.
  • Experience with Hadoop ecosystem components: HDFS, MapReduce, Hive, Pig, Sqoop, and HBase.
  • Experience installing and configuring Windows Active Directory.
  • Strong knowledge of HDFS, MapReduce, and NoSQL databases such as HBase.
  • Experience with client-side technologies such as HTML, CSS, JavaScript, and jQuery.
  • Responsible for writing Hive queries to analyze terabytes of customer data from HBase and write the results to output files.
  • Created the server side of a project management application using Node.js and MongoDB.
  • Hands-on expertise in SQL reporting, development, analytics, and BI.
  • Relevant experience with process flowcharts and data flow diagrams.
  • Good grasp of database concepts, structures, storage theories, practices, and principles.
  • Excellent problem-solving approach, analytical skills, and T-SQL coding.
  • Ability to develop database models and systems in a systematic manner.
  • Exposure to Oracle SQL standards and the Java programming language.
  • Good knowledge of data warehouse and ETL (Extraction, Transformation, and Loading) concepts.
  • Strong communication and problem identification skills.
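
The MapReduce work above was done with the Java API, Pig, and Hive; to keep the code samples in this document in one language, here is a Hadoop Streaming sketch in Python of the same map/reduce pattern. The input layout (tab-delimited records with the portfolio in the first column) is a hypothetical assumption.

    # mapper.py -- emit (portfolio, 1) for each input record.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            print(f"{fields[0]}\t1")

And the matching reducer, which relies on Hadoop Streaming delivering its input sorted by key:

    # reducer.py -- sum counts per portfolio key.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")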
