Bigdata Developer Resume

Charlotte, NC


  • Over 9 years of professional IT experience, including 5+ years as a Big Data Engineer in industries such as banking and health.
  • Experience building data pipelines for data collection, storage and processing.
  • Expert skills in HDFS, Kafka, Spark, Hive, Sqoop, MapReduce, YARN, HBase, Oozie and Zookeeper.
  • Experience in real-time data streaming using NiFi and Kafka.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS and MapReduce.
  • Strong knowledge of PySpark and Spark SQL analytical functions, and of extending functionality by writing custom UDFs.
  • Experience designing and implementing fast and efficient data acquisition using Big Data processing techniques and tools.
  • Good experience with data visualization tools such as Kibana and Tableau.
  • Experience using Amazon Web Services (AWS), including creating EC2 instances and S3 storage.
  • Real-time data streaming using Kinesis and Kinesis Data Firehose.
  • ETL transformations using AWS Glue and AWS Lambda to trigger and process events.
  • Working knowledge of MLlib in Spark, using linear regression, naive Bayes and other machine learning algorithms.
  • Experience creating REST APIs and CRUD operations, such as POST, PUT and GET requests, using curl.
  • Knowledge of both relational databases (RDBMS) such as MySQL and PostgreSQL, and NoSQL databases such as MongoDB and Cassandra.
  • Good knowledge of SQL and experience building queries.
  • Knowledge of SQL database design and development, including writing constraints, indexes, views, stored procedures and triggers using MySQL.
  • Experience with project management and bug tracking tools such as JIRA and Bugzilla.
  • Experience with version control tools such as Git, GitHub and SVN.
  • Hands-on experience in Continuous Integration (CI) and Continuous Deployment (CD) using Jenkins, and in creating and scheduling jobs using Autosys and Airflow DAGs.
  • Good experience in Agile development environments and Agile frameworks such as Scrum.
  • Able to handle multiple tasks and to work both in a team and independently; experienced in interacting with Business/Operations/Technology groups.


Programming & Scripting Language: Python, Scala, PHP 7/5.6, JavaScript, C/C++, Java, Swift

Web Technology: HTML 5, CSS 3, JavaScript, jQuery, Ajax

Web Server: LAMP Server, WAMP Server, XAMPP Server, NGINX

Database: MySQL, SQL Server, MongoDB, PostgreSQL, Cassandra

Hadoop Ecosystem: Spark, Hive, Sqoop, Oozie, MapReduce, Flume, HBase

Cluster Mgmt. & Monitoring/Cloud Platforms: EMR, Cloudera Manager, Hortonworks Ambari, Microsoft Azure, AWS

Visualization Tools: Tableau, Power BI, Kyvos workbook, D3.js and Chart.js

Build Management Tools: Gradle, Maven, Apache Ant

Scheduling Tools: Oozie, Autosys, Airflow, Jenkins

IDE: Eclipse, PyCharm, Atom, IntelliJ, PhpStorm

Web Service: REST, SOAP

Version Control: Git, GitHub, SVN


Confidential, Charlotte, NC

BigData Developer


  • Processed customer data on a high-availability Hadoop cluster to produce risk models using Kafka and Spark.
  • Developed credit risk models for IFRS 9 regulatory requirements using PySpark.
  • Implemented IFRS 9 model execution in PySpark, integrating with Python and proprietary C++ libraries.
  • Tuned the performance of Spark applications for ad hoc runs, attribution and sensitivity analysis.
  • Stored customer details and transaction information in Hive to support business analysis and marketing.
  • Maintained log data with Kafka, consumed and processed it using PySpark, and stored the historical data in the Hive data warehouse.
  • Transformed data according to business requirements using PySpark and Spark SQL and loaded it into Hive.
  • Designed and developed a file sourcing process that greatly reduced the processing time for consuming data from third-party vendors.
  • Split, filtered, mapped, sorted and aggregated data using Python, Spark and SQL, distributed and in parallel across the data nodes.
  • Defined the SDLC and end-to-end CI with a branching strategy and multi-lane deployments.
  • Simplified the existing code base and wrote utilities to help teammates develop in different lanes and environments.
  • Designed data ingestion patterns and planned dataflows for other environments using NiFi and Kafka.
  • Stored data in S3 for long-term storage and analytics, and used Amazon Athena for ad hoc data queries.
  • Orchestrated, managed and scheduled data workflows with Airflow by creating DAGs in Python.

Environment: Kafka, PySpark, ZooKeeper, Cloudera, EMR, Hive, Spark SQL, YARN, Ansible, Redshift, Linux, Git, Windows 10, JIRA


Hadoop Developer


  • Implemented UDFs, UDAFs and UDTFs in Java for Hive to handle processing that cannot be done with Hive built-in functions.
  • Used Oozie to develop automated workflows of Sqoop, MapReduce and Hive jobs.
  • Performed ETL transformations using PySpark and Spark SQL and stored the data in Hive.
  • Wrote shell scripts with 2 logging features to automate jobs and scheduled them with Autosys.
  • Deployed and extracted data into Netezza using Microsoft Azure.
  • Performed regression testing for integral code releases.
  • Worked with Kyvos to build the cube required for Tableau and Power BI dashboard views.

Confidential, Piscataway, NJ

Big data/Data Engineer


  • Scheduled all jobs in Oozie, calling shell actions.
  • Integrated Kafka with Spark Streaming for data extraction and transformation, and processed the data with DStreams.
  • Loaded all transformed data into HBase for analytics and reporting.
  • Built data visualizations using D3.js to display graphs.
  • Built CRUD operations and REST APIs using web technologies.


Data Analyst


  • Prepared Hive SQL scripts, procedures and views to implement the business logic.
  • Wrote a custom calculation UDF in Hive.
  • Gave demos to business users and incorporated their feedback.
  • Discussed new and enhancement requirements with the Business Analysts.
  • Integrated with PHP to generate web-based reports and dashboards.
