- Big Data Developer and Data Engineer with 4 years of experience across all phases of the software development life cycle (SDLC) in the sales, marketing, and enterprise business domains, specializing in Big Data and cloud computing.
- Experience with cloud platforms (Azure Cloud and AWS, including Amazon EC2, DynamoDB, and Amazon S3) and Big Data analytics tools such as Hadoop, HDFS, MapReduce, Hive, HBase, Spark, Spark Streaming, Kafka, Flume, Avro, Sqoop, and PySpark.
- Experience building data pipelines for real-time streaming data and data analytics using Azure cloud components such as Azure Data Factory, HDInsight (Spark cluster), Azure ML Studio, Azure Stream Analytics, Azure Blob Storage, and Microsoft SQL DB, along with Neo4j (graph DB).
- Hands-on experience with Spark using Scala and PySpark.
- Experience working with SQL Server and MySQL databases, with Parquet files, and with parsing and validating JSON files.
- Worked with NoSQL databases such as MongoDB and Document DB, and graph databases such as Neo4j.
- Designed REST APIs in Python using the Flask framework.
- Experience building pipelines for machine learning models using Azure ML Studio.
- Consumed RESTful web services and invoked them using Postman.
- Experience building microservices using AWS Lambda.
- Proficient with software development methodologies such as Agile.
- Experience developing ETL jobs using Spark to load data into SQL and NoSQL database systems.
- Hands-on experience with version control tools such as GitHub.
- Experience with continuous integration and continuous deployment (CI/CD) using Jenkins.
Big Data Technologies: HDFS, Hadoop MapReduce, Zookeeper, Hive, Pig, Sqoop, Flume, Oozie, Storm, Spark, Kafka, HBase, Spark Streaming, Machine Learning.
Cloud Services: Azure Blob Storage, Azure Data Factory (versions 1 and 2), Azure ML Studio, Azure HDInsight, Azure Functions, AWS Lambda, AWS EC2, AWS API Gateway.
Languages: C, SQL, Python, Shell Scripting, Scala, R, Core Java.
Databases: SQL Server 2008, Document DB, MySQL, Neo4j, Teradata.
Methodologies: Agile, Waterfall.
API Frameworks: Flask (Python).
Operating Systems: Windows 10/8/7/XP, Linux (Ubuntu 18.0), Unix.
- Participated in requirements analysis meetings to understand the requirements for the modules under development.
- Followed Agile methodology and participated in daily Scrum meetings.
- Implemented Spark jobs in Scala and PySpark using DataFrames, RDDs, Datasets, and Spark SQL for data processing.
- Worked with Azure Databricks to develop PySpark and Scala notebooks for Spark transformations.
- Implemented PySpark jobs for batch analysis.
- Wrote YAML scripts for orchestration.
- Worked on stored procedures to retrieve data from the database.
- Worked with XML and JSON content.
- Worked on database stored procedures, functions, triggers, and views.
- Used Git to track and maintain the different versions of the project.
- Used Jenkins as the primary tool for implementing CI/CD during code releases.
- Used IntelliJ as the IDE to develop Spark applications and JIRA for bug and issue tracking.
Environment: YAML, Teradata, Spark (Scala), PySpark, JSON, PL/SQL, Log4j, Jenkins, JIRA, IntelliJ, Git.
Big Data Application Developer
- Followed Agile SDLC guidelines for project management.
- Involved in requirements gathering, analysis of existing design documents, planning, development, and testing of the applications.
- Extensively used PySpark to implement transformations, deployed on Azure HDInsight, for the ingestion, hygiene, and identity resolution processes.
- Worked on creating a file-based data lake using Azure Blob Storage, Azure Data Factory, and Azure HDInsight. Used HBase for data storage and retrieval.
- Implemented business rules for deduplication of contacts using Spark transformations in Scala and PySpark.
- Developed graph database nodes and relationships using the Cypher query language.
- Developed a Spark job using Spark DataFrames to flatten JSON documents into flat files.
- Developed microservices using AWS Lambda to make API calls to third-party vendors such as Melissa and StrikeIron.
- Developed pipelines for batch processing and scheduling using Azure Data Factory.
- Created an Azure ML Studio pipeline with Python modules to run Naïve Bayes and XGBoost classification (machine learning algorithms) for persona mapping.
- Ingested data from Salesforce, SAP, SQL Server, and Teradata into Azure Data Lake using Azure Data Factory.
- Developed a deduplication module for the sales and marketing contact data of Confidential.
- Executed Hive queries on Parquet-backed Hive tables to perform data analysis.
- Developed a REST API using the Flask framework (Python) for the front end (UI) to consume.
- Tested REST API calls using Python scripts and Postman.
- Wrote Azure Automation runbook scripts to spin up, spin down, and scale the HDInsight cluster.
- Stored REST API call results in a Redis database so that repeated queries could be answered from the stored results.
Environment: Spark, Spark Streaming, Spark SQL, HDFS, Hive, Apache Kafka, Sqoop, Java, Scala, Linux, Azure SQL Database, Azure ML Studio, Jenkins, Flask Framework, IntelliJ, PyCharm, Eclipse, Git, Azure Data Factory, Tableau, MySQL, Postman, Agile Methodologies, AWS Lambda, Azure Cloud, Docker.