- Extensive professional experience across the full Software Development Life Cycle (SDLC) under Agile methodology: analysis, design, development, testing, implementation, and maintenance using Spark, Hadoop, data warehousing, Linux, and Java/Scala.
- Data analysis, data modeling, and implementation of enterprise-class systems spanning big data, data integration, and object-oriented programming.
- Developed Scala applications on Hadoop and Spark SQL for high-volume, real-time data processing.
- Hands-on experience working with Kusto.
- Worked with Microsoft internal tools such as Cosmos, Kusto, and iScope, which are known for performing ETL operations efficiently.
- Experience writing unit tests and smoke tests for code modules using the ScalaTest framework.
- Good understanding of classic Hadoop and YARN architecture, along with the various Hadoop daemons: JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, ResourceManager, NodeManager, ApplicationMaster, and containers.
- Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN) setups.
- Good experience in writing Spark applications using Java and Scala.
- Good experience creating PowerShell scripts to automate cluster and storage-account creation in the production environment.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming.
- Expertise in writing Spark RDD transformations and actions, DataFrames, and case classes for the required input data, performing the data transformations with Spark Core.
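The transformation pattern described above can be sketched with plain Scala collections, whose `map`/`flatMap`/`groupBy` operations the Spark RDD API mirrors. This is a minimal illustration only, not code from the original work; the `Event` fields and sample values are hypothetical.

```scala
// Hypothetical record type for parsed input rows
case class Event(userId: String, bytes: Long)

// Stand-in for lines read from an input source
val raw = Seq("u1,100", "u2,250", "u1,50")

// Transformation stage: parse lines into case classes, dropping malformed rows
// (in Spark this would be an RDD flatMap over sc.textFile(...))
val events = raw.flatMap { line =>
  line.split(",") match {
    case Array(id, b) => Some(Event(id, b.toLong))
    case _            => None
  }
}

// Aggregation stage: total bytes per user, as reduceByKey would compute
val totals: Map[String, Long] =
  events.groupBy(_.userId).map { case (id, es) => id -> es.map(_.bytes).sum }
```

On a real cluster the same shaping logic runs lazily and in parallel; only an action such as `collect` or `saveAsTextFile` triggers execution.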
- Experience in Installing, Upgrading and Configuring Microsoft SQL Server.
- Experience in Migrating data from SQL Server 2008 to SQL Server 2012.
- Experience in designing and creating tables, views, user-defined data types, indexes, stored procedures, cursors, triggers, and transactions.
- Designing and developing SQL Server database structure, Stored Procedures and Triggers.
- Quick to learn and adapt to new technologies.
- Experience working in 24x7 support environments, meeting deadlines and adapting to ever-changing priorities.
- Proven ability to work independently as well as in a team; motivated to take on challenges and meet deadlines.
Big Data Technologies: HDFS, Hive, Spark, MapReduce, YARN, Spark Core, Spark SQL.
Programming Languages: .NET, C/C++, HTML, SQL, PL/SQL, and Scala.
Scripting Languages: Shell Scripting, Bash, PowerShell.
Operating Systems: UNIX, Linux, Windows
Web technologies: ASP.NET, MVC Framework
Databases: Oracle, MySQL, MS SQL.
Microsoft, Redmond, WA
Azure Big Data Engineer
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL Database, Blob Storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
- Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Responsible for estimating the cluster size, monitoring and troubleshooting of the Hadoop cluster.
- Used Zeppelin, Jupyter notebooks, and the Spark shell to develop, test, and analyze Spark jobs before scheduling customized Spark jobs.
- Performed data analysis and collaborated with the downstream analytics team to shape the data to their requirements.
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Wrote UDFs in Scala and stored procedures to meet specific business requirements.
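A Spark UDF is typically a pure Scala function registered against a DataFrame column. The kind of function involved can be sketched as below; the tier names and thresholds are illustrative assumptions, not taken from the original work.

```scala
// Hypothetical business rule: bucket a byte count into a usage tier.
// In Spark this would be registered with udf(usageTier _) and applied
// to a column via withColumn("tier", usageTierUdf(col("bytes"))).
def usageTier(bytes: Long): String =
  if (bytes >= 1000000L) "heavy"
  else if (bytes >= 1000L) "medium"
  else "light"
```

Keeping the logic in a plain function like this also makes it unit-testable without a Spark session.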
- Used Kusto Explorer for log analytics and better query response times.
- Migrated existing MapReduce programs and Hive queries to Spark applications written in Scala.
- Deployed and tested developed code through CI/CD using Visual Studio Team Services (VSTS).
- Conducted code reviews for team members to ensure proper test coverage and consistent coding standards.
- Responsible for documenting the process and cleaning up unwanted data.
- Responsible for ingesting data from Blob Storage into Kusto and maintaining the PPE and PROD pipelines.
- Expertise in creating HDInsight clusters and storage accounts with an end-to-end environment for running jobs.
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.
- Hands-on experience developing PowerShell scripts for automation.
- Created builds and releases for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS).
- Experience using the ScalaTest FunSuite framework to develop unit test cases and integration tests.
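A FunSuite-style unit test can be sketched as follows. Since the ScalaTest library is not assumed available here, the checks are expressed with plain `assert`; `parseCsvRow` is a hypothetical function under test, not from the original work.

```scala
// Hypothetical function under test: parse a "key,value" row, rejecting bad input
def parseCsvRow(line: String): Option[(String, Int)] =
  line.split(",") match {
    case Array(k, v) if v.nonEmpty && v.forall(_.isDigit) => Some((k, v.toInt))
    case _                                                => None
  }

// FunSuite equivalent of the checks below:
//   test("parses well-formed row") { assert(parseCsvRow("a,42") == Some(("a", 42))) }
//   test("rejects malformed row")  { assert(parseCsvRow("bad") == None) }
assert(parseCsvRow("a,42") == Some(("a", 42)))
assert(parseCsvRow("bad") == None)
```

With ScalaTest on the classpath, the same assertions would live inside a class extending `AnyFunSuite`, giving named test cases and reporting.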
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to output directories in HDFS.
- Involved in running Cosmos scripts in Visual Studio 2017/2015 to check diagnostics.
- Worked in an Agile development environment with two-week sprint cycles, dividing and organizing tasks.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Spark, Kafka, IntelliJ, Cosmos, sbt, Zeppelin, YARN, Scala, SQL, Git.
SQL Server/BI Developer
- Responsible for gathering requirements from the business to create package specs.
- Installed and configured SQL Server 2008.
- Created entity-relationship (ER) diagrams for the proposed database.
- Wrote complex stored procedures and functions for better performance and flexibility.
- Designed logical and physical database structure to facilitate analysis of data from both operational and customer perspectives.
- Extensively used joins and subqueries in complex queries involving multiple tables across different databases.
- Responsible for creating databases, tables, clustered/non-clustered indexes, unique/check constraints, views, stored procedures, triggers, rules, and defaults.
- Created functions and stored procedures to implement application functionality on the database side for performance improvement.
- Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
- Performance tuning of SQL queries and stored procedures.
- Performed fundamental tasks related to the design, construction, monitoring, and maintenance of Microsoft SQL databases.