- Overall 8+ years of experience in designing, implementing, and supporting Big Data applications for various ETL and data warehousing projects using tools such as Apache Spark, Jenkins, Azkaban, Cassandra, and Databricks.
- Hands-on experience with Hadoop-ecosystem technologies such as Hive, Sqoop, Kafka, ZooKeeper, and Apache Spark.
- Expertise in analyzing data requirements to design and implement optimal enterprise applications.
- Architected highly optimized data pipelines for clients, reducing processing latency by 60%.
- Initiated and led a team of cross-functional engineers to deliver functional changes for on-time product delivery.
- Implemented highly efficient data pipelines capable of processing large structured, semi-structured, and unstructured datasets in support of Big Data applications.
- Experience with NoSQL databases such as MongoDB and Cassandra.
- Hands-on experience setting up Apache Spark, Cassandra, and MongoDB infrastructure.
- Designed and developed proof-of-concept machine learning applications using Spark MLlib.
- Designed infrastructure for the most efficient use of Apache Spark clusters running in Docker containers.
- Experience with Apache Spark clusters and stream processing using Spark Streaming.
- Designed Continuous Integration and Continuous Deployment pipelines for various teams.
- Conducted cross-team sessions on best practices for using the data pipelines.
- Excellent leadership, problem solving and time management skills.
- Polyglot developer with experience in Scala, Java, shell scripting, Python, and R.
Programming Languages: Scala, Java, Shell Scripting, Python, R
Database Management Systems: Cassandra, MongoDB, Oracle, Hive
Query Languages: CQL (Cassandra Query Language), ANSI SQL, HiveQL
Big Data Technologies: Spark Framework, Azkaban, Presto, Databricks, Tableau, Machine Learning, Neo4j, Kafka, Hadoop MapReduce, Pig, HBase
Cloud Technologies: Amazon AWS (Novice), Microsoft Azure (Novice)
Build & Deploy Tools: Maven, Gradle, Jenkins
Hardware: Backup and recovery management; installing and configuring peripherals, components, and drivers; LAN/router setup; VPN setup
Orchestration/DevOps: Docker, Jenkins
Confidential, Minneapolis, MN
- Architected an optimized, plug-and-play framework in Scala to handle most of the heavy ETL operations in Spark.
- Designed ETL processes to extract secure data in XML, CSV, and fixed-width formats from legacy mainframe systems to hydrate the AWS S3 data lake.
- Developed common utility packages for custom job auditing, email-based failure alerts, date-time conversion, and a custom write-data-to-file feature.
- Devised an out-of-the-box design that reduced data transfer latency by ~20%.
- Assisted the DevOps teams in setting up separate Jenkins build environments for dev, QA, pre-production, and production.
- Integrated the data application with a Slack bot to send out Slack messages in the event of a failure.
- Designed a pragmatic solution to maintain the integrity of the data lake by capturing bad/corrupt data.
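The bad/corrupt-data capture idea above can be sketched in plain Python (outside Spark). The field names and validation rule here are hypothetical illustrations, not the original implementation:

```python
# Minimal sketch: route records that fail validation into a quarantine
# collection instead of the data lake. "id" and "event_ts" are assumed
# required fields, purely for illustration.
def split_records(rows, required=("id", "event_ts")):
    good, quarantined = [], []
    for row in rows:
        if all(row.get(field) not in (None, "") for field in required):
            good.append(row)        # clean record -> data lake path
        else:
            quarantined.append(row) # bad/corrupt record -> quarantine path
    return good, quarantined
```

In a Spark job the same split would typically be expressed as two filters over the input DataFrame, with the quarantined rows written to a separate location for later inspection.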
Confidential, Deerfield, IL
Senior Data Engineer / Tech Lead - Big Data (Spark/Scala/Cassandra)
- Architected and designed data ingestion pipelines from the Cassandra database using the Spark 1.6.1 framework for Walgreens Photo eCommerce.
- Set up the Spark infrastructure for Walgreens eCommerce.
- Developed scripts to automate build and deploy activities, enabling the QA team to execute jobs. Presented to and educated the QA and support teams on basic concepts and best practices for working with Spark and Cassandra.
- Guided teams of junior developers on best practices for programming in Scala, shell scripting, and Gradle.
- Conducted meetings with clients to understand requirements and held daily offshore calls to guide junior developers through the ongoing development process.
- Developed prototype applications as proofs of concept for use cases such as a recommender model and a Drools rule engine on Spark.
- Worked on a proof of concept for setting up Spark clusters using Docker.
- Recognized for optimizing jobs, reducing run-time latency from 14 hours to 3 hours.
- Recognized for providing productive and effective solutions to several engineering problems.
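Build-and-deploy automation of the kind described above can be sketched in Python. The tool invocations, host-naming convention, and paths below are assumptions for illustration, not the original scripts:

```python
# Hypothetical sketch of a build-and-deploy helper for QA job execution.
# Hosts, paths, and the main class are placeholders.
import subprocess

def deploy_steps(env, jar="build/libs/app.jar"):
    """Return the commands a deploy run for the given environment would execute."""
    host = f"{env}-spark-edge"  # assumed host-naming convention
    jar_name = jar.split("/")[-1]
    return [
        ["gradle", "clean", "build"],
        ["scp", jar, f"{host}:/opt/jobs/"],
        ["ssh", host, "spark-submit", "--class", "com.example.Main",
         f"/opt/jobs/{jar_name}"],
    ]

def run(commands, dry_run=True):
    """Execute (or, by default, just print) each command in order."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Keeping the command list separate from execution makes the script easy to dry-run and to reuse across dev, QA, and production environments.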
Confidential, Burbank, CA
Data Engineer Consultant - Hadoop Developer
- Developed an ELT Java Spring Batch framework for validating tables in the data lake, hosted on Spark with AWS S3 as the metastore.
- Developed scripting modules in Python for scheduling jobs on Azkaban RHEL servers.
- Developed modules in the Spark application to scrape data from external websites with improved efficiency and parallelism.
- Maintained, organized, and structured data in the data lake for the analysts.
- Improved performance of querying data on the data lake from 9 minutes to 40 seconds.
- Built a high-performance table for Disney Movie Anywhere (DMA) customer details.
- Automated the data ingestion process for Confidential and Confidential.
- Supported the DevOps team in optimizing the new Spark 1.4.1 environment to suit our business needs.
- Assisted the analytics team in troubleshooting critical issues such as missing data, inconsistent data, garbage data, optimization, and query analysis.
- Developed Python pipelines on the Databricks platform.
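Azkaban scheduling modules like those described above typically generate `.job` property files (`key=value` lines with a `type=command` entry). A minimal hypothetical sketch of such a generator; the command and dependency names are placeholders:

```python
# Hypothetical sketch: emit an Azkaban .job properties file for a command job.
def azkaban_job(command, depends_on=None):
    """Build the text of an Azkaban .job file for a shell-command job."""
    lines = [
        "type=command",          # Azkaban's built-in command job type
        f"command={command}",    # the shell command the job runs
    ]
    if depends_on:
        # comma-separated upstream job names this job waits on
        lines.append("dependencies=" + ",".join(depends_on))
    return "\n".join(lines) + "\n"
```

Generating these files from Python keeps the scheduled flows versionable and lets one script regenerate a whole flow when job parameters change.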
Confidential, Chicago, IL
- Developed and maintained applications on the IIS web server.
- Developed Oracle database entities for the student thesis database.
- Developed Java Spring Batch applications for the Graduate College.
- Administered and maintained databases for the Office of Research, the Office of Compliance, and the Student Thesis database to provide end-user support.
- Administered and maintained the database for Confidential.
- Maintained the web portal for faculty, staff, and students.
Faculty, Department of Computer Science
- Prepared lectures for undergraduate students.
- Developed and supervised various laboratory activities for computer science courses.
- Assisted students in recruitment and outreach activities.
- Evaluated department programs and participated in committees.
- Monitored department programs and associated activities.