- Proven track record as a Data Engineer working with Amazon Web Services (AWS), Big Data/Hadoop applications, and product development.
- Well versed in big data on AWS cloud services, including EC2, S3, Glue, Athena, DynamoDB, and Redshift.
- Experience with job/workflow scheduling and monitoring tools such as Oozie, AWS Data Pipeline, and AutoSys.
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Experience creating and running Docker images for multiple microservices.
- Experience orchestrating Docker containers using ECS, ALB, and Lambda.
- Experience with Unix/Linux systems, including scripting and building data pipelines.
- Experience with cloud databases and data warehouses (SQL Azure and Amazon Redshift/RDS).
- Played a key role in migrating Cassandra and Hadoop clusters to AWS and defined read/write strategies for them.
- Strong SQL development skills, including writing stored procedures, triggers, views, and user-defined functions.
- Expert in developing SSIS/DTS packages to extract, transform, and load (ETL) data from heterogeneous sources into data warehouses and data marts.
- Good understanding of software development methodologies, including Agile (Scrum).
- Expertise in developing reports and dashboards using Tableau visualizations.
- Hands-on experience with programming languages such as Java, Python, R, and SAS.
- Experience using Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, and Kafka, as well as crontab scheduling.
- Expert in creating Hive UDFs in Java to analyze datasets with complex aggregation requirements.
- Experience developing ETL applications on large data volumes using MapReduce, Spark (Scala), PySpark, Spark SQL, and Pig.
- Experience using Sqoop to import and export data between RDBMS and HDFS/Hive.
- Created user-friendly GUIs and web pages using HTML, CSS, and JSP.
- Experience with MS SQL Server, including SSRS, SSIS, and T-SQL.
Confidential, Owings Mills, MD
- Created on-demand tables over S3 files using AWS Lambda functions and AWS Glue with Python and PySpark.
- Coordinated with the team to develop a framework that generates daily ad hoc reports and extracts from enterprise data, automated with Oozie.
- Worked on cloud deployments using Maven, Docker, and Jenkins.
- Coordinated with the data science team to design and implement advanced analytical models on large datasets in a Hadoop cluster.
- Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
- Used AWS Glue for data transformation, validation, and cleansing.
- Used Python (Boto3) to configure AWS Glue, EC2, and S3.
Confidential, Madison, SD
- Wrote scripts and an indexing strategy for a migration from SQL Server and MySQL databases to Amazon Redshift.
- Used the AWS Glue Data Catalog with crawlers to catalog data in S3 and run SQL queries on it.
- Used AWS Data Pipeline to configure data loads from S3 into Redshift.
- Used JSON schemas to define table and column mappings from S3 data to Redshift.
- Wrote indexing and data distribution strategies optimized for sub-second query response
- Developed a statistical model using artificial neural networks to rank students and better assist the admissions process.
- Designed and developed schema data models.
- Performed data cleaning and preparation on XML files.
- Automated data cleaning and preparation in Python using robotic process automation (RPA).
- Built analytical dashboards to track student records and GPAs across the board.
- Used deep learning frameworks such as MXNet, Caffe2, TensorFlow, Theano, CNTK, and Keras to help clients build deep learning models.
- Participated in requirements meetings and data mapping sessions to understand business needs.
- Designed and built a multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Amazon Redshift, handling millions of records every day.
- Worked with big data on AWS cloud services, including EC2, S3, EMR, and DynamoDB.
- Managed AWS security groups with a focus on high availability, fault tolerance, and auto scaling using Terraform templates, and implemented continuous integration and continuous deployment (CI/CD) with AWS Lambda and AWS CodePipeline.
- Implemented and managed ETL solutions and automated operational processes.
- Optimized and tuned the Redshift environment, making Tableau and SAS Visual Analytics queries up to 100x faster.
- Wrote various data normalization jobs for new data ingested into Redshift
- Advanced knowledge of Amazon Redshift and MPP database concepts.
- Migrated on-premises database structures to the Amazon Redshift data warehouse.
- Responsible for ETL and data validation using SQL Server Integration Services (SSIS).
- Defined and deployed monitoring, metrics, and logging systems on AWS.
- Implemented workload management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries. This made the reporting interface faster and more reliable, with sub-second response times for basic queries.
- Published interactive data visualization dashboards and reports/workbooks on Tableau and SAS Visual Analytics.
- Expert knowledge of Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology for each job.
- Developed stored procedures in MS SQL Server to fetch data files from different servers via FTP and process them to update tables.
- Responsible for designing logical and physical data models for various data sources on Amazon Redshift.
- Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.
- Experience building data pipelines in Python, PySpark, Hive SQL, Presto, and BigQuery, and building Python DAGs in Apache Airflow.
- Created ETL pipelines using Spark and Hive to ingest data from multiple sources.
- Used SAP and SAP SD (Sales and Distribution) module transactions to handle the client's customers and generate sales reports.
- Coordinated with clients directly to get data from different databases.
- Worked on MS SQL Server, including SSRS, SSIS, and T-SQL.
- Designed and developed schema data models.
- Documented business workflows for stakeholder review.
- Worked on developing “Ecommerce”, a web-based application built on SAP (ERP) using Java, JSP, HTML, CSS, and JavaScript.
- Developed business reports using the Google Charts API.
- Wrote SQL queries to build reports for pre-sales and secondary sales estimates.
- Created user-friendly GUIs and web pages using HTML, CSS, and JSP.
- Established the connection between the portal and SAP using SAP JCo (Java Connector).
- Designed and developed session beans to implement business logic.
- Worked on “Ezcommerce”, a web-based application built on SAP (ERP), troubleshooting and debugging the code and providing support to the client.
- Wrote complex SQL queries and used JDBC to access the database.