AWS Big Data Consultant Resume


  • 20+ years of experience in database design, architecture development, integration, Big Data, and AWS Cloud, resulting in stable, scalable, and secure database-driven systems.
  • 15 years of experience in Information modeling, data structures, master data management, data quality management, information governance, Big Data, data warehousing and Object/relational mapping
  • 5+ years of experience in implementing Big Data solutions and AWS Cloud implementation/migration using cloud-native architecture and cloud design principles.
  • As a Solutions Architect, hands-on capability in building proofs of concept, microservice architectures, reusable frameworks, Lambda Architecture, and Continuous Integration/Continuous Deployment
  • Conducted feasibility, design thinking, and cost studies and recommended cost-effective AWS Cloud solutions using different Hive file formats to reduce I/O and costs and improve compute efficiency.
  • Designed and delivered secure, high-availability Big Data/Cloud solutions and implemented the design principle of separating storage from compute in AWS Cloud implementations
  • Designed and delivered the data warehousing solutions using AWS Redshift and Redshift Spectrum. Also designed the custom framework to load the data from S3 buckets & Hive tables into the Redshift Cluster tables.
  • Proficient at using Spark APIs to cleanse, explore, aggregate, transform, and store data, interacting with CSV, XML, JSON, and Hive tables
  • Explored the features in Snowflake DB to implement Virtual warehouse using the MPP architecture
  • Implemented best practices to secure and manage the data in AWS S3 buckets and used the Spark custom framework to load the data from AWS S3 to Redshift using the field mapping config file.
  • Implemented a data migration strategy to move a 400+ TB Hadoop file system from on-premises to the AWS cloud environment using Direct Connect and Snowball
  • Demonstrated experience with the Big Data platform, using the Hadoop ecosystem to implement Big Data solutions in batch and real time with MapReduce/Tez, Presto, Spark, Hive, and the HBase NoSQL database.
  • Built and managed EDW solutions using AWS Redshift, improving query performance with an MPP design strategy and accessing data from the S3 data lake through Redshift Spectrum external tables
  • Successfully designed and delivered secure Cloud Solution using AWS services including IAM, S3, Redshift, Spectrum, Athena, EMR, EC2, VPC & RDS
  • Demonstrated experience in Enterprise Data warehouse design using Star Schema, Extended Star Schema, and Snowflake schema dimensional models.
  • Implemented the Canonical Data Model (CDM) using the XML schema to ingest the semi-structured data into Hadoop Distributed File System (HDFS)
  • Implemented CI/CD pipeline using Jenkins to support Configuration Management to automate the deployment process on the transient EMRs.
  • Built and managed a containerized architecture using Docker images for Airflow, hosted on the Red Hat OpenShift container platform (Kubernetes), to connect to AWS services
  • Implemented Splunk infrastructure and managed Splunk knowledge objects using system, application and TWS process logs and created dashboard for Operational and Process insights.
  • Extensive experience in developing the ER Conceptual/Logical/Physical data model for Big Data, transactional systems (OLTP) and data warehouse systems (OLAP).
  • Years of experience in the Information Technology industry including custom development as a Programmer, Systems Analyst, Solution Architect, Data Architect, Project Lead and Project Manager
  • Managed full life cycle Custom Development (SDLC) for several business applications from inception to completion using Agile and Waterfall development methodologies.
  • Highly adept at promptly and thoroughly mastering new technologies and adapting to existing corporate infrastructures, producing results immediately on hire in cross-functional IT roles.
  • Exceptional interpersonal and customer relationship skills; serve as liaison between technical teams, business representatives, client management, and users.


Big Data: CDH, HDFS, Hive, Presto, Tez, M/R, HBase, Sqoop, OLH, DynamoDB, AWS Redshift

Cloud Technologies: AWS S3, EMR, EC2, Redshift, Spectrum, Glue, Lambda, RDS, Snowflake, Databricks

Language/Others: Python, Scala, Spark, GitHub, GitLab, Jenkins, TFS, Docker, K8s, OpenShift, Airflow, NiFi

Databases: Oracle, Informix 11.x, SQL Server, Sybase 11.x, PostgreSQL

Data Modeling Tools: ERwin, S-Designer, Oracle Designer, ER/Studio

BI Tools: Tableau, Cognos, SAS, Business Objects, Oracle Discoverer and Crystal Reports

ETL Tools: Business Objects Data Services, MS SSIS, Pentaho, Informatica



AWS Big Data Consultant


  • Architecting data pipelines/ELT solutions to extract, load, and curate datasets from multiple data sources using Apache Sqoop, Spark, and Glue, loading them into Hive tables for BI analytical reporting with the Presto, Tez, and Spark SQL computation engines to enable faster, better, data-informed decision-making within the business
  • Implemented AWS Glue ETL solutions that handle external files through S3 event triggers and Lambda, process them with the PySpark framework, and store the results in Glue Catalog tables
  • Implemented access security on Glue and Athena and migrated Hive metastore to AWS Glue Catalog
  • Implemented the Apache NiFi Data flow solution to load AWS EMR System log files from S3 buckets to Splunk using HTTP Event Collector (HEC) to produce EMR performance and resource usage metrics.
  • Implemented Splunk Enterprise solutions to collect AWS service metrics and provide stakeholders with analytics on resource utilization, alerts, and performance metrics
  • Implemented a custom data load framework to convert the ORC files used for Hive tables and load them into Redshift with SQLWorkbench CLI options.
  • Implemented Lambda functions in Python to collect EMR performance and CloudWatch metrics using the Boto3 library and load them into a Redshift table for the performance dashboard
  • Integrated Active Directory SSO authentication with the Presto cluster so that multi-tenant line-of-business groups across the enterprise can access Hive tables.
  • Automating Redshift cluster spin-up and administration activities using CloudFormation templates with Lambda functions
  • Tuning Redshift query performance and applying best practices in table design and collecting the table statistics at the time of data load.
  • Implemented the Redshift Workload management queues and query monitoring rules based on the functional group to prioritize and allocate Redshift cluster resources.
  • Implemented workflow orchestration using Apache Airflow and used complex DAGs, Custom Operators to handle big data processing data pipeline jobs using Boto3 to invoke AWS services.
  • Responsible for implementing containerized architecture using Docker images for Airflow and hosted on OpenShift container platform (Kubernetes) to connect AWS services with scalable options.
  • Responsible for implementing Data pipeline using AWS Glue PySpark to extract the Data from JDBC sources and apply Glue built-in transformations to load into different Hive file formats.


Senior Big Data/AWS Solutions Architect


  • Implemented Big Data architecture solutions, developed the HDFS/Hive database design using the XML Schema, and ingested XML/CSV data into HDFS
  • Designed and developed the NoSQL HBase architecture to identify versions of the changes made to the Insurance Providers/Issuers Health Plan variables by the CMS’s Oversight committee
  • Successfully implemented Spark application using DataFrame to read from HDFS and analyze 200+ million Insurance Providers/Issuers Health Unified Rate Review (URR) records to measure the rate increase and provide the analytics to the stakeholders.
  • Analyze XML Schema to identify the Parent/Child root element/XPath to enforce hierarchy and develop the Logical data model to support the cardinality and assign the attributes from the schema XSD files.
  • Implement the Enterprise Data Warehouse solution on AWS Redshift and architect the table design to distribute and persist the data in the compute nodes to support query performance
  • Implementing the data transfer solutions using Apache Sqoop and Oracle Loader for Hadoop component to transfer data between Hive and Oracle schema tables.
  • Wrote Hive query (HQL) scripts to implement the data transformation logic (ELT) supporting the data warehouse BI reports in Cognos and Tableau.
  • Implemented solutions to automate the Data Dictionary and System Design Documentation using the ERwin SDK with ActiveX scripts.
  • Participated in Technical Review Board (TRB) for approving the technical solution to support customer needs. Conducted the brown bag sessions in the area of Information Architecture and Big Data Solutions.
  • Work with the Government Technical Lead (GTL) and organization leadership to provide and execute Big Data Architecture strategies based on business needs.
  • Responsible for providing thought leadership and guidance to the team members in data architecture specific areas and health insurance domain efforts for the development of the applications QHP, URR, FFM.
  • Designing and deploying dynamically scalable, highly available, fault tolerant, and reliable applications on AWS. Implemented Redshift WLM to optimize the query performance with queues.
  • Define and deploy monitoring, metrics, and logging systems on AWS using Splunk and CloudWatch
  • Expert-level knowledge of Amazon EC2, Amazon S3, Amazon Redshift, Amazon EMR, Amazon RDS, Amazon ELB, AWS CloudFormation, and other services of the AWS family
  • Implemented best practices to secure and manage the data in AWS S3 buckets and used the Spark custom framework to load the data from AWS S3 to Redshift using the field mapping config file.
  • Implemented near-real-time data processing using the StreamSets and Spark/Databricks frameworks.


Senior Data Warehouse Architect


  • Designed the Enterprise Data Warehouse (EDW) for all 204 bankruptcy, district, and appellate courts, collecting transactional data from the Case Management/Electronic Case Files (CM/ECF) system.
  • Designed and developed Data Marts for CGI’s Momentum /Judiciary Integrated Financial Management System (JIFMS) application.
  • Responsible for designing the Conceptual, Logical, and Dimensional models and obtaining approval from the business domain experts.
  • Responsible for creating and maintaining the metadata repository, including source data elements, entity and attribute definitions, ETL mapping documents, gap analysis, and data profiling statistics
  • Participated in the Data Governance (DG) council, defined a set of procedures and policies, and implemented them through the Master Data Management (MDM) process.
  • Develop and execute strategies for migration of customer data to appliances, integration with ETL and BI processes and environments
  • Provided bi-directional, peer-to-peer data replication solutions using IBM InfoSphere Data Replication to support near real-time BI analytics with lambda architecture.
  • Implemented the BODS Real-Time ETL process using IBM InfoSphere Data Replication to support EDW.
  • Data Profiling and Data Modeling for the Case Management/Electronic Case Files (CM/ECF) system.
  • IBM InfoSphere CDC installation and administration for the Real time process on BODS/Oracle/Informix servers.
  • Configuring and administering the Apache MQ server for the real-time XML queues and administering the BODS Real-Time Services and Adapter Instances
  • Develop and implement the ETL technical architecture, data acquisition techniques, and decision rules
  • Implemented the Entity & Attribute standards and worked with the DBA to implement the strategy for SQL query performance tuning.
  • Work with BO Architect to implement the row level security to access multiple court records to support the EDW architecture for every subject area
  • Establish and enforce technical standards and define metadata standards for the data warehouse
  • Develop Best Practices documents for data dictionary, MDM, ETL mapping documentation


Data Warehouse / BI Architect


  • Administrator for Business Objects (BO) and Crystal Enterprise servers managing over BO Servers in multiple locations supporting several hundred users.
  • Experience in Business Objects Enterprise XI R2/3.1/ BI 4.0 installation and security, User and server administration and maintenance
  • As a Data Architect, responsible for designing the ER logical model and Dimensional model and obtaining approval from the business domain experts.
  • Developed the data warehouse dimensional model for SAP Financial data mart to support the analytical and BI reports.
  • Developed several ETL SSIS packages and scheduled them to extract data from the SAP FICO module and load it into the SAP Financial data mart.
  • Designed and developed the ER logical and physical data models for the Qatar Scholar Management System and Enterprise Scholars & Student applications
  • Designed and developed the Universes for different functional application modules and resolved the Loops, Chasm traps & Fan traps using alias and contexts.
  • Improved query performance in the Universe design using keys, filters, and compatible objects
  • Experience in developing the operational and analytical DeskI/WebI reports using different data providers
  • Experience in creating the Crystal reports (all versions) for the operational and analytical reports.
