
Big Data Cloud Architect Resume


  • Resourceful, high-energy enterprise Big Data architect offering 15+ years of expertise in architecture definition of large distributed systems, technical consulting, project management, and technology implementation in Big Data solutions, Hadoop, data warehousing, data management, and application integration
  • Experienced in Big Data solutions and Hadoop implementations for banking and finance, communications, logistics, and retail
  • 5+ years architecting, administering, designing, and deploying large-scale Big Data solutions using various Hadoop ecosystem components and NoSQL databases
  • 1.5+ years of experience on AWS Cloud Platform
  • Proficient with Apache Kafka and Apache Spark
  • Experienced in installing, configuring, and testing Hadoop ecosystem components. Upgraded several Hadoop clusters to the next stable version
  • Experienced in installing, configuring, and administering Hadoop clusters on major distributions
  • Proficient in data architecture, data warehousing, Big Data/Hadoop, data integration, and BI reporting projects, with a deep focus on design, development, and deployment of BI and data solutions using open-source and off-the-shelf BI tools such as Platfora, Tableau, Splunk, and Hunk
  • 8+ years of experience in data warehousing and ETL, pulling data from various sources into data warehouses and data marts using Informatica PowerCenter 8.x/7.x/6.x
  • Dimensional data modeling of star and snowflake schemas with fact and dimension tables, using data modeling tools such as ERwin and Oracle Designer. Managed technical specifications and test plans and created project implementation documents
  • Strong knowledge of logical/physical data modeling with normalized/denormalized databases in both OLTP and OLAP environments using different modeling tools
  • Recommend new technologies, policies, or processes to benefit the organization and address deficiencies in the project/organization


Big Data Frameworks: Hadoop, HDFS, Ambari, Cloudera Manager, Hive, Pig, Impala, Spark (Spark Core, Spark SQL, Spark Streaming), PySpark, MapReduce, HCatalog, Sqoop, Oozie, Flume, Apache Kafka, Apache Solr, Jupyter, Zeppelin

Cloud Tools: AWS (EMR, S3, EC2, Kinesis, DynamoDB, Athena, CloudFormation, VPC, Lambda, Load Balancer, Aurora, IAM, CloudWatch), Cloud Foundry

NoSQL: HBase, Cassandra, DynamoDB

Distros: Apache, Cloudera Distribution, Hortonworks Distribution, Informatica Big Data Edition

Data Warehousing: Snowflake, Informatica, Talend, Pentaho, SSIS

Databases: MySQL, Oracle, Greenplum, SQL Server, Oracle Application DBA, SQL, PL/SQL, Sybase, DB2, MPP Teradata, Teradata Aster, Teradata Loom, FastExport, TPT

Data Integration: Apache NiFi, StreamSets, Pentaho Kettle, Talend, Attunity

Data Modelling: Erwin, Microsoft Visio, Oracle Designer 2000, Enterprise Architect (EA), Logical, Physical and Relational Modeling, ER Diagrams, Dimensional Data Modeling (Star & Snowflake Schema)

Languages: Java, Python, Shell Scripting, XML, SQL, PL/SQL, C, Pro*C, Pro*COBOL, AWK, Perl

Virtualization: VMware, VirtualBox, Cloud Foundry

Visualization: Platfora, Splunk, Hunk, Apache Zeppelin, Tableau, Datameer, Paxata

Version Control: Git, Bitbucket, PVCS, CVS, Subversion

Scheduling: Oozie, Autosys, Control-M, Crontab

Others: CI/CD, Kubernetes, DbVisualizer, TOAD, SQL Workbench, Aqua Data Studio, MySQL, Eclipse, XMLSpy, Spring Tool Suite, JavaScript, VBScript



Big Data Cloud Architect


  • Participate in data integration, business intelligence (BI), and enterprise information management programs; support design and development of the different layers of the data lake from which users create visualizations and analytics
  • Design and develop a streaming application in PySpark to consume real-time Salesforce data from a Kafka topic
  • Hands-on experience with the Talend Big Data platform for creating data models, data containers, views, and workflows. Used a range of Talend components
  • Architected and developed a data warehouse model in Snowflake for over 100 datasets
  • Implemented Kafka security features using SSL and Kerberos. Used various Kafka connectors with Confluent Kafka
  • Stay current with emerging tools and technologies and recommend adoption that will provide competitive advantage and development/delivery efficiencies
  • Participate in architectural meetings with stakeholders and vendors to implement a hybrid architecture
  • Serve as solution architect for building cloud applications on AWS

Environment: AWS, Cloudera Hadoop, HDFS, Confluent Kafka, Spark, PySpark, Python, Hive, Kudu, Impala, Talend, HBase, Snowflake, UNIX Shell Scripting

Big Data Consultant



  • As a Big Data consultant, responsible for on-site and offshore coordination; work with different source owners and create data models
  • Design and develop the business layer from which users create visualizations and analytics
  • Work closely with customers, at a technical and user level, to design and produce solutions

Environment: AWS (EMR, S3, EC2, Athena, CloudWatch, Aurora, VPC, Lambda, CloudFormation), Cloudera Hadoop, HDFS, Spark, Kafka, Hive, Control-M, UNIX Shell Scripting

Big Data Architect



  • Big Data architect responsible for data architecture, data lake design, and Hadoop and BI requirements, and for defining the strategy, technical architecture, implementation plan, development, management, and delivery of Big Data applications and solutions
  • Conducted a POC on Docker containers and/or Kubernetes
  • Built a data lake using AWS services; monitored and optimized the data lake
  • Migrated the on-premises data lake to the AWS cloud platform
  • Ingested a wide variety of structured, semi-structured, and unstructured data into the Big Data ecosystem using batch processing (Sqoop) and near-real-time streaming with Apache NiFi, Kafka, and Flume
  • Developed a high-speed BI layer on the Hadoop platform with Kafka, Apache Spark, and Python
  • Designed, architected, and developed data analytics and data management solutions in PySpark
  • Developed a Spark Streaming application to consume 5,000 lease messages per second from Kafka and store the streamed data in HBase and HDFS
  • Installed and configured a NiFi cluster and completed end-to-end design and development of an Apache NiFi flow that ingests data from various sources into the data lake (Hive ORC tables and HBase tables) and Splunk, in both near real time and batch
  • Installed, designed, configured, and developed an Apache Kafka solution to ingest Multi-Channel Message Delivery Platform (MDP) data from Rsyslog into Solr
  • Ingested Abuse and DMCA emails into Solr using Flume and Morphlines; later switched to a Java API
  • Installed, configured, and developed Solr collections for Abuse and DMCA that collect and index all generated emails in real time and display them in one interface
  • Administered and maintained the Hadoop cluster and its ecosystem. Upgraded Hortonworks Hadoop from version 2.3.4 to 3.0 on all clusters, from Dev through Production
  • Managed and reviewed the HDFS file system; monitored and reviewed the Hadoop cluster for capacity planning
  • Created OP5 alert notifications for memory, hard disk, or any other failure in the Hadoop ecosystem
  • Configured Apache Ranger to manage access policies for files, folders, databases, tables, and columns
  • Evaluated integration tools and created POCs to demonstrate them to stakeholders
  • Configured mountable HDFS, which lets users access the HDFS file system like a traditional file system on Linux

Environment: AWS (EMR, S3, EC2, Athena, Kinesis, CloudWatch, Aurora, VPC, Lambda, CloudFormation), Hortonworks Hadoop, Ambari, HDFS, Spark (Spark Core, Spark SQL, and Spark Streaming), PySpark, Hive, LLAP, Pig, Kafka, Flume, Sqoop, HBase, Solr, Splunk, Zeppelin, PyCharm, Jupyter, Apache NiFi, MiNiFi, StreamSets, Python, Java, Eclipse, Spring Boot Suite, Maven, and UNIX Shell Scripting.

Big Data Architect



  • Demonstrated personal expertise by participating as a Big Data SME
  • Designed and led a team (offshore and on-site) to successfully develop and deliver Big Data projects
  • Responsible for managing the development and deployment of Hadoop applications
  • Secured the Hadoop cluster through Kerberos KDC installation and OpenLDAP integration
  • Installed and managed the Cloudera distribution of Hadoop for POC applications and implemented proofs of concept on the Hadoop stack and various Big Data analytics tools
  • Worked closely with customers, at a technical and user level, to design and produce solutions. Held discussions with vendors, planned use cases, and demoed the product
  • Used StreamSets to move data into and out of Hadoop with minimal coding
  • Built a framework to read reference data from a JMS queue using Kafka, Flume, and Spark
  • Installed, developed, and deployed high-performance, large-scale data analytics solutions using Apache Spark. Evaluated the options available in Spark to find the solution best suited to each application
  • Involved in upgrading Cloudera CDH 5.5 to 5.7.1
  • Wrote Spark programs in Java and Scala, and Java UDFs for Hive and Pig
  • Installed and configured HBase and ingested CPPD data into HBase using Flume
  • As part of the EAP platform, designed, architected, and developed the following applications: a) Bullseye b) AML (Anti Money Laundering) c) CPPD (Customer Predictive & Preventative Dissatisfaction) d) FFS (Financial Full Suite)

Environment: Cloudera Hadoop, Cloudera Manager, HDFS, Hive, Spark, Spark on Hive, Spark SQL, Pig, Impala, Kafka, Flume, Sqoop, HBase, Talend, Platfora, Datameer, Paxata, StreamSets, HDFS storage formats (JSON, Parquet, RC, ORC, Avro), Python, Java, Scala, Autosys, Jenkins, Eclipse, Maven, Amazon AWS EMR & EC2, and UNIX Shell Scripting.
