
Sr Big Data Architect Resume


  • Big Data Architect with more than 12 years of experience as a trusted consultant and strategic advisor, combining hands-on technology skills with business leadership. Specializes in Big Data/Hadoop, Cloud, Site Reliability, Networking and Infrastructure, and in building exceptional products and software.
  • More than 6 years of experience managing and leading Big Data teams, including SRE, OSE, DBA and Hadoop admin staff.
  • Adept in leading, mentoring and managing large, high-performing teams, with a strong track record of shipping high-quality software on time and within budget. Expertise in Big Data, Hadoop, Machine Learning, Data Science, Program and Project Management, Release Management, Site Reliability Management, Onshore and Offshore Development, and Cloud Computing (IaaS, PaaS, SaaS).
  • Machine Learning, Technology Infrastructure, Networking, and Data Center buildouts.
  • Business Intelligence, Data Warehousing, QE, DevOps, all phases of the SDLC, and 24/7 Support and Operations. Expertise in building and leading high-performing IT teams, with excellent critical thinking, problem-solving and analytical skills.
  • Meticulously managed concurrent multi-million-dollar initiatives with cross-functional teams across the globe, ensuring tight collaboration and alignment among diverse global business, development, support and testing stakeholders. Consistently handled difficult situations with calm, decisive and inspirational leadership, balancing competing priorities to deliver world-class, forward-looking software and products.
  • Hands-on Cloud Architect, Data Warehouse/BI/Analytics Architect, and Data & Big Data Architect, specializing in designs that extract the maximum business benefit from intelligence and insight investments.
  • Hands-on Leader of multiple cloud, Hadoop, NoSQL, BI/DW, database evaluations, performance tuning, rescue projects, IoT, pilots & production implementations that delivered high ROI.
  • EMR, Hortonworks & Cloudera, Kinesis, Storm, Kafka, Spark, Pig, Impala, Hive, ELK and Hadoop security
  • Expert in database & Hadoop security & encryption technologies and architectures.
  • NoSQL (Redis, CouchDB, MongoDB, Cassandra, Neo4j, DynamoDB, Memcached) and SQL (Oracle, SQL Server, DB2, MySQL, PostgreSQL) modeling and administration.
  • Architect and design Big Data solutions based on Hadoop and real-time data processing.
  • Implement Site Reliability Projects for the Big Data Hadoop Infrastructure platforms.
  • Worked in Agile and Waterfall methodologies, delivering high-quality work on time.
  • Experience with continuous integration and automation using Jenkins.
  • Experience as a Big Data Architect on cloud projects
  • Capacity planning, configuration, installation and maintenance of Big Data Hadoop clusters
  • Release management for Cloudera and Hortonworks distributions
  • Hadoop, MapReduce, HBase, Sqoop, Amazon Elastic MapReduce (EMR)
  • Managed Hadoop clusters (setup, installation, monitoring, maintenance): Cloudera CDH, Apache Hadoop
  • DevOps/AWS: Redshift, EC2, EBS, S3, RDS, VPC, DynamoDB, Route 53, ELB, IAM, CloudFront (CDN), CloudFormation, Lambda, Python (boto)
  • Recent experience in Google Cloud Platform (GCP), including storage, IAM, machine learning, security, Kubernetes, migration of applications to GCP, App Engine and Google Financial Services solutions
  • Expert in Hadoop Administration and HDFS file systems
  • Experience in scripting and programming with Python (Pandas), Puppet, Bash, Perl, Java and R
  • HBase and analytics using R and Python
  • Experience in handling large-scale Hadoop environment builds and support.
  • Hadoop administration and development; Big Data/Hadoop solution architecture, design and development
  • Implemented data warehouse solutions for petabytes of data.
  • Built real-time Big Data solutions handling billions of records
  • Built scalable, cost-effective solutions using Cloud technologies
  • Big Data/Hadoop solution architecture and deployments on Cloudera and Hortonworks. Built Hadoop clusters on EC2 and in data centers
  • Expertise and experience in Cloudera Hadoop administration.
  • Installed Hadoop clusters ranging from 10 to 1000s of nodes on EC2 and Rackspace, running Red Hat and CentOS, for NoSQL and real-time analytics. Tools: PXE, Puppet, Chef, Apache, Postfix, Sendmail, HAProxy, MySQL, Cassandra, MongoDB, Hadoop, Perl, Bash, Python.
  • Release management of the HDFS distribution itself in the Dev, QA and Prod environments.
  • Develop networking strategies, cloud strategies, and designs for private, hybrid and public clouds
  • Build high availability (HA) solutions for Hadoop clusters
  • Tested Site Reliability Solutions for Business Units
  • Manage multiple global resources to resolve all types of incidents, including major incidents, for a global operation running 24x7x365
  • Provide incident communications to senior leadership and impacted business groups. Provide leadership with KPI reports/dashboards with respect to incident trends, SLAs, MTTR, and other information as requested
  • Lead a team of engineers for maintaining CR
  • Manage on-call rotations
  • Manage OSE, Hadoop admin, DevOps and analyst staff to ensure Big Data cluster availability 24x7x365
  • Implement Kimball and star schema modelling for the databases
  • Implement Snowflake architecture for the groups.
  • Experience in 3NF, snowflake and dimensional modelling, dimension and fact tables, and normalization and denormalization techniques.
  • Experience in Data Marts and Bus Architecture


RDBMS: Oracle 18c, 12c, 11g; Oracle 12c Multitenant Databases; Real Application Clusters (RAC); ASM; MS SQL Server 2005/2008/2012/2014/2016; MySQL 5.7; Teradata; Greenplum

Operating Systems & Administration: Red Hat Linux 6.x, 7.x; AIX 5; Windows; HP-UX; Sun Solaris 9, 10; Exadata X5-2, X3-2, X2-2, X2-8; AWS; Windows Server; VMware.

Languages: Python 3.6 and 2.7, Java, SQL, PL/SQL, C, C++, OOP, JavaScript, Bash shell scripting, UNIX shell scripting, Jenkins, uDeploy.

IDEs/Development Tools: Eclipse, PyCharm and Sublime Text.

Oracle Tools and Utilities: Data Guard, GoldenGate Logdump, Defgen, Reverse Utility, Veridata 12.2, 12.1, 11.x, WebLogic Server, ODI, OBIEE, WebSphere, RMAN, OEM, Autosys, Tuning Advisors, TOAD, SQL*Loader, Export/Import, Data Pump, ERwin


Ticketing Tools: BMC Remedy, JIRA, NEWS, ServiceNow, HP Quality Center

Others: Concepts and administration of Data Guard (standby), Real Application Clusters (RAC), Automatic Storage Management (ASM), Active Data Guard, Oracle OCFS2/ACFS, database cloning via RMAN/Data Pump/DB links, etc.; GitHub, Jenkins, Tableau, SAS

Big Data Distributions: Hadoop, Cloudera, Hortonworks, MapR, Amazon AWS

Amazon AWS: EC2, security groups, VPC, EMR, RDS, Redshift

Big Data Ecosystem: Cloudera CDH 6.x, 5.x, Hortonworks, HDFS, Pig, MapReduce, YARN, ZooKeeper, Hive, Sqoop, Flume, HBase, Storm, Kafka, Redis, Elasticsearch

NoSQL DB: MongoDB, Cassandra, CouchDB

Tracking Tools: Bugzilla, Buganizer (Google) and JIRA.

CI/CD: Jenkins

GCP, GCS: Docker, Kubernetes, BigQuery, Dataproc



Sr Big Data Architect


  • Provided hands-on subject matter expertise to design and implement Hadoop-based Big Data solutions for process optimization.
  • Disaster Recovery Projects: Implement DR strategies for different business units
  • Implement Site Reliability Solutions for different business units using Hadoop clusters
  • Implemented Cloudera Hadoop (CDH) 5.x and 6.1 systems and managed Cloudera Hadoop clusters with Cloudera Manager; set up Apache Kafka, Apache Spark, Impala, Cloudera Director, ZooKeeper, Presto, Pig, Hive and Oozie; implemented high availability
  • Manage the Hadoop clusters and perform admin tasks such as adding nodes, maintaining clusters and troubleshooting issues
  • Manage teams of developers, analysts, Hadoop admins and DevOps engineers in the organization
  • Implement Hadoop clusters in the financial domain, covering capacity planning, configuration, installation and implementation in Prod, Dev and UAT
  • Provide and implement Big Data architecture solutions for multiple lines of business in banks
  • Implement multiple POCs for different lines of business for Hadoop Big Data solutions
  • Designed and built data ingestion, cleansing and enrichment pipelines utilizing the following tools and technologies: AWS (S3, EC2, EMR, Lambda, Step Functions, RDS, CloudWatch, DynamoDB, Athena), Java, Python, Spark (Python and Scala API), PostgreSQL.
  • Manage and lead the project for Hadoop security for the Hadoop clusters
  • Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
  • Responsible for the design and development of advanced Python programs to prepare, transform and harmonize data sets in preparation for modeling.
  • Implement the Site Reliability projects and testing for different lines of business
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Interacted with business analysts, SMEs and other data architects to understand business needs and functionality for various project solutions.
  • Participated in Business meetings to understand the business needs & requirements.
  • Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
  • Analyzed weblog data using HiveQL to extract the number of unique visitors per day, page views and visit duration; managed and reviewed Hadoop log files.
  • Built an ETL application using Amazon Redshift: designing tables, loading data into Redshift, using Amazon Redshift Advisor, configuring WLM query queues, and working with the STL and STV system tables
  • Utilized Spark, Scala, Java, Hadoop, HBase, Kafka, Spark Streaming, MLlib, Python, Kinesis, Flink and Storm for ETL processing in the Hadoop environment
  • GCP, GCS, BigQuery, Compute and ETL projects in GCP
  • Manage Oracle databases and build pipelines for ingestion into Hadoop clusters
  • Implemented cloud architecture for databases on Amazon AWS and set up test/dev MySQL and Oracle databases on AWS. Worked on Big Data, Hadoop and NoSQL databases, including DynamoDB


Solution Architect


  • Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Participated in all phases of data mining (data collection, data cleaning, model development, validation and visualization) and performed gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Performed Data preparation on a High dimensional (Big data with large volume and variety) Data sample collected from the live customer data.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib and Python, applying a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc.
  • Designed and implemented system architecture for an Amazon EC2-based cloud-hosted solution for the client.
  • Designed and created ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem and relational databases.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Designed and implemented a data ETL pipeline in Python for analytics, tracking each operator’s data and performance.
  • Managed confidential healthcare and clinical data and built ETL pipelines for ingesting the data into HDFS
  • Utilized the following technologies: Storm, Spark, and Kafka for data ingestion and real-time event processing; Hadoop (HDFS, MapReduce, HBase, Pig, Hive, Oozie, HCatalog, Flume, Sqoop, ZooKeeper); Couchbase, Accumulo, Machine Learning and statistics (R, MATLAB, classification, Bayesian ratings, clustering, C5.0, 1R, RIPPER); Data mining and correlation analysis over multi-dimensional datasets; AWS (EC2, S3, EMR); Java, Java EE, RESTful webservices; JSON, Avro, Parquet.
  • Configured and set up projects for Continuous Integration utilizing Git, Maven, Nexus and Jenkins

Lead Admin / Architect



  • Defined architecture standards, Big Data principles and PADS across the program, and the usage of VP for modelling.
  • Created multi-stage Map-Reduce jobs in Python for ad-hoc purposes
  • Developed Pig scripts to transform data and load it into HBase tables.
  • Developed Hive scripts for implementing dynamic partitions
  • Used HDFS and MySQL and deployed HBase integration to perform OLAP operations on HBase data.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on evaluation and analysis of the Hadoop cluster and different big data analytic tools such as HBase. Developed MapReduce programs to perform data filtering for unstructured data.
  • Collaborated with application teams to install the operating system and Hadoop updates, patches, version upgrades.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring, and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Utilized the following Big Data technologies for the Inventory Data Warehouse: Hadoop (HDFS, MapReduce, HBase, Pig, Hive, Oozie, HCatalog, Flume, Sqoop, ZooKeeper).
  • Configured and set up projects for Continuous Integration utilizing SVN, Maven, Artifactory, Jenkins and Sonar

Environment: Hadoop, MapReduce, TAC, HDFS, HBase, HDP (Hortonworks), Sqoop, Hive ORC, Data Processing Layer, UNIX, MySQL, RDBMS
