Big Data Cloud Architect Resume

SUMMARY

Highly resourceful, high-energy enterprise Big Data architect with 15+ years of expertise in architecture definition of large distributed systems, technical consulting, project management, and technology implementation across Big Data solutions, Hadoop, data warehousing, data management, and application integration
Experienced in Big Data solutions and Hadoop implementations for banking and finance, communications, logistics, and retail
5+ years architecting, administering, designing, and deploying large-scale Big Data solutions using various Hadoop ecosystem components and NoSQL databases
1.5+ years of experience on AWS Cloud Platform
Proficient with Apache Kafka and Apache Spark
Experienced in installing, configuring, and testing Hadoop ecosystem components; upgraded several Hadoop clusters to the next stable version
Experienced in installing, configuring, and administering Hadoop clusters on major distributions
Proficient in data architecture, DW, Big Data, Hadoop, data integration, and BI reporting projects, with a deep focus on the design, development, and deployment of BI and data solutions using open-source and off-the-shelf BI tools such as Platfora, Tableau, Splunk, and Hunk
8+ years of experience in data warehousing and ETL, pulling data from various sources into data warehouses and data marts using Informatica PowerCenter 8.x/7.x/6.x
Dimensional data modeling of star and snowflake schemas with fact and dimension tables using data modeling tools such as ERWIN and Oracle Designer; managing technical specifications and test plans and creating project implementation documents
Strong knowledge of logical and physical data modeling with normalized and denormalized databases in both OLTP and OLAP environments using different modeling tools
Recommend new technologies, policies, or processes to benefit the organization and address deficiencies in the project or organization

TECHNICAL SKILLS

Big Data Frameworks: Hadoop, HDFS, Ambari, Cloudera Manager, Hive, Pig, Impala, Spark (Spark Core, Spark SQL, Spark Streaming), PySpark, MapReduce, HCatalog, Sqoop, Oozie, Flume, Apache Kafka, Apache Solr, Jupyter, Zeppelin

Cloud Tools: AWS (EMR, S3, EC2, Kinesis, DynamoDB, Athena, CloudFormation, VPC, Lambda, Load Balancer, Aurora, IAM, CloudWatch), CloudFoundry

NoSQL: HBase, Cassandra, DynamoDB

Distros: Apache, Cloudera Distribution, Hortonworks Distribution, Informatica BigData Edition

Data Warehousing: Snowflake, Informatica, Talend, Pentaho, SSIS

Databases: MySQL, Oracle, Greenplum, SQL Server, Oracle Application DBA, SQL, PL/SQL, Sybase, DB2, MPP Teradata, Teradata Aster, Teradata Loom, FastExport, TPT

Data Integration: Apache NiFi, Streamsets, Pentaho Kettle, Talend, Attunity

Data Modelling: Erwin, Microsoft Visio, Oracle Designer 2000, Enterprise Architect (EA), Logical, Physical and Relational Modeling, ER Diagrams, Dimensional Data Modeling (Star & Snowflake Schema)

Languages: Java, Python, Shell Scripting, XML, SQL, PL/SQL, C, Pro*C, Pro*COBOL, AWK, Perl

Virtualization: VMware, VirtualBox, Cloud Foundry

Visualization: Platfora, Splunk, Hunk, Apache Zeppelin, Tableau, Datameer, Paxata

Version Control: Git, Bitbucket, PVCS, CVS, Subversion

Scheduling: Oozie, Autosys, Control-M, Crontab

Others: CI/CD, Kubernetes, DbVisualizer, TOAD, SQL Workbench, Aqua Data Studio, MySQL, Eclipse, XMLSpy, Spring Tool Suite, JavaScript, VBScript

PROFESSIONAL EXPERIENCE

Confidential

Big Data Cloud Architect

Responsibilities:

Participate in data integration, business intelligence (BI), and enterprise information management programs; support the design and development of the Data Lake layers from which users create visualizations and analytics
Design and develop a streaming application in PySpark to consume real-time Salesforce data from a Kafka topic
Hands-on experience with the Talend Big Data platform for creating data models, data containers, views, and workflows; used a range of Talend components
Architected and developed a data warehouse model in Snowflake for over 100 datasets
Implemented Kafka security features using SSL and Kerberos; used various Kafka connectors with Confluent Kafka
Stay current with emerging tools and technologies and recommend adoption that will provide competitive advantage and development/delivery efficiencies
Participate in architectural meetings with stakeholders and vendors to implement a hybrid architecture
Serve as solution architect for building cloud applications on AWS

Environment: AWS, Cloudera Hadoop, HDFS, Confluent Kafka, Spark, PySpark, Python, Hive, Kudu, Impala, Talend, HBase, Snowflake, UNIX Shell Scripting

Big Data Consultant

Confidential

Responsibilities:

As a Big Data consultant, responsible for on-site and offshore coordination; work with different source owners and create data models
Design and develop the business layer from which users create visualizations and analytics
Work closely with customers, at a technical and user level, to design and produce solutions

Environment: AWS (EMR, S3, EC2, Athena, CloudWatch, Aurora, VPC, Lambda, CloudFormation), Cloudera Hadoop, HDFS, Spark, Kafka, Hive, Control-M, UNIX Shell Scripting

Big Data Architect

Confidential

Responsibilities:

Big Data architect responsible for data architecture, Data Lake design, and Hadoop and BI requirements, and for defining the strategy, technical architecture, and implementation plan and leading the development, management, and delivery of Big Data applications and solutions
Conducted POC on Docker Containers and/or Kubernetes
Built a Data Lake using AWS services; monitored and optimized the Data Lake
Migrated On-premises Datalake to AWS Cloud platform
Ingested a wide variety of structured, unstructured, and semi-structured data into the Big Data ecosystem using batch processing (Sqoop) and near-real-time streaming with Apache NiFi, Kafka, and Flume
Developed High Speed BI layer on Hadoop platform with Kafka, Apache Spark and Python
Design, architecture and development of data analytics and data management solutions through PySpark
Developed a Spark Streaming job to consume 5,000 lease messages per second from Kafka and store the streamed data in HBase and HDFS
Installed and configured a NiFi cluster and completed the end-to-end design and development of an Apache NiFi flow that ingests data from various sources into the Data Lake (Hive ORC tables and HBase tables) and Splunk, in both near-real-time and batch modes
Installed, designed, configured, and developed an Apache Kafka solution to ingest Multi-Channel Message Delivery Platform (MDP) data from Rsyslog into Solr
Ingested Abuse and DMCA emails into Solr using Flume and Morphlines; later switched to the Java API
Installed, configured, and developed Solr collections for Abuse and DMCA that collect and index all generated emails in real time and display them in one interface
Administered and maintained the Hadoop cluster and its ecosystem; upgraded Hortonworks Hadoop from version 2.3.4 to 3.0 across Dev through Production clusters
Managed and reviewed the HDFS file system; monitored and reviewed the Hadoop cluster for capacity planning
Created OP5 alert notifications for memory, hard disk, or any other failure in the Hadoop ecosystem
Configured Apache Ranger to manage access policies for files, folders, databases, tables, and columns
Evaluated integration tools and created POCs to demonstrate them to stakeholders
Configured mountable HDFS, which lets users access the HDFS file system like a traditional file system on Linux

Environment: AWS (EMR, S3, EC2, Athena, Kinesis, CloudWatch, Aurora, VPC, Lambda, CloudFormation), Hortonworks Hadoop, Ambari, HDFS, Spark (Spark Core, Spark SQL and Spark Streaming), PySpark, Hive, LLAP, Pig, Kafka, Flume, Sqoop, HBase, Solr, Splunk, Zeppelin, PyCharm, Jupyter, Apache NiFi, MiNiFi, StreamSets, Python, Java, Eclipse, Spring boot Suite, Maven and UNIX Shell Scripting.

Big Data Architect

Confidential

Responsibilities:

Demonstrated personal expertise by participating as a Big Data SME
Designed Big Data projects and led an offshore and on-site team to develop and deliver them successfully
Responsible for managing the development and deployment of Hadoop applications
Secured the Hadoop cluster with a Kerberos KDC installation and OpenLDAP integration
Installed and managed the Cloudera distribution of Hadoop for POC applications, and implemented proofs of concept on the Hadoop stack and various Big Data analytics tools
Work closely with customers, at a technical and user level, to design and produce solutions. Have discussions with vendors and plan for the use cases and demo the product
Used StreamSets to transfer data into and out of Hadoop with minimal coding
Built a framework to read a JMS queue for reference data using Kafka, Flume, and Spark
Installed, developed, and deployed high-performance, large-scale data analytics solutions using Apache Spark; evaluated the available Spark options to find the best fit for each application
Involved in upgrading Cloudera CDH 5.5 to 5.7.1
Wrote Spark programs in Java and Scala, and Java UDFs for Hive and Pig
Installed and configured HBase and ingested CPPD data into HBase using Flume
As part of the EAP platform, designed, architected, and developed the following applications:
a) Bullseye b) AML (Anti Money Laundering) c) CPPD (Customer Predictive & Preventative Dissatisfaction) d) FFS (Financial Full Suite)

Environment: Cloudera Hadoop, Cloudera Manager, HDFS, Hive, Spark, Spark on Hive, Spark SQL, Pig, Impala, Kafka, Flume, Sqoop, HBase, Talend, Platfora, Datameer, Paxata, StreamSets, HDFS storage formats (JSON, Parquet, RC, ORC, Avro), Python, Java, Scala, Autosys, Jenkins, Eclipse, Maven, Amazon AWS EMR & EC2 and UNIX Shell Scripting.
