
Big Data & Google Cloud Consultant Resume

SUMMARY:

  • A hands-on Big Data & Analytics Architect with 14 years’ experience and a strong skill set in Big Data, Google Cloud, Hadoop, Amazon AWS, and Microsoft Azure. A Certified Scrum Master and entrepreneur involved in implementing agile projects in financial services and tech start-ups, delivering high-quality software under tight deadlines. Strong team player, leader, and motivator.
  • Data Consultant for Confidential working with Google Cloud Dataflow and BigQuery to manage and move data within a 200-petabyte cloud data lake for GDPR compliance
  • Big Data Architect for Confidential working with AWS EMR, Redshift Spectrum, Athena, Glue, Data Catalogs, Kinesis Streams, Informatica, Lambda functions, and CloudWatch
  • Big Data and Analytics Architect for Confidential in New York, creating multi-region Azure cloud-based data lakes using MS Azure HDInsight, SQL DW, Azure Blobs, Azure SQL, DocumentDB, HBase, Spark, Kafka, Docker, Kubernetes, PMML, and IBM BigIntegrate
  • Consultant working on Confidential’s data architecture for a data lake with PB+ data volume using Hortonworks, AWS, Hive, ORC files, Falcon, Cascading, Flume, Cassandra, and Spark
  • Designed a data warehouse for Confidential using Amazon AWS EMR, Hadoop, Pig, Hive, Sqoop, AWS EC2, S3, Kinesis, CloudFormation, Data Pipeline, Redshift, and QlikView.
  • Designed ETL flows with Oracle ODI to process up to 100 million banking transactions a day. Certified Scrum Master, MSc in Information Systems, and Oracle Certified Professional.

COMPUTER PROFICIENCY:

  • Hadoop, Map-Reduce, Flume, Pig, Spark
  • Docker, Kubernetes, Azure Container Services
  • Hadoop YARN, Cascading, Ambari, Hue
  • Azure Data Factory, Polybase, AWS Athena
  • Java, PMML, Python, AzureML
  • Zementis ADAPA, ODG FastScore
  • Informatica BDE, IBM DataStage / BigIntegrate
  • Hadoop HDFS, AWS S3, Azure Blob
  • Oracle Data Integrator 11G / ETL
  • AWS Data Pipeline, Kinesis, Azure Data Factory
  • Enterprise Architect, ERWin
  • Power BI, QlikView
  • Cloudera CDH, Hortonworks, AWS EMR, HDInsight
  • Apache Falcon, Oozie
  • Google BigQuery, AWS Redshift Spectrum
  • Kafka, Spark Streaming, Kinesis Streams
  • HBase, DocumentDB, Cassandra
  • Lucene, Solr, Kafka, Sqoop
  • Teradata, Azure SQLDW, Oracle 11G
  • Jira, Bamboo, Confluence, Gliffy, Fisheye
  • Liquibase / Flyway (DB change management)
  • Linux (shell, bash scripting)
  • Git, TortoiseSVN (Subversion)
  • Linux Red Hat, Centos, AWS EC2, CoreOS

CAREER SUMMARY:

Confidential, New York

Big Data & Google Cloud Consultant

Responsibilities:

  • Consultant for Confidential working with Google Cloud Dataflow and BigQuery to manage and move data within a 200-petabyte cloud data lake for GDPR compliance.
  • Designed Google Cloud Dataflow jobs that move data within the 200 PB data lake
  • Implemented scripts that load data into Google BigQuery and run queries to export data
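
A minimal sketch of the BigQuery load/export scripting described above, written as a wrapper that builds `bq` CLI commands (the dataset, table, and GCS paths are placeholders, not the actual project's):

```python
def bq_load_cmd(dataset, table, gcs_uri):
    """Build a `bq load` command that ingests newline-delimited JSON from GCS."""
    return ["bq", "load", "--source_format=NEWLINE_DELIMITED_JSON",
            f"{dataset}.{table}", gcs_uri]

def bq_export_cmd(dataset, table, gcs_uri):
    """Build a `bq extract` command that exports a table to GCS as gzipped CSV."""
    return ["bq", "extract", "--destination_format=CSV", "--compression=GZIP",
            f"{dataset}.{table}", gcs_uri]
```

Each command list can be handed to `subprocess.run`; building the arguments separately keeps the scripts testable without touching the cloud project.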

Confidential, New York

Big Data & Cloud Architect

Responsibilities:

  • Consultant working with Confidential clients such as Novo Nordisk, creating AWS-based cloud data lakes.
  • Designed the Novo Nordisk data architecture with AWS EMR, Redshift Spectrum, Athena, Glue, Data Catalogs, Kinesis Streams, Informatica, Lambda functions, and CloudWatch
  • Created EMR clusters with autoscaling core and task nodes and Ranger integration
  • Created and performance-tuned Redshift clusters with encryption and Redshift Spectrum-based S3 data queries
  • Created an AWS Lambda function that batch-loads data from S3 into Redshift, with SNS and DynamoDB integration
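
The S3-to-Redshift Lambda above can be sketched as follows; the table name, IAM role ARN, and event shape are illustrative, and the actual database call (e.g. over psycopg2) is stubbed out:

```python
import json

def build_copy_sql(table, bucket, key, iam_role):
    """Build a Redshift COPY statement that batch-loads a gzipped CSV from S3."""
    return (f"COPY {table} FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{iam_role}' FORMAT AS CSV GZIP")

def handler(event, context):
    """Lambda entry point: triggered by SNS, it unwraps the S3 object
    reference and returns the COPY statement it would run against Redshift."""
    msg = json.loads(event["Records"][0]["Sns"]["Message"])
    rec = msg["Records"][0]["s3"]
    sql = build_copy_sql("staging.events",
                         rec["bucket"]["name"], rec["object"]["key"],
                         "arn:aws:iam::123456789012:role/redshift-copy")
    # In the real function, execute `sql` over a database connection and
    # record the batch in DynamoDB for idempotency; stubbed here.
    return {"sql": sql}
```

Keeping the COPY builder pure makes it unit-testable; only the thin handler touches SNS, Redshift, and DynamoDB.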

Confidential, Bay Area, New York

Co Founder

Responsibilities:

  • Created ARKit 2-based AR apps
  • Created GCP ML Engine & TensorFlow-based AI training processes

Confidential, New York

Big Data & Analytics Architect

Responsibilities:

  • Defined global data architecture using MS Azure HDInsight, SQL DW, Azure Blobs, Azure SQL, DocumentDB, HBase, Neo4j, Spark, Kafka, Polybase, IBM DataStage / BigIntegrate, and InfoSphere IGC
  • Designed analytics and model management architecture for Python, Scala, R, and Java-based models, with Zementis ADAPA / PMML-based model deployment, Microsoft RevoR, AzureML, and Power BI
  • Defined model runtime and management using Docker containers, Kubernetes, and REST APIs
  • Designed data / ETL pipelines using Azure Data Factory, IBM DataStage / InfoSphere, AzCopy, Polybase, and multi-region data replication
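
The containerized model runtime above boils down to a handler of this shape; the model here is an invented linear scorer standing in for a real PMML engine such as ADAPA, and the feature names are hypothetical:

```python
import json

# Hypothetical model weights; a real container would load a PMML model instead.
WEIGHTS = {"age": 0.02, "balance": 0.0001}

def score(features):
    """Toy linear scorer standing in for the deployed model."""
    z = sum(WEIGHTS.get(k, 0.0) * v for k, v in features.items())
    return {"score": round(z, 4)}

def handle_request(body):
    """REST-style entry point: JSON features in, JSON scoring response out."""
    return json.dumps(score(json.loads(body)))
```

Packaging a handler like this behind an HTTP framework in a Docker image, with Kubernetes handling replicas and rollout, gives the runtime/management split described above.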

Confidential, New York, Pittsburgh

Big Data Architect

Responsibilities:

  • Defined ETL strategy using Informatica Big Data Edition’s Blaze component, which can run in Hive / MapReduce / Spark / native mode and abstracts mappings away from the actual code.
  • Created a roadmap for securing PHI and other sensitive health data at Highmark using HDFS TDE & OS encryption strategies, Ranger, Knox, Kerberos, Protegrity, and Dataguise
  • Designed cluster architecture to separate analytics and operational workloads and fold archiving and DR options into it.
  • Defined big data virtualization strategy comparing tools such as Denodo, Cisco Composite, and Informatica Data Services
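
Ranger policies like those in the security roadmap above are registered over Ranger's REST API; the builder below shows the approximate shape of an HDFS path policy body (the service, path, and user names are made up):

```python
def hdfs_read_policy(service, name, path, users):
    """Build a Ranger-style HDFS policy body granting read access to a path."""
    return {
        "service": service,
        "name": name,
        "resources": {"path": {"values": [path], "isRecursive": True}},
        "policyItems": [{
            "users": users,
            "accesses": [{"type": "read", "isAllowed": True}],
        }],
    }
```

POSTing a body of this shape to the Ranger admin's policy endpoint registers the rule; audit of each allowed or denied access then comes for free from Ranger's logging.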

Confidential, Seattle

Big Data Architect

Responsibilities:

  • Created a data architecture roadmap and data governance policies for big data and presented them to C-level executives
  • Designed cluster architecture for components like Hortonworks, AWS EMR, Spark, Falcon, Oozie, ORC, Cassandra, HDFS, Flume & Kafka for streaming
  • Implemented data governance policies with Knox and Ranger for data access management & auditing, and Apache Atlas for metadata and master data management
  • Created best practices for HDFS directory structures and AWS S3 bucket naming & security policies
  • Pushed ORC file adoption which resulted in 3x faster jobs and 2.5x better compression
  • Designed DataFrames and RDDs for Spark jobs that ran 20x faster than older MR jobs
  • Tuned Spark jobs via configuration settings, and Hive queries using ORC columnar storage, Tez, and CBO explain plans
  • Designed Falcon jobs to replicate between environments and archive using HDFS tiers
  • Created conformed dimensions such as a Geo dimension and designed their porting into Hadoop as UDFs / a Lucene index, and finally into DataStax Cassandra to support recommender systems
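
The S3 bucket naming policy mentioned above can be enforced mechanically; the convention shown here (env-domain-purpose, lowercase, hyphen-separated) is an illustrative stand-in for the actual policy:

```python
import re

# Hypothetical convention: <env>-<domain>-<purpose>, lowercase alphanumerics
# and hyphens only, 3-63 characters, per S3 bucket-name length limits.
BUCKET_RE = re.compile(r"^(dev|qa|prod)-[a-z0-9]+(-[a-z0-9]+)+$")

def valid_bucket_name(name):
    """Check a bucket name against the naming convention and S3 length limits."""
    return 3 <= len(name) <= 63 and bool(BUCKET_RE.match(name))
```

A check like this can run in CI or in the provisioning pipeline so non-conforming buckets are rejected before creation.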

Confidential

Big Data Architect

Responsibilities:

  • Designed a data warehouse from scratch to process up to 50 TB of data using Hadoop.
  • Analyzed clickstream data from Google Analytics with BigQuery.
  • Performed capacity planning for cloud infrastructure with the AWS console, EC2 instances, AWS EMR, and S3
  • Created AWS EMR clusters using CloudFormation scripts, IAM, Kinesis Firehose, and Data Pipeline
  • Designed fact tables stored in Amazon S3 as flat files in Avro and Parquet format, built Hive table structures on top of them, and exposed the fact tables to QlikView reporting.
  • Designed Sqoop-based data exchange with Redshift, Oracle, SQL Server, and MySQL
  • Designed star schemas in Amazon Redshift using compression encodings, data distribution keys, sort keys, and table constraints
  • Designed APIs to load data from Omniture, Google Analytics, and Google BigQuery
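
The distribution-key and sort-key design above can be illustrated with a small DDL generator; the table, columns, and encodings below are invented for illustration, though the syntax is standard Redshift:

```python
def fact_table_ddl(name, columns, dist_key, sort_keys):
    """Emit Redshift CREATE TABLE DDL with KEY distribution, a compound
    sort key, and column-level compression encodings."""
    cols = ",\n  ".join(f"{c} {t} ENCODE {enc}" for c, t, enc in columns)
    return (f"CREATE TABLE {name} (\n  {cols}\n)\n"
            f"DISTSTYLE KEY DISTKEY({dist_key})\n"
            f"COMPOUND SORTKEY({', '.join(sort_keys)});")
```

Distributing a fact table on its most common join key co-locates matching rows across slices, and sorting on the date column lets range-restricted scans skip blocks.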

Confidential, New York

Co-Founder & Data Architect

Responsibilities:

  • Created Confidential App, which was featured on the Discovery Channel and in the Chicago Tribune
  • Tuned EMR Spark clusters and jobs via Spark executors’ memory, containers, and EC2 instance types
  • Set up a Cloudera CDH5 Hadoop cluster with HDFS, HBase, Pig, Oozie, and ZooKeeper to store and process GTFS and user data.
  • Set up a DataStax Cassandra cluster with OpsCenter for processing temporal geospatial data and implemented the Spark connector to execute on Cassandra data frames.
