Big Data Architect Resume

SUMMARY:

As a Big Data expert, I am looking for a senior position in big data architecture/engineering where I can help the company achieve its mandate by utilizing my 20+ years of experience in data architecture, data analysis, and framework design and development.

SKILLS & ABILITIES:

Big Data:

Hadoop ecosystem: HDFS, YARN, Ambari, Cloudera Manager, Hive, Impala, HBase, Accumulo, Ranger, Knox, Atlas, Spark, Zeppelin, NiFi, Flume, Kafka, Sqoop, Solr, ZooKeeper, Oozie.

Other NoSQL databases: Couchbase, Neo4j, AWS DynamoDB, Cassandra.

Programming Languages: Python, Scala, Java, R, Shell script, .Net.

Machine Learning Algorithms: Supervised Learning (Linear regression, Logistic regression, Decision Tree/Random Forest, Naïve Bayes, KNN, GBM/XGBoost), Unsupervised Learning (clustering and segmentation)

Containerization: Docker, Kubernetes

DevOps: Chef, Terraform, ELK (Elasticsearch, Logstash and Kibana)

Access control: Kerberos, LDAP

Architecture/modeling tools: Microsoft Visio, CA ERwin, Sybase PowerDesigner

Business Intelligence & Data Warehousing: Amazon Redshift, SQL Server Analysis Services, SQL Server Integration Services, Talend, Tableau, Microsoft Power BI, MicroStrategy, Scribe.

Amazon AWS: Elastic MapReduce, EC2, S3, Lambda, RDS, Aurora, DynamoDB, Redshift, Data Pipeline, Machine Learning, Kinesis.

Microsoft Azure: HDInsight, Blob Storage, Data Lake Store, SQL Database

RDBMS: Microsoft SQL Server 2008/2012/2014, Oracle 10g/11g, PostgreSQL, MySQL.

Database Tools: ScaleArc, Quest Toad for Oracle, Quest LiteSpeed, Quest Spotlight, Confio Ignite8.

Operating Systems: Linux, Unix, Windows

EXPERIENCE:

Big Data Architect

Confidential

Responsibilities:

  • Migrate ETL workloads from Informatica to AWS, leveraging AWS services such as S3, RDS, Lambda, Glue/PySpark, EMR, and Redshift.
  • Use Python on Spark (PySpark) to develop ETL jobs (an illustrative sketch follows this list)
  • Design Physical and Logical Data Models (PDM and LDM) for various Business Units using PowerDesigner
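
A minimal PySpark sketch of the kind of ETL job described above; the bucket, paths, and column names are hypothetical placeholders rather than any actual client schema.

    # etl_orders.py - minimal PySpark ETL sketch (bucket, paths, and columns are hypothetical)
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("orders-etl")
             .getOrCreate())

    # Read raw CSV extracts landed in S3 by the ingestion layer
    raw = spark.read.option("header", "true").csv("s3://example-bucket/raw/orders/")

    # Cleanse and type the data: cast columns, drop rows without a key, de-duplicate
    orders = (raw
              .withColumn("order_ts", F.to_timestamp("order_ts"))
              .withColumn("amount", F.col("amount").cast("double"))
              .filter(F.col("order_id").isNotNull())
              .dropDuplicates(["order_id"]))

    # Write curated, partitioned Parquet back to S3 for downstream Redshift/EMR consumers
    (orders.withColumn("order_date", F.to_date("order_ts"))
           .write.mode("overwrite")
           .partitionBy("order_date")
           .parquet("s3://example-bucket/curated/orders/"))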

Big Data Architect

Confidential

Responsibilities:

  • Work on migrating the Cloudera cluster to the AWS cloud, leveraging AWS services such as S3, EMR, RDS, Data Pipeline, and Redshift for high availability, high durability, and low cost.
  • Develop and deploy AWS Lambda functions in Python that move data from S3 and Kinesis to RDS Aurora and Hive tables (an illustrative sketch follows this list)
  • Design, develop, and deploy a disaster recovery plan for the Cloudera cluster; set up replication schedules to back up HDFS and Hive content to AWS S3 buckets
  • Develop and deploy Talend jobs for ETL, calling third-party SOAP APIs, extracting XML content, and saving it to HDFS
  • Upgrade and maintain AWS Aurora clusters; set up multiple instances within RDS clusters for high availability and load balancing
  • Tune and optimize Cloudera Impala and Hive queries
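
A minimal sketch of the Lambda pattern referenced above, loading the rows of a newly landed S3 object into Aurora (MySQL-compatible); the environment variables, table, and column layout are hypothetical, and the pymysql dependency would be bundled with the deployment package.

    # lambda_s3_to_aurora.py - sketch of an S3-triggered Lambda loading rows into Aurora
    # (bucket, table, and environment variable names are hypothetical placeholders)
    import csv
    import io
    import os

    import boto3
    import pymysql

    s3 = boto3.client("s3")

    def handler(event, context):
        conn = pymysql.connect(
            host=os.environ["AURORA_HOST"],
            user=os.environ["AURORA_USER"],
            password=os.environ["AURORA_PASSWORD"],
            database=os.environ["AURORA_DB"],
        )
        try:
            # The S3 event notification lists the objects that triggered this invocation
            for record in event["Records"]:
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
                rows = list(csv.reader(io.StringIO(body)))
                with conn.cursor() as cur:
                    cur.executemany(
                        "INSERT INTO staging_events (event_id, payload) VALUES (%s, %s)",
                        [(r[0], r[1]) for r in rows],
                    )
            conn.commit()
        finally:
            conn.close()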

Big Data Architect

Confidential

Responsibilities:

  • Design a Big Data platform to embed and support enterprise data analytics and intelligence products, leveraging the Hortonworks Data Platform stack: HDFS, YARN, Ambari, Slider, ZooKeeper, Ranger, Accumulo, Kafka, Solr, Oozie, etc.
  • Evaluate and validate the NoSQL graph database Neo4j.
  • Evaluate Microsoft Azure, focusing on HDInsight, Data Lake Storage, Cosmos DB, etc.
  • Use Docker and Kubernetes to containerize the ZooKeeper and Solr services

Big Data Architect

Confidential

Responsibilities:

  • Design and develop the Enterprise Data Lake, which contains most of the important datasets (e.g., transactional data, network data, CDR, clickstream data, etc.) from across the corporation. It provides a flexible data ingestion mechanism, leveraging HBase, NiFi, Kafka, Spark, Scala, and Oracle GoldenGate, and is capable of collecting data from data sources residing on a large variety of systems in both batch and near-real-time fashion (an illustrative streaming-ingestion sketch follows this list). It provides flexible data processing pipelines built with Hive and Spark, and it enables different types of data consumption for a large number of internal users and groups.
  • Develop the Customer 360 project (UCAR - Unified Customer Analysis Report):
  • Data preparation: integrate customer data from various data sources, such as Accounting, Billing, and Usage; complete data cleansing and feature engineering.
  • Data analytics: use cases include NPS (Net Promoter Score), churn prediction (an illustrative model sketch follows this list), CCTS complaint reduction, and marketing campaigns.
  • Identify and work on various items to improve performance, security, and stability, resolving issues/challenges in the enterprise Big Data platform, such as:
  • Enhancing data security by implementing Apache Ranger, leveraging the Ranger policy model for dynamic column masking of PII and row filtering in Hive; also leveraging the Apache Atlas-Ranger integration to support classification (tag) based policies as well as other dynamic policies (location based, prohibition, data expiration)
  • Improving Hive performance by setting proper system and session parameters, replacing M/R with Tez as the execution engine, using Cost Based Optimization, and enabling dynamic partitioning and vectorization (representative settings are sketched after this list).
  • Solving the ‘too many small files’ issue in HDFS by leveraging Hadoop Archive (HAR), Hive partition concatenation, and enabling the merge task for both Tez and M/R jobs.
  • Define and maintain guidelines, policies, and templates, such as:
  • Hive development/configuration guidelines
  • Data access and retention policies
  • Spark development guidelines
  • Evaluate and validate new frameworks, such as NiFi, Zeppelin, and the GoldenGate for Big Data framework.
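
A minimal Spark Structured Streaming sketch of the near-real-time ingestion path described in the data lake bullet above; the Kafka brokers, topic, and HDFS paths are hypothetical placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

    # ingest_cdr_stream.py - sketch of near-real-time ingestion from Kafka to HDFS
    # (brokers, topic, and paths are hypothetical placeholders)
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("cdr-ingest").getOrCreate()

    # Subscribe to the raw CDR topic on the Kafka cluster
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker1:6667,broker2:6667")
              .option("subscribe", "cdr-raw")
              .option("startingOffsets", "latest")
              .load())

    # Kafka delivers key/value as binary; cast the value to string and tag an ingest date
    records = (stream.selectExpr("CAST(value AS STRING) AS payload")
               .withColumn("ingest_date", F.current_date()))

    # Land the stream as partitioned Parquet on HDFS, with checkpointing for fault tolerance
    query = (records.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/cdr/")
             .option("checkpointLocation", "hdfs:///checkpoints/cdr-ingest/")
             .partitionBy("ingest_date")
             .trigger(processingTime="1 minute")
             .start())

    query.awaitTermination()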
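
A minimal PySpark ML sketch of the churn-prediction use case listed above; the Hive table and feature columns are hypothetical placeholders for the prepared Customer 360 data.

    # churn_model.py - sketch of a churn-prediction model on prepared Customer 360 features
    # (table and column names are hypothetical placeholders)
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import GBTClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("churn-model").enableHiveSupport().getOrCreate()

    # One row per customer, with engineered features and a binary churn label
    data = spark.table("customer360.churn_features")

    features = ["tenure_months", "monthly_spend", "dropped_calls", "complaints_90d"]
    assembler = VectorAssembler(inputCols=features, outputCol="features")
    gbt = GBTClassifier(labelCol="churned", featuresCol="features", maxIter=50)

    train, test = data.randomSplit([0.8, 0.2], seed=42)
    model = Pipeline(stages=[assembler, gbt]).fit(train)

    # Evaluate on the held-out set with area under the ROC curve
    evaluator = BinaryClassificationEvaluator(labelCol="churned", metricName="areaUnderROC")
    print("Test AUC: %.3f" % evaluator.evaluate(model.transform(test)))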
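
A sketch of the Hive session parameters and small-file handling referenced in the two tuning bullets above, issued here through PyHive; the host, database, and table names are hypothetical, and in production most of these values would be set cluster-wide in Ambari/hive-site.xml rather than per session.

    # hive_tuning.py - sketch of session-level Hive tuning and small-file handling via PyHive
    # (host, database, and table names are hypothetical placeholders)
    from pyhive import hive

    conn = hive.connect(host="hiveserver2.example.com", port=10000, username="etl_user")
    cur = conn.cursor()

    # Tez execution engine, cost-based optimization, dynamic partitioning, vectorization
    for statement in [
        "SET hive.execution.engine=tez",
        "SET hive.cbo.enable=true",
        "SET hive.compute.query.using.stats=true",
        "SET hive.exec.dynamic.partition=true",
        "SET hive.exec.dynamic.partition.mode=nonstrict",
        "SET hive.vectorized.execution.enabled=true",
        # Merge the small output files produced by Tez jobs
        "SET hive.merge.tezfiles=true",
        "SET hive.merge.smallfiles.avgsize=134217728",
    ]:
        cur.execute(statement)

    # Concatenate small files in an existing ORC partition
    cur.execute("ALTER TABLE sales.orders PARTITION (dt='2018-01-01') CONCATENATE")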

Sr. Database Architect

Confidential

Responsibilities:

  • Administer the Big Data NoSQL system (Couchbase v3/v4) and SQL Server 2012/2014.
  • Provision, design, and deploy BI analytics solutions, including the Big Data Hadoop ecosystem (HDFS, MapReduce, Hive, etc.)
  • Improve ETL performance, evaluating Apache Spark for the current data flow.
  • Evaluate Hadoop systems, including Cloudera and Hortonworks. Tools: Cloudera Enterprise Manager, Sqoop, Ambari, HCatalog, Pig.
  • Designed and deployed the BI analytics data flow from production databases (Couchbase and SQL Server) to AWS S3 and Redshift, using an AWS EMR cluster and Hive for ETL (an illustrative load sketch follows this list).
  • Successfully designed and deployed production NoSQL Couchbase clusters on AWS EC2 instances, with no unexpected downtime to date.
  • As a member of the DevOps team, prepared Chef cookbooks and Terraform files to automatically deploy Couchbase clusters and SQL Server instances in AWS
  • Successfully migrated the production SQL Server 2012 AlwaysOn Availability Group databases from the local data center to AWS, improving HA and lowering the production workload using replication.
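
A minimal sketch of the final load step in the Couchbase/SQL Server to S3/Redshift flow above: curated files staged in S3 are loaded into Redshift with a COPY statement issued via psycopg2. The cluster endpoint, IAM role, schema, and paths are hypothetical placeholders.

    # load_redshift.py - sketch of loading curated S3 data into Redshift via COPY
    # (endpoint, credentials, IAM role, schema, and paths are hypothetical placeholders)
    import psycopg2

    conn = psycopg2.connect(
        host="analytics-cluster.example.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="********",
    )
    conn.autocommit = True

    # COPY pulls the staged, pipe-delimited, gzip-compressed files directly from S3
    copy_sql = """
        COPY analytics.page_events
        FROM 's3://example-bucket/curated/page_events/'
        IAM_ROLE 'arn:aws:iam::111111111111:role/redshift-copy-role'
        DELIMITER '|' GZIP;
    """

    with conn.cursor() as cur:
        cur.execute(copy_sql)
    conn.close()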

Sr. Database Architect

Confidential

Responsibilities:

  • Evaluate Hadoop systems, install Cloudera Manager and Hortonworks clusters
  • Import data from RDBMS to HDFS using Sqoop
  • Admin of SQL Server 2012

Sr. Database Architect

Confidential

Responsibilities:

  • Take charge of database structure design/modeling, high availability configuration, and database installation.
  • Take charge of system monitoring, performance tuning, data integrity, system security, backup and restore, etc.
  • Take charge of the disaster recovery plan, identifying mission-critical database servers and designing data synchronization.
  • Business intelligence & data warehousing:
  • Take charge of financial reports using SQL Server Reporting Services and Crystal Reports.
  • Take charge of ETL using SQL Server Integration Services.
  • Take charge of MicroStrategy administration.
  • Set up a 3-node Hadoop cluster with Cloudera Manager, enabling HDFS high availability and automatic failover.
  • Successfully remodeled production database server structures, migrated and upgraded from SQL Server 2005 to SQL Server 2008 and 2012, improved database high availability by using Failover Clustering, Log Shipping, and Replication, raised server performance by over 350%, and maintained 99.99% uptime.
  • Significantly raised database server performance and data integrity, e.g., reduced text-search latency from 5 minutes to less than 10 seconds by deploying Full-Text Search.
  • Successfully reverse engineered old production databases, redesigned database structures using Microsoft Visio and ERwin, optimized the overall working process, and improved application performance by over 200%.
  • Successfully designed the Business Intelligence/ETL process for the finance reporting system using SSIS, SSRS, and MicroStrategy.
  • Migrated the company CRM system from MSCRM/SalesLogix to Salesforce and deployed integration between Salesforce and the in-house Oracle system using Scribe.
  • Successfully upgraded the production Oracle 10g database from a single instance to RAC (Real Application Clusters), improved database performance by 200%, and improved database high availability by implementing Oracle physical/logical standby databases.

Java Developer

Confidential

Responsibilities:

  • Worked as the tech lead to capture issues and propose changes or solutions
  • Converted business requirements into technical artefacts
  • Lead developer of Docstore, DecGen, and E-Signature, the common components for all applications (Java/J2EE/Workflow/MQ/Struts/CSS/HTML).
  • Tech lead of the Enhance/Refactor Outbound/Inbound Fax project to provide an enterprise-level fax solution for all lines of business (C++/MQ/web services)
  • Design/implement backend functions of CMC and lead developers by example (Java/J2EE/Struts/HTML/CSS/JavaScript)
