
Big Data Architect / Data Analyst / Data Scientist Resume

Houston, TX


  • Certified Cloudera Hadoop Administrator and Developer with over 20 years of experience in IT and software engineering. Currently working on Big Data/Hadoop and data science.
  • Installed Hadoop clusters and ecosystem components, upgraded Hadoop, migrated data between clusters, configured Kerberos/Ranger for the cluster, and worked with Cassandra, MongoDB, Hive, and HBase.
  • Set up the environment for the data science team, including installation of Anaconda, JupyterHub, Spark/Python, TensorFlow, Keras, Hail, etc.
  • Hands-on experience with Hadoop ecosystem components such as Hive, Pig, Sqoop, and ZooKeeper/Kafka.
  • Strong programming skills in Spark/Scala and Python.
  • Strong knowledge of Hadoop Architecture and all its ecosystem tools.
  • Designed and implemented a multi-node Kafka cluster to stream data extracted from Teradata and Oracle databases into Hadoop cluster databases.
  • Hands-on experience with AWS infrastructure services, including Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2).
  • Strong Red Hat/Linux administration skills; worked as a system administrator on Unix/Linux for the last 18 years.
  • Selected the ETL tools, wrote all SQL queries, and transformed integrated data.
  • Worked on feature crosses, data representation, and regularization for a machine learning project.
  • Researched multiple algorithms (e.g., SVM, RNN, and CNN) for a deep learning project to obtain better predictive models.
  • Installed Oracle RAC/SAP, GoldenGate, and SAP/ASM as DBA, and worked on many OLAP projects for data warehousing.
  • Strong database and Hadoop cluster network performance tuning skills


Languages: SQL, Spark/Scala, Python, Spark SQL, Shell script

OS: AWS/EC2, Google Cloud Platform, IBM AIX, Linux (RHEL 7)

Networks: TCP/IP, Windows NT 4.0 Server

Software: Cloudera/Hortonworks/Ambari, Kafka streaming, Anaconda, MongoDB, Oracle/SAP Enterprise Manager Grid Control (OEM) 12c, GoldenGate 12c, CA Erwin

Hardware: ASM for Oracle RAC/SAP, NFS, AWS/EBS, AWS/S3


Confidential, Houston, TX

Big Data Architect / Data Analyst / Data Scientist


  • Completed a machine learning and deep learning project that predicts patient treatment plans for the cancer center. Gathered patient records from different systems; normalized, validated, and integrated all inputs/features; trained and built the model; and optimized performance to ensure prediction accuracy while avoiding overfitting.
  • Set up the big data cluster, Anaconda, Jupyter, Spark 2, and the environment for the data science team. Installed all libraries (e.g., scikit-learn, TensorFlow, Keras, Hail, matplotlib) needed to build the treatment and genomic analysis models.
  • As an expert data analyst, wrote complex queries (DataFrames, Spark SQL) against Hive and MongoDB (NoSQL) databases in both Python and Scala.
  • Introduced Google Cloud Platform, Google Colab, and Kaggle to the team, taking advantage of their GPUs/TPUs to train models and generate graphical reports for analysis.
  • Installed the HDP cluster, Kerberized the cluster and edge nodes, and set up security with Ranger and user/group sync with AD/LDAP.
  • Upgraded HDP cluster and migrated all data files from production cluster to test cluster.
  • Configured and fixed network/NIC issues and moved nodes to a different rack.
  • Added Informatica edge nodes for ETL/streaming data to the data lake.
  • Set up a backup/recovery strategy for data files throughout the Hadoop cluster.

Confidential, Jersey City, NJ

Big Data Architect / Hadoop Administrator


  • Installed and configured Cloudera Hadoop 5.x and Hortonworks HDP 2.x clusters on AWS/EC2.
  • Set up a Kafka cluster with multiple brokers, partitions, producers, and consumer groups.
  • Installed and configured multi-node MongoDB sharding, integrated with the Hadoop cluster.
  • Set up data streaming from Kafka to MongoDB with a Spark/Scala program, and performed big data load testing with the Elasticsearch team.
  • Wrote complex queries in Spark/Scala and Python against NoSQL databases and RDBMSs for data analysis.
  • Provided detailed instruction on installation, configuration and workflow to the team.
  • Involved in data science/algorithm research for the project.
  • Performed performance tuning and troubleshooting on clusters and recommended solutions.

Confidential, Cincinnati, OH

Big Data Architect / Hadoop Administrator & Lead Developer


  • Installed and configured Cloudera Hadoop components and Hortonworks HDP 2.x.
  • Installed and configured a Cassandra cluster with OpsCenter on AWS/EC2/S3.
  • Worked on analytics as lead Big Data/Scala developer.
  • Provided procedure definition and design of solution.
  • Evaluated performance on different Big Data platforms.
  • Tuned performance of peak-time processes on the Hadoop cluster.
  • Worked on CDH and Ambari, and configured ecosystem tools such as Hive/HiveQL, HBase, Sqoop, Spark SQL/Scala, and Kafka.
  • Configured ETL from the Teradata database and data lake for the data warehouse.
  • Worked on data models and ingested DWH data to Hadoop cluster.
  • Monitored and troubleshot the production cluster.
  • Installed and configured the machine learning tool Alpine Data Labs and its data models.
  • Worked on a data pipeline project using Flume, Kafka, and Spark/Scala (RDDs/DataFrames), storing results in Hive.
  • Generated BI reports with Tableau.

Confidential, Seattle, WA

Data Architect / Hadoop Administrator & Developer


  • Actively involved in designing, reviewing, implementing, and optimizing data transformation processes in the Hadoop ecosystem.
  • Led several Hadoop data extraction, warehousing, and analytics tasks.
  • Coordinated with the offshore team on development tasks and troubleshooting.
  • Installed, managed, and supported Linux operating systems (e.g., RHEL, CentOS, Ubuntu).
  • Installed and configured Cloudera CDH5, set up the Cloudera Hadoop distribution, and monitored it with Cloudera Manager.
  • Hands-on experience with Amazon Web Services: created EC2 (Elastic Compute Cloud) cluster instances, set up data buckets on S3 (Simple Storage Service), and configured EMR (Elastic MapReduce) with Hive scripts to process big data.
  • Worked on Pig and HiveQL. Involved in data warehouse and schema creation and management.
  • Involved in writing Shell Scripts.
  • Set up and optimized the development and production environments.

Confidential, Sunnyvale, CA

Hadoop Administrator / Developer and Oracle DBA support


  • Installed and configured Red Hat/CentOS and Ubuntu with Cloudera Manager on multi-node Hadoop clusters.
  • Collected data from different databases (i.e., Teradata, Oracle, and MySQL) into Hadoop.
  • Installed, configured, and created HBase, Hive, Pig, and MapReduce scripts.
  • Compared Hive/HBase with RDBMS; imported data into Hive; created tables, partitions, indexes, views, queries, and reports for BI data analysis.
  • Involved in writing Shell Scripts.
  • Conducted introductory classes on Hadoop administration and development.
  • Performed troubleshooting and performance tuning on the Hadoop system.
  • Installed, refreshed, and upgraded Oracle 11g databases.
  • Provided production support for OLTP database issues.
  • Worked closely with application teams to resolve performance issues.
  • Installed Oracle 11gR2 RAC on EMC/ASM storage and installed Data Guard.
  • Installed and configured Oracle 12c and GoldenGate 12c.
  • Installed and set up Oracle Enterprise Manager 12c and MongoDB.

Confidential, Midland, MI

Oracle RAC Admin and OEM 12c Admin


  • Oracle 11gR2 RAC system with SAP application.
  • EMC VPLEX, VMware, and Red Hat 6.2 with 24 cluster nodes and 10 TB of SAP application data.
  • Installed and configured 11gR2 RAC and Oracle Enterprise Manager 12c (OEM) to monitor all 24 nodes 24/7, covering the cluster, ASM, listeners, databases, and agents, and to support performance tuning.
  • Set up notifications, admin groups, monitoring templates, incident rules, and scheduled jobs for backups, database cloning, SQL performance, and AWR reports.
  • Provided production support for the RAC/SAP system.
  • Documented troubleshooting procedures.

Confidential, Dodgeville, WI

Oracle Architect/DBA


  • Tuned performance of the production database on the Solaris 10 kernel.
  • Created performance health check reports using OEM Grid Control, analyzing data in conjunction with AWR analysis by ADDM.
  • Worked closely with the marketing team to reorganize the data warehouse to support the Business Intelligence model. Also wrote many SQL stored procedures for analysis and market trending, and cleaned up and refreshed all images for the company's online sales.
  • Installed and configured Oracle GoldenGate for bidirectional replication.
  • Provided production support for all Erwin data models for company databases. Managed requirements and releases for all international marketing work tracks.
  • Designed new strategies for merging production databases.
  • Configured Nagios for system-wide monitoring.
  • Re-designed company development architecture and refresh procedures on VMware.

Confidential, Seaside, CA

Senior Oracle DBA


  • Installed and configured OEM (Grid Control) for security system.
  • Designed and maintained relational database model, with many sub models.
  • Monitored and performed troubleshooting for all US Air Force Base PIPS databases.
  • Performed SQL tuning for system statistics reports.
  • Worked on a data warehouse project; wrote and implemented ETL scripts.
  • Set up database security policies, including Audit Vault, FGA, and encryption.
  • Optimized system configuration related to daily performance and maintenance.
  • Provided solutions for network-related Data Guard performance issues.
  • Installed Oracle 11g RAC on Red Hat Linux with ASM.
  • Installed and configured GoldenGate to replicate Oracle 11g databases.
