Senior Data Engineer (consulting) Resume
Mountain View, Northern California
SUMMARY:
- Over 5 years of Hadoop ecosystem / Big Data architecture and development experience
- Over 15 years of very large database design, modeling, management, and development experience
- Over 12 years of data warehousing architecture and Extract, Transform, and Load (ETL) experience
- Over 12 years of technical consulting for various companies
- Over 20 years of software development experience using Java, Perl, Python, SQL, and Oracle PL/SQL
- Expert in distributed database systems
WORK EXPERIENCE:
Senior Data Engineer (consulting)
Confidential, Mountain View, Northern California
Responsibilities:
- Designed and built an enterprise-wide Big Data validation and ETL framework using Python, Java, and the Hadoop ecosystem.
- Developed data ingestion pipelines from various database sources and flat files into Hadoop HDFS.
- Improved query performance using Spark SQL (see the sketch after this list).
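A minimal sketch of the kind of Spark SQL tuning involved, assuming a large fact table joined against a small dimension table; the table, column, and path names are hypothetical:

```python
# Illustrative sketch only: broadcasting a small dimension table so the
# join runs map-side instead of forcing a full shuffle. Names are
# hypothetical, not from an actual engagement.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("query-tuning").getOrCreate()

facts = spark.read.parquet("hdfs:///warehouse/facts")  # large fact table
dims = spark.read.parquet("hdfs:///warehouse/dims")    # small lookup table

# The broadcast hint turns a costly sort-merge join into a hash join
# executed on each worker against an in-memory copy of the small table.
joined = facts.join(broadcast(dims), on="dim_id")
joined.write.mode("overwrite").parquet("hdfs:///warehouse/joined")
```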
Data Architect and Big Data Engineer (consulting)
Confidential, San Jose, Northern California
Responsibilities:
- Designed data models to support an enterprise-wide Cloud-based application.
- Designed data models, new workflows, and interfaces to support data integrity and governance of cloud consumption data assets.
- Integrated large data sets from different sources and recommended optimal processing paths.
- Evaluated various NoSQL databases (Cassandra vs. MongoDB vs. HBase) and provided recommendations.
- Developed new solutions and methodologies to collect and process large data sets of customer web traffic data.
- Designed and developed Spark programs in Python and Java to support multiple processes running in parallel against MongoDB, HBase, or Oracle (see the sketch below).
- Converted Oracle database schemas and PL/SQL code to Postgres.
- Performed application performance tuning of Big Data analytics and Oracle processes.
- Built end-to-end analytic solutions by considering delivery and presentation of analyses as well as data ingestion, transformation, and data stores.
- Developed big data applications with MapR Hadoop Ecosystem.
Skill set: MapR Hadoop, Postgres, MySQL, Oracle, Python, Java, Hive, Pig, Spark, HBase, MongoDB, Unix Shell Scripting, Sqoop
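A minimal sketch of the parallel-read pattern referenced above, using Spark's partitioned JDBC source against Oracle; the connection details, table name, and bounds are hypothetical:

```python
# Illustrative sketch only: a partitioned JDBC read splits one Oracle
# table scan into many concurrent Spark tasks. URL, credentials, table,
# and bounds are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-parallel-read").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "consumption_facts")
      .option("user", "etl_user")
      .option("password", "***")
      # Spark issues 16 range-bounded queries on the partition column,
      # so 16 tasks read from Oracle in parallel.
      .option("partitionColumn", "id")
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .option("numPartitions", "16")
      .load())

df.write.mode("overwrite").parquet("hdfs:///staging/consumption_facts")
```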
Big Data Architect
Confidential, Redwood City, California
Responsibilities:
- Provided technical leadership in open source technologies and their application to Big Data platforms.
- Designed and implemented an end-to-end, MapReduce-based Big Data processing pipeline handling billions of game telemetry records.
- Set up and managed Cloudera Hadoop clusters to increase scalability and integrity.
- Designed and developed a workflow framework in Python to build pipelines for ingesting data from various sources (e.g., Teradata, MySQL, SQL Server, Oracle) into Hadoop and vice versa.
- Developed various subsystems based on CDH, Amazon Web Services, HBase, Python, Hive, Impala, Pig, Flume, Kafka, Spark, ElasticSearch, MongoDB and Java.
- Developed applications that transformed and processed raw and unstructured data in AWS EC2 and S3 data storage into readily consumable data for streaming.
- Developed user-defined functions for Hive in Java. Built transformation functions in Python for intensive data transformation in Hive (see the sketch below). Performance-tuned Hive scripts.
- Developed custom loaders and extractors for HBase, MongoDB, and ElasticSearch using Java.
- Developed code to work with recommendation engines.
- Developed and built a user segmentation system to support in-game stores.
- Developed Java MapReduce programs to perform map-side and reduce-side joins, data filtering, aggregation, and transformation.
- Designed logical and physical data models using ERwin and Oracle Database Designer.
- Researched, evaluated, and applied new Big Data technologies and initiatives at Confidential.
Skill set: Hadoop, Teradata, MySQL, Python, Hive, Pig, Flume, Kafka, Spark, Impala, HBase, MongoDB, Unix Shell Scripting, Sqoop, Amazon Web Services (EC2, S3), ElasticSearch, Java MapReduce
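A minimal sketch of the Python-in-Hive transformation pattern mentioned above: Hive streams tab-separated rows to a script named in a TRANSFORM ... USING clause, and the script emits transformed rows on stdout. The three-field layout here is hypothetical:

```python
#!/usr/bin/env python
# Illustrative sketch only: a streaming script of the kind Hive calls
# via "SELECT TRANSFORM (user_id, event, ts) USING 'transform.py' ...".
# Hive writes tab-separated rows to stdin; transformed rows go to
# stdout. The field layout is hypothetical.
import sys

for line in sys.stdin:
    user_id, event, ts = line.rstrip("\n").split("\t")
    event = event.strip().lower()  # normalize the event name
    hour = ts[:13]                 # bucket "2015-06-01T18:22:09" by hour
    print("\t".join([user_id, event, hour]))
```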
Senior Hadoop Developer (consulting)
Confidential, San Francisco, California
Responsibilities:
- Built, managed, and supported Hadoop production and development clusters.
- Developed a new Python ETL framework built around a config-driven workflow language that considerably reduced the development time of ETL processes on Hadoop HDFS (see the sketch after this list).
- Developed and tuned Hive and Sqoop scripts.
- Administered Hadoop clusters.
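A minimal sketch of the config-driven workflow idea behind that ETL framework; the step definitions, commands, and paths are hypothetical, and the real framework also handled scheduling, retries, and dependencies:

```python
# Illustrative sketch only: a config-driven runner in the spirit of the
# framework described above. Each step declares a name and a command;
# the runner executes them in order and stops on the first failure.
import subprocess

PIPELINE = [
    {"name": "ingest", "cmd": ["sqoop", "import",
                               "--connect", "jdbc:mysql://dbhost/sales",
                               "--table", "orders",
                               "--target-dir", "/staging/orders"]},
    {"name": "transform", "cmd": ["hive", "-f", "transform_orders.hql"]},
    {"name": "publish", "cmd": ["hdfs", "dfs", "-mv",
                                "/staging/orders_clean", "/warehouse/orders"]},
]

def run_pipeline(steps):
    for step in steps:
        print("running step: %s" % step["name"])
        subprocess.check_call(step["cmd"])  # raises on non-zero exit

if __name__ == "__main__":
    run_pipeline(PIPELINE)
```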
Principal Database Architect
Confidential, Redwood City, California
Responsibilities:
- Designed and developed a Big Data warehouse in Hadoop.
- Developed and tuned Hive queries for performance.
- Converted Oracle SQL to Hive queries for the Hadoop data warehouse (see the sketch after this list).
- Designed migration of data from Oracle Data Warehouse to Hadoop HDFS.
- Developed database applications using various Oracle products and tools.
- Recommended strategic direction on database technologies for company's products.
- Translated business requirements and models into feasible and acceptable data warehouse designs; designed and built appropriate data repositories and data movements (dimensional databases) to ensure that business needs were met.
- Wrote technology whitepapers on various Oracle 10g/11g technologies, including Oracle Text, Oracle Partitioning, and GoldenGate replication, for the company's clients.
- Led and mentored database development teams in the company. Provided database design and PL/SQL training, and SQL code reviews.
- Provided performance recommendations to database teams to make database tasks fast and efficient, achieving 300-600% performance improvements.
- Designed and developed dimensional data warehouse databases to store and report comparative trend analyses involving millions of transactions.
- Designed and developed data collection processes for real-time data warehousing using Change Data Capture technologies.
- Redesigned and refactored database processing PLSQL tasks to increase processing flow performance.
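A minimal sketch of the mechanical side of such Oracle-to-Hive conversions: a small rule-based rewriter covering a few common substitutions. The rule list is hypothetical and incomplete; joins, analytic functions, and data types still required manual rework:

```python
# Illustrative sketch only: a rule-based rewriter for a few of the
# mechanical Oracle-to-HiveQL substitutions. Rules are hypothetical.
import re

ORACLE_TO_HIVE = [
    (r"\bSYSDATE\b", "current_timestamp"),  # Hive has no SYSDATE
    (r"\bNVL\s*\(", "COALESCE("),           # portable null-handling
    (r"\bTO_CHAR\s*\(", "date_format("),    # format masks still differ
]

def rewrite(sql):
    for pattern, replacement in ORACLE_TO_HIVE:
        sql = re.sub(pattern, replacement, sql, flags=re.IGNORECASE)
    return sql

print(rewrite("SELECT NVL(region, 'NA'), SYSDATE FROM sales"))
# -> SELECT COALESCE(region, 'NA'), current_timestamp FROM sales
```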
Principal Database Administrator
Confidential, San Mateo, California
Responsibilities:
- Provided Oracle database administration for a large data warehouse running in a multi-node RAC environment on 64-bit Linux systems.
- Developed automated scripts in shell, Perl, and PL/SQL to support and manage data warehouse ETL operations.
- Managed day-to-day operations on the Data Warehouse.
- Automated daily and monthly partition management of over 400 tables in the database (see the sketch after this list).
- Designed and developed ETL scripts using external tables, SQL, and Data Pump features to load 20-30 million records a day. Created and managed cron jobs to balance workloads across multiple servers.
- Proactively provided shell scripting support to the Unix Operations group to manage disk space resources and processes on the application and database servers. Developed monitoring scripts to actively alert technical personnel of impending issues.
- Performed data quality verification activities using shell scripting, Perl and SQL on the data stored in the database.
- Trained database administrators on how to implement Oracle 10g features in a data warehouse environment.
- Developed RMAN backup scripts to implement a rolling backup strategy for databases with a large number of tablespaces (1,600+) supporting 200+ hosted customer schemas.
- Designed, implemented and documented all operational procedures and monitoring practices to ensure end-to-end validation is performed in all aspects of Operations both in the database and day-to-day processing.
- Installed and implemented two-node RAC on virtual servers (using VMware) and on development and production servers.
- Worked with Production Support, Implementation Managers and Technical Support to resolve customer issues.
- Trained and supported the Forensics team in effectively querying the Warehouse database using SQL analytic functions to reduce I/O and speed up query performance.
- Conducted training for Production Support, DBAs and technical support people in the operations side of the Data Warehouse in terms of database administration, data loading operations and reports production.
- Created and maintained logical and physical data models using ERwin/Embarcadero.
- Supported other databases, such as SQL Server and DB2, during development and QA.
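A minimal sketch of the automated rolling-partition maintenance described above, rendered in Python with the cx_Oracle driver; the table names, P<yyyymmdd> partition naming scheme, and 90-day retention window are hypothetical:

```python
# Illustrative sketch only: daily rolling-partition maintenance with
# cx_Oracle. Table names, naming scheme, and retention are hypothetical.
import datetime
import cx_Oracle

RETENTION_DAYS = 90
TABLES = ["fact_orders", "fact_clicks"]  # hypothetical partitioned tables

def roll_partitions(conn):
    today = datetime.date.today()
    new_bound = today + datetime.timedelta(days=1)
    old_part = (today - datetime.timedelta(days=RETENTION_DAYS)).strftime("P%Y%m%d")
    cur = conn.cursor()
    for table in TABLES:
        # DDL cannot use bind variables, hence the string formatting.
        cur.execute("ALTER TABLE %s ADD PARTITION %s VALUES LESS THAN "
                    "(TO_DATE('%s', 'YYYYMMDD'))"
                    % (table, today.strftime("P%Y%m%d"),
                       new_bound.strftime("%Y%m%d")))
        cur.execute("ALTER TABLE %s DROP PARTITION %s" % (table, old_part))

conn = cx_Oracle.connect("etl_user/***@dbhost:1521/ORCL")
roll_partitions(conn)
```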
Oracle DBA (consulting)
Confidential, San Francisco, California
Responsibilities:
- Provided database administration and 24/7 production support for the bank's 3,500 branches in a multi-master Advanced Replication environment.
- Developed shell scripts and PL/SQL packages to support database operations and ad hoc reporting requirements.
- Developed monitoring scripts to detect and prevent server and database problems before they occurred.
- Developed Bourne shell and Perl scripts for filesystem management tasks: reporting and managing space used by archived logs; verifying the partitions automatically created on database tables each day; detecting and reporting Data Guard standby issues such as delayed archived logs or network problems; and managing archived-log disk space via RMAN (see the sketch after this list).
- Managed a large Oracle data warehouse supporting the bank's day-to-day data warehouse reporting requirements and decision support systems.
- Implemented and administered an Oracle 10g Data Guard physical standby database for high-availability database services.
- Implemented and administered databases using Oracle Streams.
- Designed partitioning strategies for very large tables to provide efficient data management for hundreds of millions of transactions.
- Developed a PL/SQL package to automatically manage time-based rolling data by dropping and creating partitions across hundreds of partitioned tables and indexes in a data warehouse database on a daily basis.
- Developed backup and disk space management strategies using RMAN and shell scripts to handle the large volume of archived logs generated during heavy loads on the data warehouse database and to support physical standby database requirements.
- Used Enterprise Manager Database Control to monitor and tune databases. Reviewed AWR reports, investigated and executed changes to resolve performance issues in the database.
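A minimal sketch of the archived-log disk-space monitoring described above, rendered in Python (the originals were Bourne shell and Perl scripts); the destination path, threshold, and mail addresses are hypothetical:

```python
# Illustrative sketch only: an archived-log disk-space check with an
# email alert. Path, threshold, and addresses are hypothetical.
import shutil
import smtplib
from email.mime.text import MIMEText

ARCHIVE_DEST = "/u01/oradata/arch"
THRESHOLD_PCT = 85  # alert when the filesystem is this full

def alert(message):
    msg = MIMEText(message)
    msg["Subject"] = "DB ALERT: " + message
    msg["From"] = "oracle@dbhost"
    msg["To"] = "dba-oncall@example.com"
    smtplib.SMTP("localhost").sendmail(msg["From"], [msg["To"]], msg.as_string())

def check_archive_space():
    usage = shutil.disk_usage(ARCHIVE_DEST)
    pct_used = 100.0 * usage.used / usage.total
    if pct_used >= THRESHOLD_PCT:
        alert("archivelog destination %s is %.1f%% full" % (ARCHIVE_DEST, pct_used))

if __name__ == "__main__":
    check_archive_space()
```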
Senior Oracle DBA/Developer
Confidential, Palo Alto, California
Responsibilities:
- Provided Oracle database administration support for development, staging, and production databases on DEC, HP, and Sun Solaris platforms.
- Provided database support for Incyte's e-commerce OLTP site and online-subscription application system.
- Provided technical support for development and support teams.
- Installed and upgraded Oracle software from Oracle 7.x to the latest Oracle 9i Release 2.
- Managed and fine-tuned Oracle 8i/9i databases in a 24x7 environment.
- Performed performance analysis and tuned database internals to achieve optimum performance from the database system.
- Performed short- and long-term database planning.
- Performed Oracle database migrations from 8i to 9i.
- Developed backup and recovery strategy and documented recovery procedures for the company's online database products. Implemented standby databases for high-availability database service.
- Developed several UNIX shell scripts in Bourne shell and Perl to perform automated database monitoring and Sun server disk space maintenance.
- Designed and maintained logical and physical database models.
- Recommended server setups for high performance and availability databases.
- Developed and designed databases for the company's core products utilizing Oracle database technology.
- Performed server-side software development in C, Java (JDBC), Perl, and PHP. Developed a web-based database reporting tool in C, Pro*C, and PHP for users, developers, and fellow database administrators.
- Analyzed and produced reports from BEA WebLogic access log files to provide specialized and detailed usage reporting on the company's internet site.
- Designed and developed high-performance extraction, transformation, and loading programs to load public-domain and Incyte-proprietary genomic data into an Oracle database.
- Developed programs in C, PL/SQL, Perl, and Pro*C to extract, transform, and load hundreds of gigabytes of data from various file formats into a data warehouse-style gene database running on Sun servers. Extensively used C and Pro*C to extract and load large quantities of data into an Oracle 8i database (over 600 GB) quickly and efficiently. Devised new algorithms for parallel data loading, using array inserts and user-specified delayed commits to improve Oracle database throughput (see the sketch after this list).
- Mentored and trained junior DBAs on the use of new Oracle technologies.
- Documented all essential database administration procedures from database creation to performance tuning to database recovery.
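A minimal sketch of the array-insert-with-delayed-commit loading pattern described above, rendered in Python with cx_Oracle (the original loaders were written in C and Pro*C); the table, file layout, and batch sizes are hypothetical:

```python
# Illustrative sketch only: array inserts via executemany() with
# commits deferred across several batches to cut round trips and redo
# overhead. Table, file layout, and batch sizes are hypothetical.
import cx_Oracle

ARRAY_SIZE = 5000  # rows sent per executemany() round trip
COMMIT_EVERY = 20  # batches between commits (the delayed commit)

def load_file(conn, path):
    cur = conn.cursor()
    sql = "INSERT INTO gene_data (gene_id, symbol, sequence) VALUES (:1, :2, :3)"
    batch, pending = [], 0
    with open(path) as f:
        for line in f:
            batch.append(tuple(line.rstrip("\n").split("\t")))
            if len(batch) >= ARRAY_SIZE:
                cur.executemany(sql, batch)  # one round trip for 5000 rows
                batch, pending = [], pending + 1
                if pending >= COMMIT_EVERY:
                    conn.commit()
                    pending = 0
    if batch:
        cur.executemany(sql, batch)
    conn.commit()

conn = cx_Oracle.connect("loader/***@dbhost:1521/ORCL")
load_file(conn, "genes.tsv")
```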