
Technical Architect Resume


SUMMARY:

  • Architect for large-scale .Net, Java, and Big Data database applications; SOA, SaaS, REST
  • Expert in numerical methods, algorithms and optimization, scientific applications, SOA, SaaS, REST web services, and Database/Data Warehouse applications
  • Expert knowledge of AWS, Redshift, MPP
  • Equally proficient in C#, Java and SQL
  • Working experience with Hadoop, NoSQL, Scala, Pig, Python
  • Linux/Java/Scala
  • Maven, GIT, SVN, Tomcat, SBT
  • Eclipse, JDeveloper, JBuilder, NetBeans
  • Core Java, JDBC, Spring, Generics, Annotation, Multithreading, Functional Programming
  • J2EE, EJB, Servlet, JSP, JINI, RMI, Swing
  • Scala, Scalatra, Jetty, Akka, Hibernate
  • Java 7/6/5/1.4 (11yr)
  • Scala (1.5yr)
  • Windows/.Net Framework
  • Visual Studio 2013/2012/2010/2008/2005 TFS
  • .Net 4.5/4.0/2.0/1.1
  • LINQ, TPL, Lambda, Generics, Multithreading, ASP.Net, Web Services, ADO.Net, Windows Services, Windows Form
  • C# 4.0/3.5/3.0/2.0 (12yr)
  • C++ (4yr)
  • C (5yr)
  • Database
  • SQL Server 2012/2008/2005
  • Oracle 10g/8i/7
  • MySQL, PostgreSQL
  • AWS Redshift, Kognitio (MPP)
  • Cassandra, HBase, AWS Simple DB, Thrift, Protocol Buffers
  • DB Design, DB Administration, DDL, DML, Relations, Views, Stored Procedures, Triggers, Dynamic SQL, Import, Export, Bulk Load, Backup, Clustering, Replication
  • OLTP, OLAP, Data Mart, Partition
  • SSIS, SSRS, SSAS
  • Storage design, Column family design, Index design, Serialization design
  • T-SQL 2012/2008/2005 (10yr)
  • AWS Redshift, PostgreSQL (1.5yr)
  • PL/SQL (3yr)
  • MySQL (1yr)
  • Big Data Computing, Cloud Computing
  • AWS, Google Apps, Microsoft COSMOS
  • EC2, VPC, EMR, S3, SimpleDB, Elastic Beanstalk, RDS, SNS, SQS
  • Google storage, Google API
  • Hadoop (5yr), HDFS, Pig Latin, Hive, Cascading
  • Scope
  • Distributed Computing
  • .Net/Windows, Java/Linux/Ubuntu/CentOS/RedHat/Solaris
  • HPC, Data Mining, Text Mining, Information Retrieval/Extraction, Index/Metrics/Report building, Machine Learning, NLP
  • Client-Server, Publisher-Subscriber, Web Services, SOAP, RPC, REST, Remote, RMI, JINI
  • MSMQ (2yr)
  • SQS, SNS (2yr)
  • SQL Broker (1yr)
  • SQL Server Replication (2yr)

SKILLS:

People Management: Onboarding talent; mentoring and supervising senior SDEs; overseeing, guiding and assisting offshore teams.

Software Engineering: Data Structures, Architecture Design, Algorithm Design, Performance Optimization, Full SDLC, Agile and Waterfall Development, TDD, Test Automation, Architecture Documentation, High/Low-Level Design Specs, Deployment Documentation, Troubleshooting and Operations Documentation.

PROFESSIONAL EXPERIENCE:

Technical Architect

Confidential

Responsibilities:

  • Actively onboarding new talent for team growth. Managing and mentoring senior software engineers.
  • Overseeing the offshore development team and concurrent development of multiple large-scale distributed big data projects.
  • Providing technical guidance, suggestions and recommendations to development teams.
  • Leading development from approach evaluation, proof of concept and prototype all the way to production and delivery. Benchmarked Facebook Presto, Cloudera Impala and Spark on Hadoop. Delivering a real-time big data ETL and hosting application to enable direct connection from analytical cubes to Hadoop data warehouses. Building the second version of the core engine, API and abstract framework with Scala, Java, C# and T-SQL. Presently improving performance for Google DFA, DBM and DCM reports and big tables in Google BigQuery.
  • Performing architecture, design and code reviews for various data acquisition projects. Coordinating between the engineering team and the operations and product management teams on product roadmap, requirement analysis, functional specifications, delivery schedules, deployment instructions and product triage.
  • Proposing and influencing critical executive technical decisions.
  • Leading technical advancement for the team. Exploring novel technologies and providing proof-of-concept work, prototype applications and core functionality libraries for all data acquisition products. Giving lectures and training on cutting-edge technologies to global engineering teams and interested parties.
  • Designed and led development of a high-performance generic data acquisition framework for real-time big data ETL services. The SOA-based SaaS framework has been implemented in various technical stacks for various platforms, including .Net for Windows and Java/Scala for Linux, and has been extended to process over 30 data sources provided in XML, JSON, CSV, binary and unstructured formats with terabyte data flow volume. The load-balanced, automatically scalable, fault-tolerant framework runs in the Amazon cloud. The infrastructure is designed in a highly abstract way, with all common functions encapsulated in base classes, reusable routine functions assembled in utility packages, and APIs exposed to concrete implementations for diverse data sources, so it is easily extensible to new data sources with little effort. Sophisticated quality control and monitoring mechanisms are made possible by an integrated logging mechanism built on central driving databases.
  • Performed exploration, prototyping and core functional client utility development for multi-terabyte massively parallel processing databases, including the distributed Amazon Redshift data warehouse and the Kognitio in-memory analytical data platform. Explored NoSQL data warehouse storage in Hadoop. Devised and implemented mechanisms to utilize MPP and NoSQL databases through ODBC, JDBC, T-SQL, linked servers and SSIS packages. Enabled .Net, Java and SQL applications to use MPP data warehouses as central storage and led team development of such applications.
  • As the first concrete utilization of the framework, designed and led development of a high-performance, flexible system for subscribing to, retrieving and processing Google Dart Display (DFA) data using Java, Spring Framework, C#, LINQ, T-SQL, XML and XML streaming transformation. The pipeline initially handled a daily load of 85,000 DFA reports, totaling 20GB of data, in two hours. The pipeline is now handling a 200GB daily data feed.
  • As a subsequent concrete utilization of the framework, designed and led development of a data processing pipeline for Google Dart Search data using C# and WCF RESTful services. A metadata-driven mapping algorithm was developed to handle about 100 data format variations in the Dart Search data feed. The central storage for Dart Search holds 8 terabytes of data in shared memory across 10 cluster nodes, each with 88 effective computing units. The pipeline is handling a 500GB daily data feed and is capable of 10TB/day throughput. This product is a complete big data re-architecture of a previous Java project developed by a struggling team using conventional technologies. I was loaned to help the team resolve tough technical challenges such as data format variation, throughput and scalability, and was able to provide a solution with working proof-of-concept Java code in two days. For the conventional solution, I introduced SQL ranking and window functions in MySQL stored procedures to greatly simplify the data warehouse business logic.
  • As an extension of the framework, designed the architecture of a real-time data processing pipeline for the Google DoubleClick Bid Manager log stream, performed BI data analysis in AWS Redshift, implemented Google API clients and transferred the work to the offshore team for development. Oversaw and assisted the offshore team for successful delivery of the DBM data pipeline, as well as other data processing pipelines. The pipeline processes 4-10 billion records per day.
  • As a pioneer application of the framework, single-handedly developed a metadata-driven data synchronization application for Microsoft/Facebook Atlas aggregated data. The application is so flexible that it has survived several database-level, table-level and column-level schema changes in the target SQL Server database without a single code change in C# or T-SQL. The metadata-driven algorithm uses LINQ to Objects and LINQ to SQL to map the web service API XML response to the database information schema, and uses dynamic SQL to generate SQL commands for data merges. It reduced over 20 stored procedures to one that can handle all types of data merge. The multithreading infrastructure made it 20x faster than its prototype. Developed unprecedented quality control monitors and automatic error correction for the system.
  • Developed a series of massive processing applications for the Microsoft clickstream using Amazon Elastic MapReduce, Java Cascading and Pig Latin, plus Amazon SimpleDB and SQS. In one application, rewrote 10,000 lines of legacy code with 12 lines of Pig Latin while achieving 20x throughput. In other applications, rewrote legacy EMR code with novel algorithms and achieved 100x processing throughput. The improved performance translated to a daily Amazon EMR cost reduction from $1,000 to $26.
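The SQL ranking and window-function technique mentioned above (replacing procedural "keep the newest row" stored-procedure logic with one declarative query) can be sketched in miniature. This is an illustration only, in Python with SQLite; the table and column names are hypothetical, not taken from the actual MySQL procedures:

```python
import sqlite3

# In-memory database standing in for the data warehouse (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_rows (source TEXT, metric TEXT, loaded_at INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO report_rows VALUES (?, ?, ?, ?)",
    [
        ("dfa", "clicks", 1, 100.0),
        ("dfa", "clicks", 2, 120.0),   # a later load supersedes the earlier one
        ("dfa", "spend",  1, 55.5),
    ],
)

# ROW_NUMBER() picks the latest load per (source, metric) in one pass,
# replacing cursor- or loop-based deduplication in stored procedures.
latest = conn.execute("""
    SELECT source, metric, value FROM (
        SELECT source, metric, value,
               ROW_NUMBER() OVER (PARTITION BY source, metric
                                  ORDER BY loaded_at DESC) AS rn
        FROM report_rows
    ) WHERE rn = 1
    ORDER BY metric
""").fetchall()

print(latest)  # [('dfa', 'clicks', 120.0), ('dfa', 'spend', 55.5)]
```

The same `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` pattern works in MySQL 8+ and T-SQL, which is what makes it a compact substitute for multi-statement merge logic.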

Sr SDE, Sr BI Developer, SDE

Confidential

Responsibilities:

  • Developed a BI data warehouse for Microsoft Online reporting services, integrating CRM, Product Studio and customer support databases. Optimized performance, fixed database defects and created various SSIS and SSRS packages and SSAS cubes.
  • Developed and automated testing of the adCenter BI pipeline, SSRS reports and SSAS cubes.
  • Designed and built the MSN Site Search query relevance quality pipeline using Cosmos/Scope for cloud MapReduce, C# LINQ, regular expressions, T-SQL and PowerShell scripts. Pipeline throughput is 4TB/hour per channel, 200TB/day. The pipeline fully automates MapReduce jobs in the cloud, data loading on servers and aggregation in OLAP cubes.
  • Provided business-intelligence-intensive mining of Microsoft search and advertisement click-through logs, at a peak throughput of 700TB/day in Cosmos. Produced various BI report datasets, charts, scorecards, trends and analysis results.
  • Built and managed the key measure data mart for Bing Search, MSN and MSN Toolbar queries, at hundred-gigabyte scale, for BI analysis, reporting and search relevance monitoring. Built pipelines and automation for clickstream log mining data flows of 20TB/day into data warehouses and OLAP cubes, using C# applications, massively parallel computation scripts, T-SQL tables, views, cubes, stored procedures, ETL packages, Agent jobs, PowerShell scripts, Windows Scheduler tasks and Windows services.
  • Developed a database automation layer that lets business analysts correct and adjust Bing and MSN KPI results. The layer incorporates distributed queries, views, procedures, SSIS packages and SQL CLR extensions.
  • Discovered and fixed bugs that influenced executive decisions and bugs that confused end customers. Recovered years of lost data and extended data mining pipelines for ad-hoc usage. Backfilled historical data with renewed formulas, algorithms and techniques. Resolved numerous compatibility issues to enable backfills and performed a backfill job of 7,000TB of data.
  • Optimized performance of big data computing tasks. For one system, throughput was boosted from 100GB/hour to 20TB/hour.

Software Engineer

Confidential

Responsibilities:

  • Owned the entire back end for Confidential™ web services, one of Microsoft's biggest web services with 7 billion hits per day.
  • Involved in development and internationalization of Confidential Filter, FAQ and feedback web pages.
  • Exported Confidential phishing and malware protection data to Internet Explorer, Hotmail, Outlook and Family Safety. The back end worked reliably for 2 years without a single issue, despite rapid data volume growth.
  • Resolved bugs that had hindered production for months during my first week in the position. Identified and resolved bugs that had resisted two years of investigation within my first month. Made the first worldwide delivery of IE 8 features in 3 months of waterfall development.
  • Developed better designs, algorithms and remolded major database features with new technologies. Delivered over 20 new Confidential backend features to world-wide production. Developed SSIS packages to ease data transformation and migration in database deployment and upgrade. Developed Windows services in C++ and C# to import XML data into database and export data from database in XML and binary resource format.
  • Achieved 20x to 100x faster data mining speed, completely recovered data lost to improper business logic, and corrected incomplete data mining business logic for every data export feature.
  • Achieved 3x faster end-to-end data processing speed by introducing external CLR functions and stored procedures, partitioning, materialized views, index remodeling and performance tuning. Reduced data read row count per processing run from 2.9 billion to 250 million. Extended the database to handle 40 times more data than its designed capacity.
  • Completely formulated the BI logic. Identified and resolved incorrect implementations of BI logic in the existing codebase. Achieved the best data quality in back-end output.
  • Owned the Confidential statistics and logging service database; managed and troubleshot the logging database. Managed databases at terabyte scale.
  • Contributed to team development of an acquisition and detection platform for malicious software, viruses and spam for Confidential using C# 2008, LINQ and T-SQL 2008. Unblocked the development team immediately upon joining by providing solutions to tough technical challenges. Introduced numerous novel technologies and solutions to the team. Developed a terabyte-scale "bullet-proof" (quoting the tester's judgment) binary storage component in C# for the platform.

Senior Java Developer

Confidential

Responsibilities:

  • Worked on distributed search systems, deep parsing and natural language processing of semantic real-time web data. Worked on machine learning, trend analysis and topic detection.
  • Developed Java applications using Apache open source technologies and Google Protocol Buffers. Extended the fault-tolerant distributed systems to operate on large data sets.
  • Replaced the web document caching subsystem's Berkeley DB persistence with high-efficiency, fault-tolerant Cassandra database persistence within the first two weeks in the position.
  • Designed and implemented a prototype pipeline for a web entity popularity indexing service using Hadoop, MapReduce, Pig, HBase, Cassandra and Google Protocol Buffers. The pipeline incrementally aggregates raw data feeds stored in HDFS into HBase with MapReduce jobs submitted through Pig Latin scripts, scheduled and parameterized with Python and shell scripts; executes various data mining algorithms on HBase regions to generate popularity indexes with HBase Query Language and Pig Latin scripts; and stores the indexes in a Cassandra database using Google Protocol Buffers for quick retrieval by query services. The pipeline is able to process data volumes at terabyte scale. Several distributed database storage mechanisms were created to minimize working set volume for high-efficiency data manipulation, including region partitions, surgically designed column family indexes, row and column filters and an HBase storage utility for Pig scripts.
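The incremental-aggregation step of a pipeline like the one above can be sketched as a plain map/reduce fold in Python. This is a toy illustration under assumed inputs (the record format and entity names are hypothetical); the actual system ran Pig Latin MapReduce jobs over HDFS and HBase:

```python
from collections import Counter

# Hypothetical raw feed: each record lists web entities mentioned in a document.
raw_batches = [
    ["wikipedia.org", "example.com", "wikipedia.org"],
    ["example.com", "wikipedia.org"],
]

def map_phase(record):
    # Emit (entity, 1) pairs, like a Pig Latin FOREACH ... GENERATE FLATTEN step.
    return [(entity, 1) for entity in record]

def reduce_phase(pairs):
    # Sum counts per entity, like a GROUP BY ... SUM step.
    counts = Counter()
    for entity, n in pairs:
        counts[entity] += n
    return counts

# Incremental aggregation: fold each new batch into the running popularity index
# instead of recomputing everything from the full history.
popularity = Counter()
for batch in raw_batches:
    popularity += reduce_phase(map_phase(batch))

print(popularity.most_common(2))  # [('wikipedia.org', 3), ('example.com', 2)]
```

In the real pipeline the "running index" lives in HBase and the final indexes are serialized to Cassandra with Protocol Buffers; the fold structure is the same.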

Web Application Engineer

Confidential

Responsibilities:

  • Coordinated team development of B2B .Net web services.
  • Developed the kernel of a novel internet search engine for B2B application.
  • Managed developments of web service back-ends.
  • Trained team for advanced .Net, web and database programming techniques, design patterns and skills.
  • Profiled, fine-tuned and redesigned legacy .Net 1.1 applications with high-performance algorithms and architecture.
  • Improved web service response speed by 20x. Conducted QA, NUnit and ACT tests implemented in C#, JavaScript, VBScript, PHP and ActionScript.
  • Performed software troubleshooting for all applications.
  • Developed dynamic real-time web service health monitoring and reporting services using Ajax, ASP.NET, CodeDOM, Reflection, cryptography, dynamic invocation, web client and SMTP client techniques. Developed interactive, graphical, real-time database performance and health monitoring, rendering and reporting web applications using ActionScript 3.0 and Adobe Flex Builder 2.
  • Developed a number of Windows desktop applications, utilities and services, including application distribution automation using Secure FTP, MSMQ, Assembly, MSBuild, ASP.NET and web services; a Windows service loader, controller and monitor; a web service client toolkit; an XPath/XQuery toolkit; a Regex builder; an NUnit test toolkit; various user applications, controls and libraries; and multithreaded RSS and web content retrievers.
  • Performed database development and maintenance tasks for SQL Server 2000 and 2005 Express clusters, including updates, backups, data transformation, statistics and integrity enforcement. Created numerous tables, views, functions, XML DB functions, stored procedures, jobs and DTS packages.

Academic Research Developer

Confidential

Responsibilities:

  • Developed automation software for a DNA sequencing platform. The C# program (300,000 lines of code) was used to control and monitor digital motors, power supplies and the DNA sequencing data acquisition instrument. It implemented various numerical modeling and simulation functions as well as complex data analysis functions. Database techniques were used for data management and user management.
  • Developed a general application program for a PC-controlled power supply instrument. The C# program was used to generate analog output signals for in-house-made multi-channel power supply boxes used in various laboratory automation applications.
  • Modified and upgraded several automation software packages for research facilities to enhance their functionality. This involved programming in C#, LabVIEW, VB/VB.net, C++ and SQL.
  • Provided technical support for a team of 20 bioinformatics researchers. Ensured convenient experiment data collection and organization using centralized data warehouse.
  • Managed various software development and research projects in bioinformatics. Mentored post-docs, doctoral candidates, and master's and graduate students.

Research Assistant

Confidential

Responsibilities:

  • Developed large-scale web-based distributed numerical computation software used for nano-size magnetic material modeling research.
  • The application involved programming in Java, C#, XML, PL/SQL, SQL*Plus and stored procedures, using J2EE, web services, and various data processing, mining and visualization techniques.
  • The program solved thousands to millions of simultaneous cross-coupled partial differential equations governing the motion of magnetic spins in a material sample.
  • The program implemented various PDE integration methods such as Euler, Runge-Kutta and Gauss-Seidel methods.
  • Computation tasks were distributed among scalable heterogeneous web servers running various operating systems, including Windows XP/NT, Linux, Unix and Solaris. High-performance parallel computing was implemented in C#, optimized for the most time-consuming sections and distributed among several .Net servers.
  • Computing tasks were created, submitted, performed, stored and retrieved online. A standalone computing engine implemented in both Java and C# was also available for offline computing. The project is an XML-based web service supported by Oracle 9iAS (JSP interface), IIS (ASP interface) and an Oracle 9i database.
  • The user interface and application tier were implemented with J2EE and JAXB technology. The computation engine was implemented with Java, C# and XML technology. Communication between computation engines is based on SOAP and XML-RPC. Input/output and configuration are stored in XML text format for cross-platform processing.
  • Developed several numerical modeling applications, starting from theoretical concepts, using Java, C#, C++, FORTRAN, MATLAB, Mathematica and a variety of numerical algorithms for Linux, Unix and Windows platforms. The numerical computation works involve PDE, FDE, Finite Difference/Element Methods, Fourier Transform analysis/application, transportation equations and energy minimization.
  • Designed, built and administered Oracle 9i databases for modeling data from distributed data sources. Administered the numerical computing network and heterogeneous computer clusters, including Windows XP/2000 and Linux workstation clusters, HTTP servers, FTP servers, IIS and Oracle 9iAS.
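The integration methods named above (Euler and Runge-Kutta) can be illustrated with a minimal Python sketch. This uses a single scalar test equation dy/dt = -y with known exact solution exp(-t), not the coupled spin equations from the project, purely to show the accuracy difference between the two steppers:

```python
import math

def euler_step(f, t, y, h):
    # Forward Euler: first-order accurate in the step size h.
    return y + h * f(t, y)

def rk4_step(f, t, y, h):
    # Classical fourth-order Runge-Kutta: fourth-order accurate in h.
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

f = lambda t, y: -y          # test equation; exact solution is exp(-t)
h, steps = 0.1, 10           # integrate from t = 0 to t = 1.0
y_euler = y_rk4 = 1.0
for i in range(steps):
    y_euler = euler_step(f, i * h, y_euler, h)
    y_rk4 = rk4_step(f, i * h, y_rk4, h)

exact = math.exp(-1.0)
print(abs(y_euler - exact))  # Euler error, on the order of 1e-2 at this step size
print(abs(y_rk4 - exact))    # RK4 error, orders of magnitude smaller for the same h
```

The gap in error at equal step size is why higher-order steppers like RK4 are preferred when each right-hand-side evaluation is cheap relative to the number of steps saved.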
