
Chief Data Scientist/Architect (Director) Resume


Greater Boston, MA

SUMMARY:

  • Building deep learning models for image generation, processing, and classification, and natural language processing (NLP) systems for text summarization, document cross-referencing, document enrichment, and text simplification.
  • Working with NLP-oriented libraries such as Gensim and NLTK for language pre-processing; applying word and document embeddings with NLP models such as Word2Vec and Doc2Vec for sentiment analysis and assessment of document similarity, together with the Keras deep learning library.
  • Developing an information retrieval system based on NLP/machine learning technology to enhance search beyond primitive keyword search.
  • Utilizing Jupyter notebooks for model development.
  • Apache Spark with Scala and RStudio; Keras API with TensorFlow; PyTorch with Python; statistical ML; development of analytics in the Big Data and Machine Learning domains. Massively parallel data analysis on an NVIDIA GPU AI appliance with 12,500+ CUDA cores; predictive model development and validation.
  • Big data analytics, EC2 cluster computing, Spark Streaming, Kinesis, AWS S3/HDFS/Spark data lake architecture and implementation.
  • Set up AWS VPC clusters, internet gateways, and security groups; utilized AWS Glue (crawlers, metadata catalog) for ETL processes.
  • Smart contract development with Ethereum Solidity and Node.js JavaScript.
  • Development of analytical software in the defense, financial, and climate domains.
  • Experience in equities, fixed income, time series analysis, market tick data analysis, real time analytics.
  • Terabyte scale data analysis. Past work in Digital Signal Processing, Image Processing.
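The document-similarity ranking mentioned above can be sketched with plain term-count cosine similarity; in the actual pipeline, Doc2Vec embeddings would replace these sparse count vectors. All documents and names here are illustrative, not from a real corpus:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_similar(query, corpus):
    """Rank corpus documents by similarity to the query, most similar first."""
    qv = Counter(query.lower().split())
    scored = [(cosine(qv, Counter(d.lower().split())), d) for d in corpus]
    return sorted(scored, reverse=True)

docs = [
    "deep learning for image classification",
    "bond pricing and fixed income analytics",
    "neural networks for image recognition",
]
ranking = rank_similar("image classification with deep networks", docs)
```

An embedding model replaces the `Counter` vectorizer with dense vectors, but the ranking step (score, then sort descending) is the same.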

EXPERIENCE:

Confidential, Greater Boston, MA

Chief data scientist/architect (Director)

Responsibilities:

  • Applying deep learning/AI technology to solve business problems in image classification, facial image recognition, facial spoofing countermeasures, and generation of synthetic visual data using GANs for model training. Keras and PyTorch with Python.
  • Have also worked in the financial, cyber security, and document retrieval (NLP) domains.
  • Facial image recognition and anti-spoofing deep learning networks that recognize attempts to present facial images on tablets and phones, as well as physical masks. Python/Keras.
  • Brain lesion image classification: binary classification of two types of lesions requiring different management approaches. Python/Keras, ResNet50, ResNet101.
  • Provided business analysis for data lake design/requirements from an analytics/data science point of view, based on AWS S3 and AWS Glue. Collaborated with team members (customization of ETL scripts, enrichment of the Glue metadata catalog, etc.) to load the customer's data warehouse; Hadoop/Spark-based AWS EMR analytics (financial data, weather data, cyber data cleanup, predictive analytics); AWS Redshift warehouse, Redshift Spectrum, Kinesis data streams, Firehose, SQL.
  • Working with financial data, price data, compliance data, and cyber data: Apache Spark/Hadoop HDFS, Scala, Python, RStudio (R), dplyr R API with Apache Spark, Amazon Web Services (AWS), S3, EC2 cluster computing. Statistical machine learning (ML), time series analysis, Fourier transform analysis, analysis of cyber security big data. Nvidia GPU cluster, deep learning (DL), convolutional neural networks (CNN), residual networks (ResNet), PyTorch, OpenCV, Keras API for TensorFlow, SVM, multiple linear regression, predictive model development and validation. Word2Vec (NLP) and Doc2Vec (NLP) models using the Gensim Python module; LSTM, GRU. Used an NIH 27.6-million-document corpus to train the LDA (NLP) and Doc2Vec models; ranked document similarity using Euclidean distance over Doc2Vec vectors.
  • Utilized deep learning models based on Keras for text classification, clustering, sentiment analysis. Tuned hyperparameters for Doc2Vec, Word2Vec embedding models.
  • Development of smart contracts (ERC20 digital tokens/coins, a.k.a. cryptocurrency, and ICO sale contracts) for the Ethereum distributed ledger with Solidity. Extensive experience writing unit tests using the Truffle/web3.js framework for the Ethereum blockchain; running two full Geth client nodes in-house.
  • Developing Go applications to interact with Ethereum Geth client.
  • Working with Node.js. Have advised multiple companies on their ICOs. Thorough understanding of public/private key systems. Multiple smart contracts deployed live to Mainnet.
  • Provided healthcare-related consulting services and worked on a high-performance OMOP cohort builder for a Japanese oncology firm based in Cambridge, MA. R, Spark, Scala.
  • Providing custom development of analytics for the financial, defense, and security domains. Working with embedded GPUs; maintaining C/C++ APIs for DSP/image processing libraries.
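The ResNet-style classifiers mentioned above are built on residual (skip) connections, y = x + F(x). A toy sketch of the idea in pure Python, with a hypothetical elementwise 1-D "layer" standing in for the convolutional blocks a real Keras/PyTorch model would use:

```python
def relu(x):
    return [max(0.0, v) for v in x]

def linear(x, w, b):
    """Hypothetical 1-D 'layer': elementwise scale and shift."""
    return [wi * xi + bi for wi, xi, bi in zip(w, x, b)]

def residual_block(x, w, b):
    """y = x + F(x): the skip connection lets the identity pass through,
    so stacking many blocks cannot make the mapping worse than identity."""
    fx = relu(linear(x, w, b))
    return [xi + fi for xi, fi in zip(x, fx)]

# With zero weights and biases, F(x) = 0 and the block is exactly the identity.
x = [1.0, -2.0, 3.0]
y = residual_block(x, w=[0.0, 0.0, 0.0], b=[0.0, 0.0, 0.0])
```

This identity-by-default behavior is what makes very deep networks (ResNet50/ResNet101) trainable.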

Confidential, Billerica, MA

Principal Data Scientist/Quantitative Analyst (Cyber Security)

Responsibilities:

  • Apache Spark, Scala, Python, limited RStudio (R), Amazon Web Services (AWS), EC2 cluster computing. Statistical ML, time series analysis, Fourier transform analysis, distributed force-directed graph visualization, analysis of cyber security big data. Nvidia GPU cluster, deep learning, neural networks, NVIDIA DIGITS, Theano, SVM, multiple linear regression.

Confidential, Westborough, MA

CTO/Co-founder

Responsibilities:

  • Development of software products and associated consulting services in the financial and defense industries: price feeds, asset valuation, etc. C/C++ quantitative data analysis, POSIX threads, distributed processing, IPC, digital signal/image processing, asynchronous programming model.

Confidential

Data Scientist/Analyst

Responsibilities:

  • Used Python with Hadoop under Ubuntu on Amazon EC2, Amazon S3, and EMR (Map/Reduce data modeling) to parse, clean up, and analyze NOAA weather data.
  • AWS command line interface (CLI).
  • Analyzed behavior of the agricultural maize growth model CERES under volatile temperature conditions.
  • Developed a report about different approaches for predicting corn growth stages.
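The Map/Reduce data modeling mentioned above can be sketched in a few lines: map raw records to key/value pairs, group by key, then reduce each group. The 'station,date,temp_c' CSV layout below is hypothetical; real NOAA records need fuller parsing and quality-flag filtering:

```python
from collections import defaultdict

def map_reduce_mean_temp(lines):
    """Per-station mean temperature via map / shuffle / reduce."""
    # Map: emit (station, temp) pairs, skipping malformed rows.
    pairs = []
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue
        try:
            pairs.append((parts[0], float(parts[2])))
        except ValueError:
            continue
    # Shuffle: group values by key.
    groups = defaultdict(list)
    for station, temp in pairs:
        groups[station].append(temp)
    # Reduce: mean per station.
    return {s: sum(v) / len(v) for s, v in groups.items()}

rows = [
    "BOS,2013-07-01,24.0",
    "BOS,2013-07-02,26.0",
    "ORH,2013-07-01,22.0",
    "bad row",
]
means = map_reduce_mean_temp(rows)
```

On EMR, the map and reduce steps run as separate distributed stages; the single-process version shows the same data flow.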

Confidential

Software Architect/Data Scientist

Responsibilities:

  • An online visual knowledge base repository service integrating web page snapshots, videos, images, MP3s, and document files. Integrated with social network services for content filtering and curation.
  • The portal is a SaaS application intended to help users find similar, relevant documents, store them, and refine the content based on other users’ feedback. Allows end users to add ad hoc links to web documents to connect related content. JavaScript was used extensively over two years to implement DOM manipulation.
  • Developed NLP analytics in Python to assess the similarity of English medical text documents, web pages, etc.
  • Used Apache Spark to analyze link data to gain an insight into how different pages are related.
  • Defined specifications, managed off-shore software developers for the project.
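Link analysis of the kind described above is typically done with an iterative rank propagation such as PageRank. A minimal single-process sketch (the graph is hypothetical; the Spark version would distribute the per-iteration contribution step):

```python
def pagerank(links, damping=0.85, iters=50):
    """Iterative PageRank over an adjacency dict {page: [outlinks]}."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:
                continue
            share = damping * rank[p] / len(outs)
            for q in outs:
                new[q] += share
        # Redistribute the rank of dangling pages (no outlinks) uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new[p] += damping * dangling / n
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

Pages that many other pages point to ("c" here) end up with the highest rank, which is the "how are pages related" signal the analysis exploits.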

Confidential

Data Scientist/Developer

Responsibilities:

  • Conducted price data research on a multi-terabyte tick database to detect insider trading based on aberrant price movements, protecting the company from taking large, market-neutral but risky positions in equity groups.
  • Utilized C/C++ and APL for prototyping data analysis algorithms.
  • Researched and implemented in software Big Data financial analysis algorithms.
  • Developed code to process and analyze tick-by-tick data for Confidential.
  • Provided analytics for real time assessment of risk.
  • Responsible for data and algorithm validity (model validation) for CMS Spread Options, CMS Caps/Floors, CMS Swaps, BMA Caps/Floors, BMA Swaps.
  • Implementation of new valuation models. Defined test methodologies, interfaces, data structure mappings, etc…
  • Dealt with business issues like cash flows, business calendars, currencies, etc…
  • Collaborated with Convexity Personnel to establish testing protocols, optimized code for speed.
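A deliberately simple stand-in for the aberrant-price-movement screens described above: flag one-step returns more than a few standard deviations from the sample mean. Real surveillance would use rolling windows, volume, and cross-sectional context; the series below is synthetic:

```python
import math

def flag_aberrant_moves(prices, z_threshold=3.0):
    """Return indices of prices whose one-step return deviates from the
    sample mean by more than z_threshold standard deviations."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    mu = sum(returns) / len(returns)
    var = sum((r - mu) ** 2 for r in returns) / len(returns)
    sigma = math.sqrt(var)
    if sigma == 0:
        return []
    return [i + 1 for i, r in enumerate(returns)
            if abs(r - mu) > z_threshold * sigma]

# A flat series with one 10% jump: only the jump should be flagged.
prices = [100.0] * 20 + [110.0] + [110.0] * 20
flags = flag_aberrant_moves(prices)
```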

Confidential

Data Scientist/Developer

Responsibilities:

  • Provided consulting services with respect to mandatory (with/without options) and voluntary corporate actions. Analyzed what procedures need to take place to reconcile the fund manager’s portfolio view with the custodian’s portfolio.
  • Examined examples of Bloomberg’s corporate action files to effect legacy code modifications.
  • Created programs to automatically populate Front Arena database with CFD (contract for difference) instruments for all equities traded on British, Irish exchanges using SSgA pub-sub interfaces to enable fund managers in US and UK to trade, track and price CFDs.
  • Created programs to automatically populate and keep updated SunGard database with all international calendars.
  • Created programs to define and load issuers, issues, foreign exchanges, end of day pricing, corporate actions, issue classifications for ICB, GICS, Lehman(Bonds), SSgA classification hierarchies in XML format for loading them into SunGard FrontArena product.
  • Modified SSgA pre-existing pub-sub Java code to utilize RKS issue key instead of Reuters Instrument Code (RIC) for the SSgA internal hedge fund.
  • Developed Sybase stored procedures to support pubsub code.

Confidential

Data Scientist/Developer

Responsibilities:

  • Subcontractor to Boston Stock Exchange (BSE) and Boston Option Exchange (BOX)
  • Participated in analysis meetings with Boston Exchange personnel; provided solutions for various trade-through surveillance problems.
  • Analyzed Confidential data and code to determine validity of BBO (Best Bid/Offer) and NBBO (All exchanges BBO combined) computations. Identified erroneous NBBO computations and collaborated with BOX and Confidential personnel to correct the model.
  • Provided code and time series data analysis and advice on processing trade-through transactions for the purpose of compliance monitoring of real-time OPRA and other feeds. Helped debug time-related discrepancies between BOX code and Confidential code.
  • Participated in meetings with BSE/BOX personnel, provided analysis of BOX/BSE requirements. Environment: Linux/C/Python/Streaming Software.
  • Provided software consulting services for a small hedge fund’s proprietary trading system, which trades automatically by establishing hedge positions within different industrial sectors. Developed a real-time event manager: trading modules register events (socket descriptors ready to be read or written, timers going off, exceptions occurring, etc.) with the event manager, which invokes the corresponding callbacks when events occur.
  • Defined standards for other developers for creating robust, fast, deadlock-free code. Developed various C++ classes for resource locking by system threads. Participated in code reviews and debugged different parts of the system.
  • Optimized multiple C++ modules and critical sections of code to achieve real-time performance; optimized communication interfaces to effectively interleave feeds from different sources; utilized shared-memory APIs to enhance performance. Advised on how best to partition analytical code for optimal execution time. Utilized POSIX thread libraries under Red Hat Linux and Unix (SunOS) and developed the relevant parallel processing algorithms.
  • Developed a container for a client to augment the C++ STL library, based on trie data structures. This container implements an associative map keyed on an STL string object, allowing insertion and retrieval/search of data in O(strlen(key)) time. The maximum depth of the trie is the maximum key size. The structure is self-sorting, i.e. new insertions occur exactly in the right place. The project was done under Red Hat Linux using g++, with some shell scripting for building the software.
  • Developed a highly efficient program to parse a customer web log containing duplicate customer IDs and visited web pages, in order to analyze statistical occurrences of the three-page sequences a customer visited. Utilized STL map and STL vector containers to implement the solution.
  • Developed a product called Global, a parallel processing middleware software library enabling multiple heterogeneous UNIX workstations to be used as a parallel computer. Defined its specifications, then developed, debugged, and maintained it until the IP rights were sold. Global also enabled multi-threaded applications running on remote nodes to seamlessly invoke threads on the user workstation to service application input/output requirements, and enabled user workstation resources to be accessed simultaneously by a plurality of remote machines under the protection of mutual exclusion locks (mutexes). Ported this software to IBM Unix (AIX), Sun Microsystems Unix (SunOS), and Red Hat Linux. Utilized POSIX thread libraries for UNIX client stations.
  • Contracted by IBM Microelectronics to develop five different optimized analytical/mathematical software libraries for the PowerPC chip. Packages included elementary mathematical functions, linear algebra, image processing, etc. The resulting code performance for elementary functions exceeded IBM’s and Motorola’s software; IBM Microelectronics purchased some 250 licenses for the complete software. Utilized Windows NT and Linux for this project.
  • Developed a parallel heuristic network optimization tool on a farm of processors. This C++ application optimizes various communication networks, such as data networks, gas pipelines, and electric distribution grids, where finding the perfect solution is theoretically and practically impossible and the goal is a good one. Built in C++ on a third-party middleware (EXPRESS) as the programming base. The program tests how the overall cost of the system changes if one or two links in the grid are removed; different scenarios are explored in parallel and the more optimal one is chosen. If the cost is reduced, the solution is kept and the program experiments further. Either a Windows front-end host computer or a UNIX workstation with an X Window server is used for user display services.
  • Wrote a parallelized option valuation application for computing values of American stock options. Applied a set of 8 processors to the problem of computing the expected value of an American option, using a Cox-Ross-Rubinstein binomial tree whose branches were distributed to different processors and then recombined at the end to obtain the final value. The program runs on an embedded system connected to either a Windows or Unix workstation (AIX, SunOS, Linux).
  • Developed an analytical vector software library for Texas Instruments processors, with new algorithms specifically tailored to those processors using Chebyshev expansions for elementary functions. The package contains over 300 analytical functions that operate on time series data and has been extensively utilized by military agencies and contractors such as the Naval Undersea Warfare Center, Bettis Labs, IBM, ATT, and Rafael, as well as many foreign companies in Japan and Europe. Developed a 300-page comprehensive user manual for this software. Either a Windows or Unix client station was used as a front end to an embedded system running this software.
  • Developed an object-oriented (OO) C++ interface to the analytics processing software above, enabling customers migrating from C to C++ to retain the performance offered by this software.
  • Developed an image processing software package for Texas Instruments processors. Wrote close to 500 functions to analyze photographic images for quality control systems, missile defense systems, and other security applications; wrote critical sections in native assembler to obtain maximum performance. Developed a 600-page user manual for this software package. The software provides two-dimensional fast Fourier transforms, edge detection, image sharpening, filtering, and other analyses. It is designed to be re-entrant, interruptible, relocatable, and very fast, and allows an application programmer to operate on image subsets.
  • Created an object-oriented (OO) C++ interface to the image processing software. Developed various C++ classes to create images and perform operations from the above software library, allowing customers to migrate to C++ while still utilizing this robust, fast software. The class libraries, running under Unix or Windows, enable seamless use of embedded accelerator hardware. Wrote scripts to set up the system and pre-load embedded code.
  • Developed a code generation Linux tool, based on gcc compiler, to automatically convert conventional Windows applications into client/server Linux/Unix services. The tool creates over 3 million lines of code to describe a communication interface between a Windows workstation and a Linux/AIX/Solaris application server. Required understanding of gcc produced C language parse tree.
  • Developed underlying communications client/server protocols based on TCP/IP or UDP to connect Windows User workstations with UNIX application servers. Utilized UNIX signals to simulate communication failures to test communication interface recovery procedures. Created a client-server interface for remote invocation of threads on user workstation to enable remotely run service applications to access local resources. Implemented locking of local operating system resources using mutexes. Implemented an RPC-like protocol to enable remoting of OS to user front end callbacks. Utilized MS Visual C/C++ studio to develop the code, debug the code, and maintain the code. Used simultaneously MS debugger and gdb debugger under Linux/UNIX to debug communication interfaces to correct any problems with mis-aligned message data.
  • Provided numerous consulting services in UNIX/Windows networking and inter-process communication (IPC). Provided consulting and code development related to deadlock-free resource sharing by multiple threads or processes. These services were provided for clients using different flavors of UNIX: AIX, SunOS, Linux.
  • Designed a proprietary secure protocol to authenticate and logon from a user workstation to an application server. This protocol eliminates man-in-the-middle security flaw, and removes a need for third party CA (certification authority). Managed a contractor to implement said protocol.
  • Managed several projects to develop optimized analytical linear algebra packages for PowerPC, Pentium, and Texas Instruments processors. Developed scripts and makefiles to build these projects.
  • Optimized code for several linear algebra analytical packages: C LinPack, C EisPack, C Blas 1/2/3. Adjusted code to compile more efficiently for particular processor architectures.
  • Developed C++ Object Oriented (OO) Interfaces to the analytical packages. This code permits a C++ application programmer to operate on matrices naturally with regular mathematical operators. It also controls storage formats for different kinds of matrices. Ported the code to IBM Unix (AIX), Sun Microsystems Unix (SunOS), FreeBSD Unix.
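The trie-based associative container described above (originally C++/STL) can be sketched in Python. Insertion and lookup touch one node per key character, i.e. O(len(key)), and an in-order traversal yields keys sorted, which is the self-sorting property:

```python
class Trie:
    """String-keyed map with O(len(key)) insert and lookup."""
    _MISSING = object()  # sentinel distinguishing "no value" from None

    def __init__(self):
        self.children = {}
        self.value = Trie._MISSING

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.value = value

    def find(self, key, default=None):
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return default if node.value is Trie._MISSING else node.value

    def keys(self, prefix=""):
        """Yield stored keys in sorted order (the trie is self-sorting)."""
        if self.value is not Trie._MISSING:
            yield prefix
        for ch in sorted(self.children):
            yield from self.children[ch].keys(prefix + ch)

t = Trie()
for k, v in [("spark", 1), ("spa", 2), ("keras", 3)]:
    t.insert(k, v)
```

Unlike a balanced tree's O(log n) comparisons of full keys, lookup cost here is independent of the number of stored keys.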
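The American option valuation described above distributes branches of a Cox-Ross-Rubinstein binomial tree across processors. A single-process reference version of the same pricing logic (parameters below are illustrative, not from the original engagement):

```python
import math

def crr_american(S0, K, r, sigma, T, steps, is_put=True):
    """Price an American option on a Cox-Ross-Rubinstein binomial tree:
    build terminal payoffs, then roll back, taking the maximum of the
    discounted continuation value and early exercise at each node."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability

    def payoff(s):
        return max(K - s, 0.0) if is_put else max(s - K, 0.0)

    # Terminal node prices: S0 * u^j * d^(steps - j), j = 0..steps up-moves.
    values = [payoff(S0 * u**j * d**(steps - j)) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            cont = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            values[j] = max(cont, payoff(S0 * u**j * d**(n - j)))
    return values[0]

price = crr_american(S0=100, K=100, r=0.05, sigma=0.2, T=1.0, steps=200)
```

The parallel version evaluates disjoint subtrees independently and recombines partial results during rollback; the recursion itself is unchanged.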
