Data Science and Machine Learning Lead Resume
New York, NY
SUMMARY
- Experience in Algorithmic Trading, Backtesting, Low-Latency Trading Algorithms, and Limit Order Book Analysis.
- 10+ years of experience in Machine Learning, Data Science, Statistical Analysis, Optimization, Mathematical Modeling, and High Performance Computing research.
- Experience building high-performance compute clusters, developing GPU applications, creating low-latency designs on Xilinx FPGAs, and configuring and installing Linux and Hadoop cluster nodes.
- 5 years of experience with the full Big Data stack: Hadoop, Spark, NoSQL, and data-driven analysis.
- Experience with Deep Learning frameworks (Python scikit-learn, CUDA/cuDNN, TensorFlow) and Natural Language Processing tools (Python-NLTK).
- 5+ years of application development and project management in Java, C/C++.
- Languages: Scala, Java, Python, C/C++.
- Extensive parallel programming experience in MPI, CUDA, PThreads, Java, and Scala, including large-scale performance tuning and optimization, and performance bottleneck detection and remediation.
TECHNICAL SKILLS
Machine Learning Frameworks: TensorFlow, Spark MLlib, Python scikit-learn.
Big-Data Processing: Hadoop, Spark, HDFS, Zookeeper, Hive (Cloudera, MapR, Apache)
Data Flow: Kafka, ELK (Elasticsearch, Logstash, Kibana)
NoSQL: MongoDB, Cassandra, Accumulo, HBase, Redis
Natural Language Processing: Google Word2Vec, Python-NLTK, Python-VADER
High Performance Computing: Cluster Computing, CUDA Graphics Processing Unit (GPU) Computing
Languages: Scala, Java, C/C++, Python, Matlab, PThreads, SSE, MPI parallelization library, NVIDIA-CUDA
Cloud/Elastic Computing: Docker, Amazon Elastic Compute Cloud (EC2) service
Scripting: Linux Bash scripting, Torque/PBS Job Scheduler
RDBMS: MySQL, Oracle
Operating Systems: Cluster Management via CentOS Rocks, Linux, Unix.
Low-Latency Design: Xilinx FPGA ISE Design Suite and Vivado, Verilog
Cluster Computing Administration and Configuration: Built and configured a Linux cluster with InfiniBand network support for heterogeneous High Performance Computing research with GPUs and FPGAs.
PROFESSIONAL EXPERIENCE
Confidential, New York, NY
Data Science and Machine Learning Lead
Technologies: Python, TensorFlow, C/C++.
Responsibilities:
- Analyzed market microstructure and FIX messages in the low-latency regime to reconstruct a Limit Order Book model and identify optimal trading windows via Machine Learning.
- Designed and backtested trade execution strategies via Reinforcement Learning (Q-Learning) for optimal order placement, minimizing execution cost relative to the Volume Weighted Average Price (VWAP) subject to a market participation rate constraint (see the Q-Learning sketch after this list).
- Designed a Machine Learning based algorithm to predict large movements in ask/bid prices, ask/bid volumes, and spreads.
- Implemented Stochastic Volatility models via Bayesian Inference to quantify short-term behavior in Equity and FX markets.
- Performed high-frequency time-series analysis and anomaly detection.
- Ingested trade log messages via Kafka combined with Apache NiFi.
- Developed and deployed a Machine Learning algorithm to detect anomalies in trading patterns.
- Designed a Lambda Architecture via Spark MLlib and Spark Streaming for real-time anomaly detection.
- Designed a Global Limit Management compliance framework for electronic and algorithmic trade data stored in a Greenplum database via the Apache MADlib Machine Learning framework; the framework detects anomalous trades that violate firm-wide trading-limit policies.
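The production execution strategies are proprietary; as a rough illustration of the Q-Learning approach, the sketch below trains a tabular policy on a toy simulated price path with a beat-the-VWAP proxy reward. The state space (time step, remaining inventory), action set (child-order sizes), and all parameters are simplifying assumptions, not the actual design.

import numpy as np

rng = np.random.default_rng(0)
T, INV, N_ACTIONS = 10, 10, 3          # time steps, shares to sell, child-order sizes 0..2
alpha, gamma, eps = 0.1, 0.99, 0.1     # learning rate, discount, exploration rate

Q = np.zeros((T, INV + 1, N_ACTIONS))  # Q[t, inventory, action]

for episode in range(5000):
    prices = 100 + np.cumsum(rng.normal(0, 0.1, T))   # toy mid-price path
    vwap = prices.mean()                              # per-path VWAP benchmark
    inv = INV
    for t in range(T):
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[t, inv]))
        # a participation-rate constraint would additionally cap `traded` here
        traded = inv if t == T - 1 else min(a, inv)   # force completion at the horizon
        reward = traded * (prices[t] - vwap)          # proxy reward: beat the path VWAP
        next_inv = inv - traded
        future = 0.0 if t == T - 1 else float(np.max(Q[t + 1, next_inv]))
        Q[t, inv, a] += alpha * (reward + gamma * future - Q[t, inv, a])
        inv = next_inv

In practice the state would also carry order book features (spread, imbalance, queue depth), and the reward would use fill prices from a market simulator rather than a mid-price path.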
Confidential, New York, NY
Senior Big Data Engineer
Skills: Accumulo, Spark MLlib, H2O, Hive, Scala, Python, Linux Shell Scripting (Bash, AWK, and SED), Salt (cluster management framework).
Responsibilities:
- Designed and built an Accumulo NoSQL database schema to migrate tables from Hive, enabling the business analytics team to perform complex joins on large-scale data.
- Performed large-scale analytics of ingested data on Accumulo and built a predictive model with the Spark MLlib Machine Learning toolbox in Scala and Python (a minimal PySpark sketch follows this list).
- Performed deep learning on large volumes of impression data via H2O Sparkling Water to make accurate predictions for client-side billing.
- Developed a Kafka client application running in a Docker environment to collect and aggregate impression data from a variety of endpoints and set-top boxes.
- Wrote custom Iterators and Combiners in Accumulo for server-side pre-processing of data before transfer to the client, conserving network bandwidth.
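As a minimal sketch of the predictive-modeling step (the real features and labels are confidential), the following PySpark code assembles hypothetical impression features and fits a Spark MLlib logistic regression. The input path, column names, and the binary "billable" label are assumptions for illustration.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("impression-model").getOrCreate()

# Hypothetical schema: per-impression numeric features plus a 0/1 billable label.
df = spark.read.parquet("hdfs:///data/impressions")   # illustrative path

assembler = VectorAssembler(
    inputCols=["duration_sec", "daypart", "channel_id"],  # assumed feature columns
    outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

model = LogisticRegression(labelCol="billable", featuresCol="features").fit(train)
print(model.evaluate(test).areaUnderROC)              # held-out AUC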
Confidential, Philadelphia, PA
Big Data Consultant
Skills: Java, MapReduce, Spark, Scala, Linux, Shell Scripting, YARN, Zookeeper, HDFS, HBase, Sqoop.
Responsibilities:
- Built, installed, and configured a Big-Data processing cluster via Cloudera Hadoop.
- Implemented Value at Risk (VaR) calculation for risk analysis via Monte Carlo simulation on Spark (see the Monte Carlo sketch after this list).
- Designed a recommendation engine for targeted advertising using consumer credit score and risk data stored in HBase and processed in Spark.
- Designed an HBase schema to organize and store a large-scale credit swap dataset in the EDW (enterprise data warehouse) for further processing; optimized and partitioned database tables for performance.
- Led a team of 3 through the Software Development Lifecycle (SDLC) via an Agile process with adherence to project timeline, budget, and deliverables.
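A minimal sketch of the Monte Carlo VaR approach, assuming a single-asset geometric Brownian motion P&L model with illustrative parameters; the real portfolio model is not shown. Trials are spread across Spark partitions, and the 99% VaR is read off the empirical P&L quantile.

from pyspark.sql import SparkSession
import numpy as np

spark = SparkSession.builder.appName("mc-var").getOrCreate()
sc = spark.sparkContext

PORTFOLIO_VALUE, MU, SIGMA, HORIZON = 1e6, 0.05, 0.2, 1 / 252  # assumed parameters
N_TRIALS, N_PARTITIONS = 1_000_000, 100

def simulate(seed, n):
    # One-step GBM P&L per trial, seeded per partition for reproducibility.
    rng = np.random.default_rng(seed)
    ret = np.exp((MU - 0.5 * SIGMA ** 2) * HORIZON
                 + SIGMA * np.sqrt(HORIZON) * rng.normal(size=n)) - 1
    return (PORTFOLIO_VALUE * ret).tolist()

pnl = sc.parallelize(range(N_PARTITIONS), N_PARTITIONS) \
        .flatMap(lambda seed: simulate(seed, N_TRIALS // N_PARTITIONS))

var_99 = -np.percentile(pnl.collect(), 1)   # loss at the 1st percentile of P&L
print(f"99% 1-day VaR: {var_99:,.0f}")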
Confidential, Horsham, PA
Big Data Consultant
Skills: Java, Hadoop, Linux, YARN, Zookeeper, HDFS, HBase, Sqoop, Avro, Pig, Hive, HiveQL
Responsibilities:
- Migrated enterprise data from a relational database to HDFS using Sqoop.
- Used the Avro file format to serialize and compress data for faster I/O in batch jobs (see the Avro sketch after this list).
- Performed MapReduce operations with Pig scripts implementing a proprietary model for competitive insurance pricing.
- Designed Hive tables to store millions of periodically updated customer insurance and event history records.
- Analyzed customer event patterns stored in Hive tables using HiveQL to enable better decision making in claims processing.
- Drove and designed the division's Big-Data architectural strategy and platform.
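As a small illustration of the Avro serialization step, the sketch below uses the fastavro Python library with a hypothetical policy-event schema; the real enterprise schema and codec settings may differ.

import io
from fastavro import writer, reader

# Hypothetical policy-event schema; the actual enterprise schema is not shown.
schema = {
    "type": "record",
    "name": "PolicyEvent",
    "fields": [
        {"name": "policy_id", "type": "string"},
        {"name": "event_type", "type": "string"},
        {"name": "premium", "type": "double"},
    ],
}

records = [{"policy_id": "P-1001", "event_type": "renewal", "premium": 842.50}]

buf = io.BytesIO()
writer(buf, schema, records, codec="deflate")   # compressed, self-describing container

buf.seek(0)
for rec in reader(buf):                         # schema travels with the file
    print(rec)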
Confidential
Assistant Professor
Skills: MongoDB, Python-NLTK, Python-VADER, NLP Sentiment Analysis
Responsibilities:
- Performed automated web scraping of hundreds of thousands of news articles, blogs, and posts from publicly available news archives via Python.
- Parsed and extracted documents automatically via Python-BeautifulSoup, filtering for documents pertaining to the topics “economy” and “finance”.
- Stored documents in MongoDB for easy indexing and access by machine learning algorithms.
- Performed sentiment analysis via Python-NLTK and Python-VADER, topic modeling via Latent Dirichlet Allocation (LDA), and named entity recognition (a VADER scoring sketch follows this block).
- Correlated document sentiment with the stock prices of the named entities in the articles and established causality where the correlation was statistically significant.
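A minimal sketch of the VADER scoring call at the core of the sentiment pipeline; the headline is illustrative, and the document-level aggregation over the MongoDB corpus is not shown.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)   # one-time lexicon download

sia = SentimentIntensityAnalyzer()
headline = "Markets rally as economy posts strong quarterly growth"  # illustrative text
print(sia.polarity_scores(headline))         # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}

The compound score (in [-1, 1]) is the usual single-number summary correlated against downstream signals such as stock returns.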
- In a separate network security research project, TCP/IP packets were continually scanned by software sensors on routers across large-scale networks.
- The timestamp and location of detected malware were reported to a Big-Data processing stack.
- A novel hierarchical module detection algorithm then localized the source using knowledge of the network topology.
- The MPI-based algorithm ran on distributed nodes alongside MapReduce for source detection.
- Resource allocation was managed via Apache Mesos.
- Currently studying applications to (i) infectious disease spread modeling and (ii) the spreading of rumors in social information networks (a minimal spread-simulation sketch follows).
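As an illustration of the spread-modeling application, the sketch below runs a simple discrete-time SIR (susceptible-infected-recovered) simulation on a random graph via networkx. The graph, infection parameters, and patient zero are stand-ins, and the hierarchical source-detection algorithm itself is not reproduced here.

import random
import networkx as nx

random.seed(0)
G = nx.erdos_renyi_graph(n=500, p=0.02, seed=0)   # stand-in network topology

BETA, GAMMA = 0.05, 0.1        # assumed per-contact infection and recovery probabilities
state = {v: "S" for v in G}    # Susceptible / Infected / Recovered
state[0] = "I"                 # hypothetical patient zero

for step in range(100):
    infected = [v for v, s in state.items() if s == "I"]
    if not infected:
        break
    for v in infected:
        for u in G.neighbors(v):               # infect susceptible neighbors
            if state[u] == "S" and random.random() < BETA:
                state[u] = "I"
        if random.random() < GAMMA:            # recover with probability GAMMA
            state[v] = "R"

print(sum(1 for s in state.values() if s != "S"), "nodes ever infected")

The same dynamics model rumor propagation by reading "infected" as "has heard the rumor"; source detection then amounts to inferring patient zero from a late snapshot of the states and the topology.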