Data Science and Machine Learning Lead Resume
New York, NY
SUMMARY
- Experience in Algorithmic Trading, Backtesting, Low-Latency Trading Algorithms, and Limit Order Book Analysis.
- 10+ years of experience in Machine Learning, Data Science, Statistical Analysis, Optimization, Mathematical Modeling, and High Performance Computing research.
- Experience building high-performance compute clusters, developing GPU applications, creating low-latency designs on Xilinx FPGAs, and configuring and installing Linux and Hadoop cluster nodes.
- 5 years of experience with the full Big Data stack: Hadoop, Spark, NoSQL, and data-driven analysis.
- Experience with Deep Learning frameworks (Python scikit-learn, CUDA/cuDNN, TensorFlow) and Natural Language Processing tools (Python-NLTK).
- 5+ years of application development and project management in Java, C/C++.
- Languages: Scala, Java, Python, C/C++.
- Extensive parallel programming experience in MPI, CUDA, PThreads, Java, and Scala, including large-scale performance tuning and optimization, and performance bottleneck detection and remediation.
TECHNICAL SKILLS
Machine Learning Frameworks: TensorFlow, Spark MLlib, Python scikit-learn.
Big-Data Processing: Hadoop, Spark, HDFS, Zookeeper, Hive (Cloudera, MapR, Apache)
Data Flow: Kafka, ELK (Elasticsearch, Logstash, Kibana)
NoSQL: MongoDB, Cassandra, Accumulo, HBase, Redis
Natural Language Processing: Google Word2Vec, Python-NLTK, Python-VADER
High Performance Computing: Cluster Computing, CUDA Graphics Processing Unit (GPU) Computing
Languages: Scala, Java, C/C++, Python, Matlab, PThreads, SSE, MPI parallelization library, NVIDIA-CUDA
Cloud/Elastic Computing: Docker, Amazon Elastic Compute Cloud (EC2) service
Scripting: Linux Bash scripting, Torque/PBS Job Scheduler
RDBMS: MySQL, Oracle
Operating Systems: Cluster Management via CentOS Rocks, Linux, Unix.
Low-Latency Design: Xilinx FPGA ISE Design Suite and Vivado, Verilog
Cluster Computing Administration and Configuration: Built and configured a Linux cluster with InfiniBand network support for heterogeneous High Performance Computing research with GPUs and FPGAs.
PROFESSIONAL EXPERIENCE
Confidential, New York, NY
Data Science and Machine Learning Lead
Technologies: Python, TensorFlow, C/C++.
Responsibilities:
- Analyzed market microstructure and FIX messages in the low-latency regime to reconstruct a Limit Order Book model and identify optimal trading windows via Machine Learning.
- Designed and backtested trade execution strategies via Reinforcement Learning (Q-Learning) for optimal order placement, minimizing execution cost relative to the Volume Weighted Average Price (VWAP) subject to a market participation rate constraint (see the Q-Learning sketch after this list).
- Designed a Machine Learning based algorithm to predict large movements in ask/bid prices, ask/bid volumes, and spreads.
- Implemented Stochastic Volatility models via Bayesian Inference to quantify short-term behavior in Equity and FX markets.
- Performed high-frequency time-series analysis and anomaly detection.
- Ingested trade log messages via Kafka combined with Apache NiFi.
- Developed and deployed a Machine Learning algorithm to detect anomalies in trading patterns.
- Designed a Lambda Architecture via Spark MLlib and Spark Streaming for real-time anomaly detection.
- Designed a Global Limit Management compliance framework for electronic and algorithmic trade data stored in a Greenplum database via the Apache MADlib Machine Learning framework; the framework detects anomalous trades that violate firm-wide trading-limit policies.
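The production execution strategies are proprietary; as a rough illustration of the Q-Learning approach, the sketch below trains a tabular policy on a toy simulated price path with a beat-the-VWAP proxy reward. The state space (time step, remaining inventory), action set (child-order sizes), and all parameters are simplifying assumptions, not the actual design.

import numpy as np

rng = np.random.default_rng(0)
T, INV, N_ACTIONS = 10, 10, 3          # time steps, shares to sell, child-order sizes 0..2
alpha, gamma, eps = 0.1, 0.99, 0.1     # learning rate, discount, exploration rate

Q = np.zeros((T, INV + 1, N_ACTIONS))  # Q[t, inventory, action]

for episode in range(5000):
    prices = 100 + np.cumsum(rng.normal(0, 0.1, T))   # toy mid-price path
    vwap = prices.mean()                              # per-path VWAP benchmark
    inv = INV
    for t in range(T):
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[t, inv]))
        # a participation-rate constraint would additionally cap `traded` here
        traded = inv if t == T - 1 else min(a, inv)   # force completion at the horizon
        reward = traded * (prices[t] - vwap)          # proxy reward: beat the path VWAP
        next_inv = inv - traded
        future = 0.0 if t == T - 1 else float(np.max(Q[t + 1, next_inv]))
        Q[t, inv, a] += alpha * (reward + gamma * future - Q[t, inv, a])
        inv = next_inv

In practice the state would also carry order book features (spread, imbalance, queue depth), and the reward would use fill prices from a market simulator rather than a mid-price path.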
Confidential, New York, NY
Senior Big Data Engineer
Skills: Accumulo, Spark MLlib, H2O, Hive, Scala, Python, Linux Shell Scripting (Bash, AWK, and SED), Salt (cluster management framework).
Responsibilities:
- Designed and built an Accumulo NoSQL database schema to migrate tables from Hive, enabling the business analytics team to perform complex joins on large-scale data.
- Performed large-scale analytics of ingested data on Accumulo and built a predictive model with the Spark MLlib Machine Learning toolbox in Scala and Python (a minimal PySpark sketch follows this list).
- Performed deep learning on large volumes of impression data via H2O Sparkling Water to make accurate predictions for client-side billing.
- Developed a Kafka client application running in a Docker environment to collect and aggregate impression data from a variety of endpoints and set-top boxes.
- Wrote custom Iterators and Combiners in Accumulo for server-side pre-processing of data before transfer to the client, conserving network bandwidth.
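As a minimal sketch of the predictive-modeling step (the real features and labels are confidential), the following PySpark code assembles hypothetical impression features and fits a Spark MLlib logistic regression. The input path, column names, and the binary "billable" label are assumptions for illustration.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("impression-model").getOrCreate()

# Hypothetical schema: per-impression numeric features plus a 0/1 billable label.
df = spark.read.parquet("hdfs:///data/impressions")   # illustrative path

assembler = VectorAssembler(
    inputCols=["duration_sec", "daypart", "channel_id"],  # assumed feature columns
    outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

model = LogisticRegression(labelCol="billable", featuresCol="features").fit(train)
print(model.evaluate(test).areaUnderROC)              # held-out AUC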
Confidential, Philadelphia, PA
Big Data Consultant
Skills: Java, MapReduce, Spark, Scala, Linux, Shell Scripting, YARN, Zookeeper, HDFS, HBase, Sqoop.
Responsibilities:
- Built, installed, and configured a Big-Data processing cluster via Cloudera Hadoop.
- Implemented Value at Risk (VaR) calculation for risk analysis via Monte Carlo simulation on Spark (see the Monte Carlo sketch after this list).
- Designed a recommendation engine for targeted advertising using consumer credit score and risk data stored in HBase and processed in Spark.
- Designed an HBase schema to organize and store a large-scale credit swap dataset in the EDW (enterprise data warehouse) for further processing; optimized and partitioned database tables for performance.
- Led a team of 3 through the Software Development Lifecycle (SDLC) via an Agile process with adherence to project timeline, budget, and deliverables.
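A minimal sketch of the Monte Carlo VaR approach, assuming a single-asset geometric Brownian motion P&L model with illustrative parameters; the real portfolio model is not shown. Trials are spread across Spark partitions, and the 99% VaR is read off the empirical P&L quantile.

from pyspark.sql import SparkSession
import numpy as np

spark = SparkSession.builder.appName("mc-var").getOrCreate()
sc = spark.sparkContext

PORTFOLIO_VALUE, MU, SIGMA, HORIZON = 1e6, 0.05, 0.2, 1 / 252  # assumed parameters
N_TRIALS, N_PARTITIONS = 1_000_000, 100

def simulate(seed, n):
    # One-step GBM P&L per trial, seeded per partition for reproducibility.
    rng = np.random.default_rng(seed)
    ret = np.exp((MU - 0.5 * SIGMA ** 2) * HORIZON
                 + SIGMA * np.sqrt(HORIZON) * rng.normal(size=n)) - 1
    return (PORTFOLIO_VALUE * ret).tolist()

pnl = sc.parallelize(range(N_PARTITIONS), N_PARTITIONS) \
        .flatMap(lambda seed: simulate(seed, N_TRIALS // N_PARTITIONS))

var_99 = -np.percentile(pnl.collect(), 1)   # loss at the 1st percentile of P&L
print(f"99% 1-day VaR: {var_99:,.0f}")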
Confidential, Horsham, PA
Big Data Consultant
Skills: Java, Hadoop, Linux, YARN, Zookeeper, HDFS, HBase, Sqoop, Avro, Pig, Hive, HiveQL
Responsibilities:
- Migrated enterprise data from a relational database to HDFS using Sqoop.
- Used the Avro file format to serialize and compress data for faster I/O in batch jobs (see the Avro sketch after this list).
- Performed MapReduce operations with Pig scripts implementing a proprietary model for competitive insurance pricing.
- Designed Hive tables to store millions of periodically updated customer insurance and event history records.
- Analyzed customer event patterns stored in Hive tables using HiveQL to enable better decision making in claims processing.
- Drove and designed the division's Big-Data architectural strategy and platform.
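As a small illustration of the Avro serialization step, the sketch below uses the fastavro Python library with a hypothetical policy-event schema; the real enterprise schema and codec settings may differ.

import io
from fastavro import writer, reader

# Hypothetical policy-event schema; the actual enterprise schema is not shown.
schema = {
    "type": "record",
    "name": "PolicyEvent",
    "fields": [
        {"name": "policy_id", "type": "string"},
        {"name": "event_type", "type": "string"},
        {"name": "premium", "type": "double"},
    ],
}

records = [{"policy_id": "P-1001", "event_type": "renewal", "premium": 842.50}]

buf = io.BytesIO()
writer(buf, schema, records, codec="deflate")   # compressed, self-describing container

buf.seek(0)
for rec in reader(buf):                         # schema travels with the file
    print(rec)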
Confidential
Assistant Professor
Skills: MongoDB, Python-NLTK, Python-VADER, NLP Sentiment Analysis
Responsibilities:
- Performed automated web scraping of hundreds of thousands of news articles, blogs, and posts from publicly available news archives via Python.
- Parsed and extracted documents automatically via Python-BeautifulSoup, filtering for documents pertaining to the topics “economy” and “finance”.
- Stored documents in MongoDB for easy indexing and access by machine learning algorithms.
- Performed sentiment analysis via Python-NLTK and Python-VADER, topic modeling via Latent Dirichlet Allocation (LDA), and named entity recognition (a VADER scoring sketch follows this block).
- Correlated document sentiment with the stock prices of the named entities in the articles and established causality where the correlation was statistically significant.
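A minimal sketch of the VADER scoring call at the core of the sentiment pipeline; the headline is illustrative, and the document-level aggregation over the MongoDB corpus is not shown.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)   # one-time lexicon download

sia = SentimentIntensityAnalyzer()
headline = "Markets rally as economy posts strong quarterly growth"  # illustrative text
print(sia.polarity_scores(headline))         # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}

The compound score (in [-1, 1]) is the usual single-number summary correlated against downstream signals such as stock returns.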
- In a separate network security research project, TCP/IP packets were continually scanned by software sensors on routers across large-scale networks.
- The timestamp and location of detected malware were reported to a Big-Data processing stack.
- A novel hierarchical module detection algorithm then localized the source using knowledge of the network topology.
- The MPI-based algorithm ran on distributed nodes alongside MapReduce for source detection.
- Resource allocation was managed via Apache Mesos.
- Currently studying applications to (i) infectious disease spread modeling and (ii) the spreading of rumors in social information networks (a minimal spread-simulation sketch follows).
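As an illustration of the spread-modeling application, the sketch below runs a simple discrete-time SIR (susceptible-infected-recovered) simulation on a random graph via networkx. The graph, infection parameters, and patient zero are stand-ins, and the hierarchical source-detection algorithm itself is not reproduced here.

import random
import networkx as nx

random.seed(0)
G = nx.erdos_renyi_graph(n=500, p=0.02, seed=0)   # stand-in network topology

BETA, GAMMA = 0.05, 0.1        # assumed per-contact infection and recovery probabilities
state = {v: "S" for v in G}    # Susceptible / Infected / Recovered
state[0] = "I"                 # hypothetical patient zero

for step in range(100):
    infected = [v for v, s in state.items() if s == "I"]
    if not infected:
        break
    for v in infected:
        for u in G.neighbors(v):               # infect susceptible neighbors
            if state[u] == "S" and random.random() < BETA:
                state[u] = "I"
        if random.random() < GAMMA:            # recover with probability GAMMA
            state[v] = "R"

print(sum(1 for s in state.values() if s != "S"), "nodes ever infected")

The same dynamics model rumor propagation by reading "infected" as "has heard the rumor"; source detection then amounts to inferring patient zero from a late snapshot of the states and the topology.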