
Data Science And Machine Learning Lead Resume


New York, NY

SUMMARY

  • Experience in Algorithmic Trading, Backtesting, Low-Latency Trading Algorithms, and Limit Order Book Analysis.
  • 10+ years of experience in Machine Learning, Data Science, Statistical Analysis, Optimization, Mathematical Modeling, and High Performance Computing research.
  • Experience building High Performance Compute Clusters, developing GPU applications, implementing low-latency designs on Xilinx FPGAs, and configuring and installing Linux and Hadoop cluster nodes.
  • 5 years of experience across the full Big Data stack: Hadoop, Spark, NoSQL, and data-driven analysis.
  • Experience with Deep Learning frameworks (Python-Scikit, CUDA-cuDNN, TensorFlow) and Natural Language Processing tools (Python-NLTK).
  • 5+ years of application development and project management in Java and C/C++.
  • Languages: Scala, Java, Python, C/C++.
  • Extensive parallel programming experience via MPI, CUDA, PThreads, Java, and Scala; large-scale performance tuning and optimization, including performance bottleneck detection and improvement.

TECHNICAL SKILLS

Machine Learning Frameworks: TensorFlow, Spark-MLLib, Python-Scikit.

Big-Data Processing: Hadoop, Spark, HDFS, Zookeeper, Hive (Cloudera, MapR, Apache distributions)

DataFlow: Kafka, ELK (Elasticsearch, Logstash, Kibana)

NoSQL: MongoDB, Cassandra, Accumulo, HBase, Redis.

Natural Language Processing: Google Word2Vec, Python-NLTK, Python-VADER

High Performance Computing: Cluster Computing, CUDA Graphics Processing Unit (GPU) Computing

Languages: Scala, Java, C/C++, Python, Matlab; parallelization via PThreads, SSE, MPI, NVIDIA-CUDA

Cloud/Elastic Computing: Docker, Amazon Elastic Compute Cloud (EC2) service

Scripting: Linux Bash scripting, Torque/PBS Job Scheduler

RDBMS: MySQL, Oracle

Operating Systems: Cluster Management via CentOS Rocks, Linux, Unix.

Low-Latency Design: Xilinx FPGA ISE Design Suite and Vivado, Verilog

Administration and Configuration: Built and configured a Linux cluster with InfiniBand network support for heterogeneous High Performance Computing research with GPUs and FPGAs.

PROFESSIONAL EXPERIENCE

Confidential, New York, NY

Data Science and Machine Learning Lead

Technologies: Python (TensorFlow), C/C++.

Responsibilities:

  • Analyzed market microstructure and FIX messages in the low-latency regime to reconstruct the Limit Order Book model and identify optimal trading windows via Machine Learning.
  • Designed and backtested trade execution strategies via Reinforcement Learning and Q-Learning for optimal order placement, minimizing Volume Weighted Average Price (VWAP) subject to a market participation rate constraint.
  • Designed Machine Learning based algorithms to predict large movements in ask/bid prices, ask/bid volumes, and spreads.
  • Implemented Stochastic Volatility models via Bayesian Inference to quantify short-term behavior of Equity and FX markets.
  • Performed high-frequency time-series analysis and anomaly detection.
  • Ingested trade log messages via Kafka combined with Apache NiFi.
  • Developed and deployed a Machine Learning algorithm to detect anomalies in trading patterns.
  • Designed a Lambda Architecture via Spark-MLLib and Spark Streaming for real-time anomaly detection.
  • Designed Global Limit Management Compliance for electronic and algorithmic trade data stored in a Greenplum database via the MADlib Machine Learning framework; the framework detects anomalous trades that do not comply with firm-wide policies on trading limits.
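The VWAP and participation-rate ideas above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical helpers (`vwap`, `max_child_order`) and toy fill data, not the production execution logic.

```python
from dataclasses import dataclass

@dataclass
class Fill:
    price: float  # execution price
    qty: int      # shares filled

def vwap(fills):
    """Volume Weighted Average Price over a list of fills."""
    total_qty = sum(f.qty for f in fills)
    if total_qty == 0:
        return 0.0
    return sum(f.price * f.qty for f in fills) / total_qty

def max_child_order(market_volume, participation_rate, filled_so_far):
    """Largest next child order that keeps our share of market volume
    at or below the participation cap."""
    cap = int(market_volume * participation_rate)
    return max(0, cap - filled_so_far)

fills = [Fill(100.0, 200), Fill(100.5, 300)]
print(vwap(fills))                        # volume-weighted average price
print(max_child_order(10_000, 0.1, 300))  # remaining allowance in shares
```

A backtest would replay historical order book states, compare achieved execution price against this VWAP benchmark, and cap each child order with the participation constraint.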

Confidential, New York, NY

Senior Big Data Engineer

Skills: Accumulo, Spark-MLLib, H2O, Hive, Scala, Python, Linux Shell Scripting (Bash, AWK, and sed), Salt (cluster management framework).

Responsibilities:

  • Designed and built an Accumulo NoSQL database schema for migrating tables from Hive to the NoSQL database, enabling the business analytics team to perform complex joins on large-scale data.
  • Performed large-scale analytics of ingested data on Accumulo and built a predictive model with the Spark MLLib (Machine Learning) toolbox in Scala and Python.
  • Performed deep learning on large volumes of impression data via H2O Sparkling Water to make accurate predictions for client-side billing.
  • Developed Kafka client application running within Docker environment to collect and aggregate impression data collected from a variety of endpoints and set-top boxes.
  • Wrote custom Iterators and Combiners in Accumulo for server-side pre-processing of data before it is transferred to the client in order to optimize network bandwidth.
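The server-side pre-processing idea behind those custom Combiners can be illustrated in plain Python. The `summing_combiner` below is a hypothetical stand-in for an Accumulo SummingCombiner: it collapses sorted key/value entries into one entry per key before anything crosses the network.

```python
from itertools import groupby
from operator import itemgetter

def summing_combiner(entries):
    """Collapse sorted (key, value) pairs into one summed entry per key,
    mimicking an Accumulo SummingCombiner applied server-side.
    Accumulo scans already return entries sorted by key."""
    out = []
    for key, group in groupby(entries, key=itemgetter(0)):
        out.append((key, sum(v for _, v in group)))
    return out

# Toy impression counts keyed by set-top-box id.
scan = [("box-1", 3), ("box-1", 5), ("box-2", 1), ("box-2", 4), ("box-3", 2)]
print(summing_combiner(scan))  # far fewer entries cross the network
```

The bandwidth win comes from aggregating on the tablet servers: the client receives one entry per key instead of every raw observation.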

Confidential, Philadelphia, PA

Big Data Consultant

Skills: Java, Map-Reduce, Spark, Scala, Linux, Shell Scripting, YARN, Zookeeper, HDFS, HBase, Sqoop.

Responsibilities:

  • Built, installed and configured Big-Data processing cluster via Cloudera Hadoop.
  • Implemented Value at Risk (VaR) calculation for risk analysis via Spark and Monte Carlo analysis.
  • Designed a recommendation engine for targeted advertising using consumer credit score index and risk data stored in HBase and processed in Spark.
  • Designed an HBase schema to organize and store a large-scale credit swap dataset in the EDW for further processing; optimized and partitioned database tables for performance.
  • Led a team of 3 through Software Development Lifecycle (SDLC) via Agile process with adherence to project timeline, budget and deliverables.

Confidential, Horsham, PA

Big Data Consultant

Skills: Java, Hadoop, Linux, YARN, Zookeeper, HDFS, HBase, Sqoop, Avro, Pig, Hive, HiveQL

Responsibilities:

  • Migrated enterprise data from relational database to HDFS using Sqoop.
  • Used Avro file format to serialize and compress data for faster I/O for batch jobs.
  • Performed Map-Reduce operations via Pig scripts implementing a proprietary model for competitive insurance pricing.
  • Designed Hive tables to store millions of customer insurance records and event histories, updated periodically.
  • Analyzed customer event patterns stored in Hive Tables using HiveQL for claims processing to enable better decision making.
  • Drove and designed the division's Big-Data architectural strategy and platform.
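The Pig/Map-Reduce aggregation pattern can be sketched as a toy map/reduce pipeline in Python. The record layout and helper names (`map_phase`, `reduce_phase`) are assumptions for illustration, not the proprietary pricing model.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (customer_id, 1) per claim event, as a Pig/MapReduce mapper would."""
    for rec in records:
        if rec["event"] == "claim":
            yield rec["customer"], 1

def reduce_phase(pairs):
    """Group by key and sum, as the reducer would after the shuffle."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

events = [
    {"customer": "C1", "event": "claim"},
    {"customer": "C1", "event": "quote"},
    {"customer": "C2", "event": "claim"},
    {"customer": "C1", "event": "claim"},
]
print(reduce_phase(map_phase(events)))  # claims per customer
```

On the cluster, Pig compiles an equivalent GROUP BY into distributed map and reduce stages over the Hive-backed event tables.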

Confidential

Assistant Professor

Skills: MongoDB, Python-NLTK, Python-VADER, NLP Sentiment Analysis

Responsibilities:

  • Performed automated web scraping of hundreds of thousands of news articles, blogs and posts from publicly available news archives via Python.
  • Parsed and extracted documents automatically via Python-BeautifulSoup and filtered for documents pertaining to the topics “economy” and “finance”.
  • Stored documents in MongoDB for easy indexing and access for machine learning algorithms.
  • Performed sentiment analysis via Python-NLTK and Python-VADER, named entity recognition, and topic modeling via Latent Dirichlet Allocation (LDA).
  • Correlated document sentiment to stock prices of the named entities in the articles and tested for causality where the correlation was statistically significant.
  • TCP/IP packets were continually scanned by software sensors on routers over large-scale networks.
  • The timestamp and location of detected malware are reported to the Big-Data processing stack.
  • The novel hierarchical module detection algorithm then localizes the source with the knowledge of network topology.
  • The MPI based algorithm runs on distributed nodes alongside Map-Reduce for source detection.
  • The resource allocation is managed via Apache Mesos.
  • Currently studying applications to (i) Infectious disease spread modeling, (ii) Spreading of rumors in social information networks.
