Principal Data Scientist & Managing Consultant Resume
NY, NY
PROFESSIONAL EXPERIENCE
Confidential, NY, NY
Principal Data Scientist & Managing Consultant
Responsibilities:
- Developing a de-identification system for the AWS ecosystem that complies with the HIPAA Privacy Rule and its De-identification Standard. The underlying premise is that an entity’s de-identified and identified data trails tend to become identical as the number of databases grows sufficiently large. Current framework code and the sample population generator are written in C99 with basic headers plus openssl/sha.h and CommonCrypto/CommonDigest.h for hash functions; editing is done in vi and Xcode on a MacBook Air running Sierra. A minimal sketch of the hashing step appears after this list.
- Set up an Apache Ambari Hadoop developer environment via the Hortonworks Docker Sandbox (HDP 2.5) and Oracle VirtualBox 5.1.32 running CentOS Linux 7 on an Ubuntu 14.04 server; upgraded memory on a Toshiba Portege R830 from 4 GB to 16 GB (1.35 V RAM required) to handle the VM load. Pig, Hive, HDFS, TEZ, and MapReduce are native; placed and queried delimited, reformatted log and population sample files in HDFS with Pig and Hive, running on TEZ (preferred for speed) or MapReduce for experimentation.
- Collaborate with the Design, Strategy, and Sales teams to propose insightful actions regarding Big Data, Data Science, Data Architecture, IoT, Modeling, or Analytics during RFP creation.
- Evaluate partner software: SigOpt produces software that optimizes ML parameters; checked it for performance and integrability. Analytics Engines offers XDP, a preconfigured Hadoop environment; went through their training program, then set up my own dev environment with HDFS, Hive, MongoDB, Neo4j, PostgreSQL, PySpark, SparkR, Spark SQL, and R Shiny. A sample query in that environment appears after this list.
- Built:
- a recommendation system for an assortment of categories using content-based and collaborative filtering; used the Google Knowledge Graph API and Twitter API (with keys) under Python 3.4, importing tweepy (Stream, OAuthHandler, StreamListener), csv, re, collections, json, time, urllib, urllib.request, urlparse (urllib.parse), LancasterStemmer (nltk.stem.lancaster), and stopwords (nltk.corpus). Used consumer history, a flat file, in conjunction with GKG to retrieve the top N similar products, placed in another file, for each item in each category in the history. Then pulled the user’s last M tweets and bucketed sentiment by category, weighting the N products according to the M filtered tweets to form the recommendation; a sketch of the weighting step follows this list.
- a reverse-IP tool and stats package for analyzing unformatted web logs; the prototype project used smaller data and the Whois Reverse IP REST API, in C99, with the system command firing Excel worksheets preloaded with VBA macros, plus shell scripts for sorting, to distinguish bots from humans and study each entity’s behavior with respect to website navigation. We studied pages visited, time spent, entry points, revisits, and downloads, then used operating-system and IP intelligence to determine the domain most likely to be the identity, so that sales, strategy, or security could be alerted and possible action taken.
- a Data Discovery Observatory tool for visualizing client data; created d3 geo-visual renderings of prescription medication use per medication across time periods to show usage shifts with respect to health challenges, effectively visualizing the analytics of an RFP pitch. Used JSON, GeoJSON, and CSV files. Also did a d3 rendering for an HR questionnaire, extracting the mean, standard deviation, and prevalent comments in R to gauge sentiment across the globe.
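A minimal sketch of the de-identification hashing step, written as a Python analogue of the C99 code described above; the field name and salt are hypothetical, and the real implementation calls SHA256() from openssl/sha.h (or CC_SHA256 from CommonCrypto) in the same way:

```python
import hashlib

def deidentify(identifier: str, salt: str) -> str:
    """Replace a direct identifier with a salted SHA-256 digest."""
    return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()

# Hypothetical usage: pseudonymize one record's identifier.
print(deidentify("patient-00042", salt="per-dataset-secret"))
```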
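A sketch of the kind of exploration run in the PySpark/Hive dev environment above, assuming a Hive-enabled Spark session and a hypothetical tab-delimited log file in HDFS:

```python
from pyspark.sql import SparkSession

# Hive-enabled session so tables registered here are queryable from Hive.
spark = (SparkSession.builder
         .appName("log-exploration")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical reformatted web log: tab-delimited with a header row.
logs = spark.read.csv("hdfs:///data/logs/access.tsv", sep="\t", header=True)
logs.createOrReplaceTempView("access_logs")

# Count hits per page, the sort of ad hoc query otherwise issued in Hive on TEZ.
spark.sql("""
    SELECT page, COUNT(*) AS hits
    FROM access_logs
    GROUP BY page
    ORDER BY hits DESC
""").show(10)
```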
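A sketch of the recommendation system’s weighting step: category-level sentiment from the M filtered tweets re-ranks the N candidate products retrieved via GKG. The scoring rule here is a hypothetical simplification:

```python
from collections import defaultdict

def rank_candidates(candidates, tweet_sentiments):
    """Re-rank candidate products by category-level tweet sentiment.

    candidates: list of (product, category, similarity) from the GKG lookup.
    tweet_sentiments: list of (category, sentiment) pairs, sentiment in [-1, 1],
    derived from the user's last M filtered tweets.
    """
    # Average sentiment per category; categories never tweeted about score 0.
    totals, counts = defaultdict(float), defaultdict(int)
    for category, sentiment in tweet_sentiments:
        totals[category] += sentiment
        counts[category] += 1
    mood = {c: totals[c] / counts[c] for c in totals}

    # Boost or dampen GKG similarity by the category's average sentiment.
    scored = [(product, similarity * (1.0 + mood.get(category, 0.0)))
              for product, category, similarity in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```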
Confidential, NY, NY
Data Scientist
Responsibilities:
- Built a cross-selling email marketing campaign; learning was done using an experimental non-Markov stochastic model with diminishing weights on memory (see publication section). Segmentation of the consumer population was done first by treating each household as a single unit, then bucketing households by purchase amounts, and then clustering by basket items. Used R 3.0.1 to compute suggested items; data stored in flat files.
- Built a model to detect eye malady stages (Kaggle): given a set of retinal images with malady severity levels ranked 1 to 5, the task was to detect the severity level of the malady in fresh retinal images, with 1+ TB of training data. Normalized images with ImageMagick: cropped using pixel intensity plus the black border, resized, reoriented, then applied histogram equalization. Used CNNs for the ML; convolution layers were followed by batch normalization (Ioffe & Szegedy, "Batch Normalization: …") and rectifiers, pool layers used max pooling, and the work was to minimize the multiclass log-loss function (different from Kaggle’s). Training used SGD with the momentum algorithm in Python: SGDClassifier (sklearn.linear_model), StandardScaler (sklearn.preprocessing), Keras. A sketch of the layer pattern follows this list.
- Built a telematic anomaly detector for drivers (Kaggle): given a set of drivers, each with a list of trips described by per-second coordinates, the task was to determine which trips in each list are anomalies. Assumed a certain percentage of each driver’s trips were true and all other drivers’ trips were false, and trained on that assumption. Engineered features from histograms and percentiles (total distance, average angle, trip start-minus-finish distance, velocity, stops, and so forth), then applied gradient boosting. Applied the Ramer-Douglas-Peucker (RDP) algorithm, implemented over numpy arrays, to each trip, then segmented with an SVM. Smoothed trips with a Savitzky-Golay filter from scipy, then looked at (angle, distance) pairs and segmented again with an SVM; a sketch of this smoothing step follows the list. Used a Lasso model in sklearn to ensemble the methods.
- Built a recommender system to cross-sell LOBs for AON.
- Built predictive models for Mass Mutual to address network vulnerabilities and malware attacks.
- Built a supply chain disruption predictor for McKesson.
- Built a prescriptive model for access management that reduced human involvement.
- Consulted and managed client expectations by day, led a geographically dispersed team by night.
- Accomplishments: FSI Kwhiz top 10 (Oct-Dec ’13); Confidential PM Elite Lite; Coursera: Machine Learning (Octave, ANNs, SVMs, multinomial, polynomial, and logistic regression, bump functions, image recognition, predictions); Coursera: Data Science (Pig, Hive, HDFS, YARN, NoSQL, MongoDB, MapReduce, TEZ, Spark); SOA/CAS Probability Exam P/1.
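A minimal Keras sketch of the layer pattern described in the retinal model above (convolution, batch normalization, rectifier, max pooling, multiclass log-loss, SGD with momentum); the input shape, filter counts, and learning rate are hypothetical:

```python
from keras.models import Sequential
from keras.layers import (Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, Flatten, Dense)
from keras.optimizers import SGD

# Hypothetical 128x128 grayscale retinal crops, 5 severity classes.
model = Sequential([
    Conv2D(32, (3, 3), padding="same", input_shape=(128, 128, 1)),
    BatchNormalization(),   # Ioffe & Szegedy: normalize activations per batch
    Activation("relu"),     # rectifier following batch normalization
    MaxPooling2D((2, 2)),   # max pooling in the pool layers
    Conv2D(64, (3, 3), padding="same"),
    BatchNormalization(),
    Activation("relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(5, activation="softmax"),
])

# Multiclass log-loss, minimized with SGD plus momentum.
model.compile(optimizer=SGD(lr=0.01, momentum=0.9),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```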
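A sketch of the telematics smoothing-and-feature step, assuming each trip is an (N, 2) numpy array of per-second coordinates; the window length and polynomial order are illustrative choices:

```python
import numpy as np
from scipy.signal import savgol_filter

def trip_features(trip):
    """Smooth a trip and derive (angle, distance) pairs for SVM segmentation.

    trip: (N, 2) array of per-second (x, y) coordinates, N >= window_length.
    """
    # Savitzky-Golay smoothing along each coordinate axis.
    smooth = savgol_filter(trip, window_length=11, polyorder=3, axis=0)

    # Per-second displacement vectors.
    steps = np.diff(smooth, axis=0)
    distances = np.hypot(steps[:, 0], steps[:, 1])  # speed proxy per second
    angles = np.arctan2(steps[:, 1], steps[:, 0])   # heading per second

    # (angle, distance) pairs: the inputs segmented with an SVM.
    return np.column_stack([angles, distances])
```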
Confidential, NY, NY
Principal
Responsibilities:
- Built trading model analytics underneath heat maps for option traders and business analysts; average portfolio size was $10MM. The analytics consisted of ad hoc computations mixed with matching a basket of technical indicators and Greeks against trader-configured trading triggers and/or alerts, all while keeping within portfolio margin requirements. Technical indicators included momentum, pivot points, RSI, RVI, SMA and crossovers, and volume-weighted pricing; Greeks included delta, vega, and theta (a sketch of one indicator check appears after this list). The coding cycle was iterative: listen-read-watch-discuss-code-test-implement-repeat. One directive was to automate trader tasks deemed automatable, which included in-house tools to check for trade discrepancies and surface the necessary actions to be taken.
- Designed, coded, and updated real-time (RT) workbooks via VBA functions, macros, and C# add-ins per the fund’s design specs and RT feed. Worked with data providers to maintain feeds; researched trading strategies; researched, ordered, purchased, and installed equipment. Set up an RT trading system for a single trader using a SPARC Sun Blade 2500 server receiving information communicated from a Windows 7 Excel RT feed: on the SPARC side, trade logs were evaluated in C; on the Windows side, numbers were formatted with VBA within Excel; the communication happened via formatted flat files, and a VB script started the process.
- Turned papers into client implementations; wrote C/C++/Java/Python code for models, troubleshooting, integrating, and optimizing. One engagement involved interest-rate curve generation with Suite LLC, working on ALib, an analytic library developed by LabMorgan; they were interested in getting the BGM model implemented. Further tasks were developing, compiling, porting, and testing C/C++/Java/JNI code across Solaris 10 and Linux (Debian, Red Hat, Ubuntu) on x86, x64, and SPARC architectures. Maintained and created shell scripts for automated testing; prepared deliveries of newer versions and patches. Performed system admin duties: installed packages, deciphered logs, added users, changed permissions, installed and configured graphics cards. Networks consisted of servers behind the firewall and servers accessible from the web via ssh or a VPN; also sftp-ed and scp-ed files around the network for delivery prep, client environment replication, outside sources, and testing.
- Used regression to help several NYC developers price construction deals, based on closed real estate transactions during a configurable time period, a distance parameter, a neighborhood factor, and a few other inputs from contractors. Supported the same developers in inventory tracking, recordkeeping, and software updates. Also acted as an engineer, liaison, and code decipherer to a consortium of building architects, gaining huge insight into the interesting field of construction policy and the NYC Department of Buildings. Concentrated on green development with cutting-edge eco-friendly materials and renewable energy: solar panels, insulation, lighting and design, function-over-form treescaping.
- Became hands-on familiar with OBD technology (CAN, OBD2): scanners, pass-through devices (own a J2534), Ford’s Motorcraft Services, and Chrysler’s TechAuthority; updating vehicle software, module programming, DTCs, troubleshooting and eliminating CELs, pin-out and wiring diagrams, and the complete powertrain. This extended into an interest in IoT and the WIPO debate.
- Managed a BlackBerry application development team bringing mobility to the masses, and led by coding example. Deciphered packet headers for the protocol, tied in an encryption portion using third-party software, and stored encrypted packets in the Oracle 8i DB. Also worked on packet prioritization, caching, and guaranteed forwarding using Oracle 8i, logging packet events; set up PL/SQL scripts to be triggered in the DB to enable these events.
- Packet evaluation code was in C, edited with vi on Red Hat Linux and compiled with GCC.
- Wrote Oracle 8i ODBC apps in the Visual C++ environment, with Sybase counterparts, using Visual Studio; built MFC GUIs for business users, wrote the underlying code and specs, fixed system bugs, and edited the Windows registry.
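A sketch of the kind of indicator-to-trigger check described in the heat-map analytics above, using pandas; the window lengths and the RSI threshold are hypothetical trader-configured values:

```python
import pandas as pd

def sma_crossover_alert(close: pd.Series, fast: int = 10, slow: int = 50) -> bool:
    """Flag a golden-cross style trigger: fast SMA crossing above slow SMA."""
    fast_sma = close.rolling(fast).mean()
    slow_sma = close.rolling(slow).mean()
    crossed_now = fast_sma.iloc[-1] > slow_sma.iloc[-1]
    crossed_before = fast_sma.iloc[-2] <= slow_sma.iloc[-2]
    return bool(crossed_now and crossed_before)

def rsi(close: pd.Series, period: int = 14) -> float:
    """Classic RSI from average gains and losses over `period` bars."""
    delta = close.diff()
    gains = delta.clip(lower=0).rolling(period).mean()
    losses = (-delta.clip(upper=0)).rolling(period).mean()
    rs = gains.iloc[-1] / losses.iloc[-1]  # all-gain windows push RSI to 100
    return 100.0 - 100.0 / (1.0 + rs)

# Hypothetical trigger: alert on an SMA crossover while RSI stays below 70.
def check_trigger(close: pd.Series) -> bool:
    return sma_crossover_alert(close) and rsi(close) < 70.0
```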
Confidential, NY, NY
Senior Software Developer
Responsibilities:
- Wrote code for and managed a DevOps team at the online retailer that developed a B2B portal with co-branding: a filtered partner program allowing partner employees access to incentive purchasing programs. Used IBM Net.Commerce C++ container classes for logins and the shopping cart, HTML to interface with design’s mockups, and the Oracle 8 DB for filtering partner information.
- Developed a customer tracking system and monitored product purchases. Used IBM Net.Commerce C++ container classes for logins, page tagging, and the shopping cart, piped to the Oracle 8 DB for storing information; used SQL for investigating data and small data processing, and wrote PL/SQL scripts for bulk data processing.
- Developed a minimart (kozmart) with an incentive purchase point system; products ran the gamut from sundries to restaurant affiliate entrees to movie rentals and purchases. Used IBM Net.Commerce C++ container classes for the shopping cart and product hierarchy; the Oracle 8 DB captured information. Set up triggers in the DB to incorporate discounts into user shopping carts via the point system or coupons.
- Part of the organizing team that architected the DB2-to-Oracle 8 migration, and part of the group that performed the actual migration: mapped dependencies, enumerated discrepancies, ran trials, and regression-tested the migration outcome.
- Trained junior developers on IBM Net.Commerce and C++; designed and wrote functions; debugged and tested HTML, C++, and Java code across multiple browsers, system-wide. Organized semi-monthly team-building activities.
- Traded equities and equity options using Track Data software.
Confidential, NY, NY
Financial Engineer/App Developer/Shareholder
Responsibilities:
- Principal architect of a C++ multidimensional, multi-asset implied volatility calibration library used firm-wide. The base library overloaded operators, giving a linear algebra package. Two root-finding algorithms were the center of the library: the first a hybrid one-dimensional root finder combining Brent’s method, the bisection method, and the one-dimensional Newton’s method (a sketch appears after this list); the second an offshoot of Nelder-Mead that used Taylor’s theorem to relax the orthogonality constraint in Newton’s method in higher dimensions, thereby reducing function calls. Code was written on a SPARCstation 10 running Solaris 2.6; documents were written in Word 6.0 on Windows 3.x. Ported code to Visual C++ 6 with MFC and replaced the NAG library in calibration routines.
- Developed back-office reports in C/C++ with embedded SQL: country-specific holiday calendars, counterparty details, fixed-income asset details and valuation, portfolio details and valuation, exotic details and valuation, and volatility smiles. Reports were populated from Sybase and Oracle DBs, user input, Black-Scholes, and other Greeks.
- Wrote system functions in C/C++ for Summit v2.3 and v2.4 that tokenized swap details, calculated bond accruals, performed Monte Carlo simulations, and valued bond options, caps, floors, equity swaps, vanilla swaps, and exotics, all using the Summit data environment. Used vi and Emacs for editing; the makefile was configured to link with the system and personal dev environments.
- Wrote white papers on cubic splines and yield curve construction, extending the Monte Carlo method and calibration.
- Eliminated bugs, regression-tested functionality in releases and patches, configured GUIs, and was involved in the full SDLC.
- Supported traders by monitoring positions, reconciling trades in real time, calling floor brokers for fresh looks, reporting MTM daily to the fund’s owners, following RT charts, configuring RT spreadsheets, executing trades on Instinet, pulling earnings and news for the trading desk from the Bloomberg terminal, and sending and receiving faxes, plus ordering and picking up lunch. Track Data was the RT charting platform; portfolio sizes ranged from $2-5MM.
- Passed Series 7 exam first attempt.
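A sketch of the hybrid one-dimensional root finder in the spirit of the first algorithm: Newton steps with a bisection fallback whenever the step leaves the bracket (the production C++ version also folded in Brent’s method); tolerances and the example function are hypothetical:

```python
def hybrid_root(f, df, lo, hi, tol=1e-12, max_iter=100):
    """Find a root of f in [lo, hi], where f(lo) and f(hi) have opposite signs.

    Tries a Newton step; if it escapes the bracket or the derivative is zero,
    falls back to bisection, so convergence is guaranteed while keeping
    Newton's fast local behavior.
    """
    flo, fhi = f(lo), f(hi)
    if flo * fhi > 0:
        raise ValueError("root is not bracketed")
    x = 0.5 * (lo + hi)
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        # Shrink the bracket around the sign change.
        if flo * fx < 0:
            hi, fhi = x, fx
        else:
            lo, flo = x, fx
        # Newton step, kept only if it stays inside the bracket.
        dfx = df(x)
        if dfx != 0:
            x_new = x - fx / dfx
            if lo < x_new < hi:
                x = x_new
                continue
        x = 0.5 * (lo + hi)  # bisection fallback
    return x

# Usage: a simple cubic standing in for an implied-vol objective.
print(hybrid_root(lambda x: x**3 - 2.0, lambda x: 3.0 * x * x, 0.0, 2.0))
```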
TECHNICAL SKILLS
- Python
- numpy
- pandas
- scikit-learn
- tweepy
- Jupyter
- IPython
- R
- Shiny
- Octave
- Tableau
- Java
- C
- C++
- C#
- Node.js
- JavaScript
- d3
- Net.Commerce
- Cygwin
- Netezza
- Aginity
- Anaconda
- Hadoop
- HDFS
- MapReduce
- TEZ
- YARN
- Mesos
- Hive
- Pig
- Spark 2
- Oozie
- Zookeeper
- Storm
- Kafka
- Flume
- Spark Streaming
- Cassandra
- MongoDB
- HBase
- MySQL
- SQL Server
- DB2
- Oracle
- Sybase
- Ambari
- Cloudera
- Hue
- Drill
- Phoenix
- Presto
- Zeppelin
- Azure
- SAS
- SQL
- NoSQL
- PLSQL
- OS X
- Red Hat
- Ubuntu
- Solaris
- CentOS
- Windows
- PuTTY
- Google developer tools
- Excel/VBA
- PPT
- Virtual Machines
- AWS
- EC2
- S3
- Mathematics
- Probability
- Statistics
- Polynomial & Logistic Regression
- Neural Nets
- evaluation metrics
- KPIs