A top-performing data scientist and researcher recognized for combining research, critical thinking, and analytical expertise to make significant contributions to data-driven projects.
AREAS OF EXPERTISE
- Complex Data Analysis
- Statistical Analysis
- Grid Computing
- Instrumentation & Modeling
- Critical Thinking
- Software Development
- Data Collection
Languages: Python, C++, Java, SQL, Teradata, LabVIEW, Fortran, Bash, HTML, LaTeX, CSS
Platforms: GNU/Linux, Mac OS X, VirtualBox, Docker
High Performance Computing: HEP Grids for LHC and Belle II experiments, AWS, HTCondor
Analytics Packages: ROOT, Tableau, R, MS Excel, Turi GraphLab, pandas, MATLAB, NumPy
Distributed Computing: Hadoop, MapReduce, Hive, NoSQL, Pig, Spark, DIRAC, PanDA, GridFTP
Machine Learning Algorithms: Neural Networks, Boosted Decision Trees, Support Vector Machines (SVM), Logistic Regression, Multivariate Linear Regression
Other: Intel and GNU Compilers, Git, Subversion, GNU Make
- Coordinator of the Belle II collaboration's wide-area network (WAN), which spans five continents.
- Manager of the computing working group for Project8.
- Developer of Python software modules for DIRAC, a distributed computing framework; the modules are crucial for monitoring grid health and scheduling the replication of multiple petabytes of data across the grid.
- Deployed and managed a distributed data management system in production on a Linux VM for the Belle II collaboration.
- Developed monitoring and visualization tools for distributed data management in Jupyter notebooks using a MySQL plugin.
- Analyze millions of MySQL entries using pandas, NumPy, and scikit-learn to predict task completion times.
- Contribute to the project on Integrated End-to-End Performance Prediction and Diagnosis for Extreme Scientific Workflows.
- Led the team for a grid network data challenge to optimize the distribution of petabytes of data across Asia, Europe, and North America, developing analytics in Python and ROOT/C++ to identify bottlenecks and potential sources of performance loss.
- Created a new partial-tagging technique, applied to petabytes of data, to identify a rare natural process that occurs less than once in a million events, increasing detection efficiency tenfold over other common algorithms.
- Applied expert-level knowledge of the C++-based Belle data analysis libraries to perform clustering, data classification, Monte Carlo simulations, and complex data analysis on large datasets.
- Provided user support and maintained the Belle experiment software stack on PIC (PNNL cluster).
Postdoctoral Fellow
- Key contributor to the discovery of the Higgs boson in 2012 as part of the LHC-ATLAS team.
- Applied machine learning techniques to improve the ATLAS detector's discovery potential for new particles by 10%.
- Discovered and analyzed a significant feature in the simulation model of a major Monte Carlo application and proposed a solution, improving the statistical estimates of most analyses by up to 20%.
- Developed critical data quality software for a sub-system of the LHC-ATLAS experiment.
Confidential, Pittsburgh, PA
Graduate Research Assistant
- Designed, installed, and commissioned a sub-detector for the LHC-ATLAS experiment at CERN, Geneva, and developed LabVIEW software to stress-test it.
- Pioneered development of a trigger-aware analysis framework in C++ that sifts through petabytes of raw data to extract the normalized event rates critical for meaningful analysis, and wrote accompanying software tutorials published on official CERN wiki sites.
- Performed statistical analysis in C++ and Python to predict the sensitivity reach of the ATLAS experiment to as-yet-undiscovered processes.