Text Analytics and Software Analyst Resume
SUMMARY
- Self-driven software developer with a public trust clearance and 2+ years developing and productionizing J2EE and Hadoop-/Spark-based cloud applications. Recipient of the 2017 COBRA Hero Award for proactive R&D and prototyping of application features, resulting in 20% overall savings in time and budget.
- 2+ years developing/engineering big data/cloud/Hadoop (Hortonworks) applications: HDP 2.5-2.7, YARN, Spark 1.6.3-2.2.x, ZooKeeper, Hive
- 2+ years real-time/streaming applications & batch processing: NiFi 0.6-3.0, Kafka
- 2+ years document search and indexing: Apache Solr (Lucene) 5.5-7.1
- 2+ years NLP-based data science: Stanford NLP, TF-IDF
- 2+ years machine learning algorithms & workflows: Support Vector Machine (SVM), Random Forest, Logistic Regression, Artificial Neural Networks (Deep Learning)
- 2+ years on SQL and NoSQL databases: HBase, PostgreSQL
- 2+ years on Windows 8 & 10, and 2+ years on Red Hat Enterprise Linux (RHEL) 6.x
- 2+ years JVM languages and frameworks: Java 8, Scala 2.10-2.12, Spring
- 2+ years developing RESTful web services: Spring REST, Tomcat 8.5
- 2+ years Agile development with the Atlassian suite
- 2+ years Git source control
- < 1 year developing AWS applications
PROFESSIONAL EXPERIENCE
Text Analytics and Software Analyst
Confidential
Responsibilities:
- Productionized Support Vector Machine (SVM), Random Forest, Logistic Regression, Artificial Neural Network (deep learning), and Genetic Algorithm classifiers for document classification on Spark 1.6.3-2.2.x, using both Java 8 lambda syntax and Scala 2.10-2.12 (an illustrative pipeline sketch follows this list)
- Designed and constructed a document ingestion and cleaning system with Stanford NLP techniques, Apache NiFi, and Lucene Solr 5.x-7.1
- Created Spring-based REST APIs that put the power of Apache Spark's machine learning (ML and MLlib) at users' fingertips while hiding the workflow details of Apache Kafka, NiFi, and YARN (see the controller sketch after this list)
- Architected system-wide features for auditing document changes and user actions via ETL pipelines through NiFi and Kafka into NoSQL HBase and relational PostgreSQL databases (see the audit-event sketch after this list)
- Managed and distributed back-end service development and machine learning tasks across a four-member sub-team, playing to each member's strengths, shoring up weaknesses, and guiding members toward their career goals through Agile methodologies and Atlassian (JIRA) products
- Developed back-end systems for Technology Assisted Review (TAR) and Continuous Active Learning (CAL) in predictive analytics, resulting in a 75% increase in document-analysis performance
- Boosted the accuracy of binary classifiers by over 25% by analyzing and optimizing machine learning classifiers in Apache Zeppelin notebooks with Python libraries
- Engineered ETL and analytics workflows for email-threading within Neo4j graph databases
- Administered multiple Linux- and Windows-based work environments on Red Hat 6 (RHEL), with OS virtualization via Hyper-V Manager
- Researched and developed preliminary pipelines for running Apache Mahout computations via Spark on distributed GPUs (NVIDIA CUDA cards) through YARN
- Maintained and managed Git branch merges for machine learning and REST services
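
For illustration, a minimal Java 8 sketch of the kind of Spark ML document-classification pipeline described above (TF-IDF features feeding Logistic Regression). The input path, column names, and hyperparameters are placeholders, not the production values:

```java
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.feature.HashingTF;
import org.apache.spark.ml.feature.IDF;
import org.apache.spark.ml.feature.Tokenizer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DocumentClassifierSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("document-classifier-sketch")
                .getOrCreate();

        // Assumes a labeled corpus with "text" and "label" columns;
        // the HDFS path is a placeholder.
        Dataset<Row> docs = spark.read().parquet("hdfs:///corpus/labeled-docs");

        // Tokenize -> hashed term frequencies -> IDF weighting -> classifier.
        Tokenizer tokenizer = new Tokenizer()
                .setInputCol("text").setOutputCol("tokens");
        HashingTF tf = new HashingTF()
                .setInputCol("tokens").setOutputCol("rawFeatures")
                .setNumFeatures(1 << 18);
        IDF idf = new IDF()
                .setInputCol("rawFeatures").setOutputCol("features");
        LogisticRegression lr = new LogisticRegression()
                .setMaxIter(100).setRegParam(0.01);

        Pipeline pipeline = new Pipeline().setStages(
                new PipelineStage[] { tokenizer, tf, idf, lr });

        // Fit on the labeled corpus and persist the whole pipeline model.
        PipelineModel model = pipeline.fit(docs);
        model.write().overwrite().save("hdfs:///models/doc-classifier");

        spark.stop();
    }
}
```

HashingTF avoids a separate vocabulary-building pass, which keeps a pipeline like this cheap to retrain on large corpora; swapping LogisticRegression for another spark.ml classifier (e.g., RandomForestClassifier) leaves the rest of the pipeline unchanged.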
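Similarly, a hedged sketch of a Spring REST endpoint fronting such a model; ClassificationService, DocumentRequest, and ClassificationResult are hypothetical names standing in for the real service boundary, which keeps Spark, Kafka, NiFi, and YARN out of the caller's view:

```java
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ClassificationController {

    private final ClassificationService service;

    public ClassificationController(ClassificationService service) {
        this.service = service;
    }

    // POST /classify with a raw document body; returns the predicted label.
    @PostMapping("/classify")
    public ClassificationResult classify(@RequestBody DocumentRequest request) {
        return service.classify(request.getText());
    }
}

// Hypothetical boundary behind which a loaded PipelineModel would live.
interface ClassificationService {
    ClassificationResult classify(String text);
}

class DocumentRequest {
    private String text;
    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
}

class ClassificationResult {
    private final String label;
    private final double confidence;
    public ClassificationResult(String label, double confidence) {
        this.label = label;
        this.confidence = confidence;
    }
    public String getLabel() { return label; }
    public double getConfidence() { return confidence; }
}
```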
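And a sketch of how an audit event might enter a NiFi/Kafka pipeline of the kind described above; the broker address, topic name, and JSON shape are assumptions:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AuditEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // One record per user action; downstream consumers (e.g., NiFi)
            // would route these into HBase and PostgreSQL for auditing.
            String event = "{\"user\":\"jdoe\",\"action\":\"DOC_EDIT\","
                    + "\"docId\":\"42\",\"ts\":" + System.currentTimeMillis() + "}";
            producer.send(new ProducerRecord<>("audit-events", "42", event));
        }
    }
}
```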
Back-end Developer
Confidential
Responsibilities:
- Engineered ETL pipelines for digesting medical dictionaries and patient documentation for analytic work
- Developed a workflow for parsing stored medical information and comparing medical terms to physician diagnoses
- Designed a semantic search system that provides a best-guess measure of similarity between medical terms using Random Index Vectoring (RIV); a minimal sketch follows this list
- Deployed and administered Apache NiFi and RapidMiner software on AWS servers to build automated data-ingestion workflows into MongoDB
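
For illustration, a minimal Java sketch of Random Indexing, the technique behind RIV-style similarity: each context word gets a fixed sparse ternary index vector, each term's semantic vector accumulates the index vectors of its co-occurring words, and cosine similarity scores term pairs. The dimensionality and seed count here are assumptions, not production settings:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class RandomIndexingSketch {
    private static final int DIM = 1000; // vector dimensionality (assumption)
    private static final int SEEDS = 10; // +/-1 entries per index vector (assumption)

    private final Map<String, float[]> indexVectors = new HashMap<>();
    private final Map<String, float[]> semanticVectors = new HashMap<>();

    // Fixed sparse random index vector per context word: scatter a handful
    // of +1/-1 entries, seeded by the word so it is deterministic.
    private float[] indexVector(String word) {
        return indexVectors.computeIfAbsent(word, w -> {
            float[] v = new float[DIM];
            Random r = new Random(w.hashCode());
            for (int i = 0; i < SEEDS; i++) {
                v[r.nextInt(DIM)] = r.nextBoolean() ? 1f : -1f;
            }
            return v;
        });
    }

    // Record that `term` co-occurred with `context` (e.g., same sentence).
    public void observe(String term, String context) {
        float[] sem = semanticVectors.computeIfAbsent(term, t -> new float[DIM]);
        float[] idx = indexVector(context);
        for (int i = 0; i < DIM; i++) sem[i] += idx[i];
    }

    // Cosine similarity between two terms' accumulated semantic vectors.
    public double similarity(String a, String b) {
        float[] va = semanticVectors.get(a), vb = semanticVectors.get(b);
        if (va == null || vb == null) return 0.0;
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < DIM; i++) {
            dot += va[i] * vb[i];
            na += va[i] * va[i];
            nb += vb[i] * vb[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }
}
```

Feeding observe(term, context) over sentence-level co-occurrences in a corpus and then calling, say, similarity("myocardial", "cardiac") yields a rough relatedness score without ever building a full term-by-term co-occurrence matrix, which is the main appeal of random indexing at scale.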