Text Analytics and Software Analyst Resume
SUMMARY
- Self-driven software developer with a public trust clearance and 2+ years developing and productionizing J2EE and Hadoop-/Spark-based cloud applications. Recipient of the 2017 COBRA Hero Award for proactive R&D and prototyping of application features, resulting in 20% overall savings in time and budget.
- 2+ years developing/engineering big data/cloud/Hadoop (Hortonworks) applications: HDP 2.5-2.7, YARN, Spark 1.6.3-2.2.x, ZooKeeper, Hive
- 2+ years real-time/streaming applications & batch processing: NiFi 0.6-3.0, Kafka
- 2+ years document search and indexing: Apache Solr (Lucene) 5.5-7.1
- 2+ years NLP-based data science: Stanford NLP, TF-IDF
- 2+ years machine learning algorithms & workflows: Support Vector Machine (SVM), Random Forest, Logistic Regression, Artificial Neural Networks (Deep Learning)
- 2+ years on SQL and NoSQL databases: HBase, PostgreSQL
- 2+ years on Windows 8 & 10, and 2+ years on Red Hat Enterprise Linux (RHEL) 6.x
- 2+ years JVM languages and frameworks: Java 8, Scala 2.10-2.12, Spring
- 2+ years developing RESTful web services: Spring REST, Tomcat 8.5
- 2+ years Agile development with the Atlassian suite
- 2+ years Git source control
- < 1 year developing AWS applications
PROFESSIONAL EXPERIENCE
Text Analytics and Software Analyst
Confidential
Responsibilities:
- Productionized Support Vector Machine (SVM), Random Forest, Logistic Regression, Artificial Neural Network (deep learning), and Genetic Algorithm classifiers for document classification on Spark 1.6.3-2.2.x, using both Java 8 lambda syntax and Scala 2.10-2.12 (an illustrative pipeline sketch follows this list)
- Designed and constructed a document ingestion and cleaning system with Stanford NLP techniques, Apache NiFi, and Lucene Solr 5.x-7.1
- Created Spring-based REST APIs that put the power of Apache Spark's machine learning (ML and MLlib) at users' fingertips while hiding the workflow details of Apache Kafka, NiFi, and YARN (see the controller sketch after this list)
- Architected system-wide features for auditing document changes and user actions via ETL pipelines through NiFi and Kafka into NoSQL HBase and relational PostgreSQL databases (see the audit-event sketch after this list)
- Managed and distributed back-end service development and machine learning tasks across a four-member sub-team, playing to each member's strengths, shoring up weaknesses, and guiding members toward their career goals through Agile methodologies and Atlassian (JIRA) products
- Developed back-end systems for Technology Assisted Review (TAR) and Continuous Active Learning (CAL) in predictive analytics, resulting in a 75% increase in document-analysis performance
- Boosted the accuracy of binary classifiers by over 25% by analyzing and optimizing machine learning classifiers in Apache Zeppelin notebooks with Python libraries
- Engineered ETL and analytics workflows for email-threading within Neo4j graph databases
- Administered multiple Linux- and Windows-based work environments on Red Hat 6 (RHEL), with OS virtualization via Hyper-V Manager
- Researched and developed preliminary pipelines for running Apache Mahout computations via Spark on distributed GPUs (NVIDIA CUDA cards) through YARN
- Maintained and managed Git branch merges for machine learning and REST services
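
For illustration, a minimal Java 8 sketch of the kind of Spark ML document-classification pipeline described above (TF-IDF features feeding Logistic Regression). The input path, column names, and hyperparameters are placeholders, not the production values:

```java
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.feature.HashingTF;
import org.apache.spark.ml.feature.IDF;
import org.apache.spark.ml.feature.Tokenizer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DocumentClassifierSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("document-classifier-sketch")
                .getOrCreate();

        // Assumes a labeled corpus with "text" and "label" columns;
        // the HDFS path is a placeholder.
        Dataset<Row> docs = spark.read().parquet("hdfs:///corpus/labeled-docs");

        // Tokenize -> hashed term frequencies -> IDF weighting -> classifier.
        Tokenizer tokenizer = new Tokenizer()
                .setInputCol("text").setOutputCol("tokens");
        HashingTF tf = new HashingTF()
                .setInputCol("tokens").setOutputCol("rawFeatures")
                .setNumFeatures(1 << 18);
        IDF idf = new IDF()
                .setInputCol("rawFeatures").setOutputCol("features");
        LogisticRegression lr = new LogisticRegression()
                .setMaxIter(100).setRegParam(0.01);

        Pipeline pipeline = new Pipeline().setStages(
                new PipelineStage[] { tokenizer, tf, idf, lr });

        // Fit on the labeled corpus and persist the whole pipeline model.
        PipelineModel model = pipeline.fit(docs);
        model.write().overwrite().save("hdfs:///models/doc-classifier");

        spark.stop();
    }
}
```

HashingTF avoids a separate vocabulary-building pass, which keeps a pipeline like this cheap to retrain on large corpora; swapping LogisticRegression for another spark.ml classifier (e.g., RandomForestClassifier) leaves the rest of the pipeline unchanged.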
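Similarly, a hedged sketch of a Spring REST endpoint fronting such a model; ClassificationService, DocumentRequest, and ClassificationResult are hypothetical names standing in for the real service boundary, which keeps Spark, Kafka, NiFi, and YARN out of the caller's view:

```java
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ClassificationController {

    private final ClassificationService service;

    public ClassificationController(ClassificationService service) {
        this.service = service;
    }

    // POST /classify with a raw document body; returns the predicted label.
    @PostMapping("/classify")
    public ClassificationResult classify(@RequestBody DocumentRequest request) {
        return service.classify(request.getText());
    }
}

// Hypothetical boundary behind which a loaded PipelineModel would live.
interface ClassificationService {
    ClassificationResult classify(String text);
}

class DocumentRequest {
    private String text;
    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
}

class ClassificationResult {
    private final String label;
    private final double confidence;
    public ClassificationResult(String label, double confidence) {
        this.label = label;
        this.confidence = confidence;
    }
    public String getLabel() { return label; }
    public double getConfidence() { return confidence; }
}
```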
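And a sketch of how an audit event might enter a NiFi/Kafka pipeline of the kind described above; the broker address, topic name, and JSON shape are assumptions:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AuditEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // One record per user action; downstream consumers (e.g., NiFi)
            // would route these into HBase and PostgreSQL for auditing.
            String event = "{\"user\":\"jdoe\",\"action\":\"DOC_EDIT\","
                    + "\"docId\":\"42\",\"ts\":" + System.currentTimeMillis() + "}";
            producer.send(new ProducerRecord<>("audit-events", "42", event));
        }
    }
}
```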
Back-end Developer
Confidential
Responsibilities:
- Engineered ETL pipelines for digesting medical dictionaries and patient documentation for analytic work
- Developed a workflow for parsing stored medical information and comparing medical terms to physician diagnoses
- Designed a semantic search system that provides a best-guess measure of similarity between medical terms using Random Index Vectoring (RIV); a minimal sketch follows this list
- Deployed and administered Apache NiFi and RapidMiner software on AWS servers to build automated data-ingestion workflows into MongoDB
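
For illustration, a minimal Java sketch of Random Indexing, the technique behind RIV-style similarity: each context word gets a fixed sparse ternary index vector, each term's semantic vector accumulates the index vectors of its co-occurring words, and cosine similarity scores term pairs. The dimensionality and seed count here are assumptions, not production settings:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class RandomIndexingSketch {
    private static final int DIM = 1000; // vector dimensionality (assumption)
    private static final int SEEDS = 10; // +/-1 entries per index vector (assumption)

    private final Map<String, float[]> indexVectors = new HashMap<>();
    private final Map<String, float[]> semanticVectors = new HashMap<>();

    // Fixed sparse random index vector per context word: scatter a handful
    // of +1/-1 entries, seeded by the word so it is deterministic.
    private float[] indexVector(String word) {
        return indexVectors.computeIfAbsent(word, w -> {
            float[] v = new float[DIM];
            Random r = new Random(w.hashCode());
            for (int i = 0; i < SEEDS; i++) {
                v[r.nextInt(DIM)] = r.nextBoolean() ? 1f : -1f;
            }
            return v;
        });
    }

    // Record that `term` co-occurred with `context` (e.g., same sentence).
    public void observe(String term, String context) {
        float[] sem = semanticVectors.computeIfAbsent(term, t -> new float[DIM]);
        float[] idx = indexVector(context);
        for (int i = 0; i < DIM; i++) sem[i] += idx[i];
    }

    // Cosine similarity between two terms' accumulated semantic vectors.
    public double similarity(String a, String b) {
        float[] va = semanticVectors.get(a), vb = semanticVectors.get(b);
        if (va == null || vb == null) return 0.0;
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < DIM; i++) {
            dot += va[i] * vb[i];
            na += va[i] * va[i];
            nb += vb[i] * vb[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }
}
```

Feeding observe(term, context) over sentence-level co-occurrences in a corpus and then calling, say, similarity("myocardial", "cardiac") yields a rough relatedness score without ever building a full term-by-term co-occurrence matrix, which is the main appeal of random indexing at scale.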