Seeking Big Data Hadoop/Spark opportunity
- Experience working at the Department of Homeland Security headquarters performing data ingestion and enhancing and operating a data frame application to ingest hundreds of millions of records daily from DHS agencies.
- Performed and oversaw data ingestion from landing servers to HDFS across 70+ servers, data mapping, data loads to Accumulo, SolrCloud indexing, and Hadoop MapReduce jobs with Pig, UDFs, and cron shell scripts.
- Set up Accumulo tables and authorization lists; set up Cloudera SolrCloud instances and sharding; performed Solr indexing and Solr performance tuning. Monitored Accumulo performance and data nodes; recycled JobTrackers and data nodes across 70+ servers.
- Profound experience in the full software development cycle, object orientation, distributed systems, multi-tier architectures (SOA, EIS, middleware), JEE, public/private cloud, databases, and integration.
Education: MS Computer Science, Johns Hopkins Univ.; BS Computer Science, Univ. of Maryland
Languages: Scala, Spark, Pig/Hive, Java, XML, PL/SQL, Unix/Linux shell scripting
Databases/Search Engines: Accumulo, SolrCloud, ElasticSearch, MySQL, Oracle, SQL Server
Skills: ETL, Amazon EC2, S3, private cloud, Hadoop ingestion, SolrCloud indexing, JEE (JSF, JSP, Servlet, RESTful JAX-RS, JAX-WS, EJB, JPA, JMS, JDBC, CDI, JSON-P, etc.), text and binary parsing, PL/SQL, shell scripting
Software: Hadoop, Puppet, Kibana, Nagios, VirtualBox, GPG, Velocity, Subversion, Mockito.
Confidential, Washington, DC
- Oversaw contractors’ development effort for the Grant Modernization Project.
- Worked side by side with contractors on development and platform work, laying the groundwork for the project’s initial effort.
- Worked with cross-team personnel and the Office of the Chief Information Officer on cooperation efforts.
Lead Software Engineer
Confidential, Washington, DC
- Developed and enhanced the iService Benefit Inquiry web application using Velocity, JSF, Java, and IBM WebSphere.
- Maintained and modified Hibernate and MyBatis data access objects to support user inquiry data.
- Performed bug fixes and production support.
Confidential, Washington, DC
- Performed daily data ingestion and search indexing at the Department of Homeland Security data warehouse.
- Operated and enhanced data frame programs and utilities to support data scientists using Accumulo.
- Operations consisted of ingesting hundreds of millions of Big Data records daily from many data sources at the landing server into HDFS (operating over 70 Hadoop data nodes). Performed data transformation from HDFS to Accumulo using Java UDFs and Pig scripts.
- Developed a program to track and report data metrics, flag failed jobs, and rerun failed jobs. Monitored and ensured DataNode and JobTracker availability.
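The job-tracking bullet above can be sketched in a few lines: parse a job status report and collect the ids of failed jobs for resubmission. This is a minimal illustration only; the class name and the `id<whitespace>status` line format are assumptions, not the actual tooling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a failed-job tracker: scan status lines of the
// form "jobId  STATUS" and collect the ids whose status reads FAILED so
// they can be rerun.
public class FailedJobTracker {

    // Return the ids of jobs whose status column reads FAILED.
    public static List<String> failedJobIds(List<String> statusLines) {
        List<String> failed = new ArrayList<>();
        for (String line : statusLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 2 && "FAILED".equals(cols[1])) {
                failed.add(cols[0]);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> report = List.of(
            "job_001  SUCCEEDED",
            "job_002  FAILED",
            "job_003  RUNNING",
            "job_004  FAILED");
        System.out.println(failedJobIds(report)); // ids to rerun
    }
}
```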
- Performed data mapping and built the Solr schema for a SolrCloud cluster of four server nodes. Set up Solr cores (instances), data collections, and sharding. Indexed data in SolrCloud; maintained, troubleshot, and monitored it. Utilized static and dynamic fields with multi-valued attributes to support ad hoc searching across thousands of similar data fields. Configured and tuned SolrCloud by increasing caching.
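The dynamic-field idea above works by name pattern: a field whose name matches a suffix convention is typed automatically, so thousands of similar ad hoc fields need no individual schema entries. A toy sketch of that matching logic (the `_s`/`_ss`/`_i` suffixes are the common Solr convention, assumed here rather than taken from the project's actual schema):

```java
// Illustrative resolver mimicking Solr dynamic-field patterns such as
// *_s (single string), *_ss (multi-valued string), and *_i (int).
public class DynamicFieldResolver {

    // Map a field name to a "type,cardinality" label by suffix.
    public static String resolve(String fieldName) {
        if (fieldName.endsWith("_ss")) return "string,multiValued";
        if (fieldName.endsWith("_s"))  return "string,single";
        if (fieldName.endsWith("_i"))  return "int,single";
        return "text_general,single";  // catch-all fallback
    }

    public static void main(String[] args) {
        System.out.println(resolve("agency_codes_ss")); // string,multiValued
        System.out.println(resolve("record_year_i"));   // int,single
    }
}
```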
- Created Accumulo tables with split files and performed data queries.
- Set up Accumulo column visibilities.
- Assisted the system admin in creating KVMs and Puppet settings, and assisted with scheduled maintenance of the Hadoop ecosystem and services.
- Mitigated risks when planning the switch between search engines.
- Developed and modified Java programs to scan for and pinpoint special characters that caused corrupted file ingestion.
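The special-character scan above amounts to flagging records containing control characters that break delimited-file ingestion. A minimal sketch, assuming the screened set is the ASCII control range minus tab (the project's exact character set is not known here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a special-character scanner: report the 1-based numbers of
// lines containing control characters (other than tab) that commonly
// corrupt delimited-file ingestion.
public class SpecialCharScanner {

    // Control chars except \t; \n and \r never appear inside a line.
    private static final Pattern BAD =
        Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]");

    public static List<Integer> badLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = BAD.matcher(lines.get(i));
            if (m.find()) hits.add(i + 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("good\trecord", "bad\u0000record");
        System.out.println(badLines(lines)); // [2]
    }
}
```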
- Troubleshot ingestion and indexing issues.
- Recycled JobTracker and data node services.
- Configured bloom filters on Accumulo and Solr caching to improve performance.
- Implemented a Java application using PDFBox and Apache POI to extract plain text content from Adobe PDF and MS Word files and index it into the ElasticSearch engine.
- Performed ElasticSearch indexing and queries.
- Facilitated evaluation of a third-party software vendor product (MuleSoft) in response to the client’s request.
Senior Principal Software Engineer
Confidential, Hanover, MD
- Developed Java projects and deployed them in the public and private cloud. Set up and managed EC2 instances, users, groups, and policies using the AWS IAM console.
- Utilized S3 bucket services such as creating buckets, putting objects (putObject), listing bucket objects, etc.
- Handled the AWS exception raised in Java when creating a new bucket whose name already exists under another owner.
- Utilized public/private key encryption features in secured applications.
- Utilized Java and the Spring Integration framework, including channel adapters, chained channels, service activators, transformers, etc., to develop data flows.
- Utilized JPA and Spring transaction managers for data persistence. Utilized the database to manage many large data flows in both directions.
- Designed and developed the enterprise bi-directional XOP (XML-binary Optimized Packaging) File Processor software, deployed on two remote terminal servers. On one end it constructs XOP data files from complex data mappings and various MIME-type payloads such as text, binary, XML, MS documents, JPEG, PDF, gzip, tar, and other proprietary packages; on the other end it extracts XOP components into their consumable formats. The software offers many configurable features such as compression, segmentation, and sequential reassembly, and is highly available and operational around the clock with a capacity of several tens of thousands of data files.
- Applied polymorphism, effective enumerations for compact data mappings, design patterns, name/value-pair list POJOs, and recursion to construct sophisticated objects and allow for data expansion. Made extensive use of parallel I/O streams for real-time file processing to reduce the memory footprint for very large data files. Utilized a series of parsing techniques such as regular expressions, DOM XML, binary indexing, thread-safe concurrency, checkpointing, and Base64 encoding, plus frameworks such as the Velocity template engine and Tika MIME-type detection, to implement the software.
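The name/value-pair POJO with recursion mentioned above can be sketched as a tree node that holds either a leaf value or child nodes, so arbitrarily nested records can be built and queried by dotted path. The class and path convention are illustrative assumptions, not the actual implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative recursive name/value-pair node: each node owns a map of
// child nodes and (for leaves) a value, so nested structures grow
// dynamically without a fixed schema.
public class NvNode {
    private final Map<String, NvNode> children = new LinkedHashMap<>();
    private String value;

    // Store a value at a dotted path, creating intermediate nodes.
    public NvNode put(String dottedPath, String v) {
        int dot = dottedPath.indexOf('.');
        if (dot < 0) {
            children.computeIfAbsent(dottedPath, k -> new NvNode()).value = v;
        } else {
            children.computeIfAbsent(dottedPath.substring(0, dot), k -> new NvNode())
                    .put(dottedPath.substring(dot + 1), v);  // recurse
        }
        return this;
    }

    // Look up a value by dotted path; null if absent.
    public String get(String dottedPath) {
        int dot = dottedPath.indexOf('.');
        NvNode child = children.get(dot < 0 ? dottedPath : dottedPath.substring(0, dot));
        if (child == null) return null;
        return dot < 0 ? child.value : child.get(dottedPath.substring(dot + 1));
    }

    public static void main(String[] args) {
        NvNode root = new NvNode();
        root.put("header.sender", "siteA").put("payload.mime", "application/pdf");
        System.out.println(root.get("header.sender")); // siteA
    }
}
```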
- Also developed a server synchronization mechanism using locking features to accurately track data sent and received between the two remote servers, with multi-level acknowledgments and a resend feature to ensure data delivery.
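The acknowledge/resend bookkeeping described above reduces to: record each sequence number when a file is sent, drop it when the remote side acknowledges it, and treat anything still outstanding as a resend candidate. A hedged sketch of just that bookkeeping (the real protocol details are not reproduced):

```java
import java.util.Set;
import java.util.TreeSet;

// Minimal send/ack/resend tracker: outstanding sequence numbers are the
// ones sent but not yet acknowledged.
public class DeliveryTracker {
    private final Set<Long> outstanding = new TreeSet<>();

    public synchronized void sent(long seq)         { outstanding.add(seq); }
    public synchronized void acknowledged(long seq) { outstanding.remove(seq); }

    // Sequence numbers sent but never acknowledged, in order.
    public synchronized Set<Long> toResend() {
        return new TreeSet<>(outstanding);
    }

    public static void main(String[] args) {
        DeliveryTracker t = new DeliveryTracker();
        t.sent(1); t.sent(2); t.sent(3);
        t.acknowledged(1); t.acknowledged(3);
        System.out.println(t.toResend()); // [2]
    }
}
```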
- Developed REST web services with the Jersey framework to provide client interfaces that consume and produce XML and JSON. Implemented the back-end DBMS (DB persistence unit) with the JPA framework.
- Generated GPG public/private key pairs and imported client public keys for encryption. Implemented a shell program to automate GPG encryption and decryption in conjunction with the application.
- Developed a shell-script tool suite to automate software build, deployment, and testing. Implemented a series of watch consoles to interactively monitor data flow in real time and to produce periodic daily, weekly, and monthly reports scheduled by cron jobs.
Software Engineer Staff
Confidential, Hanover, MD
- Developed and maintained a PHP, Java, and MySQL web application to monitor network vulnerabilities.
- Developed web database applications. Implemented REST web services with various parameter inputs such as path, query, matrix, etc., producing XML and JSON data.
- Developed the DBMS layer using JDBC and JPA object-relational mapping (ORM and DAO). Implemented a smart object (a collection of key/value pairs) to accommodate dynamically growing data structures. Developed a database change notification mechanism using the Tibco JMS messaging service and the Observer pattern to generate event notifications and provide listener registration for callbacks.
- Led object-oriented design and development of the database application that detects database changes and propagates those change events to the Tibco JMS messaging service via a callback mechanism. Provided guidance to peers on using the Observer pattern to register their callbacks for database event notification.
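The Observer pattern in the two bullets above can be sketched in a few lines: listeners register a callback, and a detected database change is propagated to every registered listener. Tibco/JMS specifics are omitted; the listener body stands in for the publishing step, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal Observer-pattern sketch for database change notification.
public class DbChangeNotifier {

    public interface DbChangeListener {
        void onChange(String table, String change);
    }

    private final List<DbChangeListener> listeners = new ArrayList<>();

    // Listener registration for callbacks.
    public void register(DbChangeListener l) { listeners.add(l); }

    // Called when polling (or a trigger) detects a change: propagate
    // the event to every registered listener.
    public void fire(String table, String change) {
        for (DbChangeListener l : listeners) l.onChange(table, change);
    }

    public static void main(String[] args) {
        DbChangeNotifier notifier = new DbChangeNotifier();
        List<String> received = new ArrayList<>();
        notifier.register((table, change) -> received.add(table + ":" + change));
        notifier.fire("CLAIMS", "INSERT");
        System.out.println(received); // [CLAIMS:INSERT]
    }
}
```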
- Performed software integration and productized research prototypes into operations.
- Provided architecture design guidance to team members to integrate their components into the overall architecture.
- Implemented a script that simplified database and software configuration, cutting the installation process from three days down to five minutes.
- Plans, conducts, and coordinates software development activities.
- Designs, develops, documents, tests, and debugs software. Corrects program errors, prepares operating instructions, compiles documentation of program development, and analyzes system capabilities to resolve questions of program intent, output requirements, input data acquisition, programming techniques, and controls. Ensures software standards are met.
Confidential, Vienna, VA
- Developed, enhanced, and supported the Enterprise Information Service (EIS) system of the Automated Commercial Environment program for U.S. Customs and Border Protection, hosted at an IBM facility.
- Provided production support for Java-based components in a J2EE environment and Java Connector Architecture adapters for DB2, SAP, etc.
- Component support included EJBs, SAP structure mappings and interfaces, migration of image data from Documentum to DB2, and implementation of Unicode conversion for ID verification.
Enterprise Software Engineer
Confidential, Rosslyn, VA
- Developed the Electronic File Delivery component of the Enterprise Staging Content product for Confidential.
- Implemented a routine automated database maintenance service, configurable via the Spring framework, the Quartz enterprise job scheduler, and the iBATIS data mapping framework.
- Utilized dependency injection and the auto-start feature to automate program self-start.
- Utilized Java object-database and SQL statement mapping configurations for event synchronization.
- Developed object-relational mappings using Hibernate.
Confidential, Washington, DC
- Performed ETL using Hibernate mappings and Java to migrate massive claim data from SQL Server to an Oracle database.
- Designed and developed the entire web-based imaged-document application, Auto Damage Server, to deliver electronic documents to the Confidential legal community using Java, Servlets, JNDI, Oracle JDBC, FileNet ISRA, and WebSphere server.
- Designed and developed common components that can be integrated with or without the application server.
- Implemented a core binary parser to extract meta-data, text, and a variety of image documents such as JPEG, PDF, MS Word, etc., from a compressed format.
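Extracting documents from a compressed container, as in the bullet above, can be sketched by walking the archive's entries and capturing each entry's name (meta-data) and bytes. This assumes a plain ZIP container for illustration; the actual compressed format is not reproduced here:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Sketch of pulling documents out of a compressed (ZIP) container:
// entry name -> entry bytes.
public class CompressedDocExtractor {

    public static Map<String, byte[]> extract(byte[] zipBytes) {
        Map<String, byte[]> docs = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // readAllBytes stops at the end of the current entry.
                docs.put(entry.getName(), zin.readAllBytes());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return docs;
    }

    // Helper to build a one-entry archive for the demo below.
    public static byte[] zipOf(String name, String content) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(content.getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = zipOf("claim.txt", "policy=12345");
        Map<String, byte[]> docs = extract(zip);
        System.out.println(new String(docs.get("claim.txt"), StandardCharsets.UTF_8));
    }
}
```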
- Set up and configured J2EE application server environments, including database connection pools, JNDI, JMS, log4j, and a JCA connector (FileNet image resource adapter).
- Developed and deployed EJB components using WebLogic.
- Developed Java Message Service components such as topic and queue messaging. Implemented SAX and DOM parsers for XML data.
- Implemented and deployed web services using the Axis SOAP engine.
- Provided implementations to publish and consume vendor services.
- Successfully proposed and developed a printing solution using the Java Printable interface, delivering dynamic and robust features: rendering of a variety of text and image formats, resolution of over-run margins, network intelligence to locate alternative printers, and printer alerts in the event of device failures such as paper jams, paper shortages, and hardware issues.
- Designed and developed the SOA InfoCenter application server using the Singleton pattern and lazy-loading techniques to provide data inquiries for fifty states and other U.S. territories.
- Revamped the entire Diary SOA server to enhance performance, fixing a 10 MB memory leak and improving source code efficiency.
- Discovered and resolved an IBM VisualAge bug.
- Created database data models and schemas. Transformed the object database model to a relational database model for the Payment service-oriented server.
- Implemented database manager layers to insulate the application from vendor specifics. Applied the Singleton pattern and lazy-loading technique to improve performance.
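The combination above — a vendor-insulating manager layer plus a lazily loaded Singleton — can be sketched with the initialization-on-demand holder idiom: callers code against a neutral interface, the vendor-specific class lives in one place, and the instance is created only on first use. The `OracleManager` name is illustrative, not the actual class:

```java
// Sketch of a database manager layer behind a lazy-loading Singleton.
public class DbManagerLayer {

    // Vendor-neutral interface the application codes against.
    public interface DbManager {
        String vendor();
    }

    // Vendor-specific detail lives only here.
    static class OracleManager implements DbManager {
        OracleManager() { /* imagine expensive connection setup */ }
        public String vendor() { return "oracle"; }
    }

    // Initialization-on-demand holder: the instance is created lazily
    // on first access, and JVM class-loading rules make it thread-safe
    // without explicit synchronization.
    private static class Holder {
        static final DbManager INSTANCE = new OracleManager();
    }

    public static DbManager instance() { return Holder.INSTANCE; }

    public static void main(String[] args) {
        System.out.println(instance().vendor());      // oracle
        System.out.println(instance() == instance()); // true: one instance
    }
}
```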
- Developed screen-scraping programs to read and write data to the mainframe.
- Developed Korn shell programs to build the entire code baseline on the AIX platform with other vendor products such as Tuxedo middleware.
Confidential, Rockville, MD
- Designed and implemented state-of-the-art aircraft communication components: Automatic Dependent Surveillance (ADS) and Controller Pilot Data Link Communications (CPDLC) of the air traffic controller software, for aircraft equipped with satellite surveillance flying in oceanic and non-radar coverage regions.
- Developed the Context Manager (CXM) in C++ in a distributed system to retrieve or register aircraft information in dual Flight Data Processors and Radar Processors, along with communication establishment, information status, etc., directly from controllers to pilots.
- Designed and developed the software to run on a dual redundant distributed system using object-oriented design, checkpointing, client-server and publisher/subscriber (Observer pattern) models, and fault-tolerant collections to ensure automatic data recovery (data reconstitution) and system availability.
- Programmed flight-deck interfaces using Ada and C++.
- Deployed air traffic controller software on multiple processors and console stations. Set up the entire air traffic control system and simulated flight traffic scenarios using Confidential software.
- Implemented PL/SQL scripts to create tables, constraints, triggers, and procedures for the airport facilities database. Implemented Oracle Forms to validate and generate system initialization data.
- Implemented Pro*C programs to persist and generate airport adaptation data.
- Implemented shell scripts to invoke Pro*C from Oracle Forms.
- Performed database administration tasks such as exporting, importing, and cloning databases.