
Senior Software Engineer Resume


Los Angeles

OBJECTIVE:

A challenging position as Principal Software Engineer or Senior Big Data Engineer

SUMMARY:

  • Hands-on experience getting new projects off the ground with a full range of skills: requirements analysis, architecture design, prototyping, coding, all the way to production deployment.
  • Very quick learner of both emerging technologies and overall business perspectives.
  • Deep cross-disciplinary engineering skills, including Linux systems and networking, root cause analysis, and troubleshooting, with a proven track record.
  • Very comfortable in language-agnostic, heterogeneous development environments: SQL vs. NoSQL, Hadoop vs. Spark, etc. Long-term goal: Big Data architect.
  • 15+ years in software development: Java (15+ years, including experience writing multi-threaded KafkaConsumer applications), Ruby (4+ years), Scala (2+ years), Python (2+ years), shell scripting (2+ years), Go (1 year).
  • 10+ years in SQL/relational databases: MySQL, Postgres. 5 years in MongoDB: strong command of the shell and the aggregation framework; experience building replica sets and shards; data migration, the MongoClient Java driver, and Mongo shell JavaScript. 1 year in Cassandra.
  • 10+ years in Linux (both Ubuntu and Red Hat). Daily use of SSH, SCP, Wireshark, etc. Skilled with SVN and GitHub. Expert in Maven and skilled with Gradle (Java), sbt (Scala), rake (Ruby), and go build.
  • 8+ years in JMS and open-source messaging: 4 years in RabbitMQ and 2+ years in Kafka.
  • 10+ years in Spring Framework, 6 years in REST/JSON, 2 years in Spring Boot.
  • 8+ years in application servers: Tomcat, embedded Jetty; 3+ years in Apache web server (virtual hosts, mod_jk, mod_rewrite); 1 year in Docker.
  • 4+ years in the Hadoop ecosystem: experience building a Cloudera Hadoop cluster, plus 3 years with Spark and DataFrames: deploying Spark applications to a Hadoop YARN cluster (built and configured locally) and to AWS EMR (SSH, tunneling, event-log debugging, and performance tuning). Demo available upon request. Experience with Hive and Sqoop.
  • 3+ years in cloud computing: accessing AWS EC2 and S3 using the AWS CLI. Experience using EC2 API tools to start AMI instances and read/write gigabytes of data to and from S3.
  • Work on various private projects hosted in my GitHub account, including Kafka Streams projects (running integration tests on an embedded server) and a Spark tutorial 2 project (movie recommendation using Spark DataFrames and ML ALS on Spark/EMR).
  • Wrote various algorithms, including Dijkstra's shortest path, K-means clustering, and Knapsack (in Java); Extended Euclidean and Chinese Remainder (in Ruby); Anagram, Water Pouring, Bloxorz solver, and Sudoku solver (in Scala); working experience writing recursive parsers.
  • Lifelong learner. Active in the Coursera and edX online course communities. Honored to serve as a Community TA for the ‘Functional Programming Principles in Scala’ course on Coursera.

EXPERIENCE:

Confidential, Los Angeles

Senior Software Engineer

Responsibilities:

  • Coded and migrated Danube's RabbitMQ interface to Apache Kafka 0.10. A solid understanding of the differences between RabbitMQ and Kafka led to a sound migration strategy.
  • Understand the CAP theorem, the trade-off between consistency and availability in distributed systems (e.g., Kafka's leader-election choices), and consistent hashing.
  • Added a Danube envelope to capture the essential RabbitMQ property headers and the data body, since Kafka did not support record headers until version 0.11.
  • A single thread-safe KafkaProducer publishes messages with well-distributed keys for partitioning.
  • Added multiple KafkaConsumers running on multiple threads, shared in a thread-safe way (see the consumer sketch after this list), with the following in mind:
  • Leverage the parallelism of Kafka's multiple partitions.
  • Avoid overloading the computing power of the host running the consumer application.
  • Reduce expensive rebalances during the initial stage when multiple consumers join the group.
  • Evaluated the trade-off between possible data loss with auto-commit and possible re-processing of stale data with manual commit, and chose manual commit over auto-commit; Danube discards stale data by design.
  • Chose the ‘earliest’ auto.offset.reset strategy to avoid having to synchronize producer and consumer applications when they roll out to new environments.
  • Troubleshot a co-worker’s bug caused by thread-unsafe coding practices by identifying the classic symptoms: a) the error could not be reproduced in unit tests (a single-threaded environment); b) the same messages produced inconsistent results across multiple integration runs. Fixed the bug by changing the offending instance variable (allocated on the shared heap) to a method-local variable (on the thread stack); see the sketch after this list.
  • Added an Avro schema with serialization and deserialization on top of Kafka messages to leverage Avro's compact data format and schema-version management (see the Avro sketch after this list).
  • Changed Danube-Core so that a marshaler with a different output format (Avro JSON in this case) can be injected dynamically, and so that multiple input models are supported for transformations.
  • Wrote two new recursive parsers, one converting XSD to the Danube LdmModel and one converting Avro GenericRecord to Danube ErMessage based on the LdmModel, plus a simulator that converts Avro GenericRecord between two different schemas.
  • Processed Confidential Resolver logs with common fields to generate a sorted, detailed discrepancy report.
  • Filtered and parsed Danube Transformer logs from both systems into common objects with an additional numeric field (0 or 1), then unioned them together to generate a side-by-side grouping report.
  • Applied the same methodology to the resolver logs to generate a crosstab summary report flagged with discrepancy levels, using nested ‘when’/‘otherwise’ and other SQL functions.
  • Moved data to AWS S3. Automated the instantiation of AWS EMR instances and the deployment of the above application using the AWS CLI (in Python).
  • Cleared out a possible Confidential lag and identified the root cause: the insertion of 20K+ entries into a MySQL queue table with 20K+ individual SQL statements.
  • Implemented Spring JdbcTemplate batchUpdate with rewriteBatchedStatements enabled and “ Confidential ” to ensure no rollback due to duplicate keys (see the batch-insert sketch after this list).
  • Performed volume tests of up to 100K rows per batch and stress tests of up to 50K rows per thread across 16 threads in a multi-threaded environment, against a similarly structured table in the same database.
  • Built a generic aggregator framework that aggregates multiple Confidential Matador resources into FanTV content and allows new resources to be added seamlessly.
  • Overcame a technical difficulty integrating with the FanTV pipeline endpoint running Kafka 0.7 (incubating), built with legacy Scala 2.8.
  • Principal data engineer for ingestion of terabytes of data at a processing speed of 2M+ records/hour. Pushed data production releases, including deployment, planning, tune-up, troubleshooting, and improvement (in Java). Automated the data ingestion/submission scheduling process (in Python).
  • Domain expert on RabbitMQ: set up a loopback exchange and configured a handler for extension routing keys.
  • Tremendous performance gains: cut the analysis report time for very large collections (80 million records / 53 terabytes each) to within an hour per resource using the sound methodology explained in the next bullet; it previously took 40 hours using brute force.
  • Used JavaScript (native to the MongoDB shell) multi-threaded programming to cut down execution time: splitVector, an internal MongoDB sharding command, splits the data evenly by chunk size, and data splits are assigned to threads at random to ensure fair results.
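A minimal sketch of the multi-threaded consumer setup described above, written against the Kafka 0.10 Java client. The broker address, group id, and topic name are hypothetical, and the sketch simplifies to one KafkaConsumer per thread (KafkaConsumer itself is not thread-safe); auto-commit is disabled in favor of a manual commitSync after processing, and auto.offset.reset is set to ‘earliest’.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One KafkaConsumer per thread: the thread count should not exceed the number of
// partitions, and should respect the computing power of the host machine.
public class MultiThreadedConsumerSketch {

    public static void main(String[] args) {
        int numConsumers = 4; // hypothetical; tune to partition count and host capacity
        ExecutorService pool = Executors.newFixedThreadPool(numConsumers);
        for (int i = 0; i < numConsumers; i++) {
            pool.submit(MultiThreadedConsumerSketch::runConsumer);
        }
    }

    private static void runConsumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "danube-consumers");        // hypothetical
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");          // manual commit
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");        // no producer/consumer rollout sync needed
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // The consumer is created and used only inside this thread.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("danube-events")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // application logic; stale data is discarded downstream
                }
                consumer.commitSync(); // commit only after successful processing
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d key=%s%n",
                record.partition(), record.offset(), record.key());
    }
}

Committing only after processing trades a risk of re-processing for protection against data loss, matching the manual-commit choice made above.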
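A small, purely illustrative example of the thread-safety fix described above; the class and field names are hypothetical and not from the actual codebase.

// Hypothetical illustration of the fix: shared mutable state becomes thread-confined state.
public class MessageTransformer {

    // BEFORE (thread-unsafe): a mutable instance field lives on the shared heap,
    // so concurrent calls to transform() overwrite each other's intermediate state.
    // private StringBuilder buffer = new StringBuilder();

    // AFTER (thread-safe): the buffer is a method-local variable on each thread's own stack.
    public String transform(String payload) {
        StringBuilder buffer = new StringBuilder();
        buffer.append("<envelope>").append(payload).append("</envelope>");
        return buffer.toString();
    }
}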
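A minimal sketch of Avro serialization and deserialization for Kafka message values, assuming a hypothetical two-field schema; the real Danube schemas and their version management are not shown.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class AvroKafkaPayloadSketch {

    // Hypothetical schema for illustration only.
    private static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"ErMessage\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"body\",\"type\":\"string\"}]}");

    // Serialize a GenericRecord to the compact Avro binary form used as the Kafka message value.
    public static byte[] serialize(GenericRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Deserialize the Kafka message value back into a GenericRecord.
    public static GenericRecord deserialize(byte[] bytes) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return new GenericDatumReader<GenericRecord>(SCHEMA).read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("id", "42");
        record.put("body", "hello");
        System.out.println(deserialize(serialize(record)));
    }
}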
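A minimal sketch of the batched-insert fix described above, with a hypothetical queue table and column names. Thousands of single-row INSERT statements are replaced by one JdbcTemplate batchUpdate; rewriteBatchedStatements=true on the MySQL JDBC URL lets the driver rewrite the batch into multi-row inserts, and the redacted duplicate-key clause from the bullet above is omitted here.

import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class QueueBatchWriter {

    // JDBC URL (configured elsewhere), with batch rewriting enabled (hypothetical):
    // jdbc:mysql://localhost:3306/app?rewriteBatchedStatements=true

    private final JdbcTemplate jdbcTemplate;

    public QueueBatchWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Insert all entries in a single JDBC batch instead of 20K+ individual statements.
    public int[] enqueueAll(List<QueueEntry> entries) {
        String sql = "INSERT INTO queue_table (entry_id, payload) VALUES (?, ?)"; // hypothetical table/columns
        return jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                QueueEntry entry = entries.get(i);
                ps.setLong(1, entry.getId());
                ps.setString(2, entry.getPayload());
            }

            @Override
            public int getBatchSize() {
                return entries.size();
            }
        });
    }

    // Minimal value object for the sketch.
    public static class QueueEntry {
        private final long id;
        private final String payload;

        public QueueEntry(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }

        public long getId() { return id; }
        public String getPayload() { return payload; }
    }
}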

Confidential

Senior Software Engineer

Responsibilities:

  • Avoided retrieving video aggregate data from Confidential for duplicate videos by implementing a secondary sort algorithm (see the sketch after this list).
  • Stored the JSON responses returned from Confidential in a Hive table and interpreted the JSON records correctly by applying a well-written third-party JsonSerDe.
  • Used MultipleOutputs to output preference and Confidential data simultaneously, then created Hive external tables referencing those outputs. Generated the final metrics by joining the above Hive tables.
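A minimal sketch of the secondary-sort idea described above, with hypothetical key and field names: a composite key sorts the records for each video so that the one copy to keep arrives first in its reduce group, while partitioning and grouping look only at the video id, letting the reducer skip the duplicates.

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Composite key: natural key (videoId) plus a secondary field (timestamp) used only for sorting.
public class VideoKey implements WritableComparable<VideoKey> {
    private String videoId = "";
    private long timestamp;

    public VideoKey() { }

    public VideoKey(String videoId, long timestamp) {
        this.videoId = videoId;
        this.timestamp = timestamp;
    }

    public String getVideoId() { return videoId; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(videoId);
        out.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        videoId = in.readUTF();
        timestamp = in.readLong();
    }

    // Sort by videoId, then by timestamp descending, so the newest record is first in its group.
    @Override
    public int compareTo(VideoKey other) {
        int cmp = videoId.compareTo(other.videoId);
        return cmp != 0 ? cmp : Long.compare(other.timestamp, timestamp);
    }

    // Partition on videoId only, so all records for a video land in the same reducer.
    public static class VideoPartitioner extends Partitioner<VideoKey, Object> {
        @Override
        public int getPartition(VideoKey key, Object value, int numPartitions) {
            return (key.getVideoId().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Group on videoId only, so duplicates fall into one reduce() call;
    // the reducer keeps the first value and skips the rest.
    public static class VideoGroupingComparator extends WritableComparator {
        public VideoGroupingComparator() {
            super(VideoKey.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return ((VideoKey) a).getVideoId().compareTo(((VideoKey) b).getVideoId());
        }
    }
}

// Driver wiring (sketch):
// job.setPartitionerClass(VideoKey.VideoPartitioner.class);
// job.setGroupingComparatorClass(VideoKey.VideoGroupingComparator.class);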

Confidential, Los Angeles

Senior Software Engineer

Responsibilities:

  • Corrected an issue caused by a mixture of Hadoop MR1 (JobTracker) and MR2/YARN (ResourceManager) and recovered by reconfiguring the corrupted NameNode and DataNode. Successfully migrated hadoop.tmp.dir to a partition with capacity for growth, which corrected many unexplained system errors.
  • Rewrote client-report transaction extracts as a one-step automated process. Utilized Sqoop to import data directly from MySQL into Hive/HCatalog tables. Accomplished Hive dynamic partitioning using the new features provided by Sqoop 1.4.4's integration with HCatalog.

Confidential, Thousand Oaks

Senior Software Engineer

Responsibilities:

  • The company receives a partial share of the postal savings from aggregators when mail recipients choose ‘Go Paperless’.
  • This is intended to replace prolonged verification via mail. We integrated with third-party APIs to verify that the address and last name corresponding to the input phone number indeed match Confidential records, and sent a verification code to users via SMS or voicemail.
  • To comply with the PCI Data Security Standard, Confidential ensures that customers’ PII (Personally Identifiable Information) in the database is secure. Confidential generates encryption keys using the AES-256 algorithm for maximum security, and implements a full set of security measures: master key, encryption key, and master key rotation (see the key-generation sketch after this list).
  • I also added an enhancement allowing variables in patches so that we can define different values in the profile YAML per environment. We accomplished this using Ruby ERB, binding, and metaprogramming.
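A minimal sketch, in Java, of AES-256 key generation and field-level encryption of the kind described above. The GCM mode, IV handling, and sample value are assumptions for illustration; the actual Confidential master-key wrapping and rotation scheme is not shown.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class PiiEncryptionSketch {

    public static void main(String[] args) throws Exception {
        // Generate a 256-bit AES data-encryption key. In the real system this key would
        // itself be protected by a master key and rotated; that part is omitted here.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey dataKey = keyGen.generateKey();

        // Encrypt a PII field with AES/GCM (authenticated encryption) and a random 12-byte IV.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("555-01-2345".getBytes(StandardCharsets.UTF_8)); // hypothetical PII value

        // Decrypt with the same key and IV to verify the round trip.
        Cipher decipher = Cipher.getInstance("AES/GCM/NoPadding");
        decipher.init(Cipher.DECRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        String plaintext = new String(decipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        System.out.println(plaintext);
    }
}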

Confidential, Los Angeles

Lead Software Engineer

Responsibilities:

  • Created a customized Analyzer with stemmers, filters, and an enhanced stop-word list to improve indexing and searching (see the Analyzer sketch after this list).
  • Added advanced queries with additional NumericRangeQuery, PhraseQuery, and SpanQuery groups.
  • Provided spell-check suggestions based on LevenshteinDistance when no match was found.
  • Maintained the Solr server.
  • Completely architected and rewrote Digital Dailies and successfully launched the application company-wide.
  • All users can access it internally or over the internet on multiple devices: desktop, iPad, and iPhone.
  • Able to accommodate the business need to upload media files larger than 10 GB (an applet-based, client-side FTP upload solution).
  • Developed feeds using flexible Velocity templates injected with data from the PANDA database, user-defined variables, and Spring configuration, tremendously reducing development lead time.
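A minimal sketch of a customized Lucene Analyzer along the lines described above, assuming a Lucene 7.x-style API and a hypothetical set of extra stop words; it chains a standard tokenizer with lowercasing, stop-word removal, and Porter stemming.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

import java.util.Arrays;

// Custom Analyzer: standard tokenization, lowercasing, an enhanced stop-word list,
// and Porter stemming, applied identically at index time and query time.
public class DailiesAnalyzer extends Analyzer {

    private final CharArraySet stopWords;

    public DailiesAnalyzer() {
        // Start from the default English stop set and add domain-specific words (hypothetical).
        CharArraySet enhanced = CharArraySet.copy(EnglishAnalyzer.getDefaultStopSet());
        enhanced.addAll(Arrays.asList("dailies", "episode", "take"));
        this.stopWords = CharArraySet.unmodifiableSet(enhanced);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        TokenStream result = new LowerCaseFilter(source);
        result = new StopFilter(result, stopWords);
        result = new PorterStemFilter(result);
        return new TokenStreamComponents(source, result);
    }
}

The same analyzer instance would be handed to both the IndexWriterConfig and the query parser so that indexing and searching tokenize text consistently.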
