- 6+ years of experience in the IT industry developing software applications with a variety of software tools and programming languages
- 3+ years of experience in big data development with the Hadoop ecosystem, including Hadoop, Spark, Hive, HDFS, HBase, Cassandra, Sqoop, Flume, Kafka, ZooKeeper, and Oozie
- Proficient in writing big data programs in Scala, Java, and Python
- Experienced with Hadoop distributions including Cloudera CDH, Hortonworks HDP, and MapR Data Platform
- Expertise in writing MapReduce programs with the Hadoop Java API to parse and process unstructured and semi-structured data
- Experienced with the Spark computation engine, including Spark Core, Spark Streaming, Spark SQL, and MLlib, for data processing
- Experienced in writing HiveQL queries and Pig Latin scripts for data cleaning and processing
- Managed batch job scheduling workflows using Oozie
- Experienced in transferring bulk data between Hadoop and RDBMS with Sqoop
- Experienced in ETL processes using a variety of technologies (Flume, Kafka, Sqoop) to ingest data from sources such as AWS S3/Kinesis, web log files, and RDBMS
- Strong understanding of serialization data formats, including SequenceFile, Avro, and Parquet
- Experienced with RDBMS including Oracle 10g, MySQL 5.0, PostgreSQL 9.x
- Worked with NoSQL databases including HBase 0.98, Cassandra 2.0, and MongoDB 3.2
- Experienced with the full web application development life cycle, including planning, analysis, design and development, testing, and maintenance
- Experienced using a variety of web frameworks, including Hibernate, Spring MVC, Python Django, Node.js, and ExpressJS
- Solid fundamentals in Core Java, algorithm design, and object-oriented design (OOD), including Java components such as the Collections Framework, exception handling, the I/O system, and multithreading
- Worked with data visualization tools including Tableau, D3.js, and matplotlib
- Extensive experience in unit testing with JUnit, ScalaTest, and pytest
- Worked with Agile/Scrum, Spiral, and Waterfall methodologies and with tools including Git/GitHub, SVN, Confluence, JIRA, and Jenkins
- Excellent communication and collaboration skills; self-motivated learner with a passion for new technologies
Hadoop/Spark Ecosystem: Hadoop 2.x, MapReduce, Pig 0.12, Hive 0.14, Spark 2.x, Sqoop 1.4.6, Flume 1.6.0, Kafka 2.10, HBase, Cassandra 2.0, ZooKeeper, Oozie
Web Development Frameworks: jQuery, Ajax, AngularJS, Bootstrap, Hibernate, Spring MVC, Node.js, ExpressJS, Python Django
Programming Languages: Java, Scala, Python, C++
Data Analysis & Visualization: MATLAB, Tableau, D3.js, matplotlib
Cloud Platforms: Amazon Web Services EC2/S3/EMR, Heroku
Environment & Tools: Agile/Scrum, Git/GitHub, SVN, Confluence, JIRA, Jenkins, Waterfall
Databases: Oracle 11g, MySQL 5.x, PostgreSQL 9.x, Cassandra, HBase 0.98, MongoDB 3.2
IDEs: IntelliJ, Eclipse, Sublime Text
Confidential, Durham, NC
Big Data Developer
- Worked on building an omni-channel, self-learning recommendation engine that understands customers' preferences and provides tailor-made financial solutions to help them accomplish their lifelong financial goals.
- Worked on a real-time analytics environment to process and analyze petabyte-scale data using a variety of big data technologies, including Kafka, Flink, Spark, NiFi, ELK, Nginx, HDFS, and Cassandra.
- Developed a microservice module with Java Spring Boot to read/write data from Kafka topics and compute scores based on real-time and batch data.
- Developed a Kafka Streams module in Java to read/write data from Kafka topics and join real-time data with batch data.
- Developed a Queryable State module for Flink in Scala to query streaming data, extending the framework's functionality.
- Built a NiFi flow that runs daily to migrate data from the data warehouse to Kafka.
- Added authentication to the Flink dashboard using Nginx to secure access to running Flink jobs.
- Developed a functional test framework and wrote JUnit tests to verify module functionality across environments, including DEV, SIT, PERF, and PROD.
- Created Ansible playbooks that automate building new backend infrastructure for the tech stack across environments and decommissioning old data nodes.
- Used JMeter to run performance tests on different modules in the PERF environment.
- Helped develop an Elasticsearch-based solution to collect system and application logs and send alerts.
- Followed the Agile/Scrum methodology and collaborated on work tracking with Confluence, JIRA, Git, Stash, and Jenkins.
Environment: Flink 1.5, Kafka 3.3.1, Kafka Streams, NiFi 1.7, HDFS, ELK, Scala 2.1, Java 8, Java Spring, ScalaTest, Cassandra 2.0, Oracle 11g, IntelliJ
Confidential, Needham, MA
Big Data Developer
- Worked in an Agile/Scrum environment
- Wrote HiveQL queries for data cleaning and filtering to ensure data reliability
- Developed multiple Hive UDFs in Java to process specific data sets, such as unstructured web log data
- Used Hive optimization techniques, including partitioning and bucketing, to query data more efficiently
- Developed Spark programs in Scala, applying functional programming principles to process complex structured and unstructured data sets
- Created RDDs and DataFrames in Spark to perform transformations and actions with Scala and Spark SQL, and used Spark Streaming to process clickstreams
- Extensively involved in installing and configuring big data tools such as Spark, Hadoop, Flume, and Hive
- Used Flume to collect, aggregate, and move large amounts of semi-structured and unstructured data from web log files
- Used Sqoop to efficiently transfer bulk data from RDBMS (Oracle database) to HDFS
- Converted raw data into serialization formats such as Avro and Parquet to reduce data processing time and improve data transfer efficiency over the network
- Stored the processed data in Cassandra for further analysis and processing
- Worked with the data science team to build statistical models with Spark MLlib and PySpark
- Worked with the BI team to prepare data visualizations in Tableau for reporting
- Collaborated and tracked work using Confluence, Git, and JIRA
- Designed unit tests using ScalaTest and pytest
Environment: Spark 1.6, Spark SQL, Spark Streaming, HDFS, Scala 2.1, Java 8, Sqoop 1.4.6, Flume 1.6.0, Hive 0.14, PySpark, MLlib, Tableau 9.2, ScalaTest, pytest, Cassandra 2.0, Oracle 11g, IntelliJ
Confidential, San Francisco, CA
Big Data Developer
- Worked on importing user data into HDFS and Hive Metastore from RDBMS (Oracle databases) using Sqoop
- Used Apache Flume to aggregate and move data from a variety of data sources to HDFS
- Developed multiple MapReduce jobs in Java for data cleaning and processing
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior
- Applied partitioning and bucketing in Hive and designed both managed and external Hive tables for optimized performance
- Developed custom UDFs in Java to extend Hive and Pig Latin functionality
- Exported the analyzed data to the relational databases using Sqoop for visualization with Tableau and to generate reports for the BI team
Environment: Cloudera CDH 4, Hadoop 2.0, MapReduce, Sqoop, Flume, Pig, Hive, Oracle 10g, Tableau 9.2, Eclipse
Confidential, San Jose, CA
Java Full Stack Developer
- Involved in translating business domain concepts into use cases, sequence diagrams, class diagrams, component diagrams, and implementation diagrams
- Implemented various J2EE Design Patterns such as Model-View-Controller (MVC), Data Access Object, Business Delegate and Transfer Object
- Responsible for analysis and design of the application based on the MVC architecture, using the open-source Spring Framework
- Designed a lightweight model for the product using the Inversion of Control principle and implemented it with the Spring IoC container
- Implemented database connectivity using JDBC and developed SQL queries to manipulate the data
- Developed Data Access Objects (DAOs) using Spring JdbcTemplate to run performance-intensive queries
- Developed unit tests, mock tests, and integration tests using JUnit
Confidential, Santa Monica, CA
Java Full Stack Developer
- Designed and implemented SOAP-based web services with Spring-WS
- Implemented controller, page handler, and service classes using Spring MVC, Spring IoC, and Spring Security
- Developed the data access layer and CRUD operations using the Hibernate framework
- Used SoapUI to test SOAP web service responses, and generated mock services in SoapUI to simulate and test web service functionality
- Frequently wrote SQL in PL/SQL Developer to update and retrieve data from the Oracle database
- Developed unit test cases with JUnit
Environment: Java/J2EE, Spring, Hibernate, SOAP Web Services, JUnit, jQuery, Ajax, JSP, HTML, CSS, UML, Oracle 10g, Eclipse, Maven
Confidential, Los Angeles, CA
Jr. Front-end Developer
- Involved in designing business logic in Java classes using Core Java
- Developed web forms based on designs from the web page design team
- Used Firebug for debugging and element styling
- Maintained front-end consistency across different web browsers