
Big Data Engineer Resume


Piscataway, NJ

SUMMARY:

  • Experience as a Big Data Developer/Engineer and Data Analyst with data modeling, programming, data mining, large-scale data acquisition, transformation and cleaning of structured and unstructured data, data integration, data quality, database architecture, and data governance. Experience automating data extraction and processing for modeling purposes where applicable, and experience with big data storage systems that enable production and research use cases, such as HDFS. Strong experience in building, managing, optimizing, and customizing Data Warehouse, data-streaming, and ETL products and solutions, applying best practices.
  • Experience designing and implementing distributed data processing pipelines using Spark, Hive, Scala, and other tools and languages prevalent in the Hadoop ecosystem, with the ability to design and implement end-to-end solutions. Experience with real-time distributed stream processing frameworks for fast and big data such as Spark Streaming, Impala, Kudu, Kafka, Storm, and Sqoop. Good knowledge of job scheduling and monitoring tools such as Oozie and Zookeeper. Strong experience with data analysis frameworks, including R package development with CPLEX and deployment on big data platforms such as Hadoop and SparkR.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache, Cloudera, and AWS. Experienced in designing, building, and deploying a multitude of applications utilizing almost the entire AWS stack (including EC2, R53, S3, RDS, DynamoDB, SQS, IAM, and EMR), focusing on high availability, fault tolerance, and auto-scaling.
  • Experience working with RDBMS such as Oracle and MySQL, with the Kudu storage engine and the Impala query engine, and with open-source NoSQL technologies such as HBase and MongoDB; experience writing HiveQL queries and Spark SQL for preprocessing and analyzing large volumes of data.
  • Experience with OO software requirement specification and OO software design techniques, including UML use cases, sequence diagrams, and class diagrams, developed using Rational Rose, Visual Paradigm, and Visio. Experience developing user stories and sprint tasks in the Agile Scrum method. Experience with version control and collaboration tools such as Git, CVS, SVN, JIRA, and Jenkins.
  • Expert in developing methods, designs, and evaluation criteria with full back-end stack frameworks such as Spring Boot, Spring MVC, Hibernate, Express.js, and ASP.NET, and with back-end technologies such as JSP, Servlets, and JDBC. Strong experience with RESTful API design, knowledge of data serialization, and familiarity with data formats including SequenceFile, Avro, Parquet, XML, and JSON. Experience with the front-end framework AngularJS and front-end technologies such as HTML, CSS, JavaScript, Bootstrap, jQuery, and AJAX.
  • Experience with software validation technologies such as JUnit, testthat, and ScalaTest, and with project documentation verification.
  • Hands-on experience in solving software design issues by applying design patterns, including the Singleton, Business Delegate, Controller, MVC, Factory, Abstract Factory, DAO, and Template patterns.
  • Experience building and maintaining the Confidential Warehouse Management System (WMS), Transportation Management System (TMS), Order Management System (OMS), and Employee Management System (EMS).

TECHNICAL SKILLS:

Programming Languages: Java, C++, C#, Scala, R, Haskell, PHP, JavaScript

Version Control: Git, CVS, SVN, JIRA, Jenkins

Hadoop/Spark Ecosystem: Hadoop 2.7.3, MapReduce, HDFS, YARN, Kudu, Spark Core 1.6.2, Spark SQL, Spark Streaming, Hive 1.2.1, Impala, Kafka, Storm, Sqoop, HBase

Database: Oracle 11g, MySQL 5.5.51, MongoDB

R Package Development: RCplex, Testthat, RShiny, Rjson, Bootstrap Resample

Front-end Development: AngularJS 1.6.9, HTML 5, CSS 3, JavaScript, ThreeJS, jQuery 2.0.2, Bootstrap 3, AJAX, JSON

Back-end Development: Spring 5.0.4, Spring MVC, Spring Boot, Hibernate 5.3.1, Node.js, Express.js 4.16.4, PHP, ASP.NET 4.7.1, RESTful API, GSON, Servlet, JSP, JDBC, JMail, JUnit, ScalaTest

Software Engineering: Software requirement specification, OO design, UML, automated testing, software code validation, software documentation verification

PROFESSIONAL EXPERIENCE:

Confidential, Piscataway, NJ

Big Data Engineer

Responsibilities:

  • Focused on re-developing the back end of the Purchase Management and Inspection Management functionality with Java Spring MVC and Hibernate, covering purchase plans, purchase applications, purchase orders, receipt notices, inspection applications, purchase inspections, people inspections, seasonal inspections, annual inspections, meeting applications, etc.
  • Used NGINX to listen for and route incoming requests and to record the log data at the source, and used Flume to collect the log files and sink them into HDFS folders named by date and behaviour type, as CSV, Parquet, or other file formats.
  • Used Spark Core and Spark SQL to read the log files and perform ETL on the log data, then wrote the cleaned data into numerical tables and saved them into Hive as Parquet for the data analyst group to analyze (a sketch of this kind of job follows this list).
  • Moved relational database data into HDFS and HBase dynamic partition tables using Sqoop and staging tables. Performed aggregations in Spark by loading data from HDFS, worked with different HDFS file formats such as Parquet and Avro, and increased job performance by tuning parallelism in Spark.
  • Transformed RDDs into DataFrames for querying and analytical purposes. Persisted the DataFrames to the Hive metastore and imported the data into managed tables.
  • Used HBase to store backup data and Hue to run Hive queries, and created daily partitions in Hive to improve performance. Developed, validated, and maintained HiveQL queries; implemented partitioning and bucketing in Hive; designed both managed and external tables for optimized performance; and wrote Hive UDFs.
  • Used JUnit and ScalaTest to test the programs.
  • Used the Agile Scrum method. Helped the Scrum Master design some of the user stories and sprint tasks, and helped write parts of the Requirement Specification Document and Software Design Documentation, such as the sequence, activity, communication, and state diagrams for Purchase Inspection, People Inspection, Meeting Application, etc.
  • Used Git for version control.
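
A minimal Scala sketch of the Spark log-ETL job described above, reading cleaned log lines from HDFS and saving them into Hive as a day-partitioned Parquet table; the HDFS path, the three-field layout, and the table name analytics.clean_access_log are illustrative assumptions rather than the actual project code:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object LogEtl {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("log-etl"))
        val hc = new HiveContext(sc)
        import hc.implicits._

        // Read raw comma-separated log lines from HDFS and keep only well-formed rows.
        // The path and (date, user id, response time) layout are assumed for illustration.
        val cleaned = sc.textFile("hdfs:///logs/2018-06-01/*.csv")
          .map(_.split(","))
          .filter(_.length >= 3)
          .map(f => (f(0), f(1), f(2).trim.toInt))
          .toDF("event_date", "user_id", "response_ms")

        // Save the cleaned records into Hive as Parquet, partitioned by day, so the
        // analyst group can query a single day without scanning the whole table.
        cleaned.write
          .format("parquet")
          .partitionBy("event_date")
          .saveAsTable("analytics.clean_access_log")
      }
    }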

Environment: Spring MVC, Hibernate 5.3.1, Hadoop 2.6, MapReduce, HDFS, Spark 1.6, Kafka 2.10-0.10.0.1, Hive 0.14, HBase 3.0.0, Sqoop 1.4.7, NGINX PLUS R8, Flume 1.7.0, ETL, Ubuntu, Oracle 11g, Zookeeper 3.4.10, Java 8, JUnit 4.8, ScalaTest, SCRUM

Confidential, Winona, MN

Big data Developer

Responsibilities:

  • Used Flume to listen for and collect real-time log data from requests sent to the server and to sink the log data into Kafka message queues, then used consumer groups to consume the real-time log data into Spark Streaming.
  • Used Kafka to capture user behaviour events and record them in Redis, then processed and analyzed them in Spark.
  • Used Spark Streaming for real-time processing of the log data from the message queue, applying both stateless and stateful transformations to the different log types (a sketch of this flow follows this list).
  • Wrote UDFs using Spark SQL and Spark Core for ETL processes including data processing and data storage, transformed the processed data into tables, and stored them in Kudu through Impala for scalable storage and fast queries. Used Impala to analyze the data.
  • Built the pipeline for the OMS and WMS components, including extracting and processing data from customer-submitted orders, sending order information to the TMS and returning transportation information to the OMS, scheduling staffing resources from the ERP, and transforming data for the WMS to schedule storage positions for orders.
  • Designed and created Impala tables and worked on performance optimizations such as partitioning and bucketing in Impala.
  • Implemented custom Impala UDFs and analyzed large data sets by running SQL queries for comprehensive data analysis.
  • Migrated MapReduce jobs and Impala queries to Spark transformations and actions to improve performance.
  • Utilized JDBC to move data between the MySQL database and HDFS.
  • Configured Sqoop incremental import jobs to import updated input data.
  • Converted raw data into serialized formats such as Avro and Parquet to reduce data processing time and increase network transfer efficiency.
  • Wrote unit tests using JUnit and ScalaTest for functionality validation.
  • Involved in application performance tuning and troubleshooting.
  • Collaborated and tracked work with Git and JIRA.
  • Actively participated and provided constructive feedback during daily stand-up meetings and weekly iteration review meetings as part of Scrum development.
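
A minimal Scala sketch of the Kafka-to-Spark-Streaming flow described above, using the Spark 1.6-era direct Kafka stream with one stateless and one stateful transformation; the broker list, the topic name user-behaviour-log, and the checkpoint path are hypothetical placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object BehaviourStream {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("behaviour-stream"), Seconds(5))
        ssc.checkpoint("hdfs:///checkpoints/behaviour-stream") // required for stateful transformations

        // Direct Kafka stream; broker addresses and topic are placeholders.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("user-behaviour-log"))

        // Stateless step: parse each log line into (userId, 1) pairs.
        val events = stream.map { case (_, line) => (line.split(",")(0), 1L) }

        // Stateful step: keep a running event count per user across micro-batches.
        val counts = events.updateStateByKey[Long] { (newValues, state) =>
          Some(state.getOrElse(0L) + newValues.sum)
        }
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }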

Environment: Hadoop 2.5, MapReduce, HDFS, Spark 1.6, Kafka 2.10-0.10.0.1, Impala 2.1.0, Kudu 1.6.0, Flume 1.7.0, Redis 4.0.2, ETL, Ubuntu, Oracle 11g, Zookeeper 3.4.10, Java 8, JUnit 4.8, ScalaTest, SCRUM

GUI Tool for the BOSS Framework, La Crosse, WI

Data Analyst

Responsibilities:

  • Created the R package and wrote the ShinyUI and the ShinyServer to build the RShiny web application structure.
  • Wrote Spark Core and Spark SQL code in SparkR to extract, transform, and load the data, and wrote the results into Hive.
  • Implemented the BOSS Framework using R, Rcplex, and boot to bootstrap and resample the cleaned data, ran the BOSS Framework to obtain the original solution and the Alternative Optimal Solutions, and saved them as a list of new RDDs.
  • Wrote Spark Core code to analyze the solution results, generate tables and diagrams, and save them into HDFS in a folder named after the user's IP address.
  • Built the whole server on Ubuntu; installed and configured Hadoop, the Spark framework, the R shell, CPLEX, and all the R packages used in the project.
  • Used the testthat package for unit testing.
  • Used Git for version control.
  • Wrote the entire Requirement Specification Document, Software Design Document, and thesis.

Environment: R 3.4.0, BOSS Framework, Bootstrap Resample, Alternative Optimal Solutions, Cplex 12.7.1, Rjson 0.2.18, RShiny 1.0.5, Testthat 2.1.1, Spark 1.7, SparkR, Hadoop 2.6, MapReduce, HDFS, Ubuntu, Git

Confidential, Chicago, IL

Data Analyst

Responsibilities:

  • Experienced in loading and transforming large sets of structured and semi-structured data
  • Created Hive tables, analyzed data with Hive queries, and wrote Hive UDFs (a sketch follows this list)
  • Experience using partitioning and bucketing to create Hive tables for performance optimization
  • Experience writing scripts and using crontab for data preprocessing
  • Migrated data between RDBMS and HDFS/Hive with Sqoop
  • Experience defining job flows and writing simple to complex MapReduce jobs
  • Provided cluster coordination services through Zookeeper
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
  • Involved in Unit testing using JUnit and MRUnit
  • Created shell and Python scripts for administration, maintenance, and troubleshooting
  • Involved in reviewing Functional requirements and designing solutions
  • Documented system processes and procedures for future reference
  • Involved in gathering the requirements, designing, development and testing
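
A minimal Scala sketch of the partitioning and bucketing approach described above, issuing HiveQL through a HiveContext; the database, table, and column names (analytics.events, user_id, event_date) are illustrative assumptions:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveTables {
      def main(args: Array[String]): Unit = {
        val hc = new HiveContext(new SparkContext(new SparkConf().setAppName("hive-tables")))

        hc.sql("CREATE DATABASE IF NOT EXISTS analytics")

        // Daily-partitioned, bucketed table: partition pruning limits a query to one
        // day's data, and bucketing on user_id helps joins and sampling on that key.
        hc.sql("""
          CREATE TABLE IF NOT EXISTS analytics.events (
            user_id STRING,
            action  STRING,
            amount  DOUBLE
          )
          PARTITIONED BY (event_date STRING)
          CLUSTERED BY (user_id) INTO 16 BUCKETS
          STORED AS PARQUET
        """)

        // A query that touches only the single requested partition.
        hc.sql(
          "SELECT action, COUNT(*) AS cnt FROM analytics.events " +
          "WHERE event_date = '2016-03-01' GROUP BY action"
        ).show()
      }
    }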

Environment: Hadoop 2.5, MapReduce, HDFS, Spark 1.6, Hive 0.14, Pig 0.15, Sqoop 1.4.2, NGINX PLUS R7, Flume 1.6.0, ETL, Ubuntu, Oracle 11g, Zookeeper 3.4, Java, JUnit 4.8, Python

iFlytek Co., Ltd., Hefei, Anhui, China

Software Engineer

Confidential

Responsibilities:

  • Wrote controllers using Spring controllers and Servlets to handle requests following RESTful API conventions, including registering a driver, changing a driver's status, and updating a driver's information such as name, vehicle information, address, and scores, and sent the JSON data back to the front end in the response.
  • Used Spring object-relational mapping (ORM) and the company's customized Hibernate for database persistence and created DAOs. Implemented the Hibernate ORM framework for interacting with the database.
  • Used Scrum to develop and deploy the project in three months, and used Git as the version control tool to keep track of all work and changes.
  • Held stand-up meetings twice a day and used the SSH tool PuTTY to maintain the database daily.
  • Traveled to clients' locations, helped deploy products to their servers, and made additional custom changes to accommodate differences in data types, etc.

Environment: Java, Spring MVC, Hibernate 4.0, RESTful API, MySQL 5.1.54, SCRUM, GIT, Putty, GSON

Software Engineer

Confidential

Responsibilities:

  • Designed a specific object detection algorithm and experimented with it in MATLAB. The algorithm includes an image pre-processing step that sets a threshold and applies image binarization, then uses Harris corner detection to find and remove points that might be people in the crowd, and finally uses the Hough transform and Canny edge detection to locate the region of the flag (banner) in the image.
  • Ported the algorithm to the Java back end and handed it to the big data group, who wrote a MapReduce version of it.
  • Wrote the Servlet handler to accept form submissions for the specific object detection functionality, save the image or video into HDFS, and record its location information in MySQL through JDBC.
  • Passed the image or video to MapReduce for analysis, then processed the result and extracted information including the boundary of the flag (banner) area in the image or in each video frame, the RGB values of the flag (banner), the text content on the flag (banner), etc.
  • Wrote the result into the JSP and passed it back to the front end.
  • Wrote other Servlet handlers for sub-functionality form submissions, such as retrieving part of the processed result, or past processed results saved in the user's storage, through JDBC, and wrote the results into JSPs and passed them back to the front end.
  • Used Scrum for Agile development and Git for version control.

Environment: Harris Corner Detection, Image Binarization, Hough Transform, Canny Edge Detection, Matlab, Java, Servlet, JSP, JDBC, MySQL 5.1.54, Hadoop 2.5, MapReduce, HDFS, Oozie, SCRUM, GIT, OAuth
