
Big Data Engineer Resume

Somerville, MA

SUMMARY

  • 7+ years of software engineering experience, with a focus on Big Data, Scala and Java
  • Worked in multiple fields including music, finance, insurance, and telecommunication
  • Certified Cloudera Spark and Hadoop Developer & Oracle Java SE 8 Programmer I
  • Worked with Big Data ecosystems including Google Cloud Platform, Cloudera CDH and Hortonworks HDP
  • Extensive knowledge of Big Data with Hadoop, YARN, HDFS, Google Cloud Storage, MapReduce, Spark, Beam, Pub/Sub, Kafka, BigTable, HBase, BigQuery and Hive
  • Experienced in languages such as Java, Scala, Python, SQL and R
  • Experienced with real-time data processing mechanisms such as Google Cloud Pub/Sub, Apache Kafka and Spark Streaming
  • Expertise in the Apache Beam Scio (Scala) API, the Spark Scala API, and MapReduce with Crunch (Java) and Scalding (Scala) for creating data ETL and model-building pipelines (see the pipeline sketch following this summary)
  • Developed data models that power content grouping, ranking, mapping and prediction
  • Wrote queries on BigQuery and Hive with UDFs for data analysis and evaluation
  • Good knowledge of working with Docker containers
  • Experienced in dependency management and workflow scheduling in the Big Data ecosystem
  • Conducted data transformation with data formats such as Avro, Parquet and SequenceFile
  • Adept at using Sqoop to migrate data between RDBMS, NoSQL databases and HDFS
  • Worked with NoSQL databases including BigTable, HBase, Cassandra and MongoDB
  • Worked with RDBMS including MySQL and Oracle
  • Experienced in back-end Java web application development using Apollo, Hibernate, Spring as well as configuring Web Services with SOAP and REST
  • Involved in developing Machine Learning algorithms including Linear Regression, Logistic Regression, K-Means, Decision Trees
  • Worked with Machine Learning libraries including NLTK, Scikit-learn, SciPy
  • Created data visualization with matplotlib, ggplot and Tableau for reports
  • Worked with Windows & Linux operating systems for development
  • Strong knowledge of Linux/Unix Shell Commands
  • Good knowledge of Unit Testing with ScalaTest, JUnit, MRUnit and Pytest
  • Familiar with developing environments like JIRA, Confluence and Agile/Scrum
  • Successfully worked under high pressure and completed projects with tight deadlines
  • A self-motivated learner, challenger and good team player
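
For illustration, a minimal sketch of the kind of Scio (Apache Beam) ETL pipeline referenced above: read raw play records, keep well-formed rows, count plays per track, and write the results. The paths, record layout and object name are hypothetical, not drawn from any actual project.

    import com.spotify.scio._

    // Minimal Scio ETL sketch: count plays per track.
    // Paths and the tab-separated record layout are hypothetical.
    object TrackPlayCounts {
      def main(cmdlineArgs: Array[String]): Unit = {
        val (sc, args) = ContextAndArgs(cmdlineArgs)

        sc.textFile(args("input"))                         // e.g. gs://bucket/plays/part-*
          .map(_.split("\t"))
          .collect { case Array(trackId, _*) if trackId.nonEmpty => trackId }
          .countByValue                                    // SCollection[(String, Long)]
          .map { case (trackId, plays) => s"$trackId\t$plays" }
          .saveAsTextFile(args("output"))

        sc.run().waitUntilFinish()
      }
    }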

TECHNICAL SKILLS

Hadoop Ecosystem: YARN, HDFS, HBase, Hive, Sqoop, Pig, Flume, Zookeeper, Oozie

Google Cloud Platform: Google Cloud Storage, BigTable, BigQuery, DataFlow, Dataproc, Pub/Sub, Apache Beam

Web Development: Java EE, Apollo, Hibernate, Spring, SOAP, REST

Programming Languages: Java, Scala, Python, SQL, R, JavaScript

Database: MySQL, Oracle, MongoDB, HBase, Cassandra

Unit Testing: ScalaTest, JUnit, MRUnit, Pytest

Data Analysis & Visualization: R, Python, Scala, NLTK, Scikit-learn, ggplot, matplotlib, Tableau

Data Pipeline Development: Spark, Scio, Scalding, Crunch, MapReduce, Pub/Sub, Kafka

PROFESSIONAL EXPERIENCE

Confidential, Somerville, MA

Big Data Engineer

Responsibilities:

  • Worked on Google Cloud Platform (GCP) in an Agile development cycle
  • Developed pipelines with the Scio (Scala) API for Apache Beam for data ETL and model building
  • Improved and modified MapReduce Scalding (Scala) and Crunch (Java) data pipelines with models for music content grouping/ranking and third-party catalog mapping
  • Designed Google BigTable schema with Turtle to store third party metadata
  • Developed real-time models with Scala and Pub/Sub to consolidate third-party metadata with Confidential entities in a BigTable-based system to support in-client features (see the Pub/Sub sketch after this list)
  • Migrated data pipelines from the on-premises Hadoop system to GCP, and containerized and configured Spark jobs to run on Cloud Dataproc
  • Built back-end endpoints with Java and the Apollo library for accessing grouped music content
  • Modified REST services for transcoding and ingesting data to Google Cloud Storage
  • Worked with data scientists to explore log data and extract content to power the search model
  • Wrote Python code with Luigi modules to handle workflow dependency resolution
  • Configured Styx to manage Docker container executions and batch job scheduling
  • Wrote ad-hoc SQL on Google BigQuery for analyzing and evaluating large datasets
  • Performed unit testing with ScioTest, ScalaTest and JUnit
  • Maintained a system of multiple projects that powers more than 300 daily-partitioned data storage endpoints
  • Involved in designing products, evaluating work scopes and defining testing metrics
  • Provided technical support to other teams such as music editors, data scientists, downstream data consumers and teams responsible for in-client features
  • Used Git for version control, Jenkins for continuous integration and JIRA for project tracking
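
A sketch of the real-time consolidation flow referenced above, using the google-cloud-pubsub Java client from Scala. The project and subscription names are hypothetical and the BigTable upsert is elided; this shows the consume-and-acknowledge pattern, not production code.

    import com.google.cloud.pubsub.v1.{AckReplyConsumer, MessageReceiver, Subscriber}
    import com.google.pubsub.v1.{ProjectSubscriptionName, PubsubMessage}

    // Streaming consumer sketch: pull third-party metadata updates from a
    // Pub/Sub subscription and hand each message to an upsert routine.
    // Names are hypothetical; the BigTable write is elided.
    object MetadataConsumer {
      def main(args: Array[String]): Unit = {
        val subscription = ProjectSubscriptionName.of("example-project", "metadata-updates")

        val receiver = new MessageReceiver {
          override def receiveMessage(msg: PubsubMessage, consumer: AckReplyConsumer): Unit = {
            val payload = msg.getData.toStringUtf8
            // parse `payload` and upsert the entity into BigTable (elided)
            consumer.ack() // acknowledge only after the message is handled
          }
        }

        val subscriber = Subscriber.newBuilder(subscription, receiver).build()
        subscriber.startAsync().awaitRunning()
        subscriber.awaitTerminated() // block until the subscriber stops
      }
    }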

Environment: Google Cloud Platform, Google Cloud Storage, BigTable, BigQuery, Pub/Sub, Hadoop, HDFS, Cassandra, Docker, Luigi, Styx, Apache Beam, Scio, Spark, MapReduce, Scalding, Crunch, Scala, Java, Python, SQL

Confidential, Newark, NJ

Big Data Engineer

Responsibilities:

  • Worked on CDH 5.x in an Agile development cycle
  • Designed HBase schema for the ingestion of streaming time series data
  • Developed real-time data pipelines with Kafka to receive data from multiple sources
  • Configured Spark Streaming with Kafka to clean and aggregate real-time data, then stored the processed data in HBase (see the sketch after this list)
  • Wrote Spark and Spark SQL in Scala for data ETL and model building, and migrated pipelines from Java MapReduce jobs to Spark
  • Developed time series data analysis models with PySpark
  • Used Sqoop to move data between Oracle and HBase
  • Integrated HBase with Hive, and wrote HiveQL for data analysis and updates
  • Transferred data from HDFS to Tableau and created visualizations for reports
  • Deployed Oozie workflows for job scheduling and execution
  • Performed unit testing with ScalaTest, ScalaCheck, JUnit and Pytest
  • Used Git for version control, JIRA for project tracking and Jenkins for continuous integration
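
A minimal sketch of the Kafka-to-Spark-Streaming leg referenced above, using the spark-streaming-kafka-0-10 integration. The broker, topic, group id and record layout are hypothetical, and the HBase write is left as a placeholder.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    // Consume raw time-series readings, drop malformed rows, and aggregate
    // per device per 10-second batch. The HBase write is elided.
    object SensorStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-timeseries")
        val ssc  = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "timeseries-ingest",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("sensor-readings"), kafkaParams)
        )

        stream
          .map(_.value.split(","))                 // deviceId,timestamp,value
          .filter(_.length == 3)                   // drop malformed rows
          .map(f => (f(0), f(2).toDouble))
          .reduceByKey(_ + _)                      // sum per device per batch
          .foreachRDD(_.foreachPartition { rows =>
            rows.foreach(_ => ())                  // open HBase connection and put rows (elided)
          })

        ssc.start()
        ssc.awaitTermination()
      }
    }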

Environment: Cloudera CDH, AWS, Hadoop, HDFS, HBase, Hive, Oracle, Sqoop, Oozie, Kafka, Spark, Spark Streaming, MapReduce, Scala, Python, Java, Tableau

Confidential, New York, NY

Big Data Engineer

Responsibilities:

  • Worked on Hortonworks Data Platform 2.x with an Agile methodology
  • Designed and built Hive databases with partitioned and bucketed tables
  • Extracted data from MongoDB through MongoDB Connector for Hadoop
  • Used Sqoop to transfer data from RDBMS to HDFS
  • Worked with multiple data formats (Avro, Parquet, CSV, JSON)
  • Wrote customized Hive UDFs and HiveQL for data retrieval and analysis (see the UDF sketch after this list)
  • Worked with Flume to capture web server log data
  • Developed Pig Latin scripts to transform data and load it into HDFS
  • Implemented predictive and statistical models in Python and R with Hadoop MapReduce
  • Created data visualizations in Python and R and built Tableau dashboards for reporting
  • Performed unit testing using Pytest, JUnit and MRUnit
  • Used Git for version control and JIRA for project tracking
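
A minimal sketch of a custom Hive UDF of the kind referenced above, written in Scala here to match the other examples (the originals could equally be Java). The function name and normalization rule are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hive UDF sketch: normalize a free-text column before grouping.
    // Hive resolves the `evaluate` method by reflection.
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase.replaceAll("\\s+", " "))
      }
    }

Packaged into a jar, such a function is registered from HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION, after which it can be called like any built-in.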

Environment: Hortonworks HDP, Hadoop, HDFS, Hive, Pig, Flume, Sqoop, Oracle, MySQL, MongoDB, Java, Python, R, HiveQL, Tableau

Confidential, New York, NY

Big Data Engineer

Responsibilities:

  • Developed MapReduce jobs in Java for data cleaning and transformation (see the mapper sketch after this list), and programmed in R for data analysis
  • Wrote HiveQL and Pig Latin for data retrieval
  • Involved in configuring Hadoop tools including Hive, Sqoop, Pig and R
  • Extracted and loaded data from RDBMS to HDFS using Sqoop
  • Used Flume to transfer log source files to HDFS
  • Performed unit testing using JUnit and MRUnit
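
A sketch of the mapper half of a data-cleaning MapReduce job like those referenced above, against the Hadoop mapreduce API (shown in Scala to match the other examples; the original jobs were Java). The delimiter and expected field count are hypothetical.

    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Mapper-only cleaning sketch: keep rows with the expected number of
    // non-empty fields and emit them unchanged.
    class CleanRecordsMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
      private val ExpectedFields = 5 // hypothetical

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
        val fields = value.toString.split(",", -1)
        if (fields.length == ExpectedFields && fields.forall(_.trim.nonEmpty))
          context.write(NullWritable.get, value)
      }
    }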

Environment: Hadoop, AWS, HDFS, YARN, MapReduce, Hive, Pig, Flume, Sqoop, Zookeeper, Oracle, Java, R

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed web pages using the Struts view component (JSP) with JavaScript, HTML, jQuery and AJAX to create user interface views for third-party applications
  • Implemented client-side application to invoke SOAP and REST Web Services
  • Developed and configured Java beans using the Spring and Hibernate frameworks

Environment: Java, JSP, JavaScript, HTML, jQuery, Hadoop, MapReduce, SOAP, REST, Hibernate

Confidential

Front End Developer

Responsibilities:

  • Developed pages with HTML, CSS, JavaScript and jQuery, along with supporting JavaScript libraries
  • Designed browser-based web pages that support many different screen resolutions, and performed user interface testing to check the compatibility of web pages
  • Worked with Java back-end, utilizing AJAX to pull in and parse XML

Environment: HTML, JavaScript, Java, CSS, AJAX, jQuery, XML
