Big Data Engineer Resume
Somerville, MA
SUMMARY
- 7+ years of experience in computer science, with a focus on Big Data, Scala, and Java
- Worked in multiple fields including music, finance, insurance, and telecommunication
- Certified Cloudera Spark and Hadoop Developer & Oracle Java SE 8 Programmer
- Worked with Big Data ecosystems including Google Cloud Platform, Cloudera CDH, and Hortonworks HDP
- Extensive knowledge of Big Data with Hadoop, YARN, HDFS, Google Cloud Storage, MapReduce, Spark, Beam, Pub/Sub, Kafka, BigTable, HBase, BigQuery and Hive
- Experienced in languages such as Java, Scala, Python, SQL and R
- Experienced with real-time data processing systems such as Google Cloud Pub/Sub, Apache Kafka, and Spark Streaming
- Expertise in the Apache Beam Scio (Scala) API, the Spark Scala API, and MapReduce with Crunch (Java) and Scalding (Scala) for building data ETL and model-building pipelines
- Developed data models that power content grouping, ranking, mapping, and prediction
- Wrote queries on BigQuery and Hive, including custom UDFs, for data analysis and evaluation
- Good knowledge of working with Docker containers
- Experienced in dependency management and workflow scheduling in the Big Data ecosystem
- Performed data transformation across formats such as Avro, Parquet, and SequenceFile
- Adept at using Sqoop to migrate data between RDBMS, NoSQL databases and HDFS
- Worked with NoSQL databases including BigTable, HBase, Cassandra and MongoDB
- Worked with RDBMS including MySQL and Oracle
- Experienced in back-end Java web application development using Apollo, Hibernate, Spring as well as configuring Web Services with SOAP and REST
- Involved in developing Machine Learning algorithms including Linear Regression, Logistic Regression, K-Means, Decision Trees
- Worked with Machine Learning libraries including NLTK, Scikit-learn, SciPy
- Created data visualization with matplotlib, ggplot and Tableau for reports
- Worked with Windows & Linux operating systems for development
- Strong knowledge of Linux/Unix Shell Commands
- Good knowledge of Unit Testing with ScalaTest, JUnit, MRUnit and Pytest
- Familiar with development environments and processes such as JIRA, Confluence, and Agile/Scrum
- Successfully worked under high pressure and completed projects with tight deadlines
- A self-motivated learner, challenger and good team player
TECHNICAL SKILLS
Hadoop Ecosystem: YARN, HDFS, HBase, Hive, Sqoop, Pig, Flume, Zookeeper, Oozie
Google Cloud Platform: Google Cloud Storage, BigTable, BigQuery, DataFlow, Dataproc, Pub/Sub, Apache Beam
Web Development: Java EE, Apollo, Hibernate, Spring, SOAP, REST
Programming Languages: Java, Scala, Python, SQL, R, JavaScript
Database: MySQL, Oracle, MongoDB, HBase, Cassandra
Unit Testing: ScalaTest, JUnit, MRUnit, Pytest
Data Analysis & Visualization: R, Python, Scala, NLTK, Scikit-learn, ggplot, matplotlib, Tableau
Data Pipeline Development: Spark, Scio, Scalding, Crunch, MapReduce, Pub/Sub, Kafka
PROFESSIONAL EXPERIENCE
Confidential, Somerville, MA
Big Data Engineer
Responsibilities:
- Worked on Google Cloud Platform (GCP) in an Agile development cycle
- Developed pipelines with Apache Beam's Scio (Scala) API for data ETL and model building (see the sketch at the end of this role)
- Improved and modified MapReduce data pipelines in Scalding (Scala) and Crunch (Java), with models for music content grouping/ranking and third-party catalog mapping
- Designed Google BigTable schemas with Turtle to store third-party metadata
- Developed real-time models with Scala and Pub/Sub to consolidate third-party metadata with Confidential entities in a BigTable-based system, supporting in-client features
- Migrated data pipelines from an on-premises Hadoop system to GCP; dockerized and configured Spark jobs to run on Cloud Dataproc
- Built back-end endpoints with Java and the Apollo library for accessing grouped music content
- Modified REST services for transcoding and ingesting data to Google Cloud Storage
- Worked with data scientists to explore log data and extract content to power search model
- Wrote Python code with Luigi modules to handle workflow dependency resolution
- Configured Styx to manage Docker container execution and batch job scheduling
- Wrote ad-hoc SQL on Google BigQuery to analyze and evaluate large datasets
- Performed unit testing with ScioTest, ScalaTest and JUnit
- Maintained a system spanning multiple projects that powers more than 300 daily partitioned data storage endpoints
- Involved in designing products, evaluating work scopes and defining testing metrics
- Provided technical support to other teams, such as music editors, data scientists, downstream data consumers, and teams responsible for in-client features
- Used Git for version control, Jenkins for continuous integration and JIRA for project tracking
Environment: Google Cloud Platform, Google Cloud Storage, BigTable, BigQuery, Pub/Sub, Hadoop, HDFS, Cassandra, Docker, Luigi, Styx, Apache Beam, Scio, Spark, MapReduce, Scalding, Crunch, Scala, Java, Python, SQL
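A minimal sketch of a Scio batch ETL pipeline of the kind described in this role, assuming tab-separated records on Google Cloud Storage; the EtlPipeline name, the input/output argument names, and the two-field record layout are illustrative, not the actual pipeline:

    import com.spotify.scio._

    object EtlPipeline {
      def main(cmdlineArgs: Array[String]): Unit = {
        val (sc, args) = ContextAndArgs(cmdlineArgs)
        sc.textFile(args("input"))                // read raw records from GCS
          .map(_.split("\t"))
          .filter(_.length == 2)                  // drop malformed rows
          .map { case Array(id, name) => s"$id,${name.trim.toLowerCase}" }
          .saveAsTextFile(args("output"))         // write cleaned output back to GCS
        sc.run().waitUntilFinish()
      }
    }

As with any Beam pipeline, such a job would run on Dataflow by passing --runner=DataflowRunner along with project and region options.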
Confidential, Newark, NJ
Big Data Engineer
Responsibilities:
- Worked on CDH 5.x with Agile development cycle
- Designed HBase schema for the ingestion of streaming time series data
- Developed real-time data pipelines with Kafka to receive data from multiple sources
- Configured Spark Streaming with Kafka to clean and aggregate real-time data, then store the processed data in HBase (see the sketch at the end of this role)
- Wrote Spark and Spark SQL code in Scala for data ETL and model building, and migrated pipelines from Java MapReduce jobs to Spark
- Developed time series data analysis models with PySpark
- Used Sqoop to move data between Oracle and HBase
- Integrated HBase with Hive, and wrote HiveQL for data analysis and updates
- Transferred data from HDFS to Tableau and created visualizations for reports
- Deployed workflows in Oozie for scheduling and execution
- Performed unit testing with ScalaTest, ScalaCheck, JUnit and Pytest
- Used Git for version control, JIRA for project tracking and Jenkins for continuous integration
Environment: Cloudera CDH, AWS, Hadoop, HDFS, HBase, Hive, Oracle, Sqoop, Oozie, Kafka, Spark, Spark Streaming, MapReduce, Scala, Python, Java, Tableau
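A minimal sketch of the Kafka-to-Spark-Streaming flow from this role, using the spark-streaming-kafka-0-10 direct stream API; the broker address, sensor-readings topic, and comma-separated record layout are assumptions for illustration, and the HBase write is indicated only as a comment:

    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object SensorStream {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("sensor-stream"), Seconds(10))

        val kafkaParams = Map[String, Object](
          ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "broker:9092",   // placeholder broker
          ConsumerConfig.GROUP_ID_CONFIG -> "sensor-etl",
          ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
          ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer])

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("sensor-readings"), kafkaParams))

        stream.map(_.value)
          .map(_.split(","))               // e.g. "deviceId,timestamp,value"
          .filter(_.length == 3)           // clean: drop malformed records
          .map(f => (f(0), f(2).toDouble))
          .reduceByKey(_ + _)              // aggregate per device within each micro-batch
          .foreachRDD(_.foreach { case (device, total) =>
            println(s"$device -> $total")  // in the real job, each aggregate would become an HBase Put
          })

        ssc.start()
        ssc.awaitTermination()
      }
    }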
Confidential, New York, NY
Big Data Engineer
Responsibilities:
- Worked on Hortonworks Data Platform 2.x with Agile methodology
- Designed and built Hive databases with partitioned and bucketed tables
- Extracted data from MongoDB through MongoDB Connector for Hadoop
- Used Sqoop to transfer data from RDBMS to HDFS
- Worked with multiple data formats (Avro, Parquet, CSV, JSON)
- Wrote custom Hive UDFs and HiveQL for data retrieval and analysis (see the UDF sketch at the end of this role)
- Worked with Flume to capture web server log data
- Developed Pig Latin scripts to transform data and load it into HDFS
- Implemented predictive and statistical models in Python and R on Hadoop MapReduce
- Created data visualizations in Python and R, and built Tableau dashboards for reports
- Performed unit testing using Pytest, JUnit and MRUnit
- Used Git for version control and JIRA for project tracking
Environment: Hortonworks HDP, Hadoop, HDFS, Hive, Pig, Flume, Sqoop, Oracle, MySQL, MongoDB, Java, Python, R, HiveQL, Tableau
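A minimal sketch of a custom Hive UDF of the kind mentioned in this role, written here in Scala against Hive's classic reflection-based UDF API; the NormalizeUdf class name and the normalization logic are illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hive locates the evaluate method by reflection at query time.
    class NormalizeUdf extends UDF {
      def evaluate(input: Text): Text =
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase)  // trim and lowercase a string column
    }

Once packaged into a jar, such a UDF would be registered with ADD JAR and CREATE TEMPORARY FUNCTION, then called like any built-in function in HiveQL.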
Confidential, New York, NY
Big Data Engineer
Responsibilities:
- Developed MapReduce jobs in Java for data cleaning and transformation (see the sketch at the end of this role), and programmed in R for data analysis
- Wrote HiveQL and Pig Latin for data retrieval
- Involved in configuring Hadoop tools including Hive, Sqoop, Pig and R
- Extracted and loaded data from RDBMS to HDFS using Sqoop
- Used Flume to transfer log source files to HDFS
- Performed unit testing using JUnit and MRUnit
Environment: Hadoop, AWS, HDFS, YARN, MapReduce, Hive, Pig, Flume, Sqoop, Zookeeper, Oracle, Java, R
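For the MapReduce-based cleaning described in this role (originally done in Java), a minimal map-only sketch, written here in Scala for consistency with the other examples; the three-field CSV layout and class names are assumptions:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Map-only cleaning pass: keep well-formed three-field CSV records, trimmed.
    class CleaningMapper extends Mapper[LongWritable, Text, Text, Text] {
      override def map(key: LongWritable, value: Text,
                       ctx: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
        val fields = value.toString.split(",").map(_.trim)
        if (fields.length == 3 && fields.forall(_.nonEmpty))
          ctx.write(new Text(fields(0)), new Text(fields.mkString(",")))
      }
    }

    object CleaningJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "record-cleaning")
        job.setJarByClass(classOf[CleaningMapper])
        job.setMapperClass(classOf[CleaningMapper])
        job.setNumReduceTasks(0)            // no reducer needed for a cleaning pass
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[Text])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }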
Confidential
Java/J2EE Developer
Responsibilities:
- Developed web pages using the Struts view component (JSP), JavaScript, HTML, jQuery, and AJAX to create user interface views for third-party applications
- Implemented client-side application to invoke SOAP and REST Web Services
- Developed and configured the Java beans using Spring and Hibernate framework
Environment: Java, JSP, JavaScript, HTML, jQuery, Hadoop, MapReduce, SOAP, REST, Hibernate
Confidential
Front End Developer
Responsibilities:
- Developed front-end features with HTML, CSS, JavaScript, and jQuery, along with other JavaScript libraries
- Designed browser-based web pages that support many different screen resolutions, and performed user interface testing to check the compatibility of web pages
- Worked with Java back-end, utilizing AJAX to pull in and parse XML
Environment: HTML, JavaScript, Java, CSS, AJAX, jQuery, XML