Big Data Engineer Resume
Somerville, MA
SUMMARY
- 7+ years of experience in computer science, with a focus on Big Data, Scala, and Java
- Worked in multiple fields including music, finance, insurance, and telecommunication
- Certified Cloudera Spark and Hadoop Developer & Oracle Java SE 8 Programmer
- Worked with Big Data ecosystems including Google Cloud Platform, Cloudera CDH, and Hortonworks HDP
- Extensive knowledge of Big Data with Hadoop, YARN, HDFS, Google Cloud Storage, MapReduce, Spark, Beam, Pub/Sub, Kafka, BigTable, HBase, BigQuery and Hive
- Experienced in languages such as Java, Scala, Python, SQL and R
- Experienced with real-time data processing systems such as Google Cloud Pub/Sub, Apache Kafka, and Spark Streaming
- Expertise in the Apache Beam Scio (Scala) API, the Spark Scala API, and MapReduce with Crunch (Java) and Scalding (Scala) for building data ETL and model-building pipelines
- Developed data models that power content grouping, ranking, mapping, and prediction
- Wrote queries on BigQuery and Hive, including custom UDFs, for data analysis and evaluation
- Good knowledge of working with Docker containers
- Experienced in dependency management and workflow scheduling in the Big Data ecosystem
- Performed data transformation across formats such as Avro, Parquet, and SequenceFile
- Adept at using Sqoop to migrate data between RDBMS, NoSQL databases and HDFS
- Worked with NoSQL databases including BigTable, HBase, Cassandra and MongoDB
- Worked with RDBMS including MySQL and Oracle
- Experienced in back-end Java web application development using Apollo, Hibernate, Spring as well as configuring Web Services with SOAP and REST
- Involved in developing Machine Learning algorithms including Linear Regression, Logistic Regression, K-Means, Decision Trees
- Worked with Machine Learning libraries including NLTK, Scikit-learn, SciPy
- Created data visualization with matplotlib, ggplot and Tableau for reports
- Worked with Windows & Linux operating systems for development
- Strong knowledge of Linux/Unix Shell Commands
- Good knowledge of Unit Testing with ScalaTest, JUnit, MRUnit and Pytest
- Familiar with development environments and processes such as JIRA, Confluence, and Agile/Scrum
- Successfully worked under high pressure and completed projects with tight deadlines
- A self-motivated learner, challenger and good team player
TECHNICAL SKILLS
Hadoop Ecosystem: YARN, HDFS, HBase, Hive, Sqoop, Pig, Flume, Zookeeper, Oozie
Google Cloud Platform: Google Cloud Storage, BigTable, BigQuery, DataFlow, Dataproc, Pub/Sub, Apache Beam
Web Development: Java EE, Apollo, Hibernate, Spring, SOAP, REST
Programming Languages: Java, Scala, Python, SQL, R, JavaScript
Database: MySQL, Oracle, MongoDB, HBase, Cassandra
Unit Testing: ScalaTest, JUnit, MRUnit, Pytest
Data Analysis & Visualization: R, Python, Scala, NLTK, Scikit-learn, ggplot, matplotlib, Tableau
Data Pipeline Development: Spark, Scio, Scalding, Crunch, MapReduce, Pub/Sub, Kafka
PROFESSIONAL EXPERIENCE
Confidential, Somerville, MA
Big Data Engineer
Responsibilities:
- Worked on Google Cloud Platform (GCP) in an Agile development cycle
- Developed pipelines with Apache Beam's Scio (Scala) API for data ETL and model building (see the sketch at the end of this role)
- Improved and modified MapReduce data pipelines in Scalding (Scala) and Crunch (Java), with models for music content grouping/ranking and third-party catalog mapping
- Designed Google BigTable schemas with Turtle to store third-party metadata
- Developed real-time models with Scala and Pub/Sub to consolidate third-party metadata with Confidential entities in a BigTable-based system, supporting in-client features
- Migrated data pipelines from an on-premises Hadoop system to GCP; dockerized and configured Spark jobs to run on Cloud Dataproc
- Built back-end endpoints with Java and the Apollo library for accessing grouped music content
- Modified REST services for transcoding and ingesting data to Google Cloud Storage
- Worked with data scientists to explore log data and extract content to power search model
- Wrote Python code with Luigi modules to handle workflow dependency resolution
- Configured Styx to manage Docker container execution and batch job scheduling
- Wrote ad-hoc SQL on Google BigQuery to analyze and evaluate large datasets
- Performed unit testing with ScioTest, ScalaTest and JUnit
- Maintained a system spanning multiple projects that powers more than 300 daily partitioned data storage endpoints
- Involved in designing products, evaluating work scopes and defining testing metrics
- Provided technical support to other teams, such as music editors, data scientists, downstream data consumers, and teams responsible for in-client features
- Used Git for version control, Jenkins for continuous integration and JIRA for project tracking
Environment: Google Cloud Platform, Google Cloud Storage, BigTable, BigQuery, Pub/Sub, Hadoop, HDFS, Cassandra, Docker, Luigi, Styx, Apache Beam, Scio, Spark, MapReduce, Scalding, Crunch, Scala, Java, Python, SQL
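A minimal sketch of a Scio batch ETL pipeline of the kind described in this role, assuming tab-separated records on Google Cloud Storage; the EtlPipeline name, the input/output argument names, and the two-field record layout are illustrative, not the actual pipeline:

    import com.spotify.scio._

    object EtlPipeline {
      def main(cmdlineArgs: Array[String]): Unit = {
        val (sc, args) = ContextAndArgs(cmdlineArgs)
        sc.textFile(args("input"))                // read raw records from GCS
          .map(_.split("\t"))
          .filter(_.length == 2)                  // drop malformed rows
          .map { case Array(id, name) => s"$id,${name.trim.toLowerCase}" }
          .saveAsTextFile(args("output"))         // write cleaned output back to GCS
        sc.run().waitUntilFinish()
      }
    }

As with any Beam pipeline, such a job would run on Dataflow by passing --runner=DataflowRunner along with project and region options.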
Confidential, Newark, NJ
Big Data Engineer
Responsibilities:
- Worked on CDH 5.x with Agile development cycle
- Designed HBase schema for the ingestion of streaming time series data
- Developed real-time data pipelines with Kafka to receive data from multiple sources
- Configured Spark Streaming with Kafka to clean and aggregate real-time data, then store the processed data in HBase (see the sketch at the end of this role)
- Wrote Spark and Spark SQL code in Scala for data ETL and model building, and migrated pipelines from Java MapReduce jobs to Spark
- Developed time series data analysis models with PySpark
- Used Sqoop to move data between Oracle and HBase
- Integrated HBase with Hive, and wrote HiveQL for data analysis and updates
- Transferred data from HDFS to Tableau and created visualizations for reports
- Deployed workflows in Oozie for scheduling and execution
- Performed unit testing with ScalaTest, ScalaCheck, JUnit and Pytest
- Used Git for version control, JIRA for project tracking and Jenkins for continuous integration
Environment: Cloudera CDH, AWS, Hadoop, HDFS, HBase, Hive, Oracle, Sqoop, Oozie, Kafka, Spark, Spark Streaming, MapReduce, Scala, Python, Java, Tableau
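A minimal sketch of the Kafka-to-Spark-Streaming flow from this role, using the spark-streaming-kafka-0-10 direct stream API; the broker address, sensor-readings topic, and comma-separated record layout are assumptions for illustration, and the HBase write is indicated only as a comment:

    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object SensorStream {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("sensor-stream"), Seconds(10))

        val kafkaParams = Map[String, Object](
          ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "broker:9092",   // placeholder broker
          ConsumerConfig.GROUP_ID_CONFIG -> "sensor-etl",
          ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
          ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer])

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("sensor-readings"), kafkaParams))

        stream.map(_.value)
          .map(_.split(","))               // e.g. "deviceId,timestamp,value"
          .filter(_.length == 3)           // clean: drop malformed records
          .map(f => (f(0), f(2).toDouble))
          .reduceByKey(_ + _)              // aggregate per device within each micro-batch
          .foreachRDD(_.foreach { case (device, total) =>
            println(s"$device -> $total")  // in the real job, each aggregate would become an HBase Put
          })

        ssc.start()
        ssc.awaitTermination()
      }
    }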
Confidential, New York, NY
Big Data Engineer
Responsibilities:
- Worked on Hortonworks Data Platform 2.x with Agile methodology
- Designed and built Hive databases with partitioned and bucketed tables
- Extracted data from MongoDB through MongoDB Connector for Hadoop
- Used Sqoop to transfer data from RDBMS to HDFS
- Worked with multiple data formats (Avro, Parquet, CSV, JSON)
- Wrote custom Hive UDFs and HiveQL for data retrieval and analysis (see the UDF sketch at the end of this role)
- Worked with Flume to capture web server log data
- Developed Pig Latin scripts to transform data and load it into HDFS
- Implemented predictive and statistical models in Python and R on Hadoop MapReduce
- Created data visualizations in Python and R, and built Tableau dashboards for reports
- Performed unit testing using Pytest, JUnit and MRUnit
- Used Git for version control and JIRA for project tracking
Environment: Hortonworks HDP, Hadoop, HDFS, Hive, Pig, Flume, Sqoop, Oracle, MySQL, MongoDB, Java, Python, R, HiveQL, Tableau
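A minimal sketch of a custom Hive UDF of the kind mentioned in this role, written here in Scala against Hive's classic reflection-based UDF API; the NormalizeUdf class name and the normalization logic are illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hive locates the evaluate method by reflection at query time.
    class NormalizeUdf extends UDF {
      def evaluate(input: Text): Text =
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase)  // trim and lowercase a string column
    }

Once packaged into a jar, such a UDF would be registered with ADD JAR and CREATE TEMPORARY FUNCTION, then called like any built-in function in HiveQL.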
Confidential, New York, NY
Big Data Engineer
Responsibilities:
- Developed MapReduce jobs in Java for data cleaning and transformation (see the sketch at the end of this role), and programmed in R for data analysis
- Wrote HiveQL and Pig Latin for data retrieval
- Involved in configuring Hadoop tools including Hive, Sqoop, Pig and R
- Extracted and loaded data from RDBMS to HDFS using Sqoop
- Used Flume to transfer log source files to HDFS
- Performed unit testing using JUnit and MRUnit
Environment: Hadoop, AWS, HDFS, YARN, MapReduce, Hive, Pig, Flume, Sqoop, Zookeeper, Oracle, Java, R
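For the MapReduce-based cleaning described in this role (originally done in Java), a minimal map-only sketch, written here in Scala for consistency with the other examples; the three-field CSV layout and class names are assumptions:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Map-only cleaning pass: keep well-formed three-field CSV records, trimmed.
    class CleaningMapper extends Mapper[LongWritable, Text, Text, Text] {
      override def map(key: LongWritable, value: Text,
                       ctx: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
        val fields = value.toString.split(",").map(_.trim)
        if (fields.length == 3 && fields.forall(_.nonEmpty))
          ctx.write(new Text(fields(0)), new Text(fields.mkString(",")))
      }
    }

    object CleaningJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "record-cleaning")
        job.setJarByClass(classOf[CleaningMapper])
        job.setMapperClass(classOf[CleaningMapper])
        job.setNumReduceTasks(0)            // no reducer needed for a cleaning pass
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[Text])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }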
Confidential
Java/J2EE Developer
Responsibilities:
- Developed web pages using the Struts view component (JSP), JavaScript, HTML, jQuery, and AJAX to create user interface views for third-party applications
- Implemented client-side application to invoke SOAP and REST Web Services
- Developed and configured the Java beans using Spring and Hibernate framework
Environment: Java, JSP, JavaScript, HTML, jQuery, Hadoop, MapReduce, SOAP, REST, Hibernate
Confidential
Front End Developer
Responsibilities:
- Developed front-end features with HTML, CSS, JavaScript, and jQuery, along with other JavaScript libraries
- Designed browser-based web pages that support many different screen resolutions, and performed user interface testing to check the compatibility of web pages
- Worked with Java back-end, utilizing AJAX to pull in and parse XML
Environment: HTML, JavaScript, Java, CSS, AJAX, jQuery, XML