
Big Data Engineer Resume


Renton, WA

SUMMARY

  • 6+ years of experience in the IT industry, including Big Data and Java/J2EE development
  • Worked in various domains including e-commerce, telecommunications, finance, and investment
  • Strong experience with Hadoop 2.3 ecosystem technologies such as HDFS, YARN, MapReduce, Spark 1.3+, Hive 1.0+, Pig 0.14+, HBase 0.98, Zookeeper 3.4+, Oozie 3.0+, Flume 1.3+, Sqoop 1.3+, Impala 1.2+, and Kafka 0.8+
  • Experienced in building and maintaining multiple Hadoop clusters of different sizes and configurations, setting up rack topology for large clusters, and working in Hadoop administration, architecture, and development roles
  • Developed Java 6+ and Scala 2.10+ applications on Hadoop and Spark Streaming for high-volume and real-time data processing
  • Expertise in Spark Streaming and Spark SQL with Scala
  • Exceptional skills with NoSQL databases such as HBase and Cassandra
  • Experienced in writing Sqoop, HiveQL, and Pig scripts, as well as UDFs, for ELT processing
  • Utilized Kafka and Flume to ingest real-time data streams from different data sources and persist them in HDFS and HBase
  • Experience with AWS Cloud services such as IAM, Data Pipeline, EMR, S3, EC2, and the AWS CLI
  • Experience with Azure Functions, EventHub, Application Insight, and Cosmos DB
  • Experience with other Hadoop ecosystem tools such as ZooKeeper, Oozie, and Impala
  • Experience in using notebooks like Zeppelin and Jupyter
  • Experience in all the phases of Data warehouse life cycle involving requirement analysis, design, coding, testing, and deployment
  • Strong in core Java and data structures, and in Java components such as the Collections Framework, exception handling, I/O, and multithreading
  • Experienced in J2EE development using frameworks such as Spring MVC and Hibernate 3 & 4, as well as web services such as SOAP and REST
  • Familiar with Agile/Scrum development and Test-Driven Development (TDD)
  • Extensive experience in unit testing with JUnit, MRUnit, and Pytest
  • Worked with development tools and processes such as Git, JIRA, Jenkins, Agile/Scrum, and Waterfall
  • Self-motivated team player, able to work under high pressure and on several projects simultaneously, a dynamic problem solver, adept at working with minimal supervision and within high-caliber teams of professionals

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop 2.7.0+, MapReduce, HBase 0.98, Spark 1.6+, Hive 1.0+, Pig 0.14+, Kafka 0.8+, Sqoop 1.3+, Flume 1.3+, Impala 1.2+, Oozie 3.0+, Zookeeper 3.4+

Web Technologies: SOAP, REST, JSP 2.0, JavaScript 1.8, Servlet 3.0, HTML 5, CSS 3

NoSQL: HBase 0.98, Cassandra 2, MongoDB 3

Java Frameworks: Spring MVC, Hibernate 3 & 4

Programming Languages: Java, Scala, SQL, SparkSQL, HiveQL, Pig Latin, C#, C, Python

Others: JSON, AVRO, Parquet, ORC, XML, RabbitMQ 3.0+

PROFESSIONAL EXPERIENCE

Confidential, Renton WA

Big Data Engineer

Responsibilities:

  • Designed and built data pipelines for large amounts of data on the Azure platform
  • Generated heartbeat messages into the pipeline to monitor pipeline health and latency
  • Designed the JSON schema of the tables according to BI requirements
  • Configured EventHub to receive streaming data and used EventFlow to collect cluster data
  • Developed Azure Functions and load-tested them at 8k and 20k messages per second
  • ETL: extracted data from Blob Storage, transferred it to Application Insight for the BI team, and loaded data from EventHub into an S3 bucket
  • Transferred data from S3 to the Snowflake database using an Azure Function
  • Involved in POCs for Kinesis and Alooma
  • Deployed Kafka to the servers and wrote simple producers and consumers (see the sketch after this list)
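
Below is a minimal sketch of the kind of heartbeat producer described above, using the Kafka Java producer client. The broker address, topic name, and payload format are illustrative assumptions, not details from the actual project.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class HeartbeatProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Emit a timestamped heartbeat so a downstream consumer can measure
                // end-to-end pipeline latency and detect a stalled pipeline.
                String payload = "{\"type\":\"heartbeat\",\"ts\":" + System.currentTimeMillis() + "}";
                producer.send(new ProducerRecord<>("pipeline-heartbeat", "heartbeat", payload));
            }
        }
    }

A matching consumer on the monitoring side would compare the embedded timestamp with its own receive time to estimate pipeline latency.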

Environment: Azure, EventHub, Azure Functions, Cosmos DB, Application Insight, Snowflake, Kinesis, Firehose, S3, Lambda, Kafka, Zookeeper, bash, Powershell, Eclipse, Python

Confidential, Seattle WA

Sr. Big Data Developer

Responsibilities:

  • Developed big data applications on the CDH platform and maintained the data processing workflows
  • Designed Hive schemas based on the requirements and performed Hive data migration and validation
  • Applied Sqoop to extract data from MySQL for continual data cleaning and aggregation, then stored the data in HDFS
  • Created HiveQL scripts to process and load data into Hive tables
  • Performed data quality investigations by writing interactive ad-hoc queries in the Spark shell
  • Loaded historical data into HDFS via Spark SQL (see the sketch after this list)
  • Designed tables in Hive and MySQL, and used Sqoop to import and export data between the databases and HDFS
  • ETL: data extraction, management, aggregation, and loading into Hive
  • Performed extensive data monitoring and integrated existing monitoring via DataDog to maintain data workflows
  • Developed a POC using Hive, deployed it on the test cluster, and compared its performance against the old data architecture
  • Developed predictive analytics using Impala and checked data consistency
  • Developed workflow automation in Oozie and defined coordinators to connect upstream workflows and generate trigger files for downstream workflows
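
A minimal sketch of the historical-load pattern described above, assuming Spark 1.x with a HiveContext; the HDFS path, table names, and query are hypothetical.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.hive.HiveContext;

    public class HistoricalLoad {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("HistoricalLoad");
            JavaSparkContext sc = new JavaSparkContext(conf);
            HiveContext hive = new HiveContext(sc.sc());

            // Read the raw historical extract from HDFS (path is an assumption).
            DataFrame raw = hive.read().format("parquet").load("hdfs:///data/raw/orders/");

            // Cleaning/aggregation expressed in SQL, as it would be from the Spark shell.
            raw.registerTempTable("orders_raw");
            DataFrame daily = hive.sql(
                "SELECT order_date, COUNT(*) AS order_cnt, SUM(amount) AS total_amount " +
                "FROM orders_raw WHERE amount IS NOT NULL GROUP BY order_date");

            // Persist the aggregated result as a Hive table for downstream queries.
            daily.write().mode(SaveMode.Overwrite).saveAsTable("warehouse.orders_daily");

            sc.stop();
        }
    }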

Environment: Cloudera, Hadoop, MySQL, HDFS, YARN, Impala, DataDog, Spark, Spark SQL, Zookeeper, Hive, Kafka, Oozie, Eclipse, Scala, Python, UNIX Shell Scripting

Confidential, Seattle WA

Sr. Big Data Developer

Responsibilities:

  • Designed and built scalable infrastructure and platforms for very large-scale data ingestion, aggregation, integration, and advanced analytics in Hadoop, including MapReduce, Spark, Hive, HBase, and Pig
  • Designed HBase schemas based on the requirements and performed HBase data migration and validation
  • Designed and implemented HBase query APIs in Java for BI teams (see the sketch after this list)
  • Applied Spark Streaming to receive data from Kafka for continual data cleaning and aggregation, then stored the data in HBase
  • Worked extensively on the core and Spark SQL modules of Spark alongside Spark Streaming
  • Wrote customized Spark SQL UDFs in Scala and Java
  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output responses
  • Designed tables in Hive and MySQL, and used Sqoop to import and export data between the databases and HDFS
  • ETL: data extraction, management, aggregation, and loading into HBase
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala to improve performance
  • Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL
  • Developed predictive analytics using the Apache Spark Scala APIs
  • Performed HBase and Spark Streaming performance tuning
  • Configured Zookeeper to coordinate and support Kafka, Spark, Spark Streaming, HBase, and HDFS
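
A minimal sketch of an HBase query API of the kind described above, using the HBase 0.98 client; the table name, column family, and qualifier are assumptions for illustration.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    /** Thin lookup API over an HBase table for BI queries (names are assumed). */
    public class CustomerLookup {
        private final HTable table;

        public CustomerLookup() throws IOException {
            Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml
            this.table = new HTable(conf, "customer_events");   // hypothetical table name
        }

        /** Returns the latest event payload for a customer row key, or null if absent. */
        public String latestEvent(String customerId) throws IOException {
            Get get = new Get(Bytes.toBytes(customerId));
            get.addColumn(Bytes.toBytes("d"), Bytes.toBytes("latest"));  // assumed family/qualifier
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("latest"));
            return value == null ? null : Bytes.toString(value);
        }

        public void close() throws IOException {
            table.close();
        }
    }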

Environment: Hadoop 2.6, HDFS, YARN, Spark 1.5, Spark SQL, Spark Streaming, Zookeeper 3.5, HBase 0.98, Cassandra, Hive 1.2, Kafka 1.3, RabbitMQ 3.4.4, Oozie 4.1, Eclipse, Scala, UNIX Shell Scripting

Confidential, Seattle WA

Senior Data Engineer

Responsibilities:

  • Analyzed large datasets by running custom MapReduce jobs and Hive queries
  • Designed HBase schemas based on the requirements
  • Designed Cassandra schemas and connected to them from Spark
  • Performed HBase data migration and validation
  • Assisted in exporting analyzed data to relational databases using Sqoop
  • Utilized Kafka and RabbitMQ to capture the data stream
  • Processed data by using Spark Streaming and Kafka then stored the results in HBase
  • Worked on the core and Spark SQL modules of Spark extensively
  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and performed in-memory computation to generate the output responses
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations, developed in IntelliJ with Scala (see the sketch after this list)
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala to improve the performance
  • Developed predictive analytics using the Apache Spark Scala APIs
  • Scheduled workflows with Oozie
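
A minimal sketch of migrating a classic MapReduce-style aggregation into Spark RDD transformations. It is written in Java for consistency with the other sketches here (the original POC used Scala), and the input and output paths are hypothetical.

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import scala.Tuple2;

    public class WordCountMigration {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("WordCountMigration");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // The same map -> shuffle -> reduce pipeline a MapReduce job expresses
            // in Mapper/Reducer classes, collapsed into a few RDD transformations.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/input/");      // assumed path
            JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")))          // "map" phase
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);                               // "reduce" phase

            counts.saveAsTextFile("hdfs:///data/output/wordcount");          // assumed path
            sc.stop();
        }
    }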

Environment: Hadoop 2.6, HDFS, Spark 1.5, Spark SQL, Spark Streaming, Zookeeper 3.5, HBase 0.98, Hive 1.2, Kafka 0.8, RabbitMQ 3.4.4, Oozie 4.1, IntelliJ, Scala, UNIX Shell Scripting

Confidential, Seattle WA

Senior Data Engineer

Responsibilities:

  • Worked on Confidential EMR 4.x with Agile methodology
  • Developed Kafka consumers to receive real-time data from Kafka and store it in Confidential S3
  • Used Flume to collect, aggregate, and store web log data from different sources
  • Used Sqoop to transfer data between RDBMS and HBase
  • Extracted data from MongoDB through MongoDB Connector for Hadoop
  • Involved in migrating MapReduce programs into Spark using Scala
  • Scheduled workflows with Oozie
  • Used Spark with Scala and Spark SQL for testing and processing data
  • Worked with the analytics team to build statistical models with MLlib and PySpark (see the sketch after this list)
  • Worked with the analytics team to prepare and visualize tables in Tableau for reporting
  • Performed unit testing using JUnit and Pytest
  • Used Git for version control, JIRA for project tracking and Jenkins for continuous integration
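
A minimal sketch of the kind of MLlib model training involved, written in Java for consistency with the other sketches here (the actual work used PySpark); the feature file path, k, and iteration count are illustrative assumptions.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.clustering.KMeans;
    import org.apache.spark.mllib.clustering.KMeansModel;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    public class SessionClustering {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SessionClustering");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Each line is a comma-separated feature vector (path and format are assumptions).
            JavaRDD<Vector> features = sc.textFile("hdfs:///features/sessions.csv")
                .map(line -> {
                    String[] parts = line.split(",");
                    double[] values = new double[parts.length];
                    for (int i = 0; i < parts.length; i++) {
                        values[i] = Double.parseDouble(parts[i]);
                    }
                    return Vectors.dense(values);
                });
            features.cache();

            // Train a simple k-means model; k and iteration count are illustrative.
            KMeansModel model = KMeans.train(features.rdd(), 5, 20);
            System.out.println("Within-set sum of squared errors: "
                + model.computeCost(features.rdd()));

            sc.stop();
        }
    }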

Environment: Hadoop 2.6, Cloudera CDH 5.4, HDFS, MapReduce, HBase, Sqoop, Flume, Zookeeper, MongoDB, Spark 1.4, Spark SQL, Pyspark, MLlib, Tableau 9.2, JUnit, Pytest

Confidential, Seattle WA

Jr. Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
  • Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster
  • Developed simple to complex MapReduce jobs using Java, and scripts using Hive and Pig
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion and egress
  • Implemented business logic by writing UDFs in Java and used various UDFs from other sources (see the sketch after this list)
  • Experienced in loading and transforming large sets of structured and semi-structured data
  • Managed and reviewed Hadoop log files, and deployed and maintained the Hadoop cluster
  • Exported filtered data into HBase for fast querying
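
A minimal sketch of a Hive UDF in Java of the kind described above; the function name and normalization rule are placeholders, not the actual business logic.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    /**
     * Simple Hive UDF that normalizes a free-text region field
     * (the rule here is a placeholder for the real business logic).
     */
    public final class NormalizeRegion extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Once packaged into a jar, such a UDF is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.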

Environment: Hadoop, HBase, Hive, Pig, Map Reduce, Sqoop, Oozie, Eclipse, Java

Confidential

Jr. Java/J2EE/Big Data Developer

Responsibilities:

  • Developed MapReduce programs in Java to help the analysis team read, write, delete, update, and analyze big data (see the sketch after this list)
  • Developed a series of MapReduce algorithms to improve performance
  • Developed web pages using Struts view components, JSP, JavaScript, HTML, jQuery, and AJAX to create the user interface views for migrating 3rd-party applications
  • Used Hibernate to access the database within the system
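
A minimal sketch of a MapReduce program in Java of the kind described above; the record layout and field positions are assumptions for illustration.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EventCount {

        // Emits (eventType, 1) per record; the tab-separated layout is an assumption.
        public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text eventType = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                if (fields.length > 1) {
                    eventType.set(fields[1]);
                    context.write(eventType, ONE);
                }
            }
        }

        // Sums the counts for each event type.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "event count");
            job.setJarByClass(EventCount.class);
            job.setMapperClass(EventMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }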

Environment: Eclipse, Java, Map Reduce, Algorithm, Hibernate, Struts, JavaScript, JQuery, CSS, HTML, MySQL and XML
