
Big Data Engineer Resume


Renton, WA

SUMMARY

  • 6+ years of experience in the IT industry, including Big Data and Java/J2EE development
  • Worked in various domains including e-commerce, telecommunications, finance, and investment
  • Strong experience with Hadoop 2.3 ecosystem technologies such as HDFS, YARN, MapReduce, Spark 1.3+, Hive 1.0+, Pig 0.14+, HBase 0.98, Zookeeper 3.4+, Oozie 3.0+, Flume 1.3+, Sqoop 1.3+, Impala 1.2+, and Kafka 0.8+
  • Experienced in building and maintaining multiple Hadoop clusters of different sizes and configurations, setting up rack topology for large clusters, and working in Hadoop administration, architecture, and development roles
  • Developed Java 6+ and Scala 2.10+ applications on Hadoop and Spark Streaming for high-volume and real-time data processing
  • Expertise in Spark Streaming and Spark SQL with Scala
  • Exceptional skills with NoSQL databases such as HBase and Cassandra
  • Experienced in writing Sqoop, HiveQL, and Pig scripts, as well as UDFs, for ELT processing
  • Utilized Kafka and Flume to ingest real-time data streams from different data sources and persist them in HDFS and HBase
  • Experience with AWS Cloud services such as IAM, Data Pipeline, EMR, S3, EC2, and the AWS CLI
  • Experience with Azure Functions, EventHub, Application Insight, and Cosmos DB
  • Experience with other Hadoop ecosystem tools such as ZooKeeper, Oozie, and Impala
  • Experience in using notebooks like Zeppelin and Jupyter
  • Experience in all the phases of Data warehouse life cycle involving requirement analysis, design, coding, testing, and deployment
  • Strong in core Java and data structures, and in Java components such as the Collections Framework, exception handling, I/O, and multithreading
  • Experienced in J2EE development using frameworks such as Spring MVC and Hibernate 3 & 4, as well as web services such as SOAP and REST
  • Familiar with Agile/Scrum development and Test-Driven Development (TDD)
  • Extensive experience in unit testing with JUnit, MRUnit, and Pytest
  • Worked with development tools and processes such as Git, JIRA, Jenkins, Agile/Scrum, and Waterfall
  • Self-motivated team player, able to work under high pressure and on several projects simultaneously, a dynamic problem solver, adept at working with minimal supervision and within high-caliber teams of professionals

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop 2.7.0+, MapReduce, HBase 0.98, Spark 1.6+, Hive 1.0+, Pig 0.14+, Kafka 0.8+, Sqoop 1.3+, Flume 1.3+, Impala 1.2+, Oozie 3.0+, Zookeeper 3.4+

Web Technologies: SOAP, REST, JSP 2.0, JavaScript 1.8, Servlet 3.0, HTML 5, CSS 3

NoSQL: HBase 0.98, Cassandra 2, MongoDB 3

Java Frameworks: Spring MVC, Hibernate 3 & 4

Programming Languages: Java, Scala, SQL, SparkSQL, HiveQL, Pig Latin, C#, C, Python

Others: JSON, AVRO, Parquet, ORC, XML, RabbitMQ 3.0+

PROFESSIONAL EXPERIENCE

Confidential, Renton WA

Big Data Engineer

Responsibilities:

  • Designed and built data pipelines for large amounts of data on the Azure platform
  • Generated heartbeat messages into the pipeline to monitor pipeline health and latency
  • Designed the JSON schema of the tables according to BI requirements
  • Configured EventHub to receive streaming data and used EventFlow to collect cluster data
  • Developed Azure Functions and load-tested them at 8k and 20k messages per second
  • ETL: extracted data from Blob Storage, transferred it to Application Insight for the BI team, and loaded data from EventHub into an S3 bucket
  • Transferred data from S3 to the Snowflake database using an Azure Function
  • Involved in POCs for Kinesis and Alooma
  • Deployed Kafka to the servers and wrote simple producers and consumers (see the sketch after this list)
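
Below is a minimal sketch of the kind of heartbeat producer described above, using the Kafka Java producer client. The broker address, topic name, and payload format are illustrative assumptions, not details from the actual project.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class HeartbeatProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Emit a timestamped heartbeat so a downstream consumer can measure
                // end-to-end pipeline latency and detect a stalled pipeline.
                String payload = "{\"type\":\"heartbeat\",\"ts\":" + System.currentTimeMillis() + "}";
                producer.send(new ProducerRecord<>("pipeline-heartbeat", "heartbeat", payload));
            }
        }
    }

A matching consumer on the monitoring side would compare the embedded timestamp with its own receive time to estimate pipeline latency.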

Environment: Azure, EventHub, Azure Functions, Cosmos DB, Application Insight, Snowflake, Kinesis, Firehose, S3, Lambda, Kafka, Zookeeper, bash, Powershell, Eclipse, Python

Confidential, Seattle WA

Sr. Big Data Developer

Responsibilities:

  • Developed big data applications on the CDH platform and maintained the data processing workflows
  • Designed Hive schemas based on the requirements and performed Hive data migration and validation
  • Applied Sqoop to extract data from MySQL for continual data cleaning and aggregation, then stored the data in HDFS
  • Created HiveQL scripts to process and load data into Hive tables
  • Performed data quality investigations by writing interactive ad-hoc queries in the Spark shell
  • Loaded historical data into HDFS via Spark SQL (see the sketch after this list)
  • Designed tables in Hive and MySQL, and used Sqoop to import and export data between the databases and HDFS
  • ETL: data extraction, management, aggregation, and loading into Hive
  • Performed extensive data monitoring and integrated existing monitoring via DataDog to maintain data workflows
  • Developed a POC using Hive, deployed it on the test cluster, and compared its performance against the old data architecture
  • Developed predictive analytics using Impala and checked data consistency
  • Developed workflow automation in Oozie and defined coordinators to connect upstream workflows and generate trigger files for downstream workflows
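
A minimal sketch of the historical-load pattern described above, assuming Spark 1.x with a HiveContext; the HDFS path, table names, and query are hypothetical.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.hive.HiveContext;

    public class HistoricalLoad {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("HistoricalLoad");
            JavaSparkContext sc = new JavaSparkContext(conf);
            HiveContext hive = new HiveContext(sc.sc());

            // Read the raw historical extract from HDFS (path is an assumption).
            DataFrame raw = hive.read().format("parquet").load("hdfs:///data/raw/orders/");

            // Cleaning/aggregation expressed in SQL, as it would be from the Spark shell.
            raw.registerTempTable("orders_raw");
            DataFrame daily = hive.sql(
                "SELECT order_date, COUNT(*) AS order_cnt, SUM(amount) AS total_amount " +
                "FROM orders_raw WHERE amount IS NOT NULL GROUP BY order_date");

            // Persist the aggregated result as a Hive table for downstream queries.
            daily.write().mode(SaveMode.Overwrite).saveAsTable("warehouse.orders_daily");

            sc.stop();
        }
    }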

Environment: Cloudera, Hadoop, MySQL, HDFS, YARN, Impala, DataDog, Spark, Spark SQL, Zookeeper, Hive, Kafka, Oozie, Eclipse, Scala, Python, UNIX Shell Scripting

Confidential, Seattle WA

Sr. Big Data Developer

Responsibilities:

  • Designed and built scalable infrastructure and platforms for very large-scale data ingestion, aggregation, integration, and advanced analytics in Hadoop, including MapReduce, Spark, Hive, HBase, and Pig
  • Designed HBase schemas based on the requirements and performed HBase data migration and validation
  • Designed and implemented HBase query APIs in Java for BI teams (see the sketch after this list)
  • Applied Spark Streaming to receive data from Kafka for continual data cleaning and aggregation, then stored the data in HBase
  • Worked extensively on the core and Spark SQL modules of Spark alongside Spark Streaming
  • Wrote customized Spark SQL UDFs in Scala and Java
  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output responses
  • Designed tables in Hive and MySQL, and used Sqoop to import and export data between the databases and HDFS
  • ETL: data extraction, management, aggregation, and loading into HBase
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala to improve performance
  • Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL
  • Developed predictive analytics using the Apache Spark Scala APIs
  • Performed HBase and Spark Streaming performance tuning
  • Configured Zookeeper to coordinate and support Kafka, Spark, Spark Streaming, HBase, and HDFS
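
A minimal sketch of an HBase query API of the kind described above, using the HBase 0.98 client; the table name, column family, and qualifier are assumptions for illustration.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    /** Thin lookup API over an HBase table for BI queries (names are assumed). */
    public class CustomerLookup {
        private final HTable table;

        public CustomerLookup() throws IOException {
            Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml
            this.table = new HTable(conf, "customer_events");   // hypothetical table name
        }

        /** Returns the latest event payload for a customer row key, or null if absent. */
        public String latestEvent(String customerId) throws IOException {
            Get get = new Get(Bytes.toBytes(customerId));
            get.addColumn(Bytes.toBytes("d"), Bytes.toBytes("latest"));  // assumed family/qualifier
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("latest"));
            return value == null ? null : Bytes.toString(value);
        }

        public void close() throws IOException {
            table.close();
        }
    }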

Environment: Hadoop 2.6, HDFS, YARN, Spark 1.5, Spark SQL, Spark Streaming, Zookeeper 3.5, HBase 0.98, Cassandra, Hive 1.2, Kafka 1.3, RabbitMQ 3.4.4, Oozie 4.1, Eclipse, Scala, UNIX Shell Scripting

Confidential, Seattle WA

Senior Data Engineer

Responsibilities:

  • Analyzed large datasets by running custom MapReduce jobs and Hive queries
  • Designed HBase schemas based on the requirements
  • Designed Cassandra schemas and connected to them from Spark
  • Performed HBase data migration and validation
  • Assisted in exporting analyzed data to relational databases using Sqoop
  • Utilized Kafka and RabbitMQ to capture the data stream
  • Processed data by using Spark Streaming and Kafka then stored the results in HBase
  • Worked on the core and Spark SQL modules of Spark extensively
  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and performed in-memory computation to generate the output responses
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations, developed in IntelliJ with Scala (see the sketch after this list)
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala to improve the performance
  • Developed predictive analytics using the Apache Spark Scala APIs
  • Scheduled workflows with Oozie
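
A minimal sketch of migrating a classic MapReduce-style aggregation into Spark RDD transformations. It is written in Java for consistency with the other sketches here (the original POC used Scala), and the input and output paths are hypothetical.

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import scala.Tuple2;

    public class WordCountMigration {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("WordCountMigration");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // The same map -> shuffle -> reduce pipeline a MapReduce job expresses
            // in Mapper/Reducer classes, collapsed into a few RDD transformations.
            JavaRDD<String> lines = sc.textFile("hdfs:///data/input/");      // assumed path
            JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")))          // "map" phase
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);                               // "reduce" phase

            counts.saveAsTextFile("hdfs:///data/output/wordcount");          // assumed path
            sc.stop();
        }
    }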

Environment: Hadoop 2.6, HDFS, Spark 1.5, Spark SQL, Spark Streaming, Zookeeper 3.5, HBase 0.98, Hive 1.2, Kafka 0.8, RabbitMQ 3.4.4, Oozie 4.1, IntelliJ, Scala, UNIX Shell Scripting

Confidential, Seattle WA

Senior Data Engineer

Responsibilities:

  • Worked on Confidential EMR 4.x with Agile methodology
  • Developed Kafka consumers to receive real-time data from Kafka and store it in Confidential S3
  • Used Flume to collect, aggregate, and store web log data from different sources
  • Used Sqoop to transfer data between RDBMS and HBase
  • Extracted data from MongoDB through MongoDB Connector for Hadoop
  • Involved in migrating MapReduce programs into Spark using Scala
  • Scheduled workflows with Oozie
  • Used Spark with Scala and Spark SQL for testing and processing data
  • Worked with the analytics team to build statistical models with MLlib and PySpark (see the sketch after this list)
  • Worked with the analytics team to prepare and visualize tables in Tableau for reporting
  • Performed unit testing using JUnit and Pytest
  • Used Git for version control, JIRA for project tracking and Jenkins for continuous integration
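
A minimal sketch of the kind of MLlib model training involved, written in Java for consistency with the other sketches here (the actual work used PySpark); the feature file path, k, and iteration count are illustrative assumptions.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.clustering.KMeans;
    import org.apache.spark.mllib.clustering.KMeansModel;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    public class SessionClustering {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SessionClustering");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Each line is a comma-separated feature vector (path and format are assumptions).
            JavaRDD<Vector> features = sc.textFile("hdfs:///features/sessions.csv")
                .map(line -> {
                    String[] parts = line.split(",");
                    double[] values = new double[parts.length];
                    for (int i = 0; i < parts.length; i++) {
                        values[i] = Double.parseDouble(parts[i]);
                    }
                    return Vectors.dense(values);
                });
            features.cache();

            // Train a simple k-means model; k and iteration count are illustrative.
            KMeansModel model = KMeans.train(features.rdd(), 5, 20);
            System.out.println("Within-set sum of squared errors: "
                + model.computeCost(features.rdd()));

            sc.stop();
        }
    }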

Environment: Hadoop 2.6, Cloudera CDH 5.4, HDFS, MapReduce, HBase, Sqoop, Flume, Zookeeper, MongoDB, Spark 1.4, Spark SQL, Pyspark, MLlib, Tableau 9.2, JUnit, Pytest

Confidential, Seattle WA

Jr. Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop
  • Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster
  • Developed simple to complex MapReduce jobs using Java, and scripts using Hive and Pig
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion and egress
  • Implemented business logic by writing UDFs in Java and used various UDFs from other sources (see the sketch after this list)
  • Experienced in loading and transforming large sets of structured and semi-structured data
  • Managed and reviewed Hadoop log files, and deployed and maintained the Hadoop cluster
  • Exported filtered data into HBase for fast querying
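
A minimal sketch of a Hive UDF in Java of the kind described above; the function name and normalization rule are placeholders, not the actual business logic.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    /**
     * Simple Hive UDF that normalizes a free-text region field
     * (the rule here is a placeholder for the real business logic).
     */
    public final class NormalizeRegion extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Once packaged into a jar, such a UDF is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.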

Environment: Hadoop, HBase, Hive, Pig, Map Reduce, Sqoop, Oozie, Eclipse, Java

Confidential

Jr. Java/J2EE/Big Data Developer

Responsibilities:

  • Developed MapReduce programs in Java to help the analysis team read, write, delete, update, and analyze big data (see the sketch after this list)
  • Developed a series of MapReduce algorithms to improve performance
  • Developed web pages using Struts view components, JSP, JavaScript, HTML, jQuery, and AJAX to create the user interface views for migrating 3rd-party applications
  • Used Hibernate to access the database within the system
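
A minimal sketch of a MapReduce program in Java of the kind described above; the record layout and field positions are assumptions for illustration.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EventCount {

        // Emits (eventType, 1) per record; the tab-separated layout is an assumption.
        public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text eventType = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                if (fields.length > 1) {
                    eventType.set(fields[1]);
                    context.write(eventType, ONE);
                }
            }
        }

        // Sums the counts for each event type.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "event count");
            job.setJarByClass(EventCount.class);
            job.setMapperClass(EventMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }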

Environment: Eclipse, Java, Map Reduce, Algorithm, Hibernate, Struts, JavaScript, JQuery, CSS, HTML, MySQL and XML
