Spark/Hadoop Developer Resume
Tukwila, WA
SUMMARY:
- Overall 5+ years of experience building big data applications with the Hadoop, Spark, Hive, Sqoop, and Kafka frameworks.
- Hands-on experience with Spark/Hadoop distributions: Cloudera and Hortonworks.
- Experience working with large datasets and making performance improvements.
- Experience in performing Read and Write operations on HDFS.
- Experience in implementing Spark integration with Hadoop ecosystem.
- Experience in designing and developing Spark applications using Scala.
- Experience migrating MapReduce programs into Spark RDD transformations and actions to improve performance.
- Worked with RDDs for parallel processing of datasets in HDFS, MySQL, and other sources.
- Experience creating Hive tables, loading data from different file formats, and performing partitioning, dynamic partitioning, and bucketing in Hive.
- Experience developing and debugging Hive queries.
- Experience converting HiveQL/SQL queries into Spark transformations using the Spark RDD and DataFrame APIs in Scala (a minimal sketch follows this summary).
- Good experience importing data into and exporting data from Hive and HDFS with Sqoop.
- Used Oozie to manage and schedule Spark jobs on the Hadoop cluster.
- Experience using Kafka's producer and consumer APIs.
- Skilled in integrating Kafka with Spark Streaming for faster data processing.
- Experience with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
- Experience creating and driving large-scale ETL pipelines.
- Experience with file formats such as text files, sequence files, JSON, Parquet, and ORC.
- Connected Tableau to Hive using JDBC/ODBC drivers.
- Strong knowledge of UNIX/Linux commands.
- Proficient with version control systems such as Git.
- Adequate knowledge of the Python scripting language.
- Adequate knowledge of Scrum, Agile, and Waterfall methodologies.
- Highly motivated and committed to highest levels of professionalism.
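As an illustration of the HiveQL-to-DataFrame conversion claimed above, here is a minimal sketch assuming a hypothetical Hive table `orders` with `customer_id` and `amount` columns:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveQlToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hiveql-to-dataframe")
      .enableHiveSupport() // read tables registered in the Hive metastore
      .getOrCreate()

    // Original HiveQL:
    //   SELECT customer_id, SUM(amount) AS total
    //   FROM orders
    //   GROUP BY customer_id
    //   HAVING SUM(amount) > 1000;
    val totals = spark.table("orders")
      .groupBy("customer_id")
      .agg(sum("amount").as("total"))
      .filter(col("total") > 1000)

    totals.show()
    spark.stop()
  }
}
```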
TECHNICAL SKILLS:
Big Data Technologies: Apache Spark, Apache Hadoop, MapReduce, HDFS, Apache Hive, YARN, Apache Oozie, Apache Kafka, Apache Sqoop, Apache Flume, Apache Zookeeper
Languages: Scala, Python, SQL
Databases: MySQL, Oracle 11g
Operating systems: macOS, Windows 7/10, Linux
Tools: IntelliJ, Eclipse, Maven, GitHub, Jenkins
PROFESSIONAL EXPERIENCE:
Confidential, Tukwila, WA.
Spark/Hadoop Developer.
Responsibilities:
- Worked on the Cloudera distribution, CDH 5.13.
- Imported and transformed large volumes of data from various data sources into HDFS.
- Imported and exported data between relational database systems and HDFS using Sqoop.
- Processed different datasets from HDFS in Spark using Scala and Spark SQL for faster testing and processing of data.
- Loaded data into Spark RDDs and DataFrames and performed in-memory computations to generate the output responses.
- Wrote various Spark transformations in Scala for data validation, cleansing, and filtering of data in HDFS.
- Developed Scala scripts using both the DataFrame and RDD APIs in Spark for data aggregation queries.
- Performed sort, join, filter, and other transformations on the datasets.
- Created Hive tables to load the transformed data and stored it in HDFS.
- Performed performance optimizations such as using the distributed cache for small datasets and applying partitioning, bucketing, and map joins to Hive tables (sketched after this role's environment line).
- Analyzed the Hive tables against the business logic, writing HiveQL queries for faster data processing.
- Appended DataFrames to pre-existing data in Hive.
- Performed data cleansing to meet business requirements and stored the output data in Hive and HDFS.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Built a proof of concept for Spark Streaming with Kafka for real-time data processing (a sketch also follows below).
- Used Git extensively as the versioning tool.
- Worked with Jenkins for continuous integration.
Environment: Cloudera 5.13, Hadoop 3.0, HDFS, Spark 2.4, Hive 3.0, Spark SQL, Scala, Sqoop, Oozie, Linux shell, GIT, Jenkins, Agile.
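Below, a hedged sketch of the Hive layout optimizations mentioned above (partitioning, bucketing, and a map-side join). The table names, input path, and bucket count are assumptions, and Spark's broadcast join stands in for a Hive map join:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object HiveLayoutOptimizations {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-layout-optimizations")
      .enableHiveSupport()
      .getOrCreate()

    // Partition by date (prunes whole directories at query time) and bucket
    // by user_id (pre-shuffles rows so later joins on user_id skip a shuffle).
    val events = spark.read.parquet("/data/raw/events") // hypothetical input path
    events.write
      .partitionBy("event_date")
      .bucketBy(32, "user_id")
      .sortBy("user_id")
      .format("orc")
      .saveAsTable("analytics.events") // hypothetical database/table

    // Map-side join: broadcast the small dimension table to every executor,
    // the Spark analogue of a Hive map join on a small dataset.
    val users  = spark.table("analytics.dim_users") // hypothetical small table
    val joined = spark.table("analytics.events").join(broadcast(users), Seq("user_id"))
    joined.show()

    spark.stop()
  }
}
```

And a minimal sketch of the Spark Streaming with Kafka proof of concept, written against Spark's Structured Streaming Kafka source rather than the older DStream API. The broker address, topic name, and checkpoint path are assumptions, and the spark-sql-kafka-0-10 connector is presumed to be on the classpath:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaStreamingPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-poc")
      .getOrCreate()

    // Subscribe to a hypothetical `clickstream` topic.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
      .option("subscribe", "clickstream")
      .option("startingOffsets", "latest")
      .load()

    // Kafka values arrive as bytes: cast to STRING, then count events per minute.
    val counts = raw
      .selectExpr("CAST(value AS STRING) AS event", "timestamp")
      .withWatermark("timestamp", "2 minutes")
      .groupBy(window(col("timestamp"), "1 minute"))
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "/tmp/kafka-poc-checkpoint") // hypothetical path
      .start()
      .awaitTermination()
  }
}
```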
Confidential, Chicago, IL.
Spark/Hadoop Developer.
Responsibilities:
- Worked on the Hortonworks enterprise distribution (HDP).
- Worked on large sets of structured and semi-structured historical data.
- Used Sqoop to import data from RDBMS into Hive.
- Created Hive tables to load the data and stored them as ORC files for processing.
- Implemented Hive partitioning and bucketing for further classification of data.
- Worked on performance tuning and optimization of Hive.
- Involved in cleansing and transforming the data.
- Used Spark SQL to sort, join, and filter the data (see the sketch after this role's environment line).
- Copied the ORC files to Amazon S3 buckets using Sqoop for further processing in Amazon EMR.
- Performed data aggregation operations using Spark SQL queries.
- Copied the output data from Amazon S3 buckets back to Hive using Sqoop once the output desired by the business was produced.
- Automated daily filter and join operations that merge new data into the respective Hive tables using Oozie workflows.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule workflows.
- Used Git as the version control system.
- Worked with Jenkins for continuous integration.
Environment: HDP 2.5, HDFS, Spark 2.1, Hadoop 2.8, Kafka, Amazon S3, EMR, Sqoop, Oozie, Hive 2.3, Tez, Hue, Linux shell, Git, Jenkins, Agile.
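A minimal sketch of the Spark SQL filter/join/aggregate/sort pipeline writing ORC output to S3 described in this role. The `sales` and `stores` tables, their columns, and the bucket URL are assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrcAggregationToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-aggregation-to-s3")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive tables stored as ORC: `sales` (fact), `stores` (dimension).
    val sales  = spark.table("sales")
    val stores = spark.table("stores")

    // Filter, join, aggregate, and sort with the DataFrame API.
    val dailyRevenue = sales
      .filter(col("status") === "COMPLETE")
      .join(stores, Seq("store_id"))
      .groupBy(col("region"), col("sale_date"))
      .agg(sum("amount").as("revenue"))
      .orderBy(col("sale_date"), col("revenue").desc)

    // Land the ORC output in a hypothetical S3 bucket for downstream EMR steps.
    dailyRevenue.write
      .mode("overwrite")
      .format("orc")
      .save("s3a://example-bucket/aggregates/daily_revenue/")

    spark.stop()
  }
}
```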
Confidential
Hadoop Developer.
Responsibilities:
- Worked under the Cloudera distribution.
- Responsible for building scalable distributed data solutions using Hadoop.
- Built and sized the Hadoop cluster based on the data extracted from all the sources.
- Monitored Hadoop cluster job performance and performed capacity planning.
- Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting the cluster, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Imported data from RDBMS to HDFS using Sqoop.
- Developed simple to complex MapReduce jobs to transform the ingested data (a skeletal example follows this role's environment line).
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Created Hive tables to load the transformed data.
- Designed and implemented partitioning and bucketing in Hive for better classification.
- Performed different joins on Hive tables and implemented Hive SerDes such as the Regex and JSON SerDes.
- Performed transformations using HiveQL and MapReduce, and loaded the data into HDFS.
- Developed Hive queries to analyze/transform the data.
- Used the Oozie workflow engine to automate MapReduce jobs.
- Used SVN as Version Control System.
Environment: CDH 4.0, Hadoop 2.4, MapReduce, Hive 0.13, Tez, Hue, Oozie, Flume, Sqoop, HBase, SVN, Jenkins, Agile.
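A skeletal example of the kind of MapReduce job this role describes, written in Scala against the Hadoop MapReduce API to match the other examples (classic implementations are in Java; the job, paths, and class names here are hypothetical):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Tokenize each input line and emit (token, 1).
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one  = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { t =>
      word.set(t.toLowerCase)
      ctx.write(word, one)
    }
}

// Sum the counts for each token; also reused as the combiner.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it  = values.iterator()
    while (it.hasNext) sum += it.next().get()
    ctx.write(key, new IntWritable(sum))
  }
}

object EventCountJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "event-count")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[TokenMapper])
    job.setCombinerClass(classOf[SumReducer]) // combiner cuts shuffle volume
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))   // input dir in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args(1))) // output dir in HDFS
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```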
Confidential
Junior Hadoop Developer
Responsibilities:
- Worked under the Cloudera distribution.
- Involved in loading data from the Linux file system into HDFS.
- Monitored Hadoop cluster job performance and performed capacity planning.
- Created Hive tables to load the transformed data.
- Imported data from RDBMS to HDFS using Sqoop.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Developed Hive queries to analyze/transform the data.
- Used the Oozie workflow engine to automate MapReduce jobs.
Environment: CDH 4.0, Hadoop 2.4, MapReduce, Hive 0.13, Hue, Oozie, Sqoop, HBase, Agile.