Spark/Hadoop Developer Resume
Tukwila, WA
SUMMARY:
- Overall 5+ years of experience building big data applications with the Hadoop, Spark, Hive, Sqoop, and Kafka frameworks.
- Hands-on experience with Spark/Hadoop distributions: Cloudera and Hortonworks.
- Experience working with large datasets and making performance improvements.
- Experience in performing Read and Write operations on HDFS.
- Experience in implementing Spark integration with Hadoop ecosystem.
- Experience in designing and developing Spark applications using Scala.
- Experience migrating MapReduce programs into Spark RDD transformations and actions to improve performance.
- Worked with RDDs for parallel processing of datasets in HDFS, MySQL, and other sources.
- Experience creating Hive tables, loading data from different file formats, and performing partitioning, dynamic partitioning, and bucketing in Hive.
- Experience developing and debugging Hive queries.
- Experience converting HiveQL/SQL queries into Spark transformations using the Spark RDD and DataFrame APIs in Scala (a minimal sketch follows this summary).
- Good experience importing data into and exporting data from Hive and HDFS with Sqoop.
- Used Oozie to manage and schedule Spark jobs on the Hadoop cluster.
- Experience using Kafka's producer and consumer APIs.
- Skilled in integrating Kafka with Spark Streaming for faster data processing.
- Experience with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
- Experience creating and driving large-scale ETL pipelines.
- Experience with file formats such as text files, sequence files, JSON, Parquet, and ORC.
- Connected Tableau to Hive using JDBC/ODBC drivers.
- Strong knowledge of UNIX/Linux commands.
- Proficient with version control systems such as Git.
- Adequate knowledge of the Python scripting language.
- Adequate knowledge of Scrum, Agile, and Waterfall methodologies.
- Highly motivated and committed to highest levels of professionalism.
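As an illustration of the HiveQL-to-DataFrame conversion claimed above, here is a minimal sketch assuming a hypothetical Hive table `orders` with `customer_id` and `amount` columns:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveQlToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hiveql-to-dataframe")
      .enableHiveSupport() // read tables registered in the Hive metastore
      .getOrCreate()

    // Original HiveQL:
    //   SELECT customer_id, SUM(amount) AS total
    //   FROM orders
    //   GROUP BY customer_id
    //   HAVING SUM(amount) > 1000;
    val totals = spark.table("orders")
      .groupBy("customer_id")
      .agg(sum("amount").as("total"))
      .filter(col("total") > 1000)

    totals.show()
    spark.stop()
  }
}
```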
TECHNICAL SKILLS:
Big Data Technologies: Apache Spark, Apache Hadoop, MapReduce, HDFS, Apache Hive, YARN, Apache Oozie, Apache Kafka, Apache Sqoop, Apache Flume, Apache Zookeeper
Languages: Scala, Python, SQL
Databases: MySQL, Oracle 11g
Operating systems: macOS, Windows 7/10, Linux
Tools: IntelliJ, Eclipse, Maven, GitHub, Jenkins
PROFESSIONAL EXPERIENCE:
Confidential, Tukwila, WA.
Spark/Hadoop Developer.
Responsibilities:
- Worked on the Cloudera distribution, CDH 5.13.
- Imported and transformed large volumes of data from various data sources into HDFS.
- Imported and exported data between relational database systems and HDFS using Sqoop.
- Processed different datasets from HDFS in Spark using Scala and Spark SQL for faster testing and processing of data.
- Loaded data into Spark RDDs and DataFrames and performed in-memory computations to generate the output responses.
- Wrote various Spark transformations in Scala for data validation, cleansing, and filtering of data in HDFS.
- Developed Scala scripts using both the DataFrame and RDD APIs in Spark for data aggregation queries.
- Performed sort, join, filter, and other transformations on the datasets.
- Created Hive tables to load the transformed data and stored it in HDFS.
- Performed performance optimizations such as using the distributed cache for small datasets and applying partitioning, bucketing, and map joins to Hive tables (sketched after this role's environment line).
- Analyzed the Hive tables against the business logic, writing HiveQL queries for faster data processing.
- Appended DataFrames to pre-existing data in Hive.
- Performed data cleansing to meet business requirements and stored the output data in Hive and HDFS.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Built a proof of concept for Spark Streaming with Kafka for real-time data processing (a sketch also follows below).
- Used Git extensively as the versioning tool.
- Worked with Jenkins for continuous integration.
Environment: Cloudera 5.13, Hadoop 3.0, HDFS, Spark 2.4, Hive 3.0, Spark SQL, Scala, Sqoop, Oozie, Linux shell, GIT, Jenkins, Agile.
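Below, a hedged sketch of the Hive layout optimizations mentioned above (partitioning, bucketing, and a map-side join). The table names, input path, and bucket count are assumptions, and Spark's broadcast join stands in for a Hive map join:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object HiveLayoutOptimizations {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-layout-optimizations")
      .enableHiveSupport()
      .getOrCreate()

    // Partition by date (prunes whole directories at query time) and bucket
    // by user_id (pre-shuffles rows so later joins on user_id skip a shuffle).
    val events = spark.read.parquet("/data/raw/events") // hypothetical input path
    events.write
      .partitionBy("event_date")
      .bucketBy(32, "user_id")
      .sortBy("user_id")
      .format("orc")
      .saveAsTable("analytics.events") // hypothetical database/table

    // Map-side join: broadcast the small dimension table to every executor,
    // the Spark analogue of a Hive map join on a small dataset.
    val users  = spark.table("analytics.dim_users") // hypothetical small table
    val joined = spark.table("analytics.events").join(broadcast(users), Seq("user_id"))
    joined.show()

    spark.stop()
  }
}
```

And a minimal sketch of the Spark Streaming with Kafka proof of concept, written against Spark's Structured Streaming Kafka source rather than the older DStream API. The broker address, topic name, and checkpoint path are assumptions, and the spark-sql-kafka-0-10 connector is presumed to be on the classpath:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaStreamingPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-poc")
      .getOrCreate()

    // Subscribe to a hypothetical `clickstream` topic.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
      .option("subscribe", "clickstream")
      .option("startingOffsets", "latest")
      .load()

    // Kafka values arrive as bytes: cast to STRING, then count events per minute.
    val counts = raw
      .selectExpr("CAST(value AS STRING) AS event", "timestamp")
      .withWatermark("timestamp", "2 minutes")
      .groupBy(window(col("timestamp"), "1 minute"))
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "/tmp/kafka-poc-checkpoint") // hypothetical path
      .start()
      .awaitTermination()
  }
}
```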
Confidential, Chicago, IL.
Spark/Hadoop Developer.
Responsibilities:
- Worked on the Hortonworks enterprise distribution (HDP).
- Worked on large sets of structured and semi-structured historical data.
- Used Sqoop to import data from RDBMS into Hive.
- Created Hive tables to load the data and stored them as ORC files for processing.
- Implemented Hive partitioning and bucketing for further classification of data.
- Worked on performance tuning and optimization of Hive.
- Involved in cleansing and transforming the data.
- Used Spark SQL to sort, join, and filter the data (see the sketch after this role's environment line).
- Copied the ORC files to Amazon S3 buckets using Sqoop for further processing in Amazon EMR.
- Performed data aggregation operations using Spark SQL queries.
- Copied the output data from Amazon S3 buckets back to Hive using Sqoop once the output desired by the business was produced.
- Automated daily filter and join operations that merge new data into the respective Hive tables using Oozie workflows.
- Used Oozie and Oozie coordinators to deploy end-to-end data processing pipelines and schedule workflows.
- Used Git as the version control system.
- Worked with Jenkins for continuous integration.
Environment: HDP 2.5, HDFS, Spark 2.1, Hadoop 2.8, Kafka, Amazon S3, EMR, Sqoop, Oozie, Hive 2.3, Tez, Hue, Linux shell, Git, Jenkins, Agile.
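A minimal sketch of the Spark SQL filter/join/aggregate/sort pipeline writing ORC output to S3 described in this role. The `sales` and `stores` tables, their columns, and the bucket URL are assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrcAggregationToS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-aggregation-to-s3")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive tables stored as ORC: `sales` (fact), `stores` (dimension).
    val sales  = spark.table("sales")
    val stores = spark.table("stores")

    // Filter, join, aggregate, and sort with the DataFrame API.
    val dailyRevenue = sales
      .filter(col("status") === "COMPLETE")
      .join(stores, Seq("store_id"))
      .groupBy(col("region"), col("sale_date"))
      .agg(sum("amount").as("revenue"))
      .orderBy(col("sale_date"), col("revenue").desc)

    // Land the ORC output in a hypothetical S3 bucket for downstream EMR steps.
    dailyRevenue.write
      .mode("overwrite")
      .format("orc")
      .save("s3a://example-bucket/aggregates/daily_revenue/")

    spark.stop()
  }
}
```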
Confidential
Hadoop Developer.
Responsibilities:
- Worked under the Cloudera distribution.
- Responsible for building scalable distributed data solutions using Hadoop.
- Built and sized the Hadoop cluster based on the data extracted from all the sources.
- Monitored Hadoop cluster job performance and performed capacity planning.
- Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting the cluster, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Imported data from RDBMS to HDFS using Sqoop.
- Developed simple to complex MapReduce jobs to transform the ingested data (a skeletal example follows this role's environment line).
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Created Hive tables to load the transformed data.
- Designed and implemented partitioning and bucketing in Hive for better classification.
- Performed different joins on Hive tables and implemented Hive SerDes such as the Regex and JSON SerDes.
- Performed transformations using HiveQL and MapReduce, and loaded the data into HDFS.
- Developed Hive queries to analyze/transform the data.
- Used the Oozie workflow engine to automate MapReduce jobs.
- Used SVN as Version Control System.
Environment: CDH 4.0, Hadoop 2.4, MapReduce, Hive 0.13, Tez, Hue, Oozie, Flume, Sqoop, HBase, SVN, Jenkins, Agile.
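A skeletal example of the kind of MapReduce job this role describes, written in Scala against the Hadoop MapReduce API to match the other examples (classic implementations are in Java; the job, paths, and class names here are hypothetical):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Tokenize each input line and emit (token, 1).
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one  = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { t =>
      word.set(t.toLowerCase)
      ctx.write(word, one)
    }
}

// Sum the counts for each token; also reused as the combiner.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it  = values.iterator()
    while (it.hasNext) sum += it.next().get()
    ctx.write(key, new IntWritable(sum))
  }
}

object EventCountJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "event-count")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[TokenMapper])
    job.setCombinerClass(classOf[SumReducer]) // combiner cuts shuffle volume
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))   // input dir in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args(1))) // output dir in HDFS
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```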
Confidential
Junior Hadoop Developer
Responsibilities:
- Worked under the Cloudera distribution.
- Involved in loading data from the Linux file system into HDFS.
- Monitored Hadoop cluster job performance and performed capacity planning.
- Created Hive tables to load the transformed data.
- Imported data from RDBMS to HDFS using Sqoop.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Developed Hive queries to analyze/transform the data.
- Used the Oozie workflow engine to automate MapReduce jobs.
Environment: CDH 4.0, Hadoop 2.4, MapReduce, Hive 0.13, Hue, Oozie, Sqoop, HBase, Agile.