
Spark Scala Developer Resume


Bloomfield, CT

SUMMARY:

  • Around 5 years of IT experience across all phases of the Software Development Life Cycle (SDLC), including Hadoop, Spark, and cloud projects.
  • Worked extensively with Hadoop distributions such as Cloudera and Hortonworks.
  • Hands-on experience with Hadoop architecture and its components, including HDFS, YARN, ResourceManager, NodeManager, NameNode, DataNode, and MapReduce v1 and v2.
  • Experienced with file formats (Avro, Parquet, RCFile, and ORC) and compression codecs (Gzip, LZO, Snappy, and Bzip2).
  • Broad working experience with, and certified in, Spark Core, Spark SQL (DataFrames), and Spark Streaming.
  • Experience importing and exporting data between RDBMS servers such as Oracle and Teradata and HDFS/Hive using Sqoop.
  • Experience ingesting data from FTP/SFTP servers using Flume.
  • Experience developing Spark applications in Scala that consume from Kafka using the Kafka Consumer API.
  • Experience in performance tuning and optimization of Hive, Impala, and Spark.
  • Experience developing Hive UDFs and running Hive scripts on different execution engines such as Tez and Spark (Hive on Spark).
  • Experienced in tuning long-running Spark applications and implementing features such as graceful shutdown, fault tolerance, and failover.
  • Experience creating DStreams from sources such as Kafka and applying Spark transformations and actions to them.
  • Experience working with the Akka actor model in Scala.
  • Hands-on experience with Kerberos keytabs for application authentication and with Sentry for defining role-based ACLs on objects such as URIs, databases, and tables.
  • Well versed in Hadoop encryption, both for data at rest and data in transit.
  • Experience integrating Hive, Impala, and Spark with Tableau reports.
  • Experience publishing reports and scheduling refreshes on Tableau Server.
  • Experience with AWS services such as EC2 instances, S3 buckets, and CloudFormation templates.
  • Experience with Azure services such as Azure SQL Database and Azure Data Factory.

TECHNICAL SKILLS:

Programming Languages: Scala, Java, shell scripting, SQL and PL/SQL

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Oozie, Flume, ZooKeeper, Kafka, Sentry, Cloudera and Hortonworks.

Spark Components: Core, SQL (DataFrames, Datasets), Streaming

Databases & NoSQL: Oracle, Teradata, MySQL, SQL Server, HBase

Hadoop Paradigms: MapReduce, YARN, In-memory computing, High Availability, Batch processing, Real-time streaming.

Other Tools: Eclipse, IntelliJ, Maven, SBT, SVN, GitHub, Jira, Jenkins

Cloud Components: AWS (S3 Buckets, EMR, EC2, CloudFormation), Azure (SQL Database)

Visualization: Tableau Desktop and Tableau Server

PROFESSIONAL EXPERIENCE:

Confidential - Bloomfield, CT

Spark Scala Developer

Responsibilities:

  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing it in HDFS.
  • Orchestrated a number of Sqoop queries and Hive scripts through custom-developed infrastructure.
  • Used Apache Shiro encryption algorithms for password protection.
  • Converted existing Hive scripts into Spark applications that use RDDs to transform data and persist it to HDFS.
  • Worked extensively with the Spark SQL context to create DataFrames and Datasets for preprocessing model data.
  • Developed Spark-HBase and Spark-AtScale modules to load data into Spark for processing.
  • Designed end-to-end integration tests and unit tests for Spark applications.
  • Tuned the performance of Spark applications at the code, memory, and parallelism levels.
  • Developed a Spark application to stream a Hive table to a Kafka topic in Avro format.
  • Migrated all data from Teradata to the big data environment using Sqoop and Hive.
  • Converted models to Spark and Hive so that the Classic environment could be turned off.
  • Responsible for developing Spark scripts to check for data quality issues in DataFrames.
  • Developed preprocessing logic to filter data for downstream teams based on their requirements.
  • Deployed all changes through Continuous Integration/Continuous Deployment (CI/CD) pipelines using Jenkins and IBM uDeploy.

Environment: Cloudera, AWS, Sqoop, Hive, Spark, HBase, AtScale, SBT, Jenkins, IBM uDeploy, Shiro, Oozie, IntelliJ, Teradata

Confidential - St. Louis, MO

Big Data - Senior Technical Consultant

Responsibilities:

  • Integrated Tableau with Azure SQL Database and published workbooks to Tableau Server.
  • Involved in developing Sqoop queries for moving data from RDBMS servers to Hadoop.
  • Implemented optimization techniques like partitioning, bucketing and query optimization in Hive.
  • Responsible for creating Spark DataFrames in Scala from the ingested data.
  • Developed data pipelines using Kafka and Spark Streaming to ingest, transform, and aggregate data.

Environment: Hadoop, HDFS, Spark, Hive, Oozie, Impala, Cloudera, Azure SQL Server, Tableau

Confidential - Ridgefield Park, NJ

Hadoop Engineer

Responsibilities:

  • Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing it in HDFS.
  • Converted existing Hive scripts into Spark applications that use RDDs to transform data and persist it to HDFS.
  • Worked extensively with the Spark SQL context to create DataFrames for filtering input data for model execution.
  • Developed data pipelines using Kafka and Spark Streaming to ingest, transform, and aggregate data.
  • Developed Flume agents with an FTP/SFTP source and an HDFS sink.
  • Developed Sqoop jobs to import data in Avro file format from an Oracle database and created Hive tables on top of it.
  • Developed automated scripts to import data from Amazon S3 buckets into HDFS using the Boto library.
  • Involved in performance tuning of Hive from design, storage, and query perspectives.
  • Worked extensively on optimizing Hive query performance using map-side joins, parallel execution, and cost-based optimization.
  • Automated ETL pipelines with Oozie and scheduled jobs using Oozie coordinators and crontab.
  • Integrated Hive and Impala with Tableau reports and published them to Tableau Server.
  • Involved in designing and developing HBase tables and storing aggregated data from Hive tables in them.
  • Designed role-based ACLs for Hive and Impala tables using Sentry.

Environment: HDFS, YARN, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, IntelliJ, Oracle, Teradata, Shell Scripting, Tableau, Scala, Cloudera, AWS.

Confidential

Hadoop Developer

Responsibilities:

  • Participated in gathering and analyzing requirements and in designing technical documents for business requirements.
  • Responsible for importing data into HDFS from different RDBMS servers using Sqoop, and for exporting aggregated data back to the RDBMS servers with Sqoop for other ETL operations.
  • Developed new UDFs and used existing ones to implement custom logic on table data.
  • Optimized Hive queries using partitioning, bucketing, map-side joins, and parallel execution.
  • Responsible for monitoring the cluster using Cloudera Manager.
  • Developed Pig scripts to capture changes between newly arrived data and current data.
  • Orchestrated hundreds of Sqoop queries, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
  • Responsible for handling data formats such as Avro, Parquet, and ORC.

Environment: Pig, Hive, Oozie, Linux, YARN, Cloudera Manager

Confidential

Java Developer

Responsibilities:

  • Involved in developing various data flow diagrams, use case diagrams and sequence diagrams.
  • Worked on various project phases and on improving the reporting module.
  • Worked extensively in JSP, HTML, JavaScript, and CSS to create the UI pages for the project.
  • Created JUnit test cases for unit testing and developed generic JavaScript functions for validation.
  • Gathered requirements for migrating from ICD-9 to ICD-10 codes.

Environment: Java 5.0, Struts, Spring 2.0, Hibernate 3.2, WebLogic 7.0, Eclipse, Oracle, JUnit 4.2, Maven, Windows XP, HTML, CSS, JavaScript, and XML.
