Spark Scala Developer Resume
Bloomfield, CT
SUMMARY:
- Around 5 years of extensive IT experience in all phases of the Software Development Life Cycle, including experience working with Hadoop, Spark and Cloud projects.
- Worked extensively with Hadoop distributions like Cloudera and Hortonworks.
- Hands-on experience with Hadoop architecture and its components like YARN, HDFS, Resource Manager, Node Manager, Name Node, Data Node and MR v1 & v2.
- Experienced in working with different file formats (Avro, Parquet, RC & ORC) & compressions (Gzip, LZO, Snappy and Bzip2).
- Broad working experience with, and certification in, Spark Core, Spark SQL (DataFrames) and Spark Streaming.
- Experience in importing and exporting data from different RDBMS Servers like Oracle and Teradata into HDFS and Hive using Sqoop.
- Experience in ingesting data from FTP/SFTP servers using Flume.
- Experience in developing Spark applications in Scala that use the Kafka Consumer API.
- Experience in Hive, Impala and Spark Performance Tuning and Optimization.
- Experience in developing Hive UDFs and running Hive scripts using different execution engines like Tez and Spark (Hive on Spark).
- Experienced in tuning long-running Spark applications and implementing features like graceful shutdown, fault tolerance and failover.
- Experience in creating DStreams from sources like Kafka and performing different Spark transformations and actions on them.
- Experience in working with Akka Actor Model using Scala.
- Hands-on experience working with Kerberos keytabs for application authentication and Sentry for defining role-based ACLs on objects like URIs, databases and tables.
- Well versed in Hadoop encryption for data at rest and data in transit.
- Experience in integrating Hive, Impala and Spark with Tableau reports.
- Experience in publishing and scheduling refreshes on the Tableau Server.
- Experience with AWS components like EC2 instances, S3 buckets and CloudFormation templates.
- Experience with Azure components like Azure SQL Database and Azure Data Factory.
TECHNICAL SKILLS:
Programming Languages: Scala, Java, shell scripting, SQL and PL/SQL
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Impala, Hue, Sqoop, Oozie, Flume, Zookeeper, Kafka, Sentry, Cloudera and Hortonworks.
Spark Components: Core, SQL (DataFrames, Datasets), Streaming
Databases & NoSQL: Oracle, Teradata, MySQL, SQL Server, HBase
Hadoop Paradigms: MapReduce, YARN, In-memory computing, High Availability, Batch processing, Real-time Streaming.
Other Tools: Eclipse, IntelliJ, Maven, SBT, SVN, GitHub, Jira, Jenkins
Cloud Components: AWS (S3 Buckets, EMR, EC2, CloudFormation), Azure (SQL Database)
Visualization: Tableau Desktop and Tableau Server
PROFESSIONAL EXPERIENCE:
Confidential - Bloomfield, CT
Spark Scala Developer
Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
- Orchestrated a number of Sqoop queries and Hive scripts through custom-developed infrastructure.
- Handled encryption algorithms using Apache Shiro for password protection.
- Converted existing Hive scripts to Spark applications that use RDDs to transform data and persist it to HDFS.
- Worked extensively with the Spark SQL context to create DataFrames and Datasets for preprocessing the model data.
- Developed Spark HBase and Spark AtScale modules for retrieving data into Spark for processing.
- Designed end-to-end integration testing and unit testing for Spark applications.
- Experienced in performance tuning of Spark applications at the code, memory and parallelism levels.
- Developed a Spark application to stream a Hive table to a Kafka topic in Avro format.
- Migrated all the data from Teradata to the Big Data environment using Sqoop and Hive.
- Worked on converting Spark Hive models so that the Classic Environment could be turned off.
- Responsible for developing Spark scripts to check for data quality issues in DataFrames.
- Developed preprocessing logic to filter data for downstream teams based on their requirements.
- Deployed all changes through Continuous Integration/Continuous Deployment pipelines using Jenkins and IBM UDeploy.
Environment: Cloudera, AWS, Sqoop, Hive, Spark, HBase, AtScale, SBT, Jenkins, IBM UDeploy, Shiro, Oozie, IntelliJ, Teradata
Confidential - St. Louis, MO
Big Data - Senior Technical Consultant
Responsibilities:
- Integrated Tableau with Azure SQL Database and published workbooks to Tableau Server.
- Involved in developing Sqoop queries for moving data from RDBMS servers to Hadoop.
- Implemented optimization techniques like partitioning, bucketing and query optimization in Hive.
- Responsible for creating Spark DataFrames using Scala for the ingested data.
- Developed data pipelines using Kafka and Spark Streaming to ingest, transform and aggregate data.
Environment: Hadoop, HDFS, Spark, Hive, Oozie, Impala, Cloudera, Azure SQL Server, Tableau
Confidential - Ridgefield Park, NJ
Hadoop Engineer
Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
- Converted existing Hive scripts to Spark applications that use RDDs to transform data and persist it to HDFS.
- Worked extensively with the Spark SQL context to create DataFrames that filter input data for model execution.
- Developed data pipelines using Kafka and Spark Streaming to ingest, transform and aggregate data.
- Developed Flume agents to handle data with FTP/SFTP servers as the source and HDFS as the sink.
- Developed Sqoop jobs to import data from an Oracle database in Avro file format and created Hive tables on top of the imported data.
- Developed automated scripts to import data from Amazon S3 buckets to HDFS using the Boto library.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Worked extensively on performance optimization of Hive queries using map-side joins, parallel execution and cost-based optimization.
- Automated the ETL pipelines using Oozie and scheduled jobs using coordinators and crontabs.
- Integrated Hive and Impala with Tableau reports and published to Tableau Server.
- Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
- Designed role-based ACLs for the tables in Hive and Impala using Sentry.
Environment: HDFS, YARN, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, IntelliJ, Oracle, Teradata, Shell Scripting, Tableau, Scala, Cloudera, AWS.
Confidential
Hadoop Developer
Responsibilities:
- Participated in gathering and analyzing requirements and in designing technical documents for business requirements.
- Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
- Developed new UDFs and used existing ones for custom processing of table data.
- Applied partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries.
- Responsible for monitoring the cluster using Cloudera Manager.
- Developed Pig scripts to capture changes between newly arrived data and current data.
- Orchestrated hundreds of Sqoop queries, Pig scripts, Hive queries using Oozie workflows and sub-workflows.
- Responsible for handling different data formats like Avro, Parquet and ORC.
Environment: Pig, Hive, Oozie, Linux, YARN, Cloudera Manager
Confidential
Java Developer
Responsibilities:
- Involved in developing various data flow diagrams, use case diagrams and sequence diagrams.
- Worked on various phases of the project, as well as on improving the reporting module.
- Worked extensively in JSP, HTML, JavaScript, and CSS to create the UI pages for the project.
- Created JUnit test cases for unit testing and developed generic JS functions for validations.
- Gathered requirements for migrating from ICD-9 to ICD-10 codes.
Environment: Java 5.0, Struts, Spring 2.0, Hibernate 3.2, WebLogic 7.0, Eclipse, Oracle, JUnit 4.2, Maven, Windows XP, HTML, CSS, JavaScript, and XML.
