
Spark Developer Resume


SUMMARY

  • Expertise in Big Data application development and experienced with Hadoop ecosystem components such as Spark, Hive, Sqoop, Pig, Flume, and Oozie.
  • Hands-on experience developing and debugging Spark jobs to process large datasets.
  • Excellent knowledge and understanding of distributed computing and parallel processing frameworks.
  • Experience working with the Cloudera and Hortonworks Hadoop distributions.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop.
  • Experience creating Hive tables, loading them using Sqoop, and processing the data with HiveQL.
  • Extensive experience developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Extended Hive and Pig core functionality by writing custom UDFs for data analysis (a brief sketch follows this list).
  • Good experience with job scheduling tools such as Oozie.
  • Experience running Hive queries through Spark SQL, integrated with a Spark environment implemented in Scala.
  • Hands-on experience with different file formats such as SequenceFile, JSON, Avro, and Parquet.
  • Hands-on experience using the Spark Streaming programming model for real-time processing of data and storing the results in HDFS.
  • Experience working with Amazon S3 buckets for storing data.
  • Adequate knowledge of Agile and Waterfall methodologies.
  • Extensive programming experience developing Java applications using Java, J2EE, and JDBC.
  • Well versed in the UNIX/Linux command line and shell scripting.
  • Extensive experience developing stored procedures, functions, triggers, and complex SQL queries using Oracle PL/SQL.
  • Strong written and oral communication skills; learn and adapt quickly to emerging technologies and paradigms.
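
Purely as an illustration of the UDF and Spark SQL experience listed above, the following minimal PySpark sketch registers a Python UDF and calls it from a HiveQL query run through Spark SQL. The function, table, and column names (normalize_state, sales.customers, state) are hypothetical, and the project UDFs for Hive and Pig themselves were written in Java.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .appName("UdfSketch")
             .enableHiveSupport()      # lets Spark SQL see the Hive metastore
             .getOrCreate())

    # Hypothetical cleanup rule used by the analysis queries
    def normalize_state(value):
        return value.strip().upper()[:2] if value else None

    # Register the Python function so HiveQL run through Spark SQL can call it
    spark.udf.register("normalize_state", normalize_state, StringType())

    # sales.customers is a placeholder Hive table
    spark.sql("""
        SELECT normalize_state(state) AS state, COUNT(*) AS customers
        FROM sales.customers
        GROUP BY normalize_state(state)
    """).show()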

TECHNICAL SKILLS

Big Data Technologies: MapReduce, Pig, Hive, YARN, Sqoop, Oozie, HBase, Impala, Hue, Spark, Flume, Kafka.

Hadoop Distributions: Hortonworks, Cloudera.

Cloud Platforms: AWS.

Databases: Oracle 11g/10g/9i, MySQL, Teradata.

Version Control Tools: Git, SVN.

Build Tools: Maven, sbt.

ETL: Informatica PowerCenter 10.1/9.X.

Languages: Java, Scala, SQL, Python.

Operating Systems: Mac OS, Linux (various versions), Windows 10/8.1/8/7/XP.

Development Tools: Eclipse, IntelliJ IDEA.

PROFESSIONAL EXPERIENCE

Confidential

Spark Developer

Responsibilities:

  • Used various PySpark APIs to perform the necessary transformations and actions on data received from Kafka in real time.
  • Applied various parsing techniques with the PySpark APIs to cleanse the data coming from Kafka.
  • Experienced in working with Spark SQL on different file formats such as Avro and Parquet.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory usage.
  • Configured Kafka with Spark Streaming to collect data from Kafka topics (see the sketch after this list).
  • Ran Hive on Spark and analyzed the data using Spark SQL queries.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Implemented incremental imports of analyzed data into Oracle tables using Sqoop.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Moved relational database data into Hive dynamic-partition tables through staging tables using Sqoop.
  • Used the AWS S3 service for small data set processing and storage.
  • Implemented workflows using the Apache Oozie framework to automate tasks.
  • Implemented real-time data ingestion and cluster handling using Kafka.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
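
A minimal sketch of the Kafka-to-HDFS flow described above, using the PySpark Streaming (DStream) API. The broker address, topic name, JSON field names, and output path are placeholders, and the sketch assumes Spark 2.x with the spark-streaming-kafka package on the classpath.

    import json
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaIngest")
    ssc = StreamingContext(sc, batchDuration=10)   # 10-second batch interval

    # Direct stream from Kafka; broker and topic names are placeholders
    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "broker1:9092"})

    def parse(record):
        # Each record is a (key, value) pair; keep only well-formed JSON values
        try:
            event = json.loads(record[1])
            return [(event["user_id"], event["event_type"], event["ts"])]
        except (ValueError, KeyError):
            return []

    cleansed = stream.flatMap(parse)

    # Persist each micro-batch to HDFS (Parquet/Avro output would go through DataFrames)
    cleansed.saveAsTextFiles("hdfs:///data/events/clean")

    ssc.start()
    ssc.awaitTermination()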

Confidential

Spark Developer

Responsibilities:

  • Used Flume as a data pipeline to ingest unstructured events from various web servers into HDFS.
  • Wrote various Spark transformations in Python for data cleansing, validation, and summarization on user behavioral data.
  • Parsed the unstructured data into a semi-structured format by writing complex algorithms in PySpark.
  • Developed a generic parser to transform unstructured data of any format into a consistent data model (see the sketch after this list).
  • Configured Flume with Spark Streaming to transfer data from the web servers into HDFS at regular intervals for processing.
  • Persisted frequently used transformed data from DataFrames for faster processing.
  • Built Hive tables on the transformed data and used different SerDes to store the data in HDFS in different formats.
  • Loaded the transformed data into the Hive tables and performed analysis based on the requirements.
  • Implemented partitioning on the Hive data to improve processing performance.
  • Analyzed the data by running Hive queries (HiveQL) to study customer behavior.
  • Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Exported the analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports.
  • Implemented custom workflows to automate jobs on a daily basis.
  • Created custom workflows to automate weekly and monthly Sqoop jobs.
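
A minimal sketch of the kind of parsing and Hive loading described above: raw web-server log lines are parsed into a DataFrame, cached, and appended to a partitioned Hive table. The log pattern, HDFS paths, column names, and table name are illustrative assumptions, not the project's actual definitions.

    import re
    from pyspark.sql import SparkSession, Row

    spark = (SparkSession.builder
             .appName("WebLogParser")
             .enableHiveSupport()       # needed to write into Hive tables
             .getOrCreate())

    # Illustrative pattern for a common web-server access-log layout
    LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)')

    def parse_line(line):
        """Turn one raw log line into a Row, or None if it does not match."""
        m = LOG_RE.match(line)
        if not m:
            return None
        host, ts, method, url, status, size = m.groups()
        return Row(host=host, ts=ts, method=method, url=url,
                   status=int(status), size=0 if size == "-" else int(size),
                   dt=ts.split(":", 1)[0])        # partition key, e.g. 10/Oct/2023

    raw = spark.sparkContext.textFile("hdfs:///flume/weblogs/*")   # files landed by Flume
    events = spark.createDataFrame(raw.map(parse_line).filter(lambda r: r is not None))

    # Cache the cleansed data since several downstream summaries reuse it
    events.cache()

    # Append into a partitioned Hive table (database and table name are placeholders)
    (events.write.mode("append")
           .partitionBy("dt")
           .format("parquet")
           .saveAsTable("analytics.web_events"))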

Confidential

Hadoop Developer

Responsibilities:

  • Identified data sources and created appropriate data ingestion procedures.
  • Created Hive tables per requirements as internal or external tables, defined with appropriate static and dynamic partitions (see the sketch after this list).
  • Worked extensively with Hive Query Language (HiveQL) queries.
  • Developed Pig Latin scripts for handling business transformations and was responsible for writing Hive queries for data processing.
  • Extended Hive and Pig core functionality by writing custom UDFs in Java.
  • Involved in moving all transaction files generated from various sources into HDFS through Flume for further processing.
  • Developed Sqoop jobs to extract data from Teradata and Oracle into HDFS.
  • Involved in developing Hive queries for extraction, transformation, and loading of data into the data warehouse.
  • Worked on importing and exporting data between Oracle and HDFS/Hive using Sqoop.
  • Automated all the jobs that pull data from different databases into Hive tables using Oozie workflows.
  • Loaded data into HDFS from web servers using Flume and from relational database management systems using Sqoop.
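
A minimal sketch of the table layout described above: an external staging table over files already landed in HDFS plus a managed, partitioned table loaded with a dynamic-partition insert. The DDL is run through Spark's Hive support only to keep all examples in one language; the database, table, and column names, the HDFS path, and the timestamp format are placeholders.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("HiveTableSetup")
             .enableHiveSupport()
             .getOrCreate())

    # External table over files already landed in HDFS (path and columns are placeholders)
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS staging.transactions_raw (
            txn_id    STRING,
            account   STRING,
            amount    DOUBLE,
            txn_ts    STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 'hdfs:///landing/transactions'
    """)

    # Managed table partitioned by load date
    spark.sql("""
        CREATE TABLE IF NOT EXISTS warehouse.transactions (
            txn_id    STRING,
            account   STRING,
            amount    DOUBLE,
            txn_ts    STRING
        )
        PARTITIONED BY (load_dt STRING)
        STORED AS PARQUET
    """)

    # Dynamic-partition insert: the load_dt value is derived from each row
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT INTO TABLE warehouse.transactions PARTITION (load_dt)
        SELECT txn_id, account, amount, txn_ts,
               date_format(txn_ts, 'yyyy-MM-dd') AS load_dt
        FROM staging.transactions_raw
    """)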

Confidential

ETL Developer

Responsibilities:

  • Gathered requirements from business users and created low-level technical design documents from the high-level design document.
  • Participated in design meetings and prepared technical and mapping documentation.
  • Created tables, keys (unique and primary), and indexes in Oracle.
  • Extracted data from flat files and Oracle to build an operational data store, and applied business logic to load the data into the global data warehouse.
  • Extensively worked on fact and Slowly Changing Dimension (SCD) tables (see the sketch after this list).
  • Maintained source and target mappings, transformation logic, and processes to reflect the changing business environment over time.
  • Developed complex mappings to load data from the source system (Oracle) and flat files into Teradata.
  • Used various transformations such as Filter, Router, Expression, Lookup (connected and unconnected), Aggregator, Sequence Generator, Update Strategy, Joiner, Normalizer, Sorter, and Union to develop robust mappings in the Informatica Designer.
  • Extensively used the Add Currently Processed Flat File Name port to load the flat file name, and the contract number derived from it, into the target.
  • Worked on complex Source Qualifier queries as well as pre- and post-SQL queries on the target.
  • Worked on different tasks in Workflow Manager such as Session, Event-Raise, Event-Wait, Decision, Email, Command, Worklet, Assignment, and Timer, as well as workflow scheduling.
  • Extensively used workflow variables, mapping parameters, and mapping variables.
  • Created sessions and batches for incremental loads into staging tables and scheduled them to run daily.
  • Implemented performance tuning on targets, sources, mappings, and sessions to provide maximum efficiency and performance.
  • Involved in unit, integration, system, and performance testing.
  • Wrote documentation describing program development, logic, coding, testing, changes, and corrections.
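
The SCD handling itself was built with Informatica mappings; purely to illustrate the Type 2 pattern those mappings follow, here is a minimal PySpark sketch that expires changed dimension rows and opens new current versions. The table names, the tracked column (address), and the flag/date columns are hypothetical, the staging extract is assumed to carry the same business columns as the dimension, and surrogate-key generation is omitted.

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("SCD2Sketch")
             .enableHiveSupport()
             .getOrCreate())

    dim = spark.table("dw.customer_dim")        # existing dimension, placeholder name
    stg = spark.table("stg.customer_updates")   # latest source extract, placeholder name

    current = dim.filter(F.col("is_current") == 1)

    # Source rows whose tracked attribute changed against the current version
    changed = (stg.alias("s")
               .join(current.alias("d"),
                     F.col("s.customer_id") == F.col("d.customer_id"))
               .where(F.col("s.address") != F.col("d.address"))
               .select("s.*"))
    changed_keys = changed.select("customer_id")

    # 1) Expire the current versions of the changed keys
    expired = (dim.join(changed_keys, "customer_id")
                  .where(F.col("is_current") == 1)
                  .withColumn("is_current", F.lit(0))
                  .withColumn("end_date", F.current_date()))

    # 2) Open new current versions for the changed keys
    opened = (changed
              .withColumn("start_date", F.current_date())
              .withColumn("end_date", F.lit(None).cast("date"))
              .withColumn("is_current", F.lit(1)))

    # 3) Carry over every other dimension row unchanged
    untouched = dim.join(changed_keys, "customer_id", "left_anti")

    # Write to a new table rather than overwriting the source of the read
    new_dim = untouched.unionByName(expired).unionByName(opened)
    new_dim.write.mode("overwrite").saveAsTable("dw.customer_dim_new")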

Confidential

Java developer

Responsibilities:

  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
  • Used Struts tag libraries in the JSP pages.
  • Worked with JDBC and Hibernate.
  • Used SVN for version control.
  • Developed Web Services using XML messages that use SOAP.
  • Configured Development Environment using Tomcat and Apache Web Server.
  • Developed and maintained complex outbound notification applications running on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss, and web services.
  • Worked with Complex SQL queries, Functions and Stored Procedures.
  • Developed test scripts using JUnit.
  • Worked with Ant and Maven to develop build scripts.
  • Worked with Hibernate and JDBC to handle data needs.
