Spark Developer Resume
SUMMARY
- Expertise in Big Data application development and experience with Hadoop ecosystem components such as Spark, Hive, Sqoop, Pig, Flume and Oozie.
- Hands-on experience developing and debugging Spark jobs that process large datasets.
- Excellent knowledge and understanding of Distributed Computing and Parallel processing frameworks.
- Experience working with Cloudera and Hortonworks Hadoop distributions.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Experience creating Hive tables, loading them using Sqoop and processing the data with HiveQL.
- Extensive experience developing Pig Latin scripts and using Hive Query Language for data analytics.
- Extended Hive and Pig core functionality by writing custom UDFs for data analysis.
- Good experience in job scheduling tools like Oozie.
- Experience running Hive queries through Spark SQL, integrated with the Spark environment and implemented in Scala.
- Hands-on experience with different file formats such as SequenceFile, JSON, Avro and Parquet.
- Hands-on experience with the Spark Streaming programming model for real-time processing of data and storing the results in HDFS.
- Experience working with Amazon S3 buckets for data storage.
- Adequate knowledge of Agile and Waterfall methodologies.
- Extensive programming experience developing Java applications using Java, J2EE and JDBC.
- Well versed in the UNIX/Linux command line and shell scripting.
- Extensive experience developing stored procedures, functions, triggers and complex SQL queries using Oracle PL/SQL.
- Strong written and oral communication skills; learn and adapt quickly to emerging technologies and paradigms.
TECHNICAL SKILLS
Big Data Technologies: MapReduce, Pig, Hive, YARN, Sqoop, Oozie, HBase, Impala, Hue, Spark, Flume, Kafka.
Hadoop Distributions: Hortonworks, Cloudera
Cloud platforms: AWS.
Databases: Oracle 11g/10g/9i, MySQL, Teradata.
Version Control Tools: Git, SVN.
Build Tools: Maven, sbt.
ETL: Informatica PowerCenter 10.1/9.X.
Languages: Java, Scala, SQL, Python.
Operating Systems: Mac OS, Linux (various versions), Windows 10/8.1/8/7/XP
Development Tools: Eclipse, IntelliJ.
PROFESSIONAL EXPERIENCE
Confidential
Spark Developer
Responsibilities:
- Used various PySpark APIs to perform the necessary transformations and actions on data received from Kafka in real time.
- Applied various parsing techniques using PySpark APIs to cleanse the data coming from Kafka.
- Worked with Spark SQL on different file formats such as Avro and Parquet.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Tuned the performance of Spark applications by setting the right batch interval, the correct level of parallelism and appropriate memory settings.
- Configured Kafka with Spark Streaming to collect data from Kafka topics (sketched in the example after this list).
- Ran Hive on Spark and analyzed the data using Spark SQL queries.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external Hive tables to optimize performance.
- Implemented incremental imports of analyzed data into Oracle tables using Sqoop.
- Imported and exported data between HDFS and relational database systems using Sqoop.
- Moved relational database data into Hive dynamic-partition tables through staging tables using Sqoop.
- Used the AWS S3 service for processing and storing small datasets.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Implemented real-time data ingestion and cluster handling using Kafka.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
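A minimal sketch of the Kafka-to-Spark pipeline described above, written with PySpark Structured Streaming. The broker address, topic name, event schema and HDFS paths are placeholders, and the JSON parsing step stands in for the project-specific cleansing rules.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

# Hypothetical event schema; the real jobs parsed project-specific payloads.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

# Read raw events from a Kafka topic (placeholder broker and topic names).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Cleanse: parse the JSON value, drop malformed rows, keep the fields of interest.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*")
          .na.drop(subset=["event_id"]))

# Persist the parsed stream to HDFS as Parquet (placeholder paths).
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .start())
query.awaitTermination()
```

The checkpoint location lets the file sink recover cleanly on restart; batch interval and parallelism are then tuned on top of this skeleton as noted above.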
Confidential
Spark Developer
Responsibilities:
- Used Flume as a data pipeline to ingest unstructured events from various web servers into HDFS.
- Wrote various Spark transformations in Python for data cleansing, validation and summarization of user behavioral data.
- Parsed the unstructured data into a semi-structured format by writing complex parsing algorithms in PySpark.
- Developed a generic parser to transform unstructured data of any format into a consistent data model.
- Configured Flume with Spark Streaming to transfer data from the web servers into HDFS at regular intervals for processing.
- Persisted frequently used transformed DataFrames for faster processing.
- Built Hive tables on the transformed data and used different SerDes to store the data in HDFS in different formats (sketched in the example after this list).
- Loaded the transformed data into the Hive tables and performed analysis based on the requirements.
- Implemented partitioning on the Hive data to improve processing performance.
- Analyzed the data by performing Hive queries (Hive QL) to study customer behavior.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Exported the analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports.
- Implemented custom workflows to automate jobs on a daily basis.
- Created custom workflows to automate Sqoop jobs weekly and monthly.
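A minimal sketch of the persist-and-load pattern described above: caching a frequently reused transformed DataFrame and writing it into a date-partitioned Hive table. Database, table, column and path names are hypothetical, and the real pipeline fed from the Flume/Spark Streaming ingestion rather than a static Parquet read.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date, col

spark = (SparkSession.builder
         .appName("hive-load")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical transformed dataset (placeholder path and column names).
events = spark.read.parquet("hdfs:///data/clean_events")

# Cache the frequently reused DataFrame so downstream steps avoid recomputation.
events = events.withColumn("event_date", to_date(col("event_ts"))).persist()

# Write into a date-partitioned Hive table stored as Parquet, mirroring the
# partitioning optimization described above.
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
(events.write
 .mode("overwrite")
 .format("parquet")
 .partitionBy("event_date")
 .saveAsTable("analytics.user_events"))

# Example analysis over the partitioned table.
daily = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM analytics.user_events
    GROUP BY event_date
""")
daily.show()
```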
Confidential
Hadoop Developer
Responsibilities:
- Identified data sources and created appropriate data ingestion procedures.
- Created Hive tables as per requirements, as internal or external tables defined with appropriate static and dynamic partitions (sketched in the example after this list).
- Worked extensively with Hive Query Language (HiveQL) queries.
- Developed Pig Latin scripts to handle business transformations and wrote Hive queries for data processing.
- Extended Hive and Pig core functionality by writing custom UDFs in Java.
- Moved all transaction files generated from various sources to HDFS through Flume for further processing.
- Developed Sqoop jobs to extract data from Teradata and Oracle into HDFS.
- Developed Hive queries for extraction, transformation and loading of data into the data warehouse.
- Imported and exported data between Oracle and HDFS/Hive using Sqoop.
- Automated all jobs that pull data from different databases into Hive tables using Oozie workflows.
- Loaded data into the HDFS from the web servers using Flume and from relational database management systems using Sqoop.
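The internal/external table and partitioning work described above can be sketched as the HiveQL below. The schemas, database names and HDFS locations are hypothetical, and the statements are submitted through PySpark here only to keep the examples in one language; on the project itself they would run through the Hive CLI or Beeline.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-ddl-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS staging")
spark.sql("CREATE DATABASE IF NOT EXISTS warehouse")

# External table over files already landed in HDFS (hypothetical schema and path).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.transactions_raw (
        txn_id STRING,
        amount DOUBLE,
        txn_date STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///landing/transactions'
""")

# Managed table partitioned by date, loaded from the staging table with
# dynamic partitioning, as described above.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    CREATE TABLE IF NOT EXISTS warehouse.transactions (
        txn_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS PARQUET
""")
spark.sql("""
    INSERT OVERWRITE TABLE warehouse.transactions PARTITION (txn_date)
    SELECT txn_id, amount, txn_date
    FROM staging.transactions_raw
""")
```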
Confidential
ETL Developer
Responsibilities:
- Gathered requirements from business users and created low-level technical design documents from the high-level design document.
- Participated in design meetings and prepared technical and mapping documentation.
- Created tables, keys (unique and primary) and indexes in Oracle.
- Extracted data from flat files and Oracle to build an Operational Data Store, and applied business logic to load the data into the Global Data Warehouse.
- Extensively worked on fact and Slowly Changing Dimension (SCD) tables (the SCD pattern is sketched in the example after this list).
- Maintained source and target mappings, transformation logic and processes to reflect the changing business environment over time.
- Developed complex mappings to load data from Source System (Oracle) and flat files to Teradata.
- Used various transformations such as Filter, Router, Expression, Lookup (connected and unconnected), Aggregator, Sequence Generator, Update Strategy, Joiner, Normalizer, Sorter and Union to develop robust mappings in the Informatica Designer.
- Extensively used the Add Currently Processed Flat File Name port to load the flat file name and to load contract number coming from flat file name into Target.
- Worked on complex Source Qualifier queries, Pre and Post SQL queries in the Target.
- Worked on different tasks in Workflow Manager like Sessions, Events raise, Event wait, Decision, E-mail, Command, Worklets, Assignment, Timer and Scheduling of the workflow.
- Extensively used workflow variables, mapping parameters and mapping variables.
- Created sessions, batches for incremental load into staging tables and scheduled them to run daily.
- Implemented performance tuning logic on Targets, Sources, Mappings and Sessions to provide maximum efficiency and performance.
- Involved in Unit, Integration, System, and Performance testing levels.
- Wrote documentation describing program development, logic, coding, testing, changes and corrections.
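Informatica PowerCenter mappings are built in the Designer rather than in code, so purely as an illustration of the slowly changing dimension logic referenced above, here is a minimal Type 2 sketch in PySpark with hypothetical table and column names; it is not the project's actual implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_date, lit

spark = SparkSession.builder.appName("scd2-sketch").enableHiveSupport().getOrCreate()

# Hypothetical dimension and staging extract.
dim = spark.table("dw.customer_dim").alias("d")       # customer_id, name, addr, eff_date, end_date, is_current
src = spark.table("stg.customer_extract").alias("s")  # customer_id, name, addr

# Current dimension rows whose tracked attributes changed in the latest extract.
changed = (dim.filter(col("d.is_current") == lit(True))
           .join(src, col("d.customer_id") == col("s.customer_id"))
           .filter((col("d.name") != col("s.name")) | (col("d.addr") != col("s.addr"))))

# Close out the old versions...
expired = (changed.select("d.*")
           .withColumn("end_date", current_date())
           .withColumn("is_current", lit(False)))

# ...and open new current versions carrying the updated attributes.
new_rows = (changed.select("s.*")
            .withColumn("eff_date", current_date())
            .withColumn("end_date", lit(None).cast("date"))
            .withColumn("is_current", lit(True)))

# Rows that did not change are carried over as-is (brand-new customers omitted for brevity).
changed_ids = changed.select(col("d.customer_id").alias("customer_id"))
unchanged = spark.table("dw.customer_dim").join(changed_ids, "customer_id", "left_anti")

# Write to a separate table; in practice the result would be staged and swapped in,
# not written over the table being read within the same job.
(unchanged.unionByName(expired).unionByName(new_rows)
 .write.mode("overwrite").saveAsTable("dw.customer_dim_updated"))
```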
Confidential
Java developer
Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
- Used Struts tag libraries in the JSP pages.
- Worked with JDBC and Hibernate.
- Used SVN for version control.
- Developed Web Services using XML messages that use SOAP.
- Configured Development Environment using Tomcat and Apache Web Server.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss and Web Services.
- Worked with Complex SQL queries, Functions and Stored Procedures.
- Developed Test Scripts using Junit.
- Worked with ANT and Maven to develop build scripts.
- Worked with Hibernate, JDBC to handle data needs.