Spark Developer Resume
SUMMARY
- Expertise in Big Data application development and experience with Hadoop ecosystem components such as Spark, Hive, Sqoop, Pig, Flume and Oozie.
- Hands-on experience developing and debugging Spark jobs that process large datasets.
- Excellent knowledge and understanding of Distributed Computing and Parallel processing frameworks.
- Experience working with Cloudera and Hortonworks Hadoop distributions.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Experience creating Hive tables, loading them using Sqoop and processing the data with HiveQL.
- Extensive experience developing Pig Latin scripts and using Hive Query Language for data analytics.
- Extended Hive and Pig core functionality by writing custom UDFs for data analysis.
- Good experience in job scheduling tools like Oozie.
- Experience running Hive queries through Spark SQL, integrated with the Spark environment and implemented in Scala.
- Hands-on experience with different file formats such as SequenceFile, JSON, Avro and Parquet.
- Hands-on experience with the Spark Streaming programming model for real-time processing of data and storing the results in HDFS.
- Experience working with Amazon S3 buckets for data storage.
- Adequate knowledge of Agile and Waterfall methodologies.
- Extensive programming experience developing Java applications using Java, J2EE and JDBC.
- Well versed in the UNIX/Linux command line and shell scripting.
- Extensive experience developing stored procedures, functions, triggers and complex SQL queries using Oracle PL/SQL.
- Strong written and oral communication skills; learn and adapt quickly to emerging technologies and paradigms.
TECHNICAL SKILLS
Big Data Technologies: MapReduce, Pig, Hive, YARN, Sqoop, Oozie, HBase, Impala, Hue, Spark, Flume, Kafka.
Hadoop Distributions: Hortonworks, Cloudera
Cloud platforms: AWS.
Databases: Oracle 11g/10g/9i, MySQL, Teradata.
Version Control Tools: Git, SVN.
Build Tools: Maven, sbt.
ETL: Informatica PowerCenter 10.1/9.X.
Languages: Java, Scala, SQL, Python.
Operating Systems: Mac OS, Linux (various versions), Windows 10/8.1/8/7/XP
Development Tools: Eclipse, IntelliJ.
PROFESSIONAL EXPERIENCE
Confidential
Spark Developer
Responsibilities:
- Used various PySpark APIs to perform the necessary transformations and actions on data received from Kafka in real time.
- Applied various parsing techniques using PySpark APIs to cleanse the data coming from Kafka.
- Worked with Spark SQL on different file formats such as Avro and Parquet.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Tuned the performance of Spark applications by setting the right batch interval, the correct level of parallelism and appropriate memory settings.
- Configured Kafka with Spark Streaming to collect data from Kafka topics (sketched in the example after this list).
- Ran Hive on Spark and analyzed the data using Spark SQL queries.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external Hive tables to optimize performance.
- Implemented incremental imports of analyzed data into Oracle tables using Sqoop.
- Imported and exported data between HDFS and relational database systems using Sqoop.
- Moved relational database data into Hive dynamic-partition tables through staging tables using Sqoop.
- Used the AWS S3 service for processing and storing small datasets.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Implemented real-time data ingestion and cluster handling using Kafka.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
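A minimal sketch of the Kafka-to-Spark pipeline described above, written with PySpark Structured Streaming. The broker address, topic name, event schema and HDFS paths are placeholders, and the JSON parsing step stands in for the project-specific cleansing rules.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

# Hypothetical event schema; the real jobs parsed project-specific payloads.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

# Read raw events from a Kafka topic (placeholder broker and topic names).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Cleanse: parse the JSON value, drop malformed rows, keep the fields of interest.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*")
          .na.drop(subset=["event_id"]))

# Persist the parsed stream to HDFS as Parquet (placeholder paths).
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .start())
query.awaitTermination()
```

The checkpoint location lets the file sink recover cleanly on restart; batch interval and parallelism are then tuned on top of this skeleton as noted above.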
Confidential
Spark Developer
Responsibilities:
- Used Flume as a data pipeline to ingest unstructured events from various web servers into HDFS.
- Wrote various Spark transformations in Python for data cleansing, validation and summarization of user behavioral data.
- Parsed the unstructured data into a semi-structured format by writing complex parsing algorithms in PySpark.
- Developed a generic parser to transform unstructured data of any format into a consistent data model.
- Configured Flume with Spark Streaming to transfer data from the web servers into HDFS at regular intervals for processing.
- Persisted frequently used transformed DataFrames for faster processing.
- Built Hive tables on the transformed data and used different SerDes to store the data in HDFS in different formats (sketched in the example after this list).
- Loaded the transformed data into the Hive tables and performed analysis based on the requirements.
- Implemented partitioning on the Hive data to improve processing performance.
- Analyzed the data by performing Hive queries (Hive QL) to study customer behavior.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Exported the analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports.
- Implemented custom workflows to automate jobs on a daily basis.
- Created custom workflows to automate Sqoop jobs weekly and monthly.
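A minimal sketch of the persist-and-load pattern described above: caching a frequently reused transformed DataFrame and writing it into a date-partitioned Hive table. Database, table, column and path names are hypothetical, and the real pipeline fed from the Flume/Spark Streaming ingestion rather than a static Parquet read.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date, col

spark = (SparkSession.builder
         .appName("hive-load")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical transformed dataset (placeholder path and column names).
events = spark.read.parquet("hdfs:///data/clean_events")

# Cache the frequently reused DataFrame so downstream steps avoid recomputation.
events = events.withColumn("event_date", to_date(col("event_ts"))).persist()

# Write into a date-partitioned Hive table stored as Parquet, mirroring the
# partitioning optimization described above.
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
(events.write
 .mode("overwrite")
 .format("parquet")
 .partitionBy("event_date")
 .saveAsTable("analytics.user_events"))

# Example analysis over the partitioned table.
daily = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM analytics.user_events
    GROUP BY event_date
""")
daily.show()
```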
Confidential
Hadoop Developer
Responsibilities:
- Identified data sources and created appropriate data ingestion procedures.
- Created Hive tables as per requirements, as internal or external tables defined with appropriate static and dynamic partitions (sketched in the example after this list).
- Worked extensively with Hive Query Language (HiveQL) queries.
- Developed Pig Latin scripts to handle business transformations and wrote Hive queries for data processing.
- Extended Hive and Pig core functionality by writing custom UDFs in Java.
- Moved all transaction files generated from various sources to HDFS through Flume for further processing.
- Developed Sqoop jobs to extract data from Teradata and Oracle into HDFS.
- Developed Hive queries for extraction, transformation and loading of data into the data warehouse.
- Imported and exported data between Oracle and HDFS/Hive using Sqoop.
- Automated all jobs that pull data from different databases into Hive tables using Oozie workflows.
- Loaded data into the HDFS from the web servers using Flume and from relational database management systems using Sqoop.
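The internal/external table and partitioning work described above can be sketched as the HiveQL below. The schemas, database names and HDFS locations are hypothetical, and the statements are submitted through PySpark here only to keep the examples in one language; on the project itself they would run through the Hive CLI or Beeline.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-ddl-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS staging")
spark.sql("CREATE DATABASE IF NOT EXISTS warehouse")

# External table over files already landed in HDFS (hypothetical schema and path).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.transactions_raw (
        txn_id STRING,
        amount DOUBLE,
        txn_date STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///landing/transactions'
""")

# Managed table partitioned by date, loaded from the staging table with
# dynamic partitioning, as described above.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    CREATE TABLE IF NOT EXISTS warehouse.transactions (
        txn_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS PARQUET
""")
spark.sql("""
    INSERT OVERWRITE TABLE warehouse.transactions PARTITION (txn_date)
    SELECT txn_id, amount, txn_date
    FROM staging.transactions_raw
""")
```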
Confidential
ETL Developer
Responsibilities:
- Gathered requirements from business users and created low-level technical design documents from the high-level design document.
- Participated in design meetings and prepared technical and mapping documentation.
- Created tables, keys (unique and primary) and indexes in Oracle.
- Extracted data from flat files and Oracle to build an Operational Data Store, and applied business logic to load the data into the Global Data Warehouse.
- Extensively worked on fact and Slowly Changing Dimension (SCD) tables (the SCD pattern is sketched in the example after this list).
- Maintained source and target mappings, transformation logic and processes to reflect the changing business environment over time.
- Developed complex mappings to load data from Source System (Oracle) and flat files to Teradata.
- Used various transformations such as Filter, Router, Expression, Lookup (connected and unconnected), Aggregator, Sequence Generator, Update Strategy, Joiner, Normalizer, Sorter and Union to develop robust mappings in the Informatica Designer.
- Extensively used the Add Currently Processed Flat File Name port to load the flat file name and to load contract number coming from flat file name into Target.
- Worked on complex Source Qualifier queries, Pre and Post SQL queries in the Target.
- Worked on different tasks in Workflow Manager like Sessions, Events raise, Event wait, Decision, E-mail, Command, Worklets, Assignment, Timer and Scheduling of the workflow.
- Extensively used workflow variables, mapping parameters and mapping variables.
- Created sessions, batches for incremental load into staging tables and scheduled them to run daily.
- Implemented performance tuning logic on Targets, Sources, Mappings and Sessions to provide maximum efficiency and performance.
- Involved in Unit, Integration, System, and Performance testing levels.
- Wrote documentation describing program development, logic, coding, testing, changes and corrections.
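Informatica PowerCenter mappings are built in the Designer rather than in code, so purely as an illustration of the slowly changing dimension logic referenced above, here is a minimal Type 2 sketch in PySpark with hypothetical table and column names; it is not the project's actual implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_date, lit

spark = SparkSession.builder.appName("scd2-sketch").enableHiveSupport().getOrCreate()

# Hypothetical dimension and staging extract.
dim = spark.table("dw.customer_dim").alias("d")       # customer_id, name, addr, eff_date, end_date, is_current
src = spark.table("stg.customer_extract").alias("s")  # customer_id, name, addr

# Current dimension rows whose tracked attributes changed in the latest extract.
changed = (dim.filter(col("d.is_current") == lit(True))
           .join(src, col("d.customer_id") == col("s.customer_id"))
           .filter((col("d.name") != col("s.name")) | (col("d.addr") != col("s.addr"))))

# Close out the old versions...
expired = (changed.select("d.*")
           .withColumn("end_date", current_date())
           .withColumn("is_current", lit(False)))

# ...and open new current versions carrying the updated attributes.
new_rows = (changed.select("s.*")
            .withColumn("eff_date", current_date())
            .withColumn("end_date", lit(None).cast("date"))
            .withColumn("is_current", lit(True)))

# Rows that did not change are carried over as-is (brand-new customers omitted for brevity).
changed_ids = changed.select(col("d.customer_id").alias("customer_id"))
unchanged = spark.table("dw.customer_dim").join(changed_ids, "customer_id", "left_anti")

# Write to a separate table; in practice the result would be staged and swapped in,
# not written over the table being read within the same job.
(unchanged.unionByName(expired).unionByName(new_rows)
 .write.mode("overwrite").saveAsTable("dw.customer_dim_updated"))
```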
Confidential
Java developer
Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
- Used Struts tag libraries in the JSP pages.
- Worked with JDBC and Hibernate.
- Used SVN for version control.
- Developed Web Services using XML messages that use SOAP.
- Configured Development Environment using Tomcat and Apache Web Server.
- Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss and Web Services.
- Worked with Complex SQL queries, Functions and Stored Procedures.
- Developed Test Scripts using Junit.
- Worked with ANT and Maven to develop build scripts.
- Worked with Hibernate, JDBC to handle data needs.