
Sr. Hadoop Spark Developer Resume


SUMMARY:

  • 8+ years of professional experience in IT, including 4+ years of comprehensive experience in Big Data development across the Finance and Insurance domains, primarily using the Hadoop and Spark ecosystems.
  • Expertise in developing applications and frameworks using Java and Python.
  • Extensive experience with Big Data ecosystem components like HDFS, MapReduce, HBase, Spark, YARN, Kafka, Zookeeper, Pig, Hive, Sqoop, Storm, Oozie, Impala, NiFi and Flume.
  • Expertise in working with EMR, Cloudera and Hortonworks distributions.
  • Good working expertise in handling structured, semi-structured and unstructured data.
  • Experience in importing data from relational database systems into HDFS and vice versa using Sqoop.
  • Expertise in developing batch data processing applications using Spark, Hive, MapReduce, Pig and Sqoop.
  • Experienced in developing real-time streaming applications using Kafka and Spark Streaming, and creating event-processing data pipelines.
  • Expertise in writing DDLs and DMLs using SQL scripts for analytics applications in MySQL and Hive.
  • Extending Hive and Pig core functionality by writing custom UDFs, UDTFs and UDAFs.
  • Good working knowledge of NoSQL databases like HBase and Cassandra.
  • Experience in working with CSV, JSON, XML, ORC, Avro and Parquet file formats.
  • Good experience writing shell scripts to schedule jobs.
  • Experience in Orchestrating and scheduling workflows using Crontab, Oozie and Airflow.
  • Expertise in working with AWS cloud services like EMR, S3, Redshift, Lambda, DynamoDB, RDS, SNS, SQS, Glue, Data Pipeline and Athena.
  • Expertise in implementing serverless ETL jobs using Lambda and Glue.
  • Experience in understanding specifications for data warehouse ETL processes and interacting with designers and end users to gather informational requirements.
  • Experience in implementing standards and processes for Hadoop based application design and implementation.
  • Experience in working with core Java.

PROFESSIONAL EXPERIENCE:

Confidential

Sr. Hadoop Spark Developer

Responsibilities:

  • Developed real-time streaming applications using PySpark to ingest data from various sources such as Oracle Exadata servers, SQL Server and Kafka into S3 data lakes.
  • Developed Sqoop jobs to ingest customer and product data from Oracle databases into S3 buckets.
  • Developed a Kafka producer application that ingests transactional data from Exadata machines into Kafka topics (see the producer sketch after this list).
  • Developed a Kafka consumer streaming application to ingest data from Kafka topics into data lakes in Avro format (see the streaming consumer sketch after this list).
  • Worked closely with the Kafka admin team to set up the Kafka cluster, and implemented producer and consumer applications on the cluster with the help of Zookeeper.
  • Developed a framework in Python used to build multiple applications that process batch and streaming data.
  • Built a real-time PySpark application to detect fraudulent transactions by applying various business rules; processed incoming transactions with PySpark, applied business validation rules and persisted the validated transactions in HBase.
  • Reconciled the processed data from multiple sources using PySpark and persisted it in Redshift.
  • Developed DDL and DML scripts to create external tables and analyze incoming and intermediate data for analytics applications in Hive and Redshift.
  • Involved in the ingestion of log files from various servers using NiFi.
  • Implemented partitioning, bucketing and other optimization techniques in Hive for efficient data access.
  • Worked on various Spark optimization techniques for memory management, garbage collection and serialization, as well as broadcast joins, accumulators and persistence methods.
  • Built a serverless ETL job in AWS Lambda so that files landing in S3 buckets are cataloged immediately, and developed serverless ETL applications to process the data in S3 buckets using Glue (see the Lambda sketch after this list).
  • Developed Airflow scripts to orchestrate and schedule complex workflows using Apache Airflow (see the DAG sketch after this list).
  • Involved in developing CI/CD pipelines using Jenkins to automate code deployment.
  • Involved in database design and data modeling, using entity-relationship modeling for OLTP databases and dimensional modeling for OLAP databases.
  • Created detailed design documentation for source-to-target transformations along with other technical documentation.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Hive and Spark.
  • Worked as a liaison between offshore and onshore teams to deliver work on time, and provided on-call support.
  • Involved in all phases of the Software Development Life Cycle (requirements analysis, design, development, testing, deployment and support) and Agile methodologies.
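
A minimal sketch of the kind of Kafka producer application described above, using the kafka-python client; the broker address, topic name and record payload are illustrative assumptions rather than the original code.

    # Kafka producer sketch (kafka-python). Broker address, topic name and the
    # record layout are assumptions for illustration.
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],                       # assumed broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def publish_transaction(txn: dict) -> None:
        """Send one transactional record to the (assumed) 'transactions' topic."""
        producer.send("transactions", value=txn)

    publish_transaction({"txn_id": "T1001", "amount": 42.50, "source": "exadata"})
    producer.flush()  # block until buffered records are delivered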
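
A sketch of a Structured Streaming consumer that lands Kafka records in S3 as Avro and applies a sample validation rule along the way; the schema, broker, topic and bucket paths are assumptions, and format("avro") assumes the spark-avro package is on the classpath.

    # Structured Streaming consumer sketch: Kafka -> validate -> Avro files in S3.
    # All names, paths and the validation rule are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("txn-ingest").getOrCreate()

    txn_schema = StructType([
        StructField("txn_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("source", StringType()),
    ])

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")   # assumed broker
           .option("subscribe", "transactions")                 # assumed topic
           .load())

    # Kafka delivers the payload as bytes; parse it with the assumed schema.
    txns = (raw.select(from_json(col("value").cast("string"), txn_schema).alias("t"))
               .select("t.*"))

    # Example business validation rule: keep only in-range amounts.
    valid = txns.filter((col("amount") > 0) & (col("amount") < 100000))

    query = (valid.writeStream.format("avro")
             .option("path", "s3a://example-data-lake/transactions/")        # assumed bucket
             .option("checkpointLocation", "s3a://example-data-lake/_chk/")  # assumed path
             .start())
    query.awaitTermination()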
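
A sketch of an S3-triggered Lambda handler that starts a Glue crawler so newly landed files are cataloged immediately; the crawler name and bucket layout are hypothetical placeholders.

    # S3-triggered Lambda sketch: log the new objects and start a Glue crawler.
    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        # Standard S3 event shape: one record per object that landed.
        keys = [r["s3"]["object"]["key"] for r in event.get("Records", [])]
        print(f"New objects: {keys}")

        # Start the (assumed) crawler that catalogs the landing prefix.
        glue.start_crawler(Name="transactions-landing-crawler")
        return {"status": "crawler started", "objects": keys}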
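
A sketch of an Airflow DAG that sequences an ingest step before a Spark job, assuming Airflow 2.x; the DAG id, schedule and shell commands are illustrative.

    # Airflow DAG sketch: run a Sqoop ingest, then a Spark reconciliation job.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="transactions_pipeline",     # assumed DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="sqoop_ingest",
            bash_command=(
                "sqoop import --connect jdbc:oracle:thin:@//db:1521/ORCL "
                "--table CUSTOMERS --target-dir s3a://example-data-lake/customers/"
            ),
        )
        process = BashOperator(
            task_id="spark_reconcile",
            bash_command="spark-submit reconcile_transactions.py",
        )
        ingest >> process  # run the Spark job only after the ingest completes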

Environment: Python, Hadoop, Spark, Spark SQL, Spark Streaming, Hive, HBase, MySQL, HDFS, Shell Scripting, Crontab, Apache Kafka, AWS Redshift, Lambda, EC2, EMR, S3, Glue.

Confidential

Sr. Big Data Developer

Responsibilities:

  • Developed Sqoop jobs to ingest customer and product data into HDFS data lakes.
  • Ingested log files from source servers into HDFS data lakes using Flume.
  • Developed a real-time Spark Streaming application to ingest transactional data from Kafka topics into HDFS data lakes.
  • Processed incoming transaction data using PySpark, applying various business validation rules, and persisted the results in Cassandra tables (see the Cassandra write sketch after this list).
  • Developed a PySpark application to flatten the incoming transactional data against various dimension tables and persist the results in Cassandra tables.
  • Involved in developing a framework for metadata management on HDFS data lakes.
  • Worked on various Hive optimizations such as partitioning, bucketing, vectorization and indexing, and on choosing the right type of Hive join (e.g., bucket map join and sort-merge-bucket join).
  • Worked with various file formats such as CSV, JSON, ORC, Avro and Parquet.
  • Developed HQL scripts to create external tables and analyze incoming and intermediate data for analytics applications in Hive (see the DDL sketch after this list).
  • Optimized Spark jobs using various techniques such as broadcasting, executor tuning and persisting (see the broadcast join sketch after this list).
  • Responsible for developing custom UDFs, UDAFs and UDTFs in Pig and Hive.
  • Analyzed tweet JSON data using the Hive SerDe API to deserialize it into a readable format.
  • Debugged and optimized long-running Spark and Hive applications using various optimization techniques.
  • Orchestrated Hadoop and Spark jobs using Oozie workflows to manage job dependencies and run multiple jobs in sequence for processing data.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Processed application web logs using Flume and loaded them into Hive for analysis.
  • Implemented RESTful web services to interact with Oracle and Cassandra to store and retrieve data.
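
A sketch of the kind of HQL used for external partitioned tables and for reading raw tweet JSON through a SerDe, issued here through PySpark with Hive support; table names, columns and locations are assumptions.

    # Hive DDL sketch issued through a Hive-enabled Spark session.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive-ddl").enableHiveSupport().getOrCreate()

    # External, partitioned ORC table over intermediate transaction data.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS txn_stage (
            txn_id STRING,
            amount DOUBLE
        )
        PARTITIONED BY (txn_date STRING)
        STORED AS ORC
        LOCATION '/data/lake/txn_stage'
    """)

    # Raw tweet JSON exposed as a readable table via a JSON SerDe.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS tweets_raw (
            id STRING,
            text STRING,
            created_at STRING
        )
        ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        LOCATION '/data/lake/tweets_raw'
    """)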
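
A sketch of the broadcast-join optimization mentioned above: a small dimension table is broadcast to every executor so the large fact table avoids a shuffle; paths and column names are assumptions.

    # Broadcast join sketch: enrich a large fact table with a small dimension table.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("broadcast-join").getOrCreate()

    transactions = spark.read.parquet("/data/lake/transactions")  # large fact table
    products = spark.read.parquet("/data/lake/products")          # small dimension table

    # The broadcast() hint ships the small table to executors instead of shuffling
    # the large side.
    enriched = transactions.join(broadcast(products), on="product_id", how="left")
    enriched.write.mode("overwrite").parquet("/data/lake/transactions_enriched")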
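
A sketch of persisting validated records to Cassandra from PySpark, assuming the DataStax spark-cassandra-connector package is available on the classpath; the keyspace, table and host names are placeholders.

    # PySpark -> Cassandra write sketch (requires spark-cassandra-connector).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("to-cassandra")
             .config("spark.cassandra.connection.host", "cassandra-host")  # assumed host
             .getOrCreate())

    validated = spark.read.parquet("/data/lake/transactions_validated")

    (validated.write
     .format("org.apache.spark.sql.cassandra")
     .options(table="transactions", keyspace="payments")  # assumed names
     .mode("append")
     .save())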

Environment: Python, Hadoop, Spark, Spark SQL, Spark Streaming, Hive, Cassandra, MySQL, HDFS, Shell Scripting, Crontab, Apache Kafka

Confidential

Hadoop Developer

Responsibilities:

  • Developed multiple PySpark applications for data cleaning and preprocessing using various validation rules.
  • Developed PySpark applications to process transactional tables and persist them on HDFS.
  • Created DDL scripts to create external tables in Hive on top of intermediate and processed data.
  • Worked on POCs to migrate MapReduce and Pig jobs to Spark to take advantage of Spark's in-memory processing.
  • Developed Pig scripts and Pig UDFs to load and analyze data on HDFS.
  • Wrote Pig UDFs to convert date and timestamp formats from unstructured files into the required formats and processed the results (a PySpark analogue is sketched after this list).
  • Developed multiple HQL scripts to analyze the processed data in Hive and send reports to the business.
  • Implemented optimization techniques such as partitioning, bucketing and vectorization in Hive for efficient querying.
  • Implemented Sqoop jobs to migrate data between Oracle and Hadoop clusters in both directions.
  • Ingested log files from source servers into HDFS data lakes using Flume.
  • Deployed Hive and HBase integration to perform OLAP operations on HBase data.
  • Developed MapReduce jobs to process the incoming transactional data and persist it on HDFS (a Hadoop Streaming sketch follows this list).
  • Debugged and optimized long-running Spark and Hive applications using various optimization techniques.
  • Orchestrated complex workflows and scheduled them using Oozie.
  • Developed UNIX shell scripts for creating reports from Hive data.
  • Contributed to the development of a framework used to build Hive, MapReduce and PySpark applications, as well as to metadata management on Hadoop clusters to create external tables and reconcile data at different stages using Hive.
  • Worked on various POCs to implement real-time streaming applications to ingest transactional data using Spark.
  • Actively participated in code reviews and meetings and resolved technical issues.
  • Worked closely with the Hadoop security and infrastructure teams to implement Kerberos security.
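
The date-format conversions above were done with Pig UDFs; a PySpark analogue (not the original Pig code) might look like the following, with the input timestamp format and column names assumed.

    # PySpark date-normalization UDF sketch (analogue of the Pig UDFs described above).
    from datetime import datetime
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("date-normalize").getOrCreate()

    @udf(returnType=StringType())
    def normalize_date(raw):
        """Convert e.g. '03/15/2018 14:32:00' to the ISO date '2018-03-15'."""
        if raw is None:
            return None
        try:
            return datetime.strptime(raw, "%m/%d/%Y %H:%M:%S").strftime("%Y-%m-%d")
        except ValueError:
            return None  # leave unparseable values as nulls for later inspection

    events = spark.read.csv("/data/raw/events", header=True)   # assumed input
    events = events.withColumn("event_date", normalize_date(events["event_ts"]))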
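
The MapReduce jobs above would typically have been written in Java; purely as an illustration of the pattern in Python, a Hadoop Streaming sketch that sums transaction amounts per account could look like this, with the input layout assumed.

    # mapper.py - Hadoop Streaming map step: emit "account_id \t amount" per line,
    # assuming input lines of the form "account_id,amount,...".
    import sys

    for line in sys.stdin:
        fields = line.strip().split(",")
        if len(fields) >= 2:
            print(f"{fields[0]}\t{fields[1]}")

    # reducer.py - Hadoop Streaming reduce step: sum amounts per account_id.
    # Hadoop guarantees lines for the same key arrive consecutively.
    import sys

    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print(f"{current_key}\t{total}")

Such scripts would be launched with the hadoop-streaming JAR, passing mapper.py and reducer.py as the map and reduce programs.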

Environment: Hadoop, MapReduce, Hive, Pig, Spark, Sqoop, Bash Scripting, Spark RDD, Spark SQL.

Confidential

Sr. Java Developer

Responsibilities:

  • Interacted with business managers to transform requirements into technical solutions; transformed the use cases into class diagrams, sequence diagrams and state diagrams.
  • Generated domain layer classes using DAOs from the database schema.
  • Derived the Spring MVC controllers from the use cases and integrated them with the service layer to carry out business logic operations, returning the resulting data model objects if any.
  • Worked on the service layer, implementing the core business logic and providing access to external REST-based services.
  • Integrated the designed JSP pages with the view resolvers in order to display the view after carrying out the desired operations in the service layer.
  • Worked on Spring Web Flow design using the sequence diagrams, and configured the flows between the predefined views and controllers.
  • Developed a validation layer providing validator classes for input validation, pattern validation and access control.
  • Defined a set of classes for the helper layer, which validates the data models from the service layer and prepares them for display in JSP views.
  • Used AJAX calls to dynamically assemble data in JSP pages on receiving user input.
  • Used Log4J to print logging, debugging, warning and info messages on the server console.
  • Involved in the creation of test cases for JUnit testing and carried out unit testing.
  • Used SVN as the configuration management tool for code versioning and releases.
  • Deployed the application on Oracle WebLogic Server 10.3 and used the ANT tool for deployment of the web application on the WebLogic server.
  • Involved in the functional tests of the application and resolved production issues.

Environment: Java 1.6, J2EE 5, Servlet, JSP, Spring 2.5, Oracle WebLogic, Log4j, Web Services, JavaScript, SQL Server 2005, SQL Management Studio, PL/SQL, UML, Rational Rose, CVS, Eclipse.
