
Hadoop And Spark Developer Resume

SUMMARY

  • 7+ years of experience in IT, including Big Data technologies, the Hadoop ecosystem, Data Warehousing, and SQL-related technologies in the Retail, Manufacturing, Financial, and Communication sectors.
  • 5 years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework; currently working extensively on Spark and Spark Streaming, using Scala as the main programming language.
  • Experience installing, configuring, and maintaining Apache Hadoop clusters for application development, along with Hadoop tools like Sqoop, Hive, Pig, Flume, HBase, Kafka, Hue, Storm, Zookeeper, Oozie, Cassandra, and Python.
  • Worked with major distributions such as Cloudera (CDH 3 & 4), Hortonworks, and AWS; also worked on Unix and DWH support for these distributions.
  • Hands on experience in developing and deploying enterprise-based applications using major components in Hadoop ecosystem like Hadoop 2.X, YARN, Hive, Pig, MapReduce, Spark, Kafka, Storm, Oozie, HBase, Flume, Sqoop and Zookeeper
  • Experience in developing data pipelines using Pig, Sqoop, and Flume to extract data from weblogs and store it in HDFS, and in developing Pig Latin scripts and HiveQL queries for data analytics.
  • Extensively dealt with Spark Streaming and Apache Kafka to fetch live stream data.
  • Experience in converting Hive/SQL queries into Spark transformations using Java, and in ETL development using Kafka, Flume, and Sqoop (a conversion sketch follows this list).
  • Performance tuning and running jobs in Hadoop clusters.
  • Capacity planning as a Hadoop admin.
  • Hands-on with essential DevOps tools like Chef, Puppet, Ansible, Docker, Kubernetes, Subversion (SVN), Git, Hudson, Jenkins, Ant, and Maven; migrated VMware VMs to AWS and managed services like EC2, S3, Route 53, ELB, and EBS.
  • Worked on AWS OpsWorks, AWS Lambda, AWS CodeDeploy, AWS CloudFormation, and Cloud Foundry.
  • Monitored connectivity and security of Hadoop clusters and managed and monitored the HDFS file system.
  • Deployed cloud services, including Jenkins and Nexus, on Docker using Terraform.
  • Automated the installation and configuration of web application servers such as WebSphere, WebLogic, Apache Tomcat, and JBoss using Ansible, Chef, and Puppet.
  • Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, ELBs, and security groups.
  • Extensive experience in developing applications that perform data processing tasks using Teradata, Oracle, SQL Server, and MySQL databases.
  • Experienced and familiar with virtualization technologies: installing, configuring, and administering VMware vSphere, SRM 4.5, and Citrix XenServer.
  • Experienced in using bug tracking systems like JIRA, Remedy, HP Quality Centre, and IBM ClearQuest.
  • Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Oozie.
  • Good experience in writing Spark applications using Scala and Java; used sbt to build Scala projects and executed them with spark-submit.
  • Experience working on NoSQL databases including HBase, Cassandra and MongoDB and experience using Sqoop to import data into HDFS from RDBMS and vice-versa
  • Developed Spark scripts by using Scala shell commands as per the requirement
  • Good experience in writing Sqoop queries for transferring bulk data between Apache Hadoop and structured data stores.
  • Substantial experience in writing MapReduce jobs in Java and working with Pig, Flume, Zookeeper, Hive, and Storm.
  • Implemented reprocessing of failed messages in Kafka using the offset id (a reprocessing sketch follows this list).
  • Implemented Kafka producer and consumer applications on a Kafka cluster set up with the help of Zookeeper.
  • Used the Spring Kafka API to process messages reliably on the Kafka cluster setup.
  • Acquainted with Agile and Waterfall methodologies; handled several client-facing meetings with strong communication skills.
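
A minimal sketch, in Scala, of the Hive/SQL-to-Spark conversion mentioned above (the bullet cites Java; Scala is shown here as the stated main language). The database, table, and column names (sales_db.orders, order_status, region, amount) are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()                 // read Hive tables through the metastore
      .getOrCreate()

    // Original HiveQL:
    //   SELECT region, SUM(amount) AS total_amount
    //   FROM sales_db.orders WHERE order_status = 'CLOSED' GROUP BY region;
    val totals = spark.table("sales_db.orders")
      .filter(col("order_status") === "CLOSED")
      .groupBy("region")
      .agg(sum("amount").as("total_amount"))

    // Persist the result back to Hive for downstream reporting.
    totals.write.mode("overwrite").saveAsTable("sales_db.region_totals")
    spark.stop()
  }
}
```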
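
A minimal sketch of reprocessing failed Kafka messages by seeking back to a recorded offset with the plain Kafka consumer API; the broker address, topic, partition, consumer group, and offset value are hypothetical.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import scala.jdk.CollectionConverters._

object KafkaReprocessSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092")
  props.put("group.id", "reprocess-group")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("enable.auto.commit", "false")   // commit only after successful reprocessing

  val consumer = new KafkaConsumer[String, String](props)
  val partition = new TopicPartition("order-events", 0)
  consumer.assign(Collections.singletonList(partition))

  // Rewind to the offset recorded when the original processing failed.
  val failedOffset = 42000L
  consumer.seek(partition, failedOffset)

  val records = consumer.poll(Duration.ofSeconds(5))
  records.asScala.foreach { r =>
    println(s"reprocessing offset=${r.offset()} value=${r.value()}")
  }
  consumer.commitSync()
  consumer.close()
}
```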

TECHNICAL SKILLS

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala

Hadoop Distribution: Cloudera, Hortonworks, Apache, AWS

Languages: Java, SQL, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions

Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP

Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS.

Portals/Application servers: WebLogic, WebSphere Application server, WebSphere Portal server, JBOSS

Build Automation tools: SBT, Ant, Maven

Version Control: GIT

IDE & Build Tools, Design: Eclipse, Visual Studio, NetBeans, Rational Application Developer, JUnit

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL Database (HBase, Cassandra, MongoDB), Teradata.

PROFESSIONAL EXPERIENCE

Hadoop and Spark Developer

Confidential

Responsibilities:

  • Involved in complete Big Data flow of the application starting from data ingestion from upstream to HDFS, processing and analyzing the data in HDFS.
  • Developed a Spark API to import data into HDFS from Teradata and created Hive tables (sketched after this list).
  • Developed Sqoop jobs to import data in Avro file format from the Oracle database and created Hive tables on top of it.
  • Involved in running the Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
  • Involved in performance tuning of Hive from design, storage, and query perspectives.
  • Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
  • Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
  • Developed Spark scripts to import large files from Amazon S3 buckets (see the sketch after this list).
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
  • Used the Spring Kafka API to process messages on the Kafka cluster.
  • Integrated Hive and Tableau Desktop reports and published to Tableau Server.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
  • Administered requests, analyzed issues, and provided efficient resolutions.
  • Designed program specifications and performed the required tests.
  • Prepared code for all modules according to the required specifications and client requirements.
  • Monitored production issues and inquiries and provided efficient resolutions.
  • Used Jira for bug tracking and Bit Bucket to check-in and checkout code changes.
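
A minimal sketch of the Teradata-to-Hive import described in the first bullet, using the Spark JDBC data source; the JDBC URL, credentials, table names, and partitioning bounds are hypothetical, and the Teradata JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object TeradataToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("teradata-import-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Pull the source table in parallel JDBC partitions.
    val source = spark.read.format("jdbc")
      .option("url", "jdbc:teradata://td-host/DATABASE=retail")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "retail.daily_sales")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("TD_PASSWORD", ""))
      .option("numPartitions", "8")
      .option("partitionColumn", "sale_id")
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .load()

    // Land the data as a managed Hive table for downstream Hive/Impala queries.
    source.write.mode("overwrite").saveAsTable("staging.daily_sales")
    spark.stop()
  }
}
```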
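
A minimal sketch of importing large files from Amazon S3 and shaping them with Spark SQL, as in the S3 and Spark SQL bullets above; the bucket, paths, and columns are hypothetical, and s3a access assumes the hadoop-aws connector and AWS credentials are configured on the cluster.

```scala
import org.apache.spark.sql.SparkSession

object S3ImportSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-import-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read newline-delimited JSON files landed by the upstream feed.
    val events = spark.read.json("s3a://retail-landing-bucket/clickstream/*/*.json")
    events.createOrReplaceTempView("clickstream")

    // Spark SQL aggregation equivalent to the original Hive reporting query.
    val daily = spark.sql(
      """SELECT event_date, page, COUNT(*) AS hits
        |FROM clickstream
        |GROUP BY event_date, page""".stripMargin)

    daily.write.mode("overwrite").saveAsTable("analytics.daily_page_hits")
    spark.stop()
  }
}
```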

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Informatica, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.

Confidential

Sr. Hadoop Developer

Responsibilities:

  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and Map Reduce on EC2.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Worked with different source data file formats like JSON, CSV, TSV, etc.
  • Imported data from various sources like MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded the data back into HDFS.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce.
  • Loaded Salesforce data every 15 minutes on an incremental basis into the BigQuery raw and UDM layers using SOQL, Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts (see the sketch after this list).
  • Opened SSH tunnels to Google Dataproc to access the YARN resource manager and monitor Spark jobs.
  • Wrote a Python program to maintain raw-file archival in a GCS bucket.
  • Wrote Scala programs for Spark transformations in Dataproc.
  • Imported and exported data between environments such as MySQL and HDFS, and deployed into production.
  • Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve performance (illustrated after this list).
  • Used Oozie workflow scheduler templates to manage various jobs such as Sqoop, MapReduce, Pig, Hive, and shell scripts.
  • Involved in importing and exporting data from HBase using Spark.
  • Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment.
  • Actively participated in code reviews and meetings and resolved technical issues.
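
A minimal sketch of the incremental Salesforce-to-BigQuery load referenced above: a Spark job on Dataproc reads raw extracts from a GCS bucket and appends them to a BigQuery raw-layer table through the spark-bigquery connector (assumed to be available on the cluster); bucket, dataset, and table names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object GcsToBigQuerySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("salesforce-raw-load-sketch")
      .getOrCreate()

    // Raw Salesforce extracts landed in GCS by the upstream SOQL job.
    val accounts = spark.read
      .option("header", "true")
      .csv("gs://sfdc-landing-bucket/accounts/incremental/*.csv")

    // Append the 15-minute increment to the BigQuery raw layer.
    accounts.write.format("bigquery")
      .option("table", "raw_layer.sfdc_accounts")
      .option("temporaryGcsBucket", "sfdc-bq-staging-bucket")
      .mode("append")
      .save()

    spark.stop()
  }
}
```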
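
A minimal sketch of the Hive partitioning and tuning mentioned above, issued through Spark's Hive support; table and column names are hypothetical, and bucketing is left out of the write path since Spark's support for writing Hive bucketed tables is limited.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Tuning parameters so INSERT ... PARTITION can create partitions on the fly.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    spark.sql(
      """CREATE TABLE IF NOT EXISTS udm.orders_part (
        |  order_id BIGINT,
        |  amount   DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    // Each distinct load_date becomes its own partition, so date-bounded
    // queries read only the partitions they need.
    spark.sql(
      """INSERT OVERWRITE TABLE udm.orders_part PARTITION (load_date)
        |SELECT order_id, amount, load_date FROM staging.orders_raw""".stripMargin)

    spark.stop()
  }
}
```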

Environment: Apache Hadoop, GCP, BigQuery, GCS Bucket, Google Cloud Functions, EMR, EC2, S3, Hortonworks, MapReduce, Hive, Pig, Sqoop, Apache Spark, Zookeeper, HBase, Java, Oozie, Oracle, MySQL, Netezza, and UNIX Shell Scripting
