Sr. Hadoop & Cloud Developer Resume

Beaverton, Oregon

SUMMARY:

  • 5.7 years of experience as a professional Hadoop developer in batch and real-time data processing using various Hadoop components (Spark, Solr, Kafka, HBase, Hive, NiFi, Sqoop, Storm) and Java
  • Experience in building a Hortonworks Hadoop cluster (HDP 2.5)
  • Working experience in PySpark with AWS cloud components such as S3 and Redshift DB
  • Deputed to TCS-Singapore for 6 months to work closely with Confidential on building a business layer for various transactional sources using Hadoop components
  • Experience in working with PySpark, Spark, Spark Streaming and Spark SQL, and in extending Spark integration with various components: Solr, Kafka, HDFS, HBase, Hive and Amazon Kinesis
  • Extensive experience in building SolrCloud clusters, Confidential and Banana dashboards, and in preparing custom Solr schemas and configurations
  • Utilized Storm, Kafka and Amazon Kinesis for processing large volumes of data
  • Experience in importing and exporting data using Sqoop from HDFS/Hive/HBase to RDBMS and vice versa
  • Experience in working with MapReduce, Pig scripts, Hive Query Language and HCatalog, and in extending Hive and Pig functionality by writing custom UDFs
  • Experience in analyzing data using HiveQL, Pig, Spark and custom MapReduce programs in Java
  • Hands-on experience with NiFi (HDF) in building data routing and transformation dataflows integrated with various components (HDFS, Hive, HBase, Solr, MS SQL and Kafka) as source/target
  • Experience in working with multiple data formats: Avro, Parquet, JSON, XML, CSV
  • Utilized Oozie workflows to schedule Sqoop, Java, Hive, Hive2, Pig, MapReduce and shell script actions in a Kerberos-secured HDP cluster
  • Working knowledge of HBase and Phoenix
  • Hands-on experience with Microsoft Azure and Amazon AWS cloud services: EC2, S3, Data Pipeline and EMR
  • In-depth understanding of Hadoop architecture and its components such as HDFS, YARN, ZooKeeper and MapReduce
  • Work experience in various Hadoop distributions (Cloudera, Hortonworks) and cloud platforms (AWS, Microsoft Azure)

TECHNICAL SKILLS:

Hadoop Distributions: Cloudera, Hortonworks

Cloud Platforms: Amazon Web Services (AWS), Microsoft Azure

Data Movement and Integration: NiFi (HDF), Sqoop, Kafka, Amazon Kinesis

Search Engines: Solr, Elasticsearch

Processing/Computing Frameworks: PySpark, Spark, Spark Streaming, MapReduce, Storm

Query Languages: HiveQL, Spark SQL, SQL, Impala

Security: Kerberos, Ranger

File Formats: Avro, Parquet, XML, JSON, CSV, XLSX

Workflow Schedulers: Oozie, Unix cron, APScheduler

Other Big Data Components: YARN, ZooKeeper, Ambari, Hue, Tez, Pig

Cluster Installation: Hortonworks HDP 2.5 using Ambari 2.4

Databases: HBase, Oracle, MS SQL Server, Redshift

Languages: Java, Python, JavaScript (D3.js)

Development / Build Tools: Eclipse, Maven, SVN, Jira, Bitbucket, Confluence

Java Frameworks: Hibernate, JBoss Drools Engine

Operating Systems: Linux, Windows

EXPERIENCE:

Sr. Hadoop & Cloud Developer

Confidential, Beaverton, Oregon

Responsibilities:

  • Involved in discussions with business users to gather requirements
  • Analysed the requirements to design and develop the framework
  • Developed PySpark scripts to perform incremental updates on Hive data
  • Developed Airflow scripts to run PySpark, Hive and Athena scripts at the required regular intervals
  • Performed continuous integration and deployment using Jenkins

Tools/Components: AWS S3, Python 2.7, Spark 2.1.2, Airflow, Hive, AWS EMR, Athena

Confidential, CT

Sr. Hadoop & Cloud Developer

Responsibilities:

  • Involved in discussions with business users to gather requirements
  • Analysed the requirements to develop the framework
  • Developed Java Spark Streaming scripts to load raw files and the corresponding processed metadata files into AWS S3 and an Elasticsearch cluster
  • Developed Python scripts to retrieve the most recent S3 keys from Elasticsearch
  • Developed Python scripts to fetch S3 files using the Boto3 module
  • Implemented PySpark logic to transform and process data in various formats such as XLSX, XLS, JSON and TXT
  • Built scripts to load PySpark-processed files into Redshift DB
  • Developed scripts to monitor and capture the state of each file as it passes through the PySpark logic
  • Implemented a shell script to automate the whole process

Tools: AWS S3, Java 1.8, Maven, Python 2.7, Spark 1.6.1, Kafka, Elasticsearch 5.3, MapR Cluster, Amazon Redshift DB, Shell script

Python Modules: boto3, pandas, elasticsearch, certifi, pyspark, psycopg2, json, io

Confidential, MI

Sr. Hadoop Developer

Responsibilities:

  • Analysed the requirements to develop the framework
  • Imported plant event data from various plants using Apache NiFi
  • Implemented transformation logic on plant events using Apache NiFi
  • Built an HBase data lake and created secondary indexes using Phoenix
  • Managed and tuned HBase to improve performance

Tools/Components: Apache NiFi 1.0, JSON Expression Language, HBase 1.1.2, Phoenix 4.7, Microsoft Azure HDP Cluster 2.5

Confidential

Hadoop Developer

Responsibilities:

  • Built a SolrCloud cluster with an external ZooKeeper quorum
  • Indexed real-time HDFS plant events using Solr and Spark Streaming
  • Indexed HBase cycle-time events using the Lucidworks HBase Indexer
  • Built Banana dashboards
  • Made configuration changes in Banana to make the dashboards available to end users

Tools/Components: Lucidworks Solr 5.5.2, Apache Spark 1.6, HBase 1.1.2, Banana 1.6 Dashboard, Java 1.8, Shell Script, Microsoft Azure HDP Cluster 2.5

Confidential

Hadoop Developer

Responsibilities:

  • Imported data from MS SQL Server into HDFS using Sqoop
  • Built Hive scripts to perform queries and transformations
  • Built Oozie coordinator workflows of Sqoop and Hive actions to schedule daily and incremental jobs
  • Helped the team build SAP BO reports on Hive using the ODBC driver

Tools/Components: Sqoop 1.4, Hive 1.2, Oozie 4.2, Shell Script

Confidential

Hadoop Developer

Responsibilities:

  • Align the accounting entries from source systems to populate a standard set of PSGL chartfields (Ledger, Business Unit, Account, Product, PC Code, Chartfield3, Original CCY, Base CCY)
  • Send summarised accounting entries to PSGL to maintain PSGL performance and end-of-day (EOD) processing
  • Send detailed (non-summarised) accounting entries to ODS and any other downstream systems that require them
  • Reduce manual journal entries (MJE) posted across operations
  • Decommission legacy mainframe systems
  • Enable faster book closing
  • Involved in discussions with business users to gather requirements
  • Analysed the requirements to develop the framework
  • Imported data using Sqoop
  • Ingested data into Hive
  • Processed Hive data using Spark and Spark SQL
  • Integrated JBoss Drools with Spark transformations
  • Sent files to PSGL and ODS

Tools/Components: Cloudera 5.4.3 Cluster, Apache Spark 1.3, JBoss Drools, Java, Maven, Sqoop, HDFS

Confidential

Hadoop Developer

Responsibilities:

  • Analysed the requirements to develop the framework
  • Ingested data into HDFS and then integrated it into Hive
  • Developed scripts to integrate HBase with Hive data
  • Built scripts to index data using the HBase Lily Indexer and Solr
  • Developed Solr Java code to establish relationships among materials
  • Built functionality to fetch hierarchical data and provide search by component number using Java and JSP
  • Visualized the results using D3.js dashboards

Tools/Components: Cloudera CDH 5.2, Solr, Java, Sqoop, Hive, HBase, D3.js, JSP

Confidential

Hadoop Developer

Responsibilities:

  • Analysed the requirements to develop the framework
  • Developed Sqoop scripts and Data Services to pull delta data and store it in HDFS
  • Developed Hive scripts to merge delta data with existing Hive data
  • Worked on Oozie scripts to schedule the above process every 30 minutes
  • Developed a Java reconciliation framework for record-level comparison

Tools/Components: Cloudera CDH 5.2, Java, Sqoop, Hive
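The delta merge described above is typically written in Hive as a UNION ALL of the existing table and the delta, keeping only the newest row per key (for example with ROW_NUMBER() partitioned by key and ordered by timestamp descending). The following is a minimal plain-Python sketch of that logic under stated assumptions, not the original project's code; the key column `id` and timestamp column `last_modified` are hypothetical names chosen for illustration:

```python
from typing import Any, Dict, Iterable, List

# A record is a plain dict; "id" and "last_modified" are hypothetical column names.
Record = Dict[str, Any]


def merge_delta(base: Iterable[Record], delta: Iterable[Record],
                key: str = "id", ts: str = "last_modified") -> List[Record]:
    """Merge delta rows into base rows, keeping the newest row per key.

    Plain-Python analogue of the Hive pattern: UNION ALL base and delta,
    then keep ROW_NUMBER() = 1 per key ordered by timestamp descending.
    """
    latest: Dict[Any, Record] = {}
    # Base rows are visited first, so on a timestamp tie the delta row wins.
    for row in list(base) + list(delta):
        current = latest.get(row[key])
        if current is None or row[ts] >= current[ts]:
            latest[row[key]] = row
    # Sort by key so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r[key])


if __name__ == "__main__":
    base = [
        {"id": 1, "status": "open", "last_modified": "2016-01-01"},
        {"id": 2, "status": "open", "last_modified": "2016-01-01"},
    ]
    delta = [
        {"id": 2, "status": "closed", "last_modified": "2016-01-02"},
        {"id": 3, "status": "open", "last_modified": "2016-01-02"},
    ]
    for row in merge_delta(base, delta):
        print(row)
```

Comparing timestamps rather than blindly overwriting keeps a late-arriving older delta row from replacing newer data, which matters when the merge runs on a short schedule such as every 30 minutes.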

Confidential

Hadoop Developer

Responsibilities:

  • Analysed the requirements to develop the framework
  • Developed Sqoop scripts to import data from Oracle into HDFS
  • Developed MapReduce programs to cleanse and validate the imported HDFS data
  • Implemented custom key and partitioning techniques in MapReduce programs
  • Developed Hive table structures to load the cleansed data
  • Made configuration changes in Hive and MapReduce programs as part of performance tuning
  • Executed queries in Impala to measure query performance
  • Built a workflow of Sqoop, MapReduce and Hive scripts and scheduled it in Oozie via Hue
  • Helped the Tableau team build reports by connecting to Impala

Tools/Components: Cloudera 4, Java, Sqoop, MapReduce, Hive, Impala, Oozie, Hue
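One common form of the custom key and partitioning technique mentioned above is a composite key (natural key plus a sort field) whose partitioner hashes only the natural key, so every record for one key reaches the same reducer already sorted. In Hadoop this would be a Java subclass of `org.apache.hadoop.mapreduce.Partitioner` overriding `getPartition`; the Python simulation below is only an assumed illustration of the routing, and the function names and the (natural_key, sort_field) tuple layout are hypothetical:

```python
from typing import Dict, List, Tuple

# Hypothetical names throughout; this simulates Hadoop-style partitioning,
# it does not use any Hadoop API.
CompositeKey = Tuple[str, str]  # (natural_key, sort_field)


def java_string_hash(s: str) -> int:
    """Reproduce Java's String.hashCode() for strings of BMP characters."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret as a signed 32-bit int, as Java would.
    return h - (1 << 32) if h >= (1 << 31) else h


def natural_key_partition(composite_key: CompositeKey, num_reducers: int) -> int:
    """Pick a reducer using only the natural key of the composite key.

    Same formula as Hadoop's default HashPartitioner,
    (hashCode & Integer.MAX_VALUE) % numReduceTasks, applied to the natural key.
    """
    natural_key, _sort_field = composite_key
    return (java_string_hash(natural_key) & 0x7FFFFFFF) % num_reducers


def shuffle(records: List[Tuple[CompositeKey, str]],
            num_reducers: int) -> Dict[int, List[Tuple[CompositeKey, str]]]:
    """Route records to reducers, then sort each reducer's input by the full key."""
    buckets: Dict[int, List[Tuple[CompositeKey, str]]] = {i: [] for i in range(num_reducers)}
    for composite_key, value in records:
        buckets[natural_key_partition(composite_key, num_reducers)].append((composite_key, value))
    for bucket in buckets.values():
        bucket.sort(key=lambda kv: kv[0])
    return buckets


if __name__ == "__main__":
    records = [
        (("plant1", "2016-01-02"), "b"),
        (("plant1", "2016-01-01"), "a"),
        (("plant2", "2016-01-01"), "c"),
    ]
    for reducer, rows in shuffle(records, num_reducers=2).items():
        print(reducer, rows)
```

The `& 0x7FFFFFFF` mask mirrors the `& Integer.MAX_VALUE` in Hadoop's default HashPartitioner and keeps a negative hash code from producing a negative partition index.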
