
Hadoop Developer Resume

SUMMARY

  • Over 7 years of overall experience in the IT industry and software development, including 5 years of Hadoop development experience.
  • Experience in installing, upgrading, configuring, monitoring, supporting and managing Hadoop clusters using Cloudera CDH4 and CDH5 and Hortonworks HDP 2.1, 2.2 and 2.3 on Ubuntu, Red Hat and CentOS systems.
  • Worked on CDH and HDP components including HDFS, MapReduce, JobTracker, TaskTracker, Sqoop, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase, Fair Scheduler, Spark and Kafka.
  • Deployed Hadoop clusters on public and private cloud environments like AWS and OpenStack
  • Involved in vendor selection and capacity planning for the Hadoop Clusters in production.
  • Experienced in administering Linux systems to deploy Hadoop clusters and in monitoring the clusters using Nagios and Ganglia.
  • Experienced in performing backup, recovery, failover and DR practices on multiple platforms.
  • Implemented Kerberos and LDAP authentication of all the services across Hadoop clusters
  • Experienced in automating the provisioning processes and system resources using Puppet.
  • Implemented Hadoop - based solutions to store archives and backups from multiple sources.
  • Familiar with importing and exporting data between HDFS and RDBMS sources such as MySQL, Oracle and Teradata using Sqoop, including fast loaders and connectors.
  • Built an ingestion framework using Flume for streaming logs and aggregating the data into HDFS.
  • Worked with application team via scrum to provide operational support, install Hadoop updates, patches and version upgrades as required.
  • Imported and exported data in and out of HDFS and processed data for commercial analytics
  • Installed, monitored and performance-tuned standalone and multi-node Kafka clusters.
  • Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, and HBase.
  • Implemented authentication using Kerberos and authorization using Apache Sentry.
  • Exposure to installing Hadoop and its ecosystem components such as Hive and Pig.
  • Experienced in collaborative platforms including Jira, Rally, SharePoint and Discovery
  • Experience in understanding and managing Hadoop log files and in managing Hadoop infrastructure with Cloudera Manager.
  • Experienced as a SQL DBA in HA and DR techniques such as replication, log shipping, mirroring and clustering, as well as database security and permissions.
  • Experienced in upgrading SQL Server software, patches and service packs
  • Experience in providing 24x7 production support, including weekends, on a rotation basis.

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential

Responsibilities:

  • Completed all training given by the authority to become comfortable working with the Confidential DG supervision Hadoop team.
  • Used JIRA to track assigned tasks on a daily basis.
  • DB2 to Hive ingestion:
  • Created STG (stage) and PERM (permanent) tables in Hive mirroring the DB2 schemas and tables, as required for ingesting data from DB2 into Hive, and wrote insert queries for each table to move data from the stage to the perm tables (see the sketch after this list).
  • Created a load metadata entry file using the Load Metadata Table Description from Sqoop1HFramework and entered metadata for each table in the Hadoop file system.
  • Executed the necessary commands on the edge node to run the ingest operation successfully.
  • Autosys Job Box:
  • Created Autosys box jobs for new data pipelines, collecting and populating the insert job, box name, command, description and profile fields so that the jobs run successfully at different time intervals.
  • Performed production-level metadata creation for more than 100 tables, following the given template and standard.
  • Collected all the data columns from BSDWeb, following the given instructions and classword rules.
  • Delivered all sprint tasks on time.
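The stage-to-perm pattern above can be illustrated with a minimal sketch. The actual work used Hive directly against DB2-derived schemas; here the same HiveQL is issued through Spark SQL in Scala for consistency with the other sketches in this resume, and all database, table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object StageToPerm {
  def main(args: Array[String]): Unit = {
    // Hive support is required so the DDL/DML below runs against the Hive metastore.
    val spark = SparkSession.builder()
      .appName("db2-to-hive-stage-to-perm")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS stg")
    spark.sql("CREATE DATABASE IF NOT EXISTS perm")

    // STG table mirrors the DB2 source layout (names are hypothetical).
    spark.sql("""
      CREATE TABLE IF NOT EXISTS stg.customer_stg (
        customer_id BIGINT,
        customer_nm STRING,
        updated_ts  TIMESTAMP
      )
      STORED AS TEXTFILE
    """)

    // PERM table holds the curated copy in a columnar format.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS perm.customer (
        customer_id BIGINT,
        customer_nm STRING,
        updated_ts  TIMESTAMP
      )
      STORED AS ORC
    """)

    // One insert query per table moves data from stage to perm.
    spark.sql("""
      INSERT INTO TABLE perm.customer
      SELECT customer_id, customer_nm, updated_ts FROM stg.customer_stg
    """)

    spark.stop()
  }
}
```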

Big Data Developer

Confidential

Responsibilities:

  • Developed architecture documents, process documentation, server diagrams and requisition documents.
  • Streamed data in real time using Spark with Kafka.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Ran Apache Hadoop, CDH and MapR distributions on EC2, including Elastic MapReduce (EMR).
  • Experienced in Spark Streaming, which collects data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in MongoDB.
  • Responsible for installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster and created hive tables to store the processed results in a tabular format.
  • Experience with designing and building solutions for data ingestion both real time & batch using Sqoop/PIG/Impala/Kafka.
  • Built a NiFi dataflow to consume data from Kafka, apply transformations, place the data in HDFS and expose a port to run a Spark streaming job.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Used Informatica PowerCenter for data integration.
  • Experienced in connecting to and fetching data from heterogeneous sources and processing the data; able to connect to both SQL Server and Oracle databases and integrate the data into a third system.
  • Used Informatica to publish database processes as web services and to balance load between the database server and the ETL server.
  • Configured Spark Streaming in Scala to receive real-time data from Apache Kafka and store the streamed data to HDFS (see the streaming sketch after this list).
  • Hands on with real time data processing using distributed technologies Storm and Kafka.
  • Created demos in Tableau Desktop and published onto Tableau server.
  • Worked on motion charts, bubble charts and drill-down analysis using Tableau Desktop.
  • Experienced with the PySpark API, the Python interface to Apache Spark.
  • Worked with the PySpark library to apply SQL-like analysis to large volumes of structured and semi-structured data.
  • Involved in deployment of services through containerization (Docker, Kubernetes and AWS Elastic Container Service (ECS)).
  • Completed end-to-end design and development of the Apache NiFi flow that acts as the agent between the middleware team and the EBI team and executes all the actions mentioned above.
  • Developed Sqoop scripts to move data between Hive and the Vertica database.
  • Processed data into HDFS by developing solutions and analyzed the data using MapReduce, Pig and Hive to produce summary results from Hadoop for downstream systems.
  • Built servers using AWS: importing volumes, launching EC2 instances, and creating security groups, auto-scaling, load balancers, Route 53, SES and SNS in the defined virtual private cloud (VPC).
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data into HBase and Hive using the HBase-Hive integration.
  • Streamed AWS log groups into a Lambda function to create ServiceNow incidents.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing, and performed complex HiveQL queries on Hive tables.
  • Used Hive to perform data validation on data ingested with Sqoop and Flume, and pushed the cleansed data set into HBase.
  • Scheduled several time-based Oozie workflows by developing Python scripts.
  • Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, SPLIT to extract data from data files to load into HDFS.
  • Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
  • Worked on S3 buckets on AWS to store CloudFormation templates and worked on AWS to create EC2 instances.
  • Responsible for installing Talend in multiple environments, creating projects, setting up user roles and job servers, configuring TAC options, adding Talend jobs, handling job failures, providing on-call support and scheduling.
  • Enabled a load balancer for Impala to distribute data load across all Impala daemons in the cluster.
  • Designed an ETL data pipeline flow to ingest data from an RDBMS source into Hadoop using a shell script, Sqoop packages and MySQL.
  • Experienced with HIVE Data warehousing which aggregates structured data from one or more sources so that it can be compared and analyzed for greater business intelligence.
  • Prepared and presented Business Requirement Document (BRD), System Requirement Specification (SRS) and Functional Requirement Document (FRD).
  • Analyzed business requirements and segregated them into use cases; created use case diagrams, activity diagrams and sequence diagrams.
  • Organized JAD sessions to flesh out requirements, performed use case and workflow analysis, outlined business rules, and developed domain object models.
  • End-to-end architecture and implementation of client-server systems using Scala, Akka, Java, JavaScript and related technologies.
  • Optimized Hive tables using techniques such as partitioning and bucketing to provide better query performance.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Involved in Spark and Spark Streaming, creating RDDs and applying transformation and action operations.
  • Created partitioned tables and loaded data using both static partition and dynamic partition method.
  • Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Oracle into HDFS using Sqoop.
  • Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability and durability (see the producer sketch after this list).
  • Followed the Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
  • Involved in cluster maintenance, monitoring and troubleshooting, and managed and reviewed data backups and log files.
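A minimal sketch of the Kafka-to-HDFS streaming flow referenced above, written with Spark Structured Streaming in Scala (the original work may have used the DStream API). The broker addresses, topic name and HDFS paths are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs")
      .getOrCreate()

    // Read the Kafka topic as a streaming DataFrame
    // (broker list and topic name are hypothetical).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "learner-events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Persist the stream to HDFS as Parquet, with checkpointing for recovery.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/streams/learner")
      .option("checkpointLocation", "hdfs:///checkpoints/learner")
      .start()

    query.awaitTermination()
  }
}
```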
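As a companion to the publish-subscribe bullet above, a minimal Kafka producer in Scala built on the standard kafka-clients API. The broker list, topic name and payloads are hypothetical.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducer {
  def main(args: Array[String]): Unit = {
    // Producer configuration: broker list and serializers (addresses are hypothetical).
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    // acks=all waits for the full in-sync replica set, trading latency for durability.
    props.put("acks", "all")

    val producer = new KafkaProducer[String, String](props)

    // Publish a few events to a topic; every consumer group subscribed to the
    // topic receives them independently (publish-subscribe semantics).
    (1 to 5).foreach { i =>
      val record = new ProducerRecord[String, String]("user-events", s"key-$i", s"""{"event":$i}""")
      producer.send(record)
    }

    producer.flush()
    producer.close()
  }
}
```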

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, Elastic, NiFi, Linux, Splunk, Java, Puppet, Apache YARN, Pig, Spark, Tableau, Machine Learning.

Hadoop Developer

Confidential

Responsibilities:

  • Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, Hive and Sqoop.
  • Provided an application demo to the client by designing and developing search engine, report analysis trends and application administration prototype screens using AngularJS and Bootstrap.
  • Took ownership of the complete application design for the Java components and the Hadoop integration.
  • Apart from normal requirements gathering, participated in business meetings with the client to gather security requirements.
  • Implemented Apache NiFi to integrate Hadoop and PostgreSQL into the day-to-day usage of the team's SDS projects.
  • Designed and implemented ETL jobs using BIML, SSIS and Sqoop to move data from SQL Server to Impala.
  • Assisted the architect in analyzing the existing and future systems; prepared design blueprints and application flow documentation.
  • Experienced in managing and reviewing Hadoop log files; loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Responsible for managing data coming from different sources and applications; supported MapReduce programs running on the cluster.
  • Responsible for working with message broker systems such as Kafka; extracted data from mainframes, fed it to Kafka and ingested it into HBase to perform analytics.
  • Wrote an event-driven link-tracking system to capture user events and feed them to Kafka, which pushes them into HBase.
  • Created MapReduce jobs to extract content from HBase and configured them in an Oozie workflow to generate analytical reports.
  • Developed JAX-RS web service code using the Apache CXF framework to fetch data from Solr when a user searched for documents.
  • Participated in Solr schema design and ingested data into Solr for indexing.
  • Wrote MapReduce programs to organize data and transform it into a client-specified format suitable for analytics.
  • Hands-on experience writing Python scripts to optimize performance; implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
  • Created the Talend development standards document, which describes general guidelines for Talend developers, the naming conventions to be used in transformations, and the development and production environment structures.
  • Involved in writing Spark applications using Scala; hands-on experience creating RDDs and applying transformations and actions while implementing Spark applications (see the sketch after this list).
  • Good knowledge of creating DataFrames using Spark SQL; involved in loading data into the Cassandra NoSQL database.
  • Implemented record-level atomicity on writes using Cassandra; wrote Pig scripts to query and process the datasets and identify trend patterns by applying client-specific criteria, and configured Oozie workflows to run these jobs along with the MR jobs.
  • Stored the derived analysis results in HBase and made them available for ingestion into Solr for indexing.
  • Involved in integrating the Java search UI, Solr and HDFS; involved in code deployments using Jenkins as the continuous integration tool.
  • Documented all the challenges and issues involved in dealing with the security system and implemented best practices.
  • Created project structures and configurations according to the project architecture and made them available to junior developers to continue their work.
  • Handled the onsite coordinator role to deliver work to the offshore team; involved in code reviews and application lead support activities.
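A small Scala sketch of the RDD and DataFrame work described above: creating an RDD, applying transformations and actions, and registering a DataFrame for Spark SQL queries. The sample data and column names are made up for illustration; the Cassandra load itself is not shown.

```scala
import org.apache.spark.sql.SparkSession

object RddAndDataFrameDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-dataframe-demo")
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // Create an RDD and apply transformations (lazy) ...
    val lines  = sc.parallelize(Seq("a,1", "b,2", "a,3"))
    val pairs  = lines.map(_.split(","))               // transformation
      .map(parts => (parts(0), parts(1).toInt))        // transformation
    val totals = pairs.reduceByKey(_ + _)              // transformation

    // ... then trigger execution with actions.
    totals.collect().foreach(println)                  // action
    println(s"distinct keys: ${totals.count()}")       // action

    // Build a DataFrame from the RDD and query it with Spark SQL.
    val df = totals.toDF("key", "total")
    df.createOrReplaceTempView("totals")
    spark.sql("SELECT key, total FROM totals WHERE total > 1").show()

    spark.stop()
  }
}
```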

Environment: Apache Hadoop, Hive, Scala, Pig, HDFS, Cloudera, Java MapReduce, Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, JDK 1.8/1.7, Agile and Scrum development process, NoSQL

Hadoop Developer

Confidential

Responsibilities:

  • Involved in complete project life cycle starting from design discussion to production deployment
  • Worked closely with the business team to gather their requirements and new support features
  • Involved in running POC's on different use cases of the application and maintained a standard document for best coding practices
  • Developed a 200-node cluster while designing the data lake with the Hortonworks distribution
  • Responsible for building scalable distributed data solutions using Hadoop
  • Installed, configured and implemented high availability Hadoop Clusters with required services (HDFS, Hive, HBase, Spark, Zookeeper)
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster
  • Responsible for installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster and created hive tables to store the processed results in a tabular format.
  • Configured Spark Streaming in Scala to receive real-time data from Apache Kafka and store the streamed data to HDFS.
  • Developed Sqoop scripts to move data between Hive and the Vertica database.
  • Processed data into HDFS by developing solutions and analyzed the data using MapReduce, Pig and Hive to produce summary results from Hadoop for downstream systems.
  • Built servers using AWS: importing volumes, launching EC2 instances, and creating security groups, auto-scaling, load balancers, Route 53, SES and SNS in the defined virtual private cloud (VPC).
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data into HBase and Hive using the HBase-Hive integration.
  • Streamed AWS log groups into a Lambda function to create ServiceNow incidents.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Created managed tables and external tables in Hive and loaded data from HDFS (see the sketch after this list).
  • Developed Spark code using Scala and Spark SQL for faster processing and testing, and performed complex HiveQL queries on Hive tables.
  • Scheduled several time-based Oozie workflows by developing Python scripts.
  • Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, SPLIT to extract data from data files to load into HDFS.
  • Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
  • Worked on S3 buckets on AWS to store CloudFormation templates and worked on AWS to create EC2 instances.
  • Designed an ETL data pipeline flow to ingest data from an RDBMS source into Hadoop using a shell script, Sqoop packages and MySQL.
  • End-to-end architecture and implementation of client-server systems using Scala, Akka, Java, JavaScript and related technologies on Linux.
  • Optimized Hive tables using techniques such as partitioning and bucketing to provide better query performance.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Implemented Hadoop on AWS EC2 using a few instances to gather and analyze data log files.
  • Involved in Spark and Spark Streaming, creating RDDs and applying transformation and action operations.
  • Created partitioned tables and loaded data using both static and dynamic partitioning methods.
  • Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from Oracle into HDFS using Sqoop.
  • Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability and durability.
  • Followed the Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
  • Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Involved in cluster maintenance, monitoring and troubleshooting, and managed and reviewed data backups and log files.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig.
  • Analyzed Hadoop clusters and different big data analytic tools including Pig, Hive, HBase and Sqoop.
  • Improved performance by tuning Hive and MapReduce jobs.
  • Researched, evaluated and utilized modern technologies, tools and frameworks in the Hadoop ecosystem.
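A hedged Scala/Spark SQL sketch of the Hive table work described above: a managed table versus an external table, followed by static and dynamic partition loads. The database, table and column names and the HDFS location are hypothetical, and the same statements could equally be run from the Hive CLI or beeline.

```scala
import org.apache.spark.sql.SparkSession

object HiveTablesDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-tables-demo")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS sales")

    // Managed table: Hive owns both the metadata and the underlying data files.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS sales.orders (
        order_id BIGINT,
        amount   DOUBLE
      )
      PARTITIONED BY (order_date STRING)
      STORED AS ORC
    """)

    // External table: only the metadata is managed; data stays at the HDFS location.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders_raw (
        order_id   BIGINT,
        amount     DOUBLE,
        order_date STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION 'hdfs:///data/raw/orders'
    """)

    // Static partition load: the partition value is fixed in the statement.
    spark.sql("""
      INSERT INTO TABLE sales.orders PARTITION (order_date = '2020-01-01')
      SELECT order_id, amount FROM sales.orders_raw WHERE order_date = '2020-01-01'
    """)

    // Dynamic partition load: partition values come from the data itself.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT INTO TABLE sales.orders PARTITION (order_date)
      SELECT order_id, amount, order_date FROM sales.orders_raw
    """)

    spark.stop()
  }
}
```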

Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Flume, Vertica, Oozie Scheduler, Java, Shell Scripts, Teradata, Oracle, HBase, MongoDB, Cassandra, Cloudera, AWS, JavaScript, JSP, Kafka, Spark, Scala, ETL, Python.
