Big Data Hadoop Developer Resume
SUMMARY:
- 5+ years of experience working with Big Data technologies on systems comprising massive amounts of data running in highly distributed Hadoop environments.
- Strong knowledge of Hadoop architecture and daemons such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce concepts.
- Well versed in implementing E2E solutions on big data using the Hadoop framework.
- Hands-on experience in writing MapReduce programs in Java to handle different data sets using map and reduce tasks.
- Worked with join patterns and implemented map-side and reduce-side joins using MapReduce.
- Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
- Designed Hive queries and Pig scripts to perform data analysis, data transfer, and table design.
- Experience in developing data pipelines that use Kafka to store data in HDFS.
- Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), Amazon EMR, and Amazon Elastic Compute Cloud (Amazon EC2).
- Implemented ad-hoc queries using Hive to perform analytics on structured data.
- Expertise in writing Hive UDFs and generic UDFs to incorporate complex business logic into Hive queries (a brief sketch follows this summary).
- Experienced in optimizing Hive queries by tuning configuration parameters.
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMS.
- Extensively used Apache Flume to collect the logs and error messages across the cluster.
- Experienced in performing real time analytics on HDFS using HBase.
- Used Cassandra CQL with the Java API to retrieve data from Cassandra tables.
- Experience in writing shell scripts to dump shared data from MySQL servers to HDFS.
- Worked on Implementing and optimizing Hadoop/MapReduce algorithms for Big Data analytics.
- Experience in working with and operating Hadoop clusters on Amazon EMR, Cloudera (CDH3 & CDH4), and Hortonworks Hadoop distributions.
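As an illustration of the Hive UDF work mentioned above, here is a minimal sketch of a simple (non-generic) UDF written in Scala against the Hive Java API. The class name and the masking rule are hypothetical examples, not taken from any specific project.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical business rule: mask an account number, keeping only the last four characters.
class MaskAccountNumber extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) {
      null
    } else {
      val value  = input.toString
      val masked = "*" * math.max(0, value.length - 4) + value.takeRight(4)
      new Text(masked)
    }
  }
}
```

Once the class is packaged into a jar, it can be wired into Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in function inside a query.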
EXPERIENCE:
Confidential
BIG DATA HADOOP DEVELOPER
Responsibilities:
- Developed data pipelines using StreamSets Data Collector to store data from Kafka into HDFS, Solr, Elasticsearch, HBase, and MapDB.
- Analyzed and defined the researchers' strategy, and determined the system architecture and requirements needed to achieve their goals.
- Developed multiple Kafka producers and consumers as per the software requirement specifications.
- Used Kafka for log aggregation: gathering physical log files off servers and placing them in a central location such as HDFS for processing.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS (a minimal sketch follows this list).
- Used various Spark transformations and actions to cleanse the input data.
- Developed shell scripts to generate Hive CREATE statements from the data and load the data into the tables.
- Wrote MapReduce jobs using the Java API and Pig Latin; optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
- Involved in writing custom MapReduce programs using the Java API for data processing.
- Integrated Maven builds and designed workflows to automate the build and deploy process.
- Involved in developing a linear regression model, built with Spark and the Scala API, to predict a continuous measurement and improve observations on wind turbine data.
- Created Hive tables as internal or external per requirement, defined with appropriate static and dynamic partitions and bucketing for efficiency.
- Loaded and transformed large sets of structured and semi-structured data using Hive.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
- Used Spark and Spark SQL with the Scala API to read the Parquet data and create the tables in Hive.
- Developed Hive queries for the analysts.
- Worked in an AWS environment for development and deployment of custom Hadoop applications.
- Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
- Very good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing, and consistency levels.
- Experienced in using the Spark application master to monitor Spark jobs and capture their logs.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API, for faster testing and processing of data.
- Involved in performing analytics and visualization on data from the logs to estimate the error rate and study the probability of future errors using regression models.
- Used the WebHDFS REST API to make HTTP GET, PUT, POST, and DELETE requests from the web server to perform analytics on the data lake.
- Used Kafka to build a customer-activity tracking pipeline as a set of real-time publish-subscribe feeds.
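A minimal sketch of the Kafka-to-HDFS ingestion described in this project, using the Spark Streaming Kafka 0-10 direct stream and writing each micro-batch as Parquet. The broker address, topic, consumer group, and output path are hypothetical placeholders, and the additional StreamSets/Solr/HBase sinks are not shown.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KafkaToParquet").getOrCreate()
    import spark.implicits._

    // One micro-batch per minute.
    val ssc = new StreamingContext(spark.sparkContext, Seconds(60))

    // Hypothetical broker, topic, and consumer group.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "turbine-etl",
      "auto.offset.reset"  -> "latest"
    )
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("turbine-events"), kafkaParams))

    stream.map(_.value)
      .filter(_.nonEmpty)                 // basic cleansing: drop empty records
      .foreachRDD { rdd =>
        if (!rdd.isEmpty()) {
          // Convert each micro-batch RDD to a DataFrame and append it as Parquet on HDFS.
          rdd.toDF("raw_event")
            .write.mode("append")
            .parquet("hdfs:///data/raw/turbine_events")
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```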
Confidential
SPARK/HADOOP DEVELOPER
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic and processing tools, including Pig, Hive, HBase, Sqoop, Spark, and Spark Streaming.
- Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Migrated various Hive UDFs and queries into Spark SQL for faster requests (see the sketch at the end of this section).
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS using Scala.
- Hands-on experience with Spark and Spark Streaming, creating RDDs and applying transformations and actions.
- Experienced in scheduling jobs using Control-M.
- Developed and implemented custom Hive UDFs involving date functions.
- Used Sqoop to import data from Oracle to Hadoop.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
- Experienced in developing transformation scripts using Scala.
- Involved in developing shell scripts to orchestrate the execution of all other scripts and to move data files within and outside of HDFS.
- Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
- Used Kafka publish-subscribe messaging as a distributed commit log; experienced with its speed, scalability, and durability.
- Used Tableau to generate weekly reports for the customer.
- Implemented the Kerberos security authentication protocol for the existing cluster.
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Microsoft Azure, Azure Resource Manager, Yarn
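Below is a minimal sketch of the kind of Hive-to-Spark-SQL migration described above: a date-handling UDF re-registered with Spark SQL so the original HiveQL query can run unchanged on the Spark engine. The table, column names, and date format are hypothetical assumptions.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.SparkSession

object HiveUdfToSparkSql {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL read tables from the existing Hive metastore.
    val spark = SparkSession.builder()
      .appName("HiveUdfToSparkSql")
      .enableHiveSupport()
      .getOrCreate()

    // Re-implementation of a date-handling Hive UDF as a Spark SQL UDF:
    // converts "dd-MM-yyyy" strings to the ISO format used downstream.
    val toIso = (s: String) =>
      if (s == null) null
      else LocalDate.parse(s, DateTimeFormatter.ofPattern("dd-MM-yyyy"))
        .format(DateTimeFormatter.ISO_LOCAL_DATE)
    spark.udf.register("to_iso_date", toIso)

    // Hypothetical Hive table and column names; the HiveQL itself is unchanged.
    spark.sql(
      """SELECT customer_id, to_iso_date(order_dt) AS order_date, amount
        |FROM sales.orders
        |WHERE amount > 0""".stripMargin)
      .show(20)

    spark.stop()
  }
}
```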
Confidential
Hadoop Developer
Responsibilities:
- Architected, managed, and delivered technical projects and products for various business groups.
- Loaded all data from relational databases into Hive using Sqoop.
- Received four flat files from different vendors, each in a different format, e.g. text, EDI, and XML.
- Architected all ETL data loads coming in from the source systems into the data warehouse.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Implemented installation and configuration of a multi-node cluster in the cloud using Google Cloud Platform.
- Experienced in working with Apache Storm.
- Utilized Oozie workflows to run Pig and Hive jobs.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Designed business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
- Experience in data cleansing and data mining.
- Worked with tools such as Flume, Storm, and Spark.
- Ran proofs-of-concept to determine feasibility and evaluate big data products.
- Wrote Hive join queries to fetch information from multiple tables, and multiple MapReduce jobs to collect output from Hive.
- Involved in migrating data from the existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it to HDFS.
- Worked on configuring and managing disaster recovery and backup of Cassandra data.
- Developed Spark jobs to transform the data in HDFS (see the sketch at the end of this section).
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Used Hive to analyze data ingested into HBase via Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Developed Shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.
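To illustrate the Spark transformation jobs mentioned in this project, here is a minimal sketch that reads staged web-log data from HDFS, aggregates daily request counts and an error rate, and writes a reporting extract back to HDFS. The paths, column names, and CSV layout are hypothetical assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object WebLogMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("WebLogMetrics").getOrCreate()

    // Hypothetical layout: Flume-delivered web logs landed on HDFS as CSV with a header row.
    val logs = spark.read
      .option("header", "true")
      .csv("hdfs:///data/staging/weblogs")

    // Transform and aggregate: daily request counts and error rate per source system.
    val metrics = logs
      .withColumn("event_date", to_date(col("event_ts")))
      .groupBy(col("event_date"), col("source"))
      .agg(
        count(lit(1)).as("requests"),
        avg(when(col("status").cast("int") >= 500, 1).otherwise(0)).as("error_rate")
      )

    // Write the reporting extract back to HDFS for the dashboard layer.
    metrics.write.mode("overwrite").parquet("hdfs:///data/marts/weblog_metrics")

    spark.stop()
  }
}
```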