We provide IT Staff Augmentation Services!

Hadoop Admin/developer Resume

2.00/5 (Submit Your Rating)

New, YorK

PROFESSIONAL SUMMARY:

  • Over 3 years of IT experience as a Developer, Designer & quality Tester with cross platform integration experience using Hadoop Ecosystem, Java and Software Functional Testing.
  • Hands on experience in installing, configuring and using Hadoop Ecosystem - HDFS, MapReduce, Pig, Hive, Oozie, Flume, Hbase, Spark, Sqoop, Flume and Oozie.
  • Hands on experience using Cloudera and Hortonworks Hadoop Distributions.
  • Strong understanding of various Hadoop services, MapReduce and YARN architecture.
  • Responsible for writing Map Reduce programs.
  • Experienced in importing-exporting data into HDFS using SQOOP.
  • Experience loading data to Hive partitions and creating buckets in Hive.
  • Developed Map Reduce jobs to automate transfer the data from HBase.
  • Expertise in analysis using PIG, HIVEand MapReduce.
  • Experienced in developing UDFs for Hive,PIG using Java.
  • Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
  • Scheduling all hadoop/hive/sqoop/Hbase jobs using Oozie.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
  • Major strengths are familiarity with multiple software systems, ability to learn quickly new technologies, adapt to new environments, self-motivated, team player, focused adaptive and quick learner with excellent interpersonal, technical and communication skills.
  • Experience in defining detailed application software test plans, including organization, participant, schedule, test and application coverage scope.
  • Experience in gathering and defining functional and user interface requirements for software applications.
  • Experience in real time analytics with Apache Spark (RDD, DataFrames and Streaming API).
  • Used Spark DataFrames API over Cloudera platform to perform analytics on Hive data.
  • Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
  • Designed HIVE queries & Pig scripts to perform data analysis, data transfer and table design.
  • Excellent communications skills possess strong problem solving, analytical, time management skills.
  • Experience analyzing and resolving performance, scalability and reliability issues.
  • Having experience in developing a data pipeline using Kafka to store data into HDFS. Good Experience on SDLC (Software Development Life cycle).

TECHNICAL SKILLS:

Hadoop/Big Data: Hadoop, Map Reduce, HDFS, Zookeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, Yarn, HBase, Spark with Scala.

No SQL Databases: Hbase,Cassandra, mongoDB

Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDE: Eclipse, Intellij.

PROFESSIONAL EXPERIENCE:

Confidential, New York, New York

Hadoop Admin/Developer

Responsibilities

  • Developed the services to run the Map-Reduce jobs as per the requirement basis.
  • Importing and exporting data into HDFS and HIVE, PIG using Sqoop.
  • Responsible to manage data coming from different sources.
  • Built data pipeline using Pig and Java/Scala Map Reduce to store onto HDFS.
  • Designing ETL Data Pipeline flow to ingest the data from RDBMS source to Hadoop using shell script, sqoop, package and mysql.existing algorithms in Hadoop Loading Data into HBase using Bulk Load and Non-bulk load.
  • Developed scripts and automated data management from end to end and sync up b/w all the clusters.
  • Involved in gathering the requirements, designing, development and testing.
  • Created POC on Hortonworks and suggested the best practice in terms HDP, HDF platform, NIFI
  • Set up Hortonworks Infrastructure from configuring clusters to Node
  • Installed Ambari server on the clouds
  • Ability to work with onsite and offshore team members.
  • Able to work on own initiative, highly proactive, self-motivated commitment towards work and resourceful.
  • Strong debugging and critical thinking ability with good understanding of frameworks advancement in methodologies and strategies.
  • Performance tune and manage growth of the O/S, disk usage, and network traffic
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from LINUX file system to HDFS.
  • Perform architecture design, data modeling, and implementation of Big Data platform and analytic applications for the consumer products
  • Analyze latest Big Data Analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Worked on installing cluster, commissioning & decommissioning of datanode, namenode recovery, capacity planning, and slots configuration.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of Mapreduce Jobs.
  • Responsible to manage data coming from different sources.
  • Load and transform large sets of structured, semi structured and unstructured data
  • Experience in managing and reviewing Hadoop log files.
  • Job management using Fair scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Using PIG predefined functions to convert the fixed width file to delimited file.
  • Worked on tuning Hive and Pig to improve performance and solve performance related issues in Hive and Pig scripts with good understanding of Joins, Group and aggregation and how it does Map Reduce jobs
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig job
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Created Oozie workflows to run multiple MR, Hive and pig jobs.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Develop Spark code using Scala and Spark-SQL for faster testing and data processing
  • Involved in the development of Spark Streaming application for one of the data source using Scala, Spark by applying the transformations.
  • Import the data from different sources like HDFS/MYSQL into SparkRDD.
  • Experienced with Spark Context, Spark-SQL, Data Frame, Pair RDD's, Spark YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala.
  • Responsible for design development of Spark SQL Scripts based on Functional Specifications.
  • Responsible for Spark Streaming configuration based on type of Input Source

Environment Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Apache Yarn.

Confidential, New York

Hadoop Admin/ Developer

Responsibilities

  • Launching Amazon EC2 Cloud Instances using Amazon Web Services (Linux/ Ubuntu/RHEL) and Configuring launched instances with respect to specific applications.
  • Installed application on AWS EC2 instances and also configured the storage on S3 buckets.
  • Performed S3 buckets creation, policies and also on the IAM role based polices and customizing the JSON template.
  • Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS Cloud watch.
  • Managed servers on the Amazon Web Services (AWS) platform instances using Puppet, Chef Configuration management.
  • Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modellers to model the new incoming data sets.
  • Involved in start to end process of Hadoop jobs that used various technologies such as Sqoop, PIG, Hive, Map Reduce, Spark and Shell scripts (for scheduling of few jobs.
  • Expertise in designing and deployment of Hadoop cluster and different Big Data analytic tools including Pig, Hive, Oozie, Zookeeper, SQOOP, flume, Spark, Impala, Cassandra with Horton work Distribution.
  • Installed Hadoop, Map Reduce, HDFS, and AWS and developed multiple Map Reduce jobs in PIG and Hive for data cleaning and pre-processing.
  • Involved in creating Hive tables, Pig tables, and loading data and writing hive queries and pig scripts
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and Hbase.
  • Exploring with the Spark improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frame, Pair RDD's, Spark YARN.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data. Configured deployed and maintained multi-node Dev and Test Kafka Clusters.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Load the data into Spark RDD and do in memory data Computation to generate the Output response.
  • Experience in Oozie and workflow scheduler to manage Hadoop jobs by Direct Acyclic Graph (DAG) of actions with control flows.
  • Worked on tuning Hive and Pig to improve performance and solve performance related issues in Hive and Pig scripts with good understanding of Joins, Group and aggregation and how it does Map Reduce jobs
  • Exploring with the Spark improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frame, Pair RDD's, Spark YARN.
  • Import the data from different sources like HDFS/Hbase into Spark RDD.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Performed real time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.

Environment Apache Hadoop, HDFS, Map Reduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.

Confidential, New York

Hadoop Developer

Responsibilities

  • Responsible for implementation and ongoing administration of Hadoop infrastructure and setting up infrastructure
  • Cluster maintenance as well as creation and removal of nodes.
  • Evaluation of Hadoop infrastructure requirements and design/deploy solutions (high availability, big data clusters.
  • Cluster Monitoring and Troubleshooting Hadoop issues
  • Manage and review Hadoop log files
  • Works with application teams to install operating system and Hadoop updates, patches, version upgrades as required
  • Created NRF documents which explains the flow of the architecture, which measure the performance, security, memory usage, dependency.
  • Setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
  • Help maintain and troubleshoot UNIX and Linux environment.
  • Experience analyzing and evaluating system security threats and safeguards.
  • Experience in Importing and exporting data into HDFS and Hive using Sqoop.
  • Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
  • Experienced in handling data from different data sets, join them and preprocess using Pig join operations.
  • Developed Map-Reduce programs to clean and aggregate the data
  • Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
  • Developed different kind of custom filters and handled pre-defined filters on HBase data using API.
  • Imported and exported data from Teradata to HDFS and vice-versa.
  • Strong understanding of Hadoop eco system such as HDFS, MapReduce, HBase, Zookeeper, Pig, Hadoop streaming, Sqoop, Oozie and Hive
  • Implement counters on HBase data to count total records on different tables.
  • Experienced in handling Avro data files by passing schema into HDFS using Avro tools and Map Reduce.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • We used Amazon Web Services to perform big data analytics.
  • Implemented Secondary sorting to sort reducer output globally in map reduce.
  • Implemented data pipeline by chaining multiple mappers by using Chained Mapper.
  • Created Hive Dynamic partitions to load time series data
  • Experienced in handling different types of joins in Hive like Map joins, bucker map joins, sorted bucket map joins.
  • Created tables, partitions, buckets and perform analytics using Hive ad-hoc queries.
  • Experienced import/export data into HDFS/Hive from relational data base and Tera data using Sqoop.
  • Handling continuous streaming data comes from different sources using flume and set destination as HDFS.
  • Integrated spring schedulers with Oozie client as beans to handle cron jobs.
  • Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters
  • Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Worked on spring framework for multi-threading.

Environment: Hadoop, HDFS, Map Reduce, Hive, Pig,Sqoop, RDBMS/DB, Flat files, Teradata, Mysql, CSV, Avro data files. JAVA, J2EE.

We'd love your feedback!