Hadoop Developer Resume

Pittsburg, PA

SUMMARY

  • 8 years of experience in application development using Hadoop and related Big Data technologies such as HBase, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
  • More than 4 years of work experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie, and AWS.
  • In-depth knowledge of data structures and the design and analysis of algorithms, and a good understanding of data mining and machine learning techniques.
  • Excellent knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
  • Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase, and Cassandra.
  • Strong experience working with different Hadoop distributions like Cloudera, Hortonworks, MapR and Apache distributions.
  • Well versed in installing, configuring, supporting, and managing Big Data and the underlying infrastructure of Hadoop clusters.
  • Hands-on expertise in ETL tools for data integration on Big Data, and in importing and exporting data between relational database management systems (RDBMS) and HDFS using Sqoop.
  • Experienced in developing UDFs for Pig and Hive in Java to extend their core functionality.
  • Good experience in data cleansing and analysis using Hive, Pig, and the Hadoop Java API.
  • Good knowledge of setting up job streaming and scheduling with Oozie, and of working with messaging systems such as Kafka integrated with ZooKeeper.
  • Experience with AWS services such as EMR, EC2, S3, CloudFormation, and Redshift, which provide fast and efficient processing of Big Data.
  • Proficient in the design and development of MapReduce programs using Apache Hadoop to analyze Big Data as required.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components like HDFS, MapReduce, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, and Flume.
  • Skilled in writing MapReduce jobs using Pig and Hive.
  • Knowledge of managing and reviewing Hadoop log files.
  • Expertise in a wide array of tools in the Big Data stack such as Hadoop, Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Knowledge of streaming data to HDFS using Flume.
  • Excellent programming skills with experience in Java, C, SQL, and Python.
  • In-depth and extensive knowledge of analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Experience in tuning and troubleshooting performance issues in Hadoop clusters.
  • Worked on importing data into HBase using the HBase shell and the HBase client API.
  • Hands-on experience in using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Extensive experience working on various databases and database script development using SQL and PL/SQL.
  • Strong experience and knowledge of real-time data analytics using Storm, Kafka, Flume, and Spark.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
  • Expertise in developing responsive Front End components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, NodeJS, Ajax, JQuery and AngularJS.
  • Knowledge of writing real-time processing applications using Spark Streaming with Kafka.
  • Involved in HBase setup and storing data in HBase for further analysis.
  • Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro.
  • Worked on various Tools and IDEs like Eclipse, IBM Rational, Visio, Apache Ant-Build Tool, MS-Office, PLSQL Developer, SQL*Plus.
  • Supported MapReduce Programs running on the cluster and wrote custom MapReduce Scripts for Data Processing in Java.

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Kafka, ZooKeeper, Avro, Spark, Storm.

Programming Languages: C, C#, Java, SQL, Scala, PL/SQL, Python, Linux shell scripts

Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX, SOAP

Web Servers: WebLogic, WebSphere, Apache Tomcat

Tools Used: Eclipse, IntelliJ, Git, PuTTY, WinSCP, Ant, Maven, Gradle

Database: Oracle 11g/10g, DB2, Teradata, Vertica, MySQL.

NoSQL Databases: MongoDB, Cassandra, HBase, DynamoDB.

Testing: Hadoop MRUnit testing, Quality Center, Hive testing

Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2

Monitoring and Reporting tools: Ganglia, Nagios, Custom Shell scripts, Tableau.

ETL Tools: Informatica, Pentaho, Tableau, Talend.

Operating System: Ubuntu (Linux), Windows, RedHat

PROFESSIONAL EXPERIENCE

Confidential, Pittsburg, PA

Hadoop Developer

Responsibilities:

  • Worked with business analysts to help represent business domain details.
  • Gathered data from different nodes into a Greenplum database, then loaded it incrementally into HDFS using Sqoop.
  • Imported real-time data into Hadoop using Kafka and implemented the corresponding Oozie job.
  • Involved in loading data from the Linux file system to HDFS.
  • Wrote MapReduce jobs for text mining and worked with the predictive analysis team to check the output against requirements.
  • Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
  • Wrote Hive UDFs to meet requirements and to handle different schemas and XML data.
  • Evaluated usage of Oozie for Workflow Orchestration.
  • Evaluated the performance of Spark SQL, Impala, and Drill on offline data as part of a POC.
  • Used Pig as an ETL tool to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Wrote Hive and Pig scripts to join the raw data with the lookup data and to perform aggregations as per business requirements.
  • Converted unstructured data to structured data by writing Spark code.
  • Involved in writing Flume and Hive scripts to extract, transform, and load the data into the database.
  • Implemented Partitioning and bucketing in Hive based on the requirement.
  • Connected Tableau from the client end to AWS IP addresses to view the end results.
  • Developed Oozie coordinators and workflows to automate Hive, MapReduce, Pig, and other jobs.
  • Created test cases as part of enhancement rollouts and was involved in unit-level and integration-level testing.
  • Developed database management systems for easy access, storage, and retrieval of data.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Implemented AJAX, JSON, and JavaScript to create interactive web screens.
  • Extracted large volumes of data feed from different data sources, performed transformations and loaded the data into various Targets.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability.
  • Assisted application teams in installing Hadoop updates, operating system patches, and version upgrades when required.
  • Used Git for version control.
  • Hands-on experience in working with Snappy compression and different file formats.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hadoop 2.7.2, MapReduce, Cloudera Manager, HDFS, Hive 0.10, Pig 0.16, Sqoop 1.4.5, Spark 2.x, Oozie, Impala, Greenplum, Kafka, SQL, Java (JDK 1.6), Eclipse.

Confidential

Hadoop Developer

Responsibilities:

  • Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the Linux file system, servers, and Java web services using Kafka producers and partitions.
  • Worked on analyzing the Hadoop cluster and different Big Data analytics tools including MapReduce, Hive, and Spark.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used different compression techniques, such as LZO, Snappy, and Bzip, to save storage space and optimize data transfer over the network.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, ZooKeeper, and Spark.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Used Sqoop to store the data in HBase and Hive.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
  • Implemented POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
  • Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL.
  • Used Pig to parse the data and store it in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Implemented a script to transmit information from Oracle to HBase using Sqoop.
  • Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
  • Imported data from an Oracle 10.2 database into HDFS and exported it back using Sqoop.
  • Developed a working prototype for real-time data ingestion and processing using Kafka, Spark Streaming, and HBase.
  • Fine-tuned Pig queries for better performance.
  • Involved in writing shell scripts to export log files to the Hadoop cluster through an automated process.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.

Environment: Hadoop 2.6.3, MapReduce, HDFS, YARN, Sqoop 1.4.3, Oozie, Pig 0.11, Hive 3.x, HBase 0.98, Spark 2.x, Java, Eclipse, UNIX shell scripting, Python 3.5.1, Hortonworks.

Confidential, Sunnyvale, CA

Hadoop Developer

Responsibilities:

  • Worked closely with the business analysts to convert the Business Requirements into Technical Requirements and prepared low and high-level documentation.
  • Played a key role in requirements discussions and analysis of the entire system, along with estimation, development, and testing, keeping BI requirements in mind.
  • Involved in the development and installation of Sqoop, Hive, and FTP integrations to downstream systems.
  • Imported data from various sources into the Cassandra cluster using Sqoop.
  • Built a Sqoop job to import data in different formats, such as Avro and XML, obtained from different vendors.
  • Involved in writing MapReduce programs in Java to process the data from HBase tables.
  • Developed UNIX shell scripts for the business process and assimilation of data from different interfaces.
  • Developed Sqoop scripts to write the processed data into HBase tables, supporting the BI team's data visualization.
  • Set up an Oozie component to schedule jobs that run on a daily basis.
  • Involved in developing a pipeline to load the data into tables using Spark Streaming and Kafka, integrated with ZooKeeper.
  • Involved in developing Sqoop scripts that load the data from different interfaces to HDFS.
  • Developed Scala code for reading multiple data formats on HDFS.
  • Worked on debugging and performance tuning of MapReduce, Hive, and Sqoop jobs.
  • Involved in identifying different ways to optimize and improve the efficiency of the system.
  • Developed multiple POCs in Scala, deployed them on the cluster, and compared the performance of Spark with MapReduce.
  • Developed Spark code using Scala for generating Spark-RDD seeds for faster transformations.
  • Involved in creating and maintaining the technical documentation for the MapReduce, Hive, Sqoop, and UNIX jobs and the Hadoop clusters, and in reviewing it to fix post-production issues.

Environment: Red Hat Enterprise Linux, HBase, Solr, Kafka, MapReduce, Hive, Java SDK, Python, DB2, Sqoop, Spark, Scala, SBT, Akka, Maven, GitHub.
