Hadoop Developer Resume

Austin, TX

SUMMARY:

  • Technically accomplished professional with 8+ years of total experience in software development and requirement analysis in an Agile work environment, and 5 years of experience with the Big Data ecosystem in ingestion, storage, querying, processing and analysis of Big Data.
  • In-depth knowledge and hands-on experience in dealing with Apache Hadoop components like HDFS, MapReduce, HiveQL, HBase, Pig, Hive, Sqoop, Oozie, Cassandra, Flume, and Spark.
  • Very good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Extensively worked on MRv1 and MRv2 Hadoop architectures.
  • Hands-on experience in writing MapReduce programs and Pig & Hive scripts.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
  • Knowledge of RabbitMQ.
  • Knowledge of the Hortonworks distribution.
  • Loaded streaming log data from various webservers into HDFS using Flume.
  • Experience in building Pig scripts to extract, transform and load data onto HDFS for processing.
  • Excellent knowledge of data mapping and of extracting, transforming and loading data from different data sources.
  • Experience in writing HiveQL queries to store processed data into Hive tables for analysis.
  • Excellent understanding and knowledge of NoSQL databases like HBase and Cassandra.
  • Expertise in database design, creation and management of schemas, writing Stored Procedures, Functions, DDL, DML, SQL queries & Modeling.
  • 4 years of IT experience in ETL architecture, development, enhancement, maintenance, production support, data modeling, data profiling and reporting, including business and system requirement gathering.
  • 2 years of hands-on experience in shell scripting.
  • Knowledge of the cloud services Amazon Web Services (AWS) and Azure.
  • Knowledge of Elastic MapReduce (EMR).
  • Proficient in using RDBMS concepts with Oracle, SQL Server and MySQL.
  • Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Worked extensively with CDH3 and CDH4.
  • Skilled in leadership, self-motivated, and able to work effectively in a team.
  • Possess excellent communication and analytical skills along with a can-do attitude.
  • Strong work ethic with a desire to succeed and make significant contributions to the organization.

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Oozie, Zookeeper, Apache Kafka, Cassandra, StreamSets, Impyla, Solr

Programming Languages: Java … (JDK 5/JDK 6), C, HTML, SQL, PL/SQL, Python, Scala

Client Technologies: jQuery, JavaScript, AJAX, CSS, HTML 5, XHTML, D3, AngularJS

Operating Systems: UNIX, Windows, Linux

Application Servers: IBM WebSphere, Tomcat, WebLogic

Web Technologies: JSP, Servlets, JNDI, JDBC, Java Beans, JavaScript, Web Services (JAX-WS)

Databases: Oracle 8i/9i/10g & MySQL 4.x/5.x

Java IDE: Eclipse 3.x, IBM WebSphere Application Developer, IBM RAD 7.0

WORK EXPERIENCE:

Confidential, Austin, TX

Hadoop Developer

Responsibilities:

  • Developed PySpark code to read data from Hive, group the fields and generate XML files.
  • Enhanced the PySpark code to write the generated XML files to a directory and zip them into CDAs.
  • Implemented a REST call to submit the generated CDAs to the vendor website.
  • Implemented Impyla to support JDBC/ODBC connections to HiveServer2.
  • Enhanced the PySpark code to replace Spark with Impyla.
  • Installed Impyla on the edge node.
  • Evaluated the performance of the Spark application by testing in cluster deployment mode vs. local mode.
  • Tested submissions to the vendor website using test OIDs.
  • Explored StreamSets Data Collector.
  • Implemented the StreamSets Data Collector tool for ingestion into Hadoop.
  • Created a StreamSets pipeline to parse files in XML format and convert them to a format that is fed to Solr.
  • Built a data validation dashboard in Solr to display the message records.
  • Wrote a shell script to run a Sqoop job for bulk data ingestion from Oracle into Hive.
  • Created tables for the ingested data in Hive.
  • Scheduled an Oozie job to run the Sqoop data ingestion job.
  • Worked with the JSON file format for StreamSets.
  • Implemented a POC that created S3 buckets and managed their policies, and used S3 and Glacier for storage and backup on AWS.
  • Transferred data from AWS S3 to AWS Redshift using the Informatica tool.
  • Leveraged the AWS platform for Big Data analytics: MapReduce algorithms, ad hoc Hive querying, and Sqoop for pulling structured data from SQL Server into the Hive warehouse.

Environment: Sqoop, StreamSets, Impyla, Pyspark, Solr, Oozie, Hive, Impala, Informatica, AWS
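
The group-then-generate-XML-then-zip step listed in the responsibilities above can be illustrated with a minimal sketch in plain Python. This is not the PySpark code from the project: it uses only the standard library, and the field names (`group_id`, `code`, `value`), the element names (`document`, `record`) and both helper functions are hypothetical, chosen only for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict


def rows_to_xml_docs(rows, key):
    """Group flat rows by `key` and build one XML string per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    docs = {}
    for group_id, members in groups.items():
        # Hypothetical layout: one <document> per group, one <record> per row.
        root = ET.Element("document", {key: str(group_id)})
        for member in members:
            record = ET.SubElement(root, "record")
            for field, value in member.items():
                if field == key:
                    continue  # the grouping key is already an attribute
                ET.SubElement(record, field).text = str(value)
        docs[f"{group_id}.xml"] = ET.tostring(root, encoding="unicode")
    return docs


def zip_docs(docs):
    """Pack the generated XML strings into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml_text in docs.items():
            archive.writestr(name, xml_text)
    return buffer.getvalue()


# Hypothetical sample rows standing in for the result of a Hive query.
rows = [
    {"group_id": "g1", "code": "A10", "value": 7},
    {"group_id": "g1", "code": "B20", "value": 3},
    {"group_id": "g2", "code": "A10", "value": 9},
]
docs = rows_to_xml_docs(rows, "group_id")
archive_bytes = zip_docs(docs)
```

In a Spark job the grouping would typically be done by the engine rather than in a Python dictionary; the sketch only shows the shape of the transformation.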

Confidential, Westlake, TX

Hadoop Developer

Responsibilities:

  • Evaluated Spark's performance vs Impala on transactional data.
  • Used Spark transformations and aggregations using Python and Scala to perform min, max and average on transactional data.
  • Migrated HiveQL queries to Spark SQL using Scala.
  • Knowledge of loading data into Spark DataFrames.
  • Knowledge of running Hive queries through Spark SQL, which integrates with the Spark environment.
  • Used Java to develop a RESTful API for a database utility project.
  • Responsible for performing extensive data validation using Hive.
  • Designed a data model in Cassandra (POC) for storing server performance data.
  • Implemented a data service as a REST API project to retrieve server utilization data from this Cassandra table.
  • Implemented a Python script to call the Cassandra REST API, perform transformations and load the data into Hive.
  • Designed data model to ingest transactional data with and without URIs into Cassandra.
  • Implemented a shell script that calls a Python script to compute min, max and average on utilization data for thousands of hosts, and compared the performance at various levels of summarization.
  • Involved in creating Oozie workflow and Coordinator jobs for Hive jobs to kick off the jobs on time for data availability.
  • Generated reports from this Hive table for visualization purposes.
  • Migrated HiveQL to Spark SQL to compare Spark's performance with Hive's.
  • Implemented a proof of concept for DynamoDB, Redshift and EMR.
  • Proactively researched Microsoft Azure.
  • Presented a demo on Microsoft Azure, with an overview of cloud computing on Azure.

Environment: Hadoop, Azure, AWS, HDFS, Hive, Hue, Oozie, Java, Linux, Cassandra, Python, OpenTSDB, Scala
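
The min, max and average computation on per-host utilization data at different levels of summarization, mentioned in the responsibilities above, can be sketched roughly as follows. This is a plain-Python illustration under assumed inputs, not the original script: the `(host, epoch_seconds, value)` sample layout and the `summarize` function are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize(samples, bucket_seconds):
    """Compute min, max and average per host and time bucket.

    samples: iterable of (host, epoch_seconds, value) tuples (assumed layout).
    bucket_seconds: width of the summarization window, e.g. 60 or 3600.
    """
    buckets = defaultdict(list)
    for host, timestamp, value in samples:
        # Align each timestamp to the start of its bucket.
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[(host, bucket_start)].append(value)

    return {
        key: {"min": min(values), "max": max(values), "avg": mean(values)}
        for key, values in buckets.items()
    }


# Hypothetical utilization samples for two hosts.
samples = [
    ("h1", 0, 10.0),
    ("h1", 30, 20.0),
    ("h1", 60, 30.0),
    ("h2", 0, 5.0),
]
per_minute = summarize(samples, 60)
per_hour = summarize(samples, 3600)
```

Changing `bucket_seconds` is what moves between levels of summarization: the same samples collapse into fewer, coarser buckets as the window grows.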

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Analyzed and wrote Hadoop MapReduce jobs using the Java API, Pig and Hive.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper, & Spark.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Used Pig to store the data into HBase.
  • Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
  • Used Pig to parse the data and store it in Avro format.
  • Stored the data in tabular formats using Hive tables and Hive SerDes.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
  • Implemented a script to transfer data from Oracle to HBase using Sqoop.
  • Worked on tuning the performance of Pig queries.
  • Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
  • Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.

Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Cloudera, Snappy, Zookeeper, NoSQL, HBase, Shell Scripting, Ubuntu, Red Hat Linux.

Confidential

Hadoop Developer

Responsibilities:

  • Worked on writing transformer/mapping Map-Reduce pipelines using Java.
  • Handled structured and unstructured data and applied ETL processes.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop.
  • Designed and implemented Incremental Imports into Hive tables.
  • Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
  • Extensively used Pig for data cleansing.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Worked extensively with Sqoop for importing data from relational database systems into HDFS and exporting it from HDFS back to them.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Facilitated the production move-ups of ETL components from the Acceptance to the Production environment.
  • Experienced in managing and reviewing the Hadoop log files.
  • Migrated ETL jobs to Pig scripts to perform transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Worked with Avro Data Serialization system to work with JSON data formats.
  • Worked on different file formats like Sequence files, XML files and Map files using MapReduce programs.
  • Developed scripts to automate end-to-end data management and synchronization between all the clusters.
  • Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.

Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, Cloudera Hadoop Distribution, PL/SQL, Windows, and UNIX Shell Scripting

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple MapReduce programs in Java for data analysis.
  • Wrote MapReduce jobs using Pig Latin and the Java API.
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Developed Pig scripts for analyzing large data sets in HDFS.
  • Collected the logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
  • Designed and presented a plan for a POC on Impala.
  • Migrated HiveQL queries to Impala to minimize query response time.
  • Knowledge of running Hive queries through Spark SQL, which integrates with the Spark environment.
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Worked on Sequence files, RC files, map-side joins, bucketing and partitioning for Hive performance enhancement and storage improvement.
  • Implemented daily jobs that automate parallel tasks of loading the data into HDFS using AutoSys and Oozie coordinator jobs.
  • Streamed data into Apache Ignite by setting up a cache for efficient data analysis.
  • Responsible for performing extensive data validation using Hive.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Used Kafka to load data into HDFS and move data into NoSQL databases (Cassandra).
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Involved in submitting and tracking MapReduce jobs using JobTracker.
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Used Pig as an ETL tool to perform transformations, event joins, filtering and some pre-aggregations.
  • Responsible for cleansing the data from source systems using Ab Initio components such as Join, Dedup Sorted, Denormalize, Normalize, Reformat, Filter-by-Expression and Rollup.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Implemented Hive generic UDFs to implement business logic.
  • Implemented test scripts to support test driven development and continuous integration.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Teradata, Zookeeper, AutoSys, HBase, Cassandra, Apache Ignite

Confidential, New York, NY

Hadoop Developer

Responsibilities:

  • Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
  • Developed a data pipeline using Flume and Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Used Pig to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into HBase.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with Zookeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
  • Designed and built the reporting application, which uses Spark SQL to fetch and generate reports on HBase table data.
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Involved in running MapReduce jobs for processing millions of records.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries and Pig scripts to analyze large datasets.
  • Involved in importing and exporting the data from RDBMS to HDFS and vice versa using Sqoop.
  • Involved in generating ad hoc reports using Pig and Hive queries.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Provided operational support for Hadoop and/or MySQL databases.
  • Developed job flows in Oozie to automate the workflow for Pig and Hive jobs.
  • Loaded the aggregated data onto Oracle from Hadoop environment using Sqoop for reporting on the dashboard.

Environment: RedHat Linux, HDFS, Map-Reduce, Hive, Java JDK1.6, Pig, Sqoop, Flume, Zookeeper, Oozie, Oracle, HBase.

Confidential

Java Developer

Responsibilities:

  • Involved in analysis, design, implementation and bug-fixing activities.
  • Involved in reviewing functional and technical specification documents.
  • Created and configured domains in production, development and testing environments using configuration wizard.
  • Involved in creating and configuring the clusters in production environment and deploying the applications on clusters.
  • Deployed and tested the application using Tomcat web server.
  • Analyzed the specifications provided by the clients.
  • Involved in the design of the application.
  • Worked from functional requirements and design documents.
  • Developed use case diagrams, class diagrams, sequence diagrams and data flow diagrams.
  • Coordinated with other functional consultants.
  • Performed web development with JSP, AJAX, HTML, XML, XSLT, and CSS.
  • Created and enhanced stored procedures, PL/SQL and SQL for Oracle 9i RDBMS.
  • Designed and implemented a generic parser framework using a SAX parser to parse XML documents that store SQL.
  • Deployed the application on WebLogic Application Server 9.0.
  • Extensively used UNIX shell scripting and FTP to pull logs from the server.
  • Provided ongoing maintenance and support, which involved working with the client and solving their problems, including major bug fixes.

Environment: Java 1.4, WebLogic Server 9.0, Oracle 10g, Web services Monitoring, Web Drive, UNIX/Linux, JavaScript, HTML, CSS, XML
