
Hadoop Developer Resume


San Francisco, CA

SUMMARY

  • Over four years of professional IT experience in developing, implementing, configuring, and testing Hadoop ecosystem components, and in maintaining various web-based applications using Java.
  • Experience in designing and implementing complete end-to-end Hadoop infrastructure using MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Oozie and Zookeeper.
  • Hands-on expertise in installing, configuring, and testing Hadoop ecosystem components.
  • Experience with Hadoop clusters on the major Hadoop distributions, Cloudera and Hortonworks (HDP).
  • Good exposure to the Apache Spark ecosystem, including Shark and Spark Streaming, using Scala and Python.
  • Good knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Experience in writing MapReduce programs on Hadoop for working with Big Data.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Experience in importing and exporting data using Sqoop from relational database systems to HDFS and vice-versa (a Sqoop sketch follows this summary).
  • Experience collecting and aggregating large amounts of log data using Apache Flume and storing it in HDFS for further analysis.
  • Experience with job/workflow scheduling and monitoring tools such as Oozie.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Experience in different layers of Hadoop Framework - Storage (HDFS), Analysis (Pig and Hive), Engineering (Jobs and Workflows).
  • Good understanding of NoSQL databases and hands on experience with HBase.
  • Experienced in loading datasets into Hive for ETL (Extract, Transform and Load) operations.
  • Working knowledge of SQL, PL/SQL, Stored Procedures, Functions, Packages, DB Triggers, Indexes, and SQL*Loader.
  • Experience in Amazon AWS cloud services (EC2, EBS, S3).
  • Experience in creating web-based applications using Java, J2EE, JSP and Servlets.
  • Experienced in using development environments and editors such as Eclipse, NetBeans, Kate and gedit.
  • Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
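
A minimal Sqoop import/export sketch along the lines described above; the connection string, table names, and HDFS paths are placeholders, not actual project values:

    # Import a table from a relational database into HDFS and register it in Hive
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /user/hadoop/staging/orders \
      --num-mappers 4 \
      --hive-import --hive-table analytics.orders

    # Export aggregated results from HDFS back to the relational database
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table order_summary \
      --export-dir /user/hadoop/output/order_summary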

TECHNICAL SKILLS

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Zookeeper, Spark, Shark.

Hadoop platforms: Cloudera, Hortonworks

Languages: Java, Scala

Web Technologies: SOAP, REST

Scripting Languages: UNIX Shell Script, Korn Shell (ksh)

RDBMS: MS SQL Server, MySQL, Oracle

NoSQL Technologies: HBase, MongoDB, Cassandra

Tools & Utilities: Eclipse, Visual Studio, NetBeans, SVN, GitHub, Maven

Operating Systems: Windows 7/8, Vista, Windows XP, Linux (Ubuntu, Red Hat)

PROFESSIONAL EXPERIENCE

Confidential, San Francisco, CA

Hadoop Developer

Responsibilities:

  • Involved in analysis, design, system architecture, process interface design, and design documentation.
  • Loaded data from multiple sources (SQL Server, DB2, and Oracle) into HDFS using Sqoop and loaded it into Hive tables.
  • Developed various Big Data workflows using custom MapReduce, Pig, Hive and Sqoop.
  • Created Oozie workflows and coordinator jobs for recurrent triggering of Hadoop jobs such as Java MapReduce, Pig, Hive and Sqoop, as well as system-specific jobs (such as Java programs and shell scripts), based on time (frequency) and data availability (a coordinator sketch appears at the end of this section).
  • Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform it, and insert it into HBase (see the sketch after this list).
  • Used Spark as a fast, general-purpose processing engine compatible with Hadoop data.
  • Used Spark to design and perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
  • Set up a Hadoop cluster on AWS, including configuration of the various Hadoop components.
  • Analyzed large data sets by running Hive queries, and Pig scripts.
  • Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, JOIN, SPLIT, and aggregate functions.
  • Introduced cascading jobs to make data analysis more efficient, per the requirements.
  • Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in their Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Installed and configured Hadoop and HDFS, and applied MapReduce framework jobs in Java for data processing.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on analyzing the Hadoop cluster and various big data analytic tools, including Pig, HBase, NoSQL databases and Sqoop.
  • Extracted data from MongoDB through Sqoop, placed it in HDFS, and processed it.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Used Flume to ingest application server logs into HDFS.
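
A minimal sketch of the kind of Spark Streaming job referenced above, assuming Spark 1.x with the spark-streaming-kafka integration and the HBase 1.x client API; the broker address, topic, table, and column names are illustrative placeholders:

    import kafka.serializer.StringDecoder
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object TransactionStream {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("TransactionStream"), Seconds(10))
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("transactions"))

        stream.map { case (_, line) => line.split(",") }      // simple CSV transformation
          .foreachRDD { rdd =>
            rdd.foreachPartition { records =>
              // open one HBase connection per partition
              val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
              val table = conn.getTable(TableName.valueOf("transactions"))
              records.foreach { fields =>
                val put = new Put(Bytes.toBytes(fields(0)))   // first field as the row key
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes(fields(1)))
                table.put(put)
              }
              table.close()
              conn.close()
            }
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }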

Environment: Hadoop, HDFS, Sqoop, Hive, Pig, MapReduce, Spark, Scala, Kafka, AWS, HBase, MongoDB, Cassandra, NoSQL, Flume and Windows.
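
For the time- and data-driven Oozie workflows mentioned above, a minimal coordinator sketch; the application name, schedule, and paths are placeholders:

    <coordinator-app name="daily-ingest" frequency="${coord:days(1)}"
                     start="2016-01-01T00:00Z" end="2017-01-01T00:00Z" timezone="UTC"
                     xmlns="uri:oozie:coordinator:0.4">
      <!-- Triggers the ingest workflow once a day; a dataset/input-events block
           can be added to make the trigger data-driven as well. -->
      <action>
        <workflow>
          <app-path>${nameNode}/user/hadoop/apps/ingest-wf</app-path>
        </workflow>
      </action>
    </coordinator-app>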

Confidential, Overland Park, KS

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Developed simple to complex Map/Reduce jobs using Hive and Pig.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Performed processing on large sets of structured, unstructured and semi structured data.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Used Pig UDFs to implement business logic in Hadoop.
  • Implemented business logic by writing UDFs in Java and using various existing UDFs (a Hive UDF sketch follows this list).
  • Responsible for migrating workloads from Hadoop MapReduce to the Spark framework, using in-memory distributed computing for real-time fraud detection.
  • Used Spark to cache data in memory.
  • Implemented batch processing of data sources using Apache Spark.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig and HiveQL.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Provided cluster coordination services through ZooKeeper.
  • As part of a POC, set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
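
A minimal sketch of a Hive UDF written in Java, as referenced above; the package, class, and masking logic are illustrative, not the actual business logic:

    package com.example.hive.udf;                       // hypothetical package

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class MaskAccount extends UDF {
        // Masks all but the last four characters of an account number.
        public Text evaluate(Text account) {
            if (account == null) {
                return null;
            }
            String value = account.toString();
            int keep = Math.min(4, value.length());
            StringBuilder masked = new StringBuilder();
            for (int i = 0; i < value.length() - keep; i++) {
                masked.append('*');
            }
            masked.append(value.substring(value.length() - keep));
            return new Text(masked.toString());
        }
    }

Such a UDF would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION mask_account AS 'com.example.hive.udf.MaskAccount' before being used in queries.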

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Apache Sqoop, Spark, Oozie, HBase, AWS, PL/SQL, MySQL and Windows.

Confidential, Denver, CO

Hadoop Developer

Responsibilities:

  • Developed big data analytic models for detecting customer fraud transaction patterns using Hive on customer transaction data, including transaction sequence analysis with and without gaps and network analysis between common customers for the top fraud patterns.
  • Developed customer transaction event path tree extraction model using Hive from customer transaction data.
  • Enhanced and optimized the customer path tree GUI viewer to incrementally load the tree data from HBase, a NoSQL database.
  • Used the Prefuse open-source Java framework for the GUI.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig and HiveQL.
  • Created Hive tables to store data and wrote Hive queries.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Exported the patterns analyzed back to Teradata using Sqoop.
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS (see the sketch after this list).
  • Designed and implemented Map/Reduce jobs to support distributed data processing.
  • Processed large data sets utilizing the Hadoop cluster.
  • Designed NoSQL schemas in HBase.
  • Developed MapReduce ETL in Java/Pig.
  • Performed extensive data validation using Hive.
  • Imported and exported data using Sqoop between HDFS and relational database systems.
  • Involved in weekly walkthroughs and inspection meetings, to verify the status of the testing efforts and the project as a whole.
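
A minimal Pig Latin sketch of the kind of web-server-log extraction described above; the input path, delimiter, and field names are assumptions:

    -- Load raw web server output, keep successful requests, and count hits per URL
    logs   = LOAD '/data/weblogs/2016-05-*' USING PigStorage(',')
                 AS (ts:chararray, user_id:chararray, url:chararray, status:int);
    valid  = FILTER logs BY status == 200;
    by_url = GROUP valid BY url;
    hits   = FOREACH by_url GENERATE group AS url, COUNT(valid) AS hit_count;
    sorted = ORDER hits BY hit_count DESC;
    STORE sorted INTO '/user/hadoop/output/url_hits' USING PigStorage(',');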

Environment: Hadoop Map Reduce, Pig Latin, Zookeeper, Oozie, Sqoop, Java, Hive, Hbase, UNIX Shell Scripting.

Confidential, Boca Raton, FL

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Hive, Spark, Sqoop, Flume and Oozie.
  • Involved in importing and exporting data (SQL Server, XML, CSV and text files) between local and external file systems, RDBMS, and HDFS.
  • Worked extensively with Hive DDLs and Hive Query Language (HQL).
  • Designed a data warehouse using Hive, created and managed Hive tables in Hadoop.
  • Implemented SCD (Slowly Changing Dimension) concepts in Hive.
  • Involved in ETL data cleansing, integration and transformation using Hive and PySpark.
  • Responsible for managing data from disparate sources.
  • Tuned performance in Hive using partitioning, bucketing, indexes and parallelism (see the HiveQL sketch at the end of this section).
  • Avoided MapReduce by using PySpark, boosting performance by roughly 3x.
  • Solved performance issues in Hive by understanding how joins, groupings and aggregations translate into MapReduce jobs.
  • Worked with RDD and DataFrame techniques in PySpark to process data at a faster rate (a PySpark sketch follows this list).
  • Loaded stream data into HDFS using Flume, Kafka and Spark Streaming.
  • Imported data from critical applications to HDFS for data analysis.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the R&D team.
  • Designed and developed jobs that handle the initial load and incremental loads automatically using Oozie workflows.
  • Worked on setting up Hadoop in a pseudo-distributed environment.
  • Set up a Spark notebook on the Ubuntu operating system.
  • Involved in unit testing activities and test data preparation for various business requirements.
  • Worked on upgrades in the AWS environment along with the admin team and performed regression testing.
  • Replaced the existing data analysis tool with Hadoop.
  • Sound working knowledge of HBase and NoSQL DB concepts.
  • Moved between agile and waterfall approaches depending on project specifics and client goals, creating detailed project road maps, plans, schedules and work breakdown structures.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
  • Used Microsoft Team Foundation Server (TFS) for project tracking, bug tracking and project management.
  • Involved in Scrum calls, grooming sessions and demo meetings.
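
A minimal PySpark sketch of the DataFrame-based cleansing and transformation described above, assuming Spark 1.6 with a HiveContext; the database, table, and column names are placeholders:

    from pyspark import SparkContext
    from pyspark.sql import HiveContext
    from pyspark.sql import functions as F

    sc = SparkContext(appName="CustomerCleanse")
    hc = HiveContext(sc)

    # Read the landed data from a staging Hive table
    raw = hc.table("staging.customers")

    # De-duplicate on the business key, drop rows without a key,
    # and normalize the email column
    clean = (raw
             .dropDuplicates(["customer_id"])
             .filter(F.col("customer_id").isNotNull())
             .withColumn("email", F.lower(F.trim(F.col("email")))))

    # Write the curated result back to Hive
    clean.write.mode("overwrite").saveAsTable("curated.customers")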

Environment: Hadoop, Hive, Spark, Sqoop, Flume, HDFS, MapReduce, Kafka, Ubuntu, AWS, HBase, NoSQL, TFS and Windows.
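
A minimal HiveQL sketch of the partitioning and bucketing approach mentioned above; the database, table, columns, and bucket count are placeholders:

    SET hive.exec.dynamic.partition.mode = nonstrict;
    SET hive.enforce.bucketing = true;

    -- Partition by date and bucket by customer to speed up joins and sampling
    CREATE TABLE IF NOT EXISTS curated.orders (
      order_id    BIGINT,
      customer_id BIGINT,
      amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC;

    INSERT OVERWRITE TABLE curated.orders PARTITION (order_date)
    SELECT order_id, customer_id, amount, order_date
    FROM staging.orders;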
