
Hadoop Developer Resume


Detroit, MI

PROFESSIONAL SUMMARY:

  • Over five years of professional experience in the IT industry, developing, implementing, configuring, and testing Hadoop ecosystem components and maintaining various web-based applications.
  • Experience designing and implementing complete end-to-end Hadoop infrastructure using MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Oozie, and Zookeeper.
  • Expert hands-on experience installing, configuring, and testing Hadoop ecosystem components.
  • Experience with Hadoop clusters using the major Hadoop distributions: Cloudera and Hortonworks (HDP).
  • Good exposure to the Apache Spark ecosystem, including Shark and Spark Streaming, using Scala and Python.
  • Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Experience writing MapReduce programs in Hadoop to work with big data.
  • Experience in analyzing data using Hive QL, Pig Latin and custom MapReduce programs.
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa
  • Experience collecting and aggregating large amounts of log data using Apache Flume and storing it in HDFS for further analysis.
  • Experience monitoring Hadoop clusters using tools such as Nagios, Ganglia, Ambari, and Cloudera Manager.
  • Experience with job/workflow scheduling and monitoring tools such as Oozie.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Experience in different layers of Hadoop Framework - Storage (HDFS), Analysis (Pig and Hive), Engineering (Jobs and Workflows).
  • Good understanding of NoSQL databases and hands on experience with HBase.
  • Experienced in loading datasets into Hive for ETL (Extract, Transform, and Load) operations.
  • Working knowledge of SQL, PL/SQL, stored procedures, functions, packages, DB triggers, indexes, and SQL*Loader.
  • Experience in Amazon AWS cloud services (EC2, EBS, S3).
  • Experienced in using development environments and editors such as Eclipse, NetBeans, Kate, and gedit.
  • Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Zookeeper, Spark, Shark.

Hadoop platforms: Cloudera, HortonWorks

Languages: Java, Scala

Web Technologies: SOAP, REST

Scripting Languages: UNIX Shell Script, Korn Shell

RDBMS: MS SQL Server, MySQL, Oracle

NoSQL Technologies: HBase, MongoDB, Cassandra

Tools & Utilities: Eclipse, Visual Studio, NetBeans, SVN, GitHub, Maven

Operating Systems: Windows 7/8, Vista, Windows XP, Linux (Ubuntu, Red Hat)

PROFESSIONAL EXPERIENCE:

Confidential, Detroit, MI

Hadoop Developer

Responsibilities:

  • Involved in Analysis, Design, System architectural design, Process interfaces design, design documentation.
  • Loaded data from multiple data sources (SQL, DB2, and Oracle) into HDFS using Sqoop and loaded it into Hive tables.
  • Developed various Big Data workflows using custom MapReduce, Pig, Hive and Sqoop.
  • Creating Oozie workflows and coordinator jobs for recurrent triggering of Hadoop jobs such as Pig, Hive, Sqoop as well as system specific jobs by time (frequency) and data availability.
  • Installing, Upgrading and Managing Hadoop Cluster on HortonWorks.
  • Worked on evaluating, architecting, installation/setup of HortonWorks 2.1/1.8 Big Data ecosystem which includes Hadoop, Pig, Hive, Sqoop etc.
  • Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform the data, and insert it into HBase.
  • Used Spark as a fast, general-purpose processing engine compatible with Hadoop data.
  • Used Spark to design and perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
  • Setup of HADOOP Cluster on AWS, which includes configuring different components of HADOOP.
  • Analyzed large data sets by running Hive queries, and Pig scripts.
  • Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, JOIN, and SPLIT, along with aggregate functions.
  • Introduced cascading jobs to make data analysis more efficient, per requirements.
  • Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in their Hive queries.
  • Developed simple-to-complex MapReduce jobs using Hive and Pig.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Applied MapReduce framework jobs for data processing by installing and configuring Hadoop and HDFS.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase, NoSQL database and Sqoop.
  • Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Used Flume to ingest the application server logs into HDFS.
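
The time- and data-driven Oozie scheduling described above can be illustrated with a minimal coordinator definition; the app name, dataset, paths, and frequency below are hypothetical:

```xml
<!-- Hypothetical coordinator-app: triggers a workflow daily, once the input dataset lands -->
<coordinator-app name="daily-hive-load" frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2016-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="raw_logs" frequency="${coord:days(1)}"
             initial-instance="2016-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode/data/raw/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="raw_logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://namenode/apps/oozie/hive-load-wf</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The `frequency` attribute handles the time-driven trigger, while the `input-events`/`data-in` element holds the run until the day's dataset instance is available, covering the data-driven case.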

Environment: Hadoop, HDFS, HortonWorks, Sqoop, Hive, Pig, MapReduce, Spark, Scala, Kafka, AWS, HBase, MongoDB, Cassandra, NoSQL, Flume and Windows.

Confidential, Philadelphia, PA

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Developed Simple to complex Map/Reduce Jobs using Hive and Pig.
  • Contributed to building hands-on tutorials for the community to learn how to set up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow (powered by NiFi).
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Performed processing on large sets of structured, unstructured and semi structured data.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from Oracle into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Used Pig UDF's to implement business logic in Hadoop.
  • Replaced MapReduce with PySpark, boosting performance roughly 3x.
  • Used RDD and DataFrame techniques in PySpark to process data at a faster rate.
  • Involved in ETL Data Cleansing, Integration and Transformation using Hive and PySpark.
  • Involved in setting up Spark notebook on Ubuntu Operating system.
  • Responsible for migrating from Hadoop to Spark frameworks, using in-memory distributed computing for real-time fraud detection.
  • Used Spark to cache data in memory.
  • Implemented batch processing of data sources using Apache Spark.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig and HiveQL.
  • Developed the Pig UDF’S to pre-process the data for analysis.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Provided cluster coordination services through ZooKeeper.
  • As part of a POC, set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hadoop, MapReduce, HortonWorks, HDFS, Hive, SQL, Cloudera Manager, Pig, Apache Sqoop, Spark, Oozie, HBase, AWS, PL/SQL, MySQL and Windows.

Confidential, Birmingham, AL.

Hadoop Developer

Responsibilities:

  • Developed big data analytic models for customer fraud transaction pattern detection using Hive on customer transaction data, including transaction sequence analysis with and without gaps and network analysis between common customers for the top fraud patterns.
  • Developed customer transaction event path tree extraction model using Hive from customer transaction data.
  • Enhanced and optimized the customer path tree GUI viewer to incrementally load the tree data from HBase, NoSQL database.
  • Developed the Pig UDF’S to pre-process the data for analysis.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig and HiveQL.
  • Created Hive tables to store data and wrote Hive queries.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Exported the patterns analyzed back to Teradata using Sqoop.
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Designed and implemented Map/Reduce jobs to support distributed data processing.
  • Processed large data sets utilizing the Hadoop cluster.
  • Designed NoSQL schemas in HBase.
  • Developed MapReduce ETL in Pig.
  • Performed extensive data validation using Hive.
  • Imported and exported data using Sqoop between HDFS and relational database systems.
  • Involved in weekly walkthroughs and inspection meetings, to verify the status of the testing efforts and the project as a whole.
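
The "gaps and no gaps" transaction sequence analysis mentioned above can be sketched in plain Python, purely for illustration (the event and pattern names are hypothetical; the production logic described in this role lived in Hive):

```python
def matches_pattern(events, pattern, allow_gaps):
    """Check whether `pattern` occurs in `events` as a sequence of steps.

    With allow_gaps=True, other events may occur between pattern steps
    (subsequence match); with allow_gaps=False, the pattern must appear
    as one contiguous run.
    """
    if allow_gaps:
        it = iter(events)
        # `step in it` consumes the iterator up to the match, so order is enforced
        return all(step in it for step in pattern)
    n = len(pattern)
    return any(events[i:i + n] == pattern for i in range(len(events) - n + 1))

# Hypothetical customer transaction sequence and fraud pattern
events = ["login", "balance_check", "transfer", "login", "transfer"]
pattern = ["balance_check", "login"]

print(matches_pattern(events, pattern, allow_gaps=True))   # True: "transfer" may sit between
print(matches_pattern(events, pattern, allow_gaps=False))  # False: never contiguous
```

The gap variant tolerates noise events between the pattern's steps, which is what makes it useful for spotting fraud sequences spread across a session.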

Environment: Hadoop MapReduce, Pig Latin, Zookeeper, Oozie, Sqoop, Hive, HBase, UNIX Shell Scripting.

Confidential, Plano. TX.

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Hive, Spark, Sqoop, Flume, and Oozie.
  • Involved in importing and exporting data (SQL Server, XML, csv and text file) from local and external file system and RDBMS to HDFS.
  • Worked extensively with Hive DDLs and Hive Query Language (HQL).
  • Designed a data warehouse using Hive; created and managed Hive tables in Hadoop.
  • Implemented SCD (Slowly Changing Dimension) concepts in Hive.
  • Responsible for managing data from disparate sources.
  • Solved performance issues in Hive through an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
  • Loaded stream data into HDFS using Flume, Kafka and Spark Streaming.
  • Imported data from critical applications to HDFS for data analysis.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the R&D team.
  • Designed and developed jobs that handle the initial load and the incremental load automatically using Oozie workflows.
  • Worked on setting up Hadoop in a pseudo-distributed environment.
  • Involved in unit testing activities and test data preparation for various business requirements.
  • Worked on the upgrades in AWS environment along with admin team and did the regression testing.
  • Replaced the existing data analysis tool with Hadoop.
  • Sound working knowledge of HBase and NoSQL DB concepts.
  • Moved between agile and waterfall approaches depending on project specifics and client goals, creating detailed project road maps, plans, schedules and work breakdown structures.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries.
  • Used MS Team Foundation Service for project tracking, bug tracking and project management.
  • Involved in Scrum calls, grooming sessions, and demo meetings.
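
The SCD (Slowly Changing Dimension) work mentioned above could take roughly the following shape in HiveQL; this is a Type 2 style sketch, and all table and column names are hypothetical:

```sql
-- Hypothetical SCD Type 2 rebuild: close out changed rows, append new versions.
-- Assumes dim_customer(customer_id, name, start_date, end_date, is_current)
-- and a staging table stg_customer(customer_id, name, load_date).
INSERT OVERWRITE TABLE dim_customer_new
SELECT d.customer_id, d.name, d.start_date,
       CASE WHEN s.customer_id IS NOT NULL AND s.name <> d.name
            THEN s.load_date ELSE d.end_date END AS end_date,
       CASE WHEN s.customer_id IS NOT NULL AND s.name <> d.name
            THEN false ELSE d.is_current END AS is_current
FROM dim_customer d
LEFT JOIN stg_customer s
  ON d.customer_id = s.customer_id AND d.is_current = true
UNION ALL
SELECT s.customer_id, s.name, s.load_date, NULL, true
FROM stg_customer s
LEFT JOIN dim_customer d
  ON s.customer_id = d.customer_id AND d.is_current = true
WHERE d.customer_id IS NULL OR s.name <> d.name;
-- The rebuilt table (dim_customer_new) is then swapped in for dim_customer,
-- since classic Hive lacks row-level updates.
```

The first branch expires current rows whose attributes changed; the second appends a fresh current version for new or changed customers, preserving full history.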

Environment: Hadoop, Hive, Sqoop, Flume, HDFS, MapReduce, Kafka, Ubuntu, AWS, HBase, NoSQL, TFS and Windows.
