Hadoop Developer Resume

Philadelphia, PA

SUMMARY

  • 7 years of experience in IT, including Big Data technologies, the Hadoop ecosystem, data warehousing, and SQL-related technologies.
  • Extensive experience as a Hadoop Developer and Big Data Analyst. Primary technical skills in HDFS, MapReduce, YARN, Hive, Sqoop, HBase, CA7, Flume, Oozie, Zookeeper.
  • Working experience with Big Data and the Hadoop Distributed File System (HDFS). In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
  • Experience developing MapReduce programs on Apache Hadoop for Big Data analysis.
  • Hands on experience in working with Ecosystems like Hive, Sqoop, Spark, MapReduce, Flume, Oozie.
  • Knowledge of Scala language features: language fundamentals, classes, objects, traits, collections, case classes, higher-order functions, pattern matching, and extractors (a brief sketch follows this summary).
  • Experience creating internal and external Hive tables and implementing performance-improvement techniques such as partitioning and bucketing.
  • Developed Sqoop scripts to import data from RDBMS into Hive and HDFS and to export data from HDFS back to RDBMS.
  • Loaded data into HDFS from dynamically generated files and from relational database management systems using Sqoop.
  • Experience handling different file formats such as JSON, Avro, ORC, and Parquet.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie.
  • Experience with RDBMS databases such as MySQL and Oracle, and expertise in writing SQL and HiveQL queries.
  • Experience with scripting languages such as Bash and Unix shell scripting, and knowledge of Python.
  • Expertise in preparing test cases, documenting, and performing unit and integration testing.
  • Working experience with data ingestion tools such as Apache NiFi, as well as loading data into a common data lake using HiveQL.
  • Developed wrapper shell scripts to schedule HiveQL data-loading jobs as batch jobs.
  • Experience in various phases of Agile development, including requirements analysis, design, development, and unit testing.
  • Developed Sqoop scripts for importing large datasets from RDBMS to HDFS. Knowledge of creating UDFs in Java and registering them in Pig and Hive.
  • Experience with Spark Streaming and Apache Kafka for ingesting live streaming data.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Extensive experience in developing applications that perform data processing tasks against Teradata, Oracle, SQL Server, and MySQL databases.
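
A minimal sketch of the Scala features listed above (case classes, higher-order functions, pattern matching, and a custom extractor). All names such as Trade and Large are illustrative only, not taken from any project:

    // Case class: an immutable record with pattern-matching support built in.
    case class Trade(symbol: String, qty: Int, price: Double)

    // Custom extractor: matches trades whose notional value exceeds 1,000.
    object Large {
      def unapply(t: Trade): Option[Double] = {
        val notional = t.qty * t.price
        if (notional > 1000) Some(notional) else None
      }
    }

    object ScalaFeaturesDemo extends App {
      val trades = List(Trade("AAPL", 10, 150.0), Trade("MSFT", 2, 90.0))

      // Higher-order function: map takes a function as its argument.
      val notionals = trades.map(t => t.qty * t.price)

      // Pattern matching over the case class and the custom extractor.
      trades.foreach {
        case Large(value)       => println(s"large trade worth $value")
        case Trade(sym, qty, _) => println(s"small trade: $qty x $sym")
      }

      println(s"total notional: ${notionals.sum}")
    }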

TECHNICAL SKILLS

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala

Hadoop Distribution: Cloudera, Hortonworks, Apache, AWS

Languages: Java, SQL, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions

Web Technologies: HTML, CSS, JavaScript, XML, JSP, Restful, SOAP

Operating Systems: Windows (98/2000/XP/7/8/10), UNIX, Linux, Ubuntu, CentOS, Mac OS

Build Automation tools: SBT, Ant, Maven

Version Control: Git, GitHub, Bitbucket

IDE & Build Tools, Design: Eclipse, Visual Studio, NetBeans, JUnit, SQL Developer, MySQL Workbench

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL Database (HBase, MongoDB).

Cloud Technologies: MS Azure, Amazon Web Services (AWS), .NET Core and ASP.NET Core

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential - Philadelphia, PA

Responsibilities:

  • Involved in complete Big Data flow of the application starting from data ingestion from upstream to HDFS, processing and analyzing the data in HDFS.
  • Developed Spark API to import data into HDFS from Teradata and created Hive tables.
  • Developed Sqoop jobs to import data in Avro format from an Oracle database and created Hive tables on top of it.
  • Created partitioned and bucketed Hive tables in Parquet format with Snappy compression and loaded them from the Avro Hive tables (see the sketch at the end of this section).
  • Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
  • Involved in performance tuning of Hive from design, storage and query perspectives.
  • Developed a Flume ETL job to handle data from an HTTP source and sink it to HDFS.
  • Collected JSON data from the HTTP source and developed Spark APIs to perform inserts and updates in Hive tables.
  • Developed Spark scripts to import large files from Amazon S3 buckets.
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics.
  • Involved in designing and developing tables in HBase and storing aggregated data from Hive Table.
  • Integrated Hive and Tableau Desktop reports and published to Tableau Server.
  • Developed shell scripts for running Hive scripts in Hive and Impala.
  • Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera
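
As a rough illustration of the Parquet/Avro work above, the sketch below creates a Snappy-compressed, partitioned Parquet Hive table with Spark SQL and loads it from an Avro-backed staging table. Table and column names are hypothetical, and bucketing is omitted here for brevity:

    import org.apache.spark.sql.SparkSession

    object AvroToParquetLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("avro-to-parquet-load")
          .enableHiveSupport()
          .getOrCreate()

        // Allow dynamic partitions so the partition column comes from the SELECT.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Snappy-compressed, partitioned Parquet table (names are placeholders).
        spark.sql("""
          CREATE TABLE IF NOT EXISTS orders_parquet (
            order_id BIGINT,
            customer_id BIGINT,
            amount DOUBLE)
          PARTITIONED BY (load_date STRING)
          STORED AS PARQUET
          TBLPROPERTIES ('parquet.compression'='SNAPPY')
        """)

        // Load from a hypothetical Avro-backed staging table.
        spark.sql("""
          INSERT OVERWRITE TABLE orders_parquet PARTITION (load_date)
          SELECT order_id, customer_id, amount, load_date
          FROM orders_avro_stg
        """)

        spark.stop()
      }
    }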

Hadoop/Big Data Developer

Confidential - Charlotte, NC

Responsibilities:

  • Responsible for architecting Hadoop clusters with CDH3; involved in installing CDH3 and upgrading it to CDH4.
  • Worked on creating a keyspace in Cassandra for saving the Spark batch output.
  • Worked on a Spark application to compact small files in the Hive ecosystem into files close to the HDFS block size.
  • Managed migration of on-prem servers to AWS by creating golden images for upload and deployment.
  • Implemented real-time streaming ingestion using Kafka and Spark Streaming (see the sketch after this section).
  • Loaded data using Spark-streaming with Scala and Python
  • Involved in the requirements and design phases to implement a streaming Lambda architecture for real-time processing using Spark, Kafka, and Scala.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
  • Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
  • Developed a full-text search platform using NoSQL, Logstash, and Elasticsearch, allowing for much faster, more scalable, and more intuitive user searches.
  • Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
  • Worked on performance enhancement in Pig, Hive, and HBase on multiple nodes.
  • Worked with distributed n-tier and client/server architectures.
  • Supported MapReduce programs running on the cluster and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Developed MapReduce application using Hadoop, MapReduce programming and HBase
  • Evaluated usage of Oozie for Work Flow Orchestration and experienced in cluster coordination using Zookeeper
  • Developing ETL jobs with organization and project defined standards and processes
  • Experienced in enabling Kerberos authentication in ETL process
  • Implemented data access using Hibernate persistence framework
  • Designed the GUI using the Model-View-Controller architecture (Struts framework).
  • Integrated Spring DAO for data access using Hibernate and involved in the Development of Spring Framework Controller

Environment: Hadoop 2.X, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Java, J2EE, Eclipse, HQL.
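
A condensed sketch of the kind of Kafka plus Spark Streaming ingestion described above, using the DStream API from the spark-streaming-kafka-0-10 integration; the broker list, group id, topic, and HDFS path are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object StreamingIngest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-streaming-ingest")
        val ssc  = new StreamingContext(conf, Seconds(30))

        // Placeholder broker list, group id, and offset policy.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "ingest-group",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

        // Write each non-empty micro-batch to a landing directory in HDFS
        // for downstream Hive/Spark batch jobs.
        stream.map(_.value)
          .foreachRDD { rdd =>
            if (!rdd.isEmpty) rdd.saveAsTextFile(s"/data/landing/events/${System.currentTimeMillis}")
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }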

Sr. Hadoop/Spark Developer

Confidential - Charlotte, NC

Responsibilities:

  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and Map Reduce on EC2.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Worked with different source data file formats such as JSON, CSV, and TSV.
  • Imported data from sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded the data back into HDFS.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
  • Imported and exported data between environments such as MySQL and HDFS and deployed to production.
  • Worked on partitioning and bucketing of Hive tables and set tuning parameters to improve performance.
  • Involved in developing Impala scripts for ad-hoc queries.
  • Used Oozie workflow scheduler templates to manage various jobs such as Sqoop, MapReduce, Pig, Hive, and shell scripts.
  • Involved in importing and exporting data from HBase using Spark.
  • Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment (see the sketch after this section).
  • Actively participated in code reviews and meetings and resolved technical issues.

Environment: Apache Hadoop, AWS, EMR, EC2, S3, Hortonworks, MapReduce, Hive, Pig, Sqoop, Apache Spark, Zookeeper, HBase, Java, Oozie, Oracle, MySQL, Netezza, and UNIX Shell Scripting.
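
A simplified sketch of what a Hive-to-Spark migration PoC like the one mentioned above could look like: a HiveQL aggregation re-expressed with the DataFrame API and submitted on YARN. Table and column names are invented for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveToSparkPoc {
      def main(args: Array[String]): Unit = {
        // Typically launched with: spark-submit --master yarn --deploy-mode cluster ...
        val spark = SparkSession.builder()
          .appName("hive-to-spark-poc")
          .enableHiveSupport()
          .getOrCreate()

        // The original HiveQL GROUP BY, expressed with the DataFrame API.
        val daily = spark.table("clickstream_raw")
          .filter(col("event_type") === "purchase")
          .groupBy(col("event_date"), col("product_id"))
          .agg(count(lit(1)).as("orders"), sum(col("amount")).as("revenue"))

        // Persist the result as a Hive-managed table for downstream reporting.
        daily.write.mode("overwrite").saveAsTable("clickstream_daily_agg")

        spark.stop()
      }
    }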

Hadoop Developer

Confidential

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster
  • Importing and exporting data into HDFS and Hive using Sqoop
  • Experienced in defining job flows and managing and reviewing Hadoop log files
  • Load and transform large sets of structured, semi structured and unstructured data
  • Responsible for managing data coming from different sources and for implementing MongoDB to store and analyze unstructured data.
  • Supported MapReduce programs running on the cluster and loaded data from the UNIX file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs (see the sketch after this section).
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Provided cluster coordination services through Zookeeper.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management, and worked on templates and screens in HTML and JavaScript.

Environment: Hadoop, HDFS, MapReduce, Pig, Sqoop, UNIX, HBase, Java, JavaScript, HTML
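
Hive UDFs of the kind mentioned above are usually written in Java; a rough equivalent is shown here in Scala to stay consistent with the other sketches. The class, function, and column names are made up:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Old-style Hive UDF: normalizes a string column (trim + lower case).
    class NormalizeText extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase)
      }
    }

    // Registered and used from Hive roughly as:
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
    //   SELECT normalize_text(raw_url) FROM web_logs;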

SQL/Java Developer

Confidential

Responsibilities:

  • Imported data from MySQL and Oracle into HDFS using Sqoop.
  • Implemented a CDH3 Hadoop cluster on CentOS.
  • Worked on installing clusters, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Imported unstructured data into HDFS using Flume.
  • Wrote MapReduce Java programs to analyze log data for large-scale data sets.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Used the HBase Java API in a Java application (see the sketch after this section).
  • Automated jobs for extracting data from different data sources such as MySQL and pushing the result sets to HDFS.
  • Developed Pig Latin scripts to extract the data from the output files to load into HDFS.
  • Responsible for managing data from multiple sources.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Java-based MapReduce.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.

Environment: Hadoop 1.0.0, MapReduce, Hive, HBase, Flume, Sqoop, Pig, Zookeeper, Java, ETL, SQL, CentOS.
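
To illustrate the HBase Java API usage mentioned above, here is a small sketch written in Scala (for consistency with the other sketches) against the current HBase client API; the table, column family, row key, and values are placeholders, and hbase-site.xml is assumed to be on the classpath:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseClientDemo {
      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        val table = connection.getTable(TableName.valueOf("web_logs"))

        try {
          // Write one cell: row key -> columnFamily:qualifier = value.
          val put = new Put(Bytes.toBytes("row-2016-01-01-0001"))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("url"), Bytes.toBytes("/index.html"))
          table.put(put)

          // Read the cell back.
          val result = table.get(new Get(Bytes.toBytes("row-2016-01-01-0001")))
          val url = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("url")))
          println(s"url = $url")
        } finally {
          table.close()
          connection.close()
        }
      }
    }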
