Hadoop Developer Resume
Lancaster, PA
PROFESSIONAL SUMMARY:
- Around 6 years of experience as a Hadoop Developer in the design, development, deployment and support of large-scale distributed systems.
- Expertise in Hadoop ecosystem components such as HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Flume and Hive for scalability, distributed computing and high-performance computing.
- Experience in designing, developing and implementing connectivity products that enable efficient data exchange between our core database engine and the Hadoop ecosystem.
- In-depth knowledge of Hadoop architecture and its components, such as the JobTracker, TaskTracker, NameNode, DataNode and ResourceManager, as well as the MapReduce framework.
- Experience in analyzing data using HiveQL, HBase and custom MapReduce programs in Java.
- Experience in administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig in distributed mode.
- Experience in the installation, configuration, management and deployment of Big Data solutions and the underlying Hadoop cluster infrastructure.
- Experience in integrating Hive queries into Spark environment using Spark SQL.
- Hands-on experience using ETL tools such as Talend to extract data from external repositories and internal sources.
- Strong knowledge of cloud platforms such as Azure and AWS (EC2 instances).
- Good knowledge of Hadoop daemons, including the YARN ResourceManager and NodeManager and the HDFS NameNode and DataNode.
- Involved in setting up standards and processes for Hadoop-based application design and implementation.
- Worked with relational database systems (RDBMS) such as MySQL and Oracle, and NoSQL database systems such as HBase.
- Experience in administration, installation, configuration, troubleshooting, security, backup, performance monitoring and fine-tuning of Red Hat Linux.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
TECHNICAL SKILLS:
Hadoop Technologies: HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, Zookeeper, Ambari, Hue, Spark, Storm, Talend, Ganglia
Operating System: Windows, Linux
Languages: Java, J2EE, SQL, PL/SQL, Shell Script
Project Management / Tools: MS Project, MS Office, TFS, HP Quality Center Tool
Front End: HTML, JSTL, DHTML, JavaScript, CSS, XML, XSL, XSLT
Databases: MySQL, Oracle 11g/10g/9i, SQL Server
NoSQL Databases: HBase, Cassandra
File System: HDFS
Reporting Tools: Jasper Reports, Tableau
IDE Tools: Eclipse, NetBeans
Application Server: Apache Tomcat, WebLogic
PROFESSIONAL EXPERIENCE:
Confidential, Lancaster, PA
Hadoop Developer
Responsibilities:
- Analyzed the Hadoop cluster using various big data analytics and processing tools, including Pig, Hive, Spark and Spark Streaming.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and Spark on YARN.
- Evaluated business requirements and prepared detailed specifications, following project guidelines, for developing the required programs.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
- Developed Oozie coordinators to schedule Pig and Hive scripts to create data pipelines.
- Log data from web servers across environments was pushed into the associated Kafka topic partitions, and Spark SQL was used to calculate the most prevalent diseases in each city from this data.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Wrote shell scripts that run multiple Hive jobs to incrementally update Hive tables, which are used to generate Tableau reports for business users.
- Responsible for data extraction and integration from different data sources into the Hadoop data lake by creating ETL pipelines using Spark, MapReduce, Pig and Hive.
- Worked extensively on importing metadata into Hive, migrated existing tables and applications to Hive and the AWS cloud, and made the data available in Athena and Snowflake.
- Worked with Amazon Elastic MapReduce (EMR) and set up a Hadoop environment on AWS EC2 instances.
- Developed Spark programs in Scala, applying functional programming principles to process complex structured and unstructured data sets.
- Wrote Java functions for AWS Lambda to manage some of the AWS services.
- Used Kafka for publish-subscribe messaging as a distributed commit log, taking advantage of its speed, scalability and durability.
- Worked with different file formats, such as XML files, SequenceFiles, JSON, CSV and MapFiles, using MapReduce programs.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Utilized Agile Scrum methodology to help manage and organize a team of developers, with regular code review sessions.
Environment: HDFS, MapReduce, Hive, Sqoop, HBase, Oozie, Flume, Kafka, Zookeeper, SparkSQL, Spark DataFrames, PySpark, Scala, Amazon AWS S3, Java, JSON, SQL Scripting and Linux Shell Scripting.
Confidential, Brooklyn, NY
Hadoop Developer
Responsibilities:
- Worked closely with business analysts to convert business requirements into technical requirements and prepared low-level and high-level design documentation.
- Imported required tables from RDBMS to HDFS using Sqoop, and used Storm and Kafka to stream real-time data into HBase.
- Developed a data pipeline using Flume, Sqoop, Pig, Java MapReduce and Spark to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Developed and ran MapReduce jobs on multi-petabyte YARN/Hadoop clusters processing billions of events every day to generate daily and monthly reports as required by users.
- Developed Apache Spark applications using Scala and Python, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Created big data workflows with Oozie to ingest data from various sources into Hadoop; these workflows comprise heterogeneous jobs such as Hive, Sqoop and Python scripts.
- Used Spark SQL to read data from external sources and processed the data using Scala.
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
- Used Hive Query Language (HQL) to further analyze the data to identify issues and behavioral patterns.
- Wrote custom MapReduce code, packaged user-defined functions (UDFs) as JAR files and integrated them with Hive to help the analytics team with statistical analysis.
- Wrote multiple MapReduce programs for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed file formats.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
Environment: Apache Hadoop, HDFS, MapReduce, HBase, Hive, Yarn, Pig, Sqoop, Flume, Zookeeper, Kafka, Impala, SparkSQL, Spark Core, Spark Streaming, NoSQL, MySQL, ETL, WebLogic, Web Analytics, Shell Scripting, Ubuntu.
Confidential, Cincinnati, OH
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop ecosystem components such as the JobTracker, TaskTracker, NameNode and Secondary NameNode.
- Analyzed data on the Hadoop cluster using big data analytics tools, including Pig, Hive and MapReduce.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
- Optimized MapReduce code and Pig scripts, and performed performance tuning and analysis.
- Loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
- Involved in launching and setting up a Hadoop/HBase cluster, including configuring the various Hadoop and HBase components.
- Scheduled Oozie workflows to run multiple Hive and Pig jobs.
- Used Spark SQL to query data from Db2 and Oracle using the respective connectors.
- Exported business-required information to RDBMS using Sqoop, making the data available for the BI team to generate reports.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS. Used big data tools to load large volumes of source files from S3 into Redshift.
- Used the JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
- Performed troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, Scala, DataFrames and pair RDDs.
Environment: Hadoop, Java, MapReduce, AWS, HDFS, Redshift, Spark, Hive, Pig, Linux, XML, Eclipse, Distribution, DB2, SQL Server, Informatica, Oracle, SQL, Scala, Teradata, EC2, JSON, Elasticsearch, DynamoDB, Hortonworks, ETL.
