Hadoop Developer Resume
Phoenix, AZ
SUMMARY:
- 5+ years of experience in Information Technology, including analysis, design, development and testing of complex applications.
- Strong working experience with Big Data and the Hadoop ecosystem, including HDFS, Pig, Hive, HBase, YARN, Sqoop, Flume, Oozie, Hue, MapReduce and Spark.
- Extensive experience in analyzing data using HiveQL, Pig Latin and MapReduce programs in Java.
- Implemented POCs on migrating to Spark Streaming to process live data.
- Experienced in Apache Spark for implementing advanced procedures such as text analytics and processing, using its in-memory computing capabilities and Scala.
- Hands-on with real-time data processing using distributed technologies such as Storm and Kafka.
- Used different Spark modules such as Spark Core, RDDs, DataFrames and Spark SQL.
- Converted various Hive queries into the required Spark transformations and actions.
- Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
- Worked on analyzing Hadoop clusters and various big data analytics tools, including the HBase database and Sqoop.
- Worked with data serialization formats for converting complex objects into sequences of bits, using Avro, Parquet, JSON and CSV.
- Good knowledge of Oracle databases and excellent skills in writing SQL queries and scripts.
- Experience in implementing Kerberos authentication protocol in Hadoop for data security.
- Worked with cloud services like Amazon Web Services (AWS) and involved in ETL, Data Integration and Migration.
- Experience in setting up clusters on Amazon EC2 and S3, including automating the setup and extension of clusters in the AWS cloud.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hive, MapReduce, Pig, Sqoop, Oozie, Flume, Kafka, YARN and Spark
Scripting Languages: Shell, Python
Programming Languages: Java, Scala, Python, SQL, C
Hadoop Distributions: Cloudera, Hortonworks, MapR
NoSQL databases: HBase, Cassandra
Tools: SVN, GitHub, Jenkins, Tableau
Operating systems: UNIX, LINUX, Mac OS and Windows
Databases: Oracle, SQL Server, MySQL.
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix AZ
Hadoop Developer
Responsibilities:
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Developed a Kafka consumer component in Java for near real-time data processing (a sketch follows this section).
- Worked on loading and transforming large sets of structured data using Spark Streaming.
- Loaded data into Spark RDDs and DataFrames and performed in-memory computation for faster response times.
- Took part in designing and developing a custom Java application to pull data from source systems and publish the results to a specific Kafka topic.
- Developed Spark jobs using Java and Spark SQL to migrate SQL procedures.
- Created Kafka connectors to pull data from the database and publish it to Kafka.
- Imported real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
- Migrated historical ingestion data from the data warehouse into Hive on the big data platform.
- Involved in loading data in different formats from the Unix file system into HDFS.
- Worked with different data sources such as XML files, JSON files, SQL Server and DB2 to load data into Hive tables and HDFS.
- Created external and staging tables, joined tables as per requirements, and built multiple data pipelines.
- Worked on performance tuning of Hive and Spark jobs.
- Used HiveQL for data analysis and to import structured data into specified tables for reporting.
- Monitored and tracked the issues within the team using JIRA.
- Developed the application following Agile methodology.
Environment: Hadoop, Java, Linux, MapR, SQL, Kafka, Hive, Spark, Oracle, DB2, Netezza, Oozie, Informatica, JIRA, Rally.
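A minimal sketch of the kind of Java Kafka consumer described above; the broker address, consumer group, topic name and downstream handler are illustrative placeholders, not the actual project code.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TransactionConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");        // placeholder broker
            props.put("group.id", "near-realtime-processors");     // placeholder group id
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("source-events")); // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Hand each message to the downstream transformation / load step
                        process(record.value());
                    }
                }
            }
        }

        private static void process(String message) {
            // Placeholder for the actual near real-time processing logic
            System.out.println(message);
        }
    }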
Confidential, New Jersey NJ
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Loaded data into Spark RDDs and performed in-memory computation for faster response times.
- Developed Spark jobs and Hive Jobs to transform data.
- Developed Spark scripts in Python, writing custom RDD transformations and performing actions on RDDs.
- Worked on the Oozie workflow engine for job scheduling; imported and exported data into HDFS and Hive using Sqoop.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of the data by date.
- Developed Kafka consumer component for Real-Time data processing in Java and Scala.
- Used Impala to query Hive tables for faster query response times.
- Imported real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
- Created Partitioned and Bucketed Hive tables in Parquet and Avro File Formats with Snappy compression and then loaded data.
- Wrote Hive queries using Spark SQL, which integrates with the Spark environment.
- Developed MapReduce programs to parse the raw JSON data and store the refined data in tables.
- Used Kafka to load data into HDFS and move data into HBase.
- Captured the data logs from web server into HDFS using Flume for analysis.
- Worked on moving data pipelines from CDH cluster to run on AWS EMR.
- Involved in moving data from HDFS to AWS Simple Storage Service (S3) and extensively worked with S3 bucket in AWS.
- Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, and used Spark to extract the schema of the JSON files (a sketch follows this section).
- Responsible for migrating the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift.
Environment: Linux, Hadoop, Python, Scala, CDH, SQL, Sqoop, HBase, Hive, Spark, Oozie, Cloudera Manager, Oracle, Windows, YARN, Spring, Sentry, AWS, S3.
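A minimal sketch (in Java, for consistency with the other examples here) of the kind of Spark job described above: reading JSON from S3, letting Spark extract the schema, filtering, and writing partitioned output to HDFS. The bucket, paths, predicate and partition column are hypothetical placeholders.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class JsonToHdfs {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("json-filter")            // placeholder app name
                    .getOrCreate();

            // Read raw JSON from S3; Spark infers (extracts) the schema automatically
            Dataset<Row> raw = spark.read().json("s3a://source-bucket/events/");  // placeholder path
            raw.printSchema();

            // Filter the records of interest and write them to HDFS, partitioned by date
            raw.filter("status = 'ACTIVE'")                                       // placeholder predicate
               .write()
               .partitionBy("event_date")                                         // placeholder column
               .parquet("hdfs:///data/refined/events/");                          // placeholder path

            spark.stop();
        }
    }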
Confidential, Richardson, TX
Hadoop Developer
Responsibilities:
- Involved in the complete software development life cycle (SDLC) to develop the application.
- Loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Generated Java APIs for retrieval and analysis on the NoSQL Cassandra database.
- Helped with the sizing and performance tuning of the Cassandra cluster.
- Developed Hive queries to process the data and generate the results in a tabular format.
- Handled importing of data from multiple data sources using Sqoop, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Worked on extracting data from CSV and JSON files and storing it in Avro and Parquet formats.
- Implemented partitioning and bucketing concepts in Hive and designed both managed and external tables.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications to be implemented in the project.
- Loaded and transformed large sets of structured and semi-structured data using Hive.
- Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job for daily imports.
- Worked on creating Kafka topics and partitions and writing custom partitioner classes.
- Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala (a sketch follows this section).
- Used Spark-Streaming APIs to perform necessary transformations and actions on the data.
- Monitored and controlled local file system disk space usage and log files, cleaning log files with automated scripts.
- Involved in writing Oozie jobs for workflow automation.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
Environment: Unix, Linux, Hortonworks, Scala, HDFS, MapReduce, Hive, Flume, Sqoop, Ganglia, Ambari, Oracle, Ranger, Python, Apache Hadoop, Cassandra.
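A minimal sketch of expressing a HiveQL query as Spark transformations, of the kind described above. It is written in Java with the DataFrame API for consistency with the other examples here (the project itself used RDDs, Python and Scala); the table, columns and HiveQL statement are hypothetical placeholders.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.sum;

    public class HiveToSpark {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("hive-to-spark")     // placeholder app name
                    .enableHiveSupport()
                    .getOrCreate();

            // Original HiveQL (hypothetical):
            //   SELECT region, SUM(amount) FROM sales WHERE year = 2016 GROUP BY region
            // expressed as DataFrame transformations over the same Hive table
            Dataset<Row> sales = spark.table("sales");   // placeholder table
            Dataset<Row> byRegion = sales
                    .filter(col("year").equalTo(2016))
                    .groupBy("region")
                    .agg(sum("amount").alias("total_amount"));

            byRegion.show();
            spark.stop();
        }
    }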
Confidential
JAVA Developer
Responsibilities:
- Performed Requirement Gathering & Analysis by actively soliciting, analyzing and negotiating customer requirements and prepared the requirements specification document for the application using Microsoft Word.
- Developed Use Case diagrams, business flow diagrams, Activity/State diagrams.
- Developed presentation layer using Java Server Faces (JSF) MVC framework.
- Used JSP, HTML and CSS, jQuery as view components in MVC.
- Developed custom controllers for handling requests using Spring MVC.
- Used JDBC to invoke stored procedures and for database connectivity to the SQL database.
- Deployed the applications on WebLogic Application Server.
- Developed RESTful web services using JSON.
- Created and managed Spring Boot microservices that create, update, delete and retrieve data (a sketch follows this section).
- Used Oracle database for tables creation and involved in writing SQL queries using Joins and Stored Procedures.
- Developed JUnit test cases for unit testing the code.
- Worked with configuration management groups for providing various deployment environments set up including System Integration testing, Quality Control testing etc.
Environment: Java/J2EE, SQL, Oracle, JSP, JSON, JavaScript, WebLogic, HTML, JDBC, Spring, Hibernate, XML, JMS, Log4j, JUnit, Servlets, MVC, Eclipse.
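A minimal sketch of a CRUD-style Spring Boot microservice in the spirit described above; the Customer resource, the /customers paths and the in-memory store (standing in for the Oracle-backed persistence) are illustrative placeholders, not the actual project API.

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.web.bind.annotation.*;

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    @SpringBootApplication
    @RestController
    @RequestMapping("/customers")
    public class CustomerServiceApplication {

        // In-memory placeholder store; a real service would use JDBC/Hibernate against Oracle
        private final Map<Long, String> store = new ConcurrentHashMap<>();

        public static void main(String[] args) {
            SpringApplication.run(CustomerServiceApplication.class, args);
        }

        @PostMapping("/{id}")
        public String create(@PathVariable Long id, @RequestBody String name) {
            store.put(id, name);
            return "created";
        }

        @GetMapping("/{id}")
        public String get(@PathVariable Long id) {
            return store.get(id);
        }

        @PutMapping("/{id}")
        public String update(@PathVariable Long id, @RequestBody String name) {
            store.put(id, name);
            return "updated";
        }

        @DeleteMapping("/{id}")
        public String delete(@PathVariable Long id) {
            store.remove(id);
            return "deleted";
        }
    }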