Hadoop Developer Resume
Philadelphia, PA
SUMMARY
- 7+ years of professional IT experience as a Hadoop developer and Java developer.
- Extensive experience working with Big Data technologies: Hadoop, Spark, Pig, Hive, HBase, Sqoop, Flume, and Kafka.
- Good knowledge of Hadoop architecture and its components, including HDFS, the MapReduce programming paradigm, JobTracker, TaskTracker, NameNode, and DataNode.
- Experience in creating and maintaining large data pipelines using Kafka and Akka to handle terabytes of data.
- Experience in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, JSON, and Avro.
- Good experience working with NoSQL databases Cassandra and HBase.
- Experience in installing, configuring, managing, supporting, and monitoring AWS EMR, Cloudera (CDH5), MapR, and Hortonworks distributions.
- Experience developing Scala applications for loading/streaming data from NoSQL databases (HBase) to HDFS.
- Good experience with the Mapper, Reducer, Combiner, and Partitioner stages and the shuffle-and-sort process, including custom partitioners for efficient bucketing.
- Extensive experience in extending Hive and Pig core functionality by writing UDFs.
- Good experience in Java application development with the Struts, Spring, and Hibernate frameworks.
- Experience working with Apache NiFi to automate the data movement between different Hadoop systems.
- Experience in creating RDDs in Spark and applying transformations and actions.
- Experience in developing and consuming web services using REST and SOAP protocols.
- Experience in performance-tuning Hive queries, MapReduce jobs, and Spark jobs.
- Experience in moving data between HDFS and relational database systems using Sqoop.
- Experience in working with Flume to load the log data from different sources into HDFS.
- Experience in using ZooKeeper to coordinate servers in a cluster and maintain data consistency.
- Experience in building both time-driven and data-driven automated workflows with Oozie, orchestrated from Python.
- Good experience working with Spark and improving the performance of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a brief sketch follows this list).
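A minimal sketch of the pair-RDD and Spark SQL work described above. This is illustrative only, not code from any listed project: the input path, field layout, and the Spark 2.x SparkSession API are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object RddAndSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical weblog input; all transformations below are lazy.
    val lines = sc.textFile("hdfs:///data/weblogs/*.log")

    // Pair RDD of (HTTP status, 1), then reduceByKey to aggregate counts.
    val statusCounts = lines
      .filter(_.nonEmpty)
      .map(_.split("\\s+"))
      .filter(_.length > 8)
      .map(f => (f(8), 1L))
      .reduceByKey(_ + _)

    // Actions trigger execution: take the ten most frequent statuses.
    statusCounts.sortBy(-_._2).take(10).foreach(println)

    // The same aggregation expressed through Spark SQL on a DataFrame.
    import spark.implicits._
    statusCounts.toDF("status", "count").createOrReplaceTempView("status_counts")
    spark.sql("SELECT status, count FROM status_counts ORDER BY count DESC LIMIT 10").show()

    spark.stop()
  }
}
```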
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, YARN, Pig, Hive, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Kafka
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks and MapR.
Tools: Talend, Informatica, Eclipse
Programming Languages: Java, SQL, Python, C#, PHP, Scala
Web Technologies: HTML, CSS, JavaScript
Operating System: Windows, Unix, Linux
Databases: SQL Server, Oracle, DB2, MySQL
NoSQL Databases: Cassandra, HBase
PROFESSIONAL EXPERIENCE
Confidential, Philadelphia, PA
Hadoop Developer
Responsibilities:
- Developed Kafka producers and consumers, Cassandra clients, and Spark applications integrated with HDFS and Hive (a producer sketch follows this list).
- Populated HDFS and HBase with large volumes of data and ingested data into the Spark engine using Apache Kafka.
- Created RDDs and applied transformations and actions.
- Managed and Scheduled Spark Jobs on a Hadoop cluster using Oozie.
- Optimized Hive queries by running Hive on the Spark execution engine.
- Created and maintained data pipelines using Kafka and Akka to handle terabytes of data.
- Integrated Hadoop cluster with Spark engine to perform Batch and GraphX operations.
- Wrote Sqoop scripts to import data from different data sources to Cassandra.
- Used HUE for running Hive queries and created partitions using Hive to improve performance.
- Performed cleansing operations using Apache NiFi flow topologies before moving data into HDFS.
- Worked on creating ETL jobs using Talend.
- Installed and configured a Hadoop cluster on AWS, working with the EMR and EC2 web services for fast and efficient data processing.
- Developed batch scripts to fetch data from AWS S3 and perform the required transformations in Scala using the Spark framework.
- Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
- Responsible for maintaining and expanding the AWS cloud infrastructure.
- Wrote Python and shell scripts for various deployment and automation processes.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them there.
- Developed data pipelines to ingest data into HDFS using Flume, Sqoop and Pig.
- Performance-tuned Hive queries, MapReduce jobs, and Spark jobs.
- Moved data from RDBMS sources into Hive dynamic-partition tables using Sqoop.
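A minimal Kafka producer sketch in Scala, as referenced in the first bullet above. The broker address, topic name, key, and payload are placeholder assumptions, not project details.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker address and topic below are illustrative placeholders.
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all") // wait for acknowledgment from all in-sync replicas

    val producer = new KafkaProducer[String, String](props)
    try {
      // Keyed records so related events land in the same partition.
      producer.send(new ProducerRecord[String, String]("events", "user-42", "page_view"))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```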
Environment: Apache Spark, Kafka, Cassandra, Flume, YARN, Sqoop, Oozie, Hive, Pig, Java, Hadoop (Cloudera CDH 5.4/5.5), Linux, XML, Eclipse, MySQL.
Confidential, Omaha, NE
Hadoop Developer
Responsibilities:
- Used Sqoop and Java APIs to import data into Cassandra from different relational databases.
- Created tables in Cassandra and loaded large data sets of structured, semi-structured and unstructured data from various data sources.
- Developed MapReduce jobs in Java for cleaning and preprocessing data.
- Wrote Python scripts for wrapper and utility automation.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Configured Hive, Pig, Impala, Sqoop, Flume, and Oozie on Cloudera.
- Automated data movement between different Hadoop systems using Apache NiFi.
- Wrote MapReduce programs in Python using the Hadoop Streaming API.
- Created Hive tables, loaded them with data, and wrote Hive queries.
- Migrated ETL processes from SQL Server to Hadoop, using Pig for data manipulation.
- Developed Spark jobs in Scala in the test environment and used Spark SQL for querying.
- Imported data from Oracle tables into HDFS and HBase tables using Sqoop.
- Wrote scripts to load data into Spark RDDs and perform in-memory computations.
- Wrote a Spark Streaming job that consumes topics from Kafka and periodically pushes micro-batches of data to Spark for near-real-time processing (see the sketch after this list).
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala.
- Worked with Elasticsearch technologies and created custom Solr query components.
- Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
- Worked with different data sources such as Oracle, Netezza, MySQL, and flat files.
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
- Worked with Flume to load the log data from different sources into HDFS.
- Developed Talend jobs to move inbound files to HDFS file location based on monthly, weekly, daily and hourly partitioning.
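A sketch of the Spark Streaming consumer referenced above, written against the Spark 1.x direct-stream Kafka API of that era. The broker address, topic name, and batch interval are assumptions for illustration.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaToSparkSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-streaming-sketch")
    // Micro-batches every 10 seconds; the interval is an illustrative choice.
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("weblogs")) // placeholder topic

    // Count the non-empty messages in each micro-batch and print the result.
    stream.map(_._2)
      .filter(_.nonEmpty)
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```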
Environment: Cloudera, MapReduce, Spark SQL, Spark Streaming, Pig, Hive, Flume, Hue, Oozie, Java, Eclipse, ZooKeeper, Cassandra, HBase, Talend, GitHub.
Confidential
Java/Hadoop Developer
Responsibilities:
- Developed JSPs, JSF pages, and servlets to dynamically generate HTML and display data on the client side.
- Used the Hibernate framework for persistence against an Oracle database.
- Wrote and debugged Ant scripts for building the web application.
- Developed web services in Java and used WSDL to publish the services to another application.
- Wrote SQL commands and stored procedures to retrieve data from the Oracle database, and plugged these procedures into Java classes.
- Worked on developing UI using HTML, CSS and JavaScript.
- Involved in writing PL/SQL stored procedures, functions, triggers, and sequences.
- Implemented Java Message Service (JMS) messaging using the JMS API.
- Worked on managing and reviewing Hadoop log files.
- Installed and configured Hadoop, YARN, MapReduce, Flume, and HDFS, and developed MapReduce jobs in Java for data cleaning (a map-only sketch follows this list).
- Coded servlets, SOAP clients, and Apache CXF REST APIs to deliver data from the application to internal and external consumers.
- Ran Hadoop jobs on the Cloudera distribution.
- Wrote Hadoop jobs to analyze data using MapReduce, Hive, Pig, Solr, and Splunk.
- Created a SOAP web service using JAX-WS to enable clients to consume it.
- Moved data between HDFS and relational database systems (RDBMS) using Sqoop.
- Designed and developed multi-tier scalable applications using Java and J2EE design patterns.
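A sketch of a map-only data-cleaning job like the one referenced above. The original jobs were written in Java; this is shown in Scala for consistency with the other sketches, and the comma-delimited record format and field count are assumptions.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Map-only cleaning job: drops malformed records and trims whitespace.
class CleanMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  private val expectedFields = 5 // illustrative record width

  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split(",").map(_.trim)
    // Keep only records with the expected number of non-empty fields.
    if (fields.length == expectedFields && fields.forall(_.nonEmpty))
      ctx.write(NullWritable.get, new Text(fields.mkString(",")))
  }
}

object CleanJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "data-cleaning-sketch")
    job.setJarByClass(classOf[CleanMapper])
    job.setMapperClass(classOf[CleanMapper])
    job.setNumReduceTasks(0) // map-only: mapper output is the job output
    job.setOutputKeyClass(classOf[NullWritable])
    job.setOutputValueClass(classOf[Text])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```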
Environment: MapR, Java, HTML, JavaScript, SQL Server, PL/SQL, JSP, Spring, Hibernate, Web Services, SOAP, SOA, JSF, JMS, JUnit, Oracle, Eclipse, SVN, XML, CSS, Log4j, Ant, Apache Tomcat.