Sr. Hadoop Developer Resume
Austin, TX
SUMMARY
- 8+ years of professional experience in software development positions, building core and enterprise software using Big Data, Java/J2EE and open source technologies for the Banking, Insurance, Health Care and Communications sectors.
- Hadoop Developer with 4+ years of experience designing and implementing complete end-to-end Hadoop-based data analytics solutions using HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, etc.
- Strong knowledge of distributed systems architecture and parallel processing, with an in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
- Profound understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables.
- Experience in handling messaging services using Apache Kafka.
- Strong experience developing data transformation and other analytical applications in Spark and Spark SQL using the Scala programming language.
- Solid experience working with SQL and NoSQL databases, including Oracle, PostgreSQL, MySQL, HBase and Cassandra.
- Experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster using Cloudera and Hortonworks distributions.
- Extensive experience in importing/exporting data from/to RDBMS into Hadoop Ecosystem using Apache Sqoop.
- Developed, deployed and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
- Hands-on experience in writing MapReduce programs and user-defined functions (UDFs) for Hive & Pig.
- Very strong industry experience in Apache Hive for data transformations.
- Strong experience fine-tuning Hive queries for better performance.
- Experience in performing Query Analytics on Structured data using HiveQL, joins, aggregate functions.
- Extensive experience in migrating ETL operations into HDFS systems using Pig Scripts.
- Experienced in using the Java API and REST API to handle real-time analytics on HBase data.
- Knowledge of handling Kafka clusters; created several topologies to support real-time processing requirements.
- Experienced with scripting languages such as Python and shell scripting.
- Experience in writing build scripts using Maven, ANT and Gradle.
- Experienced in preparing and executing Unit Test Plans and Unit Test Cases using JUnit, EasyMock and MRUnit.
- Ability to work in a team and coordinate/resolve issues with a team of developers and other stakeholders.
- Fast learner, excellent team player, data driven approach for problem solving and decision-making.
- Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real-time streaming at an enterprise level.
- Experience in working with the Java HBase API for ingesting processed data into HBase tables.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.
- Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.
- Good experience in working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
- Proficient in visualizing data using Tableau, QlikView, MicroStrategy and MS Excel.
- Experience in developing ETL scripts for data acquisition and transformation using Informatica and Talend.
- Excellent global exposure to various work cultures and client interaction with diverse teams.
TECHNICAL SKILLS
Hadoop/NoSQL: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Hadoop, ZooKeeper, Apache Crunch, Spark, Storm, Scala, Kafka, HBase, Cassandra
Programming Languages: Java, C, SQL, PL/SQL
IDE Tools: Eclipse, RAD
Frameworks: Hibernate, Spring, Struts, JMS, EJB, Junit, MRUnit
Web Technologies: HTML5, CSS3, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML, JSF
Web Services: SOAP, REST, WSDL, JAXB, and JAXP
Operating Systems: Windows, UNIX, LINUX, Ubuntu, CentOS
Application Servers: JBoss, Tomcat, WebLogic, WebSphere
Databases: Oracle, MySQL, DB2, Derby, PostgreSQL
Reporting Tools: Jasper Reports, iReport
PROFESSIONAL EXPERIENCE
Confidential, Austin, TX
Sr. Hadoop Developer
Responsibilities:
- Created end-to-end Spark applications to perform various data transformation activities.
- Created a series of ingestion jobs using Sqoop, Kafka, custom input adapters, etc. to move data from multiple sources to HDFS.
- Developed simple to complex MapReduce jobs in Java for processing and validating the data.
- Developed data pipelines using Sqoop, Spark, MapReduce and Hive to ingest, transform and analyze customer behavioral data.
- Developed Spark jobs to discover trends in data usage by users.
- Implemented Spark applications in Scala, utilizing the DataFrame and Spark SQL APIs for faster data processing.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
- Performed real-time streaming of data using Spark with Kafka (a minimal sketch of this pattern follows this list).
- Handled importing data from different data sources into HDFS using Sqoop and performed transformations using Hive.
- Exported the analyzed data to the relational databases using Sqoop to further visualize and generate reports for the BI team.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Analyzed the data by performing Hive queries (Hive QL) and running Pig scripts (Pig Latin) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Scheduled and executed workflows in Oozie to run Hive and Spark jobs.
- Configured Kafka to read and write messages from external programs.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Performed data validation on the data ingested using Map Reduce by building a custom model to filter all the invalid data and cleanse the data.
- Experience with data wrangling and creating workable datasets.
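The Spark-with-Kafka streaming pattern referenced above could look like the following minimal sketch in Scala; the broker address, topic name and HDFS paths are placeholder assumptions, not actual project values.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object EventIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("event-ingest")
          .getOrCreate()
        import spark.implicits._

        // Read raw events from Kafka (placeholder broker and topic).
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "user-events")
          .load()
          .selectExpr("CAST(value AS STRING) AS raw")

        // Basic cleansing: drop empty records before landing the data in HDFS.
        val cleaned = events.filter(length(trim($"raw")) > 0)

        cleaned.writeStream
          .format("parquet")
          .option("path", "/data/events/raw")                  // placeholder HDFS path
          .option("checkpointLocation", "/checkpoints/events") // placeholder path
          .start()
          .awaitTermination()
      }
    }

Writing the cleansed stream as Parquet with a checkpoint location keeps the job restartable and leaves the landed data queryable from Hive or Spark SQL downstream.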
Environment: Hadoop, Spark, MapReduce, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, Kafka, Flume, Solr, Tez, Impala, Mahout, Cassandra, Cloudera Manager, MySQL, Jaspersoft, multi-node cluster with Linux (Ubuntu), Windows, Unix.
Confidential, Somerset, NJ
Hadoop Developer
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Worked on automating delta feeds from Teradata using Sqoop and from FTP servers to Hive.
- Implemented Hive tables and HQL queries for the reports (an illustrative query sketch follows this list); wrote and used complex data types in Hive.
- Developed Hive queries to analyze reducer output data.
- Designed workflow by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
- Created various Spark applications using Scala to enrich clickstream data with enterprise user data.
- Responsible for creating end-to-end pipelines using Sqoop, Hive, MapReduce and other ecosystem tools.
- Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Designed and developed Map Reduce jobs to process data coming in different file formats like XML, CSV, JSON.
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
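As an illustration of the reporting queries described above, here is a minimal Spark SQL sketch run against a Hive table; the table, column and target schema names are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object DailyLogReport {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("daily-log-report")
          .enableHiveSupport()
          .getOrCreate()

        // Aggregate daily page views from a date-partitioned Hive table (hypothetical schema).
        val daily = spark.sql(
          """SELECT event_dt, page, COUNT(*) AS views
            |FROM web_logs
            |GROUP BY event_dt, page""".stripMargin)

        // Persist the result for the reporting layer (hypothetical target table).
        daily.write.mode("overwrite").saveAsTable("reports.daily_page_views")
      }
    }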
Environment: Hadoop, Hive, HQL, HDFS, MapReduce, Sqoop, Flume, Oozie, Python, Java, Maven, Eclipse, Putty, Cloudera Manager 4 and CDH 4.
Confidential, St Louis, MO
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in java for data cleaning and preprocessing.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Map Reduce, Pig, Hive and Sqoop.
- Responsible for Installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Extensively used Pig for data extraction and loading of data.
- Involved in importing and exporting data from the MySQL database into HDFS using Sqoop.
- Performed batch indexing with MapReduce on HDFS as a pre-processing step.
- Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Implemented optimized map-side joins in MapReduce to handle different data sets.
- Implemented custom Writables, comparators and custom InputFormats in MapReduce to handle business-specific formats.
- Experienced in handling Avro data files in MapReduce programs using Avro data serialization system.
- Experienced in applying different MapReduce design patterns to handle specific problems in MapReduce programs.
- Experienced in handling the small-files problem using SequenceFiles and CombineFileInputFormat.
- Used different compression techniques like LZO, Snappy and GZip formats to make optimum utilization of the network bandwidth.
- Filtered, transformed and combined data from multiple providers using custom Pig UDFs.
- Implemented various requirements using Pig scripts as part of migrating ETL applications.
- Responsible for managing data coming from different sources.
- Experienced in working with different Hive SerDes to handle file formats such as Avro and XML.
- Loaded data from different data sources into HDFS using Sqoop and loaded it into partitioned Hive tables.
- Installed and configured Hive and implemented various Hive UDFs to validate data against business rules before it was moved into Hive tables (a minimal UDF sketch follows this list).
- Created tables, static partitions and dynamic partitions in Hive.
- Good experience with Hive Query Language (HiveQL), Hive security and debugging Hive issues.
- Developed HQL queries to implement select, insert, update and delete operations on the database by creating HQL named queries.
- Implemented an API tool to handle streaming data using Flume.
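A minimal sketch of the kind of Hive UDF mentioned above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the validation rule (a 10-digit account number) is an illustrative assumption, not the actual business rule.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Returns true when the input looks like a 10-digit account number.
    class ValidAccountUDF extends UDF {
      def evaluate(value: Text): Boolean =
        value != null && value.toString.matches("\\d{10}")
    }

Such a UDF would be packaged in a JAR, registered with ADD JAR and CREATE TEMPORARY FUNCTION, and then applied in a validation query before loading the data into the target Hive tables.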
Environment: Hadoop, Cloudera (CDH 4), HDFS, Hive, Flume, Sqoop, Pig, Scala, Java, Eclipse, Teradata, MySQL, Ubuntu, UNIX, and Maven.
Confidential, NY
Java Hadoop Developer
Responsibilities:
- Involved in designing low-level design documents for functional and non-functional requirements.
- Developed MapReduce programs in Java and used Sqoop to import data from the Oracle database.
- Involved in writing complex MapReduce programs that work with different file formats like Text, Sequence, XML and Avro.
- Solid understanding of the REST architectural style and its application to well-performing web sites for global usage.
- Experienced in using the HBase REST API to perform CRUD operations on HBase data.
- Experienced in creating data models and implementing queries to handle time-series data in HBase (a minimal sketch follows this list).
- Integrated HBase with MapReduce to move bulk amounts of data into HBase.
- Used Flume to collect log data from different sources and transfer it to Hive tables, using different SerDes to store it in JSON, XML and SequenceFile formats.
- Imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
- Developed multiple MapReduce jobs in Pig and Python for data cleaning and processing.
- Developed Pig scripts and UDFs extensively for Value added Processing (VAPS).
- Designed and developed a custom Avro storage function for use in Pig Latin to load and store data.
- Installed and configured Hive and implemented various business requirements by writing Hive UDFs.
- Developed Pig scripts to convert the data from Avro to Text file format.
- Worked on partitioning the Hive table and running the scripts in parallel to reduce the run time of the scripts.
- Written Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked with them using HiveQL.
- Developed scripts and batch jobs to schedule various Hadoop programs using Oozie.
- Imported and exported data between the Oracle database and HDFS using Sqoop.
- Involved in preparing documents such as the Functional Specification and Deployment Instructions.
- Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects and identified the source of defects.
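The HBase time-series modeling mentioned above could look like the following minimal sketch, written in Scala against the HBase Java client API to keep the examples in one language; the table name, column family and row-key layout are illustrative assumptions, not the production schema.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object EventWriter {
      def main(args: Array[String]): Unit = {
        val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = connection.getTable(TableName.valueOf("user_events"))

        // Row key: userId plus reversed timestamp so the newest events sort first per user.
        val userId = "user123"
        val reversedTs = Long.MaxValue - System.currentTimeMillis()
        val put = new Put(Bytes.toBytes(s"$userId#$reversedTs"))
        put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("page"), Bytes.toBytes("/home"))
        table.put(put)

        table.close()
        connection.close()
      }
    }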
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Linux, XML, Java (JDK 1.5), Python, Eclipse, Oracle, PL/SQL, J2EE, WebSphere 6.1, IBM RAD 7.5, Rational ClearCase 7.0, JAXP, XSL, XSLT, XML Schema (XSD), WSDL 2.0, SAML 2.0, AJAX 1.0, Web Services, SOA, JSP 2.2, CSS, Servlets.
Confidential
Java Application Developer
Responsibilities:
- Assisted in designing and programming for the system, which includes development of Process Flow Diagram, Entity Relationship Diagram, Data Flow Diagram and Database Design.
- Designed front end components using JSF.
- Involved in developing Java APIs, which communicate with the Java Beans.
- Implemented MVC architecture using Java, custom tag libraries and JSTL.
- Involved in the development of POJO classes and writing Hibernate Query Language (HQL) queries.
- Implemented MVC architecture and DAO design pattern for maximum abstraction of the application and code reusability.
- Created stored procedures using SQL and PL/SQL for data modification.
- Used XML, XSL for Data presentation, Report generation and customer feedback documents.
- Used Java Beans to automate the generation of Dynamic Reports and for customer transactions.
- Developed JUnit test cases for regression testing and integrated with ANT build.
- Implemented Logging framework using Log4J.
- Involved in code review and documentation review of technical artifacts.
Environment: J2EE/Java, JSP, Servlets, JSF, Hibernate, Spring, JavaBeans, XML, XSL, HTML, DHTML, JavaScript, CVS, JDBC, Log4J, Oracle 9i, IBM WebSphere Application Server.
Confidential
Java Application Developer
Responsibilities:
- Created Use case, Sequence diagrams, functional specifications and User Interface diagrams using Star UML.
- Involved in complete requirement analysis, design, coding and testing phases of the project.
- Participated in JAD meetings to gather the requirements and understand the End Users System.
- Developed user interfaces using JSP, HTML, XML and JavaScript.
- Generated XML Schemas and used XML Beans to parse XML files.
- Created Stored Procedures & Functions. Used JDBC to process database calls for DB2/AS400 and SQL Server databases.
- Developed code to create XML files and flat files from data retrieved from databases and XML files.
- Created data sources and helper classes utilized by all the interfaces to access and manipulate data.
- Developed web application called iHUB (integration hub) to initiate all the interface processes using Struts Framework, JSP and HTML.
- Developed the interfaces using Eclipse and JBoss; involved in integration testing, bug fixing and production support.
Environment: Java, Servlets, JSPs, JavaMail API, JavaScript, HTML, MySQL, Swing, Java Web Server, JBoss, RMI, Rational Rose, Red Hat Linux.