Hadoop Developer Resume
Sunnyvale, CA
SUMMARY:
- IT professional with 8+ years of experience in analysis, design, development, integration, testing, and maintenance of various applications using Java/J2EE technologies, along with 3+ years of Big Data/Hadoop experience.
- Experienced in building highly scalable big data solutions using Hadoop on multiple distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
- Expertise in big data architecture with the Hadoop file system and its ecosystem tools: MapReduce, HBase, Hive, Pig, ZooKeeper, Oozie, Flume, Avro, Impala, and Apache Spark.
- Hands-on experience performing data quality checks on petabytes of data.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architectures.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
- Developed, deployed, and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
- Experience writing MapReduce programs and using the Apache Hadoop API for data analysis.
- Strong experience in developing, debugging, and tuning MapReduce jobs in a Hadoop environment.
- Experienced in working with Ab Initio.
- Expertise in developing Pig and Hive scripts for data analysis.
- Hands-on experience in data mining, implementing complex business logic, and optimizing queries using HiveQL, controlling data distribution through partitioning and bucketing techniques to enhance performance.
- Expertise in using Apache HCatalog with different big data processing tools.
- Experience working with Hive data and extending the Hive library with custom UDFs to query data in non-standard formats.
- Experience in performance tuning of MapReduce jobs, Pig jobs, and Hive queries.
- Involved in the ingestion of data from various databases such as Teradata (Sales Data Warehouse), AS400, DB2, and SQL Server using Sqoop.
- Experience working with Flume to handle large volumes of streaming data.
- Good working knowledge of the Hadoop Hue ecosystem.
- Extensive experience in migrating ETL operations into HDFS using Pig scripts.
- Good knowledge of big data analytics libraries (MLlib) and the use of Spark SQL for data exploration.
- Experienced in using Apache Ignite for handling streaming data.
- Expert in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Experience in implementing a distributed messaging queue integrated with Cassandra using Apache Kafka and ZooKeeper.
- Expert in creating and designing data ingest pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
- Experience with the Oozie workflow engine for running workflow jobs with actions that execute Hadoop MapReduce and Pig jobs.
- Good knowledge of using OCR with Kofax Capture.
- Worked with different file formats such as TextFile, Avro, and ORC for Hive querying and processing.
- Experienced in working with Apache Ambari.
- Experienced in working with Apache Accumulo.
- Used compression techniques (Snappy) with file formats to optimize storage in HDFS.
- Working knowledge of Hadoop HDFS admin shell commands.
- Developed core modules in large cross-platform applications using Java, J2EE, Hibernate, Python, Spring, JSP, Servlets, EJB, JDBC, JavaScript, XML, and HTML.
- Experienced with the build tools Maven and Ant and continuous integration tools such as Jenkins.
- Working knowledge of configuring and monitoring tools such as Ganglia and Nagios.
- Hands-on experience using relational databases such as Oracle, MySQL, PostgreSQL, and MS SQL Server.
- Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat, and JBoss.
- Developed unit test cases using the JUnit, EasyMock, and MRUnit testing frameworks.
- Experienced with version control systems such as SVN and ClearCase.
- Experience using the IDE tools Eclipse 3.0, MyEclipse, RAD, and NetBeans.
- Hands-on development experience with RDBMS, including writing SQL queries, PL/SQL, views, stored procedures, triggers, etc.
- Participated in all business intelligence activities related to data warehousing, ETL, and report development methodology.
- Expertise in Waterfall and Agile software development models, and in project planning using Microsoft Project Planner and JIRA.
- Highly motivated, dynamic, self-starter with keen interest in emerging technologies
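As a concrete illustration of the partitioning and bucketing mentioned above: Hive routes each row to a partition directory by its partition-column value, and to a bucket file by hashing the bucketing column modulo the bucket count. A minimal plain-Python sketch of that placement logic (the table, columns, and data are hypothetical, and Hive's real hash function differs from Python's built-in `hash`):

```python
# Sketch of how Hive-style partitioning/bucketing distributes rows:
# partition = directory chosen by the partition column's value,
# bucket    = file chosen by hash(bucketing column) % num_buckets.
NUM_BUCKETS = 4

def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Return the bucket index for a given bucketing-column value."""
    return hash(key) % num_buckets

# Hypothetical sales rows: partition by region, bucket by customer_id.
rows = [
    {"region": "west", "customer_id": 101, "amount": 40.0},
    {"region": "west", "customer_id": 102, "amount": 15.5},
    {"region": "east", "customer_id": 101, "amount": 22.0},
]

layout = {}
for row in rows:
    partition = row["region"]                # directory per partition
    bucket = bucket_for(row["customer_id"])  # file per bucket
    layout.setdefault((partition, bucket), []).append(row)
```

Because the bucket is a pure function of the key, all rows with the same customer_id land in the same bucket file, which is what makes bucketed map-side joins and sampling efficient.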
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Hive, HCatalog, Pig, Sqoop, Flume, Oozie, Avro, Hadoop Streaming, ZooKeeper, Kafka, Impala, Apache Spark, Hue, Ambari, Apache Ignite
Hadoop Distributions: Cloudera (CDH4/CDH5), Hortonworks
Languages: Java, C, SQL, Python, PL/SQL, Pig Latin, HQL
IDE Tools: Eclipse, NetBeans, RAD
Frameworks: Hibernate, Spring, Struts, JUnit
Web Technologies: HTML5, CSS3, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML, JSF, AngularJS
Web Services: SOAP, REST, WSDL, JAXB, and JAXP
Operating Systems: Windows (XP, 7, 8), UNIX, Linux, Ubuntu, CentOS
Application Servers: JBoss, Tomcat, WebLogic, WebSphere
Reporting/ETL Tools: Tableau, Power View for Microsoft Excel, Informatica
Databases: Oracle, MySQL, DB2, Derby, PostgreSQL, NoSQL databases (HBase, Cassandra)
PROFESSIONAL EXPERIENCE:
Confidential, Sunnyvale, CA
Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed design documents following project guidelines and SLAs, which required procuring data from all upstream data sources and developing the corresponding programs.
- Developed and implemented API services using Python in Spark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Maintained and administered HDFS through the Hadoop Java API, shell scripting, and Python.
- Used Python to write scripts that move data across clusters.
- Created Hive external tables, loaded data into the tables, and queried the data using HQL.
- Installed and maintained the Hadoop/Spark cluster from scratch in a plain Linux environment, defining code outputs as PMML.
- Experience in integrating Cassandra with Elasticsearch and Hadoop.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Developed shell scripts to automate routine DBA tasks (e.g., database refreshes, backups, monitoring).
- Tuned and modified SQL for batch and online processes.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Maven, Teradata, ZooKeeper, SVN, Autosys, HBase, Cassandra, Python, Spark
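The Hive-to-Spark conversion work above typically turns a SQL GROUP BY into a map/reduceByKey pipeline. A framework-free Python sketch of that pattern, with Spark's `reduceByKey` simulated in-process (the query, column names, and data are hypothetical):

```python
# The HiveQL query
#   SELECT dept, SUM(salary) FROM employees GROUP BY dept
# maps onto Spark RDD transformations roughly as:
#   rdd.map(lambda r: (r.dept, r.salary)).reduceByKey(lambda a, b: a + b)
# reduce_by_key() below simulates that shuffle-and-fold step in plain Python.
from collections import defaultdict

def reduce_by_key(pairs, func):
    """Group (key, value) pairs and fold each group with func,
    mimicking Spark's reduceByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    out = {}
    for key, values in groups.items():
        acc = values[0]
        for v in values[1:]:
            acc = func(acc, v)
        out[key] = acc
    return out

employees = [("eng", 100), ("eng", 120), ("sales", 90)]
pairs = [(dept, salary) for dept, salary in employees]     # the map() step
totals = reduce_by_key(pairs, lambda a, b: a + b)          # the reduceByKey() step
```

On a real cluster the grouping happens across partitions during the shuffle, but the per-key fold is the same associative operation shown here.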
Confidential, Frankfort, KY
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis.
- Wrote MapReduce jobs using Pig Latin and the Java API.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Developed Pig scripts for analyzing large data sets in HDFS.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Designed and presented a plan for a POC on Impala.
- Experienced in migrating HiveQL to Impala to minimize query response time.
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Performed extensive data mining applications using Hive.
- Implemented daily cron jobs that automate parallel tasks for loading data into HDFS, using Autosys and Oozie coordinator jobs.
- Performed streaming of data into Apache Ignite by setting up a cache for efficient data analysis.
- Responsible for performing extensive data validation using Hive.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases, for comparison against historical data.
- Utilized Storm for processing large volumes of data.
- Used Kafka to load data into HDFS and move data into NoSQL databases (Cassandra).
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Involved in submitting and tracking MapReduce jobs using the JobTracker.
- Involved in creating Oozie workflow and coordinator jobs to kick off jobs on time as data becomes available.
- Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregations.
- Responsible for cleansing data from source systems using Ab Initio components such as Join, Dedup Sorted, Denormalize, Normalize, Reformat, Filter by Expression, and Rollup.
- Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
- Exported data to Tableau and to Excel with Power View for presentation and refinement.
- Implemented business logic by writing Pig UDFs in Java, and used various UDFs from Piggybank and other sources.
- Implemented Hive generic UDFs to encapsulate business logic.
- Coordinated with end users for designing and implementation of analytics solutions for User Based Recommendations using R as per project proposals.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Maven, Teradata, ZooKeeper, SVN, Autosys, Tableau, HBase, Cassandra, Apache Ignite
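The MapReduce jobs above all follow the same mapper/shuffle/reducer contract; a compact Python sketch of that contract, run in-process rather than on a cluster (the input lines are hypothetical):

```python
# Mapper emits (word, 1) pairs; the framework sorts/groups by key;
# the reducer sums each group. Here the shuffle is simulated with
# sorted() + groupby() instead of a distributed sort.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    return (word, sum(counts))

lines = ["big data big wins", "data pipelines"]

# Shuffle/sort phase: collect all mapper output and sort by key.
pairs = sorted((kv for line in lines for kv in mapper(line)),
               key=itemgetter(0))

# Reduce phase: one reducer call per distinct key.
result = dict(reducer(word, (c for _, c in group))
              for word, group in groupby(pairs, key=itemgetter(0)))
```

The same mapper/reducer pair, written against Hadoop's Java API or Hadoop Streaming, would run unchanged in logic; only the shuffle would be performed by the framework across nodes.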
Confidential - Albuquerque, NM
Hadoop Developer
Responsibilities:
- Worked on writing transformer/mapping MapReduce pipelines using Java.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Involved in loading data into HBase using the HBase shell, the HBase client API, Pig, and Sqoop.
- Designed and implemented incremental imports into Hive tables.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Deployed an Apache Solr search engine server to help speed up searches of government cultural assets.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Experienced in managing and reviewing Hadoop log files.
- Migrated ETL jobs to Pig scripts to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Worked with the Avro data serialization system to handle JSON data formats.
- Worked on different file formats such as sequence files, XML files, and map files using MapReduce programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Developed scripts that automated end-to-end data management and synchronization between all the clusters.
- Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
- Set up a Hadoop cluster on Amazon EC2 using Whirr for a POC.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, Cloudera distribution of Hadoop, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX shell scripting, PuTTY
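The incremental imports noted above follow Sqoop's incremental-append pattern: record the maximum value of a check column after each run, and pull only rows above it on the next run. A minimal Python sketch of that bookkeeping (the function, column names, and data are hypothetical illustrations, not Sqoop's API):

```python
# Sqoop-style incremental append, sketched: each run imports only rows
# whose check column exceeds the last recorded max, then advances it.
def incremental_import(source_rows, last_value, check_column="id"):
    """Return (rows to import this run, updated last_value)."""
    new_rows = [r for r in source_rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last

table = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

first, last = incremental_import(table, last_value=0)      # initial full load
table.append({"id": 4, "v": "d"})                          # new source row arrives
second, last = incremental_import(table, last_value=last)  # delta load only
```

Sqoop itself stores the equivalent of `last_value` in its saved-job metastore, so scheduled runs (e.g., from Oozie) pick up where the previous run left off.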
Confidential
Java Developer
Responsibilities:
- Worked with business analyst in understanding business requirements, design and development of the project.
- Implemented the Struts framework with MVC architecture.
- Created new JSPs for the front end using HTML, JavaScript, jQuery, and Ajax.
- Developed JSP pages and configured the module in the application.
- Developed the presentation layer using JSP, HTML, and CSS, with client-side validations using JavaScript.
- Involved in designing, creating, and reviewing technical design documents.
- Developed DAOs (Data Access Objects) using Hibernate as the ORM to interact with the DBMS (Oracle).
- Collaborated with the ETL/Informatica team to determine the necessary data models and UI designs to support Cognos reports.
- Developed an Ab Initio graph that uses Java code to decompress compressed PDF files and store them in a directory.
- Performed several data quality checks, found potential issues, and designed Ab Initio graphs to resolve them.
- Applied J2EE design patterns such as Business Delegate, DAO, and Singleton.
- Deployed and tested the application using the Tomcat web server.
- Performed client-side validation using JavaScript.
- Involved in developing DAOs using JDBC.
- Involved in coding, code reviews, and JUnit testing; prepared and executed unit test cases.
- Used JBoss for application deployment and MySQL for the database.
- Worked with the QA team on the preparation and review of test cases.
- Used JUnit for unit testing and as the integration testing tool.
- Wrote SQL queries to fetch business data, using Oracle as the database.
- Developed the UI for customer service modules and reports using JSF, JSPs, and MyFaces components.
- Used Log4j to log the running system's application events and trace errors and certain automated routine functions.
- Used CVS as the configuration management tool.
Environment: Java, JSP, JavaScript, Servlets, Struts, Hibernate, EJB, JSF, Ant, Tomcat, CVS, Eclipse, SQL Developer, Oracle
Java Developer
Confidential
Responsibilities:
- Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams (sequence and collaboration), and activity diagrams were used.
- Gathered business requirements and wrote functional specifications and detailed design documents.
- Extensively used Core Java, Servlets, JSP, and XML.
- Wrote AngularJS controllers, views, and services.
- Designed the logical and physical data models, generated DDL scripts, and wrote DML scripts for an Oracle 9i database.
- Implemented an enterprise logging service (ELS) using JMS and Apache CXF.
- Developed unit test cases and used JUnit for unit testing of the application.
- Implemented a framework component to consume the ELS service.
- Involved in designing user screens and validations using HTML, jQuery, Ext JS, and JSP as per user requirements.
- Implemented JMS producers and consumers using Mule ESB.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
- Sent email alerts to the supporting team using BMC msend.
- Designed low-level design documents for the ELS service.
Environment: Java, Spring Core, JMS, web services, JDK, SVN, Maven, Mule ESB, JUnit, WAS7, jQuery, Ajax, SAX
Jr. Java Developer
Confidential
Responsibilities:
- Used the Hibernate ORM tool as the persistence layer, using database and configuration data to provide persistence services (and persistent objects) to the application.
- Implemented Oracle Advanced Queuing using JMS and message-driven beans.
- Responsible for developing the DAO layer using Spring MVC and configuration XMLs for Hibernate, and for managing CRUD operations (insert, update, and delete).
- Implemented dependency injection with the Spring framework.
- Developed and implemented the DAO and service classes.
- Developed reusable services using BPEL to transfer data.
- Participated in analysis, interface design, and development of JSPs.
- Configured Log4j to enable/disable logging in the application.
- Wrote SPAs (single-page web applications) using RESTful web services plus Ajax and AngularJS.
- Developed a rich user interface using HTML, JSP, AJAX, JSTL, JavaScript, jQuery, and CSS.
- Implemented PL/SQL queries and procedures to perform database operations.
- Wrote UNIX shell scripts and used the UNIX environment to deploy the EAR and read the logs.
- Implemented Log4j for logging purposes in the application.
- Involved in code deployment activities for different environments.
- Followed an agile development methodology.
Environment: Java, Spring, Hibernate, JMS, EJB, WebLogic Server, JDeveloper, SQL Developer, Maven, XML, CSS, JavaScript, JSON
