Sr Hadoop Developer Resume
New York, NY
SUMMARY:
- Over 8 years of progressive experience in all phases of the software development life cycle, including requirement study, analysis, design, development, integration, re-engineering, maintenance, installation, implementation, and testing of various client/server and N-tier web applications.
- Over 4 years of experience as a Hadoop consultant working on Hadoop, HDFS, MapReduce, the Hadoop ecosystem, Pig, Hive, Oozie, HBase, Flume, Sqoop, ZooKeeper, Cloudera, and Cloudera Navigator.
- Hands-on experience with Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, JournalNodes, ResourceManager, and the MapReduce programming model.
- Experienced in installing, configuring, and administering Hadoop clusters for the major Hadoop distributions.
- Experience with Hadoop distributions such as Cloudera, Hortonworks, BigInsights, MapR, and Windows Azure, as well as Impala.
- Experience developing MapReduce programs on Apache Hadoop to analyze big data as per requirements.
- Hands-on experience writing MapReduce jobs in Java.
- Configured ZooKeeper, Cassandra, and Flume on an existing Hadoop cluster.
- In-depth knowledge of working with Avro and Parquet formats.
- Hands-on experience with Hadoop applications (such as administration, configuration management, monitoring, debugging, and performance tuning).
- Experienced in developing Java/J2EE applications using Struts, Spring, and Hibernate.
- Experience with web-based UI development using jQuery UI, jQuery, ExtJS, CSS, HTML, HTML5, XHTML, and JavaScript.
- Extensive experience with Java technologies, including JDBC, JSP, and Servlets.
- Strong experience with XML and HTML5. Good understanding of SOA.
- Knowledge of web services, SOAP, and REST.
- Experience in developing enterprise web applications based on Struts.
- Good working knowledge of development tools such as Eclipse and Spring Source Tool Suite.
- Experience building, deploying, and integrating with Ant and Maven.
- Experienced in developing enterprise applications using open-source technologies such as Struts, Hibernate, Spring, and JUnit.
- Expertise with MySQL and Oracle, including SQL, PL/SQL, stored procedures, and functions.
- Extensive experience interpreting program specs from low-level design specs.
- Experience in varied platforms like Windows, UNIX, Linux
- Well acquainted with understanding user requirements and preparing technical and functional specification documents.
- Extensively involved in unit testing and preparing test plans.
- Excellent communication and presentation skills.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Cloudera CDH5, HiveQL, Avro, Pig Latin
Languages: Java, XML, HTML/XHTML, HDML, DHTML, SQL, PL/SQL
Operating System: Mac OS X, Windows, CentOS, Ubuntu
Imaging System: McKesson Pacs, AGFA Pacs
Interface Engine: ConnectR
Ticketing System: Caretech Remedy
Databases: Oracle 9i/10g/11g, SQL Server 2005/2008 R2, MySQL, HBase, DB2
Testing & Case Tools: JUnit, Log4j, Rational ClearCase, CVS, ANT, JBuilder
WORK EXPERIENCE:
Confidential, New York, NY
Sr Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Designed the project using an MVC architecture, providing multiple views of the same model and thereby improving modularity and scalability.
- The project ingests data generated by sensors tracking patients' body activity; the data is collected into HDFS from online aggregators via Kafka.
- A Kafka consumer receives the data from the patients' different learning systems.
- Spark Streaming collects this data from Kafka in near real time and performs the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase); a Java sketch of this pipeline follows this list.
- Estimated the hardware requirements for NameNode and DataNodes & planning the cluster.
- Experience in installing, configuring, and administering Hadoop clusters for the major Hadoop distributions.
- Used Hadoop's Pig, Hive, and MapReduce to analyze health insurance data, extracting data sets for meaningful information such as medicines, diseases, symptoms, opinions, and geographic region details.
- Developed an Oozie workflow to orchestrate a series of Pig scripts that cleanse data, such as removing personal information or merging many small files into a handful of very large, compressed files, using Pig pipelines in the data preparation stage.
- Used Pig for three distinct workloads: pipelines, iterative processing, and research.
- Wrote Pig UDFs in Python and Java and used sampling of large data sets; a Java UDF sketch also follows this list.
- Moved log files generated by various sources into HDFS via Flume for further processing, and processed the files using PiggyBank UDFs.
- Extensively used Pig to communicate with Hive via HCatalog and with HBase via storage handlers.
- Implemented MapReduce jobs to write data into Avro format.
- Created Hive tables to store the processed results in a tabular format.
- Good experience with Pig Latin scripting and Sqoop scripting.
- Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
- Implemented exception tracking logic using Pig scripts
- Implemented test scripts to support test driven development and continuous integration.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it
- Used different file formats like Text files, Sequence Files, JSON and Avro.
- Good understanding of ETL tools and how they can be applied in a Big Data environment
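A minimal sketch of the streaming path described above, assuming the receiver-based spark-streaming-kafka integration and the HBase 1.x client API; the "sensor-events" topic, ZooKeeper quorum, "patient_activity" table, and column layout are illustrative placeholders rather than the actual project names.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;

import java.util.Collections;
import java.util.Map;

/** Kafka -> Spark Streaming -> HBase wiring sketch (names are illustrative). */
public class SensorStreamJob {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("sensor-stream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Subscribe to a hypothetical "sensor-events" topic via the Kafka receiver API.
        Map<String, Integer> topics = Collections.singletonMap("sensor-events", 1);
        JavaPairReceiverInputDStream<String, String> stream =
                KafkaUtils.createStream(jssc, "zk1:2181", "sensor-consumers", topics);

        // Keep the message value, then persist each record to HBase, one connection per partition.
        stream.map(tuple -> tuple._2()).foreachRDD(rdd -> rdd.foreachPartition(records -> {
            Configuration hconf = HBaseConfiguration.create();
            try (Connection hbase = ConnectionFactory.createConnection(hconf);
                 Table table = hbase.getTable(TableName.valueOf("patient_activity"))) {
                while (records.hasNext()) {
                    String event = records.next();
                    // Row key and column layout are placeholders for the real learner data model.
                    Put put = new Put(Bytes.toBytes(String.valueOf(event.hashCode())));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("raw"), Bytes.toBytes(event));
                    table.put(put);
                }
            }
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```

The real job built the common learner data model in the transformation step; this sketch simply forwards the raw event payload to show the wiring.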
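A minimal Java eval UDF of the kind referenced in the Pig bullets above; the class name and the normalization rule are illustrative.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * Pig eval UDF sketch: trims and lower-cases a free-text symptom field
 * before downstream aggregation. Names are illustrative.
 */
public class NormalizeSymptom extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toLowerCase();
    }
}
```

In a Pig script it would be used after a REGISTER of the UDF jar, for example `GENERATE NormalizeSymptom(symptom);`.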
Environment: Hadoop, MapReduce, Spark, Shark, Kafka, HDFS, ZooKeeper, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Sqoop, Flume, CDH 5.3.0, Cloudera, Oracle 10g, UNIX Shell Scripting.
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Worked with business teams and created Hive queries for ad hoc access.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Involved in review of functional and non-functional requirements
- Configured Hadoop cluster with Name Node and slaves and formatted HDFS.
- Responsible for managing data coming from different sources.
- Loaded daily data from websites to Hadoop cluster by using Flume.
- Involved in loading data from UNIX file system to HDFS.
- Created Hive tables and worked on them using HiveQL; see the JDBC-based HiveQL sketch after this list.
- Wrote MapReduce code to convert unstructured data to semi structured data.
- Used Pig for extraction, transformation, and loading of semi-structured data.
- Installed and configured Hive and wrote Hive UDFs.
- Developed a workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig; an Oozie client sketch follows this list.
- Created HBase tables to store variable data formats of data coming from different applications.
- Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
- Provided cluster coordination services through ZooKeeper.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Migrated the needed data from MySQL into HDFS using Sqoop; see the Sqoop import sketch after this list.
- Designed and implemented MapReduce jobs to support distributed data processing.
- Supported MapReduce programs running on the cluster.
- Exported the data from Avro files and indexed the documents in sequence file format.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs and data.
- Developed Pig UDFs to pre-process the data for analysis.
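A minimal sketch of creating and querying a Hive table from Java over the HiveServer2 JDBC driver; the connection URL, the web_logs table, its schema, and the HDFS location are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Sketch of creating and querying a Hive table through the HiveServer2 JDBC driver. */
public class HiveTableExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // External table over data already landed in HDFS (path and schema are placeholders).
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS web_logs ("
                    + "ts STRING, url STRING, status INT) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
                    + "LOCATION '/data/web_logs'");

            // Ad hoc aggregation of the kind used for business reporting.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT status, COUNT(*) FROM web_logs GROUP BY status")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}
```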
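A sketch of driving such an Oozie workflow from the Oozie Java client API; the Oozie URL, the HDFS application path, and the property names are assumptions for illustration, and the workflow.xml itself is not shown.

```java
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

/** Sketch of submitting a load/pre-process workflow through the Oozie Java client. */
public class SubmitWorkflow {
    public static void main(String[] args) throws Exception {
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = oozie.createConfiguration();
        // The workflow.xml at this HDFS path would chain the HDFS-load and Pig pre-processing actions.
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/etl/workflows/daily-load");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager:8032");

        String jobId = oozie.run(conf);
        System.out.println("Submitted workflow " + jobId);

        // Poll until the workflow leaves its initial/running states.
        WorkflowJob.Status status;
        while ((status = oozie.getJobInfo(jobId).getStatus()) == WorkflowJob.Status.PREP
                || status == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10_000);
        }
        System.out.println("Final status: " + status);
    }
}
```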
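A sketch of the MySQL-to-HDFS import expressed through Sqoop 1's programmatic runTool entry point, which mirrors the equivalent sqoop import command line; the connection string, credentials, table, and target directory are placeholders.

```java
import org.apache.sqoop.Sqoop;

/** Sketch of driving a Sqoop 1 import from Java (arguments mirror the CLI). */
public class MySqlToHdfsImport {
    public static void main(String[] args) {
        String[] importArgs = {
                "import",
                "--connect", "jdbc:mysql://mysql-host/sales",     // source database (placeholder)
                "--username", "etl_user",
                "--password-file", "/user/etl/.mysql.password",   // credentials kept in HDFS
                "--table", "orders",
                "--target-dir", "/data/raw/orders",
                "--num-mappers", "4"
        };
        int exitCode = Sqoop.runTool(importArgs);
        System.exit(exitCode);
    }
}
```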
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Oozie, Cloudera, Pig, CDH 5.3.0, HBase, Sqoop, Linux, XML, MySQL Workbench, Java 6, Eclipse, Oracle 10g, PL/SQL, SQL*PLUS.
Confidential, Plano, TX
Hadoop Developer /Admin
Responsibilities:
- Installed and configured Hadoop, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing; a mapper sketch follows this list.
- Installed and configured Pig for ETL jobs.
- Troubleshot the cluster by reviewing Hadoop log files.
- Imported data from Teradata using Sqoop with the Teradata connector.
- Used Oozie to orchestrate the workflow.
- Created Hive tables and worked on them for data analysis to meet the business requirements.
- Created complex Hive tables and executed complex Hive queries on Hive warehouse.
- Designed and implemented MapReduce-based large-scale parallel relation-learning system.
- Installed and benchmarked Hadoop/HBase clusters for internal use.
- Wrote an HBase client program in Java and web services; an HBase client sketch follows this list.
- Modeled, serialized, and manipulated data in multiple forms (XML).
- Supported post-production enhancements.
- Experience with data modeling concepts: star-schema dimensional modeling and relational (ER) design.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Created tables and stored procedures in SQL for data manipulation and retrieval, and performed database modifications using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i. Created the user interface using JSF.
- Involved in integration testing the Business Logic layer and Data Access layer.
- Integrated JSF with JSP and used JSF Custom Tag Libraries to display the value of variables defined in configuration files.
- Used technologies like JSP, JSTL, JavaScript, and Tiles for the presentation tier.
- Involved in JUnit testing of the application using JUnit framework.
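A map-only cleaning job sketch along the lines described above; the comma-to-tab normalization and the minimum field count are illustrative assumptions rather than the actual cleansing rules.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Map-only cleaning job sketch: drops malformed records and normalizes delimiters. */
public class CleanRecordsJob {

    public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assume raw records are comma separated with at least 5 fields; skip anything else.
            String[] fields = value.toString().split(",");
            if (fields.length < 5) {
                return;
            }
            // Re-emit as tab-delimited, trimmed fields for downstream Hive/Pig consumption.
            StringBuilder cleaned = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) cleaned.append('\t');
                cleaned.append(fields[i].trim());
            }
            context.write(NullWritable.get(), new Text(cleaned.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecordsJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                      // map-only job
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```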
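A minimal HBase client sketch using the 1.x client API, of the kind referenced above; the app_events table, column family, and row key are placeholders.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** Minimal HBase client sketch: write and read one cell (table/column names are placeholders). */
public class HBaseClientExample {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("app_events"))) {

            // Write one row keyed by an event id.
            Put put = new Put(Bytes.toBytes("event-0001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("{\"type\":\"login\"}"));
            table.put(put);

            // Read it back.
            Result result = table.get(new Get(Bytes.toBytes("event-0001")));
            byte[] payload = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("payload"));
            System.out.println(Bytes.toString(payload));
        }
    }
}
```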
Environment: Hadoop, MapReduce, HDFS, CDH 5.3.0, Hive, Pig, HBase, Oozie, Sqoop, Java, Cloudera, Linux, XML, MySQL, Java 6, Eclipse, Cassandra.
Confidential
Java Developer
Responsibilities:
- Developed the user interface screens using Swing for accepting various system inputs such as contractual terms, monthly data pertaining to production, inventory and transportation.
- Involved in designing Database Connections using JDBC.
- Involved in design and Development of UI using HTML, JavaScript and CSS.
- Involved in creating tables and stored procedures in SQL for data manipulation and retrieval using SQL Server 2000, and in database modifications using SQL, PL/SQL, stored procedures, triggers, and views in Oracle.
- Developed the business components (in core Java) used for the calculation module (calculating various entitlement attributes).
- Involved in the logical and physical database design and implemented it by creating suitable tables, views and triggers.
- Created the related procedures and functions used by JDBC calls in the above components; a JDBC stored-procedure call is sketched after this list.
- Involved in fixing bugs and minor enhancements for the front-end modules.
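A sketch of the JDBC stored-procedure call pattern used by the calculation components; the get_entitlement procedure, its parameters, and the Oracle connection details are illustrative, and the try-with-resources syntax is more recent than the JDK 1.3 used on the project.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

/** Sketch of calling an entitlement stored procedure over JDBC (names are illustrative). */
public class EntitlementDao {
    public double fetchEntitlement(String contractId) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret");
             CallableStatement call = conn.prepareCall("{ call get_entitlement(?, ?) }")) {
            call.setString(1, contractId);                 // contract identifier (IN parameter)
            call.registerOutParameter(2, Types.NUMERIC);   // calculated entitlement (OUT parameter)
            call.execute();
            return call.getDouble(2);
        }
    }
}
```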
Environment: JDK 1.3, Swing, JDBC, JavaScript, HTML, Resin, SQL Server 2000, Textpad, Toad, MS Visual SourceSafe, Windows 2000, HP UNIX.
Confidential
Java Developer
Responsibilities:
- Involved in Analysis, Design, Coding and Development of custom Interfaces.
- Involved in the feasibility study of the project.
- Gathered requirements from the client for designing the Web Pages.
- Participated in designing the user interface for the application using HTML, DHTML, and Java Server Pages (JSP).
- Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans, and used Servlets for handling the business logic; a Servlet sketch follows this list.
- Developed the Form Beans and Data Access Layer classes.
- XML was used to transfer the data between different layers.
- Involved in writing complex sub-queries and used Oracle for generating on-screen reports.
- Worked on database interaction layer for insertions, updating and retrieval operations on data.
- Deployed EJB components on WebLogic.
- Involved in deploying the application in test environment using Tomcat.
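A minimal Servlet sketch of the request-handling pattern described above; the servlet name, request parameter, and output are illustrative.

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/** Request-handling Servlet sketch: reads a parameter and renders a simple HTML response. */
public class OrderLookupServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String orderId = request.getParameter("orderId");   // illustrative request parameter

        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<h2>Order lookup</h2>");
        out.println("<p>Requested order: " + (orderId == null ? "none" : orderId) + "</p>");
        out.println("</body></html>");
    }
}
```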
Environment: Java, J2EE, JSP, Servlets, EJB, JavaBeans, JavaScript, JDBC, WebLogic Server, Oracle, HTML, XML, CSS, Eclipse, CVS, Windows 2000.