Hadoop Developer Resume
Fort Worth, TX
SUMMARY
- 7+ years of IT experience, including solid experience in the Hadoop ecosystem.
- Experienced Hadoop developer with a strong background in distributed file systems in the big data arena. Understands the complex processing needs of big data and has experience developing code and modules to address those needs.
- Experience working on the CDH3 and CDH4 Cloudera distributions.
- Strong expertise in big data modeling techniques with Hive and HBase.
- Experience in developing Pig Latin scripts and using Hive Query Language.
- Expertise with Hive SerDes for unstructured data analysis.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Developed Java MapReduce programs to transform log data into structured form and derive user location, age group, and time spent (see the mapper sketch at the end of this list).
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP.
- Experience working with the NoSQL database HBase.
- Experience using Sqoop to import data from RDBMSs into HDFS and vice versa.
- Proficiency working with various data sources, including RDBMSs and web services.
- Experience in developing MapReduce programs with Apache Hadoop to analyze big data as per requirements.
- Good knowledge of Java topics such as generics, collections, and multithreading.
- Hands-on experience with Amazon Web Services (AWS).
- Experience in data warehousing with ETL tools such as DataStage and Ab Initio.
- Advanced experience with HDFS, MapReduce, Hive, HBase, ZooKeeper, Impala, Pig, Flume, and Oozie.
- Understanding and ability to use SQL, XML, JSON and UNIX.
- Good knowledge of Apache Spark, Storm, and Kafka.
- Good understanding of workload management, schedulers, scalability and distributed platform architectures.
- Designed, developed, and maintained solutions for terabyte-scale data analytics.
- Experience working with Hadoop/HBase/Hive/MRv1/MRv2.
- Knowledge of transactional Java applications (API through database).
- Prepared test case scenarios and internal documentation for validation and reporting.
- Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
- Experience designing, implementing, and supporting Java/J2EE application modules.
- Experience with a variety of Java frameworks (Spring, Struts, etc.).
- Knowledge of Hadoop Gen2 federation, high availability, and YARN architecture.
- Strong expertise in the design and implementation of extraction, transformation, and loading (ETL) processes.
- Experience with SOA architectures (REST, SOAP, etc.).
- Experience in collecting business requirements, writing functional requirements and test cases, and creating technical design documents with UML: use case, class, sequence, and collaboration diagrams.
- HDFS, Oozie, and ZooKeeper administration and configuration experience.
- Apache Pig and Spark configuration and administration experience.
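As an illustration of the log-processing MapReduce work noted above, here is a minimal mapper sketch; the pipe-delimited log layout, field positions, and class name are assumptions made for this example, not the actual production format.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: pulls a user-location field out of a delimited
// log line and emits (location, 1) for downstream aggregation.
public class LogLocationMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text location = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Hypothetical layout: timestamp|userId|location|ageGroup|secondsSpent
        String[] fields = value.toString().split("\\|");
        if (fields.length >= 3) {
            location.set(fields[2]);
            context.write(location, ONE);
        }
    }
}
```

A companion reducer summing the counts per location would complete the job; the same pattern extends to the age-group and time-spent fields.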
TECHNICAL SKILLS:
Data Management Databases: Oracle 10g/11g
Hadoop/Big Data: Hadoop 0.20.2-cdh3u3, HDFS 0.20.2, MapReduce 0.20.2, HBase 0.90.4, Pig 0.8.1, Hive 0.7.1, Impala 1.2, Sqoop 1.3.0, Flume 0.9.4, Spark, Scala, Cassandra, Oozie 2.3.2, Hue 1.2.0.0, ZooKeeper 3.3.3, YARN, cluster builds, MySQL, Datameer, R analytics, Cloudera Manager 3.7.x/4.7.x/4.8.2, CDH 4.6
Methodologies & Standards: Software Development Lifecycle (SDLC)
Programming Languages: Java, C, PL/SQL, Shell Scripting, Pig Latin, Elasticsearch, HiveQL
Operating Systems: Windows XP, Windows 2000 Server, Unix, Linux 5.6
PROFESSIONAL EXPERIENCE
Confidential, Fort Worth, TX
Hadoop Developer
Responsibilities:
- Prepared a vendor questionnaire to capture vendor product features and advantages with respect to the Hadoop cluster.
- Involved in the design and implementation of a proof of concept for the system, built on Hadoop with HBase, Hive, Pig, and Flume.
- Used HBase for real-time search on log data, and Pig, Hive, and MapReduce for analysis.
- Managed and reviewed Hadoop log files.
- Used Flume to publish logs to Hadoop in real time.
- Worked with business teams and created Hive queries for ad hoc access.
- Involved in loading data from the UNIX file system to HDFS. Automated the steps to load log files into Hive.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Hue to save Hive queries for each required report and to download the query results as CSV or Excel files.
- Conducted interviews with subject matter experts and documented the features to be included in the system.
- Used Pig for data cleansing.
- Created partitioned tables in Hive.
- Developed a Hive SerDe for parsing sent-email logs.
- Wrote a Hive UDF to extract the date from a time in seconds (see the sketch at the end of this list).
- Involved in installing Hive, HBase, Pig, Flume, and other Hadoop ecosystem software.
- Involved in creating a 12-data-node Hadoop cluster for the POC.
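A minimal sketch of the date-extraction UDF described above, assuming an epoch-seconds input and the classic org.apache.hadoop.hive.ql.exec.UDF API available in Hive 0.7; the class name is illustrative.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Old-style Hive UDF: converts epoch seconds to a yyyy-MM-dd date string.
public class SecondsToDate extends UDF {
    private final SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");

    public Text evaluate(LongWritable seconds) {
        if (seconds == null) {
            return null;
        }
        // Epoch time is in seconds; java.util.Date expects milliseconds.
        return new Text(fmt.format(new Date(seconds.get() * 1000L)));
    }
}
```

Such a UDF would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from a query.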
Environment: Java 6, Eclipse, Linux 5.x, CDH3, CDH4.x, Sqoop, Pig, Hive 0.7.1, Flume, UNIX Shell Scripting, Hue, WinSCP, MySQL 5.5, Scala.
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
- Responsible for installing, configuring, and managing a Hadoop cluster spanning multiple racks.
- Developed Pig Latin scripts to analyze large data sets in areas where extensive Java coding needed to be reduced.
- Used the Sqoop tool to extract data from a relational database into Hadoop.
- Designed, developed, and maintained services and interfaces for cross-product communication, data analytics, reporting, and management.
- Involved in performance enhancements of the code and optimization by writing custom comparators and combiner logic.
- Worked closely with data warehouse architect and business intelligence analyst to develop solutions.
- Good understanding of job schedulers such as the Fair Scheduler, which assigns resources so that all jobs get, on average, an equal share over time, as well as the Capacity Scheduler.
- Responsible for performing peer code reviews, troubleshooting issues and maintaining status report.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Involved in identifying ways to improve system efficiency. Involved in requirement analysis, design, development, and unit testing using MRUnit and JUnit (see the test sketch at the end of this list).
- Prepared daily and weekly project status reports and shared them with the client.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
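A minimal sketch of the MRUnit testing approach mentioned above, reusing the hypothetical LogLocationMapper from the summary sketch; the input line format is likewise an assumption.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

// Illustrative MRUnit test: feeds one log line to the mapper and
// asserts the expected (location, 1) output pair.
public class LogLocationMapperTest {
    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new LogLocationMapper());
    }

    @Test
    public void emitsLocationWithCountOne() throws Exception {
        mapDriver.withInput(new LongWritable(1),
                            new Text("2014-01-01|u42|Dallas|25-34|120"))
                 .withOutput(new Text("Dallas"), new IntWritable(1))
                 .runTest();
    }
}
```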
Environment: Apache Hadoop, Java (JDK 1.6), Oracle, MySQL, Hive, Pig, Sqoop, Linux, CentOS, JUnit, MRUnit, HBase.
Confidential, CA
Hadoop Developer
Responsibilities:
- Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
- Loaded home mortgage data from the existing DWH tables (SQL Server) to HDFS using Sqoop.
- Wrote Hive queries to provide a consolidated view of the mortgage and retail data.
- Loaded data back to Teradata for Basel reporting and for business users to analyze and visualize using Datameer.
- Orchestrated hundreds of Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Loaded load-ready files from mainframes into Hadoop, converting the files to ASCII format.
- Developed Pig scripts to replace the existing legacy home-loans process on Hadoop, with the data fed back to legacy retail mainframe systems.
- Developed MapReduce programs to write data with headers and footers, and shell scripts to convert the data to the fixed-length format required for mainframe CICS consumption (see the formatting sketch at the end of this list).
- Used Maven for continuous build integration and deployment.
- Agile methodology was used for development using XP Practices (TDD, Continuous Integration).
- Participated in daily scrum meetings and iterative development.
- Exposure to burn-up, burn-down charts, dashboards, velocity reporting of sprint and release progress.
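A minimal sketch of the fixed-length conversion described above; the field widths, truncation rule, and class name are hypothetical, since a real job would take them from the mainframe copybook.

```java
// Illustrative helper: pads delimited fields to fixed widths so the
// output lines match a hypothetical mainframe CICS record layout.
public final class FixedLengthFormatter {
    // Assumed field widths; the actual copybook would dictate these.
    private static final int[] WIDTHS = {10, 8, 30};

    public static String format(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < WIDTHS.length; i++) {
            String f = i < fields.length ? fields[i] : "";
            if (f.length() > WIDTHS[i]) {
                f = f.substring(0, WIDTHS[i]); // truncate overlong fields
            }
            // %-Ns left-justifies and space-pads the field to width N.
            sb.append(String.format("%-" + WIDTHS[i] + "s", f));
        }
        return sb.toString();
    }
}
```

In practice this kind of logic would sit in the reducer or a shell post-processing step, so that headers, detail records, and footers all come out at the agreed record length.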
Environment: Java 1.3, EJB, JavaScript, HTML, XML, Rational Rose, Microsoft Visio, Swing, JSP, Servlets, JNDI, JDBC, SQL, Oracle 8i, Tomcat 3.1.
Software Engineer
Responsibilities:
- Assisted in designing the application using the MVC design pattern.
- Developed front-end user interface modules by using HTML, XML, Java AWT, and Swing.
- Carried out front-end validation of user requests using JavaScript.
- Designed and developed the interacting JSPs and Servlets for modules such as User Authentication and Summary Display.
- Designed and developed Entity/Session EJB components for the primary modules.
- Used JavaMail to notify the user of the status and completion of the request.
- Developed Stored Procedures on Oracle 8i.
- Implemented queries using SQL, along with database triggers and functions.
- Used JDBC to interface the web-tier components on the J2EE server with the relational database (see the sketch at the end of this list).
- Gathered project requirements and was involved in the analysis phase.
- Worked on minor enhancements using core Java.
- Involved in writing SQL queries.
- Used stored procedures, triggers, cursors, packages, and anonymous PL/SQL blocks to store, retrieve, delete, and update database tables.
- Used technologies like JDBC for accessing related data from the database.
- Created UML class and sequence diagrams using Rational Rose.
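A minimal sketch of the JDBC access pattern described above; the connection URL, credentials, table, and class name are placeholders, and the code is written with modern try-with-resources for brevity rather than the Java 1.3-era idioms used at the time.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative DAO: looks up a request status for a user over JDBC.
public class RequestStatusDao {
    public String findStatus(String userId) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@dbhost:1521:orcl", "appuser", "secret");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT status FROM requests WHERE user_id = ?")) {
            ps.setString(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("status") : null;
            }
        }
    }
}
```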
Environment: Java, Oracle, PL/SQL.