Hadoop Developer Resume
El Segundo, CA
SUMMARY
- Around 8 years of experience in the design and development of various projects based on Java and J2EE, including 4+ years of experience in Big Data.
- Excellent understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Strong experience in ETL testing, Hadoop testing, DB testing, and manual testing.
- Hands-on experience in writing MapReduce jobs in Java, Pig, and Hive.
- Good working experience using Sqoop to import data into HDFS from relational data stores.
- Experience with the Hortonworks and Cloudera distributions.
- Expertise in job scheduling and monitoring using Oozie.
- Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Good experience extracting data and generating statistical analysis with the business intelligence tool Tableau for better analysis of data.
- Experience in developing client-side web applications using HTML, CSS/CSS3, JSP, jQuery, AngularJS, JSON, AJAX, and custom tags, implementing client-side validations with JavaScript (Ext JS) and server-side validations with the Struts validation framework.
- Hands-on experience with message brokers such as Apache Kafka, IBM WebSphere, and RabbitMQ.
- Experience migrating MapReduce programs into Spark transformations using Spark and Scala (a minimal illustrative sketch follows this summary).
- Good understanding of object-oriented analysis and design, the Collections framework, and UML notation.
- Strong in Unix and Linux development.
- Knowledge of developing Hadoop streaming Map/Reduce jobs using Python.
- Good knowledge of and experience with ETL tools such as Talend.
- Strong development experience in message-oriented and service-oriented technologies such as WSDL/SOAP web services (SOA) and RESTful APIs.
- Proficiency in database programming with DB2, Oracle, SQL Server, and MySQL: creating stored procedures, triggers, indexes, functions, views, and joins, and writing complex SQL queries to analyze data.
- Experience using the Maven and Ant build tools.
- Extensively worked in Agile, TDD, and Scrum development methodologies as well as the waterfall model of the software development life cycle (SDLC).
- Knowledge of IBM BigInsights.
- An excellent team worker with the ability to take the lead where necessary.
- Self-motivated, delivery focused with the ability to work independently where required.
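The following is a minimal, illustrative sketch of the MapReduce-to-Spark migration mentioned above: a classic word-count job re-expressed as Spark RDD transformations in Scala. It is not taken from any specific project; the HDFS paths and object name are hypothetical placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: a word-count MapReduce job re-expressed as Spark transformations.
// Paths and the application name are hypothetical placeholders.
object WordCountMigration {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wordcount-migration"))

    val counts = sc.textFile("hdfs:///data/input/logs")      // mapper input
      .flatMap(_.split("\\s+"))                               // map: split each line into tokens
      .map(word => (word, 1))                                 // map: emit (key, 1) pairs
      .reduceByKey(_ + _)                                     // reduce: sum the counts per key

    counts.saveAsTextFile("hdfs:///data/output/wordcounts")   // reducer-style output
    sc.stop()
  }
}
```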
TECHNICAL SKILLS
Hadoop Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Kafka, ZooKeeper, Oozie, Spark, Scala, YARN, Hue
Hadoop Distributions: Apache, Cloudera, Hortonworks
NoSQL Databases: Cassandra, MongoDB, and HBase
Languages: Core Java, Python, T-SQL/PL-SQL
Web Technologies: HTML, CSS, XML, AJAX, DOM parser, SAX parser
Scripting languages: Shell scripting, JavaScript
Databases: SQL Server 2000/2005/2008, Oracle 8/9i/10g, and MySQL
Web/App Servers: IIS, Tomcat, and WebLogic Server
Operating Systems: Windows XP/2000/Vista/7, Unix, and Linux
Tools: Maven, Ant, XMLSpy, Log4j, Lotus Notes, SharePoint, Remedy, PVCS, ClearCase, Tortoise, Jira, SoapUI, PuTTY
PROFESSIONAL EXPERIENCE
Confidential, El Segundo, CA
Hadoop Developer
Responsibilities:
- Developed Spark scripts in Scala that build custom RDDs for data transformations and perform actions on those RDDs (an illustrative sketch follows the Environment line below).
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and Pair RDDs.
- Used Hive to form an abstraction on top of structured data residing in HDFS and implemented partitions, dynamic partitions, and buckets on Hive tables.
- Moved data from HDFS to RDBMS and vice versa using Sqoop.
- Analyzed and transformed data with Hive and Pig.
- Wrote HiveQL scripts to create, load, and query Hive tables.
- Performed ETL development using Kafka, Flume, and Sqoop.
- Handled file system management and monitoring; managed and reviewed Hadoop log files.
- Extended Pig and Hive core functionality by writing custom UDFs.
- Worked extensively with SQL, PL/SQL, and Oracle.
- Used SequenceFile, RC, ORC, and Avro file formats along with compression techniques.
- Performed data completeness, correctness, transformation, and data quality testing using available tools and techniques.
- Worked with NoSQL databases such as HBase, Cassandra, and MongoDB.
- Used UNIX shell scripting and applied a good understanding of OOP and data structures.
- Worked with Tableau to build customized interactive reports, worksheets, and dashboards.
- Implemented Kerberos for strong authentication to provide data security.
- Applied comprehensive knowledge and experience in process improvement, normalization/denormalization, data extraction, data cleansing, and data manipulation in Hive.
Environment: Hadoop, HDFS, Hive, Impala, Spark, Scala, Kafka, Informatica, shell scripting, UNIX, Teradata, Oracle, HBase, Sqoop, Pig, Hue, Oozie, Avro, TWS, JSON, SerDe, CDH 5.4, Tez, ZooKeeper, RDBMS, GitHub, Code Cloud, YARN.
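The Spark bullet above refers to work along the lines of the following minimal sketch: custom RDD transformations and a Pair RDD aggregation in Scala, followed by loading a dynamically partitioned Hive table through Spark SQL. It assumes a Spark 2.x SparkSession with Hive support (on a Spark 1.x / CDH 5.4 stack the entry point would be HiveContext instead); all paths, table names, and columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object TransactionEtl {
  def main(args: Array[String]): Unit = {
    // Hypothetical session; enableHiveSupport lets Spark SQL read and write Hive tables.
    val spark = SparkSession.builder()
      .appName("transaction-etl")
      .enableHiveSupport()
      .getOrCreate()

    // RDD transformations over raw pipe-delimited records (path and layout are placeholders).
    val parsed = spark.sparkContext.textFile("hdfs:///data/raw/transactions")
      .map(_.split('|'))
      .filter(_.length == 4)
      .map(f => (f(0), f(1), f(2).toDouble, f(3)))   // (acct_id, txn_type, amount, txn_date)

    // Pair RDD action: total amount per account, materialized with take().
    parsed.map { case (acct, _, amount, _) => (acct, amount) }
      .reduceByKey(_ + _)
      .take(10)
      .foreach(println)

    // Hand the same data to Spark SQL and load a dynamically partitioned Hive table.
    import spark.implicits._
    parsed.toDF("acct_id", "txn_type", "amount", "txn_date").createOrReplaceTempView("txn_stage")
    spark.sql("""CREATE TABLE IF NOT EXISTS txn_by_date (acct_id STRING, txn_type STRING, amount DOUBLE)
                 PARTITIONED BY (txn_date STRING)""")
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""INSERT OVERWRITE TABLE txn_by_date PARTITION (txn_date)
                 SELECT acct_id, txn_type, amount, txn_date FROM txn_stage""")

    spark.stop()
  }
}
```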
Confidential, Providence, RI
Hadoop Developer
Responsibilities:
- ECV for the history data: the historical data will be served from the Teradata data warehouse, and the customer information may be sourced from mainframe tables. The high-level approach is to bring the data from Teradata and DB2 into Hadoop and prepare the ECV.
- Daily delta aggregation with history: the approach for delta processing is to consume the raw source file and aggregate it with the ECV layer.
- Implemented complex scoring of bank accounts utilizing the MapReduce paradigm and the Hadoop ecosystem.
- Developed ETL test scripts based on technical specifications, data design documents, and source-to-target mappings.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Developed a data pipeline using Kafka and Spark to store data in HDFS (an illustrative sketch follows at the end of this role's responsibilities).
- Involved in the design and development phases of the Software Development Life Cycle (SDLC) using the Scrum methodology.
- Used Sqoop to import data from Oracle into the Hadoop Distributed File System (HDFS) and then export the results back into Oracle.
- Worked on the Cloudera platform to analyze data stored in HDFS.
- Performed various optimizations such as using the distributed cache for small datasets and partitioning and bucketing in Hive.
- Created Hive tables and applied HiveQL on those tables for data validation.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability.
- Preprocessed files using Spark and Scala.
- Monitored and managed daily jobs, processing around 200k files per day and tracking them through RabbitMQ and an Apache dashboard application.
- Assisted application teams in installing Hadoop updates, operating system patches, and version upgrades when required.
- Developed Hive UDFs for customized functionality.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components such as Hive and HBase.
- Created supporting documents for the processes involved in the project.
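The Kafka-and-Spark pipeline bullet above is illustrated by the minimal sketch below: a Spark Structured Streaming job in Scala that reads a Kafka topic and lands the records on HDFS as Parquet. It assumes the spark-sql-kafka connector is on the classpath; the broker address, topic, and HDFS paths are placeholders rather than the actual project configuration.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

    // Read the raw event stream from Kafka (broker and topic names are placeholders).
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "claims-events")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(key AS STRING) AS event_key", "CAST(value AS STRING) AS payload")

    // Land the stream on HDFS as Parquet; the checkpoint directory provides fault tolerance.
    events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/landing/claims")
      .option("checkpointLocation", "hdfs:///checkpoints/claims")
      .start()
      .awaitTermination()
  }
}
```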
Confidential, Hartford, CT
ETL/ Hadoop Developer
Responsibilities:
- Developed Hadoop MapReduce programs.
- Moved all log information into HDFS.
- Retrieved data from Hive using HiveQL.
- Grouped related insurance records by analyzing the messages.
- Wrote MapReduce code to convert semi-structured data to structured data.
- Developed a framework that creates external and managed tables in batch processing based on the metadata files.
- Wrote multiple MapReduce programs in Scala to clean data from multiple file formats.
- Developed and maintained ETL processes to move data between Oracle and Cloudera HDFS/Hive.
- Worked with different data formats such as Avro, JSON, XML, Parquet, and CSV.
- Created ETL mappings with Talend Integration Suite to pull data from the source, apply transformations, and load data into the target database.
- Validated the ETL load process to make sure the target tables are populated according to the data mapping provided and satisfy the transformation rules.
- Created Talend jobs to populate the data into dimension and fact tables.
- Designed and developed a solution to speed up a SQL job using the Hadoop MapReduce framework, reducing processing time from 12 hours to 20 minutes.
- Wrote extensive Hive queries and fine-tuned them for performance as part of the multi-step process to produce the results Tableau needs to generate reports.
- Migrated various Hive UDFs and queries into Spark SQL for faster requests as part of a POC implementation (an illustrative sketch follows the Environment line below).
- Created and implemented a highly scalable and reliable distributed data design using NoSQL/Cassandra technology.
- Set up Sqoop jobs to import data from various sources into various other systems.
- Migrated data from the development cluster to the QA cluster and from there to the production cluster.
- Researched various NoSQL solutions and algorithms extensively.
- Created developer unit test plans and executed testing in the development cluster.
Environment: JDK 1.5, Struts 1.3, JSP, Agile, ETL, Crunch API, Apache Solr, HTML, XML, JavaScript, AJAX, Talend, Hortonworks Hadoop distribution, Scala, Cloudera, Avro, Shell, Linux, Pig, Hive (HQL), Cassandra, MapReduce, Spark, HBase, Sqoop, Oozie, Ganglia, Flume, NoSQL.
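The Hive-to-Spark-SQL migration bullet above is along the lines of this minimal sketch: a former Hive UDF re-registered as a Spark SQL UDF and the original HiveQL run unchanged against the metastore. The UDF logic, database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sql")
      .enableHiveSupport()          // use the existing Hive metastore tables
      .getOrCreate()

    // Re-register a former Hive UDF as a Spark SQL UDF (name and logic are illustrative).
    spark.udf.register("normalize_policy_id", (raw: String) =>
      Option(raw).map(_.trim.toUpperCase.replaceAll("[^A-Z0-9]", "")).orNull)

    // The HiveQL itself runs unchanged through Spark SQL.
    val premiums = spark.sql("""
      SELECT normalize_policy_id(policy_id) AS policy_id,
             SUM(premium_amount)            AS total_premium
      FROM   insurance.policy_premiums
      GROUP  BY normalize_policy_id(policy_id)""")

    // Write the aggregate back for downstream reporting (path is a placeholder).
    premiums.write.mode("overwrite").parquet("hdfs:///data/marts/policy_premiums_agg")

    spark.stop()
  }
}
```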
Confidential, Atlanta, GA
Java Developer
Responsibilities:
- Worked with the Gradle build tool to build the WAR file for Customer Gateway.
- Used JBoss EAP 6.0.1 to deploy and configure the CG application.
- Used a Confluence repository to store Customer Gateway documents and files.
- Worked on the REST API; the Customer Gateway application uses HTTP basic authentication for its entire set of APIs.
- Migrated existing functionality that connected to the database using JPA so that it accesses data through REST calls.
- Worked with JDBC to create and close database connections.
- Worked on a REST client that uses an HTTP client.
- Used application/xml so the API supports both inbound and outbound data bindings.
- Worked on SOAP 1.2 web services to consume and produce messages for external systems that use Confidential communication.
- Worked with the Collections framework (Map/List) to set and get the query parameters (CG).
- Worked with Oracle 10g for storing and retrieving data from the database.
- Worked on UNIX machines to deploy and configure the JBoss EAP server for the CG application.
- Worked on server tuning to increase the heap size on UNIX machines.
- Coordinated an offshore team to implement the Customer Gateway design/architecture.
- Worked with PL/SQL to query and fetch data from the database.
Environment: Core Java, J2EE, Log4j, JUnit, Git, SOA, Web Services, JBoss EAP 6.0.1, Oracle 10g, UNIX, Hibernate, REST client, PL/SQL.
Confidential
Java Developer
Responsibilities:
- Designed use cases, activities, states, objects, and components.
- Developed the UI pages using HTML, DHTML, JavaScript, AJAX, jQuery, JSP, and tag libraries.
- Developed front-end screens using JSP and Tag Libraries.
- Performed validations between various users.
- Designed Java servlets and objects using J2EE standards.
- Coded HTML, JSP and Servlets.
- Developed an internal application using Angular and Node.js connecting to Oracle on the back end.
- Coded XML validation and file segmentation classes for splitting large XML files into smaller segments using a SAX parser.
- Created new connections through application code for better access to the DB2 database and wrote SQL and PL/SQL: stored procedures, functions, sequences, triggers, cursors, object types, etc.
- Implemented application using Struts MVC framework for maintainability.
- Involved in testing and deploying in the development server.
- Prepared design documents for Java components.
- Wrote Oracle stored procedures (PL/SQL) and called them using JDBC.
- Involved in designing the database tables in Oracle.
Environment: Java, J2EE, Apache Tomcat, CVS, JSP, Servlets, Struts, PL/SQL, Oracle.