
Sr. Hadoop Developer Resume


Houston, TX

PROFESSIONAL SUMMARY:

  • Overall 7+ years of IT experience across a variety of industries, including hands-on experience in Big Data analytics and development.
  • Experience in Big Data analytics with hands-on experience in data extraction, transformation, loading, analysis, and visualization on the Cloudera platform using MapReduce, HDFS, Hive, Pig, Sqoop, Flume, HBase, Oozie, YARN, Impala, Spark, Scala, and Kafka.
  • Experience working with the Cloudera and Hortonworks distributions of Hadoop. Extensive knowledge of ZooKeeper for various types of centralized configuration.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Developed ETL test scripts based on technical specifications/data design documents and source-to-target mappings.
  • Good experience importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Involved in creating Hive tables, loading them with data, and writing ad hoc Hive queries that run internally as MapReduce jobs.
  • Experienced in using HBase, Pig, HDFS, and Oozie operational services for coordinating the cluster, scheduling workflows, and time-based data processing.
  • Experienced in working with different data sources such as flat files, spreadsheets, log files, and databases.
  • Experienced in analyzing, designing, and developing ETL strategies and processes, and writing ETL specifications.
  • Experience in writing Spark programs in Scala for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Expertise in writing Spark jobs in Scala for processing large sets of structured and semi-structured data and storing them in HDFS.
  • Experience in converting SQL queries into Spark transformations using Spark RDDs and Scala, including map-side joins on RDDs (see the sketch after this list). Experience in creating real-time data streaming solutions using Apache Spark Streaming.
  • Collected JSON data from an HTTP source and developed Spark APIs to insert and update records in Hive tables. Developed Kafka consumers in Scala for consuming data from Kafka topics.
  • Experienced with batch processing of data sources using Apache Spark and Scala.
  • Hands-on experience working with NoSQL databases, including HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
  • Experience developing Scala applications for loading/streaming data from NoSQL databases (HBase) into HDFS.
  • Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, JSON, and Avro.
  • Extended Hive and Pig core functionality with custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregating Functions (UDAFs). Experience working with build tools like Maven and Ant.
  • Experienced in both Waterfall and Agile (Scrum) development methodologies.
  • Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
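A minimal sketch of the SQL-to-Spark conversion and map-side join described above, using a broadcast variable for the small side of the join; the HDFS paths, file layouts, and query are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    object MapSideJoinSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("map-side-join-sketch"))

        // Hypothetical small lookup file of (customer_id, region) pairs.
        val regions = sc.textFile("hdfs:///lookup/regions.csv")
          .map(_.split(","))
          .map(a => (a(0), a(1)))
          .collectAsMap()

        // Broadcasting the small side makes this a map-side join:
        // the large RDD is never shuffled for the join itself.
        val regionsBc = sc.broadcast(regions)

        // RDD equivalent of:
        //   SELECT r.region, SUM(t.amount)
        //   FROM txns t JOIN regions r ON t.customer_id = r.customer_id
        //   GROUP BY r.region
        val totals = sc.textFile("hdfs:///data/txns.csv") // hypothetical (customer_id, amount) rows
          .map(_.split(","))
          .map(a => (regionsBc.value.getOrElse(a(0), "unknown"), a(1).toDouble))
          .reduceByKey(_ + _)

        totals.saveAsTextFile("hdfs:///out/region_totals")
        sc.stop()
      }
    }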

TECHNICAL SKILLS:

Hadoop Technologies: Apache Hadoop, Cloudera Hadoop Distribution (HDFS and MapReduce)

Technologies: HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, Zookeeper, and Oozie

Java/J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts.

NoSQL Databases: HBase, MongoDB

Programming Languages: Java, Scala, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting, JavaScript

Application Servers: WebLogic, WebSphere.

Build Tools: Jenkins, Maven, ANT

Databases: MySQL, Oracle, DB2

Business Intelligence Tools: Splunk, Talend

Development Tools: Microsoft SQL Studio, Eclipse, NetBeans.

PROFESSIONAL EXPERIENCE:

Confidential, Houston, TX

Sr. Hadoop Developer

Responsibilities:

  • Worked on importing data from MySQL into HDFS, and exporting it back, using Sqoop.
  • Implemented MapReduce jobs through Hive by querying the available data.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
  • Wrote Hive and Pig scripts per requirements.
  • Designed and created the high-level design document, covering logical data flows, the source data extraction process, database staging, extract creation, source archival, job scheduling, and error handling.
  • Created mappings to pull data from source systems, apply transformations, and load data into the target database.
  • Developed reusable transformations such as source qualifier, expression, connected and unconnected lookup, router, aggregator, filter, sequence generator, update strategy, normalizer, joiner, and rank in PowerCenter Designer.
  • Responsible for the design and development of Spark and SQL scripts based on functional specifications.
  • Responsible for Spark Streaming configuration based on the type of input.
  • Responsible for loading data from UNIX file systems into HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
  • Experience with NoSQL databases such as HBase and MongoDB. Followed agile methodology for the entire project.
  • Involved in cluster maintenance and monitoring. Experience in performance troubleshooting.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Created an e-mail notification service that alerts the requesting team upon job completion.
  • Configured and tuned the environment and batch jobs to ensure optimum performance and 99.99% availability.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python (see the sketch after this list).
  • Involved in gathering requirements, design, development, and testing.
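A minimal sketch of the Hive-to-Spark conversion above: the query is read through a HiveContext (backed by the MySQL metastore) and the HiveQL aggregate is re-expressed as RDD transformations. The table name and schema are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hive-to-spark-sketch"))
        val hiveCtx = new HiveContext(sc)

        // Hypothetical Hive table with (customer_id STRING, amount DOUBLE).
        val orders = hiveCtx.sql("SELECT customer_id, amount FROM orders")

        // The HiveQL aggregate
        //   SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id
        // expressed as RDD transformations instead.
        val totals = orders.rdd
          .map(row => (row.getString(0), row.getDouble(1)))
          .reduceByKey(_ + _)

        totals.take(10).foreach(println)
        sc.stop()
      }
    }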

Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Spark, Flume, HBase, Oozie, Java, SQL scripting, Linux shell scripting, Eclipse.

Confidential, San Jose, CA

Hadoop/Talend Developer

Responsibilities:

  • Implemented a Hadoop cluster on Cloudera and assisted with performance tuning, monitoring, and troubleshooting.
  • Installed and configured MapReduce, Hive, and HDFS.
  • Created, altered, and deleted topics (Kafka queues) as required. Performed performance tuning using partitioning and bucketing of Impala tables.
  • Created MapReduce programs for refined queries on big data.
  • Created RDDs in Spark and extracted data from the data warehouse onto Spark RDDs.
  • Configured various property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on the job requirements. Extensive experience in writing HDFS and Pig Latin commands.
  • Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment.
  • Involved in the development of Pig UDFs to pre-process and analyze the data.
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
  • Developed complex queries using Hive and Impala.
  • Developed complex Talend job mappings to load data from various sources using different components. Designed, developed, and implemented solutions using Talend Integration Suite. Partitioned data streams using Kafka.
  • Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used the Kafka producer 0.8.3 APIs to produce messages.
  • Created HBase tables to store various formats of PII data coming from different portfolios. Processed the data using Spark.
  • Parsed high-level design specifications into simple ETL coding and mapping standards.
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.
  • Used Avro and Parquet file formats for data serialization.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
  • Involved in setting up HBase to use HDFS. Involved in creating Hive tables, loading data, and writing Hive queries. Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created HBase tables to store various formats of incoming data from different portfolios.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Used Hive partitioning and bucketing for performance optimization of the Hive tables and created around 20,000 partitions. Imported and exported data into HDFS and Hive using Sqoop.
  • Consumed data from Kafka queues using Spark (see the sketch after this list). Configured different topologies for the Spark cluster and deployed them on a regular basis.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Involved in loading data from the Linux file system into HDFS.
  • Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
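A minimal sketch of the Kafka consumption above, using Spark Streaming's direct-stream API from the spark-streaming-kafka integration for Kafka 0.8; the broker list and topic name are hypothetical:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object KafkaConsumeSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("kafka-consume-sketch"), Seconds(10))

        // Hypothetical broker list and topic.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("claims"))

        // Count messages per 10-second batch; a real job would parse and persist them.
        stream.map(_._2).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }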

Environment: Hadoop, Talend, MapReduce, HBase, Hive, Impala, Pig, Sqoop, HDFS, Flume, Oozie, Spark, Spark SQL, Spark Streaming, Redshift, Scala, Kafka, and Cloudera.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop components.
  • Used Apache Maven to build and configure the application for the MapReduce jobs.
  • Developed a custom file system plug-in for Hadoop to access files on the data platform, giving Hadoop MapReduce programs, HBase, Pig, and Hive direct access to those files.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
  • Responsible for building scalable distributed data solutions using Hadoop and developing Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Experience with NoSQL data stores: HBase, Cassandra, and MongoDB.
  • Extracted and loaded data into the data lake environment using Sqoop, where it was accessed by business users and data scientists.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Handled high volumes of data in which groups of transactions were collected over a period, using batch data processing.
  • Performed transformations such as event joins, bot traffic filtering, and pre-aggregations using Pig.
  • Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
  • Loaded data into HDFS, extracting the data from Oracle using Sqoop.
  • Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
  • Installed and configured Cloudera Manager for easy management of the existing Hadoop cluster.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Responsible for managing and reviewing Hadoop log files. Designed and developed data management using MySQL. Wrote Python scripts to parse XML documents and load the data into the database.
  • Executed Hive queries on Parquet tables to perform data analysis and meet the business requirements.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins (a partitioning sketch follows this list).
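A minimal sketch of the Hive partitioning pattern referenced above, issued here through a Spark HiveContext (the same statements also run in the Hive CLI); the table names, columns, and partition key are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HivePartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hive-partition-sketch"))
        val hiveCtx = new HiveContext(sc)

        // Hypothetical Parquet table partitioned by load date, the layout
        // that lets queries prune partitions instead of scanning everything.
        hiveCtx.sql(
          """CREATE TABLE IF NOT EXISTS claims (
            |  claim_id STRING, customer_id STRING, amount DOUBLE)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Dynamic partition insert from a hypothetical staging table: Hive
        // routes each row to the partition named by its load_date value.
        hiveCtx.sql("SET hive.exec.dynamic.partition=true")
        hiveCtx.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        hiveCtx.sql(
          """INSERT OVERWRITE TABLE claims PARTITION (load_date)
            |SELECT claim_id, customer_id, amount, load_date FROM claims_staging""".stripMargin)

        sc.stop()
      }
    }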

Environment: Apache Hadoop, HDFS, Hive, PIG, UNIX, SQL, Java, MapReduce, HBase, Sqoop, Oozie, Linux, Data Pipeline, Cloudera Hadoop Distribution, Python, MySQL, Git, MapR-DB

Confidential, Portland, OR

Java/J2EE Developer

Responsibilities:

  • Developed Web module using Spring MVC, JSP.
  • Played an active role in the analysis, design, implementation, and deployment phases across the full Software Development Life Cycle (SDLC) of the project.
  • Designed and developed user interface using JSP, HTML and JavaScript.
  • Developed Struts action classes and action forms, performed action mapping using the Struts framework, and performed data validation in form beans and action classes.
  • Extensively used the Struts framework as the controller to handle subsequent client requests and invoke the model based on user requests. Used DAO and JDBC for database access.
  • Defined the search criteria, pulled the customer's record from the database, made the required changes, and saved the updated record back to the database.
  • Validated the fields of the user registration and login screens by writing JavaScript validations.
  • Developed build and deployment scripts using Apache ANT to customize WAR and EAR Files.
  • Developed stored procedures and triggers using PL/SQL to calculate and update the tables implementing business logic. Involved in bug fixing and in unit testing using JUnit.
  • Designed and developed XML processing components for dynamic menus in the application.
  • Involved in post-production support and maintenance of the application.
  • Wrote the technical design document and gathered specifications from the requirements.
  • Developed the application using the Spring MVC architecture. Developed JSP custom tags to support custom user interfaces. Developed core Java classes for utilities, business logic, and test cases.
  • Developed stored procedures, triggers, views, and cursors using SQL Server 2005.
  • Used stored procedures for performing different database operations.
  • Developed control classes for processing requests and used exception handling to manage error conditions.
  • Designed sequence diagrams and use case diagrams for proper implementation.

Environment: Java, JSP, Struts, HTML, CSS, JavaScript, SQL, Spring, Exception Handling, UML, JUnit, Tomcat 6.

Confidential

Java /J2EE Developer

Responsibilities:

  • Developed custom tags and JSTL to support custom user interfaces.
  • Designed the user interfaces using JSP.
  • Designed and implemented the MVC architecture using the Struts framework; coding involved writing action classes, custom tag libraries, and JSPs.
  • Experienced in MS SQL Server 2005, writing stored procedures, SSIS packages, functions, triggers, and views. Used JUnit for testing and checking API performance.
  • Developed action forms and controllers in the Struts 1.2 framework. Utilized various Struts features such as Tiles, tag libraries, and declarative exception handling via XML for the design.
  • Involved in writing the business layer using EJB, BO, DAO, and VO.
  • Implemented business processes such as user authentication and account transfer using session EJBs.
  • Worked with Oracle Database to create tables, procedures, functions and select statements.
  • Used Log4j to capture logs, including runtime exceptions, and developed a WAR framework to alert the client and production support in case of application failures.
  • Developed the DAOs using SQL and DataSource objects. Developed stored procedures, triggers, views, and cursors using SQL Server 2005.
  • Development was carried out in the Eclipse Integrated Development Environment (IDE).
  • Used JBoss for deploying various components of the application. Used Ant for build scripts.

Environment: Java 1.6, J2EE, Struts, HTML, CSS, JavaScript, JDBC, SQL Server 2005, ANT, Log4j, JUnit, XML, JSP, JSTL, AJAX, JBoss, ClearCase.

Confidential

Java /J2EE Developer

Responsibilities:

  • Used the Struts framework to implement the MVC architecture for an Asset Liability Management (ALM) application, writing various Struts action classes to implement the business logic.
  • Developed packages to validate data from flat files and insert it into various tables in the Oracle database.
  • Developed messaging with the JMS API from the J2EE package. Made use of JavaScript for client-side validation.
  • Used the JMS API for asynchronous communication by placing messages on a message queue (see the sketch after this list).
  • Wrote JavaScript, HTML, DHTML, CSS, servlets, and JSPs to design the application's GUI.
  • Used display tags in the presentation layer for a better look and feel of the web pages.
  • Wrote code for advanced topics such as Java I/O, serialization, and multithreading.
  • Provided UNIX scripting to drive automatic generation of static web pages with dynamic news content.
  • Involved in the design of JSPs and servlets for navigation among the modules.
  • Participated in requirements analysis to identify the various inputs and their scenarios for the associated fields in the ALM module forms.
  • Involved in developing PL/SQL procedures, functions, triggers, and packages to provide backend security and data consistency.
  • Involved in interacting with the Business Analyst during the Sprint Planning Sessions.
  • Involved in the design of the project using UML Use-Case Diagrams, Sequence Diagrams, Object diagrams and Class Diagrams.
  • Assisted design and development teams in identifying DB objects and their associated fields for the ALM module forms. Responsible for code reviews and debugging.
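A minimal sketch of the queue-based asynchronous messaging above, written in Scala for consistency with the other examples in this document (the original work used Java); the JNDI names are hypothetical and come from the application server configuration:

    import javax.jms.{ConnectionFactory, Queue, Session, TextMessage}
    import javax.naming.InitialContext

    object JmsQueueSendSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical JNDI names configured on the application server.
        val ctx = new InitialContext()
        val factory = ctx.lookup("jms/ConnectionFactory").asInstanceOf[ConnectionFactory]
        val queue = ctx.lookup("jms/AlmQueue").asInstanceOf[Queue]

        val conn = factory.createConnection()
        val session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE)
        val producer = session.createProducer(queue)

        // Asynchronous in the messaging sense: the sender returns once the
        // message is on the queue; the consumer processes it independently.
        val msg: TextMessage = session.createTextMessage("position update")
        producer.send(msg)

        session.close()
        conn.close()
      }
    }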

Environment: Java, J2EE, UML, Struts, HTML, CSS, JavaScript, JMS, JSP, Oracle 9i, PL/SQL, MS Access, UNIX Shell Scripting.
