
Sr. Hadoop Developer Resume


SUMMARY:

  • 7+ years of IT experience, adept at designing, implementing, and maintaining solutions on the Big Data ecosystem.
  • Adept at implementing end-to-end (E2E) solutions on Big Data using the Hadoop framework; designed and executed big data solutions on multiple distributions such as Cloudera (CDH3 & CDH4) and Hortonworks.
  • Highly confident and skilled professional with around 4+ years of hands-on expertise in Big Data processing using Hadoop and the Hadoop ecosystem (MapReduce, Pig, Spark, Scala, Hive, Sqoop, Flume, HBase, Cassandra, MongoDB, Akka framework) for implementation, maintenance, ETL, and Big Data analysis operations.
  • Exposure to different big data distributions such as Cloudera, Hortonworks, and Apache.
  • Experience in Apache Spark, Spark Streaming, Spark SQL, and NoSQL databases such as Cassandra and HBase.
  • Strong knowledge of writing Hive UDFs and GenericUDFs to incorporate complex business logic into Hive queries (a minimal sketch follows this summary).
  • Experienced in optimizing Hive queries by tuning configuration parameters.
  • Involved in designing the data model in Hive for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.
  • Experience in developing a data pipeline using Kafka to store data into HDFS.
  • Exposure to migration from data warehouses to the Hadoop ecosystem.
  • Experience in NoSQL databases like HBase, Cassandra, and MongoDB.
  • Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and Teradata, and vice versa.
  • Skilled in creating workflows using Oozie for Autosys jobs.
  • Hands on experience with message brokers such as Apache Kafka.
  • Hands on experience in designing ETL operations including data extraction, data cleansing, data transformations, data loading.
  • Setup/Manag
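
A minimal sketch of the kind of Hive UDF mentioned in the summary, written here in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API. The class name NormalizeStatus and its trim/upper-case logic are illustrative assumptions, not the original business logic.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical Hive UDF: normalizes a free-form status string before it is used in queries.
class NormalizeStatus extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
}
```

Packaged into a JAR and added to the Hive session, a function like this would typically be registered with CREATE TEMPORARY FUNCTION and then called inline in HiveQL.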

EXPERIENCE:

Confidential

Sr. Hadoop Developer

Responsibilities:

  • Performed transformations using Hive and MapReduce; hands-on experience copying .log and snappy files into HDFS from Greenplum using Flume and Kafka, loading data into HDFS, and extracting data into HDFS from MySQL using Sqoop.
  • Designed and implemented a test environment on AWS. Designed AWS CloudFormation templates to create VPCs, subnets, and NAT to ensure successful deployment of web applications and database templates. Created S3 buckets, managed S3 bucket policies, and utilized S3 and Glacier for storage and backup on AWS. Acted as technical liaison between the customer and the team on all AWS technical aspects.
  • Involved in preparing the S2TM document per the business requirements and worked with source-system SMEs to understand the source data behavior.
  • Imported required tables from RDBMS to HDFS using Sqoop and used Storm/Spark Streaming and Kafka to get real-time streaming of data into HBase (see the sketch following this list).
  • Experience writing MapReduce jobs for text mining; worked with the predictive analysis team and with Hadoop components such as HBase, Spark, YARN, Kafka, Zookeeper, Pig, Hive, Sqoop, Oozie, Impala, and Flume.
  • Wrote Hive UDFs per requirements and to handle different schemas and XML data. Designed and developed MapReduce jobs to process data arriving in different file formats such as XML, CSV, and JSON. Involved in Apache Spark testing.
  • Implemented ETL code to load data from multiple sources into HDFS using Pig scripts. Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log files.
  • Responsible for reviewing test cases in HP ALM.
  • Developed Spark applications using Scala for easy Hadoop transitions; hands-on experience writing Spark jobs and using the Spark Streaming API with Scala and Python. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs. Designed and developed User Defined Functions (UDFs) for Hive and developed Pig UDFs to pre-process data for analysis, along with UDAFs for custom data-specific processing.
  • Assisted in problem solving with Big Data technologies for integration of Hive with HBase and of Sqoop with HBase.
  • Designed and developed the core data pipeline code, involving work in Java and Python, built on Kafka.
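
A minimal Scala sketch of the Kafka-to-Hadoop ingestion path described above, using the spark-streaming-kafka-0-10 direct stream and landing each micro-batch in HDFS. The broker address, topic name, group id, and output path are hypothetical, and the HBase sink used in the real pipeline is omitted here.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object LogIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",            // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "log-ingest",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("app-logs"), kafkaParams))

    // Persist each non-empty micro-batch to HDFS, partitioned by batch timestamp.
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/raw/logs/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```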

Confidential

Sr. Hadoop Developer

Responsibilities:

  • Experienced in defining job flows. Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs and data. Created a dynamic end-to-end REST API with the LoopBack Node.js framework. Experienced in managing and reviewing Hadoop log files. Maintained all the services in the Hadoop ecosystem using Zookeeper. Extracted files from RDBMS through Sqoop, placed them in HDFS, and processed them. Experienced in running Hadoop streaming jobs to process terabytes of XML-format data. Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from various sources. Gained good experience with NoSQL databases such as HBase; supported MapReduce programs running on the cluster. Involved in loading data from the UNIX file system to HDFS. Installed and configured Hive and wrote Hive UDFs. Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs. Gained very good business knowledge of health insurance, claim processing, fraud-suspect identification, the appeals process, etc. Developed a custom file system plug-in for Hadoop so it can access files on the Data Platform; this plug-in allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly. Designed and implemented a MapReduce-based large-scale parallel relation-learning system. Wrote programs in Spark using Scala, applying RDD transformations and actions (see the sketch following this list).
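
A minimal sketch, in Scala, of the RDD transformation/action pattern referenced in the last bullet. The HDFS path, delimiter, and the assumption that the fourth column holds a claim status are illustrative, not details of the original job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClaimCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("claim-counts"))

    // Hypothetical claim extracts landed in HDFS by Sqoop.
    val claims = sc.textFile("hdfs:///data/claims/*.csv")

    val countsByStatus = claims
      .map(_.split(","))                      // transformation: parse CSV rows
      .filter(_.length > 3)                   // transformation: drop malformed rows
      .map(fields => (fields(3).trim, 1))     // transformation: key by assumed status column
      .reduceByKey(_ + _)                     // transformation: aggregate per status

    countsByStatus.collect().foreach(println) // action: bring results back to the driver

    sc.stop()
  }
}
```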

Environment: Java (JDK 1.6), Eclipse, Oracle 11g/10g, Red Hat Linux, MapReduce, Node.js, HDFS, Oozie, Hive, Spark, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, Elastic, Flume, Cloudera, UNIX Shell Scripting.

Confidential

Hadoop Developer

Responsibilities:

  • Designed, implemented, and tested clustered multi-tiered e-commerce products. Core technologies used include IIS, SQL Server, ASP, XML/XSLT, JSP, Tomcat, JavaBeans, and Java Servlets. Developed XML Schemas and web services for data maintenance and structures. Implemented the web service client for login authentication, credit reports, and applicant information using the Apache Axis2 web service. Followed Agile methodology (TDD, Scrum) to satisfy the customers and wrote JUnit test cases for unit testing the integration layer.
  • Used Hive to analyse the partitioned and bucketed data and compute various metrics for reporting on the dashboard (see the sketch following this list). Analysed the data by performing Hive queries and running Pig scripts to understand user behaviour. Developed scripts to extract data from MySQL into HDFS. Worked on different file formats such as Sequence files, XML files, and Map files using MapReduce programs.
  • Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management. Loaded the aggregated data onto DB2 for reporting on the dashboard. Continuously monitored and managed the Hadoop cluster using Cloudera Manager. Strong expertise in the MapReduce programming model with XML, JSON, and CSV file formats. Experience in managing and reviewing Hadoop log files. Involved in loading data from the Linux file system to HDFS. Implemented test scripts to support test-driven development and continuous integration. Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive. Worked with the Data Science team to gather requirements for various data mining projects.
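
The Hive analysis in this role ran as HiveQL directly; purely for illustration (and to keep all examples in one language), the sketch below expresses a comparable partitioned/bucketed aggregation through Spark SQL with Hive support. The table and column names (web_events, event_date, user_id, action) and the date filter are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DashboardMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dashboard-metrics")
      .enableHiveSupport()        // read the Hive metastore's partitioned/bucketed tables
      .getOrCreate()

    // Filtering on the partition column (event_date) prunes partitions;
    // bucketing on user_id helps the grouped distinct count.
    val metrics = spark.sql(
      """SELECT event_date, action, COUNT(DISTINCT user_id) AS users
        |FROM web_events
        |WHERE event_date >= '2015-01-01'
        |GROUP BY event_date, action""".stripMargin)

    metrics.show(20)
    spark.stop()
  }
}
```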

Environment: HDFS, Hadoop 2.2.0 (YARN), Flume 1.5.2, Eclipse, SQL Server, MapReduce, Hive 1.1.0, Pig Latin 0.14.0, JavaBeans, SQL, Sqoop 1.4.6, Oozie, CentOS, Zookeeper 3.5.0, and a NoSQL database.

Confidential

Java/ Big Data Developer

Responsibilities:

  • Involved in loading data from the Linux file system to HDFS (see the sketch following this list). Imported and exported data into HDFS and Hive using Sqoop. Worked on processing unstructured data using Pig and Hive. Performed Hadoop streaming jobs to process terabytes of XML-format data. Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS. Extensively used Pig for data cleansing. Implemented SQL and PL/SQL stored procedures. Worked on debugging and performance tuning of Hive and Pig jobs. Implemented test scripts to support test-driven development and continuous integration. Worked on tuning the performance of Pig queries. Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts. Actively involved in code review and bug fixing to improve performance.
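
A minimal sketch of loading files from the local Linux file system into HDFS through the Hadoop FileSystem API (the programmatic equivalent of `hdfs dfs -put`), written in Scala to keep the examples consistent. The source and destination paths are illustrative.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LoadToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()            // picks up core-site.xml from the classpath
    val fs   = FileSystem.get(conf)

    // Hypothetical web server log copied into a raw-data area on HDFS.
    val local = new Path("file:///var/log/webserver/access.log")
    val dest  = new Path("hdfs:///data/raw/weblogs/")

    fs.copyFromLocalFile(false /* keep source */, true /* overwrite */, local, dest)
    fs.close()
  }
}
```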

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, LINUX, Cloudera, Big Data, Java APIs, Java collection, SQL.

Confidential

Big Data Developer

Responsibilities:

  • Involved in full life cycle development in a distributed environment using Java and the J2EE framework. Responsible for developing and modifying the existing service layer based on business requirements. Involved in designing and developing web services using SOAP and WSDL. Involved in database design; created tables and stored procedures in SQL for data manipulation and retrieval, and performed database modification using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 9i. Created the user interface using JSF. Involved in integration testing of the business logic layer and data access layer.
  • Integrated JSF with JSP and used JSF custom tag libraries to display the values of variables defined in configuration files. Used technologies such as JSP, JSTL, JavaScript, HTML, XML, and Tiles for the presentation tier. Involved in JUnit testing of the application using the JUnit framework (see the sketch following this list). Wrote stored procedures, functions, and views to retrieve data. Used Maven builds to wrap around Ant build scripts. Used CVS for version control of code and project documents. Responsible for mentoring and working with team members to ensure standards and guidelines are followed and tasks are delivered on time.
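
A small, hypothetical example of the JUnit-style unit tests written for the service layer. The original tests were in Java; this Scala rendering only keeps the examples in one language, and OrderTotals with its 8% tax rate is an illustrative stand-in, not a class from the project.

```scala
import org.junit.Assert.assertEquals
import org.junit.Test

// Illustrative helper under test; not part of the original code base.
object OrderTotals {
  def withTax(subtotal: BigDecimal, taxRate: BigDecimal = BigDecimal("0.08")): BigDecimal =
    (subtotal + subtotal * taxRate).setScale(2, BigDecimal.RoundingMode.HALF_UP)
}

class OrderTotalsTest {
  @Test
  def addsTaxAndRoundsToCents(): Unit = {
    assertEquals(BigDecimal("108.00"), OrderTotals.withTax(BigDecimal("100.00")))
    assertEquals(BigDecimal("10.79"), OrderTotals.withTax(BigDecimal("9.99")))
  }
}
```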

Environment: jQuery, JSP, Servlets, JSF, JDBC, HTML, JUnit, JavaScript, XML, SQL, Maven, Web Services, UML, WebLogic Workshop, and CVS.
