Hadoop Developer Resume

Phoenix, AZ

PROFESSIONAL SUMMARY:

  • 6+ years of IT experience in software development and support, including developing strategies for deploying Big Data technologies to efficiently meet Big Data processing requirements and developing Java-based enterprise applications.
  • Progressive experience in all phases of the iterative Software Development Life Cycle (SDLC)/Agile; actively involved in requirements gathering, analysis, development, unit testing and integration testing.
  • Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Oozie, NiFi, Spark, Spark SQL, Spark Streaming and Hive for scalability, distributed computing and high-performance computing.
  • Experience in using Hive Query Language for data analytics.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Hands-on experience in extending the core functionality of Hive using UDFs, UDAFs and UDTFs.
  • Excellent experience in designing, developing, documenting and testing ETL jobs and mappings in Server and Parallel jobs using DataStage to populate tables in data warehouses and data marts.
  • Good knowledge of single-node and multi-node cluster configurations.
  • Expertise in the Talend Data Integration and Big Data Integration suites for design and development of ETL/Big Data code and mappings for enterprise DWH ETL Talend projects.
  • Strong knowledge of NoSQL databases such as HBase, Cassandra and MongoDB and their integration with Hadoop clusters.
  • Experienced with job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
  • Good knowledge of Amazon EMR, S3 buckets, DynamoDB and Redshift.
  • Partner with Data Infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics
  • Promote full cycle approach including request analysis, creating/pulling dataset, report creation and implementation and providing final analysis to the requestor
  • Developed code to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Very good understanding of SQL, ETL and data warehousing technologies.
  • Knowledge of MS SQL Server 2012/2008/2005 and Oracle 11g/10g/9i and E-Business Suite.
  • Expert in T-SQL, creating and using stored procedures, views and user-defined functions, and implementing Business Intelligence solutions using SQL Server 2000/2005/2008.
  • Developed Web-Services module for integration using SOAP and REST.
  • Applied data warehousing methodologies for ETL using Informatica Designer, Repository Manager, Workflow Manager, Workflow Monitor and Repository Server Administration Console.
  • Comfortable in Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 13/14 and Cosmos.
  • Good experience with Kafka and Storm.
  • Developed multiple Kafka producers and consumers as per the software requirement specifications.
  • Worked on Storm to handle parallelization, partitioning and retrying on failures, and developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Experience writing build scripts using Maven and working with continuous integration systems like Jenkins.
  • Java developer with extensive experience with various Java libraries, APIs and frameworks.
  • Hands-on development experience with RDBMS, including writing complex SQL queries, stored procedures and triggers.
  • Sound knowledge of designing data warehousing applications using tools like Teradata, Oracle, Netezza and SQL Server.
  • Experience using the Talend ETL tool.
  • Experience working with job schedulers like Autosys and Maestro.
  • Strong in databases like Sybase, DB2, Oracle, MS SQL.
  • Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
  • Hands-on experience with Avro and Parquet file formats, dynamic partitions and bucketing for best practice and performance improvement.
  • Experience with semi-structured data processing (XML, JSON and CSV) in Hive/Impala.
  • Strong communication, collaboration & team building skills with proficiency at grasping new Technical concepts quickly and utilizing them in a productive manner.
  • Adept in analyzing information system needs, evaluating end-user requirements, custom designing solutions and troubleshooting information systems.
  • Strong analytical and problem-solving skills.

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Monitoring and Reporting

Build Tools: Maven, SQL Developer

Programming & Scripting: JAVA, SQL, Shell Scripting, Python, Scala

Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/Rest services

Databases: Oracle, MySQL, MS SQL Server, Teradata, HQL, Netezza

Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript

Version Control: SVN, Confidential, GIT

Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7, Windows Server 2008/2003

Application Servers: WebLogic, WebSphere, JBoss, Tomcat

PROFESSIONAL EXPERIENCE:

Confidential

Hadoop Developer

Responsibilities:

  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Worked on Sqoop to import data from various relational data sources.
  • Worked on strategizing Sqoop jobs to parallelize data loads from source systems.
  • Built Hive tables from vendor-generated flat files.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Administered, installed, upgraded and managed HDP 2.2, Pig, Hive and HBase.
  • Expertise in writing MapReduce programs in Java on MRv2 / YARN environment.
  • Worked with Presto, which allows querying data where it lives, including Hive, Cassandra, relational databases and proprietary data stores.
  • Created DataStage jobs (ETL process) to continuously populate the data warehouse from different source systems such as ODS and flat files.
  • Worked on the DataStage production job scheduling process using scheduling tools and the DataStage scheduler.
  • Completed a proof of concept using an Apache NiFi workflow in place of Oozie to automate loading data into HDFS and pre-processing it with Pig.
  • Good at implementing unit tests with MRUnit and PigUnit.
  • Involved in exposing the DataStage jobs as web services using IBM Information Server Console.
  • Improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL and Spark on YARN with Scala.
  • Implemented Spark Core jobs in Scala to process data in memory.
  • Created a NiFi POC to perform sentiment analysis.
  • Created complex HQL and SQL queries, performed SQL tuning, and wrote PL/SQL blocks such as stored procedures, functions, cursors, indexes, triggers and packages.
  • Wrote test cases, analyzed test results and reported them to product teams.
  • Created fan-in and fan-out multiplexing flows with Flume.
  • Developed bash scripts to fetch log files from the FTP server and process them for loading into Hive tables.
  • Configured Flume to stream data into HDFS and Hive using HDFS sinks.
  • Good knowledge of Amazon EMR, S3 buckets, DynamoDB and Redshift.
  • Created Talend jobs to copy files from one server to another, using Talend FTP components.
  • Used the most common Talend components (tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput, tHashOutput and many more).
  • Worked on Talend for scheduling jobs and adding users.
  • Used Sqoop to load data from SQL sources into the HBase environment.
  • Implemented Spark Streaming to read real-time data from Kafka in parallel, process it in parallel and save the results in Parquet format in Hive.
  • Used INSERT OVERWRITE to refresh the Hive data from HBase daily.
  • Designed databases, data models, ETL processes, data warehouse applications and business intelligence (BI) reports using best practices and tools, including SQL, OLAP and OLTP.
  • Scheduled all the bash scripts using Resource Manager Scheduler.
  • Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.

Environment: Hadoop Cluster, HDFS, Hive, Pig, Sqoop, Talend for data integration, Linux, YARN, Oozie, Hadoop MapReduce, DataStage, HBase, Shell Scripting, Ambari, Cassandra, HQL, Apache Spark.

Confidential, Phoenix, AZ

Hadoop Developer

Responsibilities:

  • Analyzed the Hadoop cluster and different Big Data analytic tools including Pig, HBase and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed and configured Hive, Pig, Sqoop, HBase on the Hadoop cluster.
  • Managed and scheduled jobs on a Hadoop cluster.
  • Wrote MapReduce jobs to analyze the data.
  • Managed Hadoop cluster resources, including adding and removing cluster nodes for maintenance and capacity needs.
  • Loaded log data into HDFS.
  • Monitored Hadoop cluster using tools like Cloudera Manager.
  • Installed and configured Hive and wrote Hive UDFs.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Automated scripts to monitor HDFS and HBase through cron jobs.
  • Provided cluster coordination services through ZooKeeper.
  • Prepared a multi-cluster test harness to exercise the system for performance analysis and failover.
  • Developed a high-performance cache, making the site stable and improving its performance.
  • Created a complete processing engine, enhanced for performance.

Environment: HDFS, HBase, Hive, Pig, Java, JDBC, Struts, Maven, Subversion, JUnit, SQL, PuTTY and Eclipse

Confidential

Hadoop Developer

Responsibilities:

  • Performed benchmarking of HDFS and ResourceManager using TestDFSIO and TeraSort.
  • Worked on Sqoop to import data from various relational data sources.
  • Worked with Flume to bring clickstream data from front-facing application logs.
  • Worked on strategizing Sqoop jobs to parallelize data loads from source systems.
  • Participated in providing inputs for design of the ingestion patterns.
  • Participated in strategizing loads without impacting front facing applications.
  • Worked on the design of the Hive data store to hold data from various data sources.
  • Involved in brainstorming sessions for sizing the Hadoop cluster.
  • Proficient in data modelling with Hive partitioning, indexing, bucketing and other optimization techniques.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Coded custom processors in NiFi and implemented consumers and producers for Kafka topics
  • Developed a data ingestion workflow using tools like NiFi for HBase ingestion.
  • Worked on performance tuning of Apache NiFi workflow to optimize the data ingestion speeds
  • Used Hive map-side/skew join queries to join multiple tables of a source system and load them into Elasticsearch tables.
  • Involved in providing inputs to analyst team for functional testing.
  • Worked with source system load testing teams to perform loads while ingestion jobs are in progress.
  • Worked on performing data standardization using PIG scripts.
  • Worked on installation and configuration of a Hortonworks cluster from the ground up.
  • Worked on both External and Managed Hive tables for optimized performance.
  • Created Hive Generic UDFs, UDAFs and UDTFs in Python to process business logic that varies based on policy.
  • Managed various groups for users with different queue configurations.
  • Worked on building analytical data stores for data science team’s model development.
  • Worked on design and development of Oozie workflows to orchestrate Pig and Hive jobs.
  • Worked on performance tuning of Hive queries with partitioning and bucketing.
  • Worked extensively on the core and Spark SQL modules of Spark.
  • Developed Kafka producers and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS and Hive.
  • Analyzed SQL scripts and designed a solution to implement them using PySpark.

Environment: Hadoop, HDFS, MapReduce, Flume, Pig, Sqoop, Hive, Oozie, Ganglia, HBase, Shell Scripting, Apache Spark.

Confidential

Hadoop Developer

Responsibilities:

  • Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop and then into partitioned Hive tables.
  • Developed Hive UDFs to bring all customer email IDs into a structured format.
  • Developed bash scripts to fetch log files from the FTP server and process them for loading into Hive tables.
  • Used Sqoop to load data from DB2 into the HBase environment.
  • Used INSERT OVERWRITE to refresh the Hive data from HBase daily.
  • Scheduled all the bash scripts using Resource Manager Scheduler.
  • Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
  • Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Developed Hive queries for analysis across different banners.
  • Successfully designed and developed a solution for speeding up a SQL job using the Hadoop MapReduce framework. Processing time was reduced from hours to minutes.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
  • Developed MapReduce programs that filter out bad and unnecessary records and identify unique records based on different criteria.
  • Responsible for performing extensive data validation using Hive.

Environment: Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.6), Hadoop distribution of Hortonworks, DataStax, Flat files, Oracle 11g/10g, PL/SQL, SQL*Plus, UNIX Shell Scripting, Autosys r11.0.

Confidential

JAVA/J2EE Developer

Responsibilities:

  • Worked with Java, J2EE, Struts, web services and Hibernate in a fast-paced development environment.
  • Followed agile methodology, interacted directly with the client on features, implemented optimal solutions and tailored the application to customer needs.
  • Implemented Action classes and ActionForm classes for the entire Reports module using the Struts framework.
  • Involved in design and implementation of web tier using Servlets and JSP.
  • Used Apache POI for Excel files reading.
  • Developed the user interface using JSP and JavaScript to view all online trading transactions.
  • Implemented various design patterns such as MVC, Factory and Singleton.
  • Designed and developed Data Access Objects (DAO) to access the database.
  • Worked with OOP concepts and memory concepts like string pools.
  • Developed Struts ActionServlet classes for the controller and form beans for transferring data between the Action class and the view layer.
  • Used DAO Factory and value object design patterns to organize and integrate the Java objects.
  • Coded JavaServer Pages for dynamic front-end content using Servlets and EJBs.
  • Coded HTML pages using CSS for static content generation with JavaScript for validations.
  • Used Java Message Service (JMS) for reliable and asynchronous exchange of important information, such as loan status reports, between the clients and the bank.
  • Used JDBC API to connect to the database and carry out database operations.
  • Used JSP and JSTL Tag Libraries for developing User Interface components.
  • Performed code reviews.
  • Performed unit testing, system testing and integration testing.
  • Involved in building and deploying the application in a Linux environment.

Environment: Java, J2EE, JDBC, Struts, SQL, Hibernate, Eclipse, Apache POI, CSS, Servlets, Multi-Threading, UML, Oracle, Tomcat, Windows XP, HTML, JSP.
