
Hadoop Developer Resume


San Jose, CA

SUMMARY:

  • 8 years of IT experience with multinational clients, including over 2 years of Big Data experience developing Hadoop applications.
  • Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Flume, YARN, Oozie, and ZooKeeper).
  • Experience importing and exporting terabytes of data with Sqoop between HDFS and relational database systems, in both directions.
  • Experienced in performing real-time analytics on NoSQL databases like HBase.
  • Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
  • Analyzed large data sets by writing Pig scripts and Hive queries.
  • Experienced in writing MapReduce programs and UDFs for both Hive and Pig in Java (see the sketch following this summary).
  • Used Flume to channel data from different sources to HDFS.
  • Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts in Java for data processing.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart
  • Experienced in the analysis, design, development, implementation, and testing of data warehousing solutions, including data conversion, extraction, transformation, and loading (ETL).
  • Strong knowledge of the Software Development Life Cycle (SDLC), including requirement analysis, design, development, testing, and implementation; provided end-user support.
  • Performed cluster maintenance, including creation and removal of nodes, using tools such as Ganglia and Nagios.
  • Implemented technical solutions for POCs, writing code with technologies such as Hadoop, YARN, Python, and Microsoft SQL Server.
  • Experienced with Spark using Scala and Python
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components like MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
  • Experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
  • Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java; extended core Hive and Pig functionality with custom User Defined Functions.
  • Experience with Oozie Workflow Engine to automate and parallelize Hadoop Map/Reduce and Pig jobs.
  • Experience using shell scripting and Python for project automation.
  • Good working knowledge of Linux (the platform Hadoop runs on), with hands-on experience on distributions such as SUSE and Red Hat.
  • Managed Teradata databases using Teradata Administrator, Teradata SQL Assistant, and BTEQ.
  • Good experience in ETL testing and in developing and supporting Informatica applications.
  • Created, modified, and dropped Teradata objects such as tables, views, join indexes, triggers, macros, procedures, databases, users, profiles, and roles.
  • Managed database space, allocating new space to databases and moving space between databases on an as-needed basis.
  • Participated in the Data Migration activities between Development and Production environments.
  • Performed query performance tuning using PI/SI indexes, join indexes, and PPI; used EXPLAIN to analyze data distribution among AMPs and index usage; collected statistics, defined indexes, revised correlated subqueries, and applied hash functions.
  • Experienced in all facets of Software Development Life Cycle (Analysis, Design, Development, Testing and maintenance) using Waterfall and Agile methodologies.
  • Experience in Agile methodology and implementation of enterprise agile practices
  • Motivated team player with excellent communication, interpersonal, analytical and problem solving skills.
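
As a brief illustration of the Hive UDFs mentioned above, the Java sketch below shows a minimal UDF built on the classic org.apache.hadoop.hive.ql.exec.UDF API; the package, class name, and column usage are hypothetical and are not taken from the projects described in this resume.

```java
package com.example.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes free-text codes before analysis.
// It would be registered in Hive with something like:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_code AS 'com.example.hive.NormalizeCode';
public class NormalizeCode extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                      // pass SQL NULLs straight through
        }
        String cleaned = input.toString().trim().toUpperCase();
        return new Text(cleaned);
    }
}
```

Once registered, such a function is called from HiveQL like any built-in, e.g. SELECT normalize_code(code_col) FROM some_table (the table and column names here are placeholders).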

TECHNICAL SKILLS:

Database: NoSQL Databases (HBase, Cassandra, Riak, MongoDB), Teradata 13.10, Oracle 10g/9i, SQL Server 2005

Tools: Big Data - Hadoop, HDFS, Flume, Sqoop, YARN, Pig, Hive, Scala, Spark, MapReduce, Oozie, Python, MongoDB, Tez, Ganglia, Nagios; Teradata SQL Assistant, BTEQ, Teradata Administrator, Teradata Viewpoint, Priority Scheduler, Teradata Statistics Wizard, Teradata Visual Explain, MultiLoad, FastLoad, FastExport, TPump, SQL*Plus

ETL & Reporting: Informatica, SSIS, SSRS, SSAS

Build Tools: Maven and Jenkins

SQL assistant tools: Toad, Squirrel

Programming Languages: SQL, PL/SQL, C, C++, HTML, Perl, Shell Programming, Java

Methodology: Agile Scrum; tracking with JIRA and VersionOne

Versioning systems: SVN and GIT

Operating Systems: Microsoft Windows XP/NT/2007, UNIX, Linux

PROFESSIONAL EXPERIENCE:

Confidential, San Jose, CA

Hadoop Developer

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Java, Flume, Cloudera, Spark, Oozie, MySQL, UNIX, Core Java, Impala, Python.

Responsibilities:

  • Involved in the full project life cycle: analysis, design, logical and physical architecture modeling, development, implementation, and testing.
  • Wrote complex Hive and SQL queries for data analysis to meet business requirements.
  • Expert in importing and exporting data into HDFS and Hive using Sqoop.
  • Working experience on designing and implementing complete end-to-end Hadoop Infrastructure including Pig, Hive, Sqoop, Oozie and Zookeeper.
  • Expert in writing HiveQL queries and Pig Latin scripts.
  • Experience in importing and exporting terabytes of data using Sqoop from Relational Database Systems to HDFS.
  • Experience providing support to data analysts in running Pig and Hive queries.
  • Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON, and Avro.
  • Experience developing custom Hive UDFs and UDAFs in Java, connecting to Hive over JDBC (see the sketch after this list), and developing and executing Pig scripts and Pig UDFs.
  • Experience validating and cleansing data using Pig statements, with hands-on experience developing Pig macros.
  • Used Sqoop to migrate data to and from HDFS and MySQL or Oracle, and deployed Hive-HBase integration to perform OLAP operations on HBase data.
  • Used Flume in Loading log data into HDFS.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Worked on Python scripts to analyze customer data.
  • Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts.
  • Created Hive managed and external tables.
  • Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
  • Moved data from Oracle, Teradata, and MS SQL Server into HDFS using Sqoop, and imported flat files of various formats into HDFS.
  • Responsible for the design and creation of Hive tables and worked on performance optimizations such as partitioning and bucketing in Hive.
  • Handled incremental data loads from RDBMS into HDFS using Sqoop.
  • Designed a conceptual model with Spark for performance optimization.
  • Used the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract the data in a timely manner.
  • Used shell scripting for Jenkins job automation.
  • Exported result sets from Hive to MySQL using shell scripts.
  • Involved in building up the Hadoop ecosystem on AWS EC2 servers.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Re-implemented existing MapReduce jobs as Spark applications for better performance.
  • Involved in loading, transforming, and analyzing healthcare data from various providers into Hadoop using Flume on an ongoing basis.
  • Filtered, transformed and combined data from multiple providers based on payer filter criteria using custom Pig UDFs.
  • Analyzed data using HiveQL to generate per-payer reports and payment summaries for transmission to payers.
  • Extensively worked on Pig scripts for data cleansing and optimization.
  • Responsible for design and creation of Hive tables, partitioning, bucketing, loading data and writing hive queries.
  • Imported and exported data into HDFS, Hive, and HBase using Sqoop from relational databases.
  • Exported analyzed data to downstream systems using Sqoop for generating end-user reports, Business Analysis reports and payment reports.
  • Analyzed large amounts of data sets from hospitals and providers to determine optimal way to aggregate and generate summary reports.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Copied files from the local file system to HDFS.
  • Developed Hive and Impala scripts on Avro and Parquet file formats.
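
As a hedged sketch of the Hive JDBC connectivity mentioned in this list, the Java example below opens a HiveServer2 connection and runs a query against a partitioned table; the host, port, credentials, table, and column names are placeholders rather than details from the actual project.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; hive-jdbc and its dependencies must be on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Placeholder connection details -- substitute the real HiveServer2 host and port.
        String url = "jdbc:hive2://hiveserver2.example.com:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive_user", "");
             Statement stmt = conn.createStatement()) {

            // Hypothetical partitioned table; filtering on the partition column
            // (load_date) lets Hive prune partitions instead of scanning everything.
            ResultSet rs = stmt.executeQuery(
                "SELECT provider_id, COUNT(*) "
                + "FROM claims WHERE load_date = '2015-06-01' "
                + "GROUP BY provider_id");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```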

Confidential, Omaha, NE

Hadoop Developer

Responsibilities:

  • Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
  • Effectively used Sqoop to transfer data between databases and HDFS.
  • Designed workflow by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed Pig Latin scripts to extract the data from the mainframes output files to load into HDFS.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis (see the sketch after this list).
  • Wrote Hive queries for analysis and reporting across different streams in the company.
  • Processed source data into structured data and stored it in the NoSQL database Couchbase.
  • Created alter, insert, and delete queries involving lists, sets, and maps in Couchbase.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java map-reduce, Hive and Sqoop as well as system specific jobs.
  • Used Avro serialization to serialize data; applied transformations and standardizations and loaded the data into HBase for further processing.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Documented all requirements, code, and implementation methodologies for review and analysis purposes.
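
The MapReduce cleansing programs mentioned in this list might look roughly like the sketch below: a map-only Java job that keeps only well-formed delimited records and counts the malformed ones. The delimiter, expected field count, and class names are assumptions for illustration, not the original code.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanseRecordsJob {

    // Map-only cleansing: emit only pipe-delimited records with the expected field count.
    public static class CleanseMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 12;   // assumed schema width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|", -1);
            if (fields.length == EXPECTED_FIELDS) {
                context.write(NullWritable.get(), value);   // keep the good record
            } else {
                context.getCounter("cleanse", "malformed").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cleanse-records");
        job.setJarByClass(CleanseRecordsJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0);                        // map-only job
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```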

Confidential, Stamford, CT

Teradata DBA

Responsibilities:

  • Understood the specifications and analyzed data according to client requirements.
  • Created roles and profiles on an as-needed basis; granted privileges to roles and added users to roles based on requirements.
  • Managed database space, allocating new space to databases and moving space between databases on an as-needed basis.
  • Assisted developers and DBAs with project design, architecture, development, and query tuning, including query modification, index selection, and refreshing statistics collection.
  • Proactively monitored bad queries, aborted them using PMON, looked for blocked sessions, and worked with development teams to resolve blocked sessions.
  • Proactively monitored database space, identified tables with high skew, and worked with the data modeling team to change the Primary Index on highly skewed tables.
  • Worked on moving tables from test to production using FastExport and FastLoad.
  • Extensively worked with DBQL data to identify high usage tables and columns.
  • Implemented secondary indexes on highly used columns to improve performance
  • Worked on exporting data to flat files using Teradata FastExport.
  • Worked exclusively with Teradata SQL Assistant to interface with Teradata.
  • Wrote several Teradata BTEQ scripts to implement the business logic.
  • Populated data into Teradata tables by using Fast Load utility.
  • Created complex Teradata macros, views, and stored procedures to be used in reports.
  • Performed error handling and performance tuning for Teradata queries and utilities.
  • Created error log tables for bulk loading.
  • Worked on capacity planning and reported disk and CPU usage growth using Teradata Manager, DBQL, and ResUsage.
  • Used the Teradata Manager collection facility to set up AMP usage collection, canary query response, spool usage response, etc.
  • Developed complex mappings using multiple sources and targets across different databases and flat files.
  • Developed Teradata BTEQ scripts; automated workflows and BTEQ scripts.
  • Performed query optimization using EXPLAIN plans, statistics collection, and Primary and Secondary index selection (see the sketch below).
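
As a rough, hedged illustration of the query-optimization work above, the Java sketch below uses the Teradata JDBC driver (terajdbc4.jar) to fetch an EXPLAIN plan and refresh column statistics; the host, credentials, and the sales.orders table and its columns are placeholders, and the work described in this section was actually done through BTEQ, SQL Assistant, and the Teradata tools rather than Java.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TeradataTuningExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; the Teradata JDBC driver registers itself
        // automatically when terajdbc4.jar is on the classpath.
        String url = "jdbc:teradata://tdprod.example.com/DATABASE=sales";
        try (Connection conn = DriverManager.getConnection(url, "dba_user", "secret");
             Statement stmt = conn.createStatement()) {

            // Print the optimizer's plan for a hypothetical query to check index usage
            // and data distribution across AMPs.
            ResultSet plan = stmt.executeQuery(
                "EXPLAIN SELECT order_id, order_amt FROM sales.orders WHERE customer_id = 1001");
            while (plan.next()) {
                System.out.println(plan.getString(1));
            }

            // Refresh statistics on the predicate column so the optimizer has current
            // demographics (pre-Teradata 14 COLLECT STATISTICS syntax).
            stmt.execute("COLLECT STATISTICS ON sales.orders COLUMN customer_id");
        }
    }
}
```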

Confidential, Cincinnati, OH

Teradata DBA

Responsibilities:

  • Performed Data analysis and prepared the Physical database based on the requirements.
  • Used Teradata Utilities to ensure High System performance as well as High availability.
  • Implementation of TASM for performance Tuning and Workload Management.
  • Used analyst tools such as TSET, Index Wizard, and Statistics Wizard to improve performance.
  • Responsible for populating warehouse-staging tables.
  • Responsible for capacity planning and performance tuning.
  • Prepared performance metrics.
  • Developed crontab scripts to automate monitoring tasks.
  • Created Teradata objects like Databases, Users, Profiles, Roles, Tables, Views and Macros.
  • Developed complex mappings using multiple sources and targets in different databases, flat files.
  • Worked on Space considerations and managed Perm, Spool and Temp Spaces.
  • Developed BTEQ scripts for Teradata.
  • Automated workflows and BTEQ scripts.
  • Responsible for tuning the performances of Informatica mappings and Teradata BTEQ scripts.
  • Worked with DBAs to tune the performance of the applications and Backups.
  • Worked on exporting data to flat files using Teradata FastExport.
  • Performed query optimization using EXPLAIN plans, statistics collection, and Primary and Secondary indexes.
  • Built tables, views, UPIs, NUPIs, USIs, and NUSIs.
  • Wrote several Teradata BTEQ scripts to implement the business logic.
  • Worked exclusively with Teradata SQL Assistant to interface with Teradata.
  • Wrote various macros and automated batch processes.
  • Wrote UNIX shell scripts to process and cleanse incoming text files.
  • Used CVS as a versioning tool.
  • Coordinated tasks and issues with the Project Manager and the client on a daily basis.

Confidential

ETL Developer

Responsibilities:

  • As part of an enterprise reporting application, maintained a loan management system offering various loan products to customers.
  • Managed Operation Data Store and worked on database maintenance and other administrative activities.
  • Hands-on experience with ETL tools such as Oracle Warehouse Builder (OWB) and BusinessObjects Data Integrator (BODI).
  • Coordinated with Business Analysts/ downstream/ Source systems for requirement analysis.
  • Actively participated in requirement analysis, planning, estimation, coding & testing.
  • Ensured timely deliverables to the clients for any CR/Defects.
  • Implemented various control checks to ensure data integrity.
  • Led performance tuning of database/process improvement steps.
  • Maintained various project compliance requirements with respect to audits and Project Management Reviews (PMR).
  • Familiar with project-related activities such as IPMS task creation, UMP generation, PMR kit creation, and the Unified Project Plan.

Confidential

ETL Developer

Responsibilities:

  • As part of the Tata Communications billing application enhancement and support effort, worked on the mediation module, which collects data from switches and transforms it into the readable format prescribed for billing activities.
  • Managed the integration of two new Sri Lanka switches with the Mediation India systems, which helped increase the revenue generated for the project and was completed within the specified timelines.
  • Handled planning, requirement analysis, impact analysis, design, development, and testing for a number of requirements, and fixed various production issues.
  • Adept in analyzing information system needs, evaluating end user requirements, custom designing solutions, troubleshooting for complex information systems such as Telecom systems.
  • Actively involved in delivering change requests issued by the client and parallel IT support teams
  • Took performance initiatives as a value-add to the customer, and resolved problem requests as they arose by coordinating with the parallel IT team delivering Level-2 support.
  • Ensured timely month end clearance to facilitate smooth billing process.
  • Actively participated in cross-functional and knowledge-sharing activities.
