Hadoop Developer Resume
San Jose, CA
SUMMARY
- 8 years of IT experience with multinational clients, including over 2 years of Big Data experience developing Hadoop applications.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Flume, YARN, Oozie and ZooKeeper).
- Experience in importing and exporting terabytes of data between HDFS and relational database systems using Sqoop (see the illustrative Sqoop sketch following this summary).
- Experienced in performing real-time analytics on NoSQL databases such as HBase.
- Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
- Analyzed large data sets by writing Pig scripts and Hive queries.
- Experienced in writing MapReduce programs and UDFs for both Hive and Pig in Java.
- Used Flume to channel data from different sources to HDFS.
- Supported MapReduce programs running on the cluster and wrote custom MapReduce jobs for data processing in Java.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Extensive experience in extraction, transformation and loading (ETL) of data from multiple sources into data warehouses and data marts.
- Experience in the analysis, design, development, implementation and testing of data warehousing solutions, including data conversion, extraction, transformation and loading (ETL).
- Strong knowledge of the Software Development Life Cycle (SDLC), including requirement analysis, design, development, testing and implementation; provided end-user support.
- Performed cluster maintenance, including adding and removing nodes, and monitored clusters with tools such as Ganglia and Nagios.
- Implemented technical solutions for POCs, writing code using technologies such as Hadoop, YARN, Python and Microsoft SQL Server.
- Experienced with Spark using Scala and Python
- Hands-on experience installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig and Flume.
- Experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning and monitoring.
- Experience analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java, and extending Hive and Pig core functionality with custom user-defined functions.
- Experience with the Oozie workflow engine to automate and parallelize Hadoop MapReduce and Pig jobs.
- Experience using shell scripting and Python for project automation.
- Good knowledge of Linux, the platform on which Hadoop runs.
- Good hands-on experience with different Linux distributions such as SUSE and Red Hat.
- Managed Teradata databases using Teradata Administrator, Teradata SQL Assistant and BTEQ.
- Good experience in ETL testing and in developing and supporting Informatica applications.
- Created, modified and dropped Teradata objects such as tables, views, join indexes, triggers, macros, procedures, databases, users, profiles and roles.
- Managed database space, allocating new space to databases and moving space between databases on an as-needed basis.
- Participated in the Data Migration activities between Development and Production environments.
- Performed query performance tuning using PI/SI indexes, join indexes and PPI; used EXPLAIN to analyze data distribution among AMPs and index usage; and applied statistics collection, index definition, revision of correlated subqueries, hash functions, etc.
- Experienced in all facets of Software Development Life Cycle (Analysis, Design, Development, Testing and maintenance) using Waterfall and Agile methodologies.
- Experience in Agile methodology and implementation of enterprise agile practices
- Motivated team player with excellent communication, interpersonal, analytical and problem solving skills.
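A minimal, illustrative sketch of the kind of Sqoop import described above; the host name, database, table and target directory are hypothetical placeholders, not actual client values:

    #!/bin/bash
    # Hypothetical Sqoop import: pull a relational table into HDFS.
    # Connection string, credentials and paths are illustrative only.
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --fields-terminated-by '\t'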
TECHNICAL SKILLS
Database: NoSQL databases (HBase, Cassandra, Riak, MongoDB), Teradata 13.10, Oracle 10g/9i, SQL Server 2005
Tools: Big Data - Hadoop, HDFS, Flume, Sqoop, YARN, Pig, Hive, Scala, Spark, MapReduce, Oozie, Python, MongoDB, Tez, Ganglia, Nagios; Teradata - Teradata SQL Assistant, BTEQ, Teradata Administrator, Teradata Viewpoint, Priority Scheduler, Teradata Statistics Wizard, Teradata Visual Explain, MultiLoad, FastLoad, FastExport, TPump, SQL*Plus
ETL & Reporting: INFORMATICA, SSIS, SSRS, SSAS
Build Tools: Maven and Jenkins
SQL assistant tools: Toad, Squirrel
Programming Languages: SQL, PL/SQL, C, C++, HTML, Perl, Shell programming, Java
Methodology: Agile Scrum, JIRA and Version One
Versioning systems: SVN and GIT
Operating Systems: Microsoft Windows XP/NT/2007, UNIX, Linux
PROFESSIONAL EXPERIENCE
Confidential, San Jose, CA
Hadoop Developer
Environment: Apache Hadoop, HDFS, Hive, Map Reduce, Java, Flume, Cloudera, Spark, Oozie, MySQL, UNIX, Core Java, Impala, Python.
Responsibilities:
- Involved in the full project life cycle, from analysis and design through logical and physical architecture modeling, development, implementation and testing.
- Wrote complex Hive and SQL queries for data analysis to meet business requirements.
- Expert in importing data into and exporting data out of HDFS and Hive using Sqoop.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie and ZooKeeper.
- Expert in writing HiveQL queries and Pig Latin scripts.
- Experience in importing and exporting terabytes of data using Sqoop from Relational Database Systems to HDFS.
- Experience providing support to data analysts in running Pig and Hive queries.
- Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON and Avro.
- Experience developing custom Hive UDFs and UDAFs in Java, connecting to Hive via JDBC, and developing and executing Pig scripts and Pig UDFs.
- Experience validating and cleansing data using Pig statements, with hands-on experience developing Pig macros.
- Experience using Sqoop to migrate data to and from HDFS and MySQL or Oracle, and deployed Hive and HBase integration to perform OLAP operations on HBase data.
- Used Flume to load log data into HDFS.
- Managed and reviewed Hadoop log files for troubleshooting as part of administration.
- Worked on Python scripts to analyze customer data.
- Experienced in running Hadoop Streaming jobs to process terabytes of formatted data using Python scripts.
- Created Hive managed and external tables.
- Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
- Moved data from Oracle, Teradata and MS SQL Server into HDFS using Sqoop, and imported flat files in various formats into HDFS.
- Responsible for the design and creation of Hive tables and worked on performance optimizations such as partitioning and bucketing in Hive; handled incremental data loads from RDBMS into HDFS using Sqoop (see the illustrative sketch following this list).
- Designed a conceptual model with Spark for performance optimization.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive and Pig jobs that extract data in a timely manner.
- Used shell scripting for Jenkins job automation.
- Exported result sets from Hive to MySQL using shell scripts.
- Involved in building out the Hadoop ecosystem on AWS EC2 servers.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Reimplemented existing MapReduce applications in Spark for better performance.
- Involved in loading, transforming and analyzing healthcare data from various providers into Hadoop using Flume on an ongoing basis.
- Filtered, transformed and combined data from multiple providers based on payer filter criteria using custom Pig UDFs.
- Analyzed data using HiveQL to generate payer reports and payment summaries for transmission to payers.
- Worked extensively on Pig scripts for data cleansing and optimization.
- Responsible for the design and creation of Hive tables, partitioning, bucketing, loading data and writing Hive queries.
- Imported and exported data between relational databases and HDFS, Hive and HBase using Sqoop.
- Exported analyzed data to downstream systems using Sqoop for generating end-user reports, Business Analysis reports and payment reports.
- Analyzed large data sets from hospitals and providers to determine the optimal way to aggregate and generate summary reports.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Used HDFS commands to copy files from the local file system to HDFS.
- Developed Hive and Impala scripts on Avro and Parquet file formats.
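The sketch below illustrates the partitioning, bucketing and incremental-load pattern referenced in the list above; the table, columns and connection details are hypothetical placeholders, not actual project objects:

    #!/bin/bash
    # Hypothetical Hive DDL: partitioned, bucketed table stored as Parquet.
    hive -e "
      CREATE TABLE IF NOT EXISTS claims (
        claim_id    BIGINT,
        provider_id STRING,
        amount      DOUBLE
      )
      PARTITIONED BY (load_date STRING)
      CLUSTERED BY (provider_id) INTO 16 BUCKETS
      STORED AS PARQUET;"

    # Hypothetical incremental Sqoop load appending only rows newer than the last run.
    sqoop import \
      --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
      --username etl_user -P \
      --table CLAIMS \
      --incremental append \
      --check-column CLAIM_ID \
      --last-value 1000000 \
      --target-dir /data/staging/claims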
Confidential, Omaha, NE
Hadoop Developer
Responsibilities:
- Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
- Effectively used Sqoop to transfer data between databases and HDFS.
- Designed workflow by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed Pig Latin scripts to extract the data from the mainframes output files to load into HDFS.
- Developed Map-Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Wrote Hive queries for analysis and reporting on different data streams in the company.
- Processed source data into structured data and stored it in the NoSQL database Couchbase.
- Created alter, insert and delete queries involving lists, sets and maps in Couchbase.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Used Avro to serialize data, applied transformations and standardizations, and loaded the results into HBase for further processing.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX systems, NoSQL stores and a variety of portfolios.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team (see the illustrative export sketch following this list).
- Documented all requirements, code and implementation methodologies for review and analysis.
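A brief, hypothetical sketch of the kind of Sqoop export to a relational database mentioned above; the database, table and warehouse path are placeholders:

    #!/bin/bash
    # Hypothetical Sqoop export: push analyzed Hive output back to a reporting database.
    # Hive writes its default field delimiter (\001), so the export declares it explicitly.
    sqoop export \
      --connect jdbc:mysql://bihost:3306/reporting \
      --username report_user -P \
      --table stream_summary \
      --export-dir /user/hive/warehouse/stream_summary \
      --input-fields-terminated-by '\001'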
Confidential, Stamford, CT
Teradata DBA
Responsibilities:
- Understood the specifications and analyzed data according to client requirements.
- Created roles and profiles on an as-needed basis; granted privileges to roles and added users to roles based on requirements.
- Managed database space, allocating new space to databases and moving space between databases on an as-needed basis.
- Assisted developers and DBAs with design, architecture, development and query tuning for the project, including query modification, index selection and refreshing statistics collection.
- Proactively monitored bad queries, aborted them using PMON, looked for blocked sessions and worked with development teams to resolve them; proactively monitored database space, identified tables with high skew and worked with the data modeling team to change the primary index on those tables.
- Worked on moving tables from test to production using FastExport and FastLoad.
- Extensively worked with DBQL data to identify high usage tables and columns.
- Implemented secondary indexes on highly used columns to improve performance
- Worked on exporting data to flat files using Teradata FEXPORT.
- Worked exclusively with Teradata SQL Assistant to interface with Teradata.
- Wrote several Teradata BTEQ scripts to implement business logic (see the illustrative BTEQ sketch following this list).
- Populated data into Teradata tables using the FastLoad utility.
- Created complex Teradata macros, views and stored procedures to be used in reports.
- Performed error handling and performance tuning of Teradata queries and utilities.
- Created error log tables for bulk loading.
- Worked on capacity planning and reported disk and CPU usage growth using Teradata Manager, DBQL and ResUsage.
- Used the Teradata Manager collection facility to set up AMP usage collection, canary query response, spool usage response, etc.
- Developed complex mappings using multiple sources and targets in different databases and flat files.
- Developed Teradata BTEQ scripts and automated workflows and BTEQ script execution.
- Query optimization (explain plans, collect statistics, Primary and Secondary indexes)
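A minimal, hypothetical BTEQ sketch along the lines of the scripting referenced in the list above; the TDPID, credentials and table names are placeholders:

    #!/bin/bash
    # Hypothetical BTEQ run: log on, refresh statistics and verify row counts,
    # quitting with a non-zero return code on error.
    bteq <<'EOF'
    .LOGON tdprod/dbadmin,secret
    COLLECT STATISTICS ON sales_db.orders COLUMN (order_id);
    SELECT COUNT(*) FROM sales_db.orders;
    .IF ERRORCODE <> 0 THEN .QUIT 8
    .LOGOFF
    .QUIT 0
    EOF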
Confidential, Cincinnati, OH
Teradata DBA
Responsibilities:
- Performed Data analysis and prepared the Physical database based on the requirements.
- Used Teradata Utilities to ensure High System performance as well as High availability.
- Implemented TASM for performance tuning and workload management.
- Used analyst tools such as TSET, Index Wizard and Statistics Wizard to improve performance.
- Responsible for populating warehouse-staging tables.
- Responsible for capacity planning and performance tuning.
- Prepared performance metrics.
- Developed crontab scripts to automate monitoring tasks.
- Created Teradata objects like Databases, Users, Profiles, Roles, Tables, Views and Macros.
- Developed complex mappings using multiple sources and targets in different databases and flat files.
- Worked on Space considerations and managed Perm, Spool and Temp Spaces.
- Developed BTEQ scripts for Teradata.
- Automated workflows and BTEQ scripts.
- Responsible for tuning the performances of Informatica mappings and Teradata BTEQ scripts.
- Worked with DBAs to tune the performance of the applications and Backups.
- Worked on exporting data to flat files using Teradata FEXPORT (see the illustrative FastExport sketch following this list).
- Query optimization (explain plans, collect statistics, Primary and Secondary indexes)
- Built tables, views, UPIs, NUPIs, USIs and NUSIs.
- Wrote several Teradata BTEQ scripts to implement business logic.
- Worked exclusively with Teradata SQL Assistant to interface with Teradata.
- Wrote various macros and automated batch processes.
- Wrote UNIX shell scripts for processing and cleansing incoming text files.
- Used CVS as a versioning tool.
- Coordinated tasks and issues with the project manager and client on a daily basis.
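A hypothetical FastExport sketch along the lines of the flat-file exports listed above; the log table, credentials, object names and output path are placeholders:

    #!/bin/bash
    # Hypothetical FastExport job writing query results to a flat file.
    fexp <<'EOF'
    .LOGTABLE work_db.fexp_log;
    .LOGON tdprod/dbadmin,secret;
    .BEGIN EXPORT SESSIONS 4;
    .EXPORT OUTFILE /data/export/customers.txt MODE RECORD FORMAT TEXT;
    SELECT CAST(customer_id AS CHAR(10)) || ',' || CAST(customer_name AS CHAR(50))
    FROM edw_db.customers;
    .END EXPORT;
    .LOGOFF;
    EOF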
Confidential
ETL Developer
Responsibilities:
- As part of an enterprise reporting application, maintained a loan management system offering various loan products to customers.
- Managed the Operational Data Store and worked on database maintenance and other administrative activities.
- Hands-on experience with ETL tools such as Oracle Warehouse Builder (OWB) and BusinessObjects Data Integrator (BODI).
- Coordinated with Business Analysts/ downstream/ Source systems for requirement analysis.
- Actively participated in requirement analysis, planning, estimation, coding & testing.
- Ensured timely deliverables to clients for any CRs/defects.
- Implemented various control checks to ensure data integrity.
- Led performance tuning of database/process improvement steps.
- Maintained project compliance with respect to audits and Project Management Reviews (PMR).
- Familiar with project-related activities such as IPMS task creation, UMP generation, PMR kit creation and the Unified Project Plan.
Confidential
ETL Developer
Responsibilities:
- As part of the Tata Communications billing application enhancement and support engagement, worked on the mediation module, including collecting data from switches and transforming it into the readable format prescribed for billing activities.
- Managed the integration of two new Sri Lanka switches with the Mediation India systems, which helped increase project revenue and was completed successfully within the specified timelines.
- Handled planning, requirement analysis, impact analysis, design, development and testing for a number of requirements, and fixed various production issues.
- Adept at analyzing information system needs, evaluating end-user requirements, custom-designing solutions and troubleshooting complex information systems such as telecom systems.
- Actively involved in delivering change requests issued by the client and parallel IT support teams
- Took performance initiatives as a part of value add to the customer and resolved problem requests occurring from time to time by coordinating with parallel IT team delivering Level-2 support.
- Ensured timely month end clearance to facilitate smooth billing process.
- Actively participated in cross-functional and knowledge-sharing activities.