Hadoop Developer Resume
San Jose, CA
SUMMARY
- 8 years of IT experience with multinational clients, including over 2 years of Big Data experience developing Hadoop applications.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Flume, YARN, Oozie and ZooKeeper).
- Experience in importing and exporting terabytes of data between HDFS and relational database systems using Sqoop (see the illustrative Sqoop sketch following this summary).
- Experienced in performing real-time analytics on NoSQL databases such as HBase.
- Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
- Analyzed large data sets by writing Pig scripts and Hive queries.
- Experienced in writing MapReduce programs and UDFs for both Hive and Pig in Java.
- Used Flume to channel data from different sources to HDFS.
- Supported MapReduce programs running on the cluster and wrote custom MapReduce jobs for data processing in Java.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Extensive experience in extraction, transformation and loading (ETL) of data from multiple sources into data warehouses and data marts.
- Experience in the analysis, design, development, implementation and testing of data warehousing solutions, including data conversion, extraction, transformation and loading (ETL).
- Strong knowledge of the Software Development Life Cycle (SDLC), including requirement analysis, design, development, testing and implementation; provided end-user support.
- Performed cluster maintenance, including adding and removing nodes, and monitored clusters with tools such as Ganglia and Nagios.
- Implemented technical solutions for POCs, writing code using technologies such as Hadoop, YARN, Python and Microsoft SQL Server.
- Experienced with Spark using Scala and Python
- Hands-on experience installing, configuring and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig and Flume.
- Experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning and monitoring.
- Experience analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java, and extending Hive and Pig core functionality with custom user-defined functions.
- Experience with the Oozie workflow engine to automate and parallelize Hadoop MapReduce and Pig jobs.
- Experience using shell scripting and Python for project automation.
- Good knowledge of Linux, the platform on which Hadoop runs.
- Good hands-on experience with different Linux distributions such as SUSE and Red Hat.
- Managed Teradata databases using Teradata Administrator, Teradata SQL Assistant and BTEQ.
- Good experience in ETL testing and in developing and supporting Informatica applications.
- Created, modified and dropped Teradata objects such as tables, views, join indexes, triggers, macros, procedures, databases, users, profiles and roles.
- Managed database space, allocating new space to databases and moving space between databases on an as-needed basis.
- Participated in the Data Migration activities between Development and Production environments.
- Performed query performance tuning using PI/SI indexes, join indexes and PPI; used EXPLAIN to analyze data distribution among AMPs and index usage; and applied statistics collection, index definition, revision of correlated subqueries, hash functions, etc.
- Experienced in all facets of Software Development Life Cycle (Analysis, Design, Development, Testing and maintenance) using Waterfall and Agile methodologies.
- Experience in Agile methodology and implementation of enterprise agile practices
- Motivated team player with excellent communication, interpersonal, analytical and problem solving skills.
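A minimal, illustrative sketch of the kind of Sqoop import described above; the host name, database, table and target directory are hypothetical placeholders, not actual client values:

    #!/bin/bash
    # Hypothetical Sqoop import: pull a relational table into HDFS.
    # Connection string, credentials and paths are illustrative only.
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --fields-terminated-by '\t'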
TECHNICAL SKILLS
Database: NoSQL databases (HBase, Cassandra, Riak, MongoDB), Teradata 13.10, Oracle 10g/9i, SQL Server 2005
Tools: Big Data - Hadoop, HDFS, Flume, Sqoop, YARN, Pig, Hive, Scala, Spark, MapReduce, Oozie, Python, MongoDB, Tez, Ganglia, Nagios; Teradata - Teradata SQL Assistant, BTEQ, Teradata Administrator, Teradata Viewpoint, Priority Scheduler, Teradata Statistics Wizard, Teradata Visual Explain, MultiLoad, FastLoad, FastExport, TPump, SQL*Plus
ETL & Reporting: INFORMATICA, SSIS, SSRS, SSAS
Build Tools: Maven and Jenkins
SQL assistant tools: Toad, Squirrel
Programming Languages: SQL, PL/SQL, C, C++, HTML, Perl, Shell programming, Java
Methodology: Agile Scrum, JIRA and Version One
Versioning systems: SVN and GIT
Operating Systems: Microsoft Windows XP/NT/2007, UNIX, Linux
PROFESSIONAL EXPERIENCE
Confidential, San Jose, CA
Hadoop Developer
Environment: Apache Hadoop, HDFS, Hive, Map Reduce, Java, Flume, Cloudera, Spark, Oozie, MySQL, UNIX, Core Java, Impala, Python.
Responsibilities:
- Involved in the full project life cycle, from analysis and design through logical and physical architecture modeling, development, implementation and testing.
- Wrote complex Hive and SQL queries for data analysis to meet business requirements.
- Expert in importing data into and exporting data out of HDFS and Hive using Sqoop.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure, including Pig, Hive, Sqoop, Oozie and ZooKeeper.
- Expert in writing HiveQL queries and Pig Latin scripts.
- Experience in importing and exporting terabytes of data using Sqoop from Relational Database Systems to HDFS.
- Experience providing support to data analysts in running Pig and Hive queries.
- Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON and Avro.
- Experience developing custom Hive UDFs and UDAFs in Java, connecting to Hive via JDBC, and developing and executing Pig scripts and Pig UDFs.
- Experience validating and cleansing data using Pig statements, with hands-on experience developing Pig macros.
- Experience using Sqoop to migrate data to and from HDFS and MySQL or Oracle, and deployed Hive and HBase integration to perform OLAP operations on HBase data.
- Used Flume to load log data into HDFS.
- Managed and reviewed Hadoop log files for troubleshooting as part of administration.
- Worked on Python scripts to analyze customer data.
- Experienced in running Hadoop Streaming jobs to process terabytes of formatted data using Python scripts.
- Created Hive managed and external tables.
- Loaded and transformed large sets of structured and semi-structured data using Hive and Impala.
- Moved data from Oracle, Teradata and MS SQL Server into HDFS using Sqoop, and imported flat files in various formats into HDFS.
- Responsible for the design and creation of Hive tables and worked on performance optimizations such as partitioning and bucketing in Hive; handled incremental data loads from RDBMS into HDFS using Sqoop (see the illustrative sketch following this list).
- Designed a conceptual model with Spark for performance optimization.
- Used the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive and Pig jobs that extract data in a timely manner.
- Used shell scripting for Jenkins job automation.
- Exported result sets from Hive to MySQL using shell scripts.
- Involved in building out the Hadoop ecosystem on AWS EC2 servers.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Reimplemented existing MapReduce applications in Spark for better performance.
- Involved in loading, transforming and analyzing healthcare data from various providers into Hadoop using Flume on an ongoing basis.
- Filtered, transformed and combined data from multiple providers based on payer filter criteria using custom Pig UDFs.
- Analyzed data using HiveQL to generate payer reports and payment summaries for transmission to payers.
- Worked extensively on Pig scripts for data cleansing and optimization.
- Responsible for the design and creation of Hive tables, partitioning, bucketing, loading data and writing Hive queries.
- Imported and exported data between relational databases and HDFS, Hive and HBase using Sqoop.
- Exported analyzed data to downstream systems using Sqoop for generating end-user reports, Business Analysis reports and payment reports.
- Analyzed large data sets from hospitals and providers to determine the optimal way to aggregate and generate summary reports.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Used HDFS commands to copy files from the local file system to HDFS.
- Developed Hive and Impala scripts on Avro and Parquet file formats.
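The sketch below illustrates the partitioning, bucketing and incremental-load pattern referenced in the list above; the table, columns and connection details are hypothetical placeholders, not actual project objects:

    #!/bin/bash
    # Hypothetical Hive DDL: partitioned, bucketed table stored as Parquet.
    hive -e "
      CREATE TABLE IF NOT EXISTS claims (
        claim_id    BIGINT,
        provider_id STRING,
        amount      DOUBLE
      )
      PARTITIONED BY (load_date STRING)
      CLUSTERED BY (provider_id) INTO 16 BUCKETS
      STORED AS PARQUET;"

    # Hypothetical incremental Sqoop load appending only rows newer than the last run.
    sqoop import \
      --connect jdbc:oracle:thin:@dbhost:1521:ORCL \
      --username etl_user -P \
      --table CLAIMS \
      --incremental append \
      --check-column CLAIM_ID \
      --last-value 1000000 \
      --target-dir /data/staging/claims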
Confidential, Omaha, NE
Hadoop Developer
Responsibilities:
- Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
- Effectively used Sqoop to transfer data between databases and HDFS.
- Designed workflow by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed Pig Latin scripts to extract the data from the mainframes output files to load into HDFS.
- Developed Map-Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Wrote Hive queries for analysis and reporting on different data streams in the company.
- Processed source data into structured data and stored it in the NoSQL database Couchbase.
- Created alter, insert and delete queries involving lists, sets and maps in Couchbase.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Used Avro to serialize data, applied transformations and standardizations, and loaded the results into HBase for further processing.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX systems, NoSQL stores and a variety of portfolios.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team (see the illustrative export sketch following this list).
- Documented all requirements, code and implementation methodologies for review and analysis.
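A brief, hypothetical sketch of the kind of Sqoop export to a relational database mentioned above; the database, table and warehouse path are placeholders:

    #!/bin/bash
    # Hypothetical Sqoop export: push analyzed Hive output back to a reporting database.
    # Hive writes its default field delimiter (\001), so the export declares it explicitly.
    sqoop export \
      --connect jdbc:mysql://bihost:3306/reporting \
      --username report_user -P \
      --table stream_summary \
      --export-dir /user/hive/warehouse/stream_summary \
      --input-fields-terminated-by '\001'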
Confidential, Stamford, CT
Teradata DBA
Responsibilities:
- Understood the specifications and analyzed data according to client requirements.
- Created roles and profiles on an as-needed basis; granted privileges to roles and added users to roles based on requirements.
- Managed database space, allocating new space to databases and moving space between databases on an as-needed basis.
- Assisted developers and DBAs with design, architecture, development and query tuning for the project, including query modification, index selection and refreshing statistics collection.
- Proactively monitored bad queries, aborted them using PMON, looked for blocked sessions and worked with development teams to resolve them; proactively monitored database space, identified tables with high skew and worked with the data modeling team to change the primary index on those tables.
- Worked on moving tables from test to production using FastExport and FastLoad.
- Extensively worked with DBQL data to identify high usage tables and columns.
- Implemented secondary indexes on highly used columns to improve performance
- Worked on exporting data to flat files using Teradata FEXPORT.
- Worked exclusively with Teradata SQL Assistant to interface with Teradata.
- Wrote several Teradata BTEQ scripts to implement business logic (see the illustrative BTEQ sketch following this list).
- Populated data into Teradata tables using the FastLoad utility.
- Created complex Teradata macros, views and stored procedures to be used in reports.
- Performed error handling and performance tuning of Teradata queries and utilities.
- Created error log tables for bulk loading.
- Worked on capacity planning and reported disk and CPU usage growth using Teradata Manager, DBQL and ResUsage.
- Used the Teradata Manager collection facility to set up AMP usage collection, canary query response, spool usage response, etc.
- Developed complex mappings using multiple sources and targets in different databases and flat files.
- Developed Teradata BTEQ scripts and automated workflows and BTEQ script execution.
- Query optimization (explain plans, collect statistics, Primary and Secondary indexes)
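A minimal, hypothetical BTEQ sketch along the lines of the scripting referenced in the list above; the TDPID, credentials and table names are placeholders:

    #!/bin/bash
    # Hypothetical BTEQ run: log on, refresh statistics and verify row counts,
    # quitting with a non-zero return code on error.
    bteq <<'EOF'
    .LOGON tdprod/dbadmin,secret
    COLLECT STATISTICS ON sales_db.orders COLUMN (order_id);
    SELECT COUNT(*) FROM sales_db.orders;
    .IF ERRORCODE <> 0 THEN .QUIT 8
    .LOGOFF
    .QUIT 0
    EOF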
Confidential, Cincinnati, OH
Teradata DBA
Responsibilities:
- Performed Data analysis and prepared the Physical database based on the requirements.
- Used Teradata Utilities to ensure High System performance as well as High availability.
- Implemented TASM for performance tuning and workload management.
- Used analyst tools such as TSET, Index Wizard and Statistics Wizard to improve performance.
- Responsible for populating warehouse-staging tables.
- Responsible for capacity planning and performance tuning.
- Prepared performance metrics.
- Developed crontab scripts to automate monitoring tasks.
- Created Teradata objects like Databases, Users, Profiles, Roles, Tables, Views and Macros.
- Developed complex mappings using multiple sources and targets in different databases and flat files.
- Worked on Space considerations and managed Perm, Spool and Temp Spaces.
- Developed BTEQ scripts for Teradata.
- Automated workflows and BTEQ scripts.
- Responsible for tuning the performances of Informatica mappings and Teradata BTEQ scripts.
- Worked with DBAs to tune the performance of the applications and Backups.
- Worked on exporting data to flat files using Teradata FEXPORT (see the illustrative FastExport sketch following this list).
- Query optimization (explain plans, collect statistics, Primary and Secondary indexes)
- Built tables, views, UPIs, NUPIs, USIs and NUSIs.
- Wrote several Teradata BTEQ scripts to implement business logic.
- Worked exclusively with Teradata SQL Assistant to interface with Teradata.
- Wrote various macros and automated batch processes.
- Wrote UNIX shell scripts for processing and cleansing incoming text files.
- Used CVS as a versioning tool.
- Coordinated tasks and issues with the project manager and client on a daily basis.
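A hypothetical FastExport sketch along the lines of the flat-file exports listed above; the log table, credentials, object names and output path are placeholders:

    #!/bin/bash
    # Hypothetical FastExport job writing query results to a flat file.
    fexp <<'EOF'
    .LOGTABLE work_db.fexp_log;
    .LOGON tdprod/dbadmin,secret;
    .BEGIN EXPORT SESSIONS 4;
    .EXPORT OUTFILE /data/export/customers.txt MODE RECORD FORMAT TEXT;
    SELECT CAST(customer_id AS CHAR(10)) || ',' || CAST(customer_name AS CHAR(50))
    FROM edw_db.customers;
    .END EXPORT;
    .LOGOFF;
    EOF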
Confidential
ETL Developer
Responsibilities:
- As part of an enterprise reporting application, maintained a loan management system offering various loan products to customers.
- Managed the Operational Data Store and worked on database maintenance and other administrative activities.
- Hands-on experience with ETL tools such as Oracle Warehouse Builder (OWB) and BusinessObjects Data Integrator (BODI).
- Coordinated with Business Analysts/ downstream/ Source systems for requirement analysis.
- Actively participated in requirement analysis, planning, estimation, coding & testing.
- Ensured timely deliverables to clients for any CRs/defects.
- Implemented various control checks to ensure data integrity.
- Led performance tuning of database/process improvement steps.
- Maintained project compliance with respect to audits and Project Management Reviews (PMR).
- Familiar with project-related activities such as IPMS task creation, UMP generation, PMR kit creation and the Unified Project Plan.
Confidential
ETL Developer
Responsibilities:
- As part of the Tata Communications billing application enhancement and support engagement, worked on the mediation module, including collecting data from switches and transforming it into the readable format prescribed for billing activities.
- Managed the integration of two new Sri Lanka switches with the Mediation India systems, which helped increase project revenue and was completed successfully within the specified timelines.
- Handled planning, requirement analysis, impact analysis, design, development and testing for a number of requirements, and fixed various production issues.
- Adept at analyzing information system needs, evaluating end-user requirements, custom-designing solutions and troubleshooting complex information systems such as telecom systems.
- Actively involved in delivering change requests issued by the client and parallel IT support teams
- Took performance initiatives as a part of value add to the customer and resolved problem requests occurring from time to time by coordinating with parallel IT team delivering Level-2 support.
- Ensured timely month end clearance to facilitate smooth billing process.
- Actively participated in cross-functional and knowledge-sharing activities.