Hadoop Admin/Architect Resume
Woonsocket, RI
SUMMARY:
- Over fourteen years of professional work experience in the IT industry, including more than five years of experience with Hadoop, HDFS, MapReduce, and the Hadoop ecosystem (Pig, Hive, HBase).
- Experience working with clients of all sizes in the financial, retail, healthcare, logistics, and manufacturing industries.
- Excellent understanding of Hadoop architecture and its components, including HDFS, ResourceManager, NodeManager, NameNode, DataNode, and the MapReduce programming paradigm.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
- Experience installing, configuring, supporting, and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
- Installed CDH4 services such as HDFS, MapReduce, ZooKeeper, Hive, Oozie, Hue, and Impala from the Cloudera Manager Admin Console.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Experience with the NoSQL databases MongoDB and Cassandra.
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
- Worked with Teradata 13.10 and 14.0 features such as PPI, character PPI, timestamp PPI, and columnar tables.
- Expertise in developing materialized views (semantic layer).
- Expert DBA skills in Teradata DBMS administration, development, and production DBA support, including use of Teradata Manager, FastLoad, MultiLoad, TSET, TPump, SQL, PDCR, ARCMAIN, and TASM for workload management.
- Extensively used DataStage comprising DataStage Server / Parallel Extender (Enterprise Edition) / Mainframe (Enterprise MVS Edition) - DataStage Designer, DataStage Director, DataStage Manager, and DataStage Administrator.
- Experienced with Data Set, File Set, Lookup File Set, Transformer, Sort, Join, Merge, Lookup, Funnel, Copy, and Modify stages.
- Experience in planning and coordinating disaster recovery solutions for Teradata.
- Worked with Business Objects/OBIEE/Hyperion teams to optimize report run times by applying Teradata performance techniques such as partitioning, join indexes (JIs), aggregate join indexes (AJIs), USIs, and NUSIs.
- Performed Teradata SQL troubleshooting and fine-tuning of scripts supplied by analysts and developers.
- Defined account IDs, priority scheduler performance groups, and system date and time substitution variables in user and profile definitions.
- Responsible for the design and implementation of the entire DW Dimensional Model architecture using Join Indexes and Partitioned Primary Indexes to achieve and exceed the desired performance.
- Good experience with the Linux operating system and command set, as well as competence in shell scripting.
- Involved in implementation projects such as Oracle-to-Teradata DB/DW and SQL Server-to-Teradata conversions.
TECHNICAL SKILLS:
Database: Teradata 14.0, DB2/UDB, Oracle 9i/11g, SQL Server, VSAM, MicroStrategy
Hadoop / Big Data: Cloudera Hadoop, Hive, HDFS, MapReduce, Impala, Hue, Oozie, ZooKeeper, Sqoop, HA JournalNodes, Fair Scheduler, Oracle Loader for Hadoop, DistCp, Ganglia
O/S: Windows, Ubuntu, Red Hat Linux, UNIX
Reporting Tool: Business Objects XI, OBIEE, Hyperion 11.1, Cognos
ETL: Informatica 9.0.1, IBM InfoSphere DataStage 9.1, Ab Initio, MultiLoad, FastLoad, FastExport, BTEQ, TPump, JCL, SQL, PL/SQL, SAS, XML, ER Studio
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks, Amazon EMR
NoSQL Databases: MongoDB, Hive, Cassandra
Application Server: Apache Tomcat, JBoss, WebSphere, WebLogic
Modeling Tools: Erwin and Visio Enterprise
Operating Systems: Windows, Ubuntu, Red Hat Linux, UNIX
Methodologies: Business Process, Relational Data Modeling, Logical Data Modeling, Agile, Object Oriented Modeling, SQL Assistant, Object Relational Modeling
Backup Technologies: ABU, NetVault/NetBackup GUI tools, BAR scripts, Data Mover, Tara
Others: PVCS, TOAD, SQL Assistant, HP Service Manager Scheduler, shell scripting, sed, awk, Perl, Python, Apache, Tomcat, JDBC, JSP, JavaScript, and others
PROFESSIONAL EXPERIENCE:
Confidential, Woonsocket, RI
Hadoop Admin/Architect
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop ecosystem tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions (see the health-check sketch at the end of this section).
- Installed operating system and Hadoop updates, patches, and version upgrades as required.
- Responsible for implementation and administration of Hadoop infrastructure.
- Installed and configured Hadoop MapReduce and HDFS (Hadoop Distributed File System), and developed multiple MapReduce jobs in Java for data cleaning.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Configured High Availability for HDFS NameNode that uses Quorum-based Storage.
- Monitored Hadoop cluster connectivity and security, and managed Hadoop log files.
- Performed mainframe file system management and monitoring, along with HDFS support and maintenance.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Set up Linux users and Kerberos principals, and tested HDFS, Hive, Pig, and MapReduce access for new users.
- Implemented NameNode metadata backup to NFS for high availability.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Performed cluster maintenance and the addition and removal of nodes using Cloudera Manager Enterprise.
- Used Sqoop to move data between HDFS and an RDBMS (Teradata) in both directions (see the Sqoop sketch at the end of this section).
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Performed performance tuning of Hadoop clusters and MapReduce routines, monitored cluster job performance, and handled capacity planning.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Automated workflows using shell scripts to pull data from various databases into Hadoop and to run day-to-day processes.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Environment: Hadoop 2.6.0, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, Teradata 14.00, DataStage 9.1, Toad 5.1, and Unix/Linux.
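
A minimal sketch of the kind of daemon health-check shell script described above. The service list, alert address, and log path are assumptions for illustration, not the actual production script.

```bash
#!/bin/bash
# Hypothetical health check for core Hadoop daemons on this host.
# Service names, alert address, and log location are placeholders.

SERVICES="namenode datanode resourcemanager nodemanager"
ALERT_EMAIL="hadoop-admins@example.com"
LOG=/var/log/hadoop-healthcheck.log

for svc in $SERVICES; do
    # jps lists running JVMs; grep for the daemon class name (case-insensitive).
    if ! jps | grep -qi "$svc"; then
        msg="$(date '+%F %T') WARNING: $svc is not running on $(hostname)"
        echo "$msg" >> "$LOG"
        echo "$msg" | mail -s "Hadoop daemon alert: $svc down" "$ALERT_EMAIL"
    fi
done

# Also record HDFS capacity and dead-DataNode lines reported by dfsadmin
# (run as the HDFS superuser).
hdfs dfsadmin -report 2>/dev/null | grep -E 'Dead datanodes|DFS Used%' >> "$LOG"
```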
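Likewise, a hedged example of the Sqoop transfers between HDFS and Teradata mentioned above; the JDBC URL, credentials, table names, and HDFS paths are illustrative placeholders.

```bash
# Import a Teradata table into HDFS (connection details are placeholders).
sqoop import \
  --connect jdbc:teradata://td-prod.example.com/DATABASE=retail_dw \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table CUSTOMER_TXN \
  --target-dir /data/raw/customer_txn \
  --num-mappers 4

# Export analyzed results from HDFS back to Teradata for reporting.
sqoop export \
  --connect jdbc:teradata://td-prod.example.com/DATABASE=retail_dw \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table CUSTOMER_SUMMARY \
  --export-dir /data/out/customer_summary \
  --input-fields-terminated-by '\t'
```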
Confidential, Atlanta
Hadoop Admin/Architect
Responsibilities:
- Provided advice on database architecture and design for large, complex database systems using technologies such as Oracle RAC and Data Guard, MySQL, PostgreSQL, and SQL Server, as well as big data technologies such as Hadoop and MarkLogic.
- Validated the pre-install checklist for the 24 Hadoop server nodes: disk configuration, DNS lookup, network configuration, swappiness, hugepage compaction, kernel parameters, ulimits, etc.
- Installed MySQL on the CM server and created databases for the Cloudera Management Services (Service Monitor, Activity Monitor, Host Monitor, Reports Manager, and Resource Manager), the Hive metastore, Hue, and Oozie.
- Installed MySQL on the NFS server and created a database for Impala statistics.
- Installed JDK 7 and the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files for Kerberos.
- Installed the MySQL JDBC connector on the CM and NFS servers.
- Created yum repositories for Cloudera packages using the Satellite server.
- Configured passwordless SSH across all 24 nodes (see the SSH sketch at the end of this section).
- Installed Cloudera Manager Server and configured the database for Cloudera Manager Server.
- From the Cloudera Manager Admin Console, installed CDH4 services such as HDFS, MapReduce, ZooKeeper, Hive, Oozie, Hue, and Impala.
- Configured MySQL for Hue, Oozie, and Impala (to store statistics).
- Changed ports for HiveServer2.
- Configured High Availability for HDFS NameNode that uses Quorum-based Storage.
- Configured Fair Scheduler.
- Increased Java Heap Size for Hive Metastore, HiveServer2, and Hue/Beeswax.
- Enabled Job Tracker and Hue to listen on all interfaces.
- Set purge and expiration for Management Services to limit the space usage.
- Installed and Configured Kerberos Security.
- Created principals and keytab files for the hdfs, mapreduce, and application user accounts in Kerberos (see the kadmin sketch at the end of this section).
- Applied Hive Patch.
- Verified firewall rules from Citrix farm to Hadoop.
- Configured alerts in Cloudera Manager.
- Added a Pentaho node to the Hadoop cluster and configured the gateway and Kerberos client on the Pentaho server.
- Installed and Configured Ganglia Monitoring.
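
A minimal sketch of setting up the passwordless SSH mentioned above; the node list file and the hadoop admin account are assumptions for illustration.

```bash
#!/bin/bash
# Distribute the admin user's public key to every node listed in nodes.txt.
# nodes.txt and the "hadoop" admin account are placeholders.

# Generate a key pair once (no passphrase) if it does not already exist.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -b 2048 -N "" -f ~/.ssh/id_rsa

while read -r node; do
    # Copies the public key into the remote authorized_keys file.
    ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$node"
done < nodes.txt

# Verify: this should log in without prompting for a password.
ssh hadoop@datanode01 hostname
```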
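An illustrative sketch of creating the service principals and keytabs mentioned above with MIT Kerberos kadmin; the realm, hostnames, and keytab paths are placeholders, not the actual cluster values.

```bash
# Create service principals for the NameNode host (hostnames/realm are placeholders).
kadmin.local -q "addprinc -randkey hdfs/nn01.example.com@EXAMPLE.COM"
kadmin.local -q "addprinc -randkey mapred/nn01.example.com@EXAMPLE.COM"
kadmin.local -q "addprinc -randkey HTTP/nn01.example.com@EXAMPLE.COM"

# Export the keys into keytab files for the Hadoop daemons to use.
kadmin.local -q "xst -k /etc/hadoop/conf/hdfs.keytab hdfs/nn01.example.com HTTP/nn01.example.com"
kadmin.local -q "xst -k /etc/hadoop/conf/mapred.keytab mapred/nn01.example.com HTTP/nn01.example.com"

# Restrict access and verify the keytab contents.
chown hdfs:hadoop /etc/hadoop/conf/hdfs.keytab
chmod 400 /etc/hadoop/conf/hdfs.keytab
klist -kt /etc/hadoop/conf/hdfs.keytab
```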
Confidential, Atlanta, GA
Database/UNIX Admin
Responsibilities:
- Involved in setting up the Teradata database; created databases and users, and allocated perm and spool space.
- Applied data protection features including transient journal, fallback, and RAID; resolved deadlocks and granted and revoked security privileges.
- Defined permanent space limits at both the database and user level.
- Created users, databases, roles, profiles and accounts.
- Established logon security, including external authentication.
- Defined account IDs, priority scheduler performance groups, and system date and time substitution variables in user and profile definitions.
- Experienced in loading, archiving, and restoring data; duties also included storage optimization, performance tuning, monitoring, UNIX shell scripting, and physical and logical database design.
- Controlled and tracked access to Teradata Database by granting and revoking privileges.
- Allocated space to users, controlled spool space, and assigned table space.
- Planned the releases, monitored performance and reported to Teradata for further technical issues.
- Walked users through their inefficient queries using PMON over WebEx.
- Worked with Teradata to deploy patches, perform installs and fixes, and determine appropriate settings.
- Applied DBQL settings in line with business and application standards.
- Designed DataStage storyboards and ETL jobs to extract data from the Teradata database and from flat and sequential files, transform it, and load it into the data warehouse.
- Designed close to 100 user-defined routines for efficient data processing, data cleansing, and domain validation.
- Created and upgraded Oracle 8i/9i databases on AIX 4.3.3/5.1 (64-bit 5L), Solaris 8, and Red Hat Enterprise Linux AS platforms.
- Enhanced application performance by tuning database components such as the shared pool, buffer cache, and I/O distribution.
- Tuned SQL statements using explain plan and SQL trace.
- Pinned system objects and cached frequently accessed small tables/indexes.
- Developed shell scripts, triggers, and stored procedures, and set up cron jobs for nightly backups (see the cron sketch at the end of this section).
- Used Oracle Enterprise Manager for Database Administration.
- Handled Backup & restore of the databases.
- Loaded the data using Oracle tools like export, import and SQL Loader.
- Installed and configured OpenLDAP for Oracle names resolution.
- Compiled Apache/2.0.49 source code with SSL support.
- Provided suggestions and participated in the actions of developer teams.
- Installed, configured and managed concurrent versions system (CVS) on RedHat Linux v9.0 with WinCVS as client using OpenSSH connection to the CVS server.
- Implemented triggers to monitor restricted tables and database access
- Coded UNIX scripts for Database extraction and filtering.
- Responsible for the source data profiling and data cleansing before transformation for avoiding validation overhead down the data flow path.
- Responsible for Schema comparison and DDL validation.
- Reverse engineered the source code to generate models and also to write functional requirements.
- Designed an ETL version of the data model from the developed logical and physical models to help developers understand the modeling.
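
A hedged sketch of the nightly-backup cron setup described above; the paths, SID, credentials file, and retention period are illustrative, and a full logical export is shown as just one possible backup method.

```bash
#!/bin/sh
# nightly_backup.sh - illustrative nightly logical backup of an Oracle database.
# Scheduled from cron, e.g.:  30 1 * * * /home/oracle/scripts/nightly_backup.sh
# Paths, SID, and retention below are placeholders.

export ORACLE_SID=PROD
export ORACLE_HOME=/u01/app/oracle/product/9.2.0
export PATH=$ORACLE_HOME/bin:$PATH

STAMP=`date +%Y%m%d`
DUMPDIR=/backup/exports
SYS_PW=`cat /home/oracle/.syspw`      # password read from a protected file

# Full logical export of the database.
exp system/"$SYS_PW" full=y file=$DUMPDIR/full_$STAMP.dmp log=$DUMPDIR/full_$STAMP.log

# Keep only the last 7 days of dump files.
find $DUMPDIR -name 'full_*.dmp' -mtime +7 -exec rm -f {} \;
```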
Confidential, Riverwoods, IL
Admin.
Responsibilities:
- Installed, configured, upgraded, and maintained Oracle 8i/9i databases on Solaris, Linux, HP-UX, AIX, and Windows NT/2000, as well as SQL Server 7.0/2000 databases on Windows NT/2000.
- Designed, debugged, and managed stored procedures, functions, and triggers.
- Monitored, tracked and analyzed Oracle databases to improve performance and productivity for customers.
- Utilized monitoring tools like OEM to determine best use of memory, CPU, disk space, etc.
- Performed production & development support, performance monitoring & tuning, and database backup & recovery
- Set up connectivity between web servers and database servers using JDBC and ODBC; also set up Oracle Internet Application Server and WebDB.
- Configured multi-master replication and parallel server setup.
- Scripting in Perl and Korn shell.
- Developed and generated all required production Operations documentation according to established standards.
- Provided Database Administrative services and 24x7 on-call support to various customers like Proctor & Gamble, Coke, AMVESCAP, NCS, Schering Plough, Brown & Williamson etc in production environments.
- Advised Managers on database concepts and functional capabilities.
Confidential, Buffalo, NY
Teradata/Unix Admin
Responsibilities:
- Managed database space, allocated new space to databases, and moved space between databases as needed.
- Created roles and profiles as needed; granted privileges to roles and added users to roles based on requirements.
- Worked extensively with DBQL data to identify high-usage tables and columns, and redesigned the logical and physical data models.
- Used Teradata Manager, BTEQ, FastLoad, MultiLoad, TPump, SQL, and TASM for workload management.
- Monitored and aborted bad queries using PMON, watched for blocked sessions, and worked with development teams to resolve them.
- Performed workload management using tools such as Teradata Manager, FastLoad, MultiLoad, TPump, TPT, SQL Assistant, and TASM.
- Involved in writing BTEQ scripts for session validation and testing, for verifying data integrity between source and target databases, and for report generation (see the BTEQ sketch at the end of this section).
- Developed Backup and Recovery (BAR) procedures for Development, Testing and Production.
- Created enterprise-wide templates for handling SCDs, error handling, etc.
- Involved in writing Teradata SQL bulk programs and in Performance tuning activities for Teradata SQL statements using Teradata EXPLAIN.
- Used external loaders such as MultiLoad, TPump, and FastLoad to load data into the Teradata database; involved in analysis, development, testing, implementation, and deployment.
- Created debugging sessions for error identification by creating break points and monitoring the debug monitor.
- Created Stored Procedures to transform the data and worked extensively on SQL, PL/SQL for various needs of the transformations.
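
A small illustration of the source-versus-target validation BTEQ scripts mentioned above; the logon string, databases, and table names are placeholders.

```bash
#!/bin/bash
# Illustrative BTEQ row-count reconciliation between staging and target tables.
# tdprod, the logon credentials, and the table names are placeholders.

bteq <<'EOF'
.LOGON tdprod/etl_user,etl_password;
.SET WIDTH 200;

-- Compare row counts between the staging table and the target table.
SELECT 'STG_CUSTOMER' AS table_name, COUNT(*) AS row_cnt FROM stg_db.customer
UNION ALL
SELECT 'DWH_CUSTOMER', COUNT(*) FROM dwh_db.customer;

-- Flag any customer keys present in staging but missing from the target.
SELECT s.customer_id
FROM   stg_db.customer s
LEFT JOIN dwh_db.customer t ON s.customer_id = t.customer_id
WHERE  t.customer_id IS NULL;

.LOGOFF;
.QUIT;
EOF
```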
Confidential, Medicaid, CA
Teradata Admin
Responsibilities:
- Responsible for and involved in setting up the Teradata database, users, roles, and profiles.
- Responsible for architectural solutions for the applications on the Teradata system.
- Used Erwin Data Modeler for data modeling, including star schema, snowflake schema, fact and dimension tables, and physical and logical models for the data warehouse and data marts.
- Responsible for interacting with users, helping them write proper SQL, and identifying tuning opportunities.
- Worked closely with the BI integration and reporting teams on day-to-day issues and their resolution.
- Proactively and periodically checked with users on issues and upcoming tasks.
- Responsible for capacity planning of the data warehouse and for planning its growth roadmap.
- Primary DBA for Performance tuning and recommendations for applications.
- Analyzed and validated source data; designed source-to-target mappings and transformation logic; documented the approach, scripts, and database layout.
- Developed various mappings and transformations using Informatica Designer.
- Expertise in implementing complex business rules by creating robust mappings, mapplets, and reusable transformations using Informatica PowerCenter and PowerMart.
- Worked with developers to promote their code and solve day-to-day issues.
- Produced documentation and procedures for best practices in Teradata development and administration.
- Developed proactive processes for monitoring capacity and performance.
- Performed implementation analysis for Capacity on Demand (COD), CPU limits, and ResUsage monitoring.
- Worked with various user groups and developers to define TASM workloads, developed TASM exceptions, and implemented filters and throttles as needed.
- Monitored database space, identified tables with high skew, and worked with the data modeling team to change the primary index on those tables (see the skew-report sketch at the end of this section).
- Implemented various Teradata alerts using the alert facility in Teradata Manager; involved in setting up alerts to page the DBA for events such as node down, AMP down, too many blocked sessions, and high data skew.
- Involved in database upgrades, TTU client software upgrade.
- Used the Teradata Manager collection facility to set up AMP usage collection, canary query response, spool usage reporting, etc.
- Worked on capacity planning and produced disk and CPU usage growth reports using Teradata Manager, DBQL, and ResUsage.
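
A hedged sketch of the kind of skew report used to find candidate tables, run here through BTEQ; the logon details are placeholders and the 20 percent threshold is an arbitrary illustration.

```bash
#!/bin/bash
# Report tables whose data is unevenly distributed across AMPs (high skew).
# The logon string and the 20% threshold are placeholders.

bteq <<'EOF'
.LOGON tdprod/dba_user,dba_password;

SELECT  DatabaseName,
        TableName,
        SUM(CurrentPerm) AS total_perm,
        MAX(CurrentPerm) AS max_amp_perm,
        AVG(CurrentPerm) AS avg_amp_perm,
        100 * (1 - AVG(CurrentPerm) / NULLIFZERO(MAX(CurrentPerm))) AS skew_pct
FROM    DBC.TableSizeV
GROUP BY DatabaseName, TableName
HAVING  100 * (1 - AVG(CurrentPerm) / NULLIFZERO(MAX(CurrentPerm))) > 20
ORDER BY skew_pct DESC;

.LOGOFF;
.QUIT;
EOF
```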
Confidential, MI
Teradata Developer
Responsibilities:
- Developed scripts to load data into the base tables in the EDW, and to move data from source to staging and from staging to target tables, using the Teradata FastLoad, MultiLoad, and BTEQ utilities (see the FastLoad sketch at the end of this section). Wrote scripts for data cleansing, validation, and transformation of data coming from different source systems.
- Performed application-level DBA activities: creating tables and indexes, and monitoring and tuning Teradata BTEQ scripts using the Teradata Visual Explain utility.
- Wrote complex SQL using joins, subqueries, and correlated subqueries; used SQL queries for cross-verification of data.
- Developed Teradata macros and stored procedures to load data into incremental/staging tables and then move it from staging into the base tables.
- Reviewed SQL for missing joins and join constraints, data format issues, mismatched aliases, and casting errors.
- Developed procedures to populate the customer data warehouse with transaction data, cycle and monthly summary data, and historical data.
- Dealt with initial, delta, and incremental data, as well as migration data, to load into Teradata.
- Analyzed data and implemented multi-value compression for optimal space usage (see the compression sketch at the end of this section).
- Performed query analysis using EXPLAIN to check for unnecessary product joins, confidence factors, join types, and the order in which tables are joined.
- Very good understanding of Database Skew, PPI, Join Methods and Join Strategies, Join Indexes including sparse, aggregate and hash.
- Used the Teradata Analyst Pack extensively, including Teradata Visual Explain, Teradata Index Wizard, and Teradata Statistics Wizard.
- Used derived tables, volatile tables, and global temporary tables (GTTs) extensively in many of the ETL scripts.
- Loaded flat files into the database using FastLoad and then used them in queries for joins.
- Used SQL to query the databases and pushed as much processing as possible into Teradata, applying query optimization techniques (EXPLAIN plans, collect statistics, data distribution across AMPs, primary and secondary indexes, locking, etc.) to achieve better performance.
- Used PMON and Teradata Manager to monitor the production system during the online day.
- Excellent experience in performance tuning and query optimization of the Teradata SQLs.
- Developed mappings in Ab Initio to load the data from various sources using various Ab Initio Components such as Partition by Key, Partition by round robin, Reformat, Rollup, Join, Scan, Normalize, Gather, Merge etc.
- Created checkpoints and phases to avoid deadlocks, tested the graphs with sample data, and then committed the graphs and related files into the repository from the sandbox environment. Scheduled the graphs using Autosys and loaded data into the target tables from the staging area using SQL*Loader.
- Implemented data parallelism using the multi-file system and partition and departition components, and performed repartitioning to improve overall performance.
- Developed graphs separating the Extraction, Transformation and Load process to improve the efficiency of the system.
- Involved in designing Load graphs using Ab Initio and Tuned Performance of the queries to make the load process run faster.
- Extensively used Partition components and developed graphs using Write Multi-Files, Read Multi-Files, Filter by Expression, Run Program, Join, Sort, Reformat, and Dedup.
- Used Data profiling task to identify problems in the data that have to be fixed.
- Performed validations, Data Quality checks and Data profiling on incoming data.
- Used Enterprise Meta Environment (EME) for version control, Control-M for scheduling purposes.
- Used AIR commands to perform dependency analysis for all Ab Initio objects.
- Tested and tuned Ab Initio graphs and Teradata SQL for better performance.
- Developed UNIX shell scripts to run batch jobs in Autosys and load data into production.
- Interacted with different teams to diagnose job failures in production systems, provide solutions, restart jobs, and ensure jobs completed within the specified time window.
- Provided 24x7 production support for the Teradata ETL jobs on daily, weekly, and monthly schedules.
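
A simplified sketch of one of the FastLoad load scripts described above; the logon string, input file, column layout, and table names are placeholders.

```bash
#!/bin/bash
# Illustrative FastLoad of a pipe-delimited flat file into an empty staging table.
# The logon string, file path, and table names are placeholders.

fastload <<'EOF'
LOGON tdprod/etl_user,etl_password;

SET RECORD VARTEXT "|";

DEFINE customer_id (VARCHAR(18)),
       customer_nm (VARCHAR(60)),
       open_dt     (VARCHAR(10))
FILE = /data/inbound/customer_20150101.dat;

BEGIN LOADING stg_db.customer_stg
      ERRORFILES stg_db.customer_err1, stg_db.customer_err2
      CHECKPOINT 100000;

INSERT INTO stg_db.customer_stg
( customer_id, customer_nm, open_dt )
VALUES
( :customer_id, :customer_nm, :open_dt );

END LOADING;
LOGOFF;
EOF
```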
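An illustrative example of applying multi-value compression, again via BTEQ; the table, columns, and compressed value lists are hypothetical, chosen only to show the technique of compressing frequent low-cardinality values.

```bash
#!/bin/bash
# Illustrative multi-value compression (MVC): compress the most frequent values
# of low-cardinality columns to reclaim perm space. All names/values are placeholders.

bteq <<'EOF'
.LOGON tdprod/etl_user,etl_password;

CREATE TABLE dwh_db.customer_new
( customer_id   INTEGER NOT NULL,
  state_cd      CHAR(2)     COMPRESS ('CA','NY','TX','FL'),
  status_cd     CHAR(1)     COMPRESS ('A','I'),
  customer_nm   VARCHAR(60)
)
PRIMARY INDEX ( customer_id );

-- Copy the data, swap the tables, and re-collect statistics.
INSERT INTO dwh_db.customer_new SELECT * FROM dwh_db.customer;
RENAME TABLE dwh_db.customer     TO dwh_db.customer_old;
RENAME TABLE dwh_db.customer_new TO dwh_db.customer;
COLLECT STATISTICS ON dwh_db.customer COLUMN (customer_id);

.LOGOFF;
.QUIT;
EOF
```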
Confidential
Teradata Developer
Responsibilities:
- Involved in Logical Data Modeling and Physical Data Modeling.
- Monitored the production system as needed and controlled online and batch jobs.
- Worked on exporting data to flat files using Teradata Fast Export (FEXPORT).
- Query optimization (Explain plans, Collect statistics, Primary and Secondary indexes).
- Attending the calls with Business and other technical groups to resolve the production issues.
- Built tables and views with USIs and NUSIs (see the DDL sketch at the end of this section).
- Created separate table spaces for Users, Indexes, Rollback Segments and Temporary Segments. Renamed and resized redo log files, user data files, rollback files, index files and temporary files.
- Created and maintained Triggers, Packages, Functions and Procedures.
- Created and maintained roles and profiles for security and limited access of data.
- Worked with the users and testing teams to implement the business logic as expected.
- Wrote several Teradata BTEQ scripts to implement the business logic.
- Worked on creating users, databases, tables, and other objects using WinDDI and BTEQ scripts.
- Developed several complex DataStage mappings using transformations such as Lookup, Router, Update Strategy, Aggregator, Filter, Joiner, and Sorter to incorporate business rules.
- Involved in creating roles and profiles, and assigning them to users to control access to the system.
- Used Teradata Manager and PMON to monitor the system.
- Experience using NetBackup for backup and recovery tasks.
- Familiar with the priority scheduler for assigning priorities to different load operations.
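
A brief sketch of the table, index, and view definitions referred to above, run through BTEQ; the database, table, and column names are placeholders.

```bash
#!/bin/bash
# Illustrative Teradata DDL: a table with a unique secondary index (USI)
# and a non-unique secondary index (NUSI). All object names are placeholders.

bteq <<'EOF'
.LOGON tdprod/dev_user,dev_password;

CREATE TABLE dwh_db.orders
( order_id     INTEGER NOT NULL,
  order_nbr    CHAR(12) NOT NULL,
  customer_id  INTEGER,
  order_dt     DATE FORMAT 'YYYY-MM-DD',
  order_amt    DECIMAL(12,2)
)
PRIMARY INDEX ( order_id );

/* Unique secondary index (USI) for direct single-row lookups by order number. */
CREATE UNIQUE INDEX order_nbr_usi ( order_nbr ) ON dwh_db.orders;

/* Non-unique secondary index (NUSI) to support customer-based access paths. */
CREATE INDEX customer_nusi ( customer_id ) ON dwh_db.orders;

/* A reporting view restricted to the current year. */
REPLACE VIEW dwh_db.v_orders_current AS
SELECT order_id, order_nbr, customer_id, order_dt, order_amt
FROM   dwh_db.orders
WHERE  EXTRACT(YEAR FROM order_dt) = EXTRACT(YEAR FROM CURRENT_DATE);

.LOGOFF;
.QUIT;
EOF
```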