Hadoop Admin Resume
Pittsburgh, PA
SUMMARY
- 10+ years of IT experience in Analysis, Design, Development, Implementation and Testing of enterprise-wide applications, Data Warehouses, Client-Server Technologies and Web-based Applications.
- Over 3 years of experience working with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Nagios, Spark, Impala, Oozie and Flume for Big Data and Big Data Analytics.
- In-depth understanding of Hadoop architecture and its components, including HDFS, NameNode, JobTracker, DataNode, TaskTracker and MapReduce concepts.
- Experience in installation, configuration, support and management of a Hadoop Cluster.
- Experience in task automation using Oozie, cluster coordination through ZooKeeper and MapReduce job scheduling using the Fair Scheduler.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Experience in writing custom UDFs to extend Hive and Pig core functionality.
- Worked with Sqoop to import/export data between relational databases and Hadoop, and used Flume to collect data and populate HDFS (a brief Sqoop sketch follows this summary).
- Worked with HBase to perform quick lookups (updates, inserts and deletes) in Hadoop.
- Experience in working with cloud infrastructure like Amazon Web Services (AWS) and Rackspace.
- Experience in Core Java and Hadoop MapReduce programming; loaded data from RDBMS sources into the Hive data warehouse.
- Experience in writing Pig Latin and using the Pig interpreter to run MapReduce jobs.
- Experience in storing and managing data with the HCatalog data model.
- Experience in writing SQL queries to perform joins across Hive tables and NoSQL databases.
- Experience in Agile methodology, project tracking and bug tracking using JIRA.
- Working experience in designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie and ZooKeeper.
- Experience in DBMS Utilities such as SQL, PL/SQL, TOAD, SQL*Loader, Teradata SQL Assistant.
- Experienced with Teradata utilities FastLoad, MultiLoad, BTEQ scripting, FastExport, OleLoad and SQL Assistant.
- Exploited the OLAP analytical power of Teradata using OLAP functions such as RANK, QUANTILE, CSUM, MSUM and GROUP BY GROUPING SETS to generate detailed reports for marketing stakeholders.
- Worked with transform components such as Aggregate, Router, Sort, Filter by Expression, Join, Normalize and Scan; created appropriate DMLs and automated load processes using Autosys.
- Extensively worked on several ETL assignments to extract, transform and load data into tables as part of data warehouse development with highly complex data models based on relational, star and snowflake schemas.
- Experienced in all phases of Software Development Life Cycle (SDLC).
- Expert knowledge in using various transformation components such as Join, Lookup, Update, Router, Normalize, Denormalize, and Partition/De-partition components.
- Experience in Data Modeling, Data Extraction, Data Migration, Data Integration, Data Testing and Data Warehousing using Ab Initio.
- Configured Informatica environment to connect to different databases using DB config, Input Table, Output Table, Update table Components.
- Able to interact effectively with members of Business Engineering, Quality Assurance, user groups and other teams involved in the system development life cycle.
- Excellent communication skills, interacting with stakeholders at all levels across projects and playing an active role in business analysis.
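A minimal sketch of the Sqoop import/export usage referenced above; the JDBC URL, credentials, table names and HDFS paths are hypothetical placeholders, not values from an actual engagement.

    #!/bin/bash
    # Hypothetical Sqoop sketch: move data between an RDBMS and Hadoop.
    # Connection details, table names and paths below are placeholders.

    # Import an RDBMS table into HDFS with four parallel mappers.
    sqoop import \
      --connect jdbc:oracle:thin:@dbhost.example.com:1521:ORCL \
      --username etl_user -P \
      --table CUSTOMER_TXNS \
      --target-dir /user/etl/customer_txns \
      --num-mappers 4

    # Export aggregated results from HDFS back into a relational reporting table.
    sqoop export \
      --connect jdbc:oracle:thin:@dbhost.example.com:1521:ORCL \
      --username etl_user -P \
      --table CUSTOMER_TXNS_SUMMARY \
      --export-dir /user/etl/customer_txns_summary \
      --input-fields-terminated-by ','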
TECHNICAL SKILLS
Big Data Ecosystem: Cloudera, Hortonworks, MapR, Hadoop, HDFS, HBase, ZooKeeper, Nagios, Hive, Pig, Ambari, Spark, Impala
Utilities: Oozie, Sqoop, HBase, NoSQL, Cassandra, Flume.
Data Warehousing Tools: Informatica 6.1/7.1x/9.x, DataStage
Data Modeling: Star-Schema Modeling, Snowflakes Modeling, Erwin 4.0, Visio
RDBMS: Oracle 11g/10g/9i/8i, Teradata 13.0/V2R6/4.6.2, DB2, MS SQL Server 2000/2005/2008
Programming: UNIX Shell Scripting, Korn Shell, C/C++, Java, SQL*Plus, PL/SQL, HTML
Operating Systems: Windows NT/XP/2000, UNIX, Linux (Red Hat)
BI Tools: Business Objects 4.0, Base SAS 9.3
PROFESSIONAL EXPERIENCE
Confidential, Pittsburgh, PA
Hadoop Admin
Responsibilities:
- Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, HBase, ZooKeeper and Sqoop.
- Extensively involved in the installation and configuration of Cloudera Distribution Hadoop: NameNode, Secondary NameNode, ResourceManager, NodeManagers and DataNodes.
- Collected log data from web servers and ingested it into HDFS using Flume.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration (see the sketch after this section).
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Developed a data pipeline using Flume, Sqoop and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Implemented NameNode High Availability and performed Hadoop cluster capacity planning for adding and removing nodes.
- Installed and configured Hive and HBase.
- Managed identity, authorization and authentication, including Kerberos setup.
- Configured Sqoop and exported/imported data into HDFS.
- Configured NameNode high availability and NameNode federation.
- Loaded data from the UNIX local file system into HDFS.
- Used Sqoop to import and export data between HDFS and relational databases.
- Performed data analysis by running Hive queries.
Environment: Hadoop, HDFS, MapReduce, Yarn, Hive, Pig, Sqoop, Oozie, Flume, Zookeeper, Ubuntu
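A minimal sketch of the DataNode decommissioning step mentioned above; the exclude-file path and hostname are assumptions for illustration and depend on how dfs.hosts.exclude is configured on the cluster.

    #!/bin/bash
    # Hypothetical sketch: gracefully decommission a DataNode.
    # The exclude-file path and hostname are placeholders.

    EXCLUDE_FILE=/etc/hadoop/conf/dfs.exclude    # file referenced by dfs.hosts.exclude
    NODE=datanode07.example.com

    # 1. List the node in the exclude file so the NameNode re-replicates its blocks elsewhere.
    echo "${NODE}" >> "${EXCLUDE_FILE}"

    # 2. Ask the NameNode to re-read its include/exclude lists.
    hdfs dfsadmin -refreshNodes

    # 3. Watch the node's status until it reports "Decommissioned".
    hdfs dfsadmin -report | grep -A 2 "${NODE}"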
Confidential, Germantown, MD
Hadoop Admin
Responsibilities:
- Developed and implemented platform architecture as per established standards.
- Supported integration of reference architectures and standards.
- Utilized Big Data technologies to produce technical designs, and prepared architectures and blueprints for Big Data implementations.
- Assisted in designing, development and architecture of Hadoop clusters and HBase systems.
- Coordinated with technical teams for the installation of Hadoop and related third-party applications.
- Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Supported technical team members for automation, installation and configuration tasks.
- Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
- Evaluated and documented use cases and proofs of concept, and participated in learning new tools in the Big Data ecosystem.
- Developed process frameworks and supported data migration on Hadoop systems.
- Worked on a Data Lake architecture to collate enterprise data into a single place, enabling correlation and analysis to find operational and functional issues in enterprise workflows as part of this project.
- Designed ETL flows to extract data from various sources, transform it for further processing, and load it into Hadoop/HDFS for easy access and analysis by various tools.
- Developed multiple proofs of concept to demonstrate the viability of the ETL solution, including performance and compliance with non-functional requirements.
- Conducted Hadoop training workshops for development teams as well as directors and management to increase awareness.
- Prepared presentations of Big Data/Hadoop solutions to business cases and presented them to company directors to get the go-ahead on implementation.
- Collaborated with the Hortonworks team for technical consultation on business problems and validated the proposed architecture/design.
- Designed an end-to-end ETL flow for a feed with millions of records arriving daily, using the Apache tools/frameworks Hive, Pig, Sqoop and HBase for the entire ETL workflow.
- Set up the Hadoop cluster, built Hadoop expertise across development, production support and testing teams, enabled production support functions, and optimized Hadoop cluster performance both in isolation and in the context of production workloads/jobs.
- Designed the Data Model to be used for correlation in Hadoop/Hortonworks.
- Designed Data flow and transformation functions for cleansing call records generated on various networks as well as reference data.
- Supported technical team members in management and review of Hadoop log files and data backups.
- Designed and proposed an end-to-end data pipeline using Falcon and Oozie through POCs (see the Oozie sketch after this section).
- Used Nagios to configure cluster/server-level alerts and notifications in case of service failures or glitches.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, Spark, Impala, ZooKeeper, Nagios, Hortonworks HDP 2.0/2.1, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux.
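A minimal sketch of how an Oozie-driven pipeline like the one above can be submitted and monitored from the shell; the Oozie URL, HDFS application path and property values are assumptions for illustration, and the workflow.xml is assumed to be already deployed to HDFS.

    #!/bin/bash
    # Hypothetical sketch: submit and monitor an Oozie workflow from the command line.
    # The Oozie URL, HDFS paths and property values are placeholders.

    OOZIE_URL=http://oozie-host.example.com:11000/oozie
    APP_PATH=hdfs://nameservice1/user/etl/workflows/daily-feed

    # Properties handed to the workflow at submission time.
    printf '%s\n' \
      "nameNode=hdfs://nameservice1" \
      "jobTracker=resourcemanager.example.com:8032" \
      "oozie.wf.application.path=${APP_PATH}" \
      "runDate=$(date +%Y-%m-%d)" > job.properties

    # Submit and start the workflow; the CLI prints "job: <id>".
    JOB_ID=$(oozie job -oozie "${OOZIE_URL}" -config job.properties -run | awk '{print $2}')

    # Check the workflow status (RUNNING / SUCCEEDED / KILLED).
    oozie job -oozie "${OOZIE_URL}" -info "${JOB_ID}"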
Confidential, Memphis, TN
Hadoop Admin/Developer
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Extensively involved in the installation and configuration of Cloudera Distribution Hadoop (CDH3/CDH4): NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions (see the sketch after this section).
- Installed and configured Hadoop MapReduce and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs for data cleaning.
- Involved in building out a 70-node Hadoop cluster.
- Loaded data from the UNIX local file system into HDFS.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Involved in developing new workflow MapReduce jobs using the Oozie framework.
- Collected log data from web servers and ingested it into HDFS using Flume.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Used Pig as an ETL tool to perform transformations, event joins and pre-aggregations before storing the data in HDFS.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Created Hive external/internal tables, loaded data and wrote Hive UDFs.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
- Involved in migrating ETL processes from Oracle to Hive to evaluate ease of data manipulation.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Created Hive external tables, loaded data into them and queried the data using HQL.
- Created Hive queries to compare raw data with EDW reference tables and perform aggregations.
- Wrote shell scripts to automate rolling day-to-day processes.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, CDH4, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux.
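A minimal sketch of the kind of daemon health-check script described above; the daemon list, log path and alert address are assumptions for illustration, and the mail command presumes a configured MTA.

    #!/bin/bash
    # Hypothetical sketch: check that core Hadoop daemons are running on this node
    # and raise an alert when one is missing. Names and addresses are placeholders.

    DAEMONS="NameNode SecondaryNameNode DataNode JobTracker TaskTracker"
    ALERT_TO="hadoop-ops@example.com"
    LOG=/var/log/hadoop-healthcheck.log

    for daemon in ${DAEMONS}; do
        # jps prints the main class of each running JVM; -w matches the whole word.
        if jps | grep -qw "${daemon}"; then
            echo "$(date '+%F %T') ${daemon} OK" >> "${LOG}"
        else
            msg="$(date '+%F %T') ${daemon} is DOWN on $(hostname)"
            echo "${msg}" >> "${LOG}"
            echo "${msg}" | mail -s "Hadoop daemon alert: ${daemon}" "${ALERT_TO}"
        fi
    done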
Confidential, Orlando, FL
Lead SAP Business Objects Developer
Responsibilities:
- Interacted with account managers for various drug manufacturers to gather new requirements.
- Participated in Business Requirements Document review sessions.
- Prepared Business requirements documents and non-functional requirements documents.
- Worked as the onsite coordinator, leading an offshore team of 8.
- Performed requirement analysis of various BO reports and Universes.
- Worked on Technical specifications, Object mapping and GAP analysis.
- Debugged the universe.
- Designed, developed and maintained Universe for supporting specialty pharmacy Ad hoc and canned reports.
- Worked on building Universes on top of Teradata.
- Analyzed legacy PL/SQL reports and converted them into BO reports.
- Created measure objects and user objects keeping end-user requirements in view.
- Created filters (condition objects) to improve report generation.
- Created hierarchies to provide drill-down options for the end user.
- Created contexts and aliases to resolve loops in the universe.
- Designed and implemented data masking as per HIPAA and state-specific privacy laws.
- Involved in Unit Testing, Integration Testing & User Acceptance Testing.
- Involved in creating Test plans and test scripts for reports and universe testing
- Worked on Performance Tuning of Universe and Reports
- Successfully implemented around 150 WebI reports and Universes into production.
- Worked on production support.
Environment: Windows XP, SAP Business Objects XI 3.1(Web Intelligence, Designer), Teradata.
Confidential, Blue Bell, PA
Informatica Developer/Data Warehouse Developer
Responsibilities:
- Worked with Architect, Business Analyst and Project Manager in data model design, development, testing, migration and scheduling of Informatica jobs.
- Worked on the Data Domain BRS (DD Backup Recovery System) migration project to deliver customer data for analytical use and to support Greenplum data warehouse users.
- Worked on the GDW Logistics project to deliver customer data for analytical use and to support GDW warehouse users.
- Imported source and target tables from Salesforce, GDW and Greenplum systems. Created mappings in Mapping Designer using transformations such as Source Qualifier, Expression, Lookup, Update Strategy, Sequence Generator and Union.
- Created sessions and workflows in Workflow Manager; used the Greenplum writer option for fast loading of data with the Merge option (both INSERT and UPDATE) and TRUNCATE/INSERT for deleting records from temporary target tables.
- After loading the target table based on the last update date from the parameter file, reloaded the control table for the next run; the control table contains columns such as session number (to track all sessions or objects), workflow name, session name, start and end run times, control id (sequence key) and number of rows.
- Used the Informatica persistent variables method in one workflow to load records from the Salesforce cloud database based on a key column in the source table.
- Used Shortcuts (Global/Local) to reuse objects without creating multiple objects in the repository and inherit changes made to the source automatically.
- Migrated mappings, sessions and workflows from Development to Testing and then to Production environments using labels and deployment groups.
- Performed unit, integration and system level performance testing. Associated with production support team in various performance related issues. Participated in testing with TCOE team and solved the issues raised by TCOE team.
- Created Unit test/QA Peer review and Technical Design documents.
- Created Control-M jobs using the Control-M tool and ran jobs through the Control-M Enterprise Manager.
- Maintained warehouse metadata and naming standards for future application development, translating high-level design documents into simple ETL coding and mapping standards.
Environment: Informatica PowerCenter, Oracle 10g, Business Objects, Salesforce, pgAdmin, WinSQL, WinSCP, SAP R/3, ECC, SAP BW, Data Loader, Force.com Explorer, Windows XP and 7, UNIX Shell Scripting, Control-M, SQL*Loader.
Confidential, Blue Bell, PA
SAP Business Objects Consultant
Responsibilities:
- Coordinated directly with users to gather new requirements and worked with them to create use cases.
- Managed users via the CMC (created users, assigned different levels of access, deleted users); also created folders and granted access to users.
- Created folders and subfolders for different report categories (Sales & Marketing, Contracts, Membership, Underwriting, Claims), which helped in providing access to users.
- Designed, developed and maintained Universes supporting ad hoc and canned reports.
- Worked on building Universes on Sybase for HPS & SPA.
- Worked with the infrastructure team to define reference architecture of the various reporting tools.
- Performed analysis of various BO reports and how they would translate to the BI world.
- Debugged the universes, contexts and the various BO reports for conversion to equivalent BI Web reports.
- Created measure objects and user objects keeping end-user requirements in view.
- Created filters (condition objects) to improve report generation.
- Created hierarchies to provide drill-down options for the end user.
- Created contexts and aliases to resolve loops in the universe.
- Modified existing universes: added new objects, edited joins and mapped existing objects to renamed tables.
- Created new connections and edited existing ones to run reports against different databases such as test and development.
- Used @Functions such as @AggregateAware, @Select, @Prompt and @Variable in universe design.
- Resolved fan trap and chasm trap problems.
- Detected cardinalities and checked the Integrity of Universes
- Created Dashboards using Dashboard Builder in Dashboard Manager
- Created database objects such as views, functions and stored procedures per the requirements of universe and report development.
- Created reports for Underwriting using a stored procedure that pulls data into a single table.
- Created monthly and summary reports for different domains such as Membership, Contracts, Claims Payable details and Sales by Product.
- Involved in Unit Testing, Integrated Testing & User Acceptance Testing of E-Watson reporting project
- Involved in creating test plans and test scripts for report testing and test runs in Quality Center.
- Worked on Performance Tuning of Universe and Reports
- Worked extensively on distribution, maintenance, and optimization of universes for Business Objects and Web Intelligence deployments.
Environment: Windows XP, Informatica, SAP Business Objects 6.5/XI (Desktop Intelligence, CMC, CMS, Application Foundation, Web Intelligence, Crystal Reports, Designer, Universe Builder, Import Wizard), SAP BW, Sybase, Rapid SQL, VB Macros, Mercury Quality Center, MS Office.