Hadoop Admin Resume
Irvine, CA
PROFESSIONAL SUMMARY:
- 8+ years of experience in the IT industry as an MS SQL DBA and Big Data consultant for banking, telecom, and financial clients.
- 3+ years of comprehensive experience as a Hadoop consultant and administrator across the ecosystem (HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, Drill, ZooKeeper, Avro, Oozie, HBase).
- Excellent understanding of Hadoop architecture and its components, including HDFS (Hadoop Distributed File System), JobTracker, TaskTracker, NameNode, and DataNode (Hadoop 1.x); YARN concepts such as ResourceManager and NodeManager (Hadoop 2.x); and the Hadoop MapReduce programming paradigm.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components (MapReduce, HDFS, Spark, Oozie, Hive, Sqoop, Pig, HBase) on CDH 3-5.8 and HDP 1-2.3 clusters.
- Good understanding of Kerberos and how it interacts with Hadoop and LDAP.
- Practical knowledge of each Hadoop daemon, the interactions between them, resource utilization, and dynamic tuning to keep the cluster available and efficient.
- Strong expertise in MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Experience operating Hadoop clusters and monitoring them with Cloudera Manager, Ambari, Nagios, and Ganglia.
- Experience managing and reviewing log files for Hadoop and ecosystem tools.
- Dealt with projects including importing and exporting data using Sqoop between HDFS and Relational Database Systems.
- Hands on experience in writing shell scripts.
- Experienced in deployment of Hadoop Cluster using Puppet tool.
- Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
- Experience with Hadoop shell commands and with verifying, managing, and reviewing Hadoop log files.
- In depth knowledge of JobTracker, TaskTracker, NameNode, DataNodes and MapReduce concepts.
- Strong analytical skills with ability to quickly understand client’s business needs and create specifications.
- Hands-on experience installing and configuring the Capacity Scheduler on Hadoop clusters.
- Experience with Knox and Ranger for authentication and authorization on Hadoop.
- Application development using RDBMS and Linux shell scripting.
- Experience performing major and minor upgrades of Hadoop clusters in Apache and Cloudera distributions.
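The day-to-day cluster checks and log review mentioned above can be sketched as a small shell routine. This is a hedged sketch: the log path and the DRY_RUN guard are illustrative conventions, not distribution defaults.

```shell
#!/bin/sh
# Sketch of routine HDFS health checks and log review.
# The NameNode log path below is an assumption; adjust for your distribution.
: "${DRY_RUN:=1}"   # DRY_RUN=1 prints commands; set DRY_RUN=0 to execute

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}

run hdfs dfsadmin -report                 # cluster capacity and DataNode liveness
run hdfs fsck / -files -blocks            # block-level filesystem check
run tail -n 100 /var/log/hadoop/hadoop-hdfs-namenode.log   # NameNode log review
```

In practice a wrapper like `run` makes the same script safe to review in change windows and executable during maintenance.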
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Cassandra, Zookeeper, Avro, Drill, Ambari, Spark, Hive, Pig, Sqoop, Oozie, Falcon and Flume.
J2EE Technologies: JSP 2.1 Servlets 2.3, JDBC, JMS, JNDI, JAXP, Java Beans
Operating Systems: DOS, Windows NT/2000/2003/2008/2012, Sun Solaris 2.6-8, Linux, and MacOS
Database: MS SQL Server 2000/2005/2008/2008 R2/2012/2014, Oracle 8/9/10, Access, Sybase, and IBM DB2
Database Tools: Profiler, DTS, Management Studio/Enterprise Manager, Query Analyzer
Web Servers: IIS 5.0/6.0, Apache Tomcat and Apache Http web server
Languages: T-SQL, C, C++, Java, VB.NET, VB 6.0, ASP.NET, XML, CGI, JSP, UNIX shell scripting
Reporting Tools: SQL Server Reporting Services, Crystal Reports
Applications: Microsoft Office, Erwin, MS VISIO 2003/2005
Networking: TCP/IP, Named Pipes, VIA
Design Tools: Erwin, Visio 2000, Designer 2000, Developer 2000, DTS, OLAP, Crystal Reports, PeopleSoft SQR, Informatica, Ab Initio
ETL Tools: Ascential DataStage, Informatica PowerCenter/PowerMart, DecisionStream, DBArtisan
PROFESSIONAL EXPERIENCE:
Confidential, Irvine, CA
Hadoop Admin
Responsibilities:
- Installed, configured and maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Oozie, Flume, Zookeeper and Sqoop.
- Installed and upgraded Cloudera CDH on production clusters and Hortonworks HDP versions on test clusters.
- Redistributed services from one host to another within the cluster to help secure the cluster and ensure high availability of services.
- Installed and configured Hadoop, MapReduce, HDFS (Hadoop Distributed File System), developed multiple MapReduce jobs in java for data cleaning.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, NameNode recovery, capacity planning, and slots configuration.
- Implemented NameNode backup using NFS for High availability.
- Used Kerberos 5-1.6.1 on CentOS 6 in CDH 5.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on HBase, Hive, and Impala.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
- Used Nagios and Ganglia as monitoring tools.
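The Sqoop import/export work above can be sketched as shell commands. The JDBC URL, credentials, table names, and HDFS directories are placeholders, and the DRY_RUN guard is an illustrative convention for reviewing commands before running them.

```shell
#!/bin/sh
# Sketch of Sqoop transfers between an RDBMS and HDFS.
# Connection string, user, tables, and paths are hypothetical.
: "${DRY_RUN:=1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

# Import an RDBMS table into HDFS for downstream Hive/MapReduce work
run sqoop import \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user -P \
    --table transactions \
    --target-dir /data/raw/transactions \
    --num-mappers 4

# Export aggregated results back to the RDBMS for reporting
run sqoop export \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user -P \
    --table transaction_summary \
    --export-dir /data/processed/transaction_summary
```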
Environment: Hadoop, MapReduce, Hive, HDFS, PIG, Sqoop, Oozie, Flume, HBase, Zookeeper, Ranger, Knox, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux.
Confidential, Mason, OH
Hadoop Administrator
Responsibilities:
- Installed and configured a CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
- Used Cloudera Manager extensively to manage multiple clusters with petabytes of data.
- Implemented Oracle Big Data Appliance for the production environment.
- Set up machines with network controls, static IPs, disabled firewalls, and swap memory.
- Regularly commissioned and decommissioned nodes depending on data volume.
- Worked on setting up high availability for major production cluster.
- Performed Hadoop version updates using automation tools.
- Implemented rack aware topology on the Hadoop cluster.
- Imported and exported structured data between relational databases, HDFS, and Hive using Sqoop.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Managed load balancers, firewalls in a production environment.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Troubleshot production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using distcp.
- Managed and scheduled jobs on the Hadoop cluster.
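A regular distcp copy to a remote cluster, like the backup task above, can be sketched as follows. The NameNode addresses and paths are placeholders, and the DRY_RUN guard is an illustrative convention rather than a distcp feature.

```shell
#!/bin/sh
# Sketch of a nightly disaster-recovery copy with distcp.
# Cluster addresses and directories below are examples, not production values.
: "${DRY_RUN:=1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

SRC=hdfs://prod-nn:8020/data/warehouse
DST=hdfs://dr-nn:8020/backups/warehouse

# -update copies only changed files; -p preserves permissions and timestamps
run hadoop distcp -update -p "$SRC" "$DST"
```

Scheduling this via cron (or an Oozie coordinator) gives the regular remote backup cadence the bullet describes.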
Environment: Hadoop, MapReduce, Hive, HDFS, PIG, Sqoop, Oozie, Flume, HBase, Zookeeper, Cloudera Distributed Hadoop, Cloudera Manager.
Confidential, San Diego, CA
Hadoop Administrator/ SQL Server DBA
Responsibilities:
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Analyzed system failures, identified root causes, and recommended corrective actions.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Retrieved data from HDFS into relational databases with Sqoop; parsed, cleansed, and mined meaningful data in HDFS using MapReduce for further analysis.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Effectively used Sqoop to transfer data between databases and HDFS.
- Designed and implemented PIG UDFS for evaluation, filtering, loading and storing of data.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate recurring workflows.
- Installed and configured SQL Server 2008 R2/2012 on DEV, TEST, and PRODUCTION servers.
- Migrated SQL Server 2005 to 2008, SQL Server 2008 to 2012, and DTS to SSIS (ETL).
- Scheduled jobs and monitored job failures to diagnose problems on a daily basis.
- Experience with AlwaysOn Availability Groups, transactional replication, and clustering, as well as log shipping.
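The partitioned Hive queries above can be sketched as a beeline invocation. The JDBC URL, table, and partition column are assumptions, and the DRY_RUN guard is an illustrative convention.

```shell
#!/bin/sh
# Sketch of querying a daily-partitioned Hive table for BI reporting.
# The HiveServer2 URL and the web_logs/dt names are hypothetical.
: "${DRY_RUN:=1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

# Filtering on the partition column (dt) lets Hive prune partitions
QUERY="SELECT page, COUNT(*) AS hits FROM web_logs WHERE dt = '2015-06-01' GROUP BY page"

run beeline -u jdbc:hive2://hiveserver:10000/default -e "$QUERY"
```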
Environment: Hadoop, HDFS, MapReduce, Hive, Oozie, Java (jdk1.6), Cloudera, MySQL, Windows Server 2008 R2/2012/2012 R2, MS SQL Server 2012/2008 R2, SSIS, SSRS, SSAS, Erwin, Visual Basic 6.0, SQL Azure, Crystal Reports and Ganglia.
Confidential
SQL Server DBA
Responsibilities:
- Responsible for logical and physical database design for tables in SQL Server 2008/2008 R2/2005 databases.
- Tuned databases to perform efficiently by growing or shrinking database files as needed, and managed the size of the transaction log using DBCC SHRINKFILE.
- Managed databases on multiple disks using Disk Mirroring and RAID technology.
- Evaluated data storage considerations to store databases and transaction logs.
- Created database stored procedures and functions using SQL Server Management Studio.
- Performed extract, transform, and load (ETL) of physical data between source and destination servers by means of SSIS/DTS packages and workflows.
- Used log shipping for synchronization of databases and supported Active/Passive clusters.
- Maintained the database consistency with DBCC at regular intervals. Involved in troubleshooting and fine-tuning of databases for its performance and concurrency.
- Completed documentation about the database. Analyzed long running, slow queries and tuned the same to optimize application and system performance.
- Monitored event and server error logs for troubleshooting.
- Setup SQL Server configuration settings and transactional replication on production SQL servers.
- Performed daily backup and created named backup devices to backup the servers regularly.
- Generated Script files of the databases whenever changes were made to stored procedures or views.
- Scheduled tasks for transformation of data from heterogeneous environment.
- Worked on DTS and BCP Import and Export utility for transferring data.
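The backup and consistency-check duties above can be sketched as a scheduled shell wrapper around sqlcmd. Server, database, backup device, and log file names are placeholders, and the DRY_RUN guard is an illustrative convention.

```shell
#!/bin/sh
# Sketch of nightly SQL Server maintenance driven through sqlcmd.
# PRODSQL01, SalesDB, and the device/file names are hypothetical.
: "${DRY_RUN:=1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

SERVER=PRODSQL01
DB=SalesDB

# Full backup to a named backup device
run sqlcmd -S "$SERVER" -Q "BACKUP DATABASE [$DB] TO SalesDB_Backup WITH INIT"

# Consistency check, then shrink the log file to a target size (MB)
run sqlcmd -S "$SERVER" -d "$DB" -Q "DBCC CHECKDB"
run sqlcmd -S "$SERVER" -d "$DB" -Q "DBCC SHRINKFILE (SalesDB_Log, 1024)"
```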
Environment: Windows Server 2008/2008 R2, MS SQL Server 2008/2008 R2, Visual Basic 6.0, .NET, PL/SQL, VB, ASP.
Confidential
Jr. SQL Server DBA/ Developer
Responsibilities:
- Managed production SQL Servers in virtual and physical environments.
- Installed and configured SQL Server 2005/2008.
- Supported the business 24/7, maintaining 99.999% uptime depending on application and business SLA requirements.
- Involved in upgrading SQL Server 2000 instances to SQL Server 2008.
- Supported the configuration, deployment and administration of vendor and in-house applications interfacing with Oracle and SQL Server databases.
- Supported installation of Oracle Applications R12/11i in single-node and multi-node configurations.
- Involved in database design, database standards, and T-SQL code reviews.
- Configured Active/Active and Active/passive SQL Server Clusters.
- Implemented mirroring and log shipping for disaster recovery.
- Used Send Mail, Bulk Insert, Execute SQL, Data Flow, and Import/Export controls extensively in SSIS.
- Performed multi-file imports, package configuration, and debugging of tasks and scripts in SSIS.
- Scheduled package execution in SSIS.
- Installed SSIS packages on multiple servers.
- Worked extensively on SSIS for data manipulation, data extraction, and ETL loads.
- Created extensive reports using SSRS (Tabular, Matrix).
- Configured transactional and snapshot replication and managed publications and articles.
- Proactively involved in SQL Server Performance tuning at Database level, T-SQL level and operating system level. Maintained database response times and proactively generated performance reports.
- Automated most of the DBA Tasks and monitoring stats.
- Responsible for SQL Server Edition upgrades and SQL Server patch management.
- Created a mirrored database for reporting using Database Mirroring with high performance mode.
- Created database snapshots and stored procedures to load data from the snapshot database to the report database.
- Involved in data modeling for the application and created ER diagrams using Erwin and VISIO.
- Created Schemas, Logins, Tables, Clustered and Non-Clustered Indexes, Views, Functions and Stored Procedures.
- Troubleshot and supported all user problems using SQL Server Profiler, network traces, Windows logs, etc.
- Migrated DTS packages to SSIS packages.
- Monitored event and server error logs for troubleshooting.
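The error-log monitoring above can be sketched as a sweep across servers with sqlcmd and xp_readerrorlog. The server names and the search term are assumptions, and the DRY_RUN guard is an illustrative convention.

```shell
#!/bin/sh
# Sketch of a daily error-log sweep across SQL Server instances.
# SQLPROD01/SQLPROD02 are hypothetical server names.
: "${DRY_RUN:=1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

for SERVER in SQLPROD01 SQLPROD02; do
    # Read the current error log (0) and filter lines containing 'Error'
    run sqlcmd -S "$SERVER" -Q "EXEC xp_readerrorlog 0, 1, N'Error'"
done
```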
Environment: MS SQL server 2000/2005/2008, Windows 2003/2008, Enterprise Manager, Query Analyzer, SQL Server Profiler, SSIS, SSRS, Erwin, VISIO.