- 7+ years of experience as a DBA across various phases of project implementation, including system integration and big data technologies in the Hadoop ecosystem: HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Administration, testing, and change control processes, including Hadoop administration activities such as installation, configuration, and maintenance of clusters.
- Experience in installing, configuring, supporting, and monitoring Hadoop clusters using Cloudera and Hortonworks distributions.
- Experience in data integrity, recovery, disaster recovery planning, contingency planning, research & development, and cost-benefit analysis.
- Technical support, user training, and documentation for various technologies, primarily Linux/UNIX and big data systems, in diverse industries.
- Expertise in setting up, configuring, and monitoring Hadoop clusters using Cloudera CDH3, CDH4, Apache tarballs, Apache Tomcat, and Hortonworks Ambari on Ubuntu, Red Hat, CentOS, and Windows.
- Experience in design, development, maintenance, and support of big data analytics using Hadoop ecosystem components such as HDFS, Hive, Pig, HBase, Sqoop, Flume, MapReduce, Kafka, and Oozie.
- Knowledge of the MapReduce and HDFS frameworks.
- Experience in infrastructure recommendations: data capacity, node forecasting, and planning.
- Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
- Experience in user and group management with Hortonworks Ambari.
- Experience in big data domains such as shared services (Hadoop clusters, operational model, inter-company chargeback, and lifecycle management).
- Expertise in distributed system performance.
- Expertise in collaborating across multiple technology groups and getting things done.
- Worked on NoSQL databases including HBase and MongoDB, plugging them into the Hadoop ecosystem; hands-on experience "productionalizing" Hadoop applications (i.e., administration of the Hadoop ecosystem).
- Exposure to installing Hadoop and its ecosystem components such as Hive and Pig.
- Experience in setting up clusters on Amazon EC2 and S3, including automating the setup and extension of clusters in the AWS cloud.
- Experience in Cloudera Hadoop upgrades and patches, installation of ecosystem products through Cloudera Manager along with Cloudera Manager upgrades, and installing and supporting operating systems and hardware on CentOS/RHEL.
- Administrative support for parallel computation research on a 24-node Fedora/Linux cluster.
- Expertise in exposing Hive views in Excel PowerPivot, analyzing statistical data, storage capacity management, and performance tuning.
- Experience in systems and network design; physical system consolidation through server and storage virtualization, and remote access solutions.
- Experience in understanding and managing Hadoop log files and in managing Hadoop infrastructure with Cloudera Manager; involved in building big data clusters and successfully performed installation of CDH using Cloudera Manager.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Extensive experience in architecting Hadoop clusters and in the setup, configuration, and management of security for Hadoop clusters.
- Experience in Linux admin activities and in IT system design, analysis and management.
- Experience in HDFS data storage and support for running map-reduce jobs.
Programming Languages: Java, HTML, SQL, Pig, XML, UNIX shell scripts
Hadoop/Big Data: Hadoop, HDFS, Hive, Sqoop, Oozie, Flume and MapReduce
Scripting Language: Shell
NoSQL Databases: HBase, MongoDB
Monitoring Tools: Ambari, Cloudera Manager (CDH4)
Operating Systems: Linux, Unix, Ubuntu, CentOS, Windows
Databases: NoSQL, MongoDB
Technologies: Eclipse, RDBMS
Protocols: TCP/IP, HTTP, DNS
Confidential, Irving, TX
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Apache Hadoop, Hive and Pig environment on Amazon EC2.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Configured a MySQL database to store the Hive metastore metadata.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
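A local sketch of the streaming pattern described above (file names and sample input are illustrative, not from the project): the mapper tokenizes the input, `sort` plays the role of the shuffle phase, and `uniq -c` acts as the reducer. On a real cluster the same mapper/reducer stages would be submitted through hadoop-streaming.jar.

```shell
#!/bin/sh
# Local simulation of a Hadoop streaming word-count job.
printf 'error warn error\ninfo error\n' > sample.log   # stand-in for HDFS input

tr -s '[:space:]' '\n' < sample.log |    # mapper: emit one token per line
  sort |                                 # shuffle: group identical keys
  uniq -c |                              # reducer: count each key
  awk '{print $2 "\t" $1}' > counts.tsv  # emit word<TAB>count

cat counts.tsv
```

The same pipeline scales to terabytes once the mapper and reducer run as streaming scripts across the cluster.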
- Worked with Linux systems and MySQL database on a regular basis.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Responsible for managing data coming from different sources.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.0 cluster.
- Involved in implementing High Availability and automatic failover infrastructure to overcome the NameNode single point of failure, utilizing ZooKeeper services.
- Assisted the team in their development and deployment activities.
- Instrumental in preparing TDD and developing Java web services for WU applications for many of the money transfer functionalities.
- Involved in developing Database access components using Spring DAO integrated with Hibernate for accessing the data.
- Involved in writing HQL queries, Criteria queries and SQL queries for the Data access layer.
- Involved in managing deployments using XML scripts.
- Testing: unit testing through JUnit and integration testing in the staging environment.
- Followed Agile SCRUM principles in developing the project.
- Worked with big data developers, designers, and scientists in troubleshooting MapReduce job failures and issues with Hive, Pig, and Flume.
- Coordinated with offshore/onshore teams, collaborating and arranging weekly meetings to discuss and track development progress.
- Involved in coordinating for Unit Testing, Quality Assurance, User Acceptance Testing and Bug Fixing.
- Coordination with team, peer reviews and collaborative System level testing.
Confidential, Dubuque, IA
- Installed and configured various components of the Hadoop ecosystem and maintained their integrity; planned production cluster hardware and software installation and communicated with multiple teams to get it done.
- Good understanding and related experience with Hadoop stack - internals, Hive, Pig and Map/Reduce.
- Designed, configured and managed the backup and disaster recovery for HDFS data, commissioned data nodes when data grew and decommissioned when the hardware degraded.
- Involved in defining job flows.
- Troubleshooting many cloud-related issues such as DataNode down, network failure, and missing data blocks.
- Managing and reviewing Hadoop and HBase log files.
- Load and transform large sets of structured, semi structured and unstructured data.
- Worked with application teams to install Hadoop updates, patches, version upgrades as required, Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.0 cluster.
- Installed and configured Hive and wrote HiveQL scripts.
- Created shell scripts to clean log files and check disk space after cleaning, to restore data nodes, and to copy data across clusters.
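A minimal sketch of such a cleanup script, assuming rotated logs follow the usual `*.log.N` naming convention (the directory and retention window below are illustrative, not the project's actual values):

```shell
#!/bin/sh
# Delete rotated Hadoop logs older than a retention window, then report
# remaining disk space. LOG_DIR and RETENTION_DAYS are illustrative defaults.
LOG_DIR="${LOG_DIR:-./hadoop-logs}"
RETENTION_DAYS="${RETENTION_DAYS:-7}"

mkdir -p "$LOG_DIR"
# rotated logs look like datanode.log.1, namenode.log.2, ...
find "$LOG_DIR" -name '*.log.*' -type f -mtime +"$RETENTION_DAYS" -delete
# check disk space after cleaning
df -h "$LOG_DIR" | tail -1
```

In practice a script like this would run from cron on each node, with `LOG_DIR` pointed at the real Hadoop log partition.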
- Implemented partitioning, dynamic partitions, and buckets in Hive.
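As an illustration of that Hive layout (the table and column names here are hypothetical, not from the project): a partitioned, bucketed table lets queries prune by partition key while joins and sampling exploit the buckets.

```sql
-- Illustrative Hive DDL: partition by event_date, bucket by user_id.
CREATE TABLE page_views (
  user_id  BIGINT,
  url      STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Dynamic partitioning: let Hive derive event_date from the data.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE page_views PARTITION (event_date)
SELECT user_id, url, event_date FROM staging_page_views;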
- Successfully performed installation of CDH4 - Cloudera’s Distribution including Apache Hadoop through Cloudera manager.
- Successfully performed installation of CDH3 - Cloudera’s Distribution including Apache Hadoop through Cloudera manager.
- Used web services concepts like SOAP, WSDL, JAXB, and JAXP to interact with other projects within the Supreme Court for sharing information.
- Worked with the Hadoop production support team to implement new business initiatives as they relate to Hadoop; performed other related duties as assigned and was available for 24x7 on-call support.
- Involved in development of SQL Server Stored Procedures and SSIS DTSX Packages to automate regular mundane tasks as per business needs.
- Worked on establishing Operational/Governance model and Change Control Board for various lines of business running on Big Data Clusters.
Environment: Hadoop, HDFS, Hive, Flume, HBase, Sqoop, Hue, Pig, Java (JDK 1.6), Eclipse, MySQL, Linux, Ubuntu, ZooKeeper, Cloudera CDH4 with HA.
Confidential, Fresno, CA
- Experience in Commissioning and Decommissioning nodes.
- Involved in installation, configuration, support, and management of Hadoop clusters; Hadoop cluster administration including commissioning and decommissioning of DataNodes, capacity planning, slots configuration, performance tuning, cluster monitoring, and troubleshooting.
- Installation and configuration of other open-source software such as Pig, Hive, HBase, Flume, and Sqoop.
- Built automated set up for cluster monitoring and issue escalation process.
- Worked closely with the SA team to ensure all hardware and software were properly set up for optimum usage of resources.
- Plan and execute on system upgrades for existing Hadoop clusters.
- Created a POC to store server log data in Cassandra to identify system alert metrics.
- Rack-aware configuration, configuring client machines, and configuring monitoring and management tools.
- Used the Fair Scheduler to manage MapReduce jobs so that each job gets roughly the same amount of CPU time.
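The Fair Scheduler's behavior is driven by an allocation file referenced from the cluster configuration; a minimal example (the pool names, weights, and limits below are made up for illustration) might look like:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: illustrative pools; equal weights give running
     jobs roughly the same share of cluster slots over time. -->
<allocations>
  <pool name="etl">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <weight>1.0</weight>
  </pool>
  <pool name="adhoc">
    <weight>1.0</weight>
    <maxRunningJobs>5</maxRunningJobs>
  </pool>
</allocations>
```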
- Recovered from NameNode failures.
- Loaded and transformed data into HDFS from large sets of structured data from AS/400, mainframe, and SQL Server sources using Talend Big Data Studio.
- Supported Hadoop developers and assisted in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts, and HBase ingest as required.
- Changed cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Handled upgrades and patch updates.
- Worked on configuring security for Hadoop Cluster, managing and scheduling jobs on a Hadoop Cluster.
- Commissioned or decommissioned data nodes from the cluster in case of problems.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Rack-aware configuration and working knowledge of AWS.
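Rack awareness is typically wired up through a topology script named by `net.topology.script.file.name`; Hadoop invokes it with a batch of IPs or hostnames and reads back one rack path per argument. A minimal sketch (the subnet-to-rack mapping here is invented):

```shell
#!/bin/sh
# Illustrative rack-topology script; the subnets and rack names are made up.
resolve_rack() {
  for node in "$@"; do
    case "$node" in
      10.1.1.*) echo /dc1/rack1 ;;
      10.1.2.*) echo /dc1/rack2 ;;
      *)        echo /default-rack ;;
    esac
  done
}
resolve_rack "$@"
```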
- Cluster HA Setup.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Day to day support for the cluster issues and job failures.
- Worked with the development team to tune jobs; knowledge of writing Hive jobs.
- Constantly learning various big data tools and providing strategic direction per development requirements.
- Analyzed and understood the business requirements.
- Developed Informatica mappings using PowerCenter Designer to load data from flat files to the target database (Teradata).
- Prepared test scripts and executed them for unit testing.
Environment: Cloudera CDH 4.3.2, HDFS, Hive, Sqoop, ZooKeeper, HBase, Windows 2000/2003, Unix, Linux, Java, MapReduce, Pig, Flume, NoSQL, Oracle 9i/10g/11g RAC on Solaris/Red Hat, Exadata X2/X3, Apache Hadoop, Toad, SQL*Plus, Oracle Enterprise Manager (OEM), RMAN, shell scripting, GoldenGate, Red Hat/SUSE Linux, EM Cloud Control, Teradata 13.
Confidential, Milwaukee, WI
- Hands-on experience in installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments such as development, test, and production clusters.
- Developed shell scripts to automate the cluster installation.
- Involved in installing Hadoop ecosystem components.
- Defined file system layout and data set permissions.
- Administration, installation, upgrading, and management of distributions of Hadoop (CDH3, CDH4, Cloudera Manager), Hive, and HBase.
- Imported/exported data between RDBMS and HDFS using data ingestion tools such as Sqoop.
- Commissioning and Decommissioning nodes to Hadoop Cluster.
- Extracted files from HBase and placed in HDFS/HIVE for processing.
- Recovering from node failures and troubleshooting common Hadoop cluster issues.
- Scripting Hadoop package installation and configuration to support fully-automated deployments.
- Good experience with Hadoop ecosystem components such as Hive, HBase, Sqoop, and Oozie.
- Involved in creating Hive internal/external tables, loading them with data, and troubleshooting Hive jobs.
- Familiarized with automated monitoring tools like Nagios and Ganglia.
- Worked on pulling data from Oracle databases into the Hadoop cluster using Sqoop import.
- Tuned MapReduce configurations to optimize job run times.
- Experienced in managing and reviewing Hadoop log files.
- Helped design scalable big data clusters and solutions.
- Managing nodes on Hadoop cluster connectivity and security.
- Experience in deploying Java applications on cluster.
- Involved in HDFS File system management and monitoring.
- Good knowledge of creating ETL jobs to load Twitter JSON data into MongoDB and jobs to load data from MongoDB into a data warehouse.
- Worked with Hadoop developers and designers in troubleshooting MapReduce job failures and issues, and helped developers resolve them.
- Applied patches and performed version upgrades.
- Evaluate and propose new tools and technologies to meet the needs of the organization.
Environment: Java, Eclipse Juno 4.2, MapReduce, HDFS, Pig, Hive, HCatalog, HBase, Flume, Sqoop, Oozie, ZooKeeper, Nagios, Ganglia, Fair Scheduler.
- Responsible for cluster configuration, maintenance, troubleshooting, and tuning of the cluster.
- Good experience on cluster audit findings and tuning configuration parameters.
- Implemented Kerberos security in all environments.
- Solid Understanding of Hadoop HDFS, Map-Reduce and other Eco-System Projects.
- Installed and configured Phoenix on HDP 2.1; created views over HBase tables and used SQL queries to retrieve alerts and metadata.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users.
- Experience in configuring the cluster with FIFO or Fair share scheduling.
- Demonstrated an understanding of concepts, best practices, and functions to implement a big data solution in a corporate environment.
- Provided input to development regarding efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
- Worked on High Availability for the NameNode using Cloudera Manager to avoid a single point of failure.
- Developed session tasks and workflows using PowerCenter Workflow Manager.
- Manage and review data backups and log files.
- Used Ganglia to monitor the cluster around the clock.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to appropriate groups.
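One way to sketch such a log scan (the directory, error patterns, and alert destination are illustrative; a real deployment would mail or page the on-call group):

```shell
#!/bin/sh
# Scan Hadoop logs for predefined error patterns and report matches.
LOG_DIR="${LOG_DIR:-./hadoop-logs}"       # illustrative log directory
PATTERNS='FATAL|Exception|No space left on device'

scan_logs() {
  # -H prefixes each match with its file name for the alert report
  grep -E -H "$PATTERNS" "$LOG_DIR"/*.log 2>/dev/null
}

if scan_logs > alerts.txt; then
  echo "ALERT: $(wc -l < alerts.txt) suspicious log lines found"
  # mail -s "Hadoop log alert" ops@example.com < alerts.txt  # hypothetical recipient
else
  echo "OK: no predefined errors found"
fi
```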
- Commissioning and Decommissioning Nodes from time to time.
- Set up and managed HA NameNode and NameNode federation using Apache Hadoop 2.0 to avoid single points of failure in large clusters.
- Work with network and Linux system engineers to define optimum network configurations, server hardware and operating system.
- Worked with the Cloudera support team to fine-tune the cluster.
- Production support responsibilities include cluster maintenance.