- 7+ years of expertise in Hadoop, Big Data analytics and Linux, including design, installation, configuration and management of Apache Hadoop clusters and the MapR, Hortonworks and Cloudera Hadoop distributions.
- Experienced in integrating Hadoop with ETL processes to extract, transform and load ERP data.
- Set up disks for MapR, handled disk failures, configured storage pools and worked with Logical Volume Manager (LVM).
- Experience with AWS CloudFront, including creating and managing distributions that provide access to an S3 bucket or an HTTP server running on EC2 instances.
- Designed and developed a data management system on PostgreSQL and optimized its queries to improve performance.
- Good understanding of cluster capacity planning and of configuring cluster components based on requirements.
- Collected and aggregated large volumes of log data using Apache Flume and stored it in HDFS for further analysis.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems.
- Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
- Experience in setting up automatic and manual failover control using ZooKeeper and Quorum Journal Nodes.
- Implemented the Capacity Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
- Hands-on experience in data extraction, transformation, loading and visualization using the Hortonworks platform with HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, JSON and MS SQL Server.
- Experience in writing ksh, bash and Perl scripts to automate database and application processes, backups and scheduling.
- Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari.
- Involved in Hadoop cluster administration, including adding and removing cluster nodes, cluster monitoring and troubleshooting.
- Worked with Apache Flink to implement transformations on data streams for filtering, aggregation and state updates.
- Experience in monitoring and troubleshooting Linux memory, CPU, OS, storage and network issues.
- Excellent knowledge of the standard NiFi processors used for FlowFile routing, transformation and mediation between systems, e.g. GetFile, PutKafka, PutFile and PutHDFS.
- Experience in performance tuning of Hadoop clusters using various JVM metrics.
- Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet and Ansible; responsible for CI/CD infrastructure, processes and deployment strategy.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming and GraphX.
- Hands-on experience in installation, configuration, management and support of full-stack Hadoop clusters, both on-premises and in the cloud, using Hortonworks and Cloudera bundles.
- Working knowledge of Kafka for real-time data streaming and event-based architecture.
- Knowledge of minor and major upgrades of Hadoop and ecosystem components.
- Hands-on experience in the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC and RDS.
- Managed data with volumes and worked with snapshots, mirror volumes, data protection and scheduling in MapR.
- Provided security and authentication with Ranger, where Ranger Admin handles administration and Ranger Usersync adds new users to the cluster.
- Expertise in writing, reviewing and executing test cases.
- Conceptual knowledge of load, black-box, performance and stress testing.
- Analyzed and tuned the performance of Spark jobs on EMR, taking into account the type and size of the input processed on specific instance types.
- Hands-on experience in deploying AWS services using Terraform.
- Expert in setting up SSH, SCP and SFTP connectivity between UNIX hosts.
- Experienced with DNS, NIS, NIS+, NFS, FTP, Samba Server, Solr and LDAP/AD remote access, with strong security management and system troubleshooting skills.
Big Data Ecosystem: Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Apache Flink, Docker, Solr, Hue, Knox, NiFi, HDFS, MapReduce
Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure Insight
Big Data Security: Redaction, Sentry, Ranger, Navencrypt, Kerberos, AD, LDAP, KTS, KMS, SSL/TLS, Cloudera Navigator, Hortonworks
Programming Languages: Scala, Python, Java, SQL, PL/SQL, HiveQL, Pig Latin
NoSQL Databases: Cassandra, MongoDB
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP, JSON
Web/Application Servers: Apache Tomcat, WebLogic, JBoss
Version Control: SVN, CVS, Git
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Databases: Oracle, DB2, SQL Server, MySQL, Teradata, PostgreSQL, Cassandra
Tools: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT
Operating Systems: Unix, Linux and Windows
Other Tools: GitHub, Maven, Puppet, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, New Relic
Confidential, Overland Park, KS
- Involved in requirements analysis, design, development, data design and mapping, extraction and validation for complex business requirements.
- Built Python Django forms to record data from online users.
- Provided Level 2 (L2) technical support, which handles issues escalated beyond Tier 1 and requires more experienced technicians with in-depth knowledge of a particular product or service.
- Knowledgeable in continuous deployment using Heroku, Jenkins and Ansible.
- Installed Kafka on the Hadoop cluster and wrote Java producer and consumer code to stream tweets with popular hashtags from a Twitter source into HDFS.
- Installed, configured and optimized Hadoop infrastructure on Cloudera's CDH5 distribution using Puppet.
- Worked on a cloud-based Big Data architecture with Azure.
- Good understanding of the architecture, configuration and security of Cassandra with Falcon, including Cassandra's data read and write paths.
- Implemented partitioning, dynamic partitions and bucketing in Hive, and wrote Pig Latin scripts to analyze semi-structured data.
- Worked on a 100-node Hadoop cluster running Cloudera CDH 5.6.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Transferred data from HDFS to MongoDB using Pig, Hive and MapReduce scripts, and visualized the streaming data in Tableau dashboards.
- Managed and reviewed Hadoop log files, handled file system management, and monitored the Hadoop cluster for capacity planning.
- Monitored workload and job performance and performed capacity planning using Cloudera Manager.
- Wrote Flume configuration files to store streaming data in HDFS, and upgraded Kafka from 0.8.2.2 to 0.9.0.0.
- As an administrator, handled cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
- Used Python scripts to update database content and manipulate files.
- Supported 200+ servers and 50+ users on the Hadoop platform, resolving their tickets and issues, training them to make Hadoop simpler to use and keeping them updated on best practices.
- Performed analytics on HDFS data using MapReduce, Hive and Pig, and sent the results back to MongoDB databases, updating information in collections.
- Imported data in various formats, including JSON, SequenceFile, text, CSV, XML, flat files, Avro and Parquet, into the HDFS cluster with compression using MapReduce programs.
- Used a RESTful web services API to connect to MapR tables.
- Currently working as a Hadoop admin, responsible for everything related to the clusters, which total 100 nodes and range from POC to production.
- Built Cassandra clusters on both physical machines and AWS, and automated Cassandra builds, installation and monitoring.
- Installed and configured OpenStack Juno on Red Hat 6 with multiple compute nodes.
- Loaded data from the UNIX file system into HDFS, and created custom Solr query components to enable optimum search matching.
- Created multiple groups and set permission policies for them in AWS.
- Consulted with the operations team on deploying, migrating data, monitoring, analyzing and tuning MongoDB applications.
- Worked on deployment to AWS EC2 instances with Postgres RDS and S3 file storage.
- Installed and configured the CDH Hadoop environment; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Used Jira for project tracking, bug tracking and project management. Loaded data from various data sources into HDFS using Kafka.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
Environment: MapR 5.2 & 6.0, Hue, MongoDB 2.6, Hive 0.10, Sqoop 1.4.3, Oozie 3.3.4, AWS, Jira, Zeppelin, WebLogic 8.1, Kafka, Ranger, CentOS and other UNIX utilities, Cloudera Manager, YARN, Falcon, Kerberos, Ansible, Cloudera 5.6 & 6.0, Azure, HDFS, CDH 4.7, Hadoop 2.0.0, Impala, Pig, Python Scripting
Confidential, Tyson, VA
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Loaded data from the UNIX file system into HDFS. Carried out root cause analysis (RCA) for high-severity incidents.
- Implemented Spark using Python and Spark SQL for faster testing and processing of data.
- Maintained Git repositories for the DevOps environment's automation code and configuration.
- Translated high-level design specifications into simple ETL coding and mapping standards, and managed cluster coordination services through ZooKeeper.
- Used DevOps tools such as Jenkins, Nexus, Chef and Ansible to build and deploy applications.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Experience in setting up, configuring and managing security for Hadoop clusters using Kerberos, integrated with LDAP/AD at the enterprise level.
- Created Hive tables and loaded data from the local file system into HDFS.
- Troubleshot issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
- Responsible for installing, configuring, supporting and managing Hadoop clusters.
- Set up a test cluster with new services such as Grafana, integrated with Kafka and HBase for in-depth monitoring.
- Involved in cluster-level security covering the perimeter (authentication: Cloudera Manager, Active Directory and Kerberos), access (authorization and permissions: Sentry), visibility (audit and lineage: Navigator) and data (encryption at rest).
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Built and implemented ETL processes using Big Data tools such as Spark (Scala/Python) and NiFi.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Successfully migrated the Elasticsearch database from SQLite to MySQL and then to PostgreSQL with complete data integrity.
- Worked on Cloudera to analyze data stored in HDFS.
- Responsible for copying an 800 TB HDFS snapshot from the production cluster to the DR cluster.
- Knowledge of HDFS data storage and of support for Azure Data Lake.
- Responsible for copying a 210 TB HBase table from the production cluster to the DR cluster.
- Production experience in large environments using configuration management tools such as Chef and Puppet, supporting a Chef environment with 170+ servers and developing manifests.
- Installed and configured Cloudera on RHEL and was responsible for maintaining the cluster.
- Used Flink Streaming on the pipelined Flink engine to process data streams and deployed the new API, including the definition of flexible windows.
- Configured AWS IAM and Security Groups.
- Administered 150+ Hadoop servers, handling Java version updates, the latest security patches and OS-related upgrades, and taking care of hardware-related outages.
- Upgraded Ambari (Ambari 2.2.0, Ambari 22.214.171.124) and updated Solr from 4.10.3 to Ambari Infra, which runs Solr 5.5.2.
- Installed applications on AWS EC2 instances and configured storage in S3 buckets.
Environment: Cloudera, HDFS, MapR 5.1, YARN, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, Pig, Spark, Hive, Docker, HBase, NiFi, Python, Puppet/Chef, LDAP/AD, Oracle Enterprise Manager (OEM), MySQL, PostgreSQL, Teradata, Azure, Kafka, Apache Hadoop, Apache Flink, Ansible.
Confidential, Irvine, CA
- Involved in improving the performance and optimization of the existing algorithms using Spark.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Configured Kerberos and installed the MIT Kerberos ticketing system.
- Hands-on experience with Docker, Puppet, Chef, Ansible, AWS CloudFormation and AWS CloudFront.
- Secured the Hadoop cluster from unauthorized access with Kerberos, LDAP integration and TLS for data transfer among cluster nodes.
- Installed and configured CDAP, an ETL tool, in the development and production clusters.
- Integrated CDAP with Ambari for easier operations monitoring and management.
- Used CDAP to monitor datasets and workflows to ensure smooth data flow.
- Played a key role in migrating the production and development Hortonworks Hadoop clusters to a new cloud-based cluster solution.
- Connected to HDFS using third-party tools such as Teradata SQL Assistant via an ODBC driver.
- Upgraded HDP (Hortonworks Data Platform) multiple times, from HDP 1.7 through HDP 2.3 to HDP 2.4.3.
- Performed cluster maintenance, including commissioning and decommissioning nodes.
- Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades when required.
- Implemented secondary sorting to sort reducer output globally in MapReduce.
- Set up Hortonworks infrastructure, from cluster configuration down to individual nodes.
- Installed, configured and administered the Jenkins CI tool on AWS EC2 instances.
- Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Hands-on experience with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, Hortonworks YARN and ZooKeeper. Strong knowledge of Hive's analytical functions.
- As a Hadoop admin, monitored cluster health daily, tuned performance-related system configuration parameters and backed up configuration XML files.
- Introduced SmartSense to obtain optimization recommendations from the vendor and to assist in troubleshooting issues.
- Experience with automated configuration management and maintaining a CI/CD pipeline using deployment tools such as Chef, Puppet or Ansible.
- Deployed a data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on error handling techniques and tuned the ETL flow for better performance; worked extensively with TAC (Talend Administration Center), scheduling jobs in Job Conductor.
Environment: Hive, Pig, Java (JDK 1.6), SQL, Sqoop, NiFi, Flume, Oozie, CDH3, MongoDB, Cassandra, HBase, Eclipse, Hortonworks, Oracle, Unix/Linux, Hadoop, MapR 5.0, HDFS.
Linux / Hadoop Admin
- Implemented and maintained monitoring and alerting for production and corporate servers/storage using AWS CloudWatch.
- Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
- Configured ZooKeeper to implement node coordination for clustering support.
- Experienced in providing security for Hadoop clusters with Kerberos.
- Transferred data from a MySQL database to HDFS and vice versa using Sqoop.
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Resolved tickets submitted by users, including P1 issues, troubleshooting and documenting errors and their resolutions.
- Monitored multiple cluster environments using Metrics and Nagios.
- Copied data from one cluster to another using DistCp, and automated the procedure using shell scripts.
- Configured and troubleshot various services such as NFS, SSH, Telnet and FTP on the UNIX platform.
- Experience in system authentication on RHEL servers using Kerberos and LDAP.
- Installed and configured DNS on RHEL servers.
- Managed disk file systems and server performance, created users and granted file access permissions.
- Troubleshot backup issues by analyzing NetBackup logs.
- Configured and troubleshot DHCP on RHEL servers.
- Installed, configured, built, troubleshot and maintained RHEL servers.
- Installed Enterprise Security Manager software on RHEL servers.
- Wrote Shell, Python and Perl scripts to automate crucial tasks.
Environment: Linux, Java, AWS, Hortonworks, Sqoop, Oozie, Pig, Hive, HBase, Flume, Eclipse, Cassandra, HDP 2.0, Hadoop, Big Data, HDFS, MapReduce, WebSphere, NFS, DNS, Red Hat Linux servers, Oracle RAC, VMware, DHCP, RHEL, Solaris, Apache.