- 7+ years of experience in IT, specializing in Hadoop administration, AWS infrastructure setup, DevOps, and software testing.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Working knowledge of Kafka for real-time data streaming and event-based architectures.
- Knowledge of minor and major upgrades of Hadoop and ecosystem components.
- Hands-on experience in the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC, and RDS.
- Managed Data with Volumes and worked with Snapshots, Mirror Volumes, Data protection and Scheduling in MapR.
- Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, processes, and deployment strategy.
- Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
- Experienced in deploying, maintaining, and troubleshooting applications on Microsoft Azure cloud infrastructure. Excellent knowledge of NoSQL databases such as HBase and Cassandra.
- Hands-on experience in data extraction, transformation, loading, and visualization using the Hortonworks platform: HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, JSON, and MS SQL.
- Experience in writing shell scripts in ksh, bash, and Perl for automating database, application, backup, and scheduling tasks.
- Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Provided security and authentication with Ranger, where Ranger Admin handles administration and Usersync adds new users to the cluster.
- Experience working on big data with Azure.
- Hands-on experience in installation, configuration, management, and support of full-stack Hadoop clusters, both on premises and in the cloud, using Hortonworks and Cloudera bundles.
- Conceptual knowledge of load testing, black-box testing, performance testing, and stress testing.
- Analyzed and tuned performance of Spark jobs in EMR, matching instance types to the type and size of the input being processed.
- Hands on experience in deploying AWS services using Terraform.
- Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
- Experienced in DNS, NIS, NFS, FTP, NIS+, Samba Server, LDAP/AD remote access, security management, and system troubleshooting skills.
- Hands-on experience in Azure Cloud Services (PaaS & IaaS)
- Experienced in Hadoop big data integration with ETL, performing extract, load, and transform processes on ERP data.
- Set up disks for MapR, handled disk failures, configured storage pools, and worked with Logical Volume Manager.
- Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Good understanding of cluster capacity planning and configuring cluster components based on requirements.
- Collected and aggregated large amounts of log data using Apache Flume and stored it in HDFS for further analysis.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
- Implemented the Capacity Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari.
- Involved in Hadoop cluster environment administration that includes adding and removing cluster nodes, cluster monitoring and troubleshooting.
- Worked on Apache Flink to implement transformations on data streams for filtering, aggregating, and updating state.
- Excellent knowledge of the standard FlowFile processors in NiFi used for data routing, transformation, and mediation between systems, e.g. GetFile, PutFile, PutKafka, and PutHDFS.
- Experience in performance tuning of Hadoop cluster using various JVM metrics.
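The cluster capacity planning mentioned above usually reduces to simple arithmetic over ingest rate, retention, and replication. A minimal Python sketch (the function name and the 25% scratch-space overhead factor are illustrative assumptions, not figures from this resume):

```python
# Hypothetical HDFS capacity-planning helper: raw disk needed is logical
# data volume times the replication factor, plus scratch/temp overhead.
def raw_capacity_tb(daily_ingest_tb: float,
                    retention_days: int,
                    replication: int = 3,
                    overhead: float = 0.25) -> float:
    """Raw storage (TB) needed for the given ingest, retention, and replication."""
    logical = daily_ingest_tb * retention_days
    return logical * replication * (1 + overhead)

# Example: 2 TB/day kept for 90 days at 3x replication needs 675 TB raw.
print(raw_capacity_tb(2, 90))  # 675.0
```

The same arithmetic extends naturally to per-node sizing by dividing the result by usable disk per DataNode.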
BIG Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Apache Flink, Docker, Hue, Knox, NiFi
BIG Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navencrypt, SSL/TLS, Cloudera Navigator, Hortonworks
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, Scala, Python, SQL, PL/SQL, HiveQL, Pig Latin
Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP, JSON
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Version control: SVN, CVS, GIT
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Intelligence Tools: Talend, Informatica, Tableau
Databases: Oracle …, DB2, SQL Server, MySQL, Teradata, PostgreSQL, Cassandra
Tools and IDEs: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT
Cloud Technologies: Amazon Web Services (Amazon RedShift, S3), Microsoft Azure HDInsight
Operating Systems: Unix, Linux and Windows
Other Tools: GitHub, Maven, Puppet, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, NewRelic
Confidential, Sunnyvale, CA
- Implemented partitioning, dynamic partitions, and buckets in Hive; wrote Pig Latin scripts for the analysis of semi-structured data.
- Worked on a 450-node Hadoop cluster on Cloudera distribution 7.7.0.
- Supported 200+ servers and 50+ users of the Hadoop platform, resolving tickets and issues they ran into, training users to keep Hadoop usability simple, and keeping them updated on best practices.
- Performed analytics using MapReduce, Hive, and Pig on HDFS, sent the results back to MongoDB, and updated information in collections.
- Troubleshot Azure development, configuration, and performance issues.
- Imported data in various formats (JSON, Sequence, text, CSV, XML flat files, Avro, and Parquet) into the HDFS cluster, with compression, using MapReduce programs.
- Used a RESTful web services API to connect to MapR tables.
- Currently working as a Hadoop admin, responsible for everything related to clusters totaling 100 nodes, ranging from POC to PROD.
- Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, and monitoring.
- Installed and configured OpenStack Juno on RHEL 6 with multiple compute nodes.
- Involved in loading data from the UNIX file system to HDFS, and created custom Solr query components to enable optimum search matching.
- Created multiple groups and set permission policies for various groups in AWS.
- Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
- Worked on deployments to AWS EC2 instances with Postgres RDS and S3 file storage.
- Installed and configured CDH-Hadoop environment and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Used Jira for project tracking, bug tracking, and project management. Loaded data from various data sources into HDFS using Kafka.
- Utilized Azure HDInsight to monitor and manage the Hadoop Cluster.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Worked L2 escalations: issues that could not be handled by L1 support were forwarded to L2.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
- Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts, and visualized the streaming data in Tableau dashboards.
- Managed and reviewed Hadoop log files, file system management and monitoring Hadoop cluster capacity planning.
- Monitoring workload, job performance and capacity planning using Cloudera Manager.
- Wrote Flume configuration files to store streaming data in HDFS, and upgraded Kafka from 0.8.2.2 to 0.9.0.0.
- As an admin involved in Cluster maintenance, troubleshooting, Monitoring and followed proper backup & Recovery strategies.
- Used Python scripts to update content in the database and manipulate files.
- Generated Python Django Forms to record data of online users.
- Provided L2 technical support, a level above Tier I that costs more because the technicians are more experienced and knowledgeable about a particular product or service.
- Installed SonarQube as a Docker container on OpenStack, Azure, and AWS EC2, and integrated it with Jenkins.
- Knowledgeable with continuous deployment using Heroku, Jenkins and Ansible
- Installed Kafka on the Hadoop cluster and wrote producer and consumer code in Java to stream data from a Twitter source into HDFS, filtered by popular hashtags.
- Worked on cloud based Architecture for Big Data with Azure.
- Good understanding of the architecture, configuration, and security of Cassandra with Falcon, including the Cassandra read and write paths.
Environment: Ansible, Cloudera 5.6, Azure, HDFS, CDH 4.7, Hadoop 2.0.0 HDFS, MapReduce, Hue, MongoDB 2.6, Hive 0.10, Sqoop 1.4.3, Oozie 3.3.4, AWS, Jira, Zeppelin, WebLogic 8.1, Kafka, Ranger, YARN, Falcon, Kerberos, Impala, Pig, Python scripting, MySQL, Perl, Red Hat Linux, CentOS and other UNIX utilities, Cloudera Manager
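The MapReduce analytics described in this role follow the classic map / shuffle-sort / reduce shape. A plain-Python word-count sketch of those phases (illustrative only, not an actual Hadoop job):

```python
# Conceptual word count mirroring the MapReduce phases:
# map -> shuffle/sort (groupby over sorted keys) -> reduce.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Emit (word, 1) pairs, like a Hadoop mapper.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # The framework would shuffle and sort by key; we simulate that here,
    # then sum the counts for each key, like a reducer.
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (key, sum(count for _, count in group))

counts = dict(reduce_phase(map_phase(["big data", "big clusters"])))
print(counts)  # {'big': 2, 'clusters': 1, 'data': 1}
```

The same two-function shape is what Hadoop Streaming expects when mappers and reducers are written as standalone scripts.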
Sr. Hadoop Admin
Confidential, New York, NY
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
- Set up a test cluster with new services such as Grafana, integrating with Kafka and HBase for intensive monitoring.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data encryption at rest.
- Analyzed raw data scheduled to be dumped into Azure Blob storage (HDInsight cluster).
- Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDD's.
- Built and implemented ETL processes using big data tools such as Spark (Scala/Python) and NiFi.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Drove end-to-end deployment of various components on the Azure platform.
- Worked on Cloudera to analyze data present on top of HDFS.
- Responsible for copying 800 TB of HDFS snapshot from Production cluster to DR cluster.
- Responsible for copying 210 TB of HBase tables from the production to the DR cluster.
- Production experience in large environments using configuration management tools like Chef and Puppet supporting Chef Environment with 170+ servers and involved in developing manifests.
- Installed and configured Cloudera on RHEL and was responsible for maintaining the cluster.
- Used Flink Streaming's pipelined engine to process data streams, deploying the new API including the definition of flexible windows.
- Configured AWS IAM and Security Groups.
- Administering 150+ Hadoop servers which need java version updates, latest security patches, OS related upgrades and taking care of hardware related outages.
- Performed Ambari upgrades starting from Ambari 2.2.0; updated SOLR from 4.10.3 to Ambari Infra, which is SOLR 5.5.2.
- Installed application on AWS EC2 instances and also configured the storage on S3 buckets.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Knowledge of HDFS data storage and support for Azure Data Lake.
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Involved in loading data from the UNIX file system to HDFS. Drove root cause analysis (RCA) efforts for high-severity incidents.
- Implemented Spark using Python and Spark SQL for faster testing and processing of data.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
- Used Hive and created Hive tables, loaded data from Local file system to HDFS.
Environment: Cloudera, Kafka, Apache Hadoop, Apache Flink, Ansible, HDFS, MapR, YARN, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, Pig, Spark, Hive, Docker, HBase, NiFi, Python, Puppet/Chef, LDAP/AD, Oracle Enterprise Manager (OEM), MySQL, PostgreSQL, Teradata, Azure
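The Spark RDD work in this role relies on lazy transformations followed by an eager action. A conceptual plain-Python mimic of that pattern (the class and method names imitate the RDD API for illustration; this is not PySpark):

```python
# Plain-Python sketch of the RDD pattern: map/filter build lazy generator
# pipelines, and collect() is the action that forces evaluation.
class MiniRDD:
    def __init__(self, data):
        self._data = data  # any iterable; nothing is computed yet

    def map(self, f):
        return MiniRDD(f(x) for x in self._data)

    def filter(self, pred):
        return MiniRDD(x for x in self._data if pred(x))

    def collect(self):
        # The "action": pulling the pipeline into a concrete list.
        return list(self._data)

rdd = MiniRDD(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
result = rdd.collect()
print(result)  # [0, 4, 16, 36, 64]
```

In real PySpark the same chain would read `sc.parallelize(range(10)).filter(...).map(...).collect()`, with the cluster scheduler handling distribution.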
Confidential, San Jose, CA
- Responsible for Cluster maintenance. Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups, Manage and review Hadoop log files.
- Hands-on experience working with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, Hortonworks YARN, and ZooKeeper. Strong knowledge of Hive's analytical functions.
- Experience in dealing with Windows Azure IAAS - Virtual Networks, Virtual Machines, Cloud Services.
- As a Hadoop admin, monitored cluster health status on a daily basis, tuned system-performance-related configuration parameters, and backed up configuration XML files.
- Introduced SmartSense to get optimal recommendations from the vendor and to help troubleshoot issues.
- Experience with automated configuration management and maintaining a CI/CD pipeline, using deployment tools such as Chef, Puppet, and Ansible.
- Deployed Datalake cluster with Hortonworks Ambari on AWS using EC2 and S3.
- Responsible for building scalable distributed data solutions using Hadoop.
- Integration of HDInsight with Azure Data Lake store (ADLS) and blob storage.
- Cluster maintenance as well as commission and decommission of nodes.
- Collaborate with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
- Implemented secondary sorting to sort reducer output globally in MapReduce.
- Set up Hortonworks infrastructure, from cluster configuration down to individual nodes.
- Installing, configuring and administering Jenkins CI tool on AWS EC2 instances.
- Worked on error-handling techniques and tuned the ETL flow for better performance; worked extensively with TAC (Admin Console), scheduling jobs in the Job Conductor.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Involved in improving the performance and optimization of the existing algorithms using Spark.
- Configured Kerberos and installed the MIT Kerberos ticketing system.
- Hands-on experience with Docker, Puppet, Chef, Ansible, AWS CloudFormation, and AWS CloudFront.
- Secured the Hadoop cluster from unauthorized access with Kerberos, LDAP integration, and TLS for data transfer among the cluster nodes.
- Installed and configured CDAP, an ETL tool, in the development and production clusters.
- Integrated CDAP with Ambari for easy operations monitoring and management.
- Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
- Played a key role in migrating production and development Hortonworks Hadoop clusters to a new cloud-based cluster solution.
- Monitor Hadoop cluster and proactively optimize and tune cluster for performance.
- Connected to the HDFS using the third-party tools like Teradata SQL assistant using ODBC driver.
- Upgraded HDP (Hortonworks Data Platform) multiple times, from HDP 1.7 through HDP 2.3 to HDP 2.4.3.
Environment: Hadoop, MapR, HDFS, Hive, Pig, Java, SQL, Sqoop, Nifi, Flume, Oozie, CDH3, MongoDB, Cassandra, HBase, Java (jdk 1.6), Eclipse, Hortonworks, Oracle and Unix/Linux.
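The secondary-sorting bullet in this role uses the composite-key trick: sort on (natural key, value) so each reducer sees its values already ordered. A plain-Python sketch of that idea (the year/temperature records are made-up illustration data):

```python
# Secondary-sort sketch: sort once on the composite key (year, value),
# then group by the natural key alone -- each group's values arrive
# pre-sorted, exactly what a MapReduce secondary sort achieves.
from itertools import groupby

records = [("2015", 31), ("2015", 12), ("2016", 5), ("2015", 20), ("2016", 40)]

# Composite-key sort: primary on year, secondary on value.
records.sort(key=lambda kv: (kv[0], kv[1]))

# Group by the natural key only; values within each group stay ordered.
grouped = {year: [v for _, v in group]
           for year, group in groupby(records, key=lambda kv: kv[0])}
print(grouped)  # {'2015': [12, 20, 31], '2016': [5, 40]}
```

In Hadoop the same effect comes from a custom partitioner plus a grouping comparator, so the framework's shuffle does the sorting instead of the reducer buffering values in memory.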
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Experienced in handling Avro data files by passing schema into HDFS using Avro tools and Map Reduce.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name node recovery, Capacity planning, Cassandra and slots configuration.
- Implemented Security in Web Applications using Azure and Deployed Web Applications to Azure.
- Used Apache Spark API over Hortonworks Hadoop YARN cluster to perform analytics on data in Hive.
- Monitored multiple cluster environments using Metrics and Nagios.
- Dumped data from one cluster to another using DistCp, and automated the dumping procedure using shell scripts.
- Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Enabling the monitoring solution for HDInsight, Interactive query, Spark & HBase clusters.
- Deployed a Hadoop cluster and integrated with Nagios and Ganglia.
- Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
- Extensively involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
- Used Windows Azure to deploy the application on the cloud and managed the session.
- Configured ZooKeeper to implement node coordination in clustering support.
- Experienced in providing security for Hadoop Cluster with Kerberos.
- Dumped data from the MySQL database to HDFS and vice versa using Sqoop.
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Resolved tickets submitted by users and P1 issues, troubleshooting errors and documenting the resolutions.
- Worked on a live 52 node Hadoop Cluster running Hortonworks Data Platform.
- Converted MapReduce jobs into Spark transformations and actions using Spark.
- Adding new Data Nodes when needed and running balancer.
Environment: Hadoop, Big Data, HDFS, MapReduce, Hortonworks, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, AWS, HDInsight, Eclipse, Cassandra, HDP 2.0.
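The cross-cluster data dumps above would use `hadoop distcp` in practice; purely to illustrate the fan-out pattern behind it, here is a local parallel-copy sketch under stated assumptions (directory names and the worker count are hypothetical, not from the actual scripts):

```python
# DistCp-style parallel copy sketch: fan a directory's files out over a
# thread pool. Illustrative only -- real inter-cluster copies use
# `hadoop distcp`, which parallelizes with map tasks instead of threads.
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def parallel_copy(src: Path, dst: Path, workers: int = 4) -> int:
    """Copy every regular file in src to dst concurrently; return the count."""
    dst.mkdir(parents=True, exist_ok=True)
    files = [p for p in src.iterdir() if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda p: shutil.copy2(p, dst / p.name), files))
    return len(files)

# Tiny demo against temporary directories.
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp()) / "mirror"
(src / "a.txt").write_text("hello")
(src / "b.txt").write_text("world")
copied = parallel_copy(src, dst)
print(copied)  # 2
```

Wrapping such a copy in a retry loop and a cron entry gives the "automated dumping procedure" the bullet describes.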
- Configuring and troubleshooting DHCP on RHEL servers.
- Installing, configuring, and building RHEL servers, plus troubleshooting and maintenance.
- Software installation of Enterprise Security Manager on RHEL servers.
- Performing shell, Python, and Perl scripting to automate crucial tasks.
- Patching RHEL servers for security, OS patches and upgrades.
- Experience in system authentication on RHEL servers using Kerberos and LDAP.
- Installing and Configuring DNS on RHEL servers.
- Managing Disk File Systems, Server Performance, Users Creation and Granting File Access Permissions.
- Troubleshooting the backup issues by analyzing the NetBackup logs.
- Configuring and Troubleshooting of various services like NFS, SSH, Telnet, FTP on UNIX platform.
- Monitoring disk, CPU, memory, and performance of servers. Configuring LVMs.
Environment: RHEL, Solaris, VMware, Apache, JBoss, WebLogic, system authentication, WebSphere, NFS, DNS, Samba, Red Hat Linux servers, Oracle RAC, DHCP.
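A disk-monitoring check of the kind performed in this role can be a few lines of scripting. A hedged Python sketch (the 90% alert threshold and function names are illustrative assumptions):

```python
# Minimal disk-usage check like the server-monitoring tasks above.
import shutil

def disk_usage_percent(path: str = "/") -> float:
    """Percent of the filesystem at `path` currently used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100

def is_disk_critical(path: str = "/", threshold: float = 90.0) -> bool:
    """True when usage crosses the alerting threshold (assumed 90%)."""
    return disk_usage_percent(path) >= threshold

print(f"filesystem at {disk_usage_percent('.'):.1f}% used")
```

In practice a script like this would run from cron and page the on-call admin (or feed Nagios) when `is_disk_critical` returns True.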