
Hadoop Consultant Resume

Sunnyvale, CA

SUMMARY:

  • 7+ years of experience in IT specializing in Hadoop administration, AWS infrastructure setup, DevOps, and software testing.
  • Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
  • Working knowledge of Kafka for real-time data streaming and event-based architecture.
  • Knowledge of minor and major upgrades of Hadoop and eco-system components.
  • Hands-on experience in the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC, and RDS.
  • Managed data with volumes and worked with snapshots, mirror volumes, data protection, and scheduling in MapR.
  • Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI and CD infrastructure, processes, and deployment strategy.
  • Comprehensive Knowledge of Linux kernel tuning, patching and extensive knowledge of Linux system imaging/mirroring using System Imager.
  • Experienced in deploying, maintaining, and troubleshooting applications on Microsoft Azure cloud infrastructure. Excellent knowledge of NoSQL databases such as HBase and Cassandra.
  • Hands-on experience in data extraction, transformation, loading, and visualization on the Hortonworks platform: HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, JSON, MS SQL.
  • Experience writing shell scripts in ksh, bash, and Perl to automate database, application, backup, and scheduling processes.
  • Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
  • Provided security and authentication with Apache Ranger, where Ranger Admin provides administration and User Sync adds new users to the cluster.
  • Experience working on Big Data with Azure.
  • Hands on experience in installation, configuration, management and support of full stack Hadoop Cluster both on premise and cloud using Hortonworks and Cloudera bundles.
  • Conceptual knowledge of load testing, black-box testing, performance testing, and stress testing.
  • Analyzed and tuned performance of Spark jobs in EMR, matching specific instance types to the type and size of the input being processed.
  • Hands on experience in deploying AWS services using Terraform.
  • Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
  • Experienced in DNS, NIS, NFS, FTP, NIS+, Samba Server, LDAP/AD remote access, security management, and system troubleshooting skills.
  • Hands-on experience in Azure Cloud Services (PaaS & IaaS).
  • Experienced in Hadoop big data integration with ETL, performing extract, load, and transform processes for ERP data.
  • Set up disks for MapR, handled disk failures, configured storage pools, and worked with the Logical Volume Manager.
  • Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
  • Good understanding of cluster capacity planning and of configuring cluster components based on requirements.
  • Collected and aggregated large amounts of log data using Apache Flume and stored it in HDFS for further analysis.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
  • Implemented the Capacity Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari.
  • Involved in Hadoop cluster environment administration that includes adding and removing cluster nodes, cluster monitoring and troubleshooting.
  • Worked on Apache Flink to implement transformations on data streams such as filtering, aggregating, and updating state.
  • Excellent knowledge of the standard FlowFile processors in NiFi used for data routing, transformation, and mediation between systems, e.g. GetFile, PutFile, PutKafka, and PutHDFS.
  • Experience in performance tuning of Hadoop cluster using various JVM metrics.
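The cluster capacity planning mentioned above usually starts with simple storage arithmetic. The sketch below is illustrative only: the replication factor, overhead multiplier, and per-node capacity are assumptions, not figures from any actual cluster.

```python
import math

def nodes_for_storage(daily_ingest_tb, retention_days,
                      replication=3, overhead=1.25, node_capacity_tb=48):
    """Estimate DataNode count from raw storage needs.

    overhead leaves headroom for temp/intermediate data;
    node_capacity_tb is usable disk per node after OS reservations.
    """
    raw_tb = daily_ingest_tb * retention_days * replication * overhead
    return math.ceil(raw_tb / node_capacity_tb)

# e.g. 2 TB/day kept for 90 days on 48 TB nodes
print(nodes_for_storage(2, 90))  # → 15
```

From there, one would cross-check the node count against compute needs (cores, memory per container) rather than storage alone.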

TECHNICAL SKILLS:

BIG Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Apache Flink, Docker, Hue, Knox, NiFi

BIG Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navencrypt, SSL/TLS, Cloudera Navigator, Hortonworks

NoSQL Databases: HBase, Cassandra, MongoDB

Programming Languages: Java, Scala, Python, SQL, PL/SQL, Hive-QL, Pig Latin

Frameworks: MVC, Struts, Spring, Hibernate

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP, JSON

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Version control: SVN, CVS, GIT

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Business Intelligence Tools: Talend, Informatica, Tableau

Databases: Oracle …, DB2, SQL Server, MySQL, Teradata, PostgreSQL, Cassandra

Tools and IDEs: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT

Cloud Technologies: Amazon Web Services (Amazon RedShift, S3), Microsoft Azure HDInsight

Operating Systems: Unix, Linux and Windows

Other Tools: GitHub, Maven, Puppet, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, NewRelic

PROFESSIONAL EXPERIENCE:

Hadoop Consultant

Confidential, Sunnyvale, CA

Responsibilities:

  • Implemented partitioning, dynamic partitions, and buckets in Hive; wrote Pig Latin scripts for analysis of semi-structured data.
  • Worked on a 450-node Hadoop cluster on Cloudera distribution 7.7.0.
  • Supported 200+ servers and 50+ users on the Hadoop platform, resolving tickets and issues, training users to keep Hadoop simple to use, and updating them on best practices.
  • Performed analytics using MapReduce, Hive, and Pig in HDFS and sent the results back to MongoDB, updating information in collections.
  • Troubleshooting the Azure Development, configuration and Performance issues.
  • Imported the data from various formats like JSON, Sequential, Text, CSV, XML Flat files, AVRO, and Parquet to HDFS cluster with compression with MapReduce programs.
  • Used Restful Web Services API to connect with the MapR table.
  • Currently working as Hadoop admin, responsible for everything related to clusters totaling 100 nodes, ranging from POC to PROD.
  • Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, and monitoring.
  • Installed and configured OpenStack Juno on RHEL 6 with multiple compute nodes.
  • Involved in loading data from the UNIX file system to HDFS, and created custom Solr query components to enable optimum search matching.
  • Created multiple groups and set permission policies for various groups in AWS.
  • Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
  • Worked on deployment on AWS EC2 instance with Postgres RDS and S3 file storage.
  • Installed and configured CDH-Hadoop environment and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Used Jira for project tracking, bug tracking, and project management. Loaded data from various data sources into HDFS using Kafka.
  • Utilized Azure HDInsight to monitor and manage the Hadoop Cluster.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
  • Handled L2 escalations: issues that cannot be resolved at L1 are forwarded to L2.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts, and visualized the streaming data in Tableau dashboards.
  • Managed and reviewed Hadoop log files, file system management and monitoring Hadoop cluster capacity planning.
  • Monitoring workload, job performance and capacity planning using Cloudera Manager.
  • Wrote Flume configuration files to store streaming data in HDFS and upgraded Kafka from 0.8.2.2 to 0.9.0.0.
  • As an admin, involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
  • Used Python scripts to update content in the database and manipulate files.
  • Generated Python Django Forms to record data of online users.
  • Provided L2 technical support, a deeper tier than Tier I that costs more because its technicians are more experienced and knowledgeable about the particular product or service.
  • Installed SonarQube as a Docker container on OpenStack, Azure, and AWS EC2, and integrated it with Jenkins.
  • Knowledgeable in continuous deployment using Heroku, Jenkins, and Ansible.
  • Installed Kafka on the Hadoop cluster and wrote producer and consumer code in Java to stream popular hashtags from the Twitter source into HDFS.
  • Worked on cloud based Architecture for Big Data with Azure.
  • Good understanding of Cassandra architecture, configuration, and security with Falcon, including the Cassandra data read and write paths.
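The database-update scripting from this role might look roughly like the sketch below; SQLite stands in for the real database, and the `content` table and its columns are hypothetical.

```python
import sqlite3

def mark_processed(conn, batch_ids):
    """Flip the status flag on the given rows; return how many changed."""
    cur = conn.executemany(
        "UPDATE content SET status = 'processed' WHERE id = ?",
        [(i,) for i in batch_ids],
    )
    conn.commit()
    return cur.rowcount  # executemany sums modified rows into rowcount

# Demo against an in-memory stand-in database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO content VALUES (?, 'new')", [(1,), (2,), (3,)])
print(mark_processed(conn, [1, 2]))  # → 2
```

In practice the same pattern applies with a production driver (e.g. a MySQL connector) in place of `sqlite3`.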

Environment: Ansible, Cloudera 5.6, Azure, HDFS, CDH 4.7, Hadoop 2.0.0, MapReduce, Hue, MongoDB 2.6, Hive 0.10, Sqoop 1.4.3, Oozie 3.3.4, AWS, Jira, Zeppelin, WebLogic 8.1, Kafka, Ranger, YARN, Falcon, Kerberos, Impala, Pig, Python scripting, MySQL, Perl, Red Hat Linux, CentOS and other UNIX utilities, Cloudera Manager

Sr. Hadoop Admin

Confidential, New York, NY

Responsibilities:

  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
  • Set up a test cluster with new services such as Grafana, integrating with Kafka and HBase for intensive monitoring.
  • Involved in cluster-level security: perimeter (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
  • Analyzed raw data scheduled to be dumped into Azure Blob storage (HDInsight cluster).
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Built and implemented ETL processes using big data tools such as Spark (Scala/Python) and NiFi.
  • Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Drove end-to-end deployment of various components on the Azure platform.
  • Worked on Cloudera to analyze data present on top of HDFS.
  • Responsible for copying 800 TB of HDFS snapshot from Production cluster to DR cluster.
  • Responsible for copying 210 TB of Hbase table from Production to DR cluster.
  • Production experience in large environments using configuration management tools such as Chef and Puppet, supporting a Chef environment with 170+ servers and developing manifests.
  • Installed and configured Cloudera on RHEL and was responsible for maintaining the cluster.
  • Used Flink Streaming's pipelined engine to process data streams, deploying the new API including the definition of flexible windows.
  • Configured AWS IAM and Security Groups.
  • Administering 150+ Hadoop servers that need Java version updates, the latest security patches, and OS-related upgrades, and handling hardware-related outages.
  • Upgraded Ambari from 2.2.0 to 2.4.2.0 and updated SOLR from 4.10.3 to Ambari Infra (SOLR 5.5.2).
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Knowledge of HDFS data storage and support for Azure Data Lake.
  • Implemented Cluster Security using Kerberos and HDFS ACLs.
  • Involved in loading data from the UNIX file system to HDFS. Led root cause analysis (RCA) efforts for high-severity incidents.
  • Implemented Spark using Python and Spark SQL for faster testing and processing of data.
  • Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
  • Used Hive to create Hive tables and load data from the local file system into HDFS.
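The multi-hundred-terabyte cross-cluster copies described above are typically driven by `hadoop distcp`; a small helper like the sketch below keeps the invocation scriptable. The namenode hosts and paths are placeholders, while `-m`, `-bandwidth`, and `-update` are standard DistCp options.

```python
def distcp_command(src, dst, mappers=20, bandwidth_mb=100):
    """Build a hadoop distcp invocation for an incremental cluster copy."""
    return [
        "hadoop", "distcp",
        "-m", str(mappers),               # number of parallel copy mappers
        "-bandwidth", str(bandwidth_mb),  # per-mapper MB/s cap
        "-update",                        # copy only files that changed
        src, dst,
    ]

cmd = distcp_command("hdfs://prod-nn:8020/data", "hdfs://dr-nn:8020/data")
print(" ".join(cmd))
```

The list form is convenient for handing straight to `subprocess.run` from an automation script.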

Environment: Cloudera, Kafka, Apache Hadoop, Apache Flink, Ansible, HDFS, MapR, YARN, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, Pig, Spark, Hive, Docker, HBase, NiFi, Python, Puppet/Chef, LDAP/AD, Oracle Enterprise Manager (OEM), MySQL, PostgreSQL, Teradata, Azure

Hadoop Admin

Confidential, San Jose, CA

Responsibilities:

  • Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Hands-on experience with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, Hortonworks YARN, and ZooKeeper. Strong knowledge of Hive's analytical functions.
  • Experience with Windows Azure IaaS: virtual networks, virtual machines, and cloud services.
  • As Hadoop admin, monitored cluster health status on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files.
  • Introduced SmartSense to get optimal recommendations from the vendor, including for troubleshooting issues.
  • Experience with automated configuration management, maintaining a CI/CD pipeline with deployment tools such as Chef, Puppet, or Ansible.
  • Deployed a data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Integrated HDInsight with Azure Data Lake Store (ADLS) and Blob storage.
  • Cluster maintenance as well as commission and decommission of nodes.
  • Collaborate with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
  • Implemented secondary sorting to sort reducer output globally in MapReduce.
  • Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
  • Installed, configured, and administered the Jenkins CI tool on AWS EC2 instances.
  • Worked on error-handling techniques and tuned the ETL flow for better performance; worked extensively with the Talend Administration Center (TAC), scheduling jobs in Job Conductor.
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Involved in improving the performance and optimization of the existing algorithms using Spark.
  • Configured Kerberos and installed the MIT Kerberos ticketing system.
  • Hands-on experience with Docker, Puppet, Chef, Ansible, AWS CloudFormation, and AWS CloudFront.
  • Secured the Hadoop cluster against unauthorized access with Kerberos, LDAP integration, and TLS for data transfer among cluster nodes.
  • Installed and configured CDAP, an ETL tool, in the development and production clusters.
  • Integrated CDAP with Ambari for easy operations monitoring and management.
  • Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
  • Played a key role in migrating production and development Hortonworks Hadoop clusters to a new cloud-based cluster solution.
  • Monitored the Hadoop cluster and proactively optimized and tuned it for performance.
  • Connected to HDFS using third-party tools such as Teradata SQL Assistant via the ODBC driver.
  • Upgraded HDP (Hortonworks Data Platform) multiple times, from HDP 1.7 to HDP 2.3 to HDP 2.4.3.
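Tuning the performance-related configuration parameters mentioned above includes YARN container sizing. The arithmetic below follows the commonly cited Hortonworks guideline (take the minimum of 2×cores, 1.8×disks, and available memory divided by the minimum container size); the node specs are illustrative, not from any real cluster.

```python
def yarn_container_settings(total_mem_gb, cores, disks,
                            reserved_gb=8, min_container_gb=2):
    """Rough YARN sizing: container count, memory per container,
    and total NodeManager memory, per the common HDP guideline."""
    available = total_mem_gb - reserved_gb  # memory left after OS/daemons
    containers = int(min(2 * cores, 1.8 * disks, available / min_container_gb))
    mem_per_container = max(min_container_gb, available // containers)
    nm_memory_gb = containers * mem_per_container
    return containers, int(mem_per_container), int(nm_memory_gb)

# e.g. a 64 GB node with 16 cores and 8 data disks
print(yarn_container_settings(64, 16, 8))  # → (14, 4, 56)
```

The resulting numbers map onto `yarn.nodemanager.resource.memory-mb` and `yarn.scheduler.minimum-allocation-mb`, which are then refined against observed workloads.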

Environment: Hadoop, MapR, HDFS, Hive, Pig, Java, SQL, Sqoop, NiFi, Flume, Oozie, CDH3, MongoDB, Cassandra, HBase, Java (JDK 1.6), Eclipse, Hortonworks, Oracle, and Unix/Linux.

Hadoop Admin

Confidential

Responsibilities:

  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Experienced in handling Avro data files by passing the schema into HDFS using Avro tools and MapReduce.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slots configuration.
  • Implemented Security in Web Applications using Azure and Deployed Web Applications to Azure.
  • Used Apache Spark API over Hortonworks Hadoop YARN cluster to perform analytics on data in Hive.
  • Monitored multiple cluster environments using Metrics and Nagios.
  • Dumped data from one cluster to another using DistCp and automated the dumping procedure with shell scripts.
  • Implemented and maintained monitoring and alerting for production and corporate servers/storage using AWS CloudWatch.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Enabled the monitoring solution for HDInsight, Interactive Query, Spark, and HBase clusters.
  • Deployed a Hadoop cluster and integrated with Nagios and Ganglia.
  • Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
  • Extensively involved in cluster capacity planning, hardware planning, installation, and performance tuning of the Hadoop cluster.
  • Used Windows Azure to deploy the application on the cloud and managed the session.
  • Configured ZooKeeper to implement node coordination in clustering support.
  • Experienced in providing security for Hadoop Cluster with Kerberos.
  • Dumped data from the MySQL database to HDFS and vice versa using Sqoop.
  • Used Ganglia and Nagios to monitor the cluster around the clock.
  • Resolved tickets submitted by users and P1 issues, troubleshooting errors and documenting resolutions.
  • Worked on a live 52 node Hadoop Cluster running Hortonworks Data Platform.
  • Converted MapReduce jobs into Spark transformations and actions.
  • Adding new Data Nodes when needed and running balancer.
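The MySQL-to-HDFS dumps with Sqoop above are usually wrapped in automation scripts; the sketch below assembles the import invocation, with the connection string, table, and target path as placeholders. `--connect`, `--table`, `--target-dir`, `--num-mappers`, and `--split-by` are standard Sqoop import options.

```python
def sqoop_import_command(jdbc_url, table, target_dir,
                         mappers=4, split_by="id"):
    """Build a sqoop import invocation for an RDBMS-to-HDFS dump."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),  # parallel map tasks
        "--split-by", split_by,         # column used to partition the work
    ]

cmd = sqoop_import_command(
    "jdbc:mysql://dbhost:3306/sales", "orders", "/user/etl/orders")
print(" ".join(cmd))
```

The export direction is symmetric (`sqoop export` with `--export-dir` instead of `--target-dir`).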

Environment: Hadoop, Big Data, HDFS, MapReduce, Hortonworks, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, AWS, HDInsight, Eclipse, Cassandra, HDP 2.0.

Linux Admin

Confidential

Responsibilities:

  • Configuring and troubleshooting DHCP on RHEL servers.
  • Installation, configuration, and building of RHEL servers, plus troubleshooting and maintenance.
  • Software installation of Enterprise Security Manager on RHEL servers.
  • Performed shell, Python, and Perl scripting to automate crucial tasks.
  • Patched RHEL servers for security fixes, OS patches, and upgrades.
  • Experience in system authentication on RHEL servers using Kerberos and LDAP.
  • Installing and Configuring DNS on RHEL servers.
  • Managing Disk File Systems, Server Performance, Users Creation and Granting File Access Permissions.
  • Troubleshooting the backup issues by analyzing the NetBackup logs.
  • Configuring and Troubleshooting of various services like NFS, SSH, Telnet, FTP on UNIX platform.
  • Monitored disk, CPU, and memory performance of servers. Configured LVMs.
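The disk, CPU, and memory monitoring above often comes down to Nagios-style threshold checks; the sketch below shows the disk-usage case, with illustrative warning and critical thresholds.

```python
import shutil

def disk_status(used, total, warn_pct=80, crit_pct=90):
    """Classify filesystem usage the way a Nagios check plugin would."""
    pct = used / total * 100
    if pct >= crit_pct:
        return "CRITICAL"
    if pct >= warn_pct:
        return "WARNING"
    return "OK"

usage = shutil.disk_usage("/")  # total/used/free bytes for the root filesystem
print(disk_status(usage.used, usage.total))
```

A real check plugin would also print the usage percentage and exit with the matching Nagios status code (0/1/2).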

Environment: RHEL, Solaris, VMware, Apache, JBoss, WebLogic, system authentication, WebSphere, NFS, DNS, Samba, Red Hat Linux servers, Oracle RAC, DHCP.
