Hadoop Consultant Resume Overland Park, KS - Hire IT People

PROFESSIONAL SUMMARY:

7+ years of expertise in Hadoop, Big Data analytics, and Linux, including design, installation, configuration, and management of Apache Hadoop clusters and the MapR, Hortonworks, and Cloudera Hadoop distributions.
Experienced in Hadoop Big Data integration with ETL, performing data extraction, transformation, and loading for ERP data.
Set up disks for MapR, handled disk failures, configured storage pools, and worked with Logical Volume Manager.
Experience with AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
Designed and developed a data management system on PostgreSQL, optimizing queries to improve performance.
Good understanding of cluster capacity planning and of configuring cluster components based on requirements.
Collected and aggregated large amounts of log data using Apache Flume and stored it in HDFS for further analysis.
Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
Experience in setting up automatic failover control and manual failover control using ZooKeeper and quorum journal nodes.
Implemented the Capacity Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
Hands-on experience in data extraction, transformation, loading, and visualization using the Hortonworks platform: HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, JSON, MS SQL.
Experience in writing shell scripts using ksh, bash, and Perl for process automation of databases, applications, backup, and scheduling.
Experienced in setting up Hortonworks clusters and installing all the ecosystem components through Ambari.
Involved in Hadoop cluster environment administration that includes adding and removing cluster nodes, cluster monitoring and troubleshooting.
Worked with Apache Flink to implement transformations on data streams, including filtering, aggregating, and updating state.
Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
Excellent knowledge of standard FlowFile processors in NiFi used for data routing, transformation, and mediation between systems, e.g. GetFile, PutKafka, PutFile, and PutHDFS.
Experience in performance tuning of Hadoop clusters using various JVM metrics.
Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, processes, and deployment strategy.
Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
Hands-on experience in installation, configuration, management, and support of full-stack Hadoop clusters, both on premises and in the cloud, using Hortonworks and Cloudera bundles.
Working knowledge of Kafka for real-time data streaming and event-based architecture.
Knowledge of minor and major upgrades of Hadoop and eco-system components.
Hands-on experience with the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC, and RDS.
Managed Data with Volumes and worked with Snapshots, Mirror Volumes, Data protection and Scheduling in MapR.
Provided security and authentication with Ranger, where Ranger Admin handles administration and Ranger Usersync adds new users to the cluster.
Expertise in Writing, Reviewing and Executing Test Cases.
Conceptual knowledge of load testing, black-box testing, performance testing, and stress testing.
Analyzed and tuned the performance of Spark jobs on EMR, taking into account the type and size of the input processed and the instance types used.
Hands on experience in deploying AWS services using Terraform.
Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
Experienced in DNS, NIS, NIS+, NFS, Solr, FTP, Samba Server, LDAP/AD remote access, security management, and system troubleshooting.

TECHNICAL SKILLS:

Big Data Ecosystem: Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Apache Flink, Docker, Solr, Hue, Knox, NiFi, HDFS, MapReduce

Cloud Technologies: Amazon Web Services (Amazon RedShift, S3), Microsoft Azure Insight

Big Data Security: Redaction, Sentry, Ranger, Navencrypt, Kerberos, AD, LDAP, KTS, KMS, SSL/TLS, Cloudera Navigator, Hortonworks

Programming Languages: Scala, Python, Java, SQL, PL/SQL, HiveQL, Pig Latin

NoSQL Databases: Cassandra, MongoDB

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP, JSON

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Version control: SVN, CVS, Git

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Databases: Oracle, DB2, SQL Server, MySQL, Teradata, PostgreSQL, Cassandra

Tools: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT

Operating Systems: Unix, Linux and Windows

Other Tools: GitHub, Maven, Puppet, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, New Relic

WORK EXPERIENCE:

Hadoop Consultant

Confidential, Overland Park, KS

Responsibilities:

Involved in Requirement analysis, Design, Development, Data Design and Mapping, extraction, validation and creating complex business requirements.
Generated Python Django Forms to record data of online users.
Provided L2 (Tier II) technical support, a level above Tier I staffed by technicians with deeper experience and knowledge of a particular product or service.
Knowledgeable in continuous deployment using Heroku, Jenkins, and Ansible.
Installed Kafka on the Hadoop cluster and wrote the producer and consumer code in Java to stream data for popular hashtags from a Twitter source to HDFS.
Installed, configured, and optimized Hadoop infrastructure on the Cloudera Hadoop distribution (CDH5) using Puppet.
Worked on cloud-based architecture for Big Data with Azure.
Good understanding of the architecture, configuration, and security of Cassandra with Falcon, including Cassandra's read and write paths.
Implemented partitioning, dynamic partitions, and buckets in Hive; wrote Pig Latin scripts for the analysis of semi-structured data.
Worked on a 100-node Hadoop cluster running Cloudera distribution 5.6.
Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts and visualized the streaming data in Tableau dashboards.
Managed and reviewed Hadoop log files and handled file system management, monitoring, and capacity planning for the Hadoop cluster.
Monitored workload and job performance and performed capacity planning using Cloudera Manager.
Wrote Flume configuration files to store streaming data in HDFS and upgraded Kafka from 0.8.2.2 to 0.9.0.0.
As an admin, handled cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
Used Python scripts to update content in the database and manipulate files.
Supported 200+ servers and 50+ users of the Hadoop platform, resolved their tickets and issues, and trained users on Hadoop usage and best practices.
Ran analytics on HDFS data using MapReduce, Hive, and Pig, wrote the results back to MongoDB databases, and updated information in collections.
Imported data in various formats (JSON, Sequential, text, CSV, XML flat files, Avro, and Parquet) into the HDFS cluster with compression, using MapReduce programs.
Used a RESTful web services API to connect to the MapR table.
Currently working as Hadoop admin, responsible for everything related to the clusters, which total 100 nodes and range from POC to PROD.
Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, and monitoring.
Installed and configured OpenStack Juno on Red Hat 6 with multiple compute nodes.
Loaded data from the UNIX file system into HDFS and created custom Solr query components to enable optimum search matching.
Created multiple groups and set permission policies for them in AWS.
Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
Worked on deployment on AWS EC2 instance with Postgres RDS and S3 file storage.
Installed and configured the CDH Hadoop environment; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
Used Jira for project tracking, bug tracking, and project management. Loaded data from various data sources into HDFS using Kafka.
Loaded data into Spark RDDs and performed in-memory computation to generate the output response.

Environment: MapR 5.2 & 6.0, Hue, MongoDB-2.6, Hive-0.10, Sqoop-1.4.3, Oozie-3.3.4, AWS, Jira, Zeppelin, WebLogic 8.1, Kafka, Ranger, CentOS and other UNIX utilities, Cloudera Manager, YARN, Falcon, Kerberos, Ansible, Cloudera 5.6 & 6.0, Azure, HDFS, CDH4.7, Hadoop-2.0.0, Impala, Pig, Python scripting

Hadoop Admin

Confidential, Tyson, VA

Responsibilities:

Implemented Cluster Security using Kerberos and HDFS ACLs.
Loaded data from the UNIX file system into HDFS. Performed root cause analysis (RCA) for high-severity incidents.
Implemented Spark jobs using Python and Spark SQL for faster testing and processing of data.
Maintained Git repositories for the DevOps environment: automation code and configuration.
Translated high-level design specifications into simple ETL coding and mapping standards; managed cluster coordination services through ZooKeeper.
Used DevOps tools such as Jenkins, Nexus, Chef, and Ansible to build and deploy applications.
Imported data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
Created Hive tables and loaded data from the local file system into HDFS.
Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
Responsible for installing, configuring, supporting and managing of Hadoop Clusters.
Set up a test cluster with new services such as Grafana, integrated with Kafka and HBase for intensive monitoring.
Involved in cluster-level security: perimeter (authentication: Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions: Sentry), visibility (audit and lineage: Navigator), and data (encryption at rest).
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
Built and implemented ETL processes using Big Data tools such as Spark (Scala/Python) and NiFi.
Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
Worked with Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Successfully migrated the Elasticsearch database from SQLite to MySQL to PostgreSQL with complete data integrity.
Worked on Cloudera to analyze data present on top of HDFS.
Responsible for copying 800 TB of HDFS snapshot from Production cluster to DR cluster.
Knowledge of HDFS data storage and support for Azure Data Lake.
Responsible for copying 210 TB of HBase table data from the Production cluster to the DR cluster.
Production experience in large environments using configuration management tools such as Chef and Puppet, supporting a Chef environment with 170+ servers and developing manifests.
Installed and configured Cloudera on RHEL; responsible for maintaining the cluster.
Used Flink Streaming on the pipelined Flink engine to process data streams and deploy the new API, including the definition of flexible windows.
Configured AWS IAM and Security Groups.
Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
Upgraded Ambari from 2.2.0 to 2.4.2.0 and updated Solr from 4.10.3 to Ambari Infra, which is Solr 5.5.2.
Installed applications on AWS EC2 instances and configured storage on S3 buckets.

Environment: Cloudera, HDFS, MapR 5.1, YARN, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, Pig, Spark, Hive, Docker, HBase, NiFi, Python, Puppet/Chef, LDAP/AD, Oracle Enterprise Manager (OEM), MySQL, PostgreSQL, Teradata, Azure, Kafka, Apache Hadoop, Apache Flink, Ansible.

Hadoop Admin

Confidential, Irvine, CA

Responsibilities:

Involved in improving the performance and optimization of the existing algorithms using Spark.
Migrated HiveQL queries on structured data to Spark SQL to improve performance.
Configured Kerberos and installed the MIT Kerberos ticketing system.
Hands-on experience with Docker, Puppet, Chef, Ansible, AWS CloudFormation, and AWS CloudFront.
Secured the Hadoop cluster against unauthorized access using Kerberos, LDAP integration, and TLS for data transfer among the cluster nodes.
Installed and configured CDAP, an ETL tool, in the development and production clusters.
Integrated CDAP with Ambari for easy operations monitoring and management.
Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
Played a key role in migrating production and development Hortonworks Hadoop clusters to a new cloud-based cluster solution.
Connected to HDFS from third-party tools such as Teradata SQL Assistant using an ODBC driver.
Upgraded HDP (Hortonworks Data Platform) multiple times, starting from HDP 1.7, then HDP 2.3 and HDP 2.4.3.
Performed cluster maintenance, including commissioning and decommissioning of nodes.
Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
Implemented secondary sorting to sort reducer output globally in MapReduce.
Set up Hortonworks infrastructure, from cluster-level to node-level configuration.
Installed, configured, and administered the Jenkins CI tool on AWS EC2 instances.
Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
Hands-on experience with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, Hortonworks YARN, and ZooKeeper. Strong knowledge of Hive's analytical functions.
As a Hadoop admin, monitored cluster health status daily, tuned performance-related configuration parameters, and backed up configuration XML files.
Introduced SmartSense to obtain optimization recommendations from the vendor and to help troubleshoot issues.
Experience with automated configuration management; maintained a CI/CD pipeline using deployment tools such as Chef, Puppet, and Ansible.
Deployed a data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
Responsible for building scalable distributed data solutions using Hadoop.
Worked on error-handling techniques and tuned the ETL flow for better performance; worked extensively with TAC (Admin Console), scheduling jobs in Job Conductor.

Environment: Hive, Pig, Java (JDK 1.6), SQL, Sqoop, NiFi, Flume, Oozie, CDH3, MongoDB, Cassandra, HBase, Eclipse, Hortonworks, Oracle, Unix/Linux, Hadoop, MapR 5.0, HDFS.

Linux / Hadoop Admin

Confidential

Responsibilities:

Implemented and maintained monitoring and alerting for production and corporate servers/storage using AWS CloudWatch.
Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
Configured ZooKeeper to implement node coordination in support of clustering.
Experienced in providing security for Hadoop Cluster with Kerberos.
Transferred data from MySQL databases to HDFS and vice versa using Sqoop.
Used Ganglia and Nagios to monitor the cluster around the clock.
Resolved user-submitted tickets and P1 issues; troubleshot, documented, and resolved errors.
Monitored multiple cluster environments using Metrics and Nagios.
Copied data from one cluster to another using DistCp and automated the procedure with shell scripts.
Configured and troubleshot various services such as NFS, SSH, Telnet, and FTP on the UNIX platform.
Experience in system authentication on RHEL servers using Kerberos and LDAP.
Installed and configured DNS on RHEL servers.
Managed disk file systems, server performance, user creation, and file access permissions.
Troubleshot backup issues by analyzing NetBackup logs.
Configured and troubleshot DHCP on RHEL servers.
Installed, configured, built, troubleshot, and maintained RHEL servers.
Installed Enterprise Security Manager software on RHEL servers.
Wrote shell, Python, and Perl scripts to automate crucial tasks.

Environment: Linux, Java, AWS, Hortonworks, Sqoop, Oozie, Pig, Hive, HBase, Flume, Eclipse, Cassandra, HDP 2.0, Hadoop, Big Data, HDFS, MapReduce, WebSphere, NFS, DNS, Red Hat Linux servers, Oracle RAC, VMware, DHCP, RHEL, Solaris, Apache.
