Hadoop Big Data Engineer Resume
Woodcliff Lake, NJ
SUMMARY
- 7+ years of expertise in Hadoop, Big Data analytics, and Linux, including architecture, design, installation, configuration, and management of Apache Hadoop clusters across the MapR, Hortonworks, and Cloudera distributions.
- Experience in configuring, installing, and managing MapR, Hortonworks, and Cloudera distributions.
- Hands-on experience in installing, configuring, monitoring, and using Hadoop components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Oozie, Apache Spark, and Impala.
- Working experience with large-scale Hadoop environment builds and support, including design, configuration, installation, performance tuning, and monitoring.
- Working knowledge of monitoring tools and frameworks such as Splunk, InfluxDB, Prometheus, Sysdig, Datadog, AppDynamics, New Relic, and Nagios.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Standardized Splunk forwarder deployment, configuration, and maintenance across a variety of Linux platforms. Also worked with DevOps tools such as Puppet and Git.
- Hands-on experience configuring Hadoop clusters in a professional environment and on Amazon Web Services (AWS) using EC2 instances.
- Experience with the complete software development lifecycle, including design, development, testing, and implementation of moderately to highly complex systems.
- Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using the Apache, Hortonworks, and Cloudera distributions and MapReduce.
- Extensive experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH5 and HDP.
- Experience configuring Ranger and Knox to secure Hadoop services (Hive, HBase, HDFS, etc.). Experience administering Kafka and Flume streaming using the Cloudera distribution.
- Developed automated Unix shell scripts for performing RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database-related activities.
- Installed, configured, and maintained several Hadoop clusters which includes HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, Zookeeper, and Nifi in Kerberized environments.
- Experienced with deploying, maintaining, and troubleshooting applications on Microsoft Azure cloud infrastructure. Excellent knowledge of NoSQL databases such as HBase and Cassandra.
- Experience with large-scale Hadoop clusters, handling all Hadoop environment builds, including design, cluster setup, and performance tuning.
- Experience setting up HBase replication and MapR-DB replication between two clusters.
- Implemented release processes such as DevOps and continuous-delivery methodologies for existing builds and deployments. Experience with scripting languages such as Python, Perl, and shell.
- Modified reports and Talend ETL jobs based on feedback from QA testers and users in development and staging environments.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Deployed Grafana dashboards for monitoring cluster nodes, using Graphite as a data source and collectd as a metric sender.
- Experienced with the workflow scheduling and monitoring tools Rundeck and Control-M.
- Proficient with application servers such as WebSphere, WebLogic, JBoss, and Tomcat.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure.
- Experienced in developing MapReduce programs using Apache Hadoop for working with big data.
- Responsible for designing highly scalable big data clusters to support varied data storage and computation across Hadoop, Cassandra, MongoDB, and Elasticsearch.
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Storm, Zookeeper, Kafka, Impala, HCatalog, Apache Spark, Spark Streaming, Spark SQL, HBase, NiFi, Cassandra, AWS (EMR, EC2), Hortonworks, Cloudera
Languages: Java, SQL
Protocols: TCP/IP, HTTP, LAN, WAN
Network Services: SSH, DNS/BIND, NFS, NIS, Samba, DHCP, Telnet, FTP, iptables, MS AD/LDS/ADC, and OpenLDAP.
Other Tools: Tableau, SAS
Mail Servers and Clients: Microsoft Exchange, Lotus Domino, Sendmail, Postfix.
Databases: Oracle 9i/10g & MySQL 4.x/5.x, HBase, NoSQL, Postgres
Platforms: Red Hat Linux, CentOS, Solaris, and Windows
Methodologies: Agile Methodology (Scrum), Hybrid
PROFESSIONAL EXPERIENCE
Confidential - Woodcliff Lake, NJ
Hadoop Big Data Engineer
Responsibilities:
- Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters across the MapR, Hortonworks, and Cloudera distributions.
- Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, Hive, and Sqoop.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Experienced in developing Spark scripts for data analysis in Python.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
- Implemented complex Hive UDFs to execute business logic within Hive queries.
- Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
- Responsible for installing, configuring, supporting, and managing Cloudera Hadoop clusters.
- Installed a Kerberos-secured Kafka cluster (without encryption) for a POC and set up Kafka ACLs.
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
- Experienced in administering, installing, upgrading, and managing Hadoop clusters with MapR 5.1 on a cluster of 200+ nodes in different environments such as Development, Test, and Production (Operational & Analytics).
- Troubleshot issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Worked extensively on Elasticsearch querying and indexing to retrieve documents at high speed.
- Installed, configured, and maintained several Hadoop clusters which includes HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, Zookeeper, and Nifi in Kerberized environments.
- Involved in deploying a Hadoop cluster using Hortonworks Ambari (HDP 2.2), integrated with SiteScope for monitoring and alerting.
- Implemented Kerberos security in all environments. Defined the file system layout and data set permissions.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Experience managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Documented EDL (Enterprise Data Lake) best practices and standards, including data management.
- Performed regular maintenance, commissioning, and decommissioning of nodes as disk failures occurred, using the MapR File System.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration in the MapR Control System (MCS).
- Experience with innovative and, where possible, automated approaches to system administration tasks.
- Experience with Ambari (Hortonworks) for management of the Hadoop ecosystem.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Worked on setting up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used Oracle Big Data Appliance for Hadoop and NoSQL processing, and integrated data in Hadoop and NoSQL with data in Oracle Database.
- Worked with different relational database systems such as Oracle (PL/SQL). Used Unix shell scripting and Python, and have experience working on AWS EMR instances.
- Developed applications that access the database with JDBC to execute queries, prepared statements, and stored procedures.
- Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
- Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (a minimal sketch follows this list). Mentored the EQM team in creating Hive queries to test use cases.
- Configured Sqoop: JDBC drivers for the respective relational databases, parallelism, the distributed cache, import process control, compression codecs, imports into Hive and HBase, incremental imports, saved jobs and passwords, the free-form query option, and troubleshooting.
- Created MapR-DB tables and was involved in loading data into those tables.
- Collected and aggregated large amounts of streaming data into HDFS using Flume: configured multiple agents, Flume sources, sinks, channels, and interceptors; defined channel selectors to multiplex data into different sinks; and set log4j properties.
- Worked extensively on ETL mappings, analysis, and documentation of OLAP reports.
- Responsible for implementation and ongoing administration of the MapR 4.0.1 infrastructure.
- Maintained operations, installation, and configuration of a 150+ node cluster with the MapR distribution.
- Monitored the health of the cluster and set up alert scripts for memory usage on the edge nodes.
- Experience with Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities). Worked on NoSQL databases such as HBase and created Hive tables on top.
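The Spark Streaming work above (dividing Kafka data into micro-batches for the Spark engine) could look roughly like the following PySpark sketch. This is a minimal illustration, assuming the Spark 1.6/2.x DStream API and the spark-streaming-kafka package; the topic, broker, and record layout are placeholders rather than the actual pipeline.

```python
# Minimal PySpark Streaming sketch (assumed Spark 1.6/2.x DStream API with the
# spark-streaming-kafka package); topic, broker, and record layout are illustrative.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="customer-behavior-stream")
ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

# Direct stream from Kafka; each micro-batch arrives as an RDD for the Spark engine.
stream = KafkaUtils.createDirectStream(
    ssc, ["customer-events"], {"metadata.broker.list": "broker1:9092"})

# Per-batch transformation: parse CSV values and drop malformed records.
parsed = (stream.map(lambda kv: kv[1].split(","))
                .filter(lambda fields: len(fields) == 5))

parsed.count().pprint()  # print the record count of every micro-batch

ssc.start()
ssc.awaitTermination()
```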
Environment: HBase, Hadoop 2.2.4, Hive, Kerberos, Kafka, YARN, Spark, Impala, Solr, Java, Hadoop cluster, HDFS, Ambari, Ganglia, CentOS, RedHat, Windows, MapR, Sqoop, Cassandra.
Confidential - Mountain View, CA
Hadoop Admin / Big Data Engineer
Responsibilities:
- Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters across the MapR, Hortonworks, and Cloudera distributions.
- Responsible for installing, configuring, supporting, and managing Cloudera Hadoop clusters.
- Installed a Kerberos-secured Kafka cluster (without encryption) for a POC and set up Kafka ACLs.
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
- Troubleshot issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Worked extensively on Elasticsearch querying and indexing to retrieve documents at high speed.
- Installed, configured, and maintained several Hadoop clusters which includes HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, Zookeeper, and Nifi in Kerberized environments.
- Involved in deploying a Hadoop cluster using Hortonworks Ambari (HDP 2.2), integrated with SiteScope for monitoring and alerting.
- Installed the OS and administered the Hadoop stack with the CDH 5.9 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Experience managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked on installing Cloudera Manager and CDH, installing the JCE policy files, creating a Kerberos principal for the Cloudera Manager Server, and enabling Kerberos using the wizard.
- Performed regular maintenance, commissioning, and decommissioning of nodes as disk failures occurred, using the MapR File System.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration in the MapR Control System (MCS).
- Experience with innovative and, where possible, automated approaches to system administration tasks.
- Experience with Ambari (Hortonworks) for management of the Hadoop ecosystem.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Worked on setting up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used Oracle Big Data Appliance for Hadoop and NoSQL processing, and integrated data in Hadoop and NoSQL with data in Oracle Database.
- Worked on setting up the Hadoop ecosystem and a Kafka cluster on AWS EC2 instances.
- Worked with different relational database systems such as Oracle (PL/SQL). Used Unix shell scripting and Python, and have experience working on AWS EMR instances.
- Designed a workflow that helps cloud-transition management decide which queries to run on Google BigQuery, since every query executed in BigQuery incurs a cost (see the sketch after this list).
- Designed and implemented MongoDB Cloud Manager for Google Cloud.
- Migrated data from Oracle and SQL Server databases to MongoDB by reverse engineering.
- Experience in automating code deployment across multiple cloud providers such as Amazon Web Services, Microsoft Azure, Google Cloud, VMware, and OpenStack.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing. Mentored the EQM team in creating Hive queries to test use cases.
- Deployed Kubernetes in both AWS and Google Cloud; set up the cluster and replication and deployed multiple containers.
- Configured Sqoop: JDBC drivers for the respective relational databases, parallelism, the distributed cache, import process control, compression codecs, imports into Hive and HBase, incremental imports, saved jobs and passwords, the free-form query option, and troubleshooting.
- Created MapR-DB tables and was involved in loading data into those tables.
- Collected and aggregated large amounts of streaming data into HDFS using Flume: configured multiple agents, Flume sources, sinks, channels, and interceptors; defined channel selectors to multiplex data into different sinks; and set log4j properties.
- Worked extensively on ETL mappings, analysis, and documentation of OLAP reports.
- Responsible for implementation and ongoing administration of the MapR 4.0.1 infrastructure.
- Maintained operations, installation, and configuration of a 150+ node cluster with the MapR distribution.
- Monitored the health of the cluster and set up alert scripts for memory usage on the edge nodes.
- Experience with Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities). Worked on NoSQL databases such as HBase and created Hive tables on top.
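One way the per-query cost check described above could work is a dry run through the google-cloud-bigquery client, which reports bytes scanned without billing anything. A hedged sketch; the table name and the assumed on-demand price per TiB are illustrative, not the actual workflow.

```python
# Illustrative BigQuery dry-run cost estimate; table name and per-TiB price are assumptions.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials


def estimate_query_cost(sql, usd_per_tib=5.0):
    """Dry-run a query and return (bytes scanned, estimated USD cost)."""
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)  # dry run: nothing is executed or billed
    scanned = job.total_bytes_processed
    cost = scanned / (1024 ** 4) * usd_per_tib      # simple on-demand pricing assumption
    return scanned, cost


bytes_scanned, usd = estimate_query_cost(
    "SELECT customer_id, SUM(amount) FROM `project.dataset.orders` GROUP BY customer_id")
print(f"{bytes_scanned} bytes scanned, estimated cost ~${usd:.4f}")
```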
Environment: HBase, Hadoop 2.2.4, Hive, Kerberos, Kafka, YARN, Spark, Impala, Solr, Java, Hadoop cluster, HDFS, Ambari, Ganglia, CentOS, RedHat, Windows, MapR, Sqoop, Cassandra.
Confidential - Menomonee Falls, WI
Hadoop Administrator
Responsibilities:
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
- Monitoring and support through Nagios and Ganglia.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Created MapR DB tables and involved in loading data into those tables.
- Maintained operations, installation, and configuration of 100+ node clusters with the MapR distribution.
- Installed and configured Cloudera CDH 5.7.0 on RHEL 5.7 and 6.2 64-bit operating systems and was responsible for maintaining the cluster.
- Implemented security on the MapR cluster using BoKS and by encrypting data on the fly.
- Continuously monitored and managed the Hadoop cluster through the MapR Control System, Spyglass, and Geneos.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed and configured Cloudera Navigator using Cloudera Manager.
- Configured rack awareness and performed JDK upgrades using Cloudera Manager.
- Installed and configured Sentry for Hive authorization using Cloudera Manager.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Experience in setting up, configuring, and managing security for Hadoop clusters using Kerberos, with LDAP/AD integration at the enterprise level.
- Actively involved in a proof of concept for a Hadoop cluster in AWS, using EC2 instances, EBS volumes, and S3 to configure the cluster.
- Involved in migrating on-premises data to AWS.
- Used Hive, created Hive tables, and loaded data from the local file system to HDFS.
- Production experience in large environments using configuration management tools such as Chef and Puppet, supporting a Chef environment with 250+ servers, and involved in developing manifests.
- Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automated scripts and tools such as Terraform.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (a short sketch follows this list).
- Hands-on experience provisioning and managing multi-node Hadoop clusters in a public cloud environment (Amazon Web Services EC2) and on private cloud infrastructure.
- Assisted in installing and configuring Hive, Pig, Sqoop, Flume, Oozie, and HBase on the Hadoop cluster with the latest patches.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Responsible for copying a 400 TB HDFS snapshot from the production cluster to the DR cluster.
- Responsible for copying a 210 TB HBase table from the production cluster to the DR cluster.
- Created Solr collections and replicas for data indexing.
- Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Experience setting up Test, QA, and Prod environments. Wrote Pig Latin scripts to analyze and process the data.
- Involved in loading data from the UNIX file system to HDFS. Led root cause analysis (RCA) efforts for high-severity incidents.
- Investigated the root cause of critical and P1/P2 tickets.
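The S3-to-RDD work above, in sketch form: a minimal PySpark job that reads from S3, applies transformations, and runs actions. The bucket, paths, and record layout are illustrative assumptions, and the hadoop-aws connector plus S3 credentials are presumed to be configured on the cluster.

```python
# Minimal sketch of the S3-to-RDD step; bucket, paths, and field layout are illustrative.
from pyspark import SparkContext

sc = SparkContext(appName="s3-rdd-etl")

# Requires the hadoop-aws / AWS SDK jars and S3 credentials on the cluster.
raw = sc.textFile("s3a://example-bucket/input/transactions/*.csv")

records = (raw.map(lambda line: line.split(","))
              .filter(lambda fields: len(fields) == 4))       # drop malformed rows

# Pair RDD transformation: total amount per customer.
totals = (records.map(lambda fields: (fields[0], float(fields[3])))
                 .reduceByKey(lambda a, b: a + b))

print(totals.take(10))                                         # action: sample results
totals.saveAsTextFile("hdfs:///data/output/customer_totals")   # action: persist to HDFS
sc.stop()
```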
Environment: Cloudera, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, Zookeeper, Kerberos, Sentry, AWS, Pig, Spark, Hive, Docker, HBase, Python, LDAP/AD, NoSQL, GoldenGate, EM Cloud Control, Exadata Machines X2/X3, Toad, MySQL, PostgreSQL, Teradata.
Confidential - Montgomery, AL
Hadoop Administrator
Responsibilities:
- Worked as a Hadoop Administrator with the Cloudera Distribution of Hadoop (CDH).
- Installed, configured, and maintained Apache Hadoop and Cloudera Hadoop clusters for application development, along with Hadoop tools such as HDFS, Hive, HBase, Zookeeper, and MapReduce.
- Managed and scheduled jobs on Hadoop clusters using the Apache and Cloudera (CDH 5.7.0, CDH 5.10.0) distributions.
- Successfully upgraded the Cloudera Distribution of Hadoop stack from 5.7.0 to 5.10.0.
- Installed and configured a Cloudera Distribution of Hadoop (CDH) manually through the command line.
- Maintained operations, installation, and configuration of 150-node clusters with the CDH distribution.
- Monitored multiple Hadoop clusters environments, workload, job performance and capacity planning using Cloudera Manager.
- Created instances in AWS and migrated data from the data center to AWS using Snowball and the AWS migration service.
- Created graphs for each HBase table in Cloudera, based on writes, reads, and file size, in the respective dashboards.
- Exported Cloudera logs and created dashboards in Grafana using the JMX exporter and Prometheus (see the sketch after this list).
- Installed and configured Solr in Cloudera to query HBase data.
- Worked on setting up high availability for major Hadoop components such as the NameNode, ResourceManager, Hive, and Cloudera Manager.
- Created new users, principals, and keytabs in different Kerberized clusters.
- Participated in 30-day patching cycles on Hadoop clusters with the operations team.
- Installed and configured Hadoop cluster across various environments through Cloudera Manager.
- Managed, monitored, and troubleshot the Hadoop cluster using Cloudera Manager.
- Enabled TLS between Cloudera Manager and agents.
- Tuned the performance of HBase and HDFS to withstand heavy writes and reads by changing configurations.
- Installed and configured Phoenix to query HBase data in the Cloudera environment.
- Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
- Involved in the end-to-end process of Hadoop cluster setup, including configuring and monitoring the Hadoop cluster.
- Installed the NameNode, Secondary NameNode, YARN (ResourceManager, NodeManager, ApplicationMaster), and DataNodes.
- Handled and generated tickets via the BMC Remedy ticketing tool.
- Commissioned and decommissioned Hadoop cluster nodes, including load balancing of HDFS block data.
- Monitored performance and tuned the configuration of services in the Hadoop cluster.
- Experienced in managing and reviewing Hadoop log files.
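Behind Grafana dashboards like the ones above, metrics exposed by the JMX exporter are typically pulled from Prometheus over its HTTP query API. A small illustrative Python sketch; the server URL, metric name, and labels are assumptions rather than the actual setup.

```python
# Illustrative Prometheus instant query; URL and metric name are placeholders.
import requests

PROMETHEUS = "http://prometheus.example.com:9090"


def instant_query(expr):
    """Run a PromQL instant query and return the raw result list."""
    resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": expr})
    resp.raise_for_status()
    body = resp.json()
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


# Example: per-RegionServer write rate as exposed via the JMX exporter
# (the metric name depends on the exporter configuration).
for series in instant_query("rate(hbase_regionserver_write_requests_total[5m])"):
    instance = series["metric"].get("instance", "unknown")
    value = float(series["value"][1])
    print(f"{instance}: {value:.1f} writes/sec")
```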
Environment: Linux, Shell Scripting, Teradata, SQL Server, Cloudera 5.7/5.8/5.9, Hadoop, Flume, Sqoop, Pig, Hive, Zookeeper, and HBase.
Confidential
Linux System Administrator
Responsibilities:
- Installation, configuration, and administration of Red Hat Linux servers, including server support and regular upgrades of Red Hat Linux servers using Kickstart-based network installation.
- Provided 24x7 System Administration support for Red Hat Linux 3.x, 4.x servers and resolved trouble tickets on shift rotation basis.
- Configured HP ProLiant, Dell PowerEdge R-series, Cisco UCS, and IBM p-series machines for production, staging, and test environments.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts.
- Configured Linux native device mapper multipathing (MPIO) and EMC PowerPath for RHEL 5.5, 5.6, and 5.7.
- Used performance monitoring utilities such as iostat, vmstat, top, netstat, and sar.
- Worked on support for AIX matrix subsystem device drivers. Coordinated with the SAN team on allocating LUNs to increase file system space.
- Worked with both physical and virtual computing, from the desktop to the data center, using SUSE Linux. Worked with team members to create, execute, and implement plans.
- Installation, Configuration, and Troubleshooting of Tivoli Storage Manager.
- Remediated failed backups and took manual incremental backups of failing servers.
- Upgraded TSM from 5.1.x to 5.3.x. Worked on HMC configuration and management of the HMC console, which included upgrades and micro-partitioning.
- Installed and configured adapter cards and cables. Worked on Integrated Virtual Ethernet and building VIO servers.
- Installed SSH keys for passwordless login so SRM data could be collected from the servers for daily backup of vital data such as processor utilization and disk utilization.
- Coordinated with the application and database teams to troubleshoot applications. Provided redundancy with HBA cards, EtherChannel configuration, and network devices.
- Managed UNIX infrastructure, involving day-to-day maintenance of servers and troubleshooting.
- Provisioned Red Hat Enterprise Linux servers using PXE boot according to requirements.
- Performed Red Hat Linux Kickstart installations on RedHat 4.x/5.x, Red Hat Linux kernel tuning, and memory upgrades.
- Worked with the Logical Volume Manager, creating volume groups and logical volumes, and performed Red Hat Linux kernel tuning.
- Checked and cleaned file systems whenever they were full. Used Logwatch 7.3, which reports server information on a schedule.
- Managed and reviewed data backups and log files and worked on deploying Java applications on cluster.
Environment: Red Hat Enterprise Linux 3.x/4.x/5.x, Sun Solaris 10 on Dell PowerEdge servers, Sqoop, HBase.