Big Data Engineer Resume
Los Angeles, CA
SUMMARY
- Over 8 years of expertise in Hadoop, Big Data analytics, and Linux, including architecture, design, installation, configuration, and management of Apache Hadoop clusters and the MapR, Hortonworks, and Cloudera Hadoop distributions.
- Experience in configuring, installing, and managing the MapR, Hortonworks, and Cloudera distributions.
- Hands-on experience in installing, configuring, monitoring, and using Hadoop components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Oozie, Apache Spark, and Impala.
- Experience in deploying Hadoop clusters on public and private cloud environments like Cloudera, Hortonworks, and Amazon AWS.
- Experience in Amazon Web Services (AWS) provisioning and good knowledge of AWS services like EC2, ELB, Elastic Container Service, S3, DMS, VPC, Route 53, and CloudWatch.
- Experience in installing, patching, upgrading and configuring Linux based operating systems like CentOS, Ubuntu, RHEL on a large set of clusters.
- Excellent understanding and knowledge of Hadoop architecture and its various components, such as HDFS, NameNode, DataNode, JobTracker, TaskTracker, NodeManager, ResourceManager, and ApplicationMaster.
- Hands-on experience in installing and configuring Hadoop ecosystem components such as MapReduce, Spark, HDFS, HBase, Oozie, Hive, Pig, Impala, ZooKeeper, YARN, Kafka, and Sqoop.
- Experience in improving Hadoop cluster performance by tuning the OS kernel, storage, networking, HDFS, and MapReduce through appropriate configuration parameters.
- Experience in Installation and configuration, Hadoop Cluster Maintenance, Cluster Monitoring and Troubleshooting.
- Experience in monitoring and tuning the performance of the Hadoop ecosystem.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure - KDC server setup and creating and managing the realm.
- Experience in importing and exporting data between relational databases/mainframes and HDFS using Sqoop, and in ingesting logs using Flume (a brief illustrative sketch follows this list).
- Experience in configuring ZooKeeper to provide high availability and cluster service coordination.
- Strong knowledge of NameNode high availability and recovery of NameNode metadata and data residing in the cluster.
- Working experience with building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
- Working knowledge of monitoring tools and frameworks such as Splunk, InfluxDB, Prometheus, Sysdig, Datadog, AppDynamics, New Relic, and Nagios.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Standardized Splunk forwarder deployment, configuration, and maintenance across a variety of Linux platforms; also worked on DevOps tools like Puppet and Git.
- Hands-on experience in configuring Hadoop clusters in a professional environment and on Amazon Web Services (AWS) using EC2 instances.
- Experience with the complete software development life cycle, including design, development, testing, and implementation of moderately to highly complex systems.
- Hands-on experience in installation, configuration, support, and management of Hadoop clusters using the Apache, Hortonworks, and Cloudera distributions and MapReduce.
- Extensive experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions like CDH5 and HDP.
- Experience in Ranger and Knox configuration to secure Hadoop services (Hive, HBase, HDFS, etc.). Experience in administering Kafka and Flume streaming using the Cloudera distribution.
- Developed automated scripts using Unix shell for performing RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database-related activities.
- Installed, configured, and maintained several Hadoop clusters, including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi, in Kerberized environments.
- Experienced with deployments, maintenance, and troubleshooting of applications on Microsoft Azure cloud infrastructure. Excellent knowledge of NoSQL databases like HBase and Cassandra.
- Experience with large-scale Hadoop clusters, handling all Hadoop environment builds, including design, cluster setup, and performance tuning.
- Implemented release processes such as DevOps and continuous delivery methodologies for existing builds and deployments. Experience with scripting languages such as Python, Perl, and shell.
- Involved in architecting the storage service to meet changing requirements for scaling, reliability, performance, and manageability.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Deployed Grafana dashboards for monitoring cluster nodes using Graphite as a data source and collectd as a metric sender.
- Experienced in the workflow scheduling and monitoring tools Rundeck and Control-M.
- Proficiency with application servers like WebSphere, WebLogic, JBoss, and Tomcat.
- Working experience in designing and implementing complete end-to-end Hadoop infrastructure.
- Good experience in designing, configuring, and managing backup and disaster recovery for Hadoop data.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Responsible for designing highly scalable big data clusters to support various data storage and computation needs across varied big data clusters - Hadoop, Cassandra, MongoDB, and Elasticsearch.
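The Sqoop transfers between relational databases and HDFS mentioned above generally follow the pattern sketched here. This is a minimal, illustrative sketch; the JDBC connection string, credentials, table names, and HDFS paths are hypothetical placeholders rather than details from a specific engagement.

    # Illustrative only: pull an RDBMS table into HDFS, then push aggregated results back.
    sqoop import \
      --connect jdbc:mysql://db-host:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --as-textfile

    sqoop export \
      --connect jdbc:mysql://db-host:3306/sales \
      --username etl_user -P \
      --table orders_summary \
      --export-dir /data/curated/orders_summary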
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Hue, Knox, NiFi
Big Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navigator Encrypt, SSL/TLS, Cloudera Navigator
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, Scala, Python, SQL, PL/SQL, HiveQL, Pig Latin
Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application Servers: Apache Tomcat, WebLogic, JBoss
Version Control: SVN, CVS, Git
PROFESSIONAL EXPERIENCE
Confidential, Los Angeles, CA
Big Data Engineer
Responsibilities:
- Responsible for designing highly scalable big data clusters to support various data storage and computation needs across varied big data clusters - Hadoop, Cassandra, MongoDB, and Elasticsearch.
- Worked on analyzing the Hortonworks Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop.
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Provided monitoring and support through Nagios and Ganglia.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used the Oracle Big Data Appliance for Hadoop and NoSQL processing, and integrated data in Hadoop and NoSQL with data in Oracle Database.
- Created MapR DB tables and involved in loading data into those tables.
- Maintained the operations, installation, and configuration of 100+ node clusters with the MapR distribution.
- Installed and configured Cloudera CDH 5.7.0 on RHEL 5.7/6.2 64-bit operating systems and was responsible for maintaining the cluster.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
- Performed data blending of Cloudera Impala and Teradata ODBC data sources in Tableau.
- Cloudera Navigator installation and configuration using Cloudera Manager.
- Configured rack awareness and performed JDK upgrades using Cloudera Manager.
- Installed and configured Sentry for Hive authorization using Cloudera Manager.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Experience in the setup, configuration, and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at the enterprise level.
- Used Hive, created Hive tables, and loaded data from the local file system into HDFS.
- Production experience in large environments using configuration management tools like Chef and Puppet, supporting a Chef environment with 250+ servers and developing manifests.
- Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automation tools such as Terraform.
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase Indexer for ingestion, with Solr and HBase for real-time querying.
- Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDD's.
- Configured AWS IAM and Security Groups.
- Hands-on experience in provisioning and managing multi-node Hadoop clusters on public cloud environments such as Amazon Web Services (AWS) EC2 and on private cloud infrastructure.
- Set up a test cluster with new services like Grafana and integrated it with Kafka and HBase for intensive monitoring.
- Administered and configured Kubernetes.
- Worked with Spark to improve performance and optimize the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Responsible for copying 400 TB of HDFS snapshots from the production cluster to the DR cluster (a brief illustrative sketch follows this list).
- Responsible for copying 210 TB of HBase tables from the production cluster to the DR cluster.
- Created Solr collections and replicas for data indexing.
- Worked on Google Cloud Platform Services like Vision API, Instances.
- Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
- Upgraded Ambari from 2.2.0 to 2.4.2.0 and Solr from 4.10.3 to Ambari Infra (Solr 5.5.2).
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Involved in cluster-level security: perimeter security (authentication - Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions - Sentry), visibility (audit and lineage - Navigator), and data (data encryption at rest).
- Experience in setting up Test, QA, and Prod environments. Wrote Pig Latin scripts to analyze and process the data.
- Involved in loading data from the UNIX file system to HDFS. Led root cause analysis (RCA) efforts for high-severity incidents.
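The production-to-DR copies noted above were snapshot-based DistCp transfers. The sketch below shows the general shape of such a job; the cluster hostnames, ports, paths, and snapshot name are hypothetical placeholders.

    # Illustrative only: take a consistent HDFS snapshot and copy it to the DR cluster.
    hdfs dfsadmin -allowSnapshot /data/warehouse
    hdfs dfs -createSnapshot /data/warehouse snap_dr_copy

    # DistCp reads from the read-only snapshot so the source stays consistent mid-copy.
    hadoop distcp -update -p \
      hdfs://prod-nn:8020/data/warehouse/.snapshot/snap_dr_copy \
      hdfs://dr-nn:8020/data/warehouse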
Environment: Cloudera, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, AWS, Pig, Spark, Hive, Docker, HBase, Python, LDAP/AD, NoSQL, Exadata Machines X2/X3, Toad, MySQL, PostgreSQL, Teradata.
Confidential, Boston, MA
Sr. Hadoop Admin / Big Data Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in setting up Hortonworks clusters and installing all the ecosystem components through Ambari and manually from the command line.
- Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.2 and followed standard backup policies to ensure the high availability of the cluster.
- Involved in deploying a Hadoop cluster using Hortonworks Ambari HDP 2.2, integrated with SiteScope for monitoring and alerting.
- Launched and set up Hadoop clusters on AWS as well as on physical servers, which included configuring the different Hadoop components.
- Created a local YUM repository for installing and updating packages. Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Responsible for building a system that ingests terabytes of data per day into Hadoop from a variety of data sources, providing high storage efficiency and an optimized layout for analytics.
- Developed data pipelines that ingest data from multiple data sources and process it.
- Expertise in using Sqoop to connect to Oracle, MySQL, SQL Server, and Teradata and move the pivoted data into Hive or HBase tables.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance, and capacity planning using Ambari.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, cluster planning, managing and reviewing data backups, and managing and reviewing log files.
- Experienced in setting up Hortonworks clusters and installing all the ecosystem components through Ambari and through the command-line interface.
- Developed scripts for tracking changes in file permissions of files and directories through audit logs in HDFS.
- Deployed Kubernetes in both AWS and Google Cloud; set up the cluster and replication and deployed multiple containers.
- Configured memory and vcores for the dynamic resource pools within the fair and capacity schedulers.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on integrating the LDAP server and Active Directory with Ambari through the command-line interface.
- Coordinated with the Hortonworks support team through the support portal to resolve critical issues during upgrades.
- Worked on smoke tests of each service and client upon installation and configuration. Involved in loading data from the UNIX file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data. Provided cluster coordination services through ZooKeeper.
- Experience in managing and reviewing Hadoop log files. Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
- Installed the Knox gateway on a separate node to provide REST API services and give HDFS access to various HTTP applications.
- Balanced HDFS manually to decrease network utilization and increase job performance (a brief illustrative sketch follows this list).
- Commissioned and decommissioned DataNodes from the cluster in case of problems.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on HDFS and the local file system.
- Set up and managed NameNode high availability and federation using the Quorum Journal Manager to avoid single points of failure in large clusters.
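Manual HDFS balancing and DataNode decommissioning of the kind described above typically come down to a few admin commands. This is a sketch only; the threshold, the excludes-file path (wherever dfs.hosts.exclude points in a given cluster), and the hostname are assumptions.

    # Illustrative only: rebalance HDFS so no DataNode deviates more than 10% from the mean.
    hdfs balancer -threshold 10

    # Decommission a problem DataNode: add it to the excludes file, then refresh.
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes
    hdfs dfsadmin -report | grep -i decommission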
Environment: Hortonworks 2.2.1, HDFS, Hive, Pig, Sqoop, HBase, MicroStrategy, Shell Scripting, Ubuntu, Red Hat Linux.
Confidential, Baltimore, MD
Hadoop Admin
Responsibilities:
- Installed and worked on Hadoop clusters for different teams; supported 50+ users of the Hadoop platform, resolved the tickets and issues they ran into, and provided training to make Hadoop usage simple and keep users current on best practices.
- Installed/configured/maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Installed Cloudera Manager on the Oracle Big Data Appliance to help with CDH operations.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Upgraded the Hadoop cluster from CDH 5.8 to CDH 5.9.
- Worked on Installing cluster, Commissioning & Decommissioning of DataNodes, NameNode Recovery, Capacity Planning, and Slots Configuration.
- Created collections within Apache Solr and installed the Solr service through the Cloudera Manager installation wizard.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used the Oracle Big Data Appliance for Hadoop and NoSQL processing, and integrated data in Hadoop and NoSQL with data in Oracle Database.
- Maintains and monitors database security, integrity, and access controls. Provides audit trails to detect potential security violations.
- Worked on installing Cloudera Manager and CDH, installed the JCE policy file, created a Kerberos principal for the Cloudera Manager server, and enabled Kerberos using the wizard.
- Monitored cluster for performance, networking, and data integrity issues.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Installed the OS and administered the Hadoop stack with the CDH 5.9 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Supported MapReduce programs and distributed applications running on the Hadoop cluster.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Designed, developed, and provided ongoing support for data warehouse environments.
- Deployed the Hadoop cluster using Kerberos to provide secure access to the cluster.
- Converted MapReduce programs into Spark transformations using Spark RDDs and Scala.
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all our Hadoop clusters.
- Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats to text files.
- Created Hive external tables, loaded data into the tables, and queried the data using HQL (a brief illustrative sketch follows this list).
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
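Hive external tables of the kind mentioned above are typically defined over existing HDFS directories and queried with plain HQL. The sketch below is illustrative only; the table name, columns, delimiter, and HDFS location are hypothetical.

    # Illustrative only: define an external table over raw HDFS data, then query it.
    hive -e "
      CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        ip STRING, ts STRING, url STRING, status INT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/data/raw/web_logs';"

    hive -e "SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status;"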
Environment: MapReduce, Hive 0.13.1, Pig 0.16.0, Sqoop 1.4.6, Spark 2.1, Oozie 4.1.0, Flume, HBase 1.0, Cloudera Manager 5.9, Oracle Server X6, SQL Server, Solr, ZooKeeper 3.4.8, Cloudera 5.8, Kerberos, and Red Hat 6.5
Confidential, Englewood, CO
Hadoop Admin
Responsibilities:
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Worked with Redshift, a fully managed petabyte-scale data warehouse service designed for analytic workloads that connects to standard SQL-based clients and business intelligence tools.
- Involved in the architecture, design, installation, configuration, and management of Apache Hadoop and the Hortonworks Data Platform (HDP).
- Worked on delivering fast query and I/O performance for virtually any size of dataset in Redshift by using columnar storage technology and parallelizing and distributing queries across multiple nodes.
- Worked on Storm, a distributed real-time computation system that provides a set of general primitives for real-time processing.
- Experience in Hortonworks Data Platform (HDP) cluster installation and configuration.
- Experience with Kerberos, Active Directory/LDAP, and Unix-based file systems.
- Loaded data from various data sources into HDFS using Flume.
- Worked in statistics collection and table maintenance on MPP platforms.
- Worked on Cloudera to analyze data present on top of HDFS.
- Worked extensively on Hive and Pig.
- Worked on Kafka, a distributed, partitioned, replicated commit log service that provides the functionality of a messaging system.
- Worked on Spark, a fast and general-purpose cluster computing system.
- Wrote code in Python and shell scripts.
- Worked with source code management tools; proficient in Git, SVN, and AccuRev.
- Involved in test-driven development and wrote test cases in JUnit.
- Worked on large sets of structured, semi-structured and unstructured data.
- Used Sqoop to import and export data between HDFS and RDBMSs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Worked with bulk load tools such as DW Loader and moved data from PDW to the Hadoop archive.
- Participated in design and development of scalable and custom Hadoop solutions as per dynamic data needs.
- Involved in Change Data Capture (CDC) data modeling approaches.
- Coordinated with the technical team on production deployment of software applications for maintenance.
- Read data from and wrote data to Cassandra.
- Provided operational support services relating to Hadoop infrastructure and application installation.
- Handled imports and exports of data into HDFS using Flume and Sqoop.
- Supported technical team members in management and review of Hadoop log files and data backups.
- Participated in development and execution of system and disaster recovery processes.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
- Automated processes for troubleshooting, resolution and tuning of Hadoop clusters.
- Set up automated processes to send alerts in case of predefined system and application level issues.
- Set up automated processes to send notifications in case of any deviations from the predefined resource utilization thresholds (a brief illustrative sketch follows this list).
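Utilization-based alerting of the sort described above can be as simple as a cron-driven shell check. This is a minimal sketch under assumptions: the threshold, script path, and recipient address are placeholders, and a local mail command is assumed to be available.

    #!/bin/bash
    # Illustrative only: mail an alert when cluster-wide HDFS usage crosses a threshold.
    THRESHOLD=80
    USED=$(hdfs dfsadmin -report | awk '/DFS Used%/ {gsub(/%/, "", $3); print int($3); exit}')
    if [ "$USED" -ge "$THRESHOLD" ]; then
      echo "HDFS usage at ${USED}% on $(hostname)" | mail -s "HDFS capacity alert" ops-team@example.com
    fi
    # Example crontab entry (run every 15 minutes):
    # */15 * * * * /opt/scripts/check_hdfs_usage.sh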
Environment: Red Hat Linux/CentOS 4, 5, 6, Logical Volume Manager, Hadoop, VMware ESX 5.1/5.5, Apache and Tomcat Web Server, Oracle 11g/12c, Oracle RAC 12c, HPSM, HPSA.
Confidential, Minneapolis, MN
Hadoop Admin
Responsibilities:
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using CDH4.
- Good understanding of and experience with the Hadoop stack - internals, Hive, Pig, and MapReduce; involved in defining job flows.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data (a brief illustrative sketch follows this list).
- Loaded large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Implemented HBase co-processors to notify the support team when data is inserted into HBase tables.
- Solved the small-files problem using SequenceFile processing in MapReduce.
- Monitored system health and logs and responded to any warning or failure conditions.
- Performed cluster coordination through ZooKeeper.
- Involved in support and monitoring production Linux Systems.
- Expertise in archiving logs and monitoring jobs.
- Monitored daily Linux jobs and the log management system.
- Expertise in troubleshooting and able to work with a team to fix large production issues.
- Expertise in creating and managing DB tables, indexes, and views.
- Created users and managed user accounts and permissions at the Linux and DB levels.
- Extracted large data sets from different sources with different data-source formats, including relational databases, XML, and flat files, using ETL processing.
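Hadoop streaming jobs like the ones referenced above let plain Unix tools act as mapper and reducer. The sketch below is an illustrative word count; the streaming jar location and the HDFS input/output paths are assumptions.

    # Illustrative only: word count over HDFS text data with shell mapper/reducer.
    cat > mapper.sh <<'EOF'
    #!/bin/bash
    tr -s '[:space:]' '\n' | grep -v '^$'
    EOF
    cat > reducer.sh <<'EOF'
    #!/bin/bash
    uniq -c
    EOF
    chmod +x mapper.sh reducer.sh

    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
      -files mapper.sh,reducer.sh \
      -mapper mapper.sh \
      -reducer reducer.sh \
      -input /data/raw/text \
      -output /data/out/wordcount

Because the framework sorts mapper output before the reduce phase, identical words arrive adjacent to each other and uniq -c yields per-word counts.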
Environment: Cloudera Distribution CDH 4.1, Apache Hadoop, Hive, MapReduce, HDFS, Pig, ETL, HBase, ZooKeeper
Confidential
Linux System Administrator
Responsibilities:
- Installation, configuration and administration of Linux (RHEL 4/5) and Solaris 8/9/10 servers.
- Maintained and supported mission-critical, front-end and back-end production environments.
- Configured RedHat Kickstart server for installing multiple production servers.
- Provided Tier 2 support for issues escalated from the Technical Support team and interfaced with development teams, software vendors, or other Tier 2 teams to resolve issues.
- Installed and partitioned disk drives. Created, mounted, and maintained file systems to ensure access to system, application, and user data.
- Performed maintenance and installation of RPM and YUM packages and other server software.
- Created users, assigned groups and home directories, set quotas and permissions; administered file systems and resolved file access problems.
- Experience in managing and scheduling cron jobs, such as enabling system logging and network logging of servers, for maintenance, performance tuning, and testing.
- Maintained appropriate file and system security: monitored and controlled system access, changed permissions and ownership of files and directories, maintained passwords, assigned special privileges to selected users, and controlled file access.
- Extensive use of LVM, creating volume groups and logical volumes (a brief illustrative sketch follows this list).
- Performed various configurations, including networking and iptables, hostname resolution, and key-based passwordless SSH login.
- Build, configure, deploy, support, and maintain enterprise class servers and operating systems.
- Built CentOS 5/6, Oracle Enterprise Linux 5, and RHEL 4/5 servers from scratch.
- Organized the patch depots and acted as the POC for patch-related issues.
- Configured and installed Red Hat Linux 5/6, CentOS 5, and Oracle Enterprise Linux 5 using Kickstart to reduce installation issues.
- Attended team meetings, change control meetings to update installation progress and for upcoming changes in environment.
- Handled patch and firmware upgrades on RHEL and Oracle Enterprise Linux servers.
- User and Group administration on RHEL Systems.
- Creation of various user profiles and environment variables to ensure security.
- Performed server hardening and security configurations per client specifications.
- A solid understanding of networking/distributed computing environment concepts, including principles of routing, bridging and switching, client/server programming, and the design of consistent network-wide file system layouts.
- Strong understanding of Network Infrastructure Environment.
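LVM provisioning of the kind listed above follows a standard sequence; in the sketch below, the device name, volume group and logical volume names, size, and mount point are hypothetical placeholders.

    # Illustrative only: create a physical volume, volume group, and logical volume,
    # then build a filesystem and mount it persistently.
    pvcreate /dev/sdb1
    vgcreate vg_data /dev/sdb1
    lvcreate -L 50G -n lv_app vg_data
    mkfs.ext4 /dev/vg_data/lv_app
    mkdir -p /app
    mount /dev/vg_data/lv_app /app
    echo "/dev/vg_data/lv_app /app ext4 defaults 0 2" >> /etc/fstab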
Environment: Red Hat Linux, AIX, RHEL, Oracle 9i/10g, Samba, NT/2000 Server, VMware 2.x, Tomcat 5.x, Apache Server.