Hadoop/Big Data Admin Resume
Sunrise, FL
SUMMARY:
- 6+ years of professional experience, including 2 years as a Linux Administrator and 3 years in Big Data analytics as a Hadoop/Big Data Admin.
- Experience in all the phases of Data warehouse life cycle involving Requirement Analysis, Design, Coding, Testing, and Deployment.
- Experience working with business analysts to identify, study, and understand requirements and translate them into ETL code during the Requirement Analysis phase.
- Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters across the Hortonworks, Cloudera, and MapR distributions.
- Good understanding of MapR-SASL and PAM and how they interact with Hadoop and LDAP.
- Practical knowledge of the functionality of each Hadoop daemon, the interactions between them, resource utilization, and dynamic tuning to keep the cluster available and efficient.
- Experience in understanding and managing Hadoop Log Files.
- Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, handling data stored on a single platform under YARN.
- Experience in Adding and removing the nodes in Hadoop Cluster.
- Experience in Change Data Capture (CDC) data modeling approaches.
- Experience in extracting data from RDBMS into HDFS using Sqoop (a representative import command is sketched after this summary).
- Experience with bulk load tools such as DWLoader and in moving data from PDW to a Hadoop archive.
- Experience in collecting logs from the log collector into HDFS using Flume.
- Experience in setting up and managing the batch scheduler Oozie.
- Good understanding of NoSQL databases such as HBase and MongoDB.
- Experience in analyzing data in HDFS through MapReduce and Hive.
- Experience in tuning large/complex SQL queries and managing alerts from PDW and Hadoop.
- Experience on UNIX commands and Shell Scripting.
- Experience in Python Scripting.
- Experience in statistics collection and table maintenance on MPP platforms.
- Experience in creating physical data models for data warehousing.
- Experience in Microsoft SQL Server Integration Services (SSIS).
- Extensively worked on ETL mappings and the analysis and documentation of OLAP report requirements. Solid understanding of OLAP concepts and challenges, especially with large data sets.
- Proficient in Oracle 9i/10g/11g, SQL, MYSQL and PL/SQL.
- Experience in web development with proficiency in PHP, JavaScript, CSS, and MySQL.
- Experience in integrating various data sources like Oracle, DB2, Sybase, SQL Server, and MS Access, and non-relational sources like flat files, into the staging area.
- Experience in Data Analysis, Data Cleansing (Scrubbing), Data Validation and Verification, Data Conversion, Data Migrations and Data Mining.
- Excellent interpersonal, communication, documentation and presentation skills.
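As an illustration of the RDBMS-to-HDFS ingestion noted above, the sketch below shows a minimal Sqoop import; the connection string, credentials, table, and target directory are hypothetical placeholders rather than values from any actual engagement.

    # hypothetical connection, table, and target directory
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

The -P flag prompts for the database password at run time, and --num-mappers controls how many parallel map tasks split the import.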
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: Hadoop 2.7.0, MapR-FS, MapReduce, HBase, MapR-DB, Pig, Hive, Sqoop, YARN, Flume, ZooKeeper, Spark, Cassandra, Storm, Hue, Impala, and Oozie.
Programming Languages: Java, SQL, PL/SQL, Shell Scripting, Python, Perl
Frameworks: MVC, Spring, Hibernate.
Web Technologies: HTML, XML, JSON, JavaScript, Ajax, SOAP and WSDL
Databases: Oracle 9i/10g/11g, SQL Server, MYSQL
Database Tools: CRM tool, Billing tool, Oracle Warehouse Builder (OWB).
Operating Systems: Linux (Red Hat), Unix, Windows, macOS
Other Concepts: OOP, Data Structures, Algorithms, Software Engineering, ETL
PROFESSIONAL EXPERIENCE:
Confidential, Sunrise, FL
Hadoop/Big data Admin
Responsibilities:
- Involved in analysis, design, system architecture design, process interface design, and design documentation.
- Responsible for prototyping the selected solutions and implementing complex big data projects with a focus on collecting, parsing, managing, analyzing, and visualizing large data sets across multiple platforms.
- Applied technologies to solve big data problems and develop innovative big data solutions.
- Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that consumes data from Kafka in near real time and persists it to Cassandra (a representative job submission is sketched after this list).
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on the data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Imported data from various sources into the Cassandra cluster using Sqoop. Worked on creating data models for Cassandra from the existing Oracle data model.
- Used the Spark-Cassandra connector to load data to and from Cassandra.
- Worked with Spark and Scala for data analytics; handled an ETL framework in Spark for writing data from HDFS to Hive.
- Used a Scala-based framework for ETL.
- Developed multiple Spark Streaming and Spark Core jobs with Kafka as the data pipeline system.
- Worked with and learned a great deal from AWS cloud services like EC2, S3, and EBS.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets processing and storage.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Extensively used ZooKeeper as the job scheduler for Spark jobs.
- Worked on Talend with Hadoop; worked on migrating jobs from Informatica to Talend.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Developed Kafka producer and consumer components for real time data processing.
- Worked on physical transformations of the data model, which involved creating tables, indexes, joins, views, and partitions.
- Involved in Cassandra data modeling to create keyspaces and tables in a multi-data-center DSE Cassandra database.
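A minimal sketch of how a Kafka-to-Cassandra Spark Streaming job of the kind described above might be submitted on YARN; the driver class, jar, broker, topic, Cassandra host, and package versions are assumptions for illustration, not details taken from the project.

    # hypothetical class, jar, broker, topic, and Cassandra host; package versions are assumed
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.LearnerModelStreaming \
      --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.3.0,com.datastax.spark:spark-cassandra-connector_2.11:2.3.0 \
      --conf spark.cassandra.connection.host=cassandra-seed-host \
      learner-model-streaming.jar kafka-broker:9092 learner_events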
Confidential, North Brunswick, NJ
Hadoop/Big data Admin
Responsibilities:
- Provided Administration, management and support for large scale Big Data platforms on Hadoop eco-system.
- Involved in Cluster Capacity planning, deployment and Managing Hadoop for our data platform operations with a group of Hadoop architects and stakeholders.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups, Manage and review Hadoop log files.
- Identified data organized into logical groupings and domains, independent of any application or system.
- Worked on data modeling for HBase and Cassandra.
- Experienced in working on integrating data stores with legacy systems.
- Experienced in data modeling.
- Experienced in best practices for HBase and relational data model design.
- Strong experience with SQL and Data modeling.
- Experienced with big data machine learning tools: Anaconda, TensorFlow, Mahout, Spark ML, Jupyter, and Steam.
- Worked with deep learning frameworks such as TensorFlow, Caffe, Caffe2, PyTorch, and MXNet.
- Applied machine learning and artificial intelligence techniques and tools including neural networks, deep learning, regression, classification, and clustering.
- Experience installing and configuring monitoring tools.
- Worked on HBase replication setup between the PRD and BCP clusters.
- Experienced with the data visualization tools Tableau and Qlik Sense.
- Experienced in copying HBase table data using HBase snapshots (a representative snapshot export is sketched after this list).
- Capacity planning, architecting, and designing Hadoop clusters from scratch
- Designing the service layout with HA enabled
- Performed pre-installation and post-installation benchmarking and performance testing
- Experienced with NiFi.
- Installing, migrating and upgrading multiple MapR clusters
- Designed and implemented the Disaster Recovery mechanism for data, eco-system tools and applications
- Orchestrated data and service High availability within and across the clusters
- Performed multiple rigorous DR testing
- Training, mentoring and supporting team members
- Developing a reusable configuration management platform with Ansible and GitHub.
- Moving (re-distributing) services from one host to another within the cluster to facilitate securing the cluster and ensuring high availability of the services
- Working to implement MapR Streams to facilitate real-time data ingestion to meet business needs
- Implementing security on the MapR cluster using BoKS and by encrypting data on the fly
- Identifying the best solutions/proofs of concept leveraging big data and advanced analytics that meet and exceed the customer's business, functional, and technical requirements
- Created and published various production metrics including system performance and reliability information to systems owners and management.
- Performed ongoing capacity management forecasts including timing and budget considerations.
- Coordinated root cause analysis (RCA) efforts to minimize future system issues.
- Mentored, developed, and trained junior staff members as needed.
- Provided off hours support on a rotational basis.
- Stored unstructured data in semi-structured format on HDFS using HBase.
- Used Change management and Incident management process following organization guidelines.
- Continuously monitored and managed the Hadoop cluster through MapR Control System, Splunk, Spyglass, Kibana, Grafana, collectd, and Geneos.
- Responded to resolve database access and performance issues.
- Planned and coordinated data migrations between systems.
- Performed database transaction and security audits.
- Established appropriate end-user database access control levels.
- On-call availability for rotation on nights and weekends.
- Upgraded MapR from version 4.1.0 to 5.2.0.
- Experience in HBase replication and MapR-DB replication setup between two clusters.
- Good knowledge of Hadoop cluster connectivity and security.
- Experience in MapR-DB, Spark, Elasticsearch, and Zeppelin.
- Experience in Apache Hive, Drill, Solr, Kafka, Oozie, Presto, Phoenix and HBASE.
- Involved in POCs, such as the application monitoring tool Unravel.
- Experience working on Technology Health Refresh (THR) projects
- Experience in configuration management tool Ansible.
- Responding to database-related alerts and escalations and working with database engineering to develop strategic solutions to recurring problems.
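As a sketch of the snapshot-based HBase table copy mentioned above, the commands below take a snapshot on the PRD cluster and export it to the BCP cluster's HDFS; the table, snapshot, and NameNode names are hypothetical.

    # hypothetical table, snapshot, and cluster names
    echo "snapshot 'prod_events', 'prod_events_snap_20180101'" | hbase shell
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot prod_events_snap_20180101 \
      -copy-to hdfs://bcp-namenode:8020/hbase \
      -mappers 8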
Confidential, Denver, CO
Hadoop Administrator
Responsibilities:
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Handle the installation and configuration of a Hadoop cluster.
- Worked as the admin on the Hortonworks (HDP 2.2.4.2) distribution for 4 clusters ranging from POC to PROD.
- Experience in working with different Hadoop distributions such as CDH and Hortonworks.
- Day-to-day responsibilities included solving developer issues, handling deployments (moving code from one environment to another), providing access to new users, providing immediate solutions to reduce impact, documenting them, and preventing future issues.
- Experienced with Spark in improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Experienced with Hadoop ecosystems such as Hive, HBase, Sqoop, Kafka, Oozie, ATLAS etc.
- Build and maintain scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Handle the data exchange between HDFS and different Web Applications and databases using Flume and Sqoop.
- Experienced with AWS and EMR clusters.
- Experience in integrating AD/LDAP users with Ambari and Ranger.
- Good experience in implementing Kerberos & Ranger in Hadoop Ecosystem.
- Experience in configuring policies in Ranger to provide security for Hadoop services (Hive, HBase, HDFS, etc.).
- Good Understanding of Rack Awareness in the Hadoop cluster.
- Experience in using Monitoring tools like Cloudera manager and Ambari.
- Monitor the data streaming between web sources and HDFS.
- Worked with Kerberos and how it interacts with Hadoop and LDAP.
- Worked on Kafka, a distributed, partitioned, replicated commit log service that provides the functionality of a messaging system.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Provided input to development regarding efficient utilization of resources such as memory and CPU based on the running statistics of Map and Reduce tasks.
- Experience with APIs, the software intermediaries that make it possible for application programs to interact with each other and share data.
- Worked with Kerberos, Active Directory/LDAP, and Unix-based file systems.
- Changed cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Setting up Identity, Authentication, and Authorization.
- Maintaining Cluster in order to remain healthy and in optimal working condition.
- Handle the upgrades and Patch updates.
- Set up automated processes to analyze the system and Hadoop log files for predefined errors and send alerts to appropriate groups (a representative script is sketched after this list).
- Load data from various data sources into HDFS using Flume.
- Worked extensively on Hive and Pig.
- Worked on large sets of structured, semi-structured and unstructured data.
- Use of Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Participated in development and execution of system and disaster recovery processes.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
- Automated processes for troubleshooting, resolution and tuning of Hadoop clusters.
- Set up automated processes to send alerts in case of predefined system and application level issues.
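A minimal sketch of the kind of automated log scan described above, assuming a cron-driven shell script; the log directory, error patterns, and alert address are hypothetical, and the mail utility is assumed to be available on the host.

    #!/bin/bash
    # scan recent HDFS logs for predefined errors and mail the ops group (hypothetical paths/address)
    LOG_DIR=/var/log/hadoop/hdfs
    PATTERNS='ERROR|FATAL|Exception'
    ALERT_TO=hadoop-ops@example.com

    matches=$(find "$LOG_DIR" -name '*.log' -mmin -15 -exec grep -E "$PATTERNS" {} +)
    if [ -n "$matches" ]; then
      echo "$matches" | mail -s "Hadoop log alert on $(hostname)" "$ALERT_TO"
    fi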
Confidential, North Bergen, NJ
Linux Administrator
Responsibilities:
- Installation and configuration of Red Hat Enterprise Linux, Fedora Linux, and CentOS; maintenance and system administration.
- Installation and configuration of LVM (Logical Volume Manager) to manage volume groups and logical and physical volumes.
- Documented the standard procedure for installation and deployment of logical volume manager.
- Installation, configuration and support of DHCP, SSH, SCP, FTP, DNS services.
- Configuration and administration of NFS and Samba.
- Maintained and monitored all company server operating systems and application patch levels, disk space and memory usage, and user activities on a day-to-day basis.
- User administration on RHEL systems, management, and archiving.
- Involved with the hardware team to replace memory, CPUs, NICs, disks, and HBA cards.
- Installed and upgraded firmware, BIOS, and NIC firmware
- Installed and Configured Multiple OS using VMware ESXi on IBM, Dell servers.
- Monitored daily backups and maintained offsite backups for recovery in case of a system crash.
- Configured networking services and protocols such as DNS, SSH, DHCP, and TCP/IP. Attended calls related to customer queries and complaints and offered solutions.
- Worked with DBA team for database performance issues, network related issue on Linux / Unix Servers and with vendors for hardware related issues.
- Expanded file systems using Logical Volume Manager (a representative command sequence is sketched after this list).
- Troubleshooting NetBackup issues (routing tables and NICs)
- Managed and upgraded UNIX server services such as BIND DNS.
- Configuration and administration of Web (Apache), DHCP and FTP Servers in Linux and Solaris servers.
- Supported backup environments running Veritas NetBackup 6.5.
- Responsible for setting up cron jobs on the servers
- Decommissioned old servers and kept track of decommissioned and new servers using an inventory list. Handled problems and requirements per the tickets created in Request Tracker. Participated in an on-call rotation to provide 24x7 technical support.
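A minimal sketch of the LVM-based file system expansion referenced above; the device, volume group, and logical volume names are hypothetical, and resize2fs assumes an ext3/ext4 file system.

    # hypothetical device, volume group, and logical volume names
    pvcreate /dev/sdb1                    # initialize the new disk as a physical volume
    vgextend vg_data /dev/sdb1            # add it to the existing volume group
    lvextend -L +50G /dev/vg_data/lv_app  # grow the logical volume by 50 GB
    resize2fs /dev/vg_data/lv_app         # grow the ext3/ext4 file system online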