Hadoop Developer Resume
SUMMARY
- Experienced, certified Hadoop developer and Computer and Information Systems graduate with a background in designing, installing, building, and administering large-scale Hadoop production clusters, software development, project management, and Linux administration, specializing in database design and development.
- Hands-on experience in Python scripting, the Python standard library, Pylons, Django, UI, and web development
- Experience in handling, installing, and configuring the Hadoop ecosystem and architecture (HDFS, Spark, MapReduce, YARN, Pig, Hive, HBase, ZooKeeper, Hue, JSON, Sqoop, Flume, Oozie).
- Analyzed log files for Hadoop and ecosystem services to find root causes.
- Knowledge on Commissioning, Decommissioning, Balancing, and Managing Nodes and tuning server for optimal performance of the cluster.
- Multitasker experienced in application configuration, code compilation, packaging, building, automating, managing, and releasing code from one environment to another.
- Handled VMWare (vSphere), virtualized environments and/or cloud providers such as Microsoft Azure, AWS, etc.
- Good experience in configuring and managing backups, design, build, deployment, security, system hardening, securing services, and disaster recovery for Hadoop data.
- Well versed in software development methodologies such as Waterfall, Agile (Scrum), Test-Driven Development, and service-oriented architecture.
- Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing.
- Experience in configuring Zookeeper to provide Cluster coordination services.
- Worked with Sqoop to import and export data between HDFS/Hive and databases such as MySQL and Oracle.
- Preprocessed data using Hive and Pig.
- Experience in deploying and managing multi-node development, testing, and production clusters.
- Good knowledge of IT Service Management and related tools, exceptional organizational skills and attention to detail.
- Experience in installing firmware upgrades and kernel patches, system configuration, and performance tuning on Unix/Linux systems.
- Experience in all phases of the Software Testing Life Cycle (STLC), Software Development Life Cycle (SDLC), and bug life cycle, and in methodologies such as Waterfall and Agile.
- Knowledge of IT processes.
- Strong troubleshooting and debugging skills; able to handle multiple tasks and to work independently as well as in a team.
- Motivated problem solver.
- Professionally published.
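The log-file analysis mentioned above can be sketched as a small script. This is a hypothetical illustration: the regex targets the log4j-style line layout Hadoop daemons typically emit, and the function name and sample messages are assumptions, not code from any actual project.

```python
import re

# Sketch: scan Hadoop service logs for WARN/ERROR/FATAL lines and tally
# the most frequent messages as a first step toward a root cause.
# The pattern assumes the standard log4j layout Hadoop daemons emit.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+): (?P<msg>.*)$"
)

def count_errors(lines):
    """Return a {message: count} dict for WARN/ERROR/FATAL log lines."""
    counts = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") in ("WARN", "ERROR", "FATAL"):
            counts[m.group("msg")] = counts.get(m.group("msg"), 0) + 1
    return counts
```

Sorting the resulting dict by count surfaces the most common failure message, which is usually where root-cause analysis starts.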
TECHNICAL SKILLS
Operating systems: Linux (Ubuntu, Fedora, Kali, CentOS), macOS, Windows, Sun Solaris
Big Data: Hadoop, Hive, Pig, Flume, Storm, Splunk, ZooKeeper, MapReduce, YARN, HBase, Oozie, Sqoop, Kafka, Spark
Languages: C, Java, C#.NET, PL/SQL, Python, R, T-SQL, ANSI SQL, MDX, and XML
Tools: JIRA, PuTTY, WinSCP, FileZilla
Software: VMware, MS Office, Project, Visio
Database: Oracle, SQL Server, MySQL
Protocols: TCP/IP, UDP, FTP, SNMP, LDAP, and LDAPS (TLS/SSL)
ERP: PeopleSoft, PeopleTools, Application Designer, security maintenance, Report Manager
Methodologies: Agile, UML, Design Patterns
Reporting tools: Tableau, BI Publisher, PS-Query, SQR, MS-Excel, SSIS, SSRS, SSAS
Bioinformatics DB: NCBI, SWISSPROT, and RCSB for protein and nucleotide mapping using FASTA sequences.
PROFESSIONAL EXPERIENCE
Hadoop Developer
Confidential
Responsibilities:
- Involved in migrating ETL processes from Oracle to Hive to evaluate easier data manipulation.
- Provided Hadoop, OS, and hardware optimizations.
- Facilitated reading data from the file system into Spark RDDs.
- Led the installation, configuration, and deployment of product software on new edge nodes that connect to the Hadoop cluster for data acquisition.
- Used Hortonworks Ambari and Apache Hadoop on Red Hat and CentOS as data storage, retrieval, and processing systems.
- Set up Kerberos principals on the KDC server, tested HDFS, Hive, Pig, and MapReduce access for new users, and created keytabs for service IDs using keytab scripts.
- Assessed business rules, collaborated with stakeholders and performed a source-to-target data mapping, design and review.
- Implemented Spark RDD transformations and actions to carry out business analysis.
- Configured existing interpreter instances using Zeppelin notebooks.
- Created DataFrames from Hive tables using an SQLContext created from a SparkContext.
- Constructed a HiveContext to work with the data stored in Hive from Spark applications.
- Accessed tables in the Hive Metastore using HiveContext and wrote queries in HiveQL.
- Configured Oozie for workflow automation and coordination.
- Set up MapR metrics with a NoSQL database to log metrics data.
- Worked on NoSQL databases including HBase.
- Involved in designing and implementation of a toolset to simplify provisioning and support of a large cluster environment.
- Hands-on experience with Spark architecture and its integrations, including Spark SQL and the DataFrame and Dataset APIs.
- Imported SparkSession, Row, and functions from pyspark.sql using Spark 2.0.
- Created SparkSessions.
- Built a proof of concept for real-time streaming from Kafka into HDFS.
- Worked with Sqoop in Importing and exporting data from different databases like MySQL, Oracle into HDFS and Hive.
- Worked on setting up high availability for major production cluster and designed automatic failover control using Zookeeper and quorum journal nodes.
- Configured ZooKeeper to implement node coordination, in clustering support.
- Implemented MapReduce jobs using Python scripts on CentOS.
- Worked in the Hive view to write SQL scripts.
- Monitored cluster health using Teradata Viewpoint.
- Resolved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
- Diagnosed and optimized ETL workflows.
- Responsible for managing the data coming from different sources.
- Administrator for Pig, Hive and HBase installing updates, patches and upgrades, involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Performed minor and major upgrades.
- Collaborated with the team to generate scripts using HiveQL and Impala.
- Involved in configuring node labels for YARN to isolate resources at the node level, separating nodes dedicated to YARN applications from those dedicated to HBase.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Developed Simple to complex MapReduce Jobs using Hive and Pig.
- Experience with different source/target systems such as flat files, multi-file systems (MFS), multiple files (batches of files), Ab Initio queues (continuous flow graphs), and relational database systems (RDBMS) such as Oracle.
- Extensively worked on creating data marts using Ab Initio GDE 3.2.3 / CO>OP 3.2.2.3, ETL functions, and related OLAP tools.
- Involved in reviewing performance stats and query execution/explain plans, and recommended changes for tuning the Hive/Impala queries
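MapReduce jobs written as Python scripts, as mentioned above, are typically run through Hadoop Streaming, where the mapper and reducer read and write tab-separated lines. A minimal word-count sketch under that assumption (the function names are illustrative, not from any actual job):

```python
from itertools import groupby

# Hadoop Streaming-style word count (a sketch, not an actual job):
# the mapper emits "word\t1" lines, Hadoop sorts them by key between
# the two phases, and the reducer sums the counts for each word.
def mapper(lines):
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    # groupby relies on the shuffle phase having sorted the keys.
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"
```

In a real streaming job each function would wrap `sys.stdin`, print its output, and be submitted with the hadoop-streaming JAR; that wiring is omitted here.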
Environment: Hadoop, MapReduce, Hive, PIG, Sqoop, Spark, Teradata, Hortonworks, Oozie, Flume, HBase, Ab Initio, Nagios, Ganglia, Hue, Cloudera Manager, Zookeeper, Cloudera, Oracle, Kerberos and RedHat 7.4, Zeppelin
Linux Administrator
Confidential
Responsibilities:
- Managed installation, configuration, upgrades, and patching of systems running operating systems such as Red Hat, Fedora, CentOS, and Oracle Solaris.
- Involved in maintaining a large server and network infrastructure of Solaris, Linux, and Windows environment.
- Administered Red Hat Enterprise Linux 6.x and 7.x servers running on physical hardware and in virtualized environments using VMware ESXi 5.x
- Handled Linux-based virtualization implementations such as VMware and Xen on Red Hat.
- Involved in developing shell scripts used to start and stop custom Java applications in a Linux environment
- Involved in resolving issues in the continuous integration environment raised by the development teams and tracking them to closure.
- Involved in developing shell scripts used to monitor production applications
- Maintained the configuration and security of UNIX/Linux operating systems within the enterprise's computing environment.
- Troubleshot technical issues related to tier-3 storage and Quantum tape libraries; reported and logged all media and drive errors. Worked with vendors to resolve hardware and software issues.
- Planned and tested for disaster recovery and business continuity activities.
- Created and managed VMs (virtual servers) and was involved in their maintenance.
- Investigated, monitored, and coordinated resolution of integration touchpoints and technical issues.
- Performed administrative tasks such as system startup/shutdown, backups, printing, documentation, user management, security, network management, and configuration of dumb terminals.
- Installed, configured, and troubleshot Tivoli Storage Manager.
- Communicated well with customers of varying levels of technical expertise in high-pressure situations and complex environments.
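The production-monitoring scripts mentioned above can be sketched as an equivalent minimal check, shown here in Python rather than shell; the 90% threshold and the default path are assumptions for illustration.

```python
import shutil

# Sketch of a basic production-monitoring check: alert when a
# filesystem's usage crosses a threshold. Threshold and path are
# illustrative defaults, not values from any actual environment.
def disk_usage_pct(path="/"):
    """Return used disk space for `path` as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def disk_ok(path="/", threshold=90.0):
    """True while usage is below the alert threshold."""
    return disk_usage_pct(path) < threshold
```

A cron entry would typically run such a check periodically and page or email when it returns False.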
Environment: Red Hat Linux (RHEL 6/7), Solaris 10, VMWare, Global File System, Red hat Cluster Servers
System Administrator
Confidential
Responsibilities:
- Create and manage users in Active directory, Exchange and Office 365.
- Document all pertinent end user identification, including name, department, and contact information.
- Create, setup and manage email accounts for new staff members with Microsoft Outlook.
- Configure, allocate, and expand SAN storage, including pool moves.
- Perform upgrades and full-disk encryption on PCs.
- Prepare and provide analyses and incident reports, and their impact, to operational management
- Installation, configuration & maintenance of different printers in the office environment.
- Diagnose and troubleshoot problems in PCs and networks, maintaining the network in an optimized manner
- Basic troubleshooting for network connectivity.
- Recruit and train new employees for leadership positions, and relate and translate complex data to all levels of management for optimum understanding
- Build rapport and elicit problem details from help desk customers.
