Big Data/Hadoop Developer Resume
King of Prussia, Pennsylvania
SUMMARY
- 8+ years of IT experience, including 6 years as a Hadoop Developer and 3 years as a Linux Administrator working with the Confidential Big Data Appliance.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Impala, Sqoop, Pig, Oozie, ZooKeeper, Spark, Solr, Hue, Flume, Accumulo, Storm, and YARN on Hortonworks and other distributions.
- Experienced in designing and developing ETL processes in AWS Glue to extract, transform, and load data from S3 into a Redshift database.
- Experience in application development using Python, PySpark, and SQL (a short PySpark sketch follows this list).
- Experience with DevOps tools such as Terraform and Git for code versioning.
- Experience performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
- Experience with Apache NiFi and with integrating Apache NiFi and Apache Kafka.
- Experience administering the Confidential Big Data Appliance to support Cloudera (CDH) operations.
- Experience developing MapReduce programs on Apache Hadoop to analyze big data according to requirements.
- Extensive experience migrating data from legacy systems into the AWS cloud environment using Confidential EC2 and Confidential EMR.
- Experience with cloud deployments of Hadoop: Hadoop on Azure, AWS/EMR, Cloudera Manager, and Hadoop directly on EC2 (non-EMR).
- Experience in performance tuning of YARN, Spark, and Hive.
- Experience building event-driven microservices with the Kafka ecosystem.
- Hands-on experience with AWS (Confidential Web Services): Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.
- Experience in Configuring Apache Solr memory for production system stability and performance.
- Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop and troubleshooting for any issues.
- Strong experience with Hadoop distributions such as Cloudera and Hortonworks.
- Experience in Hadoop administration activities such as installation, configuration, and management of Cloudera (CDH) and other distributions using Cloudera Manager and Ambari. Experience in SQL Server database administration and OLTP applications.
- Knowledge of SQL Server performance tuning, backup, and recovery methods.
- Working knowledge of Python and Scala for Spark development. Good understanding of NameNode HA architecture.
- Experience in designing, developing, and providing ongoing support for data warehouse environments.
- Experience in monitoring the health of cluster using Ambari, Nagios, Ganglia and Cron jobs.
- Cluster maintenance and commissioning/decommissioning of DataNodes.
- Experience in importing and exporting data between HDFS/Hive and Relational Database Management systems using Sqoop
- Experience in Hadoop and Big Data Ecosystem including Hive, HDFS, Spark, Kafka, MapReduce, Sqoop, Oozie and Zookeeper
- Knowledge of writing Hive queries to generate reports using HiveQL.
- Hands-on experience with Spark SQL for complex data transformations using the Scala programming language.
- Good understanding/knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNodes and MapReduce concepts.
- Experience developing and implementing Spark programs in Scala using Hadoop to work with Structured and Semi-structured data.
- Proficient in using SQL, ETL, Data Warehouse solutions and databases in a business environment with large-scale, complex datasets.
- Experience with Configuring Security in Hadoop using Kerberos / NTLM protocol.
- Implemented security controls using Kerberos principals, ACLs, Data encryptions using dm-crypt to protect entire Hadoop clusters.
- Experience in restricting the user data using Sentry.
- Experience in directory services like LDAP & Directory Services Database.
- Expertise in setting up SQL Server security for Active Directory and non-active directory environment using security extensions.
- Assisted development team in identifying the root cause of slow performing jobs / queries.
- Experience setting up clusters on Confidential EC2 and S3, including automating cluster setup and expansion in the AWS Confidential cloud.
- Expertise in installation, administration, patches, upgrades, configuration, performance tuning, and troubleshooting of Red Hat Linux, SUSE, CentOS, AIX, and Solaris. Performed GitHub operations to manage source code for deployments.
- Used in-memory analytics with Apache Spark on Confidential EMR (Elastic MapReduce).
- Experience scheduling recurring Hadoop jobs with Apache Oozie and Control-M.
- Experienced working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce). Extracted files from Cassandra and MongoDB through Sqoop, placed them in HDFS, and processed them.
- Experience in Jumpstart, Kickstart, Infrastructure setup and Installation Methods for Linux.
- Experience importing real-time data into Hadoop using Kafka and implementing Oozie jobs.
- Hands on practice in Implementing Hadoop security solutions such as LDAP, Sentry, Ranger, and Kerberos for securing Hadoop clusters and Data.
- Good troubleshooting skills and understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networking.
- Expert in implementing advanced procedures and applications using Scala along with Akka, the Play Framework, and various APIs.
- Experience in administration activities for RDBMS databases such as MS SQL Server.
- Experience in Hadoop Distributed File System and Ecosystem (MapReduce, Pig, Hive, Sqoop, YARN and HBase).
- Planned, documented, and supported high availability, data replication, business continuity, failover, and fallback solutions.
- Knowledge of NoSQL databases such as HBase, Cassandra, MongoDB.
- Experience using Jira for issue tracking and Bitbucket to check in and check out code changes.
- Experienced in using remote desktop connections (RDP), WinSCP, PuTTY (Unix/Linux terminal emulator), and TeamViewer to connect to and make changes on remote systems.
- Exposure to Microsoft Azure in the process of moving on-premises data to the Azure cloud.
- Expertise in CI/CD pipeline deployment using Jenkins.
- Provided 24/7 technical support to Production and development environments.
- Familiarity with multiple software systems; able to learn new technologies quickly and adapt to new environments; focused, adaptive, quick learner with excellent interpersonal, technical, and communication skills.
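A minimal PySpark sketch of the Python/Spark SQL work referenced above; the S3 paths, table, and column names are hypothetical and used only for illustration.

```python
# Minimal illustration of a PySpark + Spark SQL job; paths, table names, and
# columns are hypothetical, not taken from any actual project.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-summary-sketch").getOrCreate()

# Read raw claims data from S3 (assumed CSV layout with a header row).
claims = spark.read.option("header", "true").csv("s3://example-bucket/raw/claims/")

# Basic cleanup with the DataFrame API.
claims_clean = (claims
    .withColumn("claim_amount", F.col("claim_amount").cast("double"))
    .filter(F.col("claim_amount").isNotNull()))

# Aggregate with Spark SQL.
claims_clean.createOrReplaceTempView("claims")
summary = spark.sql("""
    SELECT member_id,
           COUNT(*)          AS claim_count,
           SUM(claim_amount) AS total_amount
    FROM claims
    GROUP BY member_id
""")

# Persist the result in a columnar format for downstream reporting.
summary.write.mode("overwrite").parquet("s3://example-bucket/curated/claims_summary/")
```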
TECHNICAL SKILLS
Hadoop Eco-System Tools: AWS (S3, EC2, EMR, Lambda, CloudWatch, RDS), MapReduce, YARN, HDFS, Pig, Hive, HBase, Sqoop, ZooKeeper, Oozie, Hue, NiFi, Storm, Kafka, Solr, Spark, Flume.
Databases: MySQL, Confidential 10g/11g, MongoDB, PostgreSQL, HBase, NoSQL.
Platforms: Linux (RHEL, Ubuntu), OpenSolaris, AIX.
Scripting languages: Shell/Bash scripting, HTML, Python.
Web Servers: Apache Tomcat, Windows Server 2003/2008/2012.
Security Tools: LDAP, Sentry, Ranger, and Kerberos.
Cluster Management Tools: Cloudera Manager (CM 6.3), CDH 5.14, HDP Ambari, Hue, Unravel.
Cloud Technologies: Confidential Web Services (AWS), Azure Cloud.
PROFESSIONAL EXPERIENCE
Big Data/Hadoop Developer
Confidential, King of Prussia, Pennsylvania
Responsibilities:
- Worked on administration and management of large-scale Hadoop clusters and nodes and was part of capacity planning team in scaling these clusters for future needs.
- Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster and worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Developed data pipelines using Sqoop, Pig, and Hive to ingest member, clinical, biometrics, lab, and claims data into HDFS for data analytics.
- Engineered programs in Spark using Scala and Spark SQL for Data processing.
- Installed and configured the Hadoop ecosystem (HDFS/Spark/Hive/YARN/Kafka) using EMR, and secured the cluster with Kerberos and encryption servers (KTS & KMS).
- Worked on connecting the Cassandra database to the Confidential EMR File System for storing the database in S3.
- Responsible for day-to-day activities including Hadoop support, cluster maintenance, creation/removal of nodes, cluster monitoring/troubleshooting, managing and reviewing Hadoop log files, backup and restore, and capacity planning.
- Worked on production support and operations and managed the offshore team.
- Implemented applications with Scala along with Akka and Play framework.
- Created DR for both production clusters and scheduled replication jobs to copy data between the clusters; backed up and recovered data using HDFS snapshots.
- Worked on the design and deployment of the Hadoop cluster and various big data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Impala, and Cassandra, with the Hortonworks distribution.
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.
- Performed several POCs in AWS Glue and PySpark to parse position-based files, analyzing and designing ETL processes to transform, validate, and load diverse data formats into AWS (a PySpark sketch follows this list).
- Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to move data from on-premises to the Azure cloud.
- Upgraded Spark2 to version 2.4.0.2 and set PySpark2 to use Python 3.4.
- Configured SSL for Hue and Cloudera Manager on the cluster.
- Worked with the vendor Infoworks on data migration between two large-scale Hadoop clusters.
- Involved in setting up Kafka MirrorMaker on the production clusters to ensure the streaming data reaching the production cluster is replicated to the DR cluster.
- Pushed streaming data from Hadoop to an Akka collector and to Kafka topics, and processed the data with Scala scripts in Spark.
- Configured Kafka brokers to receive all the streaming data and connected them to Flume to send data to HBase and HDFS.
- Implemented usage of Confidential EMR for processing Big Data across a Hadoop Cluster of virtual servers on Confidential Elastic Compute Cloud (EC2) and Confidential Simple Storage Service (S3).
- Deployed the project on Confidential EMR with S3 connectivity for setting backup storage.
- Worked with Unravel to monitor the performance of the data warehouse cluster and installed Chef on all clusters for automation.
- Created cluster health status reports and troubleshot issues.
- Worked on NoSQL databases including HBase and MongoDB, configured MySQL Database to store Hive metadata.
- Worked with the Informatica team on connecting Informatica standalone servers with the cluster metadata for visualizing data and generating reports.
- Defined and built the RunnableGraphs needed to implement the solution using Akka Streams.
- Used the Akka framework to enable concurrent processing while loading the data lake.
- Used Azure Storage accounts and Azure Data Lake.
- Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
- Enabled security on the cluster using Kerberos and integrated clusters with LDAP/AD at the enterprise level; secured directories with ACLs and encryption zones and added quotas to directories.
- Restricted the User Access on the data using Sentry by providing adequate read/write permissions.
- Configured the cluster to use PostgreSQL database for storage.
- Primarily involved in the data migration process on Azure, integrating with the GitHub repository and Jenkins.
- Managed datasets using pandas DataFrames and MySQL, and queried the MySQL database from Python.
- Created and maintained various Shell and Python scripts for automating various processes.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Worked with developers and data analysts on running Hive and Impala queries and configured the resources needed for those jobs.
- Configured various service resource parameters in the cluster to ensure jobs run faster and do not fail.
- Worked on Remedy ticketing for work orders, incidents, changes, problem investigations, service requests, etc.
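As referenced in the Glue/PySpark POC bullet above, a hedged PySpark sketch of parsing a position-based (fixed-width) file; the field offsets, column names, and S3 paths are assumptions for illustration, not the actual file layout or Glue job.

```python
# Illustrative PySpark parsing of a position-based (fixed-width) file;
# field offsets, names, and S3 paths are hypothetical, not the real layout.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fixed-width-parse-sketch").getOrCreate()

# Each record is a single fixed-width line; read it as one string column.
raw = spark.read.text("s3://example-bucket/incoming/positions.dat")

# (start, length, column name) for each field -- assumed layout for illustration.
layout = [(1, 10, "member_id"), (11, 8, "service_dt"), (19, 12, "claim_amount")]

parsed = raw.select([
    F.trim(F.substring("value", start, length)).alias(name)
    for start, length, name in layout
])

# Cast numeric fields after trimming the fixed-width padding.
parsed = parsed.withColumn("claim_amount", F.col("claim_amount").cast("double"))

# Land the parsed data in a columnar format for downstream loading (e.g. Redshift or Hive).
parsed.write.mode("overwrite").parquet("s3://example-bucket/staged/positions/")
```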
Environment: HDFS, Spark2, MapReduce, YARN, AWS (S3, EC2, EMR, Lambda), Azure, Cloudera, Akka Streams, Impala, Pig, Hive, Sqoop, Oozie, Kafka, Flume, Solr, Sentry, CentOS, NoSQL, MongoDB, PostgreSQL, HBase, Kerberos, Scala, Python, Shell Scripting.
Big Data/Hadoop Developer
Confidential, San Diego, California
Responsibilities:
- Installed and worked on Hadoop clusters for different teams; supported users of the Hadoop platform, resolved the tickets and issues they ran into, simplified Hadoop usability for them, and kept them updated on best practices.
- Participated from the start of the project, from designing and gathering requirements from the business stakeholders through finalizing the stack components and the implementation.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed Cloudera Manager on the Confidential Big Data Appliance to support CDH operations.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Built a real-time data pipeline using Spark Streaming and Kafka.
- Upgraded the Hadoop cluster from CDH 5.9 to CDH 5.12.
- Developed shell/Python scripts to automate daily tasks and query data on AWS resources.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration. Created collections within Apache Solr and installed the Solr service through the Cloudera Manager installation wizard.
- Enabled Sentry and Kerberos to ensure data protection. Configured and set up an ODBC application for Impala.
- Worked on Confidential Big Data SQL to integrate big data analysis into existing applications.
- Used the Confidential Big Data Appliance for Hadoop and NoSQL processing, integrating data in Hadoop and NoSQL with data in the Confidential Database.
- Set up clusters on Confidential EC2 and S3, including automating cluster setup and expansion in the AWS Confidential cloud.
- Maintained and monitored database security, integrity, and access controls. Provided audit trails to detect potential security violations.
- Monitored cluster for performance, networking, and data integrity issues.
- Responsible for troubleshooting issues in the execution of MapReduce jobs.
- Installed the OS and administered the Hadoop stack on the CDH 5.12 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Designed, developed, and provided ongoing support for data warehouse environments.
- Converted MapReduce programs into Spark transformations using Spark RDDs and Scala.
- Worked on Hive for further analysis and for transforming files from different analytical formats into text files.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Managed cluster resources by implementing the Capacity Scheduler and creating queues.
- Hands-on experience writing CDC (Change Data Capture) logic in Spark using Scala for incremental loads/delta records (a simplified sketch follows this list).
- Created views in MongoDB per the business logic for the New Jersey member summary market using Hive tables.
- Worked on data quality and optimized Spark queries to resolve performance issues.
- Created and scheduled jobs through Control-M for all application teams' end-to-end flows.
- Worked with NiFi to manage the flow of data from source systems to HDFS and integrated Apache NiFi with Apache Kafka.
- Experienced in adding/installing new components and removing them through Ambari.
- Monitored systems and services through the Ambari dashboard to keep the clusters available for the business.
- Experienced in configuring Ambari alerts (critical & warning) for various components and managing the alerts.
- Implemented NameNode HA in all environments to provide high availability of clusters.
- Managed and reviewed log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
- Provided security and authentication with Ranger, where Ranger Admin provides administration and User Sync adds new users to the cluster.
- Implemented Nifi flow topologies to perform cleansing operations before moving data into HDFS.
- Established ODBC connections to SQL Server.
- Working experience maintaining MySQL databases: creating databases, setting up users, and maintaining database backups.
- Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into Hive schemas for analysis.
- Worked with Infrastructure teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Backed up data on a regular basis to a remote cluster using DistCp.
- Provided cluster coordination services through ZooKeeper and loaded datasets into Hive for ETL operations.
- Worked on the Oozie workflow engine for job scheduling and implemented Kerberos security for various Hadoop clusters.
- Helped implement monitoring and alerting for multiple big data clusters.
- Participated in a 24/7 on-call rotation and helped troubleshoot big data issues.
- Performed additional tasks outside of Hadoop, such as supporting other Linux infrastructure.
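A simplified PySpark sketch of the CDC/delta-merge pattern mentioned in the incremental-load bullet above (the project work itself was in Scala); the key, timestamp, and path names are hypothetical.

```python
# Simplified PySpark analogue of a CDC / delta-merge for incremental loads;
# column names and paths are assumptions, not taken from the actual project.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-merge-sketch").getOrCreate()

base = spark.read.parquet("/data/curated/members")          # existing snapshot
delta = spark.read.parquet("/data/incoming/members_delta")  # new/changed records

# Union old and new records, then keep only the latest version of each key
# based on an update timestamp column.
merged = base.unionByName(delta)
latest = Window.partitionBy("member_id").orderBy(F.col("updated_at").desc())

current = (merged
    .withColumn("rn", F.row_number().over(latest))
    .filter(F.col("rn") == 1)
    .drop("rn"))

# Write the refreshed snapshot; a real job would write to a temp path and swap.
current.write.mode("overwrite").parquet("/data/curated/members_new")
```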
Environment: Cloudera Manager, AWS (S3, EC2, Lambda), Confidential, SQL Server, Hue, MapReduce, HDFS, Sqoop, Hive, Spark, Scala, Oozie, Flume, HBase, Control-M, Hadoop, MongoDB, Bamboo, Docker, Shell Scripting, Red Hat, Kerberos
Hadoop Developer/Administrator
Confidential
Responsibilities:
- Installed and worked on Hadoop clusters for different teams; supported 50+ users of the Hadoop platform, resolved the tickets and issues they ran into, simplified Hadoop usability for them, and kept them updated on best practices.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Installed Cloudera Manager on the Confidential Big Data Appliance to support CDH operations.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Worked on the Spark SQL and Spark Streaming modules of Spark and used Scala and Python to write code for all Spark use cases.
- Created collections within Apache Solr and installed the Solr service through the Cloudera Manager installation wizard.
- Enabled Sentry and Kerberos to ensure data protection
- Worked on Confidential Big Data SQL to integrate big data analysis into existing applications.
- Used the Confidential Big Data Appliance for Hadoop and NoSQL processing, and integrated data in Hadoop and NoSQL with data in the Confidential Database.
- Maintained and monitored database security, integrity, and access controls. Provided audit trails to detect potential security violations.
- Installed Cloudera Manager and CDH, installed the JCE policy files to create a Kerberos principal for the Cloudera Manager Server, and enabled Kerberos using the wizard.
- Monitored cluster for performance, networking, and data integrity issues.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Installed the OS and administered the Hadoop stack on the CDH 5.9 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Designed, developed, and provided ongoing support for data warehouse environments.
- Deployed the Hadoop cluster using Kerberos to provide secure access to the cluster.
- Converted MapReduce programs into Spark transformations using Spark RDDs and Scala.
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all our Hadoop clusters.
- Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats into text files.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Developed the Linux shell scripts for creating the reports from Hive data.
- Imported data into the Hadoop ecosystem from different databases using various big data analytic tools.
- Developed Sqoop scripts to enable interaction between Pig/Hive and the MySQL database.
- Performed transformations, cleansing, and filtering on large datasets, including structured data from Confidential databases and semi-structured data. Involved in managing and reviewing Hadoop log files.
- Involved in loading data from the Linux file system into HDFS.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data
- Experience creating Hive tables to store processed results in a tabular format, optimizing Hive tables with techniques such as partitioning and bucketing for better HiveQL query performance, and creating custom user-defined functions in Hive (a partitioning sketch follows this list).
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume
- Exported the business-required information to an RDBMS using Sqoop to make the data available to BI teams and generated reports on that data using Tableau. Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
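A hedged sketch of the Hive partitioning approach called out above, driven here through Spark SQL with Hive support; the databases, tables, and columns are hypothetical (bucketing and custom UDFs would follow the same DDL pattern in Hive itself).

```python
# Illustrative partitioned Hive table created and loaded via Spark SQL;
# database, table, and column names are assumptions, not real objects.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Partitioned, ORC-backed table: queries filtering on `state` prune partitions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.claims_by_state (
        member_id    STRING,
        claim_amount DOUBLE
    )
    PARTITIONED BY (state STRING)
    STORED AS ORC
""")

# Allow dynamic partitioning so the partition value comes from the data itself.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE analytics.claims_by_state PARTITION (state)
    SELECT member_id, claim_amount, state
    FROM staging.claims_raw
""")

# Only the NJ partition's files are scanned for this query.
spark.sql("SELECT COUNT(*) FROM analytics.claims_by_state WHERE state = 'NJ'").show()
```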
Environment: MapReduce, Hive, Pig, Sqoop, Spark, Oozie, Flume, HBase, Cloudera Manager, Sentry, Confidential Server X6, SQL Server, Solr, ZooKeeper, Cloudera, Kerberos and Red Hat Linux.
Linux System Administrator
Confidential, Houston, TX
Responsibilities:
- Installed, configured, and administered Red Hat Linux servers, provided server support, and performed regular upgrades of Red Hat Linux servers using Kickstart-based network installation.
- Provided 24x7 system administration support for Red Hat Linux 3.x and 4.x servers and resolved trouble tickets on a shift-rotation basis.
- Configured HP ProLiant, Dell PowerEdge R-series, Cisco UCS, and Confidential p-series machines for production, staging, and test environments.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts.
- Configured Linux native device mappers (MPIO), EMC power path for RHEL 5.5, 5.6, 5.7.
- Used performance monitoring utilities such as iostat, vmstat, top, netstat, and sar (a small collection sketch follows this list).
- Worked on support for AIX matrix subsystem device drivers.
- Worked on both physical and virtual computing, from the desktop to the data center, using SUSE Linux. Expertise in building, installing, loading, and configuring boxes.
- Worked with the team members to create, execute, and implement the plans.
- Experience in Installation, Configuration, and Troubleshooting of Tivoli Storage Manager.
- Remediated failed backups and took manual incremental backups of failing servers.
- Upgraded TSM from 5.1.x to 5.3.x. Worked on HMC configuration and management of the HMC console, including upgrades and micro-partitioning.
- Installed adapter cards and cables and configured them.
- Worked on Integrated Virtual Ethernet and building up of VIO servers.
- Installed SSH keys for passwordless login so SRM data could be pushed to the server for daily backup of vital data such as processor and disk utilization.
- Provided redundancy with HBA cards, EtherChannel configuration, and network devices.
- Coordinated with the application and database teams to troubleshoot applications.
- Coordinated with the SAN team on allocation of LUNs to increase file system space.
- Configured and administered Fibre Channel adapters and handled the AIX side of the SAN.
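A small, hypothetical Python wrapper illustrating how vmstat/iostat snapshots (referenced in the monitoring bullet above) can be collected into a log file for performance baselining; the script itself is an illustration and assumes the sysstat utilities are installed, not a tool taken from this environment.

```python
#!/usr/bin/env python
# Hypothetical helper for capturing vmstat/iostat snapshots to a log file.
import datetime
import subprocess

COMMANDS = {
    "vmstat": ["vmstat", "1", "3"],       # 3 samples, 1 second apart
    "iostat": ["iostat", "-x", "1", "3"], # extended device stats (needs sysstat)
}

def snapshot(logfile="/var/tmp/perf_snapshot.log"):
    stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(logfile, "a") as out:
        for name, cmd in COMMANDS.items():
            out.write("===== %s @ %s =====\n" % (name, stamp))
            try:
                out.write(subprocess.check_output(cmd).decode())
            except (OSError, subprocess.CalledProcessError) as exc:
                # Record the failure instead of aborting the whole snapshot.
                out.write("failed to run %s: %s\n" % (name, exc))

if __name__ == "__main__":
    snapshot()
```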
Environment: Red Hat Linux (RHEL 3/4/5), Solaris, Logical Volume Manager, Sun & Veritas Cluster Server, VMware, Global File System, Red Hat Cluster Server