Hadoop Admin/Developer Resume
Dallas, TX
SUMMARY:
- 8+ years of IT experience as a Developer, Designer and Quality Tester, with cross-platform integration experience using the Hadoop ecosystem, Java and software functional testing.
- Expertise in writing Hadoop Jobs for analyzing structured and unstructured data using HDFS, Hive, HBase, Pig, Spark, Kafka, Scala, Oozie and Talend ETL.
- Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and MapReduce concepts.
- Experience developing different kinds of MapReduce programs on Hadoop for Big Data analysis.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Experience in importing/exporting data between Relational Database Systems and HDFS using Sqoop.
- Extensive knowledge and experience of real-time data streaming frameworks such as Kafka, Storm and Spark Streaming.
- Working experience in designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie, Flume and ZooKeeper.
- Experience with data analytics tools such as Tableau and BI360.
- Programming experience with Python, Java, Scala, Bash.
- Good knowledge in supporting data analysts in running Pig and Hive queries.
- Experience in writing shell scripts to dump the shared data from MySQL servers to HDFS.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Knowledge of performance-tuning the Hadoop cluster by gathering and analyzing information about the existing infrastructure.
- Experience in working with various Cloudera distributions (CDH4/CDH5), Hortonworks and Amazon EMR Hadoop Distributions.
- Extensively worked on Hive and Sqoop for sourcing and transformations.
- Experience in automating Hadoop installation, configuration and cluster maintenance using tools like Puppet.
- Hands-on experience with the AWS and Azure platforms.
- Hands-on experience with NoSQL databases such as HBase, Cassandra and MongoDB.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Strong debugging and problem-solving skills with excellent understanding of system development methodologies, techniques and tools.
- Wrote Flume configuration files for importing streaming log data into HBase, and processed that data using the Spark Streaming API with Scala.
- Worked across the complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) in different application domains, with technologies ranging from object-oriented development to Internet programming on Windows NT, Linux and UNIX/Solaris platforms, following RUP methodologies.
- Involved in writing shell and Ant scripts on UNIX for application deployments to the production region.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Hands on experience in developing the applications with Java, J2EE, J2EE - Servlets, JSP, EJB, SOAP, Web Services, JNDI, JMS, JDBC2, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, Oracle10g and MS-SQL Server RDBMS.
- Strong POC and development experience with Apache Flume, Kafka, Spark, Storm and Scala.
- Good understanding of data ingestion tools such as Kafka, Sqoop and Flume.
- Good working knowledge of Hue and the broader Hadoop ecosystem.
- Good knowledge in evaluating big data analytics libraries and using Spark SQL for data exploration.
- Exceptional ability to quickly master new concepts; capable of working in a group as well as independently, with excellent communication skills.
- Benchmarked Hadoop clusters to validate hardware before and after installation and tweaked configurations to obtain better performance.
- Strong experience in analyzing large data sets by writing PySpark scripts and Hive queries.
- Extensive experience in working with structured data using HiveQL, join operations and custom UDFs, and experienced in optimizing Hive queries (see the sketch after this list).
- Extensive experience in working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database.
- Experience with Apache Flume for collecting, aggregating and moving large volumes of data from various sources such as web server and telnet sources.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce, Hive, Spark jobs.
- Involved in moving all log files generated from various sources to HDFS and Spark for further processing.
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Experience in administering the Linux systems to deploy Hadoop cluster and monitoring the cluster.
- Experience on Commissioning, Decommissioning, Balancing, and Managing Nodes and tuning server for optimal performance of the cluster.
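A minimal sketch of the kind of Hive-table join-and-aggregation analysis referenced in the list above, expressed here with Spark SQL in Scala; the database, table and column names (sales.orders, sales.customers, region, order_amount) are hypothetical placeholders, not taken from any actual engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object OrderAnalysis {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL read tables registered in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("OrderAnalysis")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive tables; replace with real database/table names.
    val orders    = spark.table("sales.orders")
    val customers = spark.table("sales.customers")

    // Join on customer_id, then aggregate total spend per region.
    val spendByRegion = orders
      .join(customers, Seq("customer_id"))
      .groupBy("region")
      .agg(
        sum("order_amount").as("total_spend"),
        countDistinct("customer_id").as("customers"))
      .orderBy(desc("total_spend"))

    spendByRegion.show(20, truncate = false)
    spark.stop()
  }
}
```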
PROFESSIONAL EXPERIENCE:
Hadoop Admin/Developer
Confidential - Dallas, TX
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled Hadoop installation and configuration of multiple nodes on the Cloudera platform.
- Installed and configured Hortonworks HDP 2.2 using Ambari and manually through the command line; performed cluster maintenance as well as creation and removal of nodes using tools like Ambari and Cloudera Manager Enterprise.
- Handling the installation and configuration of a Hadoop cluster.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Involved in developer activities for installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list).
- Importing and exporting data into HDFS and Hive using Sqoop.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator) and data (encryption at rest).
- Handling the data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitored data streaming between web sources and HDFS, and verified functioning, through monitoring tools.
- Close monitoring and analysis of the MapReduce job executions on cluster at task level.
- Provided input to development teams on efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
- Installed the OS and administered the Hadoop stack on the CDH5 Cloudera distribution (with YARN), including configuration management, monitoring, debugging and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
- Day-to-day operational support of our Cloudera Hadoop clusters in lab and production, at multi-petabyte scale.
- Changed the configuration properties of the cluster based on the volume of data being processed by the cluster.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups; excellent working knowledge of SQL and databases.
- Commissioned and decommissioned data nodes in the cluster in case of problems.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Set up and managed NameNode HA to avoid single points of failure in large clusters.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing and feedback.
- Involved in analyzing system failures, identifying root causes and recommending courses of action; documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Administered and maintained Cloudera Hadoop clusters; provisioned, patched and maintained physical Linux systems.
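The data cleaning and preprocessing noted in the list above was implemented as Java MapReduce jobs; purely as an illustration, the sketch below expresses the same kind of cleaning step with Spark in Scala (Spark is also part of this environment). The HDFS paths, delimiter and filtering rules are hypothetical assumptions.

```scala
import org.apache.spark.sql.SparkSession

object LogCleaner {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("LogCleaner").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical HDFS paths for raw and cleaned data.
    val rawPath   = "hdfs:///data/raw/events"
    val cleanPath = "hdfs:///data/clean/events"

    val cleaned = sc.textFile(rawPath)
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#")) // drop blank and comment lines
      .map(_.split("\\|", -1).map(_.trim).mkString("|"))      // trim whitespace in each pipe-delimited field
      .distinct()                                             // drop exact duplicate records

    cleaned.saveAsTextFile(cleanPath)
    spark.stop()
  }
}
```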
Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, YARN, Cloudera 5.13, Spark, Tableau.
Hadoop Architect/Developer
Confidential - Chicago, IL
Responsibilities:
- Collaborated in identifying current problems, constraints and root causes in data sets to identify descriptive and predictive solutions with support from Hadoop HDFS, MapReduce, Pig, Hive and HBase, and further developed reports in Tableau.
- Architected the Hadoop cluster in pseudo-distributed mode working with ZooKeeper and Apache services; stored and loaded data from HDFS to Amazon AWS S3 for backup, and created tables in the AWS cluster with S3 storage.
- Evaluated existing infrastructure, systems and technologies; provided gap analysis; documented requirements, evaluations and recommendations for systems, upgrades and technologies; and created the proposed architecture and specifications along with recommendations.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed and Configured Sqoop to import and export the data into MapR-FS, HBase and Hive from Relational databases.
- Administered large MapR Hadoop environments, including cluster build and support, setup, performance tuning and monitoring in an enterprise environment.
- Installed and configured MapR-zookeeper, MapR-cldb, MapR-jobtracker, MapR-tasktracker, MapR-resourcemanager, MapR-nodemanager, MapR-fileserver and MapR-webserver.
- Installed and configured the Knox gateway to secure Hive through ODBC, WebHCat and Oozie services.
- Loaded data from relational databases into the MapR-FS filesystem and HBase using Sqoop, and set up MapR metrics with a NoSQL database to log metrics data.
- Close monitoring and analysis of the MapReduce job executions on cluster at task level and optimized Hadoop clusters components to achieve high performance.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
- Integrated HDP clusters with Active Directory and enabled Kerberos for Authentication.
- Worked on commissioning & decommissioning of Data Nodes, NameNode recovery, capacity planning and installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Worked on creating the Data Model for HBase from the current Oracle Data model.
- Implemented High Availability and automatic failover infrastructure to overcome the single point of failure of the NameNode, utilizing ZooKeeper services.
- Leveraged Chef to manage and maintain builds in various environments and planned for hardware and software installation on production cluster and communicated with multiple teams to get it done.
- Monitoring the Hadoop cluster functioning through MCS and worked on NoSQL databases including HBase.
- Created Hive tables, loaded data and wrote Hive UDFs (see the sketch after this list); worked with the Linux server admin team in administering the server hardware and operating system.
- Worked closely with data analysts to construct creative solutions for their analysis tasks and managed and reviewed Hadoop and HBase log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports and worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
- Collaborated with application teams to install operating system and Hadoop updates, patches and version upgrades when required.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
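The Hive UDF work referenced in the list above was written for Hive itself; as an illustrative alternative only, the sketch below registers an equivalent user-defined function through Spark SQL in Scala and applies it to a Hive-managed table. The function, database, table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object NormalizeCustomers {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NormalizeCustomers")
      .enableHiveSupport()
      .getOrCreate()

    // Register a simple normalization UDF (hypothetical logic: strip non-digits from phone numbers).
    spark.udf.register("normalize_phone",
      (raw: String) => if (raw == null) null else raw.replaceAll("[^0-9]", ""))

    // Apply the UDF to a hypothetical Hive table and persist the result
    // as a new Hive table for downstream reporting.
    spark.sql(
      """
        |SELECT customer_id, normalize_phone(phone_number) AS phone
        |FROM crm.customers
      """.stripMargin)
      .write
      .mode("overwrite")
      .saveAsTable("crm.customers_normalized")

    spark.stop()
  }
}
```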
Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Kafka, ZooKeeper, Oozie, Impala, Java, Cloudera, Oracle, Teradata, SQL Server, Python, UNIX shell scripting, ETL, Flume, Scala, Spark, Sqoop, AWS, S3, EC2, MySQL, Hortonworks, YARN.
Hadoop Admin/Developer
Confidential - Kenilworth, NJ
Responsibilities:
- Launching Amazon EC2 Cloud Instances using Amazon Web Services (Linux/ Ubuntu/RHEL) and Configuring launched instances with respect to specific applications.
- Installed applications on AWS EC2 instances and configured storage on S3 buckets.
- Performed S3 bucket creation and policy setup, including IAM role-based policies, and customized the JSON templates.
- Implemented and maintained monitoring and alerting for production and corporate servers/storage using AWS CloudWatch.
- Managed server instances on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
- Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Worked closely with the data modelers to model the new incoming data sets.
- Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling of a few jobs).
- Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra, with the Hortonworks distribution.
- Involved in creating Hive tables and Pig relations, loading data, and writing Hive queries and Pig scripts.
- Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data; configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Tuned Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the sketch after this list).
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
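A minimal sketch of the Kafka-to-HDFS streaming pattern referenced in the list above, using the Spark Streaming (DStream) API with Scala; the broker addresses, topic name, consumer group and output path are hypothetical placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    // Divide the incoming stream into 30-second micro-batches.
    val ssc = new StreamingContext(conf, Seconds(30))

    // Hypothetical Kafka connection settings.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-ingest",
      "auto.offset.reset"  -> "latest"
    )
    val topics = Array("events")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topics, kafkaParams))

    // Write each non-empty micro-batch to HDFS, keyed by batch time.
    stream.map(_.value()).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/streaming/events/${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```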
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.
Hadoop Administrator
Confidential - Chicago, IL
Responsibilities:
- Responsible for the implementation, setup and ongoing administration of Hadoop infrastructure.
- Cluster maintenance as well as creation and removal of nodes.
- Evaluated Hadoop infrastructure requirements and designed/deployed solutions (high availability, big data clusters).
- Cluster Monitoring and Troubleshooting Hadoop issues
- Manage and review Hadoop log files
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Created NRF documents explaining the flow of the architecture and measuring performance, security, memory usage and dependencies.
- Setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
- Help maintain and troubleshoot UNIX and Linux environment.
- Experience analyzing and evaluating system security threats and safeguards.
- Experience in Importing and exporting data into HDFS and Hive using Sqoop.
- Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
- Experienced in handling data from different data sets, joining them and preprocessing using Pig join operations.
- Developed Map-Reduce programs to clean and aggregate the data
- Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
- Developed different kinds of custom filters and handled pre-defined filters on HBase data using the API.
- Imported and exported data from Teradata to HDFS and vice-versa.
- Strong understanding of Hadoop eco system such as HDFS, MapReduce, HBase, Zookeeper, Pig, Hadoop streaming, Sqoop, Oozie and Hive
- Implemented counters on HBase data to count total records in different tables.
- Experienced in handling Avro data files by passing schema into HDFS using Avro tools and Map Reduce.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Used Amazon Web Services to perform big data analytics.
- Implemented secondary sorting to sort reducer output globally in MapReduce.
- Implemented a data pipeline by chaining multiple mappers using ChainMapper.
- Created Hive dynamic partitions to load time-series data (see the sketch after this list).
- Experienced in handling different types of joins in Hive, such as map joins, bucket map joins and sorted bucket map joins.
- Created tables, partitions and buckets, and performed analytics using ad-hoc Hive queries.
- Experienced in importing/exporting data into HDFS/Hive from relational databases and Teradata using Sqoop.
- Handled continuous streaming data coming from different sources using Flume, with HDFS as the destination.
- Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
- Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Worked on the Spring framework for multi-threading.
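The dynamic-partition loading of time-series data referenced in the list above was done in Hive; the sketch below illustrates the same idea driven through Spark SQL in Scala, with hypothetical database, table and column names.

```scala
import org.apache.spark.sql.SparkSession

object LoadSensorPartitions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadSensorPartitions")
      .enableHiveSupport()
      .getOrCreate()

    // Allow Hive-style dynamic partitioning for this session.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Hypothetical target table partitioned by event date.
    spark.sql("CREATE DATABASE IF NOT EXISTS metrics")
    spark.sql(
      """
        |CREATE TABLE IF NOT EXISTS metrics.sensor_readings (
        |  sensor_id STRING,
        |  reading   DOUBLE,
        |  ts        TIMESTAMP
        |)
        |PARTITIONED BY (event_date STRING)
        |STORED AS PARQUET
      """.stripMargin)

    // Each distinct event_date in the staging data creates or loads its own partition.
    spark.sql(
      """
        |INSERT OVERWRITE TABLE metrics.sensor_readings PARTITION (event_date)
        |SELECT sensor_id, reading, ts, date_format(ts, 'yyyy-MM-dd') AS event_date
        |FROM metrics.sensor_readings_staging
      """.stripMargin)

    spark.stop()
  }
}
```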
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, RDBMS, flat files, Teradata, MySQL, CSV, Avro data files, Java, J2EE.
SDET
Confidential - New York, NY
Responsibilities:
- Involved in almost all the phases of SDLC.
- Executed test cases manually and logged defects using ClearQuest.
- Automated the functionality and interface testing of the application using Quick Test Professional (QTP)
- Designed, developed and maintained the automation framework (hybrid framework).
- Analyzed requirements and prepared automation script scenarios.
- Developed test data for regression testing using QTP.
- Wrote test cases in IBM Rational Manual Tester.
- Conducted cross-browser testing on different platforms.
- Performed client application testing and web-based application performance, stress, volume and load testing of the system using LoadRunner 9.5.
- Analyzed the performance of the application itself under various test loads of many simultaneous users.
- Analyzed the impact on server performance (CPU usage and server memory usage) for the applications under varied numbers of simultaneous users.
- Inserted transactions and rendezvous points into web user scripts.
- Created user scripts using VuGen and used the Controller to generate and execute LoadRunner scenarios.
- Fully involved in requirements analysis and documentation of the requirement specification.
- Prepared use-case diagrams, class diagrams and sequence diagrams as part of requirement specification documentation.
- Involved in design of the core implementation logic using MVC architecture.
- Used Apache Maven to build and configure the application.
- Developed JAX-WS web services to provide services to the other systems.
- Developed JAX-WS clients to utilize a few of the services provided by other systems.
- Involved in developing EJB 3.0 stateless session beans in the business tier to expose business services to the services component as well as the web tier.
- Implemented Hibernate at the DAO layer by configuring the Hibernate configuration file for different databases.
- Developed business services to utilize Hibernate service classes that connect to the database and perform the required action.
- Developed JavaScript validations to validate form fields.
- Performed unit testing for the developed code using JUnit.
- Developed design documents for the code developed.
- Used SVN repository for version control of the developed code.
Environment: SQL, Oracle 10g, Apache Tomcat, HP LoadRunner, IBM Rational Robot, ClearQuest, Java, J2EE, HTML, DHTML, XML, JavaScript, Eclipse, WebLogic, PL/SQL and Oracle.
