
Hadoop Admin Resume


New York, NY

PROFESSIONAL SUMMARY:

  • Over 6 years of IT experience as a Developer, Designer and QA Tester with cross-platform integration experience using the Hadoop ecosystem, Java and functional test automation
  • Hands-on experience in installing, configuring and using Hadoop ecosystem components - HDFS, MapReduce, Pig, Hive, HBase, Spark, Sqoop, Flume and Oozie
  • Strong understanding of various Hadoop services, MapReduce and YARN architecture
  • Experienced in importing and exporting data to and from HDFS using Sqoop
  • Experienced in loading data to Hive partitions and creating buckets in Hive
  • Developed MapReduce jobs to automate data transfer from HBase
  • Expertise in analysis using Pig and Hive, and in writing MapReduce programs
  • Experienced in developing UDFs for Hive and Pig using Java (an illustrative sketch follows this list)
  • Strong understanding of NoSQL databases like HBase, MongoDB and Cassandra
  • Scheduled all Hadoop, Hive, Sqoop and HBase jobs using Oozie
  • Experienced in setting up clusters on Amazon EC2 and S3, including automated provisioning and scaling of clusters in the AWS cloud
  • Good understanding of Scrum methodologies and test-driven development (TDD) with continuous integration and continuous delivery (CI/CD)
  • Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments; a self-motivated, focused team player with excellent interpersonal, technical and communication skills
  • Experienced in defining detailed application software test plans, including organization, participants, schedule, and test and application coverage scope
  • Experienced in gathering and defining functional and UI requirements for software applications
  • Experienced in real-time analytics with Apache Spark RDDs, DataFrames and the Streaming API
  • Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data
  • Experienced in integrating Hadoop with Kafka and in loading clickstream data from Kafka into HDFS
  • Expertise in using Kafka as a messaging and publish-subscribe system
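
A minimal sketch of the kind of Java UDF for Hive referenced above; the class name, the email-masking logic and the registration statements are illustrative assumptions rather than actual project code.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Hypothetical Hive UDF that masks the local part of an email address,
 * e.g. "john.doe@example.com" -> "j*******@example.com".
 * Registered in Hive with (assumed jar/function names):
 *   ADD JAR mask-udf.jar;
 *   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmailUDF';
 */
public final class MaskEmailUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) return null;            // pass Hive NULLs through
        String value = input.toString();
        int at = value.indexOf('@');
        if (at <= 1) return input;                 // nothing to mask
        StringBuilder masked = new StringBuilder();
        masked.append(value.charAt(0));
        for (int i = 1; i < at; i++) masked.append('*');
        masked.append(value.substring(at));
        return new Text(masked.toString());
    }
}
```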

PROFESSIONAL EXPERIENCE:

HADOOP ADMIN

Confidential, NEW YORK, NY

Responsibilities:

  • Developed the architecture document and related guidelines
  • Installed Kafka on virtual machines.
  • Created topics for different users
  • Installed ZooKeeper, brokers, Schema Registry and Control Center on multiple machines.
  • Set up ACL/SSL security for different users and assigned users to multiple topics (an illustrative sketch of topic and ACL creation follows this list)
  • Configured security so that users connect over SSL
  • Assigned access to users across multiple user logins.
  • Created documentation, processes, server diagrams and server requisition documents and uploaded them to SharePoint
  • Used Puppet to automate deployments to the servers
  • Monitored errors and warnings on the servers using Splunk.
  • Set up the machines with network controls, static IPs, disabled firewalls and swap memory.
  • Created a POC on AWS based on the services required by the project
  • Created a POC on Hortonworks and recommended best practices for the HDP and HDF platforms and NiFi
  • Set up the Hortonworks infrastructure, from configuring clusters down to individual nodes
  • Installed the Ambari server in the cloud
  • Set up security using Kerberos and AD on Hortonworks clusters
  • Managed the configuration of the cluster to meet the needs of analysis, whether I/O-bound or CPU-bound
  • Worked on setting up high availability for the major production cluster. Performed Hadoop version updates using automation tools.
  • Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers
  • Automated the setup of Hadoop Clusters and creation of Nodes
  • Monitored improvements in CPU utilization and maintained them.
  • Performed performance tuning and managed growth of the OS, disk usage and network traffic
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the Linux file system to HDFS.
  • Performed architecture design, data modeling and implementation of the Big Data platform and analytic applications for the consumer products
  • Analyzed the latest Big Data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration
  • Implemented test scripts to support test driven development and continuous integration.
  • Optimized and tuned the application
  • Created user guides and training overviews for supporting teams
  • Provided troubleshooting and best-practice methodology for development teams, including process automation and new application onboarding
  • Designed monitoring solutions and baseline statistics reporting to support the implementation
  • Designed and built solutions for both real-time and batch data ingestion using Sqoop, Pig, Impala and Kafka.
  • Strong knowledge of and experience with MapReduce, Spark Streaming and Spark SQL for data processing and reporting.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Used Apache Kafka for importing real-time network log data into HDFS.
  • Developed business-specific custom UDFs in Hive and Pig.
  • Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
  • Optimized MapReduce code by writing Pig Latin scripts.
  • Imported data from external tables into Hive using the LOAD command
  • Created Hive tables and used static and dynamic partitioning as a data-slicing mechanism (an illustrative sketch follows this list)
  • Monitored the cluster, identified risks and established good practices to be followed in a shared environment
  • Good understanding of cluster configuration and resource management using YARN
  • Created a Natural Language Processing model in R and SPSS to classify customer sentiment from reviews and built dashboards, reducing the effort needed to manually review documents
  • Utilized machine learning techniques
  • Built Factor Analysis and Cluster Analysis models using Python to classify products into different target groups and to identify effective new features
  • Performed partitional clustering with k-means using the scikit-learn package to group similar customers together
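
A minimal sketch of topic creation and a per-user read ACL using the Kafka AdminClient Java API, as referenced in the bullets above; the broker address, topic name, partition/replication settings and user principal are illustrative assumptions.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

import java.util.Collections;
import java.util.Properties;

public class TopicAndAclSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Create a topic with assumed partition and replication settings
            NewTopic topic = new NewTopic("clickstream-events", 6, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();

            // Grant READ on the topic to a single (assumed) user principal
            ResourcePattern resource =
                    new ResourcePattern(ResourceType.TOPIC, "clickstream-events", PatternType.LITERAL);
            AccessControlEntry entry = new AccessControlEntry(
                    "User:analytics_user", "*", AclOperation.READ, AclPermissionType.ALLOW);
            admin.createAcls(Collections.singletonList(new AclBinding(resource, entry))).all().get();
        }
    }
}
```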
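
A minimal sketch of the static and dynamic Hive partitioning described above; the table and column names are made up, and the HiveQL is submitted here through Spark's Hive support to stay consistent with the other Java sketches (the same statements could equally be run from the Hive CLI or Beeline).

```java
import org.apache.spark.sql.SparkSession;

public class HivePartitionLoad {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-partition-load")
                .enableHiveSupport()            // talks to the Hive metastore
                .getOrCreate();

        // Partitioned target table (illustrative schema); staging_events is assumed to exist
        spark.sql("CREATE TABLE IF NOT EXISTS web_events (user_id STRING, url STRING) " +
                  "PARTITIONED BY (event_date STRING) STORED AS ORC");

        // Static partition: the partition value is fixed in the statement
        spark.sql("INSERT INTO TABLE web_events PARTITION (event_date='2017-01-01') " +
                  "SELECT user_id, url FROM staging_events WHERE event_date='2017-01-01'");

        // Dynamic partition: Hive derives event_date from the data itself
        spark.sql("SET hive.exec.dynamic.partition=true");
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict");
        spark.sql("INSERT INTO TABLE web_events PARTITION (event_date) " +
                  "SELECT user_id, url, event_date FROM staging_events");

        spark.stop();
    }
}
```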

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Splunk, Java, Puppet, Apache YARN, Pig, Spark, Tableau, Machine Learning

HADOOP DEVELOPER/ARCHITECT

Confidential, New York, NY

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive and Sqoop.
  • Created a POC on Hortonworks and recommended best practices for the HDP and HDF platforms
  • Set up the Hortonworks infrastructure, from configuring clusters down to individual nodes
  • Installed the Ambari server in the cloud
  • Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters
  • Assigned access to users across multiple user logins.
  • Installed and configured the CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
  • Installed, configured and maintained Apache Hadoop clusters for application development and Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Used Cloudera Manager extensively to manage multiple clusters with petabytes of data.
  • Knowledgeable in documenting processes, creating server diagrams and preparing server requisition documents
  • Set up the machines with network controls, static IPs, disabled firewalls and swap memory.
  • Managed the configuration of the cluster to meet the needs of analysis, whether I/O-bound or CPU-bound
  • Worked on setting up high availability for the major production cluster. Performed Hadoop version updates using automation tools.
  • Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers
  • Performed performance tuning and managed growth of the OS, disk usage and network traffic
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the Linux file system to HDFS.
  • Performed architecture design, data modeling and implementation of the Big Data platform and analytic applications for the consumer products
  • Analyzed the latest Big Data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of MapReduce jobs.
  • Responsible for managing data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data
  • Experienced in managing and reviewing Hadoop log files.
  • Managed jobs using the Fair Scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Used Pig built-in functions to convert fixed-width files to delimited files.
  • Tuned Hive and Pig to improve performance and resolved performance issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Managed datasets using pandas DataFrames and MySQL; queried MySQL from Python using the Python MySQL connector (MySQLdb) package to retrieve information.
  • Developed various algorithms for generating several data patterns. Used JIRA for bug tracking and issue tracking.
  • Developed Python/Django application for Analytics aggregation and reporting.
  • Used Django configuration to manage URLs and application parameters.
  • Generated Python Django Forms to record data of online users
  • Used Python and Django for graphics creation, XML processing, data exchange and business logic
  • Created Oozie workflows to run multiple MR, Hive and Pig jobs.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing
  • Involved in developing a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
  • Imported data from different sources such as HDFS and MySQL into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (an illustrative sketch follows this list)
  • Developed Machine Learning, Statistical Analysis and Data Visualization applications for challenging data-processing problems in the sustainability and biomedical domains.
  • Compiled data from various public and private databases to perform complex analysis and data manipulation for actionable results.
  • Gathered, analyzed, documented and translated application requirements into data models, and supported standardization of documentation and the adoption of standards and practices related to data and applications.
  • Developed and implemented predictive models using Natural Language Processing Techniques and machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA and regularization for data analysis.
  • Used clustering and KNN algorithms to categorize business expenses, improving budgeting.
  • Closely monitored the operating and financial results against plans and budgets.
  • Used R to develop regression models for data analysis.
  • Increased the speed and confidence of the learning algorithm by combining statistical methods; provided expertise and assistance in integrating advanced analytics into ongoing business processes.
  • Parsed data, producing concise conclusions from raw data in a clean, well-structured and easily maintainable format.
  • Interpreted complex simulation data using statistical methods.
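
A minimal sketch of converting a Hive/SQL query into Spark DataFrame transformations, as referenced above; the project work used Scala, so the table, column and application names here are illustrative assumptions shown with Spark's Java API for consistency with the other sketches.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.count;

public class HiveQueryAsSparkTransformations {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-to-spark")
                .enableHiveSupport()
                .getOrCreate();

        // Equivalent of: SELECT region, COUNT(*) AS orders FROM sales WHERE amount > 100 GROUP BY region
        Dataset<Row> sales = spark.table("sales");          // assumed Hive table
        Dataset<Row> byRegion = sales
                .filter(col("amount").gt(100))
                .groupBy("region")
                .agg(count("*").alias("orders"));

        // Persist the result back as a Hive table for downstream reporting
        byRegion.write().mode("overwrite").saveAsTable("sales_by_region");
        spark.stop();
    }
}
```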

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, NLP (Natural Language Processing)

HADOOP DEVELOPER

Confidential, NEW YORK, NY

Responsibilities:

  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
  • Installed applications on AWS EC2 instances and configured storage in S3 buckets.
  • Created S3 buckets and bucket policies, worked on IAM role-based policies and customized the JSON templates.
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Managed server instances on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
  • Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling of a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra, with the Hortonworks distribution.
  • Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Tuned Hive and Pig to improve performance and resolved performance issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Performed real-time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (an illustrative sketch follows this list)
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data
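
A minimal sketch of a Kafka-to-HDFS Spark Streaming job of the kind described above; the work itself used Scala (with Storm for part of the pipeline), so this Java DStream version with an assumed broker, topic, consumer group and HDFS path is shown only for consistency with the other sketches.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToHdfsStreaming {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-to-hdfs");
        // Incoming stream is divided into 30-second micro-batches for the Spark engine
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");       // assumed broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "weblog-loader");                // assumed consumer group

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("weblogs"), kafkaParams));

        // Write each micro-batch of raw log lines to a time-stamped HDFS directory
        JavaDStream<String> lines = stream.map(ConsumerRecord::value);
        lines.foreachRDD((rdd, time) -> {
            if (!rdd.isEmpty()) {
                rdd.saveAsTextFile("hdfs:///data/weblogs/batch-" + time.milliseconds());
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```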

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.

HADOOP DEVELOPER

Confidential, New York, NY

Responsibilities:

  • Responsible for the implementation and ongoing administration of the Hadoop infrastructure, including the initial setup
  • Cluster maintenance as well as creation and removal of nodes.
  • Evaluated Hadoop infrastructure requirements and designed/deployed solutions (high availability, big data clusters).
  • Monitored the cluster and troubleshot Hadoop issues
  • Managed and reviewed Hadoop log files
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required
  • Created NRF documents explaining the flow of the architecture and measuring performance, security, memory usage and dependencies.
  • Set up Linux users and Kerberos principals and tested HDFS, Hive, Pig and MapReduce access for the new users.
  • Helped maintain and troubleshoot the UNIX and Linux environments.
  • Analyzed and evaluated system security threats and safeguards.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Developed Pig programs for loading and filtering the streaming data brought into HDFS by Flume.
  • Handled data from different data sets, joining and preprocessing them using Pig join operations.
  • Developed MapReduce programs to clean and aggregate the data
  • Developed an HBase data model on top of HDFS data to perform real-time analytics using the Java API (an illustrative sketch follows this list)
  • Developed different kinds of custom filters and used pre-defined filters on HBase data through the API.
  • Imported and exported data from Teradata to HDFS and vice-versa.
  • Strong understanding of the Hadoop ecosystem, including HDFS, MapReduce, HBase, ZooKeeper, Pig, Hadoop Streaming, Sqoop, Oozie and Hive
  • Implemented counters on HBase data to count total records in different tables.
  • Handled Avro data files by passing schemas into HDFS using Avro tools and MapReduce.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • Used Amazon Web Services to perform big data analytics.
  • Implemented secondary sorting to sort reducer output globally in MapReduce.
  • Implemented a data pipeline by chaining multiple mappers using ChainMapper (an illustrative sketch follows this list)
  • Created Hive dynamic partitions to load time-series data
  • Handled different types of joins in Hive, such as map joins, bucket map joins and sorted bucket map joins.
  • Created tables, partitions and buckets and performed analytics using ad hoc Hive queries.
  • Imported and exported data into HDFS/Hive from relational databases and Teradata using Sqoop.
  • Handled continuous streaming data from different sources using Flume, with HDFS as the destination.
  • Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
  • Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters
  • Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Worked with the Spring framework for multi-threading.
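
A minimal sketch of reading HBase data through the Java API with a pre-defined filter and a simple record counter, as described above; the table name, column family, qualifier and filter value are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseEventScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();     // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("web_events"))) {

            // Pre-defined filter: keep only rows where cf:event_type equals "purchase"
            Scan scan = new Scan();
            scan.setFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("cf"), Bytes.toBytes("event_type"),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("purchase")));

            // Simple counter over the matching rows
            long total = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result ignored : scanner) {
                    total++;
                }
            }
            System.out.println("purchase events: " + total);
        }
    }
}
```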
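
A minimal sketch of chaining multiple mappers with ChainMapper, as described above; the two mappers and the job configuration are illustrative assumptions rather than the actual pipeline.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedCleanupJob {

    // First mapper: trim whitespace from each input line
    public static class TrimMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(value.toString().trim()));
        }
    }

    // Second mapper: lower-case the already-trimmed line
    public static class LowerCaseMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(value.toString().toLowerCase()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "chained cleanup");
        job.setJarByClass(ChainedCleanupJob.class);

        // Chain the two mappers so each record flows through both within one map task
        ChainMapper.addMapper(job, TrimMapper.class,
                LongWritable.class, Text.class, LongWritable.class, Text.class,
                new Configuration(false));
        ChainMapper.addMapper(job, LowerCaseMapper.class,
                LongWritable.class, Text.class, LongWritable.class, Text.class,
                new Configuration(false));

        job.setNumReduceTasks(0);                    // map-only pipeline
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```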

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, RDBMS/DB, Flat files, Teradata, MySQL, CSV, Avro data files, Java, J2EE.

TECHNICAL SKILLS:

Hadoop / Big Data: Hadoop MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: Java, Python, UNIX Shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: SQL Server, MySQL

Tools and IDEs: Eclipse, IntelliJ IDEA
