
Hadoop Developer Resume

Westport, CT

SUMMARY

  • Over 8 years of IT experience in analysis, implementation, and testing of enterprise-wide applications, data warehouses, client-server technologies, and web-based applications.
  • Over 4 years of experience in administrative tasks such as multi-node Hadoop installation and maintenance.
  • Experience in deploying Hadoop 2.0 (YARN) and administration of HBase, Hive, Sqoop, HDFS, and MapR
  • Installed, configured, supported, and managed Hadoop clusters using Apache Ambari on Hortonworks Data Platform 2.5 and Cloudera Manager on Cloudera Distribution Hadoop 5.x, across Linux, Rackspace, and AWS cloud infrastructure.
  • Understood the security requirements for Hadoop and integrated clusters with Kerberos infrastructure.
  • Good knowledge of Kerberos security; successfully maintained secured clusters while adding and removing nodes.
  • Hands-on experience in Linux administration activities on RHEL and CentOS.
  • Experience in minor and major upgrades of Hadoop and the Hadoop ecosystem.
  • Monitored Hadoop clusters using tools such as Nagios, Ganglia, Ambari, and Cloudera Manager.
  • Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
  • Involved in benchmarking Hadoop/HBase cluster file systems with various batch jobs and workloads.
  • Set up Linux environments: passwordless SSH, creating file systems, disabling firewalls, and installing Java.
  • Set up MySQL master-slave replication and helped applications maintain their data in MySQL servers.
  • Experienced in job scheduling using different schedulers such as Fair, Capacity, and FIFO, and in copying data between clusters with the DistCp tool.
  • Hands-on experience in analyzing log files for Hadoop ecosystem services and finding root causes.
  • Experience in Amazon AWS cloud administration; actively involved in building highly available, scalable, cost-effective, and fault-tolerant systems using multiple AWS services.
  • Projects involved file transmission and electronic data interchange; trade capture, verification, processing, and routing operations; banking report generation; and operational management.
  • Experience in working with Hadoop clusters and integrating with ecosystem components such as Hive, HBase, Pig, Sqoop, Spark, Oozie, and Flume.
  • Experienced in AWS CloudFront, including creating and managing distributions to provide access to the S3 bucket or HTTP server running on EC2 instances.
  • Good working knowledge of Vertica DB architecture, column orientation and High Availability.
  • Performed systems analysis for several information systems, documenting and identifying performance and administrative bottlenecks.
  • Monitored platform health, generated performance reports, and provided continuous improvements.

PROFESSIONAL EXPERIENCE

HADOOP DEVELOPER

Confidential, Westport, CT

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed the OS and administered the Hadoop stack on the CDH5 Cloudera Distribution (with YARN), including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
  • Installed, configured, and maintained Hortonworks HDP 2.2 using Ambari and manually through the CLI.
  • Wrote queries using PostgreSQL; implemented and maintained database code in the form of stored procedures, scripts, queries, views, and triggers.
  • Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
  • Experience in establishing ETL standards, best practices and strategy.
  • Developed shell scripts to automate the execution of ETL jobs in a Unix environment.
  • Involved in developer activities of installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.
  • Monitored data streaming between web sources and HDFS, and verified correct functioning through monitoring tools.
  • Day-to-day operational support of our Cloudera Hadoop clusters in the lab and production, at multi-petabyte scale.
  • Used Spark Streaming APIs to perform in-memory transformations and actions for building data models that receive data from Kafka in near real time and persist it into Cassandra (a brief sketch follows this list). Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using DataFrames, SQL Datasets, and RDDs in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs. Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, transformations, and other techniques during the ingestion process itself.
  • Analyzed SQL scripts and sub-queries and designed the solution to implement them using PySpark.
  • Experienced in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
  • Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups; excellent working knowledge of SQL and databases.
  • Commissioned and decommissioned data nodes in the cluster in case of problems.
  • Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
  • Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
  • Handled Azure Storage, including Blob Storage and File Storage, and set up Azure CDN and load balancers.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Enabled the processing, management, storage, and analysis of data using a data fabric.
  • Leveraged the data and utilized machine learning algorithms.
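
The Kafka-to-Cassandra streaming flow described above can be illustrated with a minimal Scala sketch. It assumes the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries; the broker address, topic, keyspace/table names, and record layout are illustrative placeholders, not the project's actual values:

    // Hedged sketch: broker, topic, keyspace/table, and record layout are hypothetical.
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
    import com.datastax.spark.connector.streaming._

    object KafkaToCassandra {
      // Field names are assumed to match the Cassandra table columns.
      case class Trade(id: String, symbol: String, amount: Double)

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("KafkaToCassandra")
          .set("spark.cassandra.connection.host", "cassandra-host") // assumed host
        val ssc = new StreamingContext(conf, Seconds(10))           // batch interval is workload-dependent

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "trade-consumers")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("trades"), kafkaParams))

        // In-memory transformation of each micro-batch, then persistence into Cassandra.
        stream
          .map(record => record.value.split(","))
          .filter(_.length == 3)                                     // drop malformed lines
          .map(f => Trade(f(0), f(1), f(2).toDouble))
          .saveToCassandra("trading", "trades")

        ssc.start()
        ssc.awaitTermination()
      }
    }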

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NIFI, Linux, Splunk, Yarn, Cloudera 5.13, Spark, Tableau, Microsoft Azure, Data fabric, Data Mesh.

HADOOP DEVELOPER

Confidential, New York, NY

Responsibilities:

  • Worked on analyzing Cloudera Hadoop and Hortonworks clusters and different big data analytic tools including Pig, Hive, and Sqoop.
  • Installed and configured a CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Set up the machines with network controls, static IPs, disabled firewalls, and swap memory.
  • Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
  • Performance-tuned and managed growth of the OS, disk usage, and network traffic.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed the latest big data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Experienced in managing and reviewing Hadoop log files.
  • Used Pig predefined functions to convert fixed-width files to delimited files.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Managed datasets using pandas DataFrames and MySQL; ran MySQL database queries from Python using the Python MySQL connector (MySQLdb) package to retrieve information.
  • Used Django configuration to manage URLs and application parameters.
  • Created Oozie workflows to run multiple MR, Hive, and Pig jobs.
  • Set up Azure Content Delivery Network (CDN), Azure DNS, load balancers, and DDoS Protection in the environment.
  • Experience in the implementation of DAGs and high availability.
  • Experience with several tools that help in migration, such as IdFix, the On-Ramp tool, Microsoft Remote Connectivity Analyzer, Microsoft network bandwidth analyzer, and SCCM.
  • Worked on data fabrics to cover multiple sources of data: in the cloud, on-premises, at the edge, and in other storage locations.
  • Designed to maintain security and reliable access to data irrespective of the storage location by using data fabrics.

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn, Python, Machine Learning, Microsoft Azure.

HADOOP DEVELOPER

Confidential, McLean, VA

Responsibilities:

  • Launched and configured Amazon EC2 Cloud Instances and S3 buckets using AWS, Ubuntu Linux and RHEL
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets
  • Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
  • Worked in the AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs).
  • Expertise in design and deployment of Hadoop clusters and different big data analytic tools including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra with the Hortonworks distribution.
  • Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data (see the sketch after this list). Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Imported data from different sources such as HDFS/HBase into Spark RDDs.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Performed real time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
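
The Spark/Scala batch flow above (importing raw data from HDFS into an RDD, transforming it, querying with Spark SQL, and loading the result back into HDFS) can be illustrated with a minimal sketch. It assumes a Spark 2.x-style SparkSession; the paths, field names, and record layout are illustrative placeholders:

    // Hedged sketch: paths and the (id, symbol, price) record layout are hypothetical.
    import org.apache.spark.sql.SparkSession

    object TradeBatch {
      // Assumed record layout of the raw CSV files on HDFS.
      case class Trade(id: String, symbol: String, price: Double)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("TradeBatch").getOrCreate()
        import spark.implicits._

        // Import raw text data from HDFS into an RDD, then clean and filter it.
        val raw = spark.sparkContext.textFile("hdfs:///data/trades/raw")
        val trades = raw
          .map(_.split(","))
          .filter(_.length == 3)                     // drop malformed records
          .map(f => Trade(f(0), f(1), f(2).toDouble))
          .toDF()

        // Register a temporary view and aggregate with Spark SQL.
        trades.createOrReplaceTempView("trades")
        val bySymbol = spark.sql(
          "SELECT symbol, AVG(price) AS avg_price, COUNT(*) AS cnt FROM trades GROUP BY symbol")

        // Load the final, aggregated data back into HDFS.
        bySymbol.write.mode("overwrite").parquet("hdfs:///data/trades/summary")
      }
    }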

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.

JAVA Developer

Confidential, McLean, VA

Responsibilities:

  • Developed the Web Based Rich Internet Application (RIA) using J2EE (Spring framework) and Macromedia/Adobe Flex.
  • Developed applications using the latest Java/J2EE technologies such as Cassandra, Java 8, and JUnit.
  • Extensively worked on the design and implementation of multi-tier applications using Java 8, J2EE, JDBC, JSP, HTML5, Spring Boot, AngularJS, Servlets, and JavaBeans with Eclipse.
  • Used the new Java 8 features, such as Lambda expressions, Streams and Functional Interfaces.
  • Designed and developed features using the Java Collections API and Java 8 language features, along with other Java processes, to fulfill business requirements such as client account statements and clients' running balances based on each transaction.
  • Replaced existing AngularJS code with Angular 4 code, which decreased the number of lines of code for the web application and increased performance.
  • Expertise in big data architecture with the Hadoop file system and its ecosystem tools (MapReduce, HBase, Hive, Pig, ZooKeeper, Oozie, Flume, Avro, Impala, Apache Spark, Spark Streaming, and Spark SQL), working in Agile.
  • Worked on modularization of the JDK under Project Jigsaw (the Java 9 module system). RAML specification and build experience with MuleSoft.
  • Extensively worked on both the Enterprise and Community editions of Mule ESB, and configured Mule API Manager and RAML.
  • Worked on RAML and REST-based web services for Mule ESB flows, as well as on MuleSoft MMC and enterprise release capabilities.
  • Developed software for AWS in Java 8 using Spring Framework 5.0, MySQL 5.6, AWS Aurora, Lambda, API Gateway, S3, SNS, SQS, DynamoDB, EC2, EBS, Akamai WAF (web application firewall), and the Apache Tomcat web server.
  • Developed RESTful web services using the Groovy-based Grails framework.
  • Used Spring REST APIs to create RESTful web services, with JSON as the data type between the front end and the middle-tier controller.
  • Developed Spring RESTful microservices and implemented Spring Cloud Netflix Eureka and Ribbon as part of service discovery, alongside Apache Axis web services.
  • Implemented a REST web service in Scala using Akka for the CBPMAN log tracking application (a brief sketch follows this list).
  • Worked on developing with front-end technologies such as JavaScript, Angular 2.0+, jQuery, HTML, CSS, JSON, JSP, and Struts 1.0/2.0.
  • Team leader on numerous projects utilizing Java, Java EE, Enterprise JavaBeans, and Apache Struts web applications to create fully integrated client management systems.
  • Deployed Spring Boot-based microservices in Docker and Amazon EC2 containers using Jenkins.
  • Worked with Splunk and ELK stack for creating monitoring and analytics solutions.
  • Developed Microservices using Spring MVC, Spring Boot, and Spring Cloud.
  • Used a microservices architecture, with Spring Boot-based services interacting through REST.
  • Built Spring Boot microservices for the delivery of software products across the enterprise.
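
The Scala/Akka REST service noted above can be illustrated with a minimal sketch. It assumes Akka HTTP 10.2+; the host, port, route, and response payload are illustrative placeholders rather than the actual CBPMAN contract:

    // Hedged sketch: route, port, and payload are hypothetical.
    import akka.actor.ActorSystem
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.server.Directives._
    import scala.io.StdIn

    object LogTrackingService {
      def main(args: Array[String]): Unit = {
        implicit val system: ActorSystem = ActorSystem("log-tracking")
        import system.dispatcher

        // A single GET endpoint that echoes the requested log id in a JSON-formatted string.
        val route =
          path("logs" / Segment) { logId =>
            get {
              complete(s"""{"logId":"$logId","status":"tracked"}""")
            }
          }

        val binding = Http().newServerAt("localhost", 8080).bind(route)
        println("Server online at http://localhost:8080/ - press ENTER to stop")
        StdIn.readLine()
        binding.flatMap(_.unbind()).onComplete(_ => system.terminate())
      }
    }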

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.
