
Hadoop Developer/Big Data Developer Resume


Jacksonville, FL

SUMMARY

  • Over 8 years of IT experience in analysis, implementation and testing of enterprise-wide applications, data warehouses, client-server technologies and web-based applications.
  • Over 6 years of experience developing with Apache Spark for data transformations and processing
  • Over 6 years of experience in administrative tasks such as multi-node Hadoop installation and maintenance
  • Experience in deploying Hadoop 2.0 (YARN) and administering HBase, Hive, Sqoop, HDFS and MapR
  • Installed, configured, supported and managed Apache Ambari on Hortonworks Data Platform 2.5 and Cloudera Distribution Hadoop 5.x across Linux, Rackspace and AWS cloud infrastructure.
  • Understand the security requirements for Hadoop and integrated it with Kerberos infrastructure
  • Good knowledge of Kerberos security; successfully maintained the cluster by adding and removing nodes.
  • Hands-on experience in Linux administration activities on RHEL and CentOS.
  • Experience in extracting, transforming and loading (ETL) data with Hadoop and Spark
  • Experience in minor and major upgrades of Hadoop and the Hadoop ecosystem.
  • Monitored Hadoop clusters using tools such as Nagios, Ganglia, Ambari and Cloudera Manager.
  • Hadoop cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
  • Involved in benchmarking Hadoop/HBase cluster file systems with various batch jobs and workloads.
  • Set up Linux environments: passwordless SSH, creating file systems, disabling firewalls and installing Java.
  • Set up MySQL master-slave replication and helped applications maintain their data in MySQL servers.
  • Experienced in job scheduling using different schedulers such as Fair, Capacity and FIFO, and in inter-cluster data copying using the DistCp tool.
  • Hands-on experience in analyzing log files for Hadoop ecosystem services and finding root causes.
  • Experience in AWS cloud administration; actively involved in building highly available, scalable, cost-effective and fault-tolerant systems using multiple AWS services.
  • Worked on projects involving file transmission and electronic data interchange; trade capture, verification, processing and routing operations; banking report generation; and operational management.
  • Experience in dealing with Hadoop clusters and integration with ecosystem components such as Hive, HBase, Pig, Sqoop, Spark, Oozie, Flume, etc.
  • Experienced in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
  • Good working knowledge of Vertica DB architecture, column orientation and high availability.
  • Performed systems analysis for several information systems, documenting and identifying performance and administrative bottlenecks.

PROFESSIONAL EXPERIENCE

HADOOP DEVELOPER/BIG DATA DEVELOPER

Confidential, Jacksonville, FL

Responsibilities:

  • Work on a large-scale team as the sole Hadoop/big data developer supporting data transformations using Spark
  • Work with DB2, PostgreSQL and MSQL databases to source healthcare data for Spark jobs
  • Building and maintaining a daily incremental load program for delta data (see the sketch after this list)
  • Involved in data validation for Spark transformation jobs
  • Writing Spark code in Scala, using Gradle 4.6 and the Eclipse Scala IDE to package programs into JARs
  • Hands-on work with JSON, Avro, Parquet and CSV files
  • Developed business relationships and integrated with other IT departments to ensure successful implementation and support of project efforts
  • Building data pipelines and data marts to move and store customer marketing data
  • Developing ETL jobs to extract data for business analysis and customer user experience
  • Collaborating with Business System Analysts to receive correct mapping requirements for code
  • Collaborating and communicating with QA team to properly test and deploy code
  • Communicating with business to transfer business logic needs into actual code
  • Provided scope, sizing and time estimates for completing code, and provided information to the project manager as input to the project plan
  • Determined impacts and integration points and participated in capacity planning with project manager
  • Created application design specification documents from which code will be written
  • Interfaced with external vendors and customers through crosswalk mapping across different and often complex architectures
  • Created and modified code for moderately complex system designs that may span platforms
  • Ensured code complies with architectural and SDLC standards
  • Responded to and resolved production support issues with programs
  • Developing in an agile methodology with daily scrums and bi-weekly sprints
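
The daily incremental load mentioned above followed the usual delta-load pattern: pull only rows changed since the last successful run and append them to the target dataset. A minimal Spark/Scala sketch of that idea follows; the JDBC URL, table, column names and watermark handling are hypothetical, not the actual program.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, current_date}

    object DailyDeltaLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("DailyDeltaLoad").getOrCreate()

        // Watermark of the last successful load; in practice this would be read
        // from a control table or file rather than hard-coded (hypothetical value)
        val lastRunTs = "2018-06-01 00:00:00"

        // Pull only rows changed since the last run from the source database
        // (hypothetical JDBC URL, table and column names)
        val delta = spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://dbhost:5432/claims")
          .option("dbtable", "member_claims")
          .option("user", "etl_user")
          .option("password", sys.env("DB_PASSWORD"))
          .load()
          .filter(col("updated_ts") > lastRunTs)

        // Append the delta to the partitioned target dataset on HDFS
        delta.withColumn("load_date", current_date())
          .write.mode("append")
          .partitionBy("load_date")
          .parquet("hdfs:///data/marts/member_claims")

        spark.stop()
      }
    }

Partitioning by load date keeps each day's delta in its own directory, which makes validating or reprocessing a single day's load straightforward.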

Environment: Hadoop, Hortonworks, Spark, Hive, HBase, Eclipse Scala IDE (Gradle), SQL, WinSCP, PuTTY, Unix, MSQL, RDBMS

HADOOP DEVELOPER

Confidential, Westport, CT

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed the OS and administered the Hadoop stack on the Cloudera CDH5 distribution (with YARN), including configuration management, monitoring, debugging and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
  • Installed, configured and maintained Hortonworks HDP 2.2 using Ambari and manually through the CLI.
  • Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
  • Involved in installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.
  • Monitored data streaming between web sources and HDFS and verified functioning through monitoring tools.
  • Provided day-to-day operational support of our Cloudera Hadoop clusters in lab and production, at multi-petabyte scale.
  • Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
  • Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (see the sketch after this list); excellent working knowledge of SQL and databases.
  • Commissioned and decommissioned data nodes from the cluster in case of problems.
  • Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
  • Held regular discussions with other technical teams regarding upgrades, process changes, special processing and feedback.
  • Handled Azure Storage services such as Blob Storage and File Storage, and set up Azure CDN and load balancers.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Enabled the processing, management, storage and analysis of data using a data fabric.
  • Leveraged the data and applied machine learning algorithms.
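
The automated log analysis mentioned above can be reduced to a simple pattern scan over service logs collected on HDFS. A minimal Spark/Scala sketch is below; the paths and error signatures are hypothetical, and the actual alerting step (notifying the appropriate group) is left to a downstream job.

    import org.apache.spark.sql.SparkSession

    object LogErrorScan {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("LogErrorScan").getOrCreate()

        // Predefined error signatures to search for (hypothetical list)
        val patterns = Seq("OutOfMemoryError", "Connection refused", "Corrupt block")

        // The day's Hadoop service logs staged on HDFS (hypothetical path)
        val logs = spark.read.textFile("hdfs:///logs/hadoop-services/2019-01-15/*")

        // Keep only lines matching one of the predefined errors
        val hits = logs.filter(line => patterns.exists(line.contains))

        // Write the matches out for the alerting job to pick up
        hits.write.mode("overwrite").text("hdfs:///logs/alerts/2019-01-15")

        spark.stop()
      }
    }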

Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Redshift, Splunk, YARN, Cloudera 5.13, Spark, Tableau, Microsoft Azure, Data Fabric, Data Mesh.

HADOOP DEVELOPER/ADMIN

Confidential, New York, NY

Responsibilities:

  • Worked on analyzing Cloudera and Hortonworks Hadoop clusters and different big data analytic tools including Pig, Hive and Sqoop
  • Installed and configured a CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
  • Installed/configured/maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Set up machines with network control, static IPs, disabled firewalls and swap memory.
  • Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers
  • Performance-tuned and managed growth of the OS, disk usage and network traffic
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed the latest big data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Experienced in managing and reviewing Hadoop log files.
  • Used Pig built-in functions to convert fixed-width files to delimited files (an equivalent Spark/Scala sketch follows this list).
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Managed datasets using pandas DataFrames and MySQL; queried the MySQL database from Python using the MySQL connector (MySQLdb) package to retrieve information.
  • Used Django configuration to manage URLs and application parameters.
  • Created Oozie workflows to run multiple MR, Hive and Pig jobs.
  • Set up Azure Content Delivery Network (CDN), Azure DNS, load balancer and DDoS Protection in the environment.
  • Experience in implementation of DAG and high availability.
  • Experience with several tools which help in migration, such as IdFix, the On-Ramp tool, Microsoft Remote Connectivity Analyzer, Microsoft network bandwidth analyzer, SCCM, etc.
  • Worked on data fabrics covering multiple sources of data in the cloud, on-premises, at the edge and in other storage locations.
  • Designed to maintain security and reliable access to data irrespective of the storage location by using data fabrics.
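
The fixed-width-to-delimited conversion above was done with Pig built-in functions; the sketch below expresses the same idea in Spark/Scala, since that is the language used elsewhere in this resume. The file layout, column positions and paths are hypothetical.

    import org.apache.spark.sql.SparkSession

    object FixedWidthToDelimited {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("FixedWidthToDelimited").getOrCreate()
        import spark.implicits._

        // Hypothetical layout: chars 0-9 account id, 10-29 name, 30-39 balance
        val fixed = spark.read.textFile("hdfs:///data/incoming/accounts_fixed.txt")

        val delimited = fixed.map { line =>
          val acct    = line.substring(0, 10).trim
          val name    = line.substring(10, 30).trim
          val balance = line.substring(30, 40).trim
          Seq(acct, name, balance).mkString("|")   // pipe-delimited output row
        }

        delimited.write.mode("overwrite").text("hdfs:///data/staging/accounts_delimited")
        spark.stop()
      }
    }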

Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala, Hortonworks, Cloudera Manager, Apache Yarn, Python, Machine Learning, Microsoft Azure.

HADOOP DEVELOPER/ADMIN

Confidential, McLean, VA

Responsibilities:

  • Launched and configured Amazon EC2 Cloud Instances and S3 buckets using AWS, Ubuntu Linux and RHEL
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling of a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different big data analytic tools including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
  • Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Imported data from different sources such as HDFS/HBase into Spark RDDs.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Performed real time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the sketch after this list).
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
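
The Kafka-to-HDFS pipeline and the Spark Streaming micro-batching described above can be combined in one small job: each batch interval, the stream of Kafka records is turned into an RDD and persisted to HDFS. A minimal Scala sketch follows; the broker address, topic name, batch interval and output path are hypothetical.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHdfs")
        // 30-second micro-batches: Spark Streaming divides the stream into batches
        val ssc = new StreamingContext(conf, Seconds(30))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",            // hypothetical broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "hdfs-loader",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist each micro-batch of record values to HDFS as text files
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty())
            rdd.saveAsTextFile(s"hdfs:///data/raw/events/batch_${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }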

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.

SDET

Confidential, New York, NY

Responsibilities:

  • Responsible for implementation, ongoing setup and administration of Hadoop infrastructure
  • Analyzed technical and functional requirements documents, and designed and developed QA test plans, test cases and test scenarios maintaining the end-to-end process flow.
  • Developed testing scripts for an internal brokerage application used by branch and financial market representatives to recommend and manage customer portfolios, including international and capital markets.
  • Designed and developed smoke and regression automation scripts and an automated functional testing framework for all modules using Selenium WebDriver.
  • Configured Selenium WebDriver, TestNG, Maven, Cucumber and a BDD framework, and created Selenium automation scripts in Java using TestNG.
  • Performed data-driven testing by developing a Java-based library to read test data from Excel and properties files.
  • Extensively performed DB2 database testing to validate the trade entry from mainframe to backend system.
  • Developed a data-driven framework with Java, Selenium WebDriver and Apache POI, used to perform multiple trade order entries (see the sketch after this list).
  • Developed an internal application using AngularJS and Node.js connecting to Oracle on the backend.
  • Expertise in debugging issues in the front end of web-based applications developed using HTML5, CSS3, AngularJS, Node.js and Java.
  • Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
  • Executed system, integration and regression tests in the testing environment.
  • Conducted defect triage meetings and defect root cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects with regression testing.
  • Provided QA/UAT sign-off after closely reviewing all test cases in Quality Center, along with receiving policy sign-off for the project.
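
The data-driven trade order entry framework above was written in Java; the sketch below shows the same idea (test rows read from Excel with Apache POI, each row driving Selenium WebDriver through the order-entry form) in Scala, to keep one language across the examples in this document. The workbook path, column layout and page element IDs are hypothetical, and cells are assumed to be stored as text.

    import java.io.FileInputStream
    import org.apache.poi.ss.usermodel.WorkbookFactory
    import org.openqa.selenium.By
    import org.openqa.selenium.chrome.ChromeDriver

    object TradeEntryDataDriven {
      def main(args: Array[String]): Unit = {
        // Test data rows from an Excel sheet (hypothetical file and columns)
        val in = new FileInputStream("testdata/trade_orders.xlsx")
        val sheet = WorkbookFactory.create(in).getSheetAt(0)

        val driver = new ChromeDriver()
        try {
          for (r <- 1 to sheet.getLastRowNum) {             // row 0 is the header
            val row = sheet.getRow(r)
            val symbol = row.getCell(0).getStringCellValue  // cells assumed text
            val qty    = row.getCell(1).getStringCellValue

            // Drive the (hypothetical) order-entry page once per data row
            driver.get("https://example.internal/order-entry")
            driver.findElement(By.id("symbol")).sendKeys(symbol)
            driver.findElement(By.id("quantity")).sendKeys(qty)
            driver.findElement(By.id("submit")).click()
          }
        } finally {
          driver.quit()
          in.close()
        }
      }
    }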

Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, PuTTY, WinSCP, FTP Server, Notepad++, C#, DbVisualizer.
