Hadoop Developer/Big Data Developer Resume
Jacksonville, FL
SUMMARY
- Over 8 years of IT experience in the analysis, implementation, and testing of enterprise-wide applications, data warehouses, client-server technologies, and web-based applications.
- Over 6 years of experience developing with Apache Spark for data transformation and processing.
- Over 6 years of experience in administrative tasks such as multi-node Hadoop installation and maintenance.
- Experience in deploying Hadoop 2.0 (YARN) and administering HBase, Hive, Sqoop, HDFS, and MapR.
- Installed, configured, supported, and managed Apache Ambari on Hortonworks Data Platform 2.5 and Cloudera Distribution Hadoop 5.x across Linux, Rackspace, and AWS cloud infrastructure.
- Understand the security requirements for Hadoop and its integration with Kerberos infrastructure.
- Good knowledge of Kerberos security; maintained clusters by adding and removing nodes.
- Hands-on experience in Linux administration activities on RHEL and CentOS.
- Experience in extracting, transforming and loading (ETL) data with Hadoop and Spark
- Experience in minor and major upgrades of Hadoop and the Hadoop ecosystem.
- Monitored Hadoop clusters using tools like Nagios, Ganglia, Ambari, and Cloudera Manager.
- Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
- Involved in benchmarking Hadoop/HBase cluster file systems with various batch jobs and workloads.
- Set up Linux environments: passwordless SSH, file system creation, firewall configuration, and Java installation.
- Set up MySQL master-slave replication and helped applications maintain their data in MySQL servers.
- Experienced in job scheduling using the Fair, Capacity, and FIFO schedulers, and in inter-cluster data copying with the DistCp tool.
- Hands-on experience in analyzing log files for Hadoop ecosystem services and finding root causes.
- Experience in AWS cloud administration; actively involved in building highly available, scalable, cost-effective, and fault-tolerant systems using multiple AWS services.
- Project work involving file transmission and electronic data interchange; trade capture, verification, processing, and routing operations; banking report generation; and operational management.
- Experience working with Hadoop clusters and integration with ecosystem components such as Hive, HBase, Pig, Sqoop, Spark, Oozie, and Flume.
- Experienced in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Good working knowledge of Vertica DB architecture, column orientation and High Availability.
- Performed systems analysis for several information systems, identifying and documenting performance and administrative bottlenecks.
PROFESSIONAL EXPERIENCE
HADOOP DEVELOPER/BIG DATA DEVELOPER
Confidential, Jacksonville, FL
Responsibilities:
- Work on a large-scale team as the sole Hadoop/big data developer supporting data transformations using Spark
- Work with DB2, PostgreSQL, and MySQL databases to source healthcare data for Spark jobs
- Building and maintaining a daily incremental load program for delta data (a sketch of this pattern follows this section)
- Involved in data validation for Spark transformation jobs
- Writing Spark code in Scala, using Gradle 4.6 and the Eclipse Scala IDE to package programs into JARs
- Hands-on work with JSON, Avro, Parquet, and CSV files
- Developed business relationships and integrated with other IT departments to ensure successful implementation and support of project efforts
- Building data pipelines and data marts to move and store customer marketing data
- Developing ETL jobs to extract data for business analysis and customer user experience
- Collaborating with Business System Analysts to receive correct mapping requirements for code
- Collaborating and communicating with QA team to properly test and deploy code
- Communicating with business to transfer business logic needs into actual code
- Provided scope, sizing, and time estimates for completing code, and provided information to the project manager as input to the project plan
- Determined impacts and integration points and participated in capacity planning with project manager
- Created application design specification documents from which code will be written
- Interfaced with external vendors and customers through crosswalk mapping across different, often complex architectures
- Created and modified code for moderately complex system design that may span platforms
- Ensured code complied with architectural and SDLC standards
- Responded to and resolved production support issues with programs
- Developing in an agile methodology with daily scrums and bi-weekly sprints
Environment: Hadoop, Hortonworks, Spark, Hive, HBase, Eclipse Scala IDE (Gradle), SQL, WinSCP, PuTTY, Unix, MySQL, RDBMS
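A minimal sketch of the daily incremental (delta) load pattern described above, in Spark/Scala. The source table, timestamp column, JDBC URL, and HDFS paths are illustrative assumptions, not details from the original project:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_date

object DailyDeltaLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DailyDeltaLoad").getOrCreate()

    // High-water mark from the previous run; in practice this would come
    // from a control table or checkpoint file (passed as an argument here).
    val lastRunTs = args(0) // e.g. "2019-06-01 00:00:00"

    // Pull only rows changed since the last run from the source RDBMS.
    val delta = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/claims") // illustrative
      .option("dbtable",
        s"(SELECT * FROM member_claims WHERE last_updated > '$lastRunTs') AS d")
      .option("user", sys.env("DB_USER"))
      .option("password", sys.env("DB_PASS"))
      .load()

    // Append the day's delta to the Parquet data mart, partitioned by load date.
    delta.withColumn("load_date", current_date())
      .write
      .mode("append")
      .partitionBy("load_date")
      .parquet("hdfs:///data/marts/member_claims")

    spark.stop()
  }
}
```

Tracking the high-water-mark timestamp in a control table rather than a job argument makes reruns idempotent.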
HADOOP DEVELOPER
Confidential, Westport, CT
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Installed the OS and administered the Hadoop stack on CDH5 (with YARN), including configuration management, monitoring, debugging, and performance tuning; scripted Hadoop package installation and configuration to support fully automated deployments.
- Installed, configured, and maintained Hortonworks HDP 2.2 using Ambari and manually through the CLI.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Involved in installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop and Flume.
- Monitoring data streaming between web sources and HDFS through monitoring tools (a streaming-ingestion sketch follows this section).
- Day-to-day operational support of our Cloudera Hadoop clusters in lab and production, at multi-petabyte scale.
- Involved in creating a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
- Setting up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups; excellent working knowledge of SQL and databases.
- Commissioning and decommissioning of DataNodes in the cluster in case of problems.
- Setting up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
- Handled Azure Storage, including Blob Storage and File Storage, and set up Azure CDN and load balancers.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Enabled the processing, management, storage, and analysis of data using a data fabric.
- Leveraged the data with machine learning algorithms.
Environment: Hadoop, Confluent Kafka, Hortonworks HDF, HDP, NiFi, Linux, Redshift, Splunk, YARN, Cloudera 5.13, Spark, Tableau, Microsoft Azure, data fabric, data mesh.
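A minimal sketch of the web-source-to-HDFS streaming ingestion referenced above, using Spark Structured Streaming against the Confluent Kafka listed in this environment. The broker addresses, topic name, and paths are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KafkaToHdfs").getOrCreate()

    // Read a continuous stream from Kafka (brokers and topic are illustrative).
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "web.events")
      .option("startingOffsets", "latest")
      .load()

    // Kafka values arrive as bytes; cast to string before landing on HDFS.
    val events = stream.selectExpr("CAST(value AS STRING) AS event")

    // Land micro-batches as Parquet on HDFS, with a checkpoint for recovery.
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/raw/web_events")
      .option("checkpointLocation", "hdfs:///checkpoints/web_events")
      .start()

    query.awaitTermination()
  }
}
```

The checkpoint location lets the query resume from its last committed offsets after a restart.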
HADOOP DEVELOPER/ADMIN
Confidential, New York, NY
Responsibilities:
- Worked on analyzing Cloudera Hadoop and Hortonworks clusters and different big data analytic tools, including Pig, Hive, and Sqoop
- Installed and configured a CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Setting up machines with network control, static IPs, disabled firewalls, and swap memory.
- Working on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers
- Performance tuning and managing growth of the OS, disk usage, and network traffic
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyze latest Big Data Analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
- Experienced in managing and reviewing Hadoop log files.
- Using Pig built-in functions to convert fixed-width files to delimited files (a Spark equivalent is sketched after this section).
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs
- Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Managed datasets using pandas data frames and MySQL; queried MySQL databases from Python using the Python MySQL connector (MySQLdb) package to retrieve information.
- Used Django configuration to manage URLs and application parameters.
- Created Oozie workflows to run multiple MR, Hive, and Pig jobs.
- Set up Azure Content Delivery Network (CDN), Azure DNS, Load Balancer, and DDoS Protection in the environment.
- Experience in implementing DAG and high availability.
- Experience with several migration tools, such as IdFix, the On-Ramp tool, Microsoft Remote Connectivity Analyzer, Microsoft network bandwidth analyzer, and SCCM.
- Worked on data fabrics covering multiple sources of data: in the cloud, on premises, at the edge, and in other storage locations.
- Designed for secure and reliable access to data irrespective of the storage location by using data fabrics.
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Cloudera Manager, Apache YARN, Python, Machine Learning, Microsoft Azure.
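The fixed-width conversion above was done with Pig built-ins; since this environment also includes Spark and Scala, here is a minimal Spark/Scala sketch of the same transformation. The column layout and paths are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

object FixedWidthToDelimited {
  // Illustrative layout: (fieldName, startOffset, length) for each column.
  val layout = Seq(("id", 0, 8), ("name", 8, 20), ("amount", 28, 10))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("FixedWidthToDelimited").getOrCreate()

    // Slice each fixed-width line into fields, then join with a delimiter.
    val delimited = spark.sparkContext
      .textFile("hdfs:///data/in/fixed_width.txt")
      .map { line =>
        layout.map { case (_, start, len) =>
          line.slice(start, start + len).trim
        }.mkString("|")
      }

    delimited.saveAsTextFile("hdfs:///data/out/delimited")
    spark.stop()
  }
}
```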
HADOOP DEVELOPER/ADMIN
Confidential, McLean, VA
Responsibilities:
- Launched and configured Amazon EC2 cloud instances and S3 buckets using AWS, Ubuntu Linux, and RHEL
- Installed applications on AWS EC2 instances and configured storage on S3 buckets
- Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
- Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Worked closely with the data modelers to model the new incoming data sets.
- Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
- Expertise in designing and deploying Hadoop clusters and different big data analytic tools, including Pig, Hive, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala, and Cassandra, with the Hortonworks distribution.
- Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Data Frames, pair RDDs, and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data; configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Performed transformation, cleaning, and filtering of imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Imported data from different sources such as HDFS and HBase into Spark RDDs (a sketch of this pattern follows this section).
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.
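A minimal sketch of the import-clean-load pattern above: reading raw records from HDFS into a Spark RDD, filtering malformed rows, and writing the cleaned data back to HDFS. The paths and the four-field, comma-delimited record format are illustrative assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CleanAndLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CleanAndLoad"))

    // Load raw comma-delimited records from HDFS into an RDD.
    val raw = sc.textFile("hdfs:///data/raw/transactions")

    // Keep only well-formed rows: correct field count and a numeric amount.
    val cleaned = raw
      .map(_.split(",", -1))
      .filter(f => f.length == 4 && f(3).matches("""-?\d+(\.\d+)?"""))
      .map(_.mkString(","))

    cleaned.saveAsTextFile("hdfs:///data/clean/transactions")
    sc.stop()
  }
}
```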
SDET
Confidential, New York, NY
Responsibilities:
- Responsible for the implementation, ongoing setup, and administration of Hadoop infrastructure
- Analyzed technical and functional requirements documents, and designed and developed the QA test plan, test cases, and test scenarios, maintaining the end-to-end process flow.
- Developed testing scripts for an internal brokerage application utilized by branch and financial market representatives to recommend and manage customer portfolios, including international and capital markets.
- Designed and developed smoke and regression automation scripts, and automated the functional testing framework for all modules using Selenium WebDriver.
- Configured Selenium WebDriver, TestNG, Maven, Cucumber, and a BDD framework, and created Selenium automation scripts in Java using TestNG.
- Performed data-driven testing by developing a Java-based library to read test data from Excel and properties files (a sketch of this pattern follows this section).
- Extensively performed DB2 database testing to validate trade entries from the mainframe to the backend system.
- Developed a data-driven framework with Java, Selenium WebDriver, and Apache POI, used to perform multiple trade order entries.
- Developed an internal application using Angular.js and Node.js connecting to Oracle on the backend.
- Expertise in debugging issues in the front end of web-based applications developed using HTML5, CSS3, AngularJS, Node.js, and Java.
- Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
- Executed system, integration, and regression tests in the testing environment.
- Conducted defect triage meetings and defect root cause analysis; tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects with regression testing.
- Provided QA/UAT sign-off after closely reviewing all test cases in Quality Center, along with receiving policy sign-off for the project.
Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, PuTTY, WinSCP, FTP Server, Notepad++, C#, DB Visualizer.
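A minimal sketch of the data-driven Selenium/TestNG pattern described above, reading trade rows from Excel with Apache POI. The project scripts were written in Java; this sketch is in Scala for consistency with the other examples here, and the page URL, element IDs, and spreadsheet layout are illustrative assumptions:

```scala
import java.io.FileInputStream
import org.apache.poi.xssf.usermodel.XSSFWorkbook
import org.openqa.selenium.{By, WebDriver}
import org.openqa.selenium.chrome.ChromeDriver
import org.testng.Assert
import org.testng.annotations.{AfterClass, BeforeClass, DataProvider, Test}

class TradeEntryTest {
  private var driver: WebDriver = _

  @BeforeClass
  def setUp(): Unit = {
    driver = new ChromeDriver() // assumes chromedriver is on the PATH
  }

  // Reads (symbol, quantity) rows from the first sheet; row 0 is a header.
  @DataProvider(name = "trades")
  def trades(): Array[Array[AnyRef]] = {
    val wb = new XSSFWorkbook(new FileInputStream("testdata/trades.xlsx"))
    val sheet = wb.getSheetAt(0)
    val rows = (1 to sheet.getLastRowNum).map { i =>
      val row = sheet.getRow(i)
      // Both cells are assumed to be stored as text in the sheet.
      Array[AnyRef](row.getCell(0).getStringCellValue,
                    row.getCell(1).getStringCellValue)
    }.toArray
    wb.close()
    rows
  }

  // One trade-entry run per spreadsheet row.
  @Test(dataProvider = "trades")
  def enterTrade(symbol: String, quantity: String): Unit = {
    driver.get("https://test.example.com/trade-entry") // illustrative URL
    driver.findElement(By.id("symbol")).sendKeys(symbol)
    driver.findElement(By.id("quantity")).sendKeys(quantity)
    driver.findElement(By.id("submit")).click()
    Assert.assertTrue(driver.findElement(By.id("status")).getText.contains("Accepted"))
  }

  @AfterClass
  def tearDown(): Unit = driver.quit()
}
```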