Hadoop Developer Resume Santa Clara, CA - Hire IT People

PROFESSIONAL SUMMARY:

5+ years of IT industry experience with 4 years of experience in dealing with Apache Hadoop components like HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, Zookeeper, HBase, Cassandra, MongoDB and Amazon Web Services.
3+ years of experience in the Application Development and Maintenance of SDLC projects using Java technologies.
Good experience working with the Hortonworks, Cloudera and MapR distributions.
Very good understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Namenode, and MapReduce concepts.
Developed applications for distributed environments using Hadoop, MapReduce and Python.
Experience in data extraction and transformation using MapReduce jobs.
Proficient in working with Hadoop and HDFS, and in writing Pig scripts and Sqoop scripts.
Performed data analysis using Hive and Pig.
Expert in creating Pig and Hive UDFs using Java in order to analyze the data efficiently.
Experience in importing and exporting data using Sqoop from relational database systems to HDFS and vice versa.
Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
Strong understanding of Spark real-time streaming and Spark SQL, and experience in loading data from external data sources like MySQL and Cassandra for Spark applications.
Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
Well versed in job workflow scheduling and monitoring tools like Oozie.
Developed MapReduce jobs to automate transfer of data from HBase.
Practical knowledge of integrating Kafka with third-party systems such as Spark and Hadoop.
Loaded streaming log data from various webservers into HDFS using Flume.
Experience in using Sqoop, Oozie and Cloudera Manager.
Hands-on experience in application development using RDBMS and Linux shell scripting.
Experience working with Amazon EMR and EC2 Spot Instances.
Experience in integrating Hadoop with Ganglia, with a good understanding of Hadoop metrics and their visualization in Ganglia.
Supported development, testing, and operations teams during new system deployments.
Solid understanding of relational database concepts.
Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
Good knowledge of Hadoop cluster architecture and cluster monitoring.
Hands-on experience using Tableau to generate reports on Hadoop data.
Good team player who works efficiently across multiple teams and products. Adapts easily to new systems and environments.

TECHNICAL SKILLS:

Programming languages: C, C++, Java, Python, Scala, R

HADOOP/BIG DATA: MapReduce, Spark, SparkSQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, Zookeeper, Elasticsearch

Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.

Operating Systems: Windows, Unix, Linux, Ubuntu.

Web Development: HTML, JSP, JavaScript, jQuery, CSS, XML, AJAX.

Reporting Tools: Tableau

Web/Application Servers: Apache Tomcat, Sun Java Application Server

IDE Tools: IntelliJ, Eclipse, NetBeans

Scripting: Bash, JavaScript

Version Controls: Git, SVN

Cloud Services: Amazon Web Services

Monitoring Tools: Nagios, Ganglia

Build Tools: Maven

PROFESSIONAL EXPERIENCE:

Confidential, Santa Clara, CA

Hadoop Developer

Responsibilities:

Implemented workflows to process around 400 messages per second and push the messages to the DocumentDB as well as Event Hubs.
Developed a custom message producer which can produce about 4000 messages per second for scalability testing.
Implemented call-back architecture and notification architecture for real time data.
Implemented Spark Streaming in Scala to process the JSON messages and push them to the Kafka topic.
Created custom dashboards in Azure using Application Insights and the Application Insights query language to process the metrics sent to Application Insights.
Created real-time streaming dashboards in Power BI, using Stream Analytics to push datasets to Power BI.
Developed a custom message consumer to consume the data from the Kafka producer and push the messages to Service Bus and Event Hubs (Azure components).
Wrote auto-scaling functions that consume data from Azure Service Bus or Azure Event Hubs and send it to DocumentDB.
Wrote a Spark application to capture the change feed from DocumentDB using the Java API and write the updates to a new DocumentDB.
Implemented Zero Down Time deployment for the entire production pipelines in Azure.
Implemented CI/CD pipelines to build and deploy the projects in the Hadoop environment.
Implemented the pipelines in Jenkins.
Used custom receivers, socket streams, file streams and directory streams in Spark Streaming.
Used Application Insights, DocumentDB, Service Bus, Azure Data Lake Store, Azure Blob Storage, Event Hubs and Azure Functions.
Used Python to run the Ansible playbook that deploys the Logic Apps to Azure.

Environment: Hadoop, Hive, HDFS, Azure, AWS, Spark Streaming, Spark SQL, Scala, Python, Java, web servers, Maven, Jenkins, Ansible.

Confidential, San Francisco

Hadoop Developer

Responsibilities:

Collaborated with the CD team to design, deploy, manage and operate scalable, highly available, and fault-tolerant systems on AWS.
Ensured data integrity and data security within the production system.
Developed and delivered within the Continuous Delivery framework.
Migrated legacy applications to AWS.
Handled billions of log lines coming from several clients and analyzed them using big data technologies like Hadoop (HDFS), Apache Kafka and Apache Storm.
Continuously improved the code to handle more events coming into the cluster.
Scaled the cluster to handle sudden spikes in incoming logs.
Monitored the entire cluster in Ambari and troubleshot the Storm supervisors, Kafka brokers and ZooKeeper.
Queried large data sets for event generation in Vertica, a columnar analytics database.
Queried feeds and regexes in MS SQL for the URL module in the cluster.
Implemented log metrics using log management tools such as Elasticsearch, Logstash and Kibana (ELK stack) and visualized them in dashboards like Grafana and Kibana.
Used YAML templating to send the metrics through Filebeat.
Migrated the existing architecture to Amazon Web Services, using technologies like Kinesis, Redshift, AWS Lambda and CloudWatch metrics.
Queried the alerts stored in S3 buckets with Amazon Athena to find differences in alert generation between the Kafka cluster and the Kinesis cluster.
Made extensive use of Python with the boto library for managing services in AWS.
Used cloud orchestration technologies like Terraform to spin up the clusters.
Used Terraform to set up security groups and CloudWatch metrics in AWS.
Unit tested Java code extensively using JUnit and Mockito.

Environment: AWS, MongoDB, HDFS, Sqoop, Oozie, Maven, IntelliJ, Git, UNIX shell scripting, Linux, Agile development

Confidential, Stamford, CT

Hadoop Developer

Responsibilities:

Responsible for gathering all required information and requirements for the project.
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
Streamed data in real time using Spark with Kafka.
Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
Worked on debugging and performance tuning of Hive and Pig jobs.
Involved in loading data from the Linux file system to HDFS.
Imported and exported data to and from HDFS using Sqoop and Kafka.
Processed unstructured data using Spark and Hive.
Involved in writing custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
Created Hive tables and implemented Partitioning, Dynamic Partitions, Buckets on the tables.
Supported MapReduce programs running on the cluster.
Gained experience in managing and reviewing Hadoop log files.
Involved in writing code with Scala which has support for functional programming.
Involved in scheduling the Oozie workflow engine to run multiple Pig jobs.
Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
Monitored and scheduled the UNIX scripting jobs.
Gained knowledge of NoSQL databases with Cassandra and MongoDB.
Experience in Agile development and accomplishing tasks to meet deadlines.
Exported the result set from Hive to MySQL using Shell scripts.
Actively involved in code review and bug fixing for improving the performance.

Environment: Hadoop Cloudera Distribution (CDH4), Java 7, Hadoop 2.5.2, Spark, SparkSQL, MLlib, R programming, Scala, Cassandra, IoT, MapReduce, Apache Pig 0.14.0, Apache Hive 1.0.0, HDFS, Sqoop, Oozie, Kafka, Maven, Eclipse, Nagios, Ganglia, Zookeeper, Git, UNIX shell scripting, Oracle 11g/10g, Linux, Agile development.

Confidential

Java Developer

Responsibilities:

Involved in analysis, design, implementation and bug fixing activities.
Designed the initial Web-WAP pages for a better UI as per the requirements.
Involved in reviewing functional and technical specification documents as well as in code review.
Underwent training on the domain knowledge.
Involved in the design of basic class diagrams, sequence diagrams and event diagrams as part of the documentation.
Held discussions and meetings with the business analysts to understand the functionality involved in test case review.
Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
Prepared the Support Guide containing the complete functionality.

Environment: Core Java, Apache Tomcat 5.1, Oracle 9i, JavaScript, HTML, PL/SQL, Rational Rose, Windows XP, UNIX.
