- 5+ years of IT industry experience, including 4 years working with Apache Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, ZooKeeper, and HBase, as well as Cassandra, MongoDB, and Amazon Web Services.
- 3+ years of experience in application development and maintenance across the SDLC using Java technologies.
- Good experience working with the Hortonworks, Cloudera, and MapR distributions.
- Strong understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- Developed applications for distributed environments using Hadoop, MapReduce, and Python.
- Experience in data extraction and transformation using MapReduce jobs.
- Proficient in working with Hadoop and HDFS and in writing Pig and Sqoop scripts.
- Performed data analysis using Hive and Pig.
- Expert in creating Pig and Hive UDFs in Java to analyze data efficiently.
- Experience importing and exporting data between relational database systems and HDFS using Sqoop.
- Strong understanding of NoSQL databases such as HBase, MongoDB, and Cassandra.
- Strong understanding of Spark Streaming and Spark SQL, with experience loading data from external sources such as MySQL and Cassandra into Spark applications.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Well versed in job workflow scheduling and monitoring tools such as Oozie.
- Developed MapReduce jobs to automate transfer of data from HBase.
- Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
- Loaded streaming log data from various webservers into HDFS using Flume.
- Experience in using Sqoop, Oozie and Cloudera Manager.
- Hands-on experience in application development using RDBMS and Linux shell scripting.
- Experience working with Amazon EMR and EC2 Spot Instances.
- Experience integrating Hadoop with Ganglia, with a good understanding of Hadoop metrics and their visualization in Ganglia.
- Support development, testing, and operations teams during new system deployments.
- Solid understanding of relational database concepts.
- Very good experience with the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Hands-on experience using Tableau to generate reports on Hadoop data.
- Good team player who works efficiently across multiple teams and products and adapts easily to new systems and environments.
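The MapReduce-with-Python work summarized above typically follows the Hadoop Streaming map/shuffle/reduce pattern; a minimal word-count sketch of that pattern (function names are illustrative, and the shuffle is simulated locally rather than by a real cluster):

```python
from collections import Counter

def map_lines(lines):
    """Mapper step: tokenize each line and emit (word, 1) pairs."""
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reduce_pairs(pairs):
    """Reducer step: sum the counts for each key.
    On a real cluster, Hadoop's shuffle delivers pairs grouped by key."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Local simulation of map -> shuffle -> reduce on a toy input.
result = reduce_pairs(map_lines(["Hadoop HDFS Hadoop", "Spark hdfs"]))
# result == {"hadoop": 2, "hdfs": 2, "spark": 1}
```

On a cluster the two functions would run as separate streaming mapper and reducer scripts reading stdin; the logic is unchanged.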
Programming languages: C, C++, Java, Python, Scala, R
HADOOP/BIG DATA: MapReduce, Spark, Spark SQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, ZooKeeper, Elasticsearch
Databases: MySQL, PL/SQL, MongoDB, HBase, Cassandra.
Operating Systems: Windows, Unix, Linux, Ubuntu.
Reporting Tools: Tableau
Web/Application Servers: Apache Tomcat, Sun Java Application Server
IDE Tools: IntelliJ, Eclipse, NetBeans
Version Control: Git, SVN
Cloud Services: Amazon Web Services
Monitoring Tools: Nagios, Ganglia
Build Tools: Maven
Confidential, Santa Clara, CA
- Implemented workflows to process around 400 messages per second and push them to DocumentDB and Event Hubs.
- Developed a custom message producer which can produce about 4000 messages per second for scalability testing.
- Implemented call-back architecture and notification architecture for real time data.
- Implemented Spark Streaming in Scala to process JSON messages and push them to the Kafka topic.
- Created custom dashboards using Application Insights and the Application Insights query language to process metrics sent to Application Insights in Azure.
- Created real-time streaming dashboards in Power BI, using Stream Analytics to push datasets to Power BI.
- Developed a custom message consumer to consume data from the Kafka producer and push the messages to Service Bus and Event Hub (Azure components).
- Wrote auto-scaling functions that consume data from Azure Service Bus or Azure Event Hub and send it to DocumentDB.
- Wrote a Spark application to capture the change feed from DocumentDB using the Java API and write the updates to a new DocumentDB.
- Implemented zero-downtime deployment for the entire production pipeline in Azure.
- Implemented CI/CD pipelines to build and deploy projects in the Hadoop environment.
- Experienced in implementing the pipelines in Jenkins.
- Used custom receivers, socket streams, file streams, and directory streams in Spark Streaming.
- Used Application Insights, DocumentDB, Service Bus, Azure Data Lake Store, Azure Blob Storage, Event Hub, and Azure Functions.
- Used Python to run Ansible playbooks that deploy the Logic Apps to Azure.
Environment: Hadoop, Hive, HDFS, Azure, AWS, Spark Streaming, Spark SQL, Scala, Python, Java, web servers, Maven, Jenkins, Ansible.
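The custom producer used for scalability testing above can be sketched as a rate-limited loop; this is an illustrative Python version with a stub `send` callable standing in for a real Kafka or Event Hub client, and an invented payload shape:

```python
import json
import time

def produce(send, rate_per_sec, total, payload=None):
    """Push `total` JSON messages through `send`, throttled to `rate_per_sec`.

    `send` is a stand-in for a real client call (e.g. a Kafka producer's
    send); the payload fields here are illustrative only.
    """
    interval = 1.0 / rate_per_sec
    sent = 0
    next_tick = time.monotonic()
    while sent < total:
        msg = json.dumps({"seq": sent, "body": payload or "test"})
        send(msg)
        sent += 1
        # Sleep just long enough to hold the target rate.
        next_tick += interval
        sleep_for = next_tick - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)
    return sent

# Usage: collect messages in a list instead of sending to a broker.
out = []
produce(out.append, rate_per_sec=4000, total=100)
```

Swapping `out.append` for a real producer's send method turns the same loop into a load generator at the target rate.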
Confidential, San Francisco
- Collaborating with our CD team to design, deploy, manage and operate scalable, highly available, and fault tolerant systems on AWS.
- Ensure data integrity and data security within our production system.
- Develop and deliver within our Continuous Delivery Framework.
- Shifting legacy applications to AWS.
- Handle billions of log lines from several clients and analyze them using big data technologies such as Hadoop (HDFS), Apache Kafka, and Apache Storm.
- Continuously improve code to handle more events coming into the cluster.
- Scale the cluster to handle sudden spikes in incoming logs.
- Monitor the entire cluster in Ambari and troubleshoot the Storm supervisors, Kafka brokers, and ZooKeeper.
- Query huge data sets for event generation in the columnar analytics database Vertica.
- Query feeds/regexes in MS SQL for the URL module in the cluster.
- Implement log metrics using the ELK stack (Elasticsearch, Logstash, Kibana) and visualize them in Grafana and Kibana dashboards.
- Use of YAML templating to send the metrics through Filebeat.
- Migrate the existing architecture to Amazon Web Services, utilizing technologies such as Kinesis, Redshift, AWS Lambda, and CloudWatch metrics.
- Query alerts arriving in S3 buckets with Amazon Athena to compare alert generation between the Kafka and Kinesis clusters.
- Extensive use of Python with the boto library for managing services in AWS.
- Use infrastructure-as-code tools such as Terraform to spin up the clusters.
- Use Terraform to set up security groups and CloudWatch metrics in AWS.
- Aggressive unit testing of Java code using JUnit and Mockito.
Environment: AWS, MongoDB, HDFS, Sqoop, Oozie, Maven, IntelliJ, Git, UNIX shell scripting, Linux, Agile development
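Much of the boto-based AWS management mentioned above amounts to filtering API responses; a minimal sketch of that pattern, where the sample dict mirrors the shape of EC2's `describe_instances` response and the live boto3 call appears only in a comment:

```python
def running_instance_ids(response):
    """Extract the IDs of running instances from a describe_instances-shaped dict.

    In production the dict would come from a real call, e.g.:
        import boto3
        response = boto3.client("ec2").describe_instances()
    """
    ids = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            if inst.get("State", {}).get("Name") == "running":
                ids.append(inst["InstanceId"])
    return ids

# Toy response in the same shape as the EC2 API (IDs are made up).
sample = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-aaa", "State": {"Name": "running"}},
            {"InstanceId": "i-bbb", "State": {"Name": "stopped"}},
        ]}
    ]
}
```

The same filter-the-response shape applies to most boto service calls (spot requests, CloudWatch metrics, and so on).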
Confidential, Stamford, CT
- Responsible for gathering all required information and requirements for the project.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Streamed data in real time using Spark with Kafka.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Involved in loading data from LINUX file system to HDFS.
- Importing and exporting data into HDFS using Sqoop and Kafka.
- Experience working on processing unstructured data using Spark and Hive.
- Involved in writing custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Created Hive tables and implemented Partitioning, Dynamic Partitions, Buckets on the tables.
- Supported MapReduce programs running on the cluster.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in writing code with Scala which has support for functional programming.
- Scheduled the Oozie workflow engine to run multiple Pig jobs.
- Automated all jobs pulling data from the FTP server into Hive tables using Oozie workflows.
- Used HCatalog to access Hive table metadata from MapReduce and Pig code.
- Monitored and scheduled the UNIX scripting jobs.
- Gained knowledge in NoSQL database with Cassandra and MongoDB.
- Experience in Agile programming and completing tasks to meet deadlines.
- Exported the result set from Hive to MySQL using Shell scripts.
- Actively involved in code review and bug fixing for improving the performance.
Environment: Hadoop Cloudera Distribution (CDH4), Java 7, Hadoop 2.5.2, Spark, Spark SQL, MLlib, R, Scala, Cassandra, IoT, MapReduce, Apache Pig 0.14.0, Apache Hive 1.0.0, HDFS, Sqoop, Oozie, Kafka, Maven, Eclipse, Nagios, Ganglia, ZooKeeper, Git, UNIX shell scripting, Oracle 11g/10g, Linux, Agile development.
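The bucketed Hive tables mentioned above assign each row to a bucket by hashing the clustering column modulo the bucket count; a small conceptual sketch of that assignment scheme (Hive's real hash function differs, so the stable string hash here is a stand-in chosen only for repeatability):

```python
def bucket_of(key, num_buckets):
    """Assign a row to a bucket, conceptually as bucketed Hive tables do:
    hash the clustering column value, then take it modulo the bucket count.
    A simple stable hash replaces Hive's internal one for illustration."""
    h = sum(ord(c) for c in key) if isinstance(key, str) else key
    return h % num_buckets

# Rows with the same key always land in the same bucket file,
# which is what makes bucketed joins and sampling efficient.
rows = ["user1", "user2", "user3"]
buckets = {r: bucket_of(r, 4) for r in rows}
```

Partitioning, by contrast, routes rows into directories by the literal column value; bucketing subdivides within a partition by hash.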
- Involved in Analysis, Design, Implementation and Bug Fixing Activities.
- Designed the initial Web/WAP pages for a better UI, as per the requirements.
- Involved in reviewing the functional and technical specification documents as well as the code.
- Undergone training on the Domain Knowledge.
- Involved in design of basic Class Diagrams, Sequence Diagrams and Event Diagrams as a part of Documentation.
- Held discussions and meetings with the business analysts to understand the functionality involved in test case reviews.
- Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
- Prepared the Support Guide containing the complete functionality.
Environment: Core Java, Apache Tomcat 5.1, Oracle 9i, JavaScript, HTML, PL/SQL, Rational Rose, Windows XP, UNIX.
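The SQL and PL/SQL query work above centered on parameterized statements; a self-contained illustration of the same pattern, using Python's sqlite3 as a stand-in for the project's Oracle stack (table and column names are invented):

```python
import sqlite3

# In-memory database stands in for the project's Oracle schemas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Parameterized insert: placeholders keep values out of the SQL string,
# mirroring bind variables in PL/SQL.
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)
conn.commit()

# Parameterized retrieval.
row = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone()
```

Binding values rather than concatenating them gives the driver a chance to cache the statement plan and rules out SQL injection, which is the same reason stored procedures take typed parameters.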