- Overall 4+ years of professional IT experience in software development, including 2 years of Java/J2EE back-end application development and 2+ years of ingestion, storage, querying, processing, and analysis of Big Data using Hadoop technologies and solutions.
- Over 1 year of experience in Spark SQL and Spark Streaming.
- Deep understanding of Hadoop architecture and its components, including HDFS (NameNode, Secondary NameNode, DataNode), YARN (ResourceManager, NodeManager), and MapReduce concepts.
- Experience in installation, configuration, management, and deployment of Hadoop clusters, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, HBase, ZooKeeper, Kafka, and Spark.
- Hands-on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem, including Hadoop 2.x, YARN, Hive, Pig, MapReduce, HBase, Flume, Sqoop, Kafka, Spark, Oozie, Cassandra, and ZooKeeper.
- Experience in creating SparkContext, SQLContext, and StreamingContext instances to process large datasets.
- Experience in performing SQL and Hive operations using Spark SQL.
- Performed real time analytics on streaming data using Spark Streaming.
- Created Kafka topics and distributed data to different consumer applications.
- Expertise in writing MapReduce programs on Apache Hadoop to process Big Data.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience with NoSQL databases such as HBase and Cassandra for extracting and storing large volumes of data.
- Knowledge of job workflow scheduling with Oozie and of distributed coordination with ZooKeeper.
- Experience with databases like MySQL and SQL Server.
- Knowledge of multiple operating systems, including Linux and Windows, and of Unix shell scripting.
- Strong programming skills in Core Java, J2EE, Scala and Python technologies.
- Extensive use of open-source software such as the Apache Tomcat web/application server and the Eclipse IDE.
- Expert-level skills in Java multithreading, object-oriented design patterns, exception handling, Servlets, garbage collection, JSP, HTML, Struts, Hibernate, Enterprise JavaBeans, JDBC, and XML-related technologies.
- Have the motivation to take independent responsibility as well as ability to contribute and be a productive team member.
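The Spark Streaming experience summarized above follows a micro-batch, windowed-aggregation pattern. As an illustrative sketch only (not code from the projects below, and all names are made up), the windowed-count idea behind `DStream.countByWindow` can be modeled in plain Python:

```python
from collections import deque

def windowed_counts(batches, window_size):
    """Yield the event count over the last `window_size` micro-batches.
    Plain-Python stand-in for Spark Streaming's windowed count."""
    window = deque(maxlen=window_size)  # old batches fall off automatically
    for batch in batches:
        window.append(len(batch))
        yield sum(window)

# Three micro-batches of events; window spans the last two batches.
batches = [["a", "b"], ["c"], ["d", "e", "f"]]
counts = list(windowed_counts(batches, window_size=2))
# counts == [2, 3, 4]  (2; 2+1; 1+3)
```

In real Spark Streaming the same effect comes from a StreamingContext with a batch interval plus a window duration; this sketch only shows the sliding-window arithmetic.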
Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Flume, Pig, Sqoop, HBase, Spark, Kafka, Oozie, ZooKeeper, Cassandra
Hadoop Distributions: Cloudera CDH
Containerization: Docker
Languages: Java, Scala, Python, SQL, HiveQL, Spark SQL
Databases: MySQL, NoSQL (HBase, Cassandra)
Confidential, Fremont, CA
Big Data Engineer
- Developed data processing pipelines
- Implemented various data importing and exporting jobs into HDFS and Hive using Sqoop
- Created analytics and reports from data using HiveQL
- Transformed data and created RDDs and DataFrames using Spark
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data
- Created Hive tables, including partitioned tables, loaded data into them, and wrote Hive queries (which run internally as MapReduce jobs) to analyze the data
- Imported and exported data between HDFS and Hive using Sqoop
- Processed unstructured data using Hive
Environment: Cloudera, Hive, MapReduce, Sqoop, Spark, Scala, Python
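The bullet above about converting Hive/SQL queries into Spark transformations amounts to rewriting a GROUP BY as a map plus reduceByKey. As an illustrative sketch only (the table and column names are invented), the core of that lineage can be modeled in plain Python:

```python
# Hive query being converted:
#   SELECT page, SUM(hits) FROM logs GROUP BY page
# In Spark this becomes roughly:
#   rdd.map(lambda r: (r.page, r.hits)).reduceByKey(lambda a, b: a + b)
# Plain-Python model of that reduceByKey step:

rows = [("home", 3), ("about", 1), ("home", 2)]  # (page, hits) pairs

def reduce_by_key(pairs, fn):
    """Fold values sharing a key, as Spark's reduceByKey does."""
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc

totals = reduce_by_key(rows, lambda a, b: a + b)
# totals == {"home": 5, "about": 1}
```

Spark performs this fold per partition and then merges partial results, which is why the combining function must be associative; the dict version above shows only the single-partition logic.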
Confidential, Mountain View, CA
Big Data Analyst
- Designed and deployed a Hadoop cluster and various Big Data analytics tools, including Flume, Kafka, Cassandra, Spark, and Redis, in Docker containers
- Ingested structured, semi-structured, and unstructured datasets into the Hadoop system using open-source Apache tools such as Flume
- Developed Spark code using Python and Spark-SQL for faster testing and data processing.
- Experienced with batch processing of data sources using Apache Spark.
- Developed Kafka producers and consumers to feed data to Cassandra for backup storage and to Spark for stream processing
- Stored the processed data from Spark into Redis cache
- Used Node.js for data visualization
- Scheduled multiple jobs using Apache Mesos and ran the whole project on AWS
Environment: FTP, Flume, Kafka, Cassandra, Spark Streaming, Redis, Node.js, Mesos, AWS, Docker
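The Kafka producer/consumer work above relies on Kafka routing each record to a partition by hashing its key (the Java client's default partitioner uses murmur2), so records with the same key stay ordered for one consumer. A rough plain-Python model of that routing, with CRC32 standing in for murmur2 and all names illustrative:

```python
import zlib

def partition_for(key, num_partitions):
    """Route a record key to a partition, like a Kafka producer's
    default partitioner (CRC32 stands in for Kafka's murmur2 hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Records with the same key always land on the same partition,
# so a single consumer in the group sees them in order.
p1 = partition_for("sensor-42", num_partitions=3)
p2 = partition_for("sensor-42", num_partitions=3)
assert p1 == p2 and 0 <= p1 < 3
```

This is why choosing a good record key matters in a Kafka-to-Spark pipeline: the key fixes both ordering guarantees and how evenly load spreads across partitions.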
Confidential, Miami, FL
- Contributed to a big data solution for a hurricane center to mask PII (personally identifiable information) with encrypted values in non-production instances, so that it is not exposed
- Assisted in upgrading, configuring, and maintaining various Hadoop ecosystem components such as Sqoop and Hive
- Imported data from MySQL into HDFS using Sqoop on Cloudera CDH
- Created Hive managed (internal) and external tables
- Wrote Hive UDFs in Eclipse to mask the required columns and registered them with Hive to perform the masking
- Extensively worked with Cloudera Distribution Hadoop, CDH 5.x.
Environment: Cloudera CDH, HDFS, Sqoop, Hive, MySQL, Eclipse
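The column-masking Hive UDF described above was written in Java; purely as an illustrative stand-in (the function name and format-preserving digit mask are invented for this sketch, not taken from the project), the masking logic can be modeled in Python:

```python
import re

def mask_pii(value, keep_last=4):
    """Mask every digit except the last `keep_last`, preserving the
    field's format -- the same idea as a masking UDF applied to a
    sensitive column in a non-production copy of the data."""
    head, tail = value[:-keep_last], value[-keep_last:]
    return re.sub(r"\d", "X", head) + tail

masked = mask_pii("123-45-6789")
# masked == "XXX-XX-6789"
```

A real Hive UDF would extend `org.apache.hadoop.hive.ql.exec.UDF` in Java and be registered with `CREATE FUNCTION` before being used in queries over the non-production tables.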
Jr. Java Developer
- Involved in Design, Development, Testing and Integration of the application
- Participated in development of user interface modules using HTML, CSS and JSP
- Wrote SQL queries hands-on
- Coded, maintained, and administered Servlet and JSP components deployed on Apache Tomcat application servers
- Involved in fixing bugs and unit testing with test cases using JUnit
- Performed database access, including stored procedure calls, using JDBC
- Worked on bug fixing and enhancements on change requests
- Coordinated tasks with clients, support groups and development team
- Participated in weekly design reviews and walkthroughs with project manager and development teams
Environment: Java, Eclipse, HTML, CSS, JSP, JDBC, SQL and Tomcat