- Around 6 years of experience in application development and design using emerging technologies like Hadoop, NoSQL and Java/J2EE.
- Strong experience in requirements gathering, design and development, application migration and maintenance phases of the Software Development Lifecycle (SDLC).
- Experience in installing, configuring and using Apache Hadoop ecosystem components and related tools like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, Hue, Spark, Kafka, Solr, Git, Maven, Avro, JSON and Chef.
- Technically skilled at developing new applications on Hadoop according to business needs and converting existing applications to the Hadoop environment.
- Exposure to analyzing data using HiveQL, HBase and MapReduce programs in Java.
- Experience in Machine Learning and Data Science and in using new tools and technologies to drive improvements throughout the entire software development lifecycle.
- Well versed in using Sqoop to import data into HDFS from RDBMS and vice versa.
- Understanding of managed distributions of Hadoop like Cloudera and Hortonworks.
- Proficient in Apache Spark and Scala programming for analyzing large datasets, and in using Spark Streaming and Kafka to process real-time data.
- Experience in managing and scheduling Spark Jobs on a Hadoop Cluster using Oozie.
- Expertise in cluster coordination services through ZooKeeper.
- Developed Spark applications using Scala and Python.
- Involved in HBase CRUD operations using both the Java API and shell commands.
- Proficient with indexes, scalability and query language support in Cassandra.
- Involved in creating Hive tables, partitioning, bucketing, loading data and writing Hive queries.
- Designed and implemented Hive UDFs using Java for evaluation, filtering, loading and storing of data.
- Knowledge of installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and Amazon Web Services.
- Solid understanding of how EC2 and S3 work in Amazon Web Services (AWS).
- Proficiency in multiple databases like MongoDB, Cassandra, MySQL, Oracle 9i, 10g, 11g and MS SQL Server.
- Experienced in Core Java with strong understanding and working knowledge of object-oriented programming (OOP) concepts, multithreading, the Collections Framework, exception handling, the I/O system and JDBC.
- Well versed in Scrum/Agile and Waterfall methodologies.
- Good interpersonal skills and ability to work as part of a team. Exceptional ability to learn and master new technologies and to deliver outputs on short deadlines.
- MapReduce
- ServiceNow
- SQL
- Spark Streaming
- Spark MLlib
- Java 8
- Waterfall Model
- MVC Struts
- IBM WebSphere/Ascential DataStage 8.7
- Oracle 8.1.2/8.5
- SQL Server
- CSS 3
Confidential - Charlotte, NC
- Used Apache Kafka to develop a data pipeline that streams logs as messages through producers and consumers into HDFS.
- Responsible for developing prototypes and proof of concepts for the selected solutions and implementing complex big data projects with a focus on collecting, parsing, managing, analyzing and visualizing large sets of data using multiple platforms.
- Developed Hive UDFs to bring all customer email IDs into a structured format.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Scala and Python.
- Performed unit testing for Spark and Spark Streaming with ScalaTest and JUnit.
- Developed Spark projects in Scala and executed them using spark-submit.
- Leveraged Tableau to perform visualizations on the collected data.
- Imported data from various sources into the Cassandra cluster using Sqoop. Created data models for Cassandra from the existing Oracle data model.
- Used the Spark-Cassandra connector to load data to and from Cassandra.
- Used Spark SQL to fetch and generate reports on Cassandra table data.
- Set up Solr for distributed indexing and search.
- Developed Spark scripts using Scala shell commands as required.
- Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, and wrote the results back to OLTP systems through Sqoop.
- Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch.
- Optimized Hive queries to extract the customer information from HDFS or Cassandra.
- Automated the data flow using NiFi/Control-M.
- Loaded datasets from two different sources, Oracle and MySQL, into HDFS and Hive respectively on a daily basis.
- Worked in an Agile environment with active Scrum participation.
Environment: MapReduce, HDFS, Spark, Scala, Apache Kafka, Hive, Sqoop, NiFi, Solr, Cassandra, UNIX Shell Scripting, MySQL, Eclipse
Confidential - Houston, TX
Big Data Developer
- Processed web server logs by developing multi-hop Flume agents using the Avro sink and loaded them into MongoDB for further analysis; also extracted files from MongoDB through Flume and processed them.
- Wrote MapReduce jobs to parse the web logs stored in HDFS.
- Involved in optimizing Hive queries and joins to get better results for Hive ad-hoc queries.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Created Hive queries for extracting data and sending it to clients.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Involved in creating Oozie workflow and coordinator jobs to kick off Hive jobs on time for data availability.
- Hands-on experience developing MapReduce programs using Apache Hadoop for analyzing big data.
- Wrote multiple Hive queries to convert the processed data into multiple file formats, including XML, JSON and CSV.
- Created mappings using Talend Open Studio for evaluation and POC.
- Spearheaded the POCs for the AWS ecosystem via the AWS Management console.
- Used Agile methodology for project management and Git for source code control.
Environment: HDFS, MapReduce, Hive, Sqoop, Flume, Oozie, Talend, MongoDB, Java, SQL scripting, Linux shell scripting, Eclipse
Confidential - St. Louis, MO
Big Data Developer
- Imported and exported data into and out of HDFS and Hive using Sqoop.
- Participated in the design team for the flow architecture.
- Handled data from different datasets, joining and preprocessing them using Pig join operations.
- Moved bulk data into HBase using MapReduce integration.
- Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
- Developed different kinds of custom filters and handled predefined filters on HBase data using the API.
- Implemented counters on HBase data to count total records in different tables.
- Handled Avro data files by passing the schema into HDFS using Avro tools and MapReduce.
- Created Hive Dynamic partitions to load time series data.
- Created tables, partitions and buckets, and performed analytics using Hive ad-hoc queries.
- Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Sqoop, RDBMS/DB, Flat files, MySQL, CSV, Avro data files
- Extensively used Core Java, Servlets, JSP and XML.
- Used a DB2 database to store the system data.
- Used the Apache Log4j logging framework for trace logging and auditing.
- Developed a RESTful web services client to consume JSON messages using Spring JMS configuration. Developed the message listener code.
- Developed the business components using EJB Session Beans.
- Created JSP pages for the Customer module of the application.
- Involved in development across all tiers of the J2EE application.
- Performed code reviews and unit-tested all components using JUnit.
- Developed and deployed the project in a team that followed the Rational Unified Process (RUP) as its software management procedure, combined with pair programming and Test-Driven Development (TDD).
Environment: Java, JSP, Servlets, XML, WebSphere, SQL Server 2003, PL/SQL, Windows XP, SVN, ANT
- Applied Core Java concepts such as multithreading, exception handling and the Collections API to implement various features and enhancements.
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design and development.
- Refined application framework for data flow and data handling.
- Developed the application using the Eclipse IDE.
- Met stringent deadlines and analyzed specific process requirements.
- Involved in AJAX implementation.
- Performed unit testing for the developed code using JUnit.
- Worked on defining and improving a home-grown testing framework to support integration and performance testing.
- Implemented Waterfall model practices according to the application requirements.