- 9 years of IT experience in software development, including 5 years with Big Data, Hadoop and NoSQL technologies across various domains.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Experience with the Hadoop ecosystem, including Spark, Hive, Pig, HBase, Oozie, Sqoop and Kafka.
- Experience working with NoSQL databases, including MongoDB and HBase.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture. Good knowledge of Spark and Kafka.
- Hands-on experience with Spark Core, Spark SQL and Spark Streaming using Scala and Python.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, NameNode, DataNode and the MapReduce programming paradigm.
- Worked on performance tuning of Hadoop jobs by applying techniques such as map-side joins, partitioning and bucketing. Good knowledge of NoSQL databases such as MongoDB and Cassandra.
- Worked with multiple file formats, including Avro, Parquet, CSV, JSON, SequenceFile and ORC.
- Experience using Java tools in business, web and client-server environments, including Java, JDBC, Servlets, JSP, Struts Framework, Jasper Reports and SQL.
- Experienced in using version control tools such as Subversion and Git.
Apache Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, ZooKeeper, Flume, Kafka, Spark, Oozie, Parquet, Avro, ORC, SequenceFile, CSV, JSON
Relational Databases: Oracle, MySQL, Microsoft SQL Server, PostgreSQL
NoSQL Databases: MongoDB, Cassandra, HBase
Web Technologies: XML, HTML, CSS, JavaScript, jQuery
Languages: Java, SQL, Python, Scala, C, C++, C#
Operating Systems: Linux, Windows, Unix
Confidential, Middletown, NJ
- Ingested incremental batch data from MySQL and Teradata databases into HDFS using Sqoop at scheduled intervals.
- Ingested real-time data into HDFS using Kafka and implemented an Oozie job for daily imports.
- Worked on Amazon Web Services (AWS), using Elastic MapReduce (EMR) for data processing with S3 for storage.
- Used different Spark modules and APIs, including Spark Core, Spark SQL, Spark Streaming, Datasets and DataFrames.
- Converted files in HDFS from multiple data formats into RDDs and performed data cleansing using RDD operations.
- Applied a strong understanding of partitioning and bucketing in Hive to design both managed and external tables for optimized performance.
- Wrote complex queries and user-defined functions (UDFs) for custom functionality in Hive using Python.
- Worked with various HDFS file formats such as Avro, ORC and SequenceFile, and with compression formats such as Snappy and gzip.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs as well as system-specific jobs (such as Java programs and shell scripts).
Environment: HDFS, Spark, Hive, Sqoop, Kafka, AWS EMR, AWS S3, Oozie, Spark Core, Spark SQL, Maven, Scala, SQL, Linux, YARN, IntelliJ, Agile Methodology
Confidential, Brenham, TX
Big Data Engineer
- Utilized Sqoop, Kafka, Flume and Hadoop File System APIs to implement data ingestion pipelines from heterogeneous data sources.
- Created Amazon S3 storage for data and transferred data from Kafka topics into S3.
- Worked on real-time streaming and performed transformations on the data using Kafka and Spark Streaming.
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
- Created data pipelines to ingest, aggregate and load consumer response data from AWS S3 buckets into Hive external tables, and generated views to serve as feeds for Tableau dashboards.
- Worked with various data formats such as Avro, SequenceFile, JSON, MapFile, Parquet and XML.
- Used Apache NiFi to automate data movement between different Hadoop components and to convert raw XML data into JSON and Avro.
Environment: Hadoop, HDFS, AWS, Kafka, MapReduce, YARN, Spark, Pig, Hive, Scala, Java, NiFi, HBase, IMS Mainframe, Maven.
Confidential, Conway, AR
Big Data Developer
- Ingested batch files into HDFS using shell scripting.
- Used Flume to ingest near-real-time data, performed the necessary transformations and aggregations on the fly, and persisted the data in Hive.
- Used Pig, Hive and MapReduce to analyze the data and extract meaningful data sets.
- Developed Oozie workflows to orchestrate a series of Pig scripts that cleanse data in the data preparation stage, for example by removing irrelevant information or merging many small files into a handful of very large, compressed files.
- Used Pig extensively to access Hive tables through HCatalog.
- Implemented exception tracking logic using Pig scripts.
- Saved the analyzed data to Hive tables for visualization and to generate reports for the BI team.
- Good understanding of ETL tools and how the ETL operations can be applied in a Big Data environment.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Oozie, Core Java, Python, Eclipse, Flume, Cloudera, Oracle, UNIX Shell Scripting.
- Developed the application using the Spring Framework, leveraging the Model-View-Controller (MVC) architecture, Spring Security and Java APIs.
- Implemented design patterns such as Singleton, Factory and MVC.
- Deployed the applications on IBM WebSphere Application Server.
- Worked with JavaScript, CSS, RichFaces and jQuery.
- Wrote SQL queries to extract data from Oracle and MySQL databases.
- Performed JUnit testing for all test case scenarios.
- Used CVS for version control across common source code used by developers.
- Applied J2EE design patterns such as Factory, Singleton, Business Delegate, DAO, Front Controller and MVC.
- Developed EJB components to implement business logic using session and message-driven beans.
- Excellent working experience with Oracle 10g, including storing and retrieving data using Hibernate.
- Built and deployed the application on WebLogic Application Server.
- Developed and executed unit test cases using the JUnit framework in support of TDD.
- Provided extensive pre-delivery support through bug fixing and code reviews.
Environment: J2EE, JDK 1.5, Spring 2.5, Struts 1.2, JSP, Servlets, EJB 3.0, Hibernate 3.0, Oracle 10g, PL/SQL, CSS, Ajax, HTML, JavaScript, Log4j, JUnit, SOAP, Web Services.