- 7 years of professional experience in information technology, including 4 years of experience developing Big Data and Hadoop ecosystem components.
- Over 3 years of extensive experience in Java, J2EE technologies, database development, and data analytics.
- Hands-on experience developing Big Data projects using open-source tools/technologies such as Hadoop, Hive, Sqoop, Oozie, Pig, Flume, and MapReduce.
- Experience writing Pig Latin and HiveQL scripts and extending their functionality with User Defined Functions (UDFs).
- Hands-on experience with performance optimization techniques for data processing in Hive, Impala, Spark, Pig, and MapReduce.
- Wrote complex MapReduce code implementing custom Writable and WritableComparable types for analysis of large datasets.
- Good exposure to a variety of file formats (Parquet, Avro, JSON) and compression codecs (Snappy, Bzip2, and Gzip).
- Hands-on experience with Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
- Developed applications using Spark for data processing.
- Replaced existing MapReduce jobs and Hive scripts with Spark DataFrame transformations and actions.
- Capable of using AWS services such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
- Good knowledge on Spark architecture and real-time streaming using Spark.
- Fluent with core Java concepts such as I/O, multithreading, exceptions, regular expressions, collections, data structures, and serialization.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
- Experience in Java, JSP, Servlets, WebLogic, WebSphere, JavaScript, jQuery, XML, and HTML.
- Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
- Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Well-versed in Agile and Waterfall methodologies.
- Strong team player with good communication, analytical, presentation and interpersonal skills.
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Cassandra, MongoDB, ZooKeeper, Hive, Pig, Sqoop, Flume, and Oozie.
Operating Systems: Windows, UNIX, LINUX.
Programming Languages: C, Java, PL/SQL, Scala
Hadoop Distribution: Cloudera, Hortonworks.
Java/J2EE Technologies: Java, J2EE, JDBC.
Databases: Oracle, MS Access, MySQL, SQL, NoSQL.
IDE/Build Tools: Eclipse, IntelliJ IDEA, SBT.
Methodologies: J2EE design patterns, Scrum, Agile, Waterfall.
Version Control: SVN, Git, GitHub, Bitbucket.
Confidential, Atlanta, GA
- Developed Sqoop jobs to import data in Avro format from an Oracle database and created Hive tables on top of it.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression.
- Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL using Scala.
- Involved in performance tuning of Hive from design, storage, and query perspectives.
- Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Worked with the Spark SQL context to create DataFrames for filtering input data for model execution.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory configuration.
- Developed a Kafka consumer to consume data from Kafka topics.
- Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.
- Integrated Hive with Tableau to generate reports for the end user.
- Developed shell scripts to run Hive scripts through Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows, scheduled via the Oozie coordinator.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
Environment: HDFS, Yarn, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, Linux shell scripting, Hortonworks.
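As a minimal sketch of the Sqoop-to-partitioned-Hive-table pattern described above (all paths, database, table, and column names here are hypothetical illustrations, not taken from the engagement):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: read Avro files landed by a Sqoop import and write a
// Snappy-compressed, partitioned Parquet table registered in the Hive metastore.
// Requires a Spark build with Hive support and the spark-avro package.
object OrdersToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orders-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Path and schema are illustrative
    val orders = spark.read.format("avro").load("/data/raw/orders")

    orders.write
      .format("parquet")
      .option("compression", "snappy")
      .partitionBy("order_date")        // hypothetical partition column
      .saveAsTable("analytics.orders")  // hypothetical database.table

    spark.stop()
  }
}
```

This sketch assumes the cluster-side configuration (metastore, warehouse directory) is already in place; in practice the same write can also be expressed as HiveQL DDL plus an `INSERT ... SELECT`.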
Confidential, Minneapolis, MN
Hadoop Developer/Spark Developer
- Involved in Importing and exporting the data into HDFS and Hive using Sqoop and Kafka.
- Converted complex Teradata and Netezza SQL into HiveQL.
- Developed ETL using Hive, Oozie, shell scripts, and Sqoop; coded the components in Scala, making use of Scala pattern matching.
- Used Flume to collect, aggregate, and store weblog data in HDFS.
- Designed NoSQL schemas in HBase.
- Developed MapReduce ETL jobs in Java and Pig.
- Loaded log data into HDFS using Flume.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Created Hive tables, loaded them with data, and wrote Hive UDFs.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
Environment: MapReduce, HDFS, Hive, Pig, Sqoop, Scala, Oozie, SQL, Flume, Python, shell scripting, DataStage, Hortonworks.
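The Scala pattern-matching style mentioned above might look like the following sketch, e.g. for classifying parsed weblog records before they are loaded into HDFS (the record types and field layout are invented for illustration):

```scala
// Hypothetical sketch of Scala pattern matching in an ETL component:
// classify tab-separated weblog lines into typed events.
sealed trait LogEvent
case class PageView(path: String) extends LogEvent
case class ApiCall(endpoint: String, status: Int) extends LogEvent
case object Unknown extends LogEvent

object LogParser {
  def parse(line: String): LogEvent = line.split('\t') match {
    case Array("view", path)            => PageView(path)
    case Array("api", endpoint, status) => ApiCall(endpoint, status.toInt)
    case _                              => Unknown
  }
}
```

For example, `LogParser.parse("view\t/home")` yields `PageView("/home")`, while any line that does not fit a known shape falls through to `Unknown` instead of throwing.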
Confidential, Tampa, FL
- Loaded data using Sqoop from RDBMS servers such as Teradata and Netezza into the Hadoop HDFS cluster.
- Performed daily Sqoop incremental imports scheduled through Oozie.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Optimized Hive queries using map-side joins, dynamic partitioning, and bucketing.
- Responsible for executing Hive queries using Hive Command Line under Cloudera Manager.
- Implemented Hive generic UDFs to implement business logic around custom data types.
- Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing the data in HDFS.
- Coordinated the Pig and Hive scripts using Oozie workflow.
- Loaded the data into HBase from HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data that includes Avro, sequence files, and XML files.
Environment: Hadoop, Cloudera, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Linux, Java, Eclipse.
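The HDFS-to-HBase load mentioned above could be sketched with the HBase client API roughly as follows (table, column family, and row-key choices here are hypothetical, not from the project):

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical sketch: write one aggregated row (say, a daily metric computed
// in Hive and staged on HDFS) into an HBase table keyed by date.
object HBaseLoader {
  def main(args: Array[String]): Unit = {
    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("daily_metrics")) // invented name
    try {
      val put = new Put(Bytes.toBytes("2015-06-01")) // row key: the day
      // column family "m", qualifier "count" are illustrative
      put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("count"), Bytes.toBytes(12345L))
      table.put(put)
    } finally {
      table.close()
      conn.close()
    }
  }
}
```

In a real load the rows would come from reading the staged HDFS files (or from a bulk-load via HFiles) rather than a single hard-coded `Put`.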
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design, and development.
- Analyzed large datasets to provide strategic direction to the company.
- Collected logs from the physical machines and integrated them into HDFS using Flume.
- Involved in analyzing system and business requirements.
- Developed SQL statements to improve back-end communications.
- Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Created reports and dashboards using structured and unstructured data.
- Involved in importing data from MySQL to HDFS using Sqoop.
- Involved in writing Hive queries to load and process data in HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in working with Impala for the data retrieval process.
- Performed sentiment analysis on reviews of the products on the client's website.
- Developed custom MapReduce programs to extract the required data from the logs.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
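A custom log-extraction mapper of the kind described above might be sketched as follows (the "ERROR" filter and date-prefixed line format are assumptions for illustration, not the project's actual log schema):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Hypothetical sketch of a custom MapReduce mapper that extracts error lines
// from raw logs, emitting (date, full line) pairs for downstream aggregation.
class ErrorLineMapper extends Mapper[LongWritable, Text, Text, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    val line = value.toString
    if (line.contains("ERROR")) {
      // Assumes the first whitespace-delimited field is the date
      context.write(new Text(line.takeWhile(_ != ' ')), value)
    }
  }
}
```

Paired with a reducer that counts or collects values per key, this is the shape of job whose performance would then be tuned by reviewing the Hadoop log files.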
- Involved in Full Life Cycle Development in Distributed Environment using Java and J2EE Framework.
- Designed the application by implementing Struts Framework based on MVC Architecture.
- Implemented the Web Service client for login authentication, credit reports, and applicant information using Apache Axis2 Web Services.
- Developed a framework for data processing using design patterns, Java, and XML.
- Used the lightweight container of the Spring Framework to provide architectural flexibility through Inversion of Control (IoC).
- Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
- Designed and developed Session beans to implement the Business logic.
- Developed EJB components deployed on the WebLogic application server.
- Wrote unit tests using the JUnit framework and implemented logging using the Log4j framework.
- Designed and developed various configuration files for Hibernate mappings.
- Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
- Developed Web Services for sending and receiving data between different applications using SOAP messages.
- Actively involved in code reviews and bug fixing.
- Applied CSS (Cascading Style Sheets) across the entire site for standardization.