- Over 8 years of work experience in the IT field, involved in all phases of the software development lifecycle across different projects
- Strong experience in processing and analyzing large sets of structured, semi-structured and unstructured data, and in supporting systems and application architecture
- Extensive experience in writing Hadoop jobs for data analysis as per the business requirements using Hive and Pig
- Expertise in creating Hive internal/external tables and views using a shared metastore
- Developed custom UDFs in Pig and Hive to extend their core functionality
- Hands-on experience in transferring incoming data from various application servers into HDFS, Hive and HBase using Apache Flume
- Experience working with Snowflake and Vertica data warehouses
- Worked extensively with Sqoop to import and export data between RDBMS and HDFS
- Proficient in big data ingestion and streaming tools like Sqoop, Kafka and Spark
- Experience of working on data formats like Avro, Parquet
- Hands-on experience with SequenceFiles, RCFiles, combiners, counters, dynamic partitions and bucketing for best practices and performance improvement
- Experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, Kafka, Spark Streaming and Apache Storm
- Worked on Oozie to manage and schedule the jobs on Hadoop cluster
- Implemented a variety of AWS computing and networking services to meet application needs
- Knowledge of developing analytical components using Scala
- Experience in managing and reviewing Hadoop log files
- Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, Hive, HBase, Spark and Sqoop
- Experience in setting up Hive, Pig, HBase and Sqoop on the Ubuntu operating system
- Strong experience in relational database design and development with multiple RDBMSs including Oracle 10g, MySQL and MS SQL Server, using PL/SQL
- Proficient in using data visualization tools like Tableau, Raw and MS Excel
- Experience in developing web interfaces using technologies like XML, HTML, DHTML and CSS
- Implemented functions, stored procedures, triggers using PL/SQL
- Good understanding of ETL processes and Data warehousing
- Strong experience in writing UNIX shell scripts
- Working on different projects provided exposure to and a good understanding of the different phases of the SDLC
- Deployed code via Jenkins
- Merged code to master in Bitbucket (similar to Git) after validation
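One of the Hive skills above is writing custom UDFs; a lightweight way to extend Hive, sketched here in Python (one of the languages listed below) via Hive's TRANSFORM streaming interface. The column names and normalization logic are illustrative, not taken from any project described here:

```python
import sys

def normalize(line: str) -> str:
    """Normalize one tab-separated row: trim fields, lowercase the email,
    uppercase the country code (the column layout is hypothetical)."""
    user_id, email, country = line.rstrip("\n").split("\t")
    return "\t".join([user_id.strip(), email.strip().lower(), country.strip().upper()])

def main() -> None:
    # Hive pipes rows to the script's stdin and reads transformed rows
    # back from stdout, e.g.:
    #   SELECT TRANSFORM (user_id, email, country)
    #   USING 'python normalize.py' AS (user_id, email, country)
    #   FROM users;
    for line in sys.stdin:
        print(normalize(line))
```

Unlike a Java UDF registered with CREATE FUNCTION, a TRANSFORM script needs no jar build step, at the cost of per-row process streaming overhead.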
Hadoop/Big Data: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Spark, Storm, ZooKeeper, Scala, Oozie, Ambari, Tez, R
Development Tools: Eclipse, IBM DB2 Command Editor, TOAD, SQL Developer, VMware
Programming/Scripting Languages: Java, C++, Unix Shell Scripting, Python, SQL, Pig Latin, HiveQL
Databases: Oracle 11g/10g/9i, MySQL, SQL Server 2005/2008, PostgreSQL and DB2
NoSQL Databases: HBase, Cassandra, MongoDB
Visualization: Tableau, Raw and MS Excel
Frameworks: Hibernate, JSF 2.0, Spring
Version Control Tools: Subversion (SVN), Concurrent Versions System (CVS) and IBM Rational ClearCase
Methodologies: Agile/Scrum, Waterfall
Operating Systems: Windows, Unix, Linux and Solaris
Confidential, Houston, TX
- Experienced in development using the Cloudera distribution
- As a Hadoop Developer, responsible for managing the data pipelines and data lake
- Have experience working with the Snowflake data warehouse
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data
- Designed custom Spark REPL application to handle similar datasets
- Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation
- Performed Hive test queries on local sample files and HDFS files
- Used AWS services like EC2 and S3 for small data sets
- Developed the application on Eclipse IDE
- Developed Hive queries to analyze data and generate results
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation
- Used Scala to write code for all Spark use cases
- Analyzed user request patterns and implemented various performance optimization measures including implementing partitions and buckets in HiveQL
- Assigned names to DataFrame columns using Scala case classes
- Developed multiple Spark SQL jobs for data cleaning
- Created Hive tables and worked on them using HiveQL
- Assisted in loading large sets of structured, semi-structured and unstructured data into HDFS
- Developed Spark SQL to load tables into HDFS and run select queries on them
- Developed analytical components using Scala, Spark and Spark Streaming
- Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports
Environment: Linux, Apache Hadoop Framework, HDFS, YARN (MR2), Hive, HBase, AWS (S3, EMR), Scala, Spark, Sqoop
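Conceptually, the Spark Streaming work above divides a continuous stream into small batches, each processed like an ordinary batch job. A plain-Python sketch of that micro-batching idea (batch size and records are illustrative; real Spark Streaming slices by time interval into DStreams):

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group a (possibly unbounded) record iterator into fixed-size batches,
    mimicking how Spark Streaming slices a stream into per-interval RDDs."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def process(batch):
    # Stand-in for one Spark batch job: aggregate the records in an interval.
    return sum(batch)

# A toy "stream" of ten records, processed four at a time
totals = [process(b) for b in micro_batches(range(10), 4)]
```

The key property the sketch preserves is that each batch is handed to the same batch-processing code path, which is what lets Spark reuse its core engine for streaming.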
Confidential, Charlotte, NC
- Imported and exported data between RDBMS and HDFS using Sqoop.
- Worked on loading and transformation of large sets of structured, semi-structured and unstructured data into the Hadoop system.
- Responsible for managing data coming from different data sources.
- Loaded data from various data sources into HDFS using Flume.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Created Hive tables and was involved in data loading and writing Hive UDFs.
- Created Hive tables and provided analytical queries for business user analysis.
- Extensive knowledge of Pig scripts using bags and tuples.
- Created tables in Hive with partitioning and bucketing for granularity and optimization of HiveQL.
- Involved in identifying job dependencies to design workflow for Oozie and resource management for YARN.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Pig, Hive and written Pig and Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
Environment: Cloudera, HBase, Java, Hive, Pig, Sqoop, Oozie, Oracle, SVN, Kafka, GitHub, JIRA, Talend.
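The bucketing mentioned above works by hashing the bucketing column modulo the bucket count, so related rows land in the same file. A simplified Python sketch of that placement rule (Hive's actual hash function differs, and the keys here are illustrative):

```python
def bucket_for(key: str, num_buckets: int) -> int:
    """Simplified bucket assignment: hash(key) mod num_buckets, analogous to
    CLUSTERED BY (key) INTO num_buckets BUCKETS in a Hive table definition."""
    # Deterministic toy string hash (Python's built-in hash() is salted per process)
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0x7FFFFFFF
    return h % num_buckets

# Every row with the same key deterministically maps to one of 4 buckets
buckets = {k: bucket_for(k, 4) for k in ["us", "uk", "de"]}
```

Because the mapping is deterministic, joins on the bucketing column can be done bucket-by-bucket, which is what makes bucketed tables a HiveQL optimization.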
- Interacted with business users to identify process metrics and various key dimensions and measures, and was involved in the complete life cycle of the project.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Developed MapReduce jobs in Java for data cleaning and preprocessing.
- Good knowledge of using Apache NiFi to automate data movement.
- Used MapReduce to ingest customer behavioral data and financial histories into HDFS.
- Used Pig as ETL tool for transforming and pre-aggregations before storing data into HDFS.
- Responsible for defining the data flow within the Hadoop ecosystem, directing the team in implementing it, and exporting result sets from Hive to MySQL using shell scripts.
- Handled importing of data from various data sources, performed transformations.
- Involved in creating tables, partitioning and bucketing of tables.
- Configured Flume agents on different data sources to capture the streaming log data.
- Used Amazon EMR to process big data across a Hadoop cluster on EC2 virtual servers with S3 storage.
- Experience with different data formats like Avro, Parquet and ORC, and compression codecs like Snappy and Gzip.
- Implemented a POC for persisting clickstream data with Apache Kafka.
- Optimized existing algorithms in Hadoop using Spark SQL.
- Troubleshooting and solving migration issues and production issues.
Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, AWS, Hortonworks, Kafka, Cassandra, UNIX, Tableau.
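The pre-aggregation step described above (Pig as an ETL tool before storing data in HDFS) amounts to grouping raw records by a key and rolling them up. A plain-Python sketch of that roll-up (the field names and records are illustrative):

```python
from collections import defaultdict

def pre_aggregate(events):
    """Roll up raw click events into per-(user, page) counts before storage,
    like a GROUP BY ... FOREACH ... GENERATE COUNT(*) step in Pig Latin."""
    counts = defaultdict(int)
    for user, page in events:
        counts[(user, page)] += 1
    return dict(counts)

# Three raw events collapse to two aggregated rows
raw = [("u1", "/home"), ("u1", "/home"), ("u2", "/cart")]
rollup = pre_aggregate(raw)
```

Aggregating before the write shrinks the data landed in HDFS, which is the usual motivation for doing this in the ETL layer rather than at query time.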
- Extensively involved in different stages of the Agile development cycle, including detailed analysis, design, development and testing.
- Implemented the Back-End Business Logic using Core Java technologies including Collections, Generics, Exception Handling, Java Reflection and Java I/O.
- Wrote Spring annotation configuration to define beans and view resolvers, and to configure bean dependencies and the services needed by beans.
- Used Spring IoC to implement dynamic dependency injection and Spring AOP to implement crosscutting concerns such as transaction management.
- Wrote mapping configuration files to implement ORM mappings in the persistence layer.
- Extended Hibernate DAO support classes in the DAO implementation.
- Wrote Hibernate configuration files to connect to the Oracle database and fetch data.
- Implemented the Hibernate query cache using EhCache to improve performance.
- Implemented web services with RESTful standards with the support of JAX-RS APIs.
- Sent registration confirmations and monthly statements to users by integrating the JavaMail API.
- Manipulated database data with SQL queries, including setting up stored procedures and triggers.
- Utilized Node.js and MongoDB to generate tendency charts of the application for Payment History.
- Wrote JUnit test cases to test the service layers of the application.
- Used JIRA to track the projects and GIT to ensure version control.
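The service-layer tests above were JUnit/Java; the same pattern, mocking out the DAO beneath the service under test, is sketched here in Python with the standard library's mock support. The class and method names are hypothetical, not from this project:

```python
from unittest.mock import Mock

class PaymentService:
    """Hypothetical service-layer class sitting above a DAO, mirroring the
    Spring/Hibernate layering described above (names are illustrative)."""
    def __init__(self, dao):
        self.dao = dao

    def outstanding_balance(self, user_id):
        # Business logic under test: sum the user's unsettled payments
        payments = self.dao.find_payments(user_id)
        return sum(p["amount"] for p in payments if not p["settled"])

# JUnit-style test: stub the DAO so only the service logic is exercised
dao = Mock()
dao.find_payments.return_value = [
    {"amount": 10.0, "settled": True},
    {"amount": 25.0, "settled": False},
]
service = PaymentService(dao)
balance = service.outstanding_balance("u1")
```

Mocking the DAO keeps the test independent of the database, which is why service-layer tests stay fast and deterministic.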