- 8+ years of work experience in the IT industry in analysis, design, development, and maintenance of software applications, mainly in Hadoop (Cloudera, Hortonworks), Oracle Business Intelligence, Business Objects, Oracle (SQL, PL/SQL), and database administration in UNIX and Windows environments, across industry verticals including Banking, Financial Services, Pharmacy, Financial Assets, Fixed Income, Equities, Telecom, and Health Insurance.
- 8+ years of experience in Hadoop 1.x/2.x, HDFS, HBase, Spark, Sqoop 2.x, Scala, Hive 0.7.1, Kafka, Flume, Java 1.6, Linux, Eclipse Juno, Kerberos security, Impala, XML, JSON, Maven, SVN, NiFi, SaaS, Amazon Redshift, and Azure.
- Worked with data in multiple file formats including Avro, ORC, and Text/CSV. Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Experienced in Extraction, Transformation, and Loading (ETL) processes driven by business needs, using Falcon and Oozie workflows to execute multiple Java, Hive, Shell, and SSH actions.
- Good understanding of NoSQL databases like HBase and Cassandra.
- Hands on experience in Stream processing frameworks such as Storm, Spark Streaming.
- Solid understanding and extensive experience in working with different databases such as Oracle, SQL Server, MySQL and writing Stored Procedures, Functions, Joins and Triggers for different Data Models.
- Excellent Java development skills using J2EE, Servlets, Junit and familiar with popular frameworks such as Spring, MVC and AJAX.
- Extensive experience in PL/SQL, developing stored procedures with optimization techniques.
- Experienced with distributed message brokers (such as Kafka).
- Excellent team player with a pleasant disposition, the ability to lead a team, and a proven track record.
- Solid experience in Agile methodology: stories, sprints, tasks, Kanban, and Scrum.
Big Data Technology: Apache Hadoop; Hadoop clusters, Hadoop Common, Hadoop Distributed File System; replication; Cloudera cluster; Pig; MapReduce; Cassandra, NoSQL, MongoDB, Scala, Kafka, Storm, Spark Streaming, Flume, NiFi, SaaS, Mongoose, Tableau, Predixion Insight, Informatica; relational, hierarchical, and graph databases; distributed data file systems; data federation and query optimization
RDBMS: Oracle 11g/10g, DB2 8.0/7.0 & MS SQL Server 2005
Data Modeling: Dimensional Data Modeling, Star Join Schema Modeling, Snowflake Modeling, Fact and Dimension Tables, Physical and Logical Data Modeling, Erwin 3.5.2/3.x & Toad
Programming: UNIX Shell Scripting, SQL, PL/SQL, VB & C.
Operating Systems: Windows 2000, UNIX (AIX).
- Involved in requirement analysis, design, coding, and implementation.
- Sqooped data from various sources such as Teradata, Oracle, and SQL Server.
- Processed and analyzed fixed-length EBCDIC, delimited EBCDIC, delimited ASCII, Avro, and Parquet file formats.
- Worked on a logging framework to log application-level data into Hadoop for future analysis.
- Responsible for data-integrity and duplicate checks to ensure the data is not corrupted.
- Wrote MapReduce, Spark, Hive, and Pig jobs for processing and analyzing data.
- Developed Python scripts for auto-generation of HQL queries to reduce manual effort.
- Wrote UDFs and UDAs for extended functionality in Pig and Hive.
- Worked on a data-quality-check framework for reporting to business.
- Scheduled workflows through Oozie and AutoSys schedulers.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Experienced with NoSQL databases such as HBase and Cassandra, and with ecosystem components such as Zookeeper, Oozie, Impala, Storm, and AWS Redshift.
- Installed Hadoop, MapReduce, HDFS, and AWS components, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Worked with Scrum teams to achieve enhanced levels of Agile transformation.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data, and migrated MapReduce programs into Spark transformations.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
- Executed parameterized Pig, Hive, Impala, and UNIX batches in production.
- Managed Big Data in Hive and Impala (tables, partitioning, ETL, etc.).
- Created BTEQ and MLOAD scripts to load data from Hadoop into the Teradata target system using the Sqoop utility.
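A minimal sketch of the HQL auto-generation idea mentioned above: given a table name and column list, emit the repetitive query text that would otherwise be written by hand. The table, column, and partition names here are illustrative placeholders, not from the actual project.

```python
def generate_hql(table, columns, partition_col=None):
    """Build a Hive query string for a table and its columns."""
    # Indent each projected column on its own line for readability.
    select_list = ",\n    ".join(columns)
    hql = f"SELECT\n    {select_list}\nFROM {table}"
    if partition_col:
        # Parameterize the run date via a hiveconf variable.
        hql += f"\nWHERE {partition_col} = '${{hiveconf:run_date}}'"
    return hql + ";"

# Example: generate a dated extract query for a hypothetical staging table.
print(generate_hql("staging.trades", ["trade_id", "cusip", "qty"], "load_dt"))
```

In practice a script like this would loop over table metadata and write one `.hql` file per table for Oozie to pick up.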
- Wrote Python scripts for internal testing that push data read from a file into a Kafka queue, which is in turn consumed by the Storm application.
- Worked on Kafka and Kafka mirroring to ensure that data is replicated without any loss.
- Developed Kafka producers and consumers, Spark jobs, and Hadoop MapReduce jobs.
- Installed and configured Hive and wrote Hive UDFs.
- Used Impala to determine statistical information about operational data.
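The file-to-Kafka test harness described above could look roughly like this; the delimiter, topic name, and broker address are assumptions, and it presumes the kafka-python client.

```python
import json

def to_message(line):
    """Turn one pipe-delimited file line into a JSON payload for Kafka."""
    fields = line.rstrip("\n").split("|")
    return json.dumps({"id": fields[0], "payload": fields[1:]}).encode("utf-8")

def push_file(path, topic="test-events", bootstrap="localhost:9092"):
    """Read a file and push each line onto a Kafka topic for the consumer to pick up."""
    from kafka import KafkaProducer  # assumes the kafka-python package
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    with open(path) as f:
        for line in f:
            producer.send(topic, to_message(line))
    producer.flush()  # block until all queued records are sent
```

Keeping the line-to-message conversion in its own function lets the payload format be unit-tested without a running broker.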
Environment: CDH 5.4.5, MapReduce, Spark, Avro, Parquet, Hive, Java (JDK 1.7), Python, Teradata, SQL Server, Oozie, AutoSys.
- Understood and analyzed the business requirements of the application and obtained clarification by identifying gaps in the requirements.
- Helped and supported in resolving queries and provided solutions for defects using Java/J2EE technology as required.
- Analyzed pre- and post-production issues to provide an optimal fix.
- Maintained technical documentation for software and systems.
- Created, validated, and maintained scripts to load data into databases, including Sqoop load scripts.
- Created Oozie workflows and coordinators to automate Sqoop jobs weekly and monthly.
- Developed scripts to encrypt sensitive data in databases.
- Developed, validated, and maintained HiveQL queries.
- Used Impala to run legacy-system queries.
- Wrote MapReduce programs to validate the data.
- Designed HBase schemas and cleaned data.
- Wrote Hive queries for analytics on user data.
- Wrote Spark DataFrame applications to read from HDFS and analyze records.
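The record-validation logic behind the MapReduce programs above can be sketched as plain map/reduce functions, shown here outside a cluster for clarity; the tab delimiter, expected field count, and key position are assumptions.

```python
from itertools import groupby

def map_record(line, expected_fields=5):
    """Emit (key, 1) for well-formed records, (key, 'BAD') for malformed ones."""
    fields = line.rstrip("\n").split("\t")
    key = fields[0]
    return (key, 1) if len(fields) == expected_fields else (key, "BAD")

def reduce_keys(pairs):
    """Flag keys that are duplicated or carry malformed records."""
    issues = {}
    # Sort by key only, since the values mix ints and strings.
    for key, group in groupby(sorted(pairs, key=lambda kv: kv[0]),
                              key=lambda kv: kv[0]):
        values = [v for _, v in group]
        if "BAD" in values:
            issues[key] = "malformed"
        elif len(values) > 1:
            issues[key] = "duplicate"
    return issues
```

On the cluster the same map and reduce roles would be filled by Mapper/Reducer classes (or a Spark job), with the framework handling the sort-and-group step.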
Environment: Cloudera distribution, Hadoop stack (Hive, Pig, HCatalog, Impala, Sqoop, Oozie), Spark, Java JDK 1.5 (legacy system compilation), 1.6, 1.7, 1.8, Hibernate, Eclipse Kepler, WebLogic 10.0, SQL Server 2010, Git, Windows 7.
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Setup and benchmarked Hadoop clusters for internal use.
- Developed simple to complex MapReduce jobs in Java, alongside equivalents implemented in Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior. Used UDFs to implement business logic in Hadoop.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Experienced in loading and transforming large sets of structured and semi-structured data.
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
- Involved in implementation of JBoss Fuse ESB 6.1
- Consumed REST based web services.
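The per-row business logic that the Java UDFs above carried can also be expressed as a Hive TRANSFORM streaming script; the Python below is purely illustrative (the project's UDFs were Java), and the column layout is an assumption.

```python
# Illustrative Hive streaming script: reads tab-separated rows on stdin and
# rewrites one derived column. Invoked from HiveQL roughly as:
#   SELECT TRANSFORM (user_id, event_ts, channel)
#   USING 'python normalize.py' AS (user_id, event_ts, channel) FROM events;
import sys

def transform_row(line):
    """Normalize a row: lowercase and trim the third (channel) column."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) >= 3:
        cols[2] = cols[2].strip().lower()
    return "\t".join(cols)

if __name__ == "__main__":
    for raw in sys.stdin:
        print(transform_row(raw))
```

A native Java UDF avoids the per-row process-streaming overhead, which is why the production logic stayed in Java.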
Environment: Hadoop, Hive, Impala, Java, J2EE, REST services, MapReduce, JBoss Fuse ESB 6.1.
- Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
- Reviewed the functional, design, source code and test specifications
- Involved in developing the complete front-end using JavaScript and CSS
- Authored the Functional, Design, and Test Specifications
- Developed web components using JSP, Servlets and JDBC
- Designed tables and indexes
- Designed, implemented, tested, and deployed Enterprise JavaBeans (both Session and Entity) using WebLogic as the application server
- Developed stored procedures, packages, and database triggers to enforce data integrity. Performed data analysis and created Crystal Reports per user requirements
- Implemented Backend, Configuration DAO, XML generation modules of DIS
- Analyzed, designed and developed the component
- Used JDBC for database access
- Used the Spring Framework to develop the application and JDBC to map to the Oracle database
- Used Data Transfer Object (DTO) design patterns
- Performed unit testing and rigorous integration testing of the whole application
- Wrote and executed test scripts using JUnit