- Over 8 years of experience in application development and design using Hadoop ecosystem tools and Java/J2EE technologies.
- Developed and built frameworks that integrate big data and advanced analytics to make business decisions.
- Extensive experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, Hive, Pig, Flume, Sqoop, and Spark.
- Preprocessed and cleansed big data for better analysis.
- Certified Cloudera Spark and Hadoop Developer
- Experience in Cloudera distributions (CDH) and Hortonworks Data Platform (HDP)
- Created various use cases using massive public big data sets. Ran various performance tests to verify the efficacy of MapReduce, Pig, and Hive
- Migrated workloads to the Azure cloud and created the end-to-end architecture for running in the cloud.
- Experience with ADF, ADLS, Blob Storage, HDInsight, Ranger, S3, IR, IoT Hub, Stream Analytics, etc.
- Good knowledge of Amazon Web Services (AWS) components such as EC2, EMR, S3, CloudWatch, etc.
- Strong coding and debugging skills on the Java platform
- Experience in shipping enterprise products, web/mobile UI applications to a large customer base
- Experienced in Full Life Cycle development of software products
- Good at Servlets, JSPs, and MVC frameworks
- Have excellent analytical and problem-solving skills and ability to learn new technologies quickly
Learning: Can rapidly adapt to new environments and designs.
Apache Hadoop: HDFS, Hive, Pig, MapReduce, Flume, Sqoop and Spark
Cloud: HDInsight, ADLS, ADF, S3, EMR, EC2, NACL, Security groups
Programming Languages & Scripts: Java, J2EE, UNIX, JavaScript, SQL, UML, XML, CSS, JSON
Enterprise Java: JSP, Servlets, JSF, EJB, JMS, Socket Programming, Java Beans
Software Design: Design Patterns, Data Structures, Object Oriented design
Tools & Framework: TIBCO Composite, JSF, Spring, Web Services, Selenium, JUnit, Maven, Ant
Web Servers: Oracle WebLogic Server, WebSphere, Tomcat, Oracle OC4J
IDE & Tools: Eclipse, Visual Studio, Xcode, Git
Confidential, Long Beach, CA
Big Data Developer
- Worked on live 30-node (Prod) and 6-node (UAT) big data clusters running CDH 5.13.3
- Developed and maintained the complex Claims Semantic Pipeline for weekly full load and incremental loads
- Validated the weekly full load of claims against Netezza and checked for any discrepancies
- Resolved the state issue (Universal and Medicare state) in the reference data set used by all the pipelines
- Developed the aggregated datasets and lookup columns from Claims dataset and all reference tables
- Integrated SIU pipeline into the existing Claims pipeline and retired the SIU pipeline
- Used windowing techniques and UDFs in SparkSQL
- Developed and incorporated enhancements into the existing claims pipeline
- Monitored and maintained the weekly Talend job and resolved failures to meet the SLAs
- Converted existing SQL logic to SparkSQL for the Pharmacy pipeline and optimized it
- Improved the performance of the Provider datasets and incorporated all provider data into claims
- Worked with Parquet file formats using Snappy compression to speed up network transfer of big data
- Created Hive tables and views using Impala; implemented partitioning and bucketing in Hive for better organization of data
- Built Power BI dashboards to validate the data against Netezza
- Currently automating a pre-pipeline check to validate the L0 data
- Collaborated with Data Management team on the business requirements and retirement of Netezza
- Followed Agile Scrum methodology in JIRA throughout the project
- Gained very good business knowledge of claims processing
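The windowing pattern used in the SparkSQL work above can be sketched with a standard SQL window function. SQLite stands in for SparkSQL here so the example runs self-contained; the table and column names are hypothetical:

```python
import sqlite3

# Hypothetical claims table: pick the latest claim per member using
# ROW_NUMBER() over a window partitioned by member, the same pattern
# used in the SparkSQL pipeline work described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (member_id TEXT, claim_id TEXT, claim_date TEXT)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [
        ("m1", "c1", "2020-01-01"),
        ("m1", "c2", "2020-02-01"),
        ("m2", "c3", "2020-01-15"),
    ],
)

latest = conn.execute(
    """
    SELECT member_id, claim_id FROM (
        SELECT member_id, claim_id,
               ROW_NUMBER() OVER (
                   PARTITION BY member_id
                   ORDER BY claim_date DESC
               ) AS rn
        FROM claims
    )
    WHERE rn = 1
    ORDER BY member_id
    """
).fetchall()
print(latest)  # one latest claim per member
```

The same `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` clause works unchanged in SparkSQL.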
Confidential, Houston, TX
Big Data Developer
- Involved in the complete SDLC of Big data project that includes requirement analysis, design, coding, testing and production.
- Worked on live 24-node and 4-node (Test) big data clusters (Hadoop 3.6) on Linux.
- Experience working on both non-domain-joined and domain-joined clusters.
- Worked with highly unstructured, structured, and semi-structured data of 30 TB in size (90 TB with a replication factor of 3)
- Ingested structured data from TIBCO Composite Data Virtualization tool into ADLS using Sqoop
- Created Shell scripts to automate the Sqoop jobs.
- Developed Ambari workflows for scheduling and orchestrating the ETL process
- Worked with ORC file formats using ZLIB compression to speed up network transfer of big data
- Ingested structured big data from Teradata, Oracle, Netezza, Postgres, SQLServer into ADLS using Azure Data Factory (ADF).
- Created pipelines in ADF to create cluster, ingest, create hive tables, enable daily triggers.
- Involved in converting Hive queries into Spark transformations using Spark Structured API.
- Used PySpark (Python) and Scala for analyzing the data in a non-domain-joined Spark 2.3 cluster
- Scripted Python Code to transfer data from Hive tables into Data Science Sandbox using SFTP.
- Very good experience in monitoring and managing the Hadoop cluster using Ambari.
- Created dashboards in Power BI from the incident record data and Hive tables (via ODBC connection) to generate metrics.
- Gained very good business knowledge of the oil and gas industry, including well pads, weather, mud pressure, and exploration analysis.
- Collaborated with Digital Security, Data Scientists, Palantir and Catalog team to ensure data quality and availability.
- Followed Agile Scrum methodology in Visual Studio Team Services throughout the project.
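Daily-trigger ingestion like the ADF pipelines above typically lands data under date-partitioned folders. A minimal sketch, with a hypothetical container and folder layout (the real names differed):

```python
from datetime import date

def landing_path(source: str, table: str, run_date: date) -> str:
    # Hypothetical ADLS layout: one folder per source/table/run date,
    # mirroring the daily-trigger ingestion pattern described above.
    return f"adl://datalake/raw/{source}/{table}/{run_date:%Y/%m/%d}"

print(landing_path("teradata", "well_events", date(2019, 5, 1)))
```

Keeping the date in the path lets downstream Hive tables be partitioned by load date and lets a failed day be re-run in isolation.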
Confidential, Orlando, FL
- Worked on a live 80-node Hadoop cluster running CDH 5.10
- Worked with structured and semi-structured data of 150 TB in size (450 TB with a replication factor of 3)
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into forming baseline data.
- Developed Hive queries and UDFs to analyze/transform the data in HDFS.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data
- Used pattern-matching algorithms in Pig to recognize fraudulent customers across different sources, built risk profiles for each customer, and stored the result data in HDFS
- Used Oozie to orchestrate the MapReduce jobs and worked with HCatalog to open up access to Hive's metastore
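The cross-source customer matching idea can be illustrated in miniature. This is a pure-Python analogue using fuzzy string matching; the production logic lived in Pig and compared additional fields:

```python
from difflib import SequenceMatcher

def same_customer(a: str, b: str, threshold: float = 0.85) -> bool:
    # Treat two records as the same customer when their names are
    # nearly identical after case-folding. A stand-in for the richer
    # pattern matching done in Pig across multiple sources.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

print(same_customer("John Q. Smith", "john q smith"))  # near-duplicate
print(same_customer("John Smith", "Mary Jones"))       # unrelated
```

Records flagged as the same customer across sources can then be merged into a single risk profile.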
Software Development Engineer
- Worked on a 10-node Hadoop cluster
- Worked on semi structured and structured data of 15TB in size (45TB with replication factor of 3)
- Loaded data from disparate data sets using Sqoop and Flume.
- Used Sqoop to import/export data between RDBMS and Hive tables.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Created Sqoop jobs with incremental load to populate Hive External tables.
- Have a very good understanding of partitioning and bucketing concepts in Hive and designed both managed and external tables in Hive to optimize performance.
- Wrote Pig Latin scripts to perform transformations per the use-case requirements.
- Worked with different file formats and compression techniques.
Environment: Cloudera Enterprise, Hadoop, MapReduce, Pig, Hive, Avro, Sqoop, HBase
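Hive's bucketing, mentioned in the bullets above, assigns each row to a bucket by hashing the bucketing column modulo the bucket count. A minimal sketch with a simple byte-sum standing in for Hive's real hash function:

```python
def bucket_for(key: str, num_buckets: int = 4) -> int:
    # Hive computes hash(bucketing_column) % num_buckets to place a row;
    # a byte sum is used here as a deterministic stand-in hash.
    return sum(key.encode("utf-8")) % num_buckets

print(bucket_for("customer-42"))
```

Because the same key always hashes to the same bucket, joins on the bucketing column can proceed bucket-by-bucket instead of shuffling the whole table.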
Member Technical Staff
- Created functional and design specification documents.
- Analyzed how to display the data/metrics collected in Enterprise Manager (EM) and developed the relevant pages.
- Worked on User-Interface using JSPs and Servlets for the Enterprise Manager framework
- Discovered all the Universal Content Management servers installed on the content server and identified their statuses.
- Extracted the configuration details of the server.
- Integrated the targets (SOA, WebLogic, WebCenter) into the EM tree.
- Created Dynamic Monitoring Services (DMS) messages for Content Management.
- Added the DMS instrumentation to the Content Server code to extract the metrics, then validated and tested them.
- Identified the cached queries, active databases, documents waiting, and number of service requests in the Content Server
- Analyzed system performance and monitored system status.
- Used Oracle Application Development Framework (ADF) for end-to-end Java-based application development.
- Resolved issues on the server based on priority.
Environment: Java, J2EE (Servlets), OOP concepts, Oracle DB, JDBC
- Prepared requirement, functional, and design specification documents.
- Worked on Oracle JDeveloper, which is a free integrated development environment.
- Implemented peer discovery both statically and dynamically (using SLP and NAPTR)
- Created Realm and Peer routing tables.
- Opened TCP connections to send and receive data.
- Tested each method with JUnit.
- Used EMMA code coverage to help improve the coverage of the project.
- Implemented Failover and Failback procedures
Environment: Java, J2EE (Socket Programming), Design Patterns, Seagull Traffic generator
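The TCP send/receive work above was done in Java socket programming; a minimal Python sketch of the same open-connection, send, receive pattern:

```python
import socket
import threading

# Server side: accept one connection and echo back what arrives.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))  # port 0 = let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]

def serve() -> None:
    conn, _ = srv.accept()
    conn.sendall(conn.recv(64))  # echo the payload back
    conn.close()

t = threading.Thread(target=serve)
t.start()

# Client side: open a TCP connection, send data, read the reply.
client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(b"ping")
reply = client.recv(64)
client.close()
t.join()
srv.close()
print(reply.decode())
```

The Java version uses `ServerSocket`/`Socket` with the same accept, send, receive flow.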