- Strong IT experience with the Big Data Hadoop stack, Teradata, SQL Server, Informatica, Unix, Java, and Erwin.
- 5+ years of relevant experience in Big Data enterprise application development, building scalable, high-performance Big Data analytical systems with a specialization in the Hadoop platform.
- Played a pivotal role in building a centralized Enterprise Data Hub on the Hadoop platform that caters to all the data analytics needs of an enterprise.
- Good experience in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark, Sqoop, Flume, MapReduce, and Hive.
- Good experience in creating complex data ingestion pipelines, data transformations, data management and data governance, and real-time streaming engines at an enterprise level.
- Good experience in developing and implementing big data solutions and data mining applications on Hadoop using Hive, Pig, HBase, Cassandra, Hue, and Oozie workflows, and in designing and implementing Java MapReduce programs.
- Extensive hands-on experience in writing complex MapReduce jobs, Pig scripts, and Hive data models.
- Hands-on experience writing MapReduce jobs in Java, built with Ant/Maven.
- Well versed in installing, configuring, supporting, managing, and fine-tuning petabyte-scale Hadoop clusters.
- Designed and developed various MOLAP, ROLAP, and HOLAP cubes using SQL Server Analysis Services.
- Extensive experience with Microsoft SQL Server Reporting Services (SSRS), covering report authoring, management, delivery, and security; installed, configured, and maintained the SSRS server. Created parameterized, linked, drill-down, and drill-through reports as well as ad hoc reports in SSRS.
- Experience in writing reports using MDX (Multidimensional Expressions) against dimensions, hierarchies, and measures, with design and implementation of Star Schema and Snowflake Schema dimensional models. Created ROLAP, MOLAP, and HOLAP cubes using SQL Server Analysis Services (SSAS).
- Experience in analysis and design, performance tuning, and query optimization, and in using stored procedures, functions, packages, triggers, views, and indexes to implement database business logic in Teradata and SQL Server; loaded data warehouse dimension, fact, and aggregate tables using SSIS and Teradata utilities.
- Familiar with creating soft and hard referential integrity, secondary indexes, join indexes, macros, and aggregate join indexes in Teradata.
- Experience in relational and dimensional data modeling, Star Schema modeling, and physical and logical data modeling using Erwin 7/7.3.
- Experience in using Teradata Administrator, Teradata Manager, Teradata PMON, and Teradata SQL Assistant, and in writing Teradata load/export scripts with BTEQ, FastLoad, MultiLoad, TPump, TPT, and FastExport in UNIX/Windows environments.
- Knowledge of regression and TDD testing techniques, used to build real-time forecasting results from billion-row transaction, customer, and store record sets for Confidential sales.
- Supported ad hoc analysis requests from various business units under deadline. Conducted ETL using Talend on various RDBMS platforms, along with quantitative and hypothesis analysis, delivering solutions for ad and promotion decisions.
Sr. Big Data Architect
Confidential, Atlanta, GA
- Worked closely with the business to focus on "value added" features, and added functionality at regular and frequent intervals to encourage more usage and reduce development costs.
- Built good relationships within the team, worked closely with the business on multiple feeds, and delivered products on time.
- Developed scripts to convert Citi feed Hive tables from text format to Snappy-compressed ORC format in the production environment, which reduced disk usage and minimized cost in the Hadoop ecosystem.
- Involved in the design and development work for decommissioning legacy systems/applications, such as the Epiphany retirement, and migrating the same feeds to the Hadoop ecosystem, which saved cost and reduced manual intervention in day-to-day operations. This improved scalability and reliability in a cost-effective manner.
- As part of the Epiphany (legacy system) feed retirement process, migrated the Foresee, Acxiom Preferences, and Email ID Stamping feeds from SQL Server to the Hadoop ecosystem, saving cost for the company.
- Involved in the development of Confidential Store applications (ECC, CCA, CGR, and SVOC) using Java, Spring Boot, Spring STS, Maven, and Jenkins.
- Developed REST APIs and microservices using Java Spring Boot.
- Modeled the Epiphany-Foresee and Dynamic Campaigns data for easy and fast access.
- Upgraded and repackaged CRM 14-Hadoop applications from Java6/Tomcat6 to Java7/Tomcat7 and deployed successfully onto Grid Stats & Hadoop Production Bastion Servers.
- Implemented CGR, SVOC, ECC, and CCA changes to process operational and marketing datasets in Hadoop.
- Delivered quality project deliverables to the Hadoop production environment on time and without defects, using TDD techniques.
- Developed test-driven development (TDD) test cases in Java for full-fledged functionality testing, to ensure a quality product with no defects before the application went live in production.
- Involved in Pair Programming and Extreme Programming, adopted TDD techniques, and delivered applications successfully.
Technologies Used: Java 1.7/Tomcat 7, Maven, Java Spring Boot, Shell Scripting, Talend, Hive, Jenkins, Grid Stats, Eclipse/Spring STS, JUnit, Splunk, Google Cloud BigQuery, Cassandra, Oracle, Spark Scala, Postman, SVN, Git and Maestro-TWS.
Sr. Big Data Engineer
- As part of the Big Data Center of Excellence (COE), responsible for creating technical guidance, roadmaps, and strategies for delivering various big data solutions throughout the organization.
- Worked on data migration from existing data sources to the Hadoop file system.
- Developed Unix scripts for the MES ingestion framework to push daily source files into HDFS and Hive tables.
- Translated customer business use cases into analytical data applications and models to implement solutions.
- Created custom database encryption and decryption UDFs that can be plugged in while ingesting data into external Hive tables, to maintain security at the table or column level.
- Worked on different applications such as MSP, CAPM, TELEGENCE, OPUS, COLUMBUS, and Click Stream at AT&T.
- Developed MapReduce programs for different data patterns on the Hadoop cluster.
- Created data ingestion plans for loading the data from external sources using Sqoop, Data Router and Flume.
- Implemented dynamic partitions, bucketing, and compression techniques in Hive external tables, and optimized the worst-performing Hive queries.
- Created a working POC using Spark Streaming 1.1.0 for real-time processing of continuous streams of large data sets.
- Developed ETL Scripts for Data acquisition and Transformation using Talend.
- Troubleshot issues during the integration, testing, and production readiness phases.
- Wrote technical design documents, deployment documents, supporting documents, and release notes.
- Involved in the entire software development cycle spanning requirements gathering, analysis, design, development, building, testing, and deployment.
- Developed Business Rules using Java and Unix Scripts.
Technologies Used: Hadoop, Pig, Hive, Jenkins, Grid Stats, Eclipse/Spring STS, JUnit, Splunk, Google Cloud BigQuery, Cassandra, Oracle, Spark Scala, Postman, SVN, Git and Maestro-TWS.
Big Data Hadoop Developer
Confidential, Raleigh, NC
- Gathered requirements, built logical models, and provided quality documentation of detailed user requirements for the design and development of the project.
- Good experience in designing and implementing end-to-end data security and governance within the Hadoop platform using LDAP, Kerberos, etc.
- Configured Hadoop on Linux and deployed the application.
- Developed scripts per requirements, including Java MapReduce programs, Pig/Hive scripts, and Sqoop jobs.
- Extracted data from Oracle to HDFS using Sqoop. Built rules for different product lines as Hive UDFs for processing covered and uncovered product lines.
- Compiled and built the application using ANT scripts and deployed the application. Used SVN as version control system.
- Developed an ingestion framework by writing Unix automation shell scripts for pushing daily source files into HDFS.
- Developed load and extract ETLs, implemented history load and incremental load scripts in Teradata, and wrote shell scripts for the extract and load ETLs.
- Developed ETL logic per Confidential standards for each stage: source to flat file, flat file to stage, stage to work, work to work interim tables, and work interim tables to target tables, using Big Data.
- Fixed issues from the source system through to the downstream data mart during development until the code went live in production, and provided post-production support.
Technologies Used: ETL Big Data, Hadoop, Pig, Hive, JUnit, Splunk, Google Cloud BigQuery, Cassandra, Oracle, Spark Scala.
- Communicated with business users and analysts on business requirements. Gathered and documented technical and business metadata.
- Prepared ETL scripts for data acquisition and transformation. Developed various mappings in Informatica using transformations such as Source Qualifier, Joiner, Filter, Router, Expression, and Lookup.
- Created conceptual, logical, and physical database models for different metadata tables, views, and related database structures using Erwin.
- Developed database architectural designs, models, and implementations of business requirements.
- Coded in Teradata BTEQ SQL, implemented ETL logic using Informatica, and transferred files using an SSH client.
- Populated or refreshed Teradata tables using FastLoad, MultiLoad, and FastExport utilities/scripts for user acceptance testing, and loaded history data into Teradata.
- Created and wrote Unix shell scripts (Korn shell, KSH). Prepared test cases and performed unit and integration testing.
- Tuned the performance of long-running queries. Worked on complex queries to map data per requirements.
- Reduced Teradata space usage by optimizing tables: adding compression where appropriate and ensuring optimal column definitions.
- Production Implementation and Post Production Support.
Technologies Used: Teradata, Informatica, SQL, UNIX.