- Around 6 years of experience in data analysis, data profiling, and report development using Talend Studio (Big Data Integration / Enterprise edition), Tableau, Oracle SQL, SQL Server, Amazon S3, Redshift, and Hadoop ecosystem tools such as Hive, Hue, Spark SQL, Sqoop, and Impala, as well as Epic data sources.
- Experience across multiple industry sectors, including core banking, retail, telecommunications/call center, and healthcare.
- Proficient in writing complex Spark jobs that build DataFrames for analyzing disparate datasets in HDFS.
- Working knowledge of, and hands-on practice in, Agile development environments.
- Proficient in loading the Financial Services Data Mart (FSDM) and Data Lake designs in Hadoop using Sqoop.
- Hands-on experience developing Talend DI jobs that transfer data from source views to the Hadoop staging and target layers to support fraud-identification analysis of transactions.
- Strong experience transferring data from relational databases to cloud platforms such as Amazon S3 and Redshift using Talend Big Data Spark jobs.
- Extensively used Hive external and managed tables when transforming data from multiple source systems into HDFS.
- Extended Hive and Pig core functionality by writing custom UDFs, UDTFs, and UDAFs to handle data according to business requirements.
- Experienced in using Flume to collect real-time data from multiple source systems, such as web servers and social media, and load it into HDFS for further business analysis.
- Hands-on experience configuring and maintaining job schedulers in Hadoop YARN.
- Hands-on experience with Hive, PostgreSQL, and the NoSQL database Cassandra.
- Expertise in processing semi-structured data (XML, JSON, and CSV) in Hive/Impala using the Talend ETL tool.
- Created and maintained the Talend job run book used to trigger Hive data transfer jobs in HDFS through CA Scheduler.
- Developed proof-of-concept (POC) projects by writing SQL scripts and queries to extract data from various sources into BI tools, visualization tools, and Excel reports.
- Built and expanded report availability for both internal support and external customer/carrier reporting, including broadening the scope of data extracted and used for reporting.
- Experienced in unit testing and in writing complex Hive SQL queries, per business requirements, to validate data loaded by Sqoop and by ETL tools such as Talend.
- Excellent communication skills and the ability to tell a story through data.
Hadoop: Hadoop 2.2, HDFS, Spark 1.6, Pig 0.8, Hive 0.13, Sqoop 1.4.4, ZooKeeper 3.4.5, YARN, Impala, HBase, Talend Big Data Integration
Hadoop Management & Security: Hortonworks Ambari, Cloudera Manager, MapR 5.1
Server-Side Scripting: Python, UNIX shell scripting
Reporting Tools: Tableau suite of tools (10.0, 9.2, 9.0, 8.2, 8.1, 8.0, 7, 6), Jasper, SSRS
Data Modeling: Dimensional data modeling, star-join schema modeling, snowflake modeling, fact and dimension tables, physical and logical data modeling, FSDM, Data Lake
Databases: Oracle, MS SQL Server 2005/2008, Teradata, NoSQL (Cassandra, MongoDB)
Tools: Toad, SQL Assistant, SQL Navigator, SQL Developer, MS Visio, HP Quality Center 12
Methodologies: Agile, UML, SDLC
Confidential, St. Louis, MO
- Gathered data from multiple source systems such as SQL Server, NoSQL stores, and MySQL.
- Developed the technical specification for data load processing into HBase and the Redshift database.
- Created Talend Spark jobs that collect data from relational databases and load it into HBase.
- Used Git to track code changes and to support code deployment.
- Performed production deployments using Talend Administration Center and scheduled the jobs using the Nexus server.
- Responsible for source data profiling, including cleansing and validation against the business rules defined in the BRD (business requirements document).
- Used the tJava, tJavaRow, and tJavaFlex components for custom Java code.
- Loaded data into the cloud, identifying the right join conditions and creating datasets conducive to data analysis in the staging layer.
- Good understanding of the Charter domain and project.
- Prepared documentation and presentations on design, testing, and deployment.
- Involved in planning, data modeling, and designing the data warehouse (DW) ETL process.
- Identified potential opportunities for DW implementations by assessing RDBMS and big data environments.
- Developed ETL mappings for XML, CSV, TXT, and JSON sources and loaded data from these sources into relational tables using Talend Big Data.
- Designed, developed, and deployed end-to-end data integration solutions.
- Managed files and enforced security using Linux and HDFS commands.
- Implemented change data capture (CDC), complex transformations, and mappings.
- Managed jobs in Tidal, including deployment plans, job monitoring, and troubleshooting.
- Implemented data integration processes with Talend Big Data Integration Suite 6.4.
- Ensured data quality and cleanliness, performed data analysis, and applied best practices and performance optimizations.
- Designed, developed, and deployed jobs to load data into Hadoop and Hive.
- Wrote and executed test scenarios.
- Developed ETL mappings and transformations and implemented source and target definitions in Talend.
- Created views in the source system and used Talend to transfer data from them into the Amazon private cloud.
- Created reusable Talend joblets with global context variables to improve data loading performance, and used parallelization for better performance.
- Used runtime global/context variables to parameterize jobs so they can run as multiple instances.
Environment: Talend, Spark, HBase, SQL Server, NoSQL, Hive, HDFS, Teradata, Kafka.
Confidential, Cypress Long Beach, CA
- Gathered data from multiple source systems such as Teradata, Oracle, and SQL Server using Sqoop.
- Designed the technical specification, including the data processing plan for loading Hive tables.
- Identified the right join conditions and created datasets conducive to data analysis in the staging layer.
- Created views in the source system and used Talend to transfer data from them into the staging layer.
- Performed data profiling on the selected source systems and applied transformations based on the requirements.
- Loaded the qualified data into Hive tables partitioned by a UNIX time key.
- Implemented bucketing in Hive to improve query performance on large datasets.
- Developed custom UDFs, UDAFs, and UDTFs in Pig and Hive per business requirements.
- Wrote Hive queries for data analysis to meet the business requirements.
- Replicated the entire source system schema in Hadoop using Hive tables and worked with them using HiveQL.
- Created RCFile-formatted files in the HDFS layer and created external tables to access the data.
- Worked with Avro and Parquet file formats.
- Created Spark jobs to build DataFrames and created views to perform left joins using PySpark scripts.
- Used PySpark operations such as map, flatMap, and fold on immutable RDDs to transform data into various data structures.
- Generated reports and created graphs using Apache Zeppelin.
- Mentored the analyst and test teams in writing Hive queries.
- Worked collaboratively with business stakeholders at all levels to implement and test big data analytical solutions drawing on disparate sources.
- Used Agile Scrum methodology to help manage and organize a team of 5 developers, with regular code review sessions.
Environment: Hadoop, HDFS, Spark, Sqoop, Hive, YARN, Pig, Python, Talend Run Book/Job Scheduler
Confidential
- Worked with the Teradata analysis team to gather the business requirements.
- Imported data into Hadoop using Sqoop.
- Created partitioned tables in Hive for better performance and faster querying.
- Transported data to HBase using Flume.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Involved in source system analysis, data analysis, and data modeling for ETL (extract, transform, and load).
- Wrote Spark programs to extract, transform, and aggregate data from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Developed Pig UDFs to pre-process data for analysis.
- Developed Hive queries for analysts.
- Created and modified shell scripts for scheduling various data cleansing scripts and ETL loading processes.
- Prepared developer (unit) test cases and executed developer testing.
- Supported QA engineers with understanding, testing, and troubleshooting.
- Provided production rollout support, including monitoring the solution post go-live and resolving any issues discovered by the client and client services teams.
- Documented operational problems in JIRA, following standards and procedures.
- Applied professional software engineering best practices across the full software development life cycle, including coding standards, code reviews, source control management, and build processes.
Environment: Hadoop, HDFS, Spark, Sqoop, Hive, Pig, Oozie, HBase.
Confidential, Chicago, IL
- Developed multiple Spark jobs in PySpark for data cleaning and preprocessing.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed simple and complex MapReduce jobs using Hive and Pig.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked with application teams to install Operating Systems, Hadoop updates, patches, and version upgrades as required.
- Involved in loading data from Linux file system to HDFS.
- Responsible for managing data from multiple data sources.
- Experienced in running Hadoop streaming jobs to process terabytes of XML format data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Informatica Power Point, Oracle 10g, Toad for Data Analysts, MS Excel, Auto Recon.
Confidential
- Gathered and analyzed business requirements and translated them into technical specifications.
- Mapped the business requirements to the solution design.
- Strong SQL query-writing skills in Oracle 10g.
- Analyzed and used various data sources - Oracle, Excel, and Web Services.
- Prepared test data covering end-to-end scenarios per the business requirements.
- Prepared functional documents describing the current process.
- Prepared and ran complex SQL queries to verify tables in the source, stage, and target databases.
- Prepared end-to-end test cases, from the source database through to target reporting.
- Prepared defect analysis documents and shared the findings with the team.
- Executed test cases and recorded pass/fail analysis for tracking purposes.
- Knowledge of UNIX commands and shell scripting.
- Involved in Regression testing for the existing functionality.
- Involved in equivalence and reconciliation testing using Informatica.
Environment: Oracle 11i, Informatica 8.x, SQL, SQL*Plus, Windows 2003/XP, UNIX.
Confidential
- Developed Java model classes for all the metadata tables.
- Developed a database factory class to connect the application to the database and fetch the required metadata tables.
- Developed controller classes to read the tables and display them in the UI through JSP pages.
- Developed controller classes to upload and download the metadata tables in Excel format.
- Developed the application using the Spring MVC framework.
Environment: SQL, Oracle 10g, Crystal Reports.