- Around 10 years of IT experience in Data Analysis, Data Modeling, Design, Development, Testing, and Implementation of Data Warehousing applications and Big Data processing in the Mortgage, Financial, Insurance, Health, and Asset Management domains.
- Experience in designing, building, and implementing complete Hadoop ecosystem solutions comprising HDFS, Hive, Pig, Sqoop, Oozie, HBase, Kafka, Python, and Spark.
- Experience in developing Parallel Jobs for Extraction, Transformation, and Loading (ETL) of data from various sources into the Data Warehouse using IBM InfoSphere DataStage v8.1, 9.1, and 11.3.
- Experience in IBM InfoSphere Information Analyzer and Collibra.
- Experience in DB2, Oracle & Teradata databases.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (HDFS, Pig, Hive, Flume, Sqoop).
- Exposure to using Apache Kafka to build data pipelines that carry logs as a stream of messages between producers and consumers.
- Worked on Apache Kafka with Apache Spark for data processing.
- Extensively used Python in Spark for data quality checks.
- Working experience with Spark RDDs, DataFrames, Datasets, and Python.
- Experience in working with various structured and semi-structured data formats such as delimited, fixed-width, Avro, Parquet, XML, and JSON files.
- In-depth knowledge of data profiling, data quality analysis, data standardization, data cleansing, data management, and data rules development.
- Good experience in developing, testing, and implementing Spark, DataStage, and Mainframe jobs.
- Designed and developed Data Marts following Star Schema and Snowflake Schema methodologies, using industry-leading data modeling tools such as ERwin.
- Good Knowledge in data warehouse concepts.
- Strong knowledge of troubleshooting DataStage jobs and addressing production issues such as performance tuning and enhancement.
- Good knowledge in SQL, UNIX and Scripting.
- Good Knowledge in core Java.
- Extensive knowledge in handling slowly changing dimensions.
- Experience in optimization techniques, deployment, scheduling and handling production data.
- Experience working with Business Analysts, BI Architects, Data Analysts, and business partners to verify that business requirements are met by the solutions being developed for reporting purposes.
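The Spark-based data quality checks mentioned above can be sketched as rule definitions applied per record. This is a minimal, self-contained illustration in plain Python (in practice such predicates would be applied over Spark DataFrames); the rule names, column names, and sample records are hypothetical.

```python
import re

# Illustrative only: hypothetical rule names and sample records, not actual
# production rules. Plain Python is used so the sketch runs standalone.
RULES = {
    "not_null":   lambda v: v is not None and v != "",
    "is_numeric": lambda v: v is not None and re.fullmatch(r"-?\d+(\.\d+)?", v) is not None,
    "zip_format": lambda v: v is not None and re.fullmatch(r"\d{5}", v) is not None,
}

def check_record(record, bindings):
    """Return the list of (column, rule) pairs that failed for one record."""
    failures = []
    for column, rule_name in bindings:
        if not RULES[rule_name](record.get(column)):
            failures.append((column, rule_name))
    return failures

# Sample usage with made-up data
bindings = [("loan_id", "not_null"), ("balance", "is_numeric"), ("zip", "zip_format")]
good = {"loan_id": "L001", "balance": "1250.75", "zip": "60004"}
bad  = {"loan_id": "",     "balance": "12x5",    "zip": "6000"}
print(check_record(good, bindings))  # []
print(check_record(bad, bindings))   # all three rules fail
```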
Hadoop Ecosystem: HDFS, HIVE, HBASE, PySpark, Sqoop, Kafka, Flume, Oozie
IBM Suite: IBM InfoSphere DataStage v8.1, 9.1, 11.3; IBM InfoSphere Information Analyzer; Collibra.
Database: Oracle 11g, DB2 and Teradata.
Scheduling Tool: Control-M, ESP and CA7.
Technology: Mainframe (JCL, COBOL, DB2, CICS, REXX, Easytrieve, Xpediter, JHS)
Version Control Tools: Endevor, RTC
Tools: SQL Developer, Quality Center, and ERwin.
Languages: C, Core Java, Python.
Scripting: UNIX shell scripting, Perl scripting
Operating Systems: Windows Server 2003, UNIX, Linux
Data Modeling: Dimensional Data Modeling, Star Schema Modeling, Snowflake Modeling, Fact and Dimension Tables.
Confidential, Arlington Heights, Illinois
- Converted DataStage and Information Analyzer jobs into PySpark jobs.
- Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
- Involved in all phases of data mining, data collection, data cleaning, validation, and visualization using Python and QlikView.
- Designed a data warehouse using Hive external tables and created partitioned tables & queries for analysis.
- Used Sqoop to import data into HDFS and Hive from Oracle database.
- Consumed transactional systems data from Kafka, processed it using Spark Streaming, and stored it in HDFS.
- Created data pipelines per business requirements and scheduled them using Oozie workflows.
- Worked with various structured and semi-structured data formats such as delimited, fixed-width, Parquet, XML, and JSON files.
Environment: HDFS, Hive, Sqoop, Kafka, Python, Spark, Oozie, Linux, DataStage PX 9.1/11.3 (Designer, Director), DB2, Oracle, Teradata, Windows 7, SQL Developer, Information Analyzer, QlikView, Collibra.
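The Sqoop incremental loads described above follow a high-water-mark pattern: each scheduled run imports only rows newer than the last recorded value of a monotonically increasing column. A minimal sketch of building such a command, assuming hypothetical connection, table, and column names:

```python
# Sketch of the incremental-load pattern behind the Sqoop bullets above.
# The JDBC URL, table, and check column are hypothetical examples; in
# production the equivalent command would be scheduled (e.g. via Oozie).

def build_sqoop_incremental_cmd(table, check_column, last_value,
                                jdbc_url="jdbc:oracle:thin:@//dbhost:1521/ORCL",
                                target_dir_root="/data/raw"):
    """Build a sqoop import command that appends only rows newer than last_value."""
    return " ".join([
        "sqoop import",
        f"--connect {jdbc_url}",
        f"--table {table}",
        f"--target-dir {target_dir_root}/{table.lower()}",
        "--incremental append",            # append-only incremental mode
        f"--check-column {check_column}",  # monotonically increasing key
        f"--last-value {last_value}",      # high-water mark from prior run
    ])

cmd = build_sqoop_incremental_cmd("TRANSACTIONS", "TXN_ID", 1048576)
print(cmd)
```

After each run, Sqoop reports the new high-water mark, which the scheduler stores for the next incremental import.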
Confidential, Arlington Heights, Illinois
DataStage Technical Lead
- Responsible for planning and coordinating the team.
- Responsible for de-duplication, cleansing, enrichment and enhancement of customer data.
- Responsible for defining rule definitions, rule bindings, and column analysis, and executing rules in Information Analyzer.
- Responsible for data ingestion, extraction, transformation, and load (ETL) processes for customer data.
- Review and provide approval for all live service impacting changes.
- Customized UNIX scripts as required for pre-processing steps and to validate input and output data elements.
- Resolved data issues; conducted daily and weekly status meetings as well as internal and external reviews.
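The pre-processing validation described above typically checks each inbound record's field count and key fields before the ETL jobs run. A minimal Python illustration (the production versions were UNIX shell scripts; the delimiter, field count, and sample layout here are hypothetical):

```python
# Illustrative Python analogue of the UNIX pre-processing validation scripts;
# file layout, delimiter, and field count are hypothetical examples.

def validate_delimited(lines, expected_fields, delimiter="|"):
    """Return (valid_rows, errors) for a pipe-delimited extract."""
    valid, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != expected_fields:
            errors.append((lineno, f"expected {expected_fields} fields, got {len(fields)}"))
        elif fields[0].strip() == "":
            errors.append((lineno, "missing key in first field"))
        else:
            valid.append(fields)
    return valid, errors

# Sample usage: one good row, one short row, one row missing its key
sample = ["C001|John|Doe", "C002|Jane", "|Bob|Smith"]
valid, errors = validate_delimited(sample, expected_fields=3)
print(len(valid), len(errors))  # 1 2
```

Rejected rows and their reasons would be written to an error file for review before the load proceeds.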
Confidential, Rochester, MN
Senior DataStage Developer & Production Support
- Requirements Gathering and preparation of Technical Specifications, Test Cases and Design documents.
- Created DataStage jobs to extract data from sequential files and DB2; tuned DataStage transformations and jobs to enhance performance.
- Extensively used processing stages (Join, Merge, Funnel, Filter, Aggregator, Sort, FTP, and Transformer) and file stages (Lookup, File Set, Data Set, and Sequential File).