
Data Engineer Resume

SUMMARY

  • 8+ years of extensive experience in ETL (Extract, Transform, Load), data integration, and data warehousing using Hive, StreamSets, PySpark, SparkR, Teradata, Alteryx, and Informatica technologies.
  • Good expertise with Hadoop tools such as MapReduce, Hive, and PySpark.
  • Very good understanding of Teradata’s shared-nothing MPP architecture, including nodes, AMPs, BYNET, partitioning, and primary indexes.
  • Extensive knowledge of big data ecosystem components: HDFS, MapReduce, Hive, YARN, Apache Spark, Kafka, Storm, Flume, Oozie, ZooKeeper, and Apache NiFi.
  • Extensively used networking protocols including TCP/IP, Telnet, HTTP, HTTPS, FTP, SNMP, LDAP, and DNS.
  • Proficient with cloud technologies such as Azure Data Factory, Azure Data Lake Storage (ADLS), Azure Blob Storage, Azure AD, AWS Athena, AWS Glue, AWS S3, and AWS IAM.
  • Extensively created and used various Teradata SET tables, MULTISET tables, global temporary tables, volatile tables, and temporary tables.
  • Extensively used Teradata features such as BTEQ, FastLoad, MultiLoad, SQL Assistant, Viewpoint, and DDL and DML commands. Very good understanding of Teradata UPIs and NUPIs, secondary indexes, and join indexes.
  • Extensive knowledge in Business Intelligence and Data Warehousing Concepts with emphasis on ETL and System Development Life Cycle (SDLC).
  • Working knowledge of data warehousing concepts such as star schemas, snowflake schemas, data marts, and the Kimball methodology as used in relational, dimensional, and multidimensional data modeling.
  • Extensive knowledge on Data Profiling using Informatica Developer tool.
  • Implemented Slowly Changing Dimension Types I, II, and III to preserve the full history of account and transaction information, and designed and developed change data capture (CDC) solutions that capture and analyze changes from daily feeds to maintain history tables (see the sketch after this list).
  • Proficient in designing and developing ETL objects in Informatica PowerCenter using transformations such as Joiner, Aggregator, Expression, SQL, Lookup, Filter, Update Strategy, Stored Procedure, Router, Rank, and Normalizer.
  • Involved in Data Migration projects from DB2 and Oracle to Teradata. Created automated scripts to do the migration using UNIX shell scripting, Oracle/TD SQL, TD Macros and Procedures.
  • Automated BTEQ report generation on weekly and monthly schedules using UNIX scheduling tools. Well versed in reading Explain plans and confidence levels, with a very good understanding of database skew. Knowledge of query performance tuning using EXPLAIN, COLLECT STATISTICS, compression, NUSIs, and join indexes, including sparse join indexes.
  • Extensively worked with PMON/Viewpoint for Teradata performance monitoring and tuning. Well versed with the Teradata Analyst Pack, including Statistics Wizard, Index Wizard, and Visual Explain. Experienced in programming with SQL and PL/SQL (stored procedures, functions, cursors, and database triggers).
  • Very good experience in Oracle database application development using Oracle 10g/9i/8i/x, SQL, PL/SQL, SQL Loader.
  • Strong Teradata SQL experience developing ETL with complex, tuned queries, including analytical functions and BTEQ scripts.
  • Extensively used Mapping Variables, Mapping Parameters, and Dynamic Parameter Files for improved performance and increased flexibility and also worked with XML Sources & Targets.
  • Data Processing Experience in Designing and Implementing Data Mart applications, mainly Transformation Process using Informatica.
  • Worked with the Informatica Data Quality (IDQ) 8.6.1 toolkit: analysis, data cleansing, data matching, data conversion, exception handling, reporting, and monitoring.
  • Developing workflows with Worklets, Event waits, Assignments, Conditional flows, Email and Command Tasks using Workflow Manager.
  • Experienced with identifying Performance bottlenecks and fixing code for Optimization in Informatica and Oracle.
  • Created UNIX shell scripts for Informatica pre- and post-session operations, database administration, and day-to-day activities such as monitoring network connections and database ping utilities.
  • Extensive experience in implementation of Data Cleanup Procedures, Transformation, Scripts, Stored Procedures and execution of Test plans for loading the data successfully into Targets.
  • Maintained Visual SourceSafe (VSS), quality metrics, knowledge management, and defect prevention and analysis.
  • Created checklists for coding, testing, and release to keep project flow smooth and error free.
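
As an illustration of the SCD/CDC bullet above (the actual work was done with Informatica and Teradata), the following is a minimal PySpark sketch of a Type 2 merge. The table and column names (dw.dim_account, stg.account_daily_feed, account_id, balance, start_date, end_date, is_current) are hypothetical, not values from any project.

```python
# Hypothetical SCD Type 2 merge sketch in PySpark. All table and column names are
# assumptions for illustration only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").enableHiveSupport().getOrCreate()

dim = spark.table("dw.dim_account")          # current dimension with history columns
stg = spark.table("stg.account_daily_feed")  # today's incoming feed (account_id, balance)

# Rows whose tracked attribute changed: expire the current version, add a new one.
changed = (stg.alias("s")
           .join(dim.filter("is_current = 1").alias("d"),
                 F.col("s.account_id") == F.col("d.account_id"))
           .filter(F.col("s.balance") != F.col("d.balance")))

expired = (changed.select("d.*")
           .withColumn("is_current", F.lit(0))
           .withColumn("end_date", F.current_date()))

new_versions = (changed.select("s.account_id", "s.balance")
                .withColumn("start_date", F.current_date())
                .withColumn("end_date", F.lit(None).cast("date"))
                .withColumn("is_current", F.lit(1)))

# Dimension rows that did not change are carried over untouched.
unchanged = dim.join(changed.select(F.col("s.account_id").alias("account_id")),
                     "account_id", "left_anti")

# Rebuild the dimension so full history is preserved; written to a new table name
# to avoid overwriting the table being read in the same job.
(unchanged.unionByName(expired).unionByName(new_versions)
 .write.mode("overwrite").saveAsTable("dw.dim_account_scd2"))
```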

TECHNICAL SKILLS

Primary Tools: Teradata SQL, Teradata Tools and Utilities, Informatica PowerCenter 9.0.1/8.6/8.1, Alteryx 10, Ab Initio (Co>Op 3.0.3.9/2.15/2.14, GDE 3.0.4/1.15/1.14), IBM Information Server 9.1/8.5/8.0.1, Oracle 10g/9i, MS SQL Server 6.5/7.0/2000 (SSRS and SSIS), Python, PySpark, R, SparkR, HDFS, HBase, Hive, StreamSets, MapReduce, Azure Data Factory, Azure Data Lake Storage (ADLS), Azure Blob Storage, Azure AD, AWS Athena, AWS Glue, AWS S3, AWS IAM

Languages: Teradata SQL, PL/SQL, Python

Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, TPT, SQL Assistant, Teradata Manager

Databases: Teradata 13/12/V2R6.2, Oracle 10g/9i, DB2/UDB, SQL Server

Operating Systems: Windows 95/98/NT/2000/XP, UNIX, Linux, NCR MP-RAS UNIX

Data Modeling: Erwin, ER Studio

Scheduling tools: Control M, Autosys

PROFESSIONAL EXPERIENCE

Confidential

Data engineer

Environment: Teradata 15.0 (FastLoad, MultiLoad, FastExport, BTEQ), Teradata SQL Assistant, Informatica PowerCenter 9, UNIX, SQL, PL/SQL, Workload Manager, MS Access, Hive, StreamSets, PySpark.

Responsibilities:

  • Involved in all vital phases of the software development life cycle, including business requirements analysis, application design, development, testing, implementation, and support for enterprise data warehouse and client/server applications.
  • Participate in requirement gathering meetings with Confidential and translate the client requirements into highly specified project briefs.
  • Involve in converting functional requirements into ETL design and present to client.
  • Use various Teradata load utilities for data loads and UNIX shell scripting for the file validation process (see the validation sketch after this list).
  • Conduct due diligence meetings and organize follow-up meetings with the clients.
  • Analyze data warehouse data to build various data marts for reporting purposes.
  • Co-ordinate with various stakeholders during development phase for data extraction and loading.
  • Develop and support extraction, transformation, and load (ETL) processes using Informatica PowerCenter to populate Teradata tables and flat files.
  • Writing SQL Scripts to extract the data from Database and for Testing Purposes.
  • Interacting with the Source Team and Business to get the Validation of the data.
  • Involved in Transferring the Processed files from mainframe to target system.
  • Supported the code after postproduction deployment.
  • Familiar with Agile software methodologies (scrum).
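
The file validation referenced above was implemented with UNIX shell scripting; purely as a hedged illustration, the sketch below shows the same idea in Python: checking a delimited feed's row and column counts against a control file before handing it to a Teradata load utility such as FastLoad. The file paths, delimiter, and control-file layout are assumptions.

```python
# Hypothetical pre-load validation sketch: verify a pipe-delimited feed against its
# control file before a Teradata load utility picks it up. Paths, delimiter, and
# control-file layout are assumptions for illustration only.
import sys

DATA_FILE = "/data/inbound/accounts_20160101.txt"   # assumed feed file
CTRL_FILE = "/data/inbound/accounts_20160101.ctl"   # assumed control file: "<rows>|<cols>"
DELIM = "|"

with open(CTRL_FILE) as f:
    expected_rows, expected_cols = (int(x) for x in f.readline().strip().split(DELIM))

actual_rows = 0
with open(DATA_FILE) as f:
    for line in f:
        actual_rows += 1
        if line.rstrip("\n").count(DELIM) + 1 != expected_cols:
            sys.exit(f"Bad column count on row {actual_rows}")

if actual_rows != expected_rows:
    sys.exit(f"Row count mismatch: expected {expected_rows}, got {actual_rows}")

print("Validation passed; file can be handed off to FastLoad.")
```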

Confidential, Richmond,VA

Data Engineer

Environment: Teradata 14.10 (FastLoad, MultiLoad, FastExport, BTEQ), Teradata SQL Assistant, Informatica PowerCenter 9, UNIX, Alteryx Designer 10, Microsoft SQL Server Management Studio 2014, Oracle SQL Developer 4.1.1, SQL, PL/SQL, Workload Manager, MS Access, Hive, StreamSets, PySpark.

Responsibilities:

  • Export and import data into HDFS, HBase, and Hive using StreamSets and PySpark for the Voyager project.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Involved in creating Hive tables and loading them with data and writing Hive queries.
  • Extract real-time feeds using Kafka and Spark Streaming, convert them to RDDs, process the data as DataFrames, and save it in Parquet format in HDFS (see the streaming sketch after this list).
  • Experienced in implementing different kinds of joins, such as map-side and reduce-side joins, to integrate data from different data sets.
  • Developed ETL jobs for daily incremental (SCD) and append-mode loads.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, including write-back in both directions.
  • Good experience in building pipelines using Azure Data Factory and moving the data into Azure Data Lake Store.
  • Created pipelines to move data from on-premise servers to Azure Data Lake.
  • Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
  • Developed Spark applications in Python (PySpark) on a distributed environment to load a large number of CSV files with differing schemas into Hive ORC tables (see the PySpark load sketch after this list).
  • Created PySpark data frames to move data from Oracle to Azure Blob Storage.
  • Used various spark Transformations and Actions for cleansing the input data.
  • Experienced with optimization techniques to get better performance from Hive queries.
  • Used Spark and Spark SQL to read Parquet data and create the tables in Hive using the Scala API.
  • Developed and Optimized Stored Procedures, Functions and Packages using SQL and PL/SQL.
  • Actively supported Business users for change requests and provided support to team members in SIT, UAT and post production by troubleshooting and solving issues.
  • Performed Data Verification, data Validation, and Data Transformations on the Input data (Text files, XML files, JSON files, CXML) before loading into target database.
  • Implemented Ad - hoc query using Hive to perform analytics on structured data.
  • Expertise in writing Hive UDFs and generic UDFs to incorporate complex business logic into Hive queries.
  • Extended NiFi objects to support customizations using Python and Groovy scripts.
  • Worked on XSLT and the Avro Schema Registry in order to transform CXML and JSON data.
  • Developed multiple Map Reduce jobs for data cleaning and preprocessing.
  • Used Apache Nifi to automate the data movement between different Hadoop systems.
  • Conducted training for team members on NIFI.
  • Coordinated with the offshore team and performed code reviews for NiFi templates and Java components.
  • Load data into Hive partitioned tables.
  • Created Control-m jobs to load data and monitor it.
  • Involved in understanding the Requirements of the End Users/Business Analysts and developed strategies for ETL processes.
  • Worked on Teradata and its utilities (BTEQ, TPump, FastLoad) through Informatica; also created complex Teradata stored procedures.
  • Used Oracle SQL Developer 4.1.1 - defining the schema, staging tables, and landing zone tables, configuring base objects, foreign-key relationships, and complex joins, and building efficient views.
  • Analyzed, designed, and developed ETL strategies and processes, wrote ETL specifications, and handled Informatica development, administration, and mentoring of other team members.
  • Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, SQL, and Lookup (file and database) to develop robust mappings in the Informatica Designer.
  • Involved in Performance tuning at source, target, mappings, sessions, and system levels.
  • Exhaustive testing of developed components.
  • Worked on the various enhancements activities, involved in process improvement.
  • Used Informatica client tools - Source Analyzer, Warehouse Designer, Mapping Designer, Transformation Developer, Workflow Manager, and Workflow Monitor.
  • Involved in full Software Development Life Cycle (SDLC) - Business Requirements Analysis, preparation of Technical Design documents, Data Analysis, Logical and Physical database design, Coding, Testing, Implementing, and deploying to business users.
  • Providing technical support and guidance to the offshore team to address complex business problems.
  • Defining the schema, staging tables, and landing zone tables, configuring base objects, foreign-key relationships, and complex joins, and building efficient views.
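
Streaming sketch referenced above: the project bullet describes the RDD-based Spark Streaming API; the hedged sketch below shows an equivalent Kafka-to-Parquet flow using PySpark Structured Streaming instead. The broker address, topic, message schema, and HDFS paths are assumptions.

```python
# Minimal Kafka -> DataFrame -> Parquet-on-HDFS sketch in PySpark (Structured Streaming
# shown here as the DataFrame equivalent of the RDD/DStream flow). Broker, topic,
# schema, and paths are assumptions for illustration only.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = (SparkSession.builder
         .appName("kafka_to_parquet_sketch")
         .getOrCreate())  # requires the spark-sql-kafka package on the classpath

schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", StringType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
       .option("subscribe", "orders")                       # assumed topic
       .load())

# Kafka delivers bytes; cast the value to a string and parse the JSON payload.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("r"))
          .select("r.*"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/orders/parquet")       # assumed output path
         .option("checkpointLocation", "hdfs:///chk/orders")  # assumed checkpoint path
         .start())
query.awaitTermination()
```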
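PySpark load sketch referenced above: a minimal, assumption-laden example of loading CSV files into a partitioned Hive ORC table with PySpark. The landing path, table name, and partition column are hypothetical; per-feed schema mapping for files with differing layouts is omitted for brevity.

```python
# Minimal sketch: CSV files -> partitioned Hive ORC table with PySpark.
# Paths, table name, and partition column are assumptions for illustration only.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("csv_to_hive_orc_sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read a batch of CSV files; inferSchema derives column types from the data.
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///landing/feeds/*.csv"))      # assumed landing path

# Add a partition column and write as ORC into a Hive-managed table.
(df.withColumn("load_date", F.current_date())
   .write
   .mode("append")
   .format("orc")
   .partitionBy("load_date")
   .saveAsTable("edw.stg_feed"))                # assumed database.table
```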

Confidential, Beaverton, OR

Application Engineer

Environment: Teradata 14.10 (FastLoad, MultiLoad, FastExport, BTEQ), Teradata SQL Assistant, Informatica PowerCenter 9, UNIX, Alteryx Designer 10, Microsoft SQL Server Management Studio 2014, Oracle SQL Developer 4.1.1, SQL, PL/SQL, Workload Manager, MS Access.

Responsibilities:

  • Involved in Product Engine Solution (PES), Business Requirements Analysis, preparation of Technical Design documents, Data Analysis, Coding, Testing, Implementing, and deploying to business users.
  • Involved in the Emerging Markets project: set up a new database and roles specific to Emerging Markets; performed business requirements analysis and analysis of SQL Server queries and Cognos reports; lifted and shifted workloads from SQL Server to Teradata; worked on point-of-sale (POS) and direct-to-customer (DTC) data; and prepared technical design documents, data analysis, logical and physical database design, coding, testing, implementation, and deployment to business users.
  • Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
  • Compared results from the traditional system to the Hadoop environment to identify any differences and fixed them by finding the root cause.
  • Responsible for validation of target data in the data warehouse and data marts that is transformed and loaded using Hadoop big data tools.
  • Implemented solutions using AWS EC2, S3, RDS, EBS, Elastic Load Balancer, Auto-scaling groups.
  • Experience in using apache NiFi to automate the data movement between different Hadoop systems.
  • Automated the process to copy files in Hadoop system for testing purpose at regular intervals.
  • Design, develop, and test ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the Glue job sketch after this list).
  • Build the infrastructure required for optimal extraction, transformation, and loading (ETL) of data from a wide variety of data sources like Salesforce, SQL Server, Oracle & SAP using AWS, PySpark, Spark, Hive, Kafka and other Bigdata technologies.
  • Export and import data into HDFS, HBase, and Hive using StreamSets and SparkR.
  • Write MapReduce jobs, HiveQL, and Spark code.
  • Worked on the lights-on team, debugging failures in production and resolving them.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Created and implemented User Managed Data (UMD) Process for Logistic Team and Key Business Segment (KBS) Team by using Microsoft Access and Teradata.
  • Involved in adding new plant codes in Teradata for FRS team, analyzed the Source of the plant codes and Informatica work flows. Impact analysis of adding new plants in Teradata. Deployed plant codes as part of ERS Build plan spring 2016.
  • Involved in understanding the Requirements of the End Users/Business Analysts and developed strategies for ETL processes.
  • Worked on Alteryx Designer 10 - creating, mapping, analyzing, monitoring, and automating new workflows.
  • Worked on Teradata and its utilities (BTEQ, TPump, FastLoad) through Informatica; also created complex Teradata stored procedures.
  • Used Oracle SQL Developer 4.1.1 - defining the schema, staging tables, and landing zone tables, configuring base objects, foreign-key relationships, and complex joins, and building efficient views.
  • Analyzed, designed, and developed ETL strategies and processes, wrote ETL specifications, and handled Informatica development, administration, and mentoring of other team members.
  • Used Microsoft SQL Server Management Studio and SSIS packages - analyzing technical design documents, performing data analysis, analyzing complex logic in SQL Server, and lifting and shifting workloads from SQL Server to Teradata.
  • Developed mapping parameters and variables to support SQL override.
  • Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, SQL, and Lookup (file and database) to develop robust mappings in the Informatica Designer.
  • Involved in Performance tuning at source, target, mappings, sessions, and system levels.
  • Exhaustive testing of developed components.
  • Worked on the various enhancements activities, involved in process improvement.
  • Used Informatica and Alteryx client tools - Source Analyzer, Warehouse Designer, Mapping Designer, Transformation Developer, Workflow Manager, and Workflow Monitor.
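
Glue job sketch referenced above: a minimal AWS Glue PySpark job that reads Parquet campaign data from S3 and writes it to Redshift through a Glue catalog connection. The bucket, connection name, and table names are assumptions, not project values.

```python
# Hypothetical AWS Glue (PySpark) job sketch: Parquet campaign data on S3 -> Redshift.
# Bucket, Glue connection name, and table names are assumptions for illustration only.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: Parquet files on S3 (assumed bucket/prefix).
campaigns = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/campaigns/"]},
    format="parquet",
)

# Target: Redshift via a pre-defined Glue connection (assumed name "redshift-conn").
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=campaigns,
    catalog_connection="redshift-conn",
    connection_options={"dbtable": "public.campaigns", "database": "analytics"},
    redshift_tmp_dir="s3://example-bucket/tmp/",
)

job.commit()
```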

Confidential

Teradata Consultant

Environment: Teradata 14.0 (FastLoad, MultiLoad, FastExport, BTEQ), Teradata SQL Assistant, Informatica PowerCenter 9, UNIX, SQL, PL/SQL, Workload Manager, MS Access.

Responsibilities:

  • Involved in full Software Development Life Cycle (SDLC) - Business Requirements Analysis, preparation of Technical Design documents, Data Analysis, Logical and Physical database design, Coding, Testing, Implementing, and deploying to business users.
  • Providing technical support and guidance to the offshore team to address complex business problems.
  • Defining the schema, staging tables, and landing zone tables, configuring base objects, foreign-key relationships, and complex joins, and building efficient views.
  • Expertise in writing scripts for Data Extraction, Transformation and Loading of data from legacy systems to target data warehouse using BTEQ, FastLoad, MultiLoad, and Tpump.
  • Performed query optimization with the help of explain plans, collected statistics, and primary and secondary indexes. Used volatile tables and derived queries to break complex queries into simpler ones. Streamlined the Teradata script and shell script migration process on the UNIX box (see the tuning sketch after this list).
  • Dealt with initial, delta, and incremental data as well as migration data loaded into Teradata.
  • Worked on Informatica PowerCenter tools - Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
  • Using various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, and Union to develop robust mappings in the Informatica Designer.
  • Developed new mappings and modified existing ones for enhancements and new business requirements, loading into staging tables and then into target tables in the EDW; also created mapplets for reuse across different mappings.
  • Export and import data into HDFS, HBase, and Hive using StreamSets and SparkR.
  • Write MapReduce jobs, HiveQL, and Spark code.
  • Worked on the lights-on team, debugging failures in production and resolving them.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Created data models for information systems by applying formal data modeling techniques.
  • Strong expertise in physical modeling with knowledge to use Primary, Secondary, PPI, and Join Indexes.
  • Designed Fact tables and Dimension tables for star schemas and snowflake schemas using ERWIN tool and used them for building reports.
  • Performed reverse engineering of physical data models from databases and SQL scripts.
  • Working on different tasks in workflows, such as sessions, event raise, event wait, email, and command tasks, as well as worklets and workflow scheduling.
  • Creating sessions, configuring workflows to extract data from various sources, transforming data, and loading into enterprise data warehouse.
  • Investigating failed jobs and writing SQL to debug data load issues in Production.
  • Writing SQL Scripts to extract the data from Database and for Testing Purposes.
  • Interacting with the Source Team and Business to get the Validation of the data.
  • Involved in Transferring the Processed files from mainframe to target system.
  • Supported the code after postproduction deployment.
  • Familiar with Agile software methodologies (scrum).
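
Tuning sketch referenced above: the actual work was done in BTEQ and SQL Assistant; as an illustration only, the Python sketch below drives the same COLLECT STATISTICS / EXPLAIN pattern through the teradatasql driver. The connection details, database, table, and column names are assumptions.

```python
# Illustration only: COLLECT STATISTICS / EXPLAIN tuning pattern driven from Python
# via the teradatasql driver. Host, credentials, table, and column names are assumptions;
# the project itself used BTEQ and SQL Assistant.
import teradatasql

with teradatasql.connect(host="tdprod", user="etl_user", password="***") as con:
    cur = con.cursor()

    # Refresh optimizer statistics on the join/filter columns (assumed table edw.fact_txn).
    cur.execute("COLLECT STATISTICS ON edw.fact_txn COLUMN (account_id)")
    cur.execute("COLLECT STATISTICS ON edw.fact_txn COLUMN (txn_date)")

    # Check the plan and confidence levels before promoting the query.
    cur.execute("""
        EXPLAIN
        SELECT account_id, SUM(txn_amount)
        FROM edw.fact_txn
        WHERE txn_date >= DATE '2016-01-01'
        GROUP BY account_id
    """)
    for row in cur.fetchall():
        print(row[0])   # each row holds one line of the Explain plan text
```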
