Software Engineer Resume
Bellevue, WA
SUMMARY
- 10 years of IT experience in Data Warehousing, Data Marts, ETL (Extract, Transform, Load) and Big Data technologies.
- Experience with agile and waterfall project development methodologies.
- Working with product managers, architects and data scientists to translate conceptual requirements into technical implementation requirements.
- Identify business rules for data integration and parse high-level design specifications into simple ETL code following mapping standards.
- Experience in defining, designing, implementing and testing data engineering modules written in Spark SQL, Hive and ANSI SQL.
- Experience working in Relational and Big Data environments managing terabytes of data and information.
- Write complex transformation logic using Teradata BTEQ, Apache Hive and Apache Spark SQL to transform source data and load it into the semantic layer for reporting purposes (see the illustrative sketch at the end of this summary).
- Working with various HDFS file formats like JSON, Text, CSV, Sequence, AVRO and Parquet.
- Experience migrating terabytes of data from Teradata into HDFS and vice versa using Query Grid, TDCH and Apache Sqoop.
- Work with Teradata bulk load and unload utilities like FastLoad, MultiLoad, FastExport and TPT (Teradata Parallel Transporter).
- Create UNIX shell scripts for file transfer, cleansing operations, encryption/decryption and automated HDFS file purging.
- Schedule ETL jobs using UNIX crontab and workflow management tools like UC4 Automic and CA Workload Automation.
- Proficient in in-memory and parallel data processing concepts.
- Experience in performance tuning of Teradata and Spark applications using techniques like collecting statistics, partitioning, bucketing, broadcast joins, caching and dynamic allocation.
- Provide technical expertise in Data Warehousing concepts/techniques like OLAP, Data Normalization, Star schema, Snowflake schema, data modeling, data mining and data structures.
- Create ETL jobs using IBM InfoSphere DataStage and Informatica PowerCenter to extract data from multiple source systems like Microsoft SQL Server, Oracle, CSV, flat files and load into the Data Warehouse and Hadoop staging layer.
- Utilize ETL best practices and implement them for process improvement.
- Working with external and internal teams to resolve infrastructure related issues when developing integration modules.
- Use CVS and GitHub for software configuration management, and JIRA for issue tracking and Agile project management.
- Work on engineering support, release and on-call activities.
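Illustrative sketch for the transformation and tuning bullets above (all table and column names are hypothetical, not taken from any actual project): a Spark SQL load of a partitioned Parquet semantic-layer table from staged source data.

    -- Hypothetical names; a minimal Spark SQL example, not production code.
    CREATE TABLE IF NOT EXISTS semantic.daily_sales (
        item_id      BIGINT,
        site_id      INT,
        gross_amount DECIMAL(18,2),
        sale_dt      DATE
    )
    USING PARQUET
    PARTITIONED BY (sale_dt);

    -- Aggregate the staged events and load one partition per sale date.
    INSERT OVERWRITE TABLE semantic.daily_sales PARTITION (sale_dt)
    SELECT item_id,
           site_id,
           CAST(SUM(amount) AS DECIMAL(18,2)) AS gross_amount,
           sale_dt
    FROM   staging.sales_events
    GROUP BY item_id, site_id, sale_dt;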
TECHNICAL SKILLS
Databases & Tools: Teradata 15, InfoSphere DataStage 11.5, Informatica 10.1, HDFS, Apache Hive, Apache Spark SQL, Apache Sqoop, UNIX Shell Scripting, TDCH
Methodologies: Waterfall and Agile
Other Utilities: HTML, XML, JSON, CA Workload Automation, UC4 Automic
Familiar with: Talend Data Integration, Apache Pig
Self-learning: AWS
Operating System: Unix/Linux, Microsoft Windows (Server, 2000, XP, 7, 10)
Programming Languages: SQL, C, Python
PROFESSIONAL EXPERIENCE
Confidential, Bellevue, WA
Software Engineer
Responsibilities:
- Interact with business stakeholders to understand the project requirements and work with product managers, architects and data scientists to translate conceptual requirements into technical implementation requirements.
- Estimate engineering effort, plan implementations and roll out system changes.
- Worked on ADPO (Analytics Data Platform Optimization) Migration: an initiative aimed at building, evolving and maturing an open-source, production-grade relational processing platform for eBay at scale, leveraging technologies like Spark and HDFS.
- Work on migrating terabytes of data from Teradata into HDFS and vice-versa using Query Grid, TDCH and Apache Sqoop.
- Working with various HDFS file formats like JSON, Text, CSV, Sequence, AVRO and Parquet.
- Write complex transformations using Spark SQL and convert existing Teradata BTEQ scripts to Spark SQL.
- Work on exploding unstructured data with complex datatypes like struct and array using Spark SQL functions such as explode and LATERAL VIEW (see the illustrative sketch at the end of this section).
- Work on performance tuning of Spark applications using bucketing, partitioning, dynamic allocation, broadcast joins, etc.
- Use DistCp utility to copy data from source Hadoop cluster to Data Recovery Hadoop cluster.
- Create UNIX shell scripts for file transfer, cleansing operations, encryption / decryption and to automate HDFS file purging.
- Use Git for version control and schedule the jobs using UC4 Automic.
- Performed On-call and support activities for DW Selling subject area.
Tools: Teradata 15.10, Apache Spark SQL, Apache Hive, HDFS, UNIX Shell Scripting, UC4 Automic, Query Grid, DistCp, GIT, JIRA
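Illustrative sketch for the explode and tuning bullets in this section (all names are hypothetical): flattening an array-of-struct column with Spark SQL and applying a broadcast join hint.

    -- Hypothetical names; a minimal Spark SQL example, not production code.
    -- line_items is assumed to be an ARRAY<STRUCT<sku:STRING, qty:INT>> column.
    SELECT /*+ BROADCAST(c) */
           o.order_id,
           item.sku,
           item.qty,
           c.category_name
    FROM   staging.orders o
    LATERAL VIEW explode(o.line_items) li AS item
    JOIN   dim.category c
      ON   c.sku = item.sku;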
Confidential, Bellevue, WA
ETL Developer
Responsibilities:
- Main responsibilities include interacting with business users to gather requirements and understand the existing manual processes and procedures.
- Created and delivered detailed design documents for each Sprint covering product definition, algorithmic changes and inbound/outbound interfaces.
- Created user stories in Jira and attended daily stand-up and Sprint planning sessions to report project progress.
- Worked on migrating large relational data processing workloads to the newly built open-source-based Hadoop environment.
- Worked on automating the reconciliation of eBay Daily Deals data between the Data Warehouse and the Salesforce system. Performed data cleansing and data profiling on the data extracted from Salesforce and coded complex reconciliation rules in Spark SQL at multiple levels (see the illustrative sketch at the end of this section).
- Converted the Teradata batch ETL process's BTEQ scripts to equivalent Spark SQL scripts and compared the data between Teradata and Hadoop.
- Performed performance tuning of poorly performing SQL by collecting statistics, caching tables where required and choosing an appropriate number of buckets.
- Wrote UNIX shell scripts to automate the process and scheduled them through the CA Automic Workload Automation tool (UC4).
- Provided end-to-end support including development, unit/QA testing, production deployment and on-call activities.
- Performed on-call activities on a rotation basis for the DW Shared Data (between eBay and PayPal) subject area and fixed issues related to the ongoing production batch process.
Tools: Teradata 15.10, Apache Spark SQL, Apache Hive, HDFS, UNIX Shell Scripting, UC4 Automic, Query Grid, DistCp, GIT, JIRA
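Illustrative sketch for the Daily Deals reconciliation bullet in this section (all table and column names are hypothetical): a Spark SQL rule that flags missing or mismatched deals between the warehouse and Salesforce extracts.

    -- Hypothetical names; a minimal Spark SQL example, not production code.
    SELECT COALESCE(w.deal_id, s.deal_id) AS deal_id,
           w.deal_amount                  AS dw_amount,
           s.deal_amount                  AS sf_amount,
           CASE
               WHEN w.deal_id IS NULL              THEN 'MISSING_IN_DW'
               WHEN s.deal_id IS NULL              THEN 'MISSING_IN_SF'
               WHEN w.deal_amount <> s.deal_amount THEN 'AMOUNT_MISMATCH'
               ELSE 'MATCHED'
           END AS recon_status
    FROM   edw.daily_deals w
    FULL OUTER JOIN sfdc.daily_deals s
      ON   w.deal_id = s.deal_id;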
Confidential, Ashburn, VA
Sr. Teradata/ ETL Developer
Responsibilities:
- Participated in requirement analysis and design discussions with business partners and other impacted IT teams to fully understand the business intent of the requirements, and formulated a high-level design approach that served as the basis for the detailed design document.
- Worked with the project architect team to design the logical and physical models of the Data Mart.
- Created Teradata BTEQ scripts with complex processing logic to transform the data from Staging to Base/Summary layer.
- Loaded flat files into Teradata staging tables by using Teradata bulk load utilities like FastLoad, MultiLoad and Teradata Parallel Transporter.
- Created ETL jobs in IBM InfoSphere DataStage v11.5 and Informatica 10.1 to pull data from source systems like Oracle and Microsoft SQL Server and load it into the Teradata staging layer.
- Worked on offloading history data from Teradata to Hadoop using DataStage and TDCH.
- Created an automated framework using Teradata Connector for Hadoop (TDCH) to offload historical data from Teradata into Hive tables. Developed HiveQL scripts to load delta records from the Hive external staging table into the Hive managed table using ACID transactions (see the illustrative sketch at the end of this section).
- Involved in analyzing the legacy jobs written in SAS and created the mapping document.
- Worked on migrating the legacy mainframe jobs written in SAS into Teradata BTEQ and loaded the data into the Financial Data Mart (FDM); developed UNIX shell scripts to automate the ETL jobs and scheduled them through cron.
- Performed Unit testing for the ETL jobs and interacted with testing team and business users for User Acceptance testing sign-off.
- Worked with the reporting team to understand the performance issues they faced while pulling reports from IBM Cognos, and tuned the Teradata summary tables by adding a Secondary Index, an Aggregate Join Index and a Partitioned Primary Index and by collecting statistics.
- Deployed the code to the Testing and Production servers through the CVS (Concurrent Versions System) version control system.
- Worked on Production deployment activities based on the implementation plan and fixed issues related to production batch.
Tools: DataStage 11.5, Informatica 10.1, Teradata 15.10, Oracle 11g, MSSQL 2014, Apache Hive 2.2.0, UNIX Shell Scripting, JCL, SAS, Apache Sqoop
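Illustrative sketch for the TDCH/Hive delta-load bullet in this section (all table and column names are hypothetical): merging delta records from an external staging table into an ACID managed Hive table.

    -- Hypothetical names; a minimal HiveQL example, not production code.
    -- fdm.customer is assumed to be a transactional (ACID) managed table.
    MERGE INTO fdm.customer AS t
    USING staging.customer_delta AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET
         customer_name = s.customer_name,
         update_ts     = s.update_ts
    WHEN NOT MATCHED THEN INSERT VALUES
         (s.customer_id, s.customer_name, s.update_ts);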
Confidential
Sr. Teradata/ETL Developer
Responsibilities:
- Attended inception meetings to gather business requirements, proposed design solutions, performed impact analysis to identify feasible solutions and created technical stories.
- Updated the Kanban board, attended daily and weekly stand-ups to present iteration progress, and discussed and fixed defects raised by the testing team.
- Designed and developed DataStage jobs per the mapping document to extract data from heterogeneous sources, applied transformation logic to the extracted data and loaded it into the Teradata Data Warehouse.
- Converted complex DataStage job designs into simpler job segments and executed them through a job sequencer for better performance and easier maintenance.
- Developed complex Teradata BTEQ scripts to transform the data from the Teradata staging layer to the summary layer for reporting purposes (see the illustrative sketch at the end of this section).
- Created UNIX shell scripts to automate Teradata BTEQ scripts and DataStage jobs.
- Involved in performance tuning by interpreting performance statistics of the jobs developed.
- Involved in writing Test Plans, Test Scenarios, Test Cases and Test Scripts and performed Unit testing.
- Performed Unit testing for the ETL jobs and interacted with testing team and business users for User Acceptance testing sign-off.
- Scheduled the jobs using the CA Workload Automation tool.
- Provided a written production migration plan to include a list of impacted systems, migration timeline, back-out plan, point of no return, and processes to validate migrated Code for the Project.
- Coordinated with Configuration Management team in code deployments.
- Worked on production deployment activities and proactively involved in fixing production support issues.
- Managed offshore team at India Development Center.
- Completed POCs in Talend Data Integration, Apache Hive, Apache Pig and Apache Sqoop.
Tools: Teradata 15.10, DataStage 11.5, Hive, Sqoop, Unix Shell Scripting, ServiceNow, JIRA
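Illustrative sketch for the staging-to-summary BTEQ bullet in this section (all table and column names are hypothetical): a BTEQ-style Teradata SQL step with a basic error check and statistics collection.

    -- Hypothetical names; a minimal Teradata BTEQ-style example, not production code.
    INSERT INTO summary_db.daily_account_summary
    SELECT account_id,
           txn_date,
           SUM(txn_amount) AS total_amount,
           COUNT(*)        AS txn_count
    FROM   staging_db.account_txn
    GROUP BY account_id, txn_date;

    .IF ERRORCODE <> 0 THEN .QUIT 8

    COLLECT STATISTICS ON summary_db.daily_account_summary COLUMN (account_id);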
Confidential
Teradata Developer
Responsibilities:
- Interacted with Onsite coordinators and Offshore lead to gather data requirements for addressing specific business needs.
- Involved in the design, development and implementation of the Teradata ETL process.
- Developed Teradata BTEQ scripts using joins, subqueries, temporary tables, set operations and advanced OLAP functions (see the illustrative sketch at the end of this section).
- Loaded and extracted large volumes of data to/from the Teradata database using utilities like FastLoad and FastExport.
- Created re-usable/wrapper shell scripts for file level operations, cleansing activities, and for automating ETL process.
- Involved in creating weekly reports to give an overall view of the business, graphically and in pivot form.
- Interacted with business analysts working across the globe and supported their data needs by providing metrics from the Data Warehouse.
- Developed unit test cases and performed unit testing.
- Developed UC4 Jobs and Job plans for scheduling the ETL scripts.
- Involved in the deployment of codes to production by coordinating with Change Management team.
Tools: Teradata 13, UNIX Shell Scripting, UC4
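Illustrative sketch for the OLAP-function bullet in this section (all table and column names are hypothetical): a Teradata SQL query combining a running total with QUALIFY to keep the most recent rows per account.

    -- Hypothetical names; a minimal Teradata SQL example, not production code.
    SELECT account_id,
           txn_date,
           txn_amount,
           SUM(txn_amount) OVER (PARTITION BY account_id
                                 ORDER BY txn_date
                                 ROWS UNBOUNDED PRECEDING) AS running_total
    FROM   edw.account_txn
    QUALIFY ROW_NUMBER() OVER (PARTITION BY account_id
                               ORDER BY txn_date DESC) <= 30;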