Lead Data Engineer - Development/Support/Testing Resume
Houston, TX
SUMMARY
- 6 years of IT experience in ETL and Hadoop across the complete software development life cycle, including requirement gathering, analysis, design, development, implementation and maintenance of data warehousing applications; also experienced with AWS and writing Python scripts.
- Experience in populating and maintaining data warehouses and data marts using IBM Information Server 11.7, 11.5, 11.3, 9.1, 8.7 and 8.5 (Administrator, Director, Manager, Designer).
- Extensive experience with SQL, Scala, Spark, Hive, Python, Snowflake and shell scripting.
- Ability to drive initiatives across the full life cycle, including extensive analysis, design, cleansing, transformation and development of high-performance ETL for very large-scale data sets.
- Core competencies include DataStage expertise as a developer, designer and tester throughout the Extract, Transform and Load (ETL) development process.
- Experience in designing and populating dimensional models (star and snowflake schemas) for data warehouses, data marts and reporting data sources.
- Experience working with the Snowflake warehouse for loading and extracting data, leveraging tools such as SnowSQL and the Snowflake Python connector (a usage sketch follows this list).
- Used the filtering and other optimization features that Snowflake provides to set up and tune data pipelines.
- Experience in designing, developing, performance tuning, debugging, troubleshooting, and monitoring ETL jobs using DataStage Designer, Director, Manager and Administrator.
- Experienced in data analysis, business analysis, user requirement analysis, data cleansing, source system analysis and reporting analysis.
- In-depth experience with DataStage Designer stages such as Lookup, Join, Merge, Row Generator, Transformer, Remove Duplicates, Sort, Peek, Change Capture, Filter, Copy, Sequential File, FTP, Data Set, ODBC, Snowflake Connector and Hive Connector.
- Actively involved in data collection, group discussions and analysis; played a key role in analyzing the functional flow of projects and led the team in database design and maintenance.
- Extensive experience in database programming using SQL and PL/SQL, including stored procedures, functions and packages.
- Hands on experience in working with Big Data Ecosystems like HDFS, Hive, Sqoop, Flume, Spark.
- Experience fine-tuning jobs and scripts for performance gains using partitioning techniques.
- Well versed in the complete software development life cycle process.
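A minimal sketch of the Snowflake Python connector usage mentioned above; the account, credentials, table and file paths are placeholders rather than details from any actual engagement.

```python
import snowflake.connector

# Placeholder connection details -- illustrative only.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="EDW",
    schema="STAGING",
)
cur = conn.cursor()
try:
    # Stage local extract files into the table stage, then bulk load them.
    cur.execute("PUT file:///data/extracts/orders_*.csv @%ORDERS_STG")
    cur.execute(
        "COPY INTO ORDERS_STG FROM @%ORDERS_STG "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
    # Extract a quick row count back out of the warehouse.
    cur.execute("SELECT COUNT(*) FROM ORDERS_STG")
    print(cur.fetchone())
finally:
    cur.close()
    conn.close()
```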
TECHNICAL SKILLS
ETL Tool: IBM InfoSphere Information Server 11.7/11.5/11.3/9.1/8.5/8.1, IBM DataStage 11.7/11.5/11.3/9.1/8.5/8.1
Data Modeling: Star-Schema Modeling, Snowflake-Schema Modeling, fact and dimension tables.
Hadoop ecosystem: HDFS, Hive, MapReduce, MongoDB
RDBMS: Teradata, Oracle 12c/11g/10g, IBM DB2
ETL Scheduler: Autosys, Control-M, CA7, crontab.
Programming Languages: SQL, PL/SQL, C++, Python, Unix Shell Scripting.
Visualization and Other Tools: AWS, JIRA, SQL*Loader, TOAD, Snowflake, PuTTY, Visual Studio, Eclipse, MSTR Reporting tool, Tableau, ALM Defect Management, Power BI
Version Control System: GIT, TFS, Mercurial
PROFESSIONAL EXPERIENCE
Confidential, Houston, TX
Lead Data Engineer - Development/Support/Testing
Responsibilities:
- Leading two teams: the support team for ADR and the quality (testing) team for LDTI (Long Duration Targeted Improvements).
- Lead coordination with the business team to understand requirements such as source, target and ETL specifications, and provide the best possible ETA based on team bandwidth.
- Work with the development and quality teams to ensure delivery on or before the ETA, which includes doing tasks myself, assigning tasks and providing knowledge transfer (KT) when needed.
- Design, develop and implement processes and controls for incremental loads using IBM DataStage 11.5 and 11.7.
- Extracted data from disparate source systems such as Oracle, Hive, Snowflake and flat files (CSV).
- Apply transformations to the extracted data per business requirements and add fields carrying the valuation and load timestamps.
- Used stages such as Transformer, Oracle Connector, Snowflake Connector, Join and Lookup for loading and transformations.
- Wrote scripts to invoke the sequence job controls and load files for delivery to the output environment.
- Designed Autosys jobs to trigger the controls and automate the process.
- Experience loading JSON data into Snowflake using the COPY statement for bulk loads (a sketch of this pattern follows this list).
- Experience working with Snowpipe for pulling data into the Snowflake staging area from different sources.
- Built process controls to ensure data flows as expected through the landing, staging and mart layers.
- Currently migrating from Oracle to Snowflake; designed jobs that extract from the Oracle DB and load the data into Snowflake.
- Imported data stored in Amazon S3 into HDFS, handling file movements between HDFS and S3 and working extensively with S3 buckets (a distcp sketch follows this list).
- Developed Scala code for data transformation (a PySpark equivalent is sketched after this list).
- Coordinated with the offshore team to provide updates and communicate the requirements to be accomplished.
- Experience writing Hive queries and Spark/Scala code to land source data in Hive.
- Involved in migrating code to production, drafting the runbook for the developed process and coordinating with the support team to run the process successfully.
- Followed the Agile methodology and attended daily scrum meetings.
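A hedged sketch of the JSON bulk-load pattern referenced above, run here through the Snowflake Python connector; the stage, pipe and table names are illustrative, not the actual project objects.

```python
import snowflake.connector

# Illustrative connection; real credentials would come from a secrets store.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1", user="etl_user", password="********",
    warehouse="ETL_WH", database="EDW", schema="LANDING",
)
cur = conn.cursor()

# Land raw JSON into a single VARIANT column via COPY (bulk load from a stage).
cur.execute("CREATE TABLE IF NOT EXISTS POLICY_RAW (v VARIANT)")
cur.execute(
    "COPY INTO POLICY_RAW FROM @EXT_S3_STAGE/policies/ "
    "FILE_FORMAT = (TYPE = JSON)"
)

# For continuous feeds, a Snowpipe on the same stage keeps pulling new files.
cur.execute(
    "CREATE PIPE IF NOT EXISTS POLICY_PIPE AUTO_INGEST = TRUE AS "
    "COPY INTO POLICY_RAW FROM @EXT_S3_STAGE/policies/ "
    "FILE_FORMAT = (TYPE = JSON)"
)

cur.close()
conn.close()
```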
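For the S3-to-HDFS movements above, a minimal sketch that shells out to hadoop distcp from Python; the bucket, prefix and HDFS paths are placeholders, and the cluster is assumed to have the s3a connector and AWS credentials configured.

```python
import subprocess

# Placeholder locations; real paths would come from job parameters.
SRC = "s3a://example-bucket/exports/positions/"
DST = "hdfs:///landing/positions/"

# distcp runs as a MapReduce job and copies the files in parallel.
subprocess.run(["hadoop", "distcp", SRC, DST], check=True)
```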
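The transformation work above was written in Scala; the sketch below shows the same pattern in PySpark for brevity (reading landed files, adding valuation and load timestamps, writing to Hive). The paths, column and table names are assumptions for illustration.

```python
from pyspark.sql import SparkSession, functions as F

# Hive support lets the job write directly into managed Hive tables.
spark = (SparkSession.builder
         .appName("landing_to_hive")
         .enableHiveSupport()
         .getOrCreate())

src = spark.read.option("header", True).csv("hdfs:///landing/positions/*.csv")

out = (src
       .dropDuplicates()
       .withColumn("valuation_ts", F.to_timestamp(F.col("valuation_date")))  # assumed source column
       .withColumn("load_ts", F.current_timestamp()))

# Append into the staging Hive table (name is illustrative).
out.write.mode("append").saveAsTable("staging.positions")
```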
Environment: IBM Information Server 11.5 and 11.7 (Administrator, Designer, Director), AWS, Snowflake, Autosys, SQL Client, Oracle, Visual Studio, UNIX, ALM Defect Management tool, MSTR and Power BI reporting tools.
Confidential, Cincinnati, Ohio
Senior Datastage Developer
Responsibilities:
- Involved in defining the best-practice and development-standards document for DataStage jobs.
- Involved in gathering and analyzing business requirements from clients and end users.
- Involved in the design and development of server jobs and sequences using the Designer.
- Involved in defining and designing the process for extracting, transforming and loading (ETL) data from various source systems into the data warehouse.
- Played a proactive role in creating and using DataStage shared and local containers for DS jobs and in retrieving error log information.
- Guiding and helping other team members in the development process.
- Deploying solutions that maximize consistency and usability of the data.
- Designed simple and complex data flows for incremental loads of different ETL interfaces.
- Created standard rule sets for accommodating various country codes, addresses and currencies.
- Extensively used the Slowly Changing Dimension (SCD) stage for loading dimension and fact tables.
- Developed data extractions, transformations and routines.
- Used the Teradata utilities FastExport, FastLoad and MultiLoad within DataStage.
- Created parameter sets, Unix shell scripts and routines to read parameter values from database tables and pass them to jobs at runtime (a sketch of this pattern follows this list).
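A hedged sketch of the runtime-parameter pattern in the last bullet, shown in Python rather than a shell routine: name/value pairs are read from a database table and passed to the job through the standard dsjob -run -param command line. The table, project and job names are illustrative.

```python
import subprocess
import cx_Oracle  # illustrative driver; the parameter table could live in any RDBMS

PROJECT = "DW_PROJECT"   # placeholder DataStage project
JOB = "seq_load_policy"  # placeholder sequence job

# Read runtime parameter values staged in a database table (names are illustrative).
conn = cx_Oracle.connect("etl_user", "********", "orahost/ORCL")
cur = conn.cursor()
cur.execute(
    "SELECT param_name, param_value FROM etl_job_params WHERE job_name = :job",
    job=JOB,
)
params = []
for name, value in cur:
    params += ["-param", f"{name}={value}"]
conn.close()

# Invoke the job through the dsjob CLI and wait for its completion status.
cmd = ["dsjob", "-run", "-jobstatus"] + params + [PROJECT, JOB]
subprocess.run(cmd, check=True)
```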
Environment: IBM Information Server 8.1, 8.5, 8.7 and 9.1 (Administrator, Designer, Director), Oracle 11g, Python, SQL Server, SQL Client, DB2, UNIX, Windows 7, CA7, PuTTY, TFS, Power BI
Confidential, Las Vegas, NV
Data Engineer
Responsibilities:
- One of my key responsibilities as a lead is to understand the requirements from different departments and address them in the most efficient way possible.
- Designed the schema and created DDL scripts for ETL metadata tables to store the runtime metrics of DataStage jobs.
- Configured Talend jobs to load VSAM files into Hive and Snowflake databases.
- Working with different internal marketing teams, understanding complex marketing techniques and providing robust solutions.
- Converting existing Teradata stored procedures to DataStage jobs using the standard framework.
- Performing code reviews and tuning implemented jobs using the Resource Estimation wizard.
- Organizing technical discussions to help solve some of the existing cumbersome use cases and find optimal solutions.
- Pitching new approaches built around newer technologies, which was a challenging task.
- Discussing with data modelers and business analysts to understand the business logic of the EDW mart build (an internal Confidential process).
- Working with outside vendors (BCG, Bain, TravelClick, Prolifics, etc.) to make sure outbound and inbound data is delivered and loaded in the most efficient way.
- Creating and enhancing the migration checklist for the DataStage upgrade to make sure the transition happens smoothly.
- Experience writing Pig scripts to analyze and process large datasets and running them on the Hadoop cluster.
- Involved in converting Hive/SQL queries into Scala/Spark transformations using Spark RDDs (a PySpark sketch of the pattern follows this list).
- Wrote Hive queries for data analysis to meet business requirements.
- Experience running Hadoop streaming jobs to process large amounts of data in different formats.
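The Hive-to-Spark conversions above were done in Scala; below is a PySpark sketch of the same idea, replacing a Hive GROUP BY with RDD transformations. The file layout and column positions are assumed for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive_to_rdd").enableHiveSupport().getOrCreate()
sc = spark.sparkContext

# Original-style Hive query:
#   SELECT property_id, SUM(revenue) FROM bookings GROUP BY property_id;
# Equivalent expressed as RDD transformations over the raw staged files
# (tab-delimited layout and column positions are assumptions).
lines = sc.textFile("hdfs:///staging/bookings/part-*")
totals = (lines
          .map(lambda line: line.split("\t"))
          .map(lambda f: (f[0], float(f[3])))   # (property_id, revenue)
          .reduceByKey(lambda a, b: a + b))

for property_id, revenue in totals.take(10):
    print(property_id, revenue)
```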
Environment: IBM Information Server 9.1, 11.5 and 11.7 (Administrator, Designer, Director), Talend, Python, SQL Client, Teradata, Hadoop, Autosys, UNIX, Windows 7.
Confidential
Datastage Developer
Responsibilities:
- Design, develop and implement applications/processes using the IBM DataStage ETL product suite.
- Develop technical specifications and design documents for ETL processes.
- Troubleshoot, performance-tune and monitor DataStage ETL jobs in the development, QA, SIT and production environments.
- Involved in extracting data from various sources such as flat files, SQL Server and SAP HANA using SAP BO Data Services.
- Build database SQL queries, triggers, procedures, functions and packages.
- Perform Analysis, Design, Development, Testing and Production Support of ETL and Database Systems.
- Build Unix Shell Scripts to automate applications/processes.
- Develop ETL test plans, test cases, test scripts, and test validation data sets for ETL systems.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked on extracting data from the SQL database through Sqoop, landing it in HDFS and processing it there (a sketch of the import follows this list).
- Experience loading and transforming large sets of structured, semi-structured and unstructured data.
- Coordinate with the testing team to perform all phases of testing.
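For the Sqoop extraction above, a minimal sketch that builds and runs a standard sqoop import from Python; the JDBC URL, credentials, table and HDFS target directory are placeholders.

```python
import subprocess

# Placeholder connection details -- the real values came from job configuration.
cmd = [
    "sqoop", "import",
    "--connect", "jdbc:sqlserver://dbhost:1433;databaseName=claims",
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl/.sqoop.pwd",
    "--table", "CLAIM_EVENTS",
    "--target-dir", "/landing/claim_events",
    "--num-mappers", "4",
    "--as-textfile",
]
subprocess.run(cmd, check=True)
```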
Environment: IBM Information Server 9.1 and 11.3 (Administrator, Designer, Director), Python, SQL Client, Teradata, Hadoop, UNIX, Windows 7.