
ETL Developer / Data Engineer Resume


Lake Forest, IL

SUMMARY

  • Overall 6+ years of professional experience in IT, with expertise in ETL and AWS technologies as a Data Analyst, ETL Developer, and Data Engineer.
  • Proven track record of working as a Data Engineer on Amazon cloud services, Big Data/Hadoop applications, and product development.
  • Skilled in the development and execution of XML, shell scripts, and Perl scripts.
  • Well versed in Big Data on AWS cloud services, i.e. EC2, S3, Glue, Athena, DynamoDB, and Redshift.
  • Experience with job/workflow scheduling and monitoring tools like Oozie, AWS Data Pipeline, and Autosys.
  • Designed and implemented disaster recovery for PostgreSQL databases.
  • In-depth understanding of Snowflake cloud technology, Snowflake multi-cluster sizing, and credit usage.
  • Played a key role in migrating Teradata objects into the Snowflake environment.
  • Experience with Snowflake multi-cluster warehouses and virtual warehouses and in building Snowpipe; knowledge of Data Sharing in Snowflake and of Snowflake database, schema, and table structures.
  • Experience in using Snowflake Clone and Time Travel.
  • Experience in building ETL pipelines using Teradata.
  • Good knowledge of ETL concepts and strong hands-on ETL experience.
  • Experience in various methodologies like Waterfall and Agile.
  • Defined and deployed monitoring, metrics, and logging systems on AWS (a minimal CloudWatch sketch follows this list).
  • Developed installation Unix shell scripts in Bash on CentOS for installing Perl dependencies in both Internet and RPM modes.
  • Informatica ETL developer with expertise in the design and development of Extract, Transform, and Load processes for data integration projects to build data marts.
  • Experience creating and running Docker images with multiple micro-services.
  • Good experience in deploying, managing and developing with MongoDB clusters.
  • Docker container orchestration using ECS, ALB, and Lambda.
  • Experience with Unix/Linux systems with scripting experience and building data pipelines.
  • Responsible for migrating applications running on premises onto the AWS cloud.
  • Proficiency in multiple databases like MongoDB, Cassandra, MySQL, ORACLE and MS SQL Server.
  • Played a key role in migrating Cassandra and Hadoop clusters onto AWS and defined different read/write strategies.
  • Strong SQL development skills including writing Stored Procedures, Triggers, Views, and User Defined functions.
  • Expertise in developing reports and dashboards using various Tableau visualizations.
  • Hands-on experience with programming languages such as Java, Python, R, and SAS.
  • Experience using Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, and Kafka, along with Crontab.
  • Experience developing ETL applications on large volumes of data using tools such as MapReduce, Spark (Scala), PySpark, Spark SQL, and Pig.
  • Experience using Sqoop for importing and exporting data between RDBMS and HDFS/Hive.
  • Experience on MS SQL Server, including SSRS, SSIS, and T-SQL.
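
For illustration of the monitoring work noted above, a minimal sketch of defining a CloudWatch alarm from Python with boto3 is shown below; the alarm name, Lambda function name, and SNS topic ARN are hypothetical placeholders, not objects from an actual project.

```python
# Hypothetical sketch: alarm on errors from an ETL Lambda function via CloudWatch.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="etl-lambda-errors",                                   # placeholder alarm name
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "nightly-etl"}],   # placeholder function
    Statistic="Sum",
    Period=300,                                                      # 5-minute window
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:etl-alerts"],  # placeholder SNS topic
)
```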

TECHNICAL SKILLS

Modeling Tools: IBM Infosphere, SQL Power Architect, Oracle Designer, Erwin 9.6/9.5, ER/Studio 9.7, Sybase Power Designer.

SAS Skills: SAS Data Integration (DI) Studio, SAS Management Console, SAS Enterprise Guide, SAS Enterprise Miner, SAS BI Suite, SAS Information Map Studio, SAS OLAP Cube Studio, SAS Information Delivery Portal, SAS Web Report Studio

Database Tools: Oracle 12c/11g, MS Access, Microsoft SQL Server 2014/2012, Teradata 15/14, PostgreSQL 9.4, Netezza.

Scripting Languages: Unix Shell, Bash, Perl, Python.

Big Data Technologies: Hadoop, HDFS 2, Hive, Pig, HBase, Sqoop, Flume.

Cloud Platform: AWS, EC2, S3, Snowflake, SQS, Azure.

Operating Systems: Windows, DOS, Unix, Linux.

BI Tools: SSIS, SSRS, SSAS.

Reporting Tools: Business Objects, Crystal Reports.

Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant.

PROFESSIONAL EXPERIENCE

Confidential, Lake Forest, IL

ETL Developer/ Data Engineer

Responsibilities:

  • Gathered the Business Requirements Document, broke it down into functional requirements, created stories in Jira, and followed up until each requirement was fulfilled.
  • Involved in Migrating Objects from Teradata to Snowflake.
  • Created Snowpipe for continuous data load.
  • Used COPY to bulk load the data.
  • Created internal and external stages and transformed data during load.
  • Used the FLATTEN table function to produce a lateral view of VARIANT, OBJECT, and ARRAY columns (see the Snowflake sketch after this list).
  • Worked with both Maximized and Auto-scale functionality.
  • Used temporary and transient tables on different datasets.
  • Cloned Production data for code modifications and testing.
  • Shared sample data with the customer for UAT by granting access.
  • Used Time Travel (up to 45 days) to recover missed data.
  • Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape.
  • Heavily involved in testing Snowflake to understand the best possible way to use cloud resources.
  • Scheduled Snowflake jobs using cron and Control-M.
  • Performed tuning and optimization of complex SQL queries using Teradata Explain.
  • Wrote numerous BTEQ scripts to run complex queries on the Teradata database.
  • Created tables, views in Teradata, according to the requirements.
  • Experience leading the team and submitting project reports to the manager on a per-sprint basis.
  • Used WhereScape as the ETL tool to pull data from source systems/files, then cleansed, transformed, and loaded the data into Teradata using Teradata utilities.
  • Loaded data into Teradata tables using the Teradata utilities BTEQ, FastLoad, MultiLoad, FastExport, and TPT.
  • Experience in Performance tuning of Source SQL queries and Teradata Queries.
  • Performance tuned and optimized various complex SQL queries.
  • Good knowledge of Teradata Manager, TDWM, PMON, DBQL, SQL Assistant, and BTEQ.
  • Worked on data warehouses with sizes from 30-50 Terabytes.
  • Maintained user security on the BO Reports, Folders and Universes.
  • Used Teradata Parallel Transporter (TPT) to simultaneously load data from multiple, dissimilar data sources into, and extract data from, the Teradata database.
  • Expertise in UNIX shell scripting, Autosys job scheduler, FTP, SFTP and file management in various UNIX environments.
  • Experience with the design and development of Tableau visualization solutions.
  • Hands-on development assisting users in creating and modifying worksheets and data visualization dashboards.
  • Worked on cloud deployments using Maven, Docker, and Jenkins.
  • Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
  • Used AWS Glue for data transformation, validation, and cleansing.
  • Used Python Boto3 to configure AWS services such as Glue, EC2, and S3.
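
As referenced above, the Snowflake load, FLATTEN, Time Travel, and clone steps can be sketched roughly as follows using the snowflake-connector-python client. The stage, table, and column names (ext_stage, raw_events, payload) and the credentials are hypothetical placeholders, not the project's actual objects.

```python
# Hypothetical sketch of the Snowflake patterns described in this section.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",    # placeholder credentials
    warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
)
cur = conn.cursor()

# Bulk load JSON files from an external stage into a table with a single VARIANT column.
cur.execute("""
    COPY INTO raw_events
    FROM @ext_stage/events/
    FILE_FORMAT = (TYPE = 'JSON')
""")

# Produce a lateral view of the VARIANT column with FLATTEN.
cur.execute("""
    SELECT e.payload:order_id::STRING AS order_id,
           f.value:sku::STRING        AS sku
    FROM raw_events e,
         LATERAL FLATTEN(INPUT => e.payload:items) f
""")

# Time Travel: read the table as it existed 24 hours ago to recover missed data.
cur.execute("SELECT COUNT(*) FROM raw_events AT(OFFSET => -60*60*24)")

# Zero-copy clone of production data for code modifications and testing.
cur.execute("CREATE TABLE raw_events_test CLONE raw_events")

cur.close()
conn.close()
```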

Environment: Snowflake, MS Office, SQL, SQL Loader, PL/SQL, DB2, SharePoint, Talend, Redshift, SQL Server, Hadoop, Spark, AWS.

Confidential

Data Engineer/ Data Analyst

Responsibilities:

  • Wrote scripts and indexing strategy for a migration to Confidential Redshift from SQL Server and MySQL databases.
  • Implemented software enhancements to port legacy software systems to the Spark and Hadoop ecosystems on Azure Cloud.
  • Used Pig as an ETL tool for transformations, event joins, filters, and some pre-aggregations before storing the data in HDFS.
  • Coordinated with the SAS Technical Support team to open tickets for SAS-related issues, escalate issues, and close tickets.
  • Involved in Relational and Dimensional Data modeling for creating Logical and Physical Design of Database and ER Diagrams with all related entities and relationship with each entity based on the rules provided by the business manager using ER Studio.
  • Created an automated test tool for integration and regression testing using Unix/Perl shell scripting in a UNIX-based environment.
  • Contributed to the maintenance of SAS software applications, development of backup strategies, performance tuning, capacity planning, and strategies for efficient and effective use of SAS applications.
  • Analyzed existing systems and proposed process and system improvements, including adopting modern scheduling tools like Airflow and migrating legacy systems into an enterprise data lake built on the AWS Cloud (a minimal DAG sketch follows this list).
  • Optimized postgresql.conf for performance improvements and reviewed PostgreSQL logs for problems.
  • Designed and Implemented Sharding and Indexing Strategies for MongoDB servers.
  • Developed Shell/Python/Perl/Ruby Scripts for automation and deployment purpose.
  • Optimized Pig scripts and performed user interface analysis, performance tuning, and analysis.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Managed software licensing, created and updated system documentation for SAS environment and documented problems and solutions for future reference.
  • Refreshed the development, test, and UAT environments with production data per the customer's request.
  • Implemented the import and export of data using XML and SSIS.
  • Involved in planning, defining, and designing the database using ER Studio based on business requirements, and provided documentation.
  • Responsible for migrating applications running on premises onto the Azure cloud.
  • Used SSIS to build automated multi-dimensional cubes.
  • Wrote indexing and data distribution strategies optimized for sub-second query response.
  • Developed a statistical model using artificial neural networks for ranking the students to better assist the admission process.
  • Utilized Power BI and SSRS to produce parameter-driven, matrix, sub-report, drill-down, and drill-through reports and dashboards, integrated report hyperlink functionality to access external applications, and made dashboards available in web clients and mobile apps.
  • Designed Data Marts by following Star Schema and Snowflake Schema Methodology, using industry leading Data modeling tools like ER Studio.
  • Performed Data cleaning and Preparation on XML files.
  • Robotic Process Automation of data cleaning and preparation in Python.
  • Prepared and uploaded SSRS reports; managed database and SSRS permissions.
  • Developed SQL queries using stored procedures, common table expressions (CTEs), and temporary tables to support SSRS and Power BI reports.
  • Built analytical dashboards to track the student records and GPAs across the board.
  • Used deep learning frameworks like MXNet, Caffe2, TensorFlow, Theano, CNTK, and Keras to help clients build deep learning models.
  • Provided database administration support and troubleshooting, including installing PostgreSQL software, patches, and upgrades, and managing and monitoring tablespaces.
  • Participated in requirements meetings and data mapping sessions to understand business needs.
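
The Airflow migration noted above can be illustrated with a minimal DAG sketch (assuming Airflow 2.x). The DAG id, schedule, shell commands, and S3 bucket are hypothetical placeholders; it only shows the general shape of moving a legacy cron-style job under Airflow scheduling.

```python
# Hypothetical sketch of a nightly extract-and-load DAG.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {"retries": 1, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="legacy_extract_to_data_lake",       # placeholder DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",              # nightly at 2 AM
    catchup=False,
    default_args=default_args,
) as dag:

    extract = BashOperator(
        task_id="extract_source_files",
        bash_command="sh /opt/etl/extract_legacy.sh ",                                # placeholder script
    )

    load_to_s3 = BashOperator(
        task_id="load_to_s3",
        bash_command="aws s3 cp /data/staging/ s3://my-data-lake/raw/ --recursive ",  # placeholder bucket
    )

    extract >> load_to_s3
```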

Environment: ER Studio, AWS, OLTP, SAS 9.1/9.2, MATLAB, Teradata, Sqoop, MongoDB, MySQL, HDFS, Linux, shell scripts, SSIS, SSAS, HBase, Azure, MDM.

Confidential

Data Engineer

Responsibilities:

  • Designed and built a multi-terabyte, full end-to-end data warehouse infrastructure from the ground up on Confidential Redshift for large-scale data handling of millions of records every day.
  • Worked on Big Data with AWS cloud services, i.e. EC2, S3, EMR, and DynamoDB.
  • Developed SSRS reports and SSIS packages to Extract, Transform, and Load data from various source systems.
  • Good hands-on experience with databases such as Oracle (PL/SQL) and PostgreSQL; worked with various utilities to load/unload large volumes of data.
  • Developed SAS macro programs to automate repetitive tasks, improving accuracy and saving time.
  • Utilized SAS to create reports on large product testing data, analyze patterns & detect failures.
  • Created and analyzed SAS database to support product production, planning, failure reduction, cost improvement, error prediction and detection.
  • Created procedures, sequences, views, functions, etc. using Oracle, DB2, SQL Server, MySQL, and PostgreSQL databases.
  • Implemented and managed ETL solutions and automated operational processes.
  • Optimizing and tuning the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
  • Integrate data from various system sources (flat files, CSV, text form, and other formats) using SAS.
  • Daily activities involved collection of product failure data, data preparation, extraction, manipulation and data analysis using SAS.
  • Defined facts and dimensions and designed the data marts using Ralph Kimball's dimensional data mart modeling methodology in Erwin.
  • Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints and created logical and physical models using Erwin.
  • Created ad hoc queries and reports in SQL Server Reporting Services to support business decisions.
  • Analyzed existing application programs and tuned SQL queries using the execution plan, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance.
  • Wrote various data normalization jobs for new data ingested into Redshift.
  • Created various complex SSIS/ETL packages to Extract, Transform, and Load data.
  • Advanced knowledge of Confidential Redshift and MPP database concepts.
  • Migrated the on-premise database structure to the Confidential Redshift data warehouse.
  • Was responsible for ETL and data validation using SQL Server Integration Services.
  • Defined and deployed monitoring, metrics, and logging systems on AWS.
  • Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries, allowing for a more reliable and faster reporting interface with sub-second response for basic queries (see the query-routing sketch after this list).
  • Worked on publishing interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.
  • Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology to get each job done.
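
A rough sketch of the WLM query-routing pattern mentioned above, using psycopg2 against a Redshift cluster, is shown below. The endpoint, credentials, table, and the "dashboard" query group label are assumptions for illustration; the actual WLM configuration defines which queue that label maps to.

```python
# Hypothetical sketch: tag a session so WLM routes it to a high-priority dashboard queue.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439, dbname="analytics", user="etl_user", password="...",
)
cur = conn.cursor()

# Route this session's queries to the WLM queue associated with the 'dashboard' group.
cur.execute("SET query_group TO 'dashboard';")

# A representative short dashboard query expected to return quickly.
cur.execute(
    "SELECT order_date, COUNT(*) FROM fact_orders "
    "GROUP BY order_date ORDER BY order_date DESC LIMIT 30;"
)
rows = cur.fetchall()

# Reset the query group before running longer ad hoc work in the same session.
cur.execute("RESET query_group;")

cur.close()
conn.close()
```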

Environment: SQL Server, SAS 9.1/9.2, Oracle, Redshift, Informatica, RDS, NoSQL, Snowflake Schema, MySQL, PostgreSQL.

Confidential

Data Analyst/ ETL Developer

Responsibilities:

  • Developed stored procedures in MS SQL to fetch the data from different servers using FTP and processed these files to update the tables.
  • Responsible for Designing Logical and Physical data modeling for various data sources on Confidential Redshift.
  • Designed and Developed ETL jobs to extract data from Salesforce replica and load it in data mart in Redshift.
  • Assisted Statistics Department faculty on data mining, statistical modeling, and predictive analytics projects involving SAS Enterprise Miner and SAS Enterprise Guide.
  • Involved in performance tuning and in developing stored procedures, views, triggers, cursors, PIVOT/UNPIVOT functions, and CTEs.
  • Developed and delivered dynamic reporting solutions using SSRS.
  • Involved in normalization/denormalization techniques for optimum performance in relational and dimensional database environments.
  • Resolved the data type inconsistencies between the source systems and the target system using the Mapping Documents and analyzing the database using SQL queries.
  • Worked on ETL testing and used the SSIS Tester automated tool for unit and integration testing.
  • Designed and created SSIS/ETL framework from ground up.
  • Created new Tables, Sequences, Views, Procedure, Cursors and Triggers for database development.
  • Deployed build artifacts into Tomcat instances, which were integrated using Perl and shell scripts.
  • Managed SAS/MATLAB/Mathematica/Maple software operations, application upgrades, patches, fixes, application maintenance and support of SAS components.
  • Created an ETL pipeline using Spark and Hive to ingest data from multiple sources (a minimal PySpark sketch follows this list).
  • Involved in using SAP and transactions in the SAP SD module to handle the client's customers and generate sales reports.
  • Developed Shell/Perl scripts for automation purposes.
  • Created reports using SQL Server Reporting Services (SSRS) for customized and ad hoc queries.
  • Coordinated with clients directly to get data from different databases.
  • Worked on MS SQL Server, including SSRS, SSIS, and T-SQL.
  • Designed and developed schema data models & Documented business workflows for stakeholder review.
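
The Spark/Hive ingestion bullet above can be sketched as follows; the landing path, column names, and the Hive database/table (etl_mart.orders_raw) are hypothetical placeholders used only to illustrate the read, cleanse, and write-to-Hive pattern.

```python
# Hypothetical sketch of a PySpark job that ingests a source feed into a Hive table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("multi_source_ingest")            # placeholder job name
    .enableHiveSupport()
    .getOrCreate()
)

# Read one of several source feeds (CSV here; other sources would be handled similarly).
orders = (
    spark.read
    .option("header", "true")
    .csv("/data/incoming/orders/*.csv")        # placeholder landing path
)

# Light cleansing and typing before the Hive load.
orders_clean = (
    orders
    .withColumn("order_amount", F.col("order_amount").cast("double"))
    .dropDuplicates(["order_id"])
)

# Write into a partitioned Hive table for downstream reporting.
(
    orders_clean.write
    .mode("append")
    .partitionBy("order_date")
    .saveAsTable("etl_mart.orders_raw")        # placeholder Hive database.table
)
```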

Environment: ER Studio, SQL Server 2008, SSIS, Oracle, Business Objects XI, Rational Rose, Data stage, MS Visio, SQL, Crystal Reports 9
