
Data Engineer Resume


Titusville, NJ

SUMMARY

  • 8+ years of experience as a Data Engineer/Data Analyst with Analysis, Design, Development, Testing, Customization, Bug fixes, Enhancement, Support, and Implementation of various web, stand-alone, and client-server enterprise applications using MySQL Workbench, Python, and Django across various domains.
  • Execute against an enterprise data governance framework, with a focus on improvement of data quality and the protection of the organization’s data assets through modifications to organization behavior, policies and standards, principles, processes, governance metrics, related tools, and data architecture
  • Experienced with the full software development life cycle (SDLC), architecting scalable platforms, object-oriented programming (OOP), and database design.
  • Experience in working on different Databases/Data warehouses like Teradata, Oracle, AWS Redshift, and Snowflake.
  • Performed Data Analysis using SQL, PL/SQL, Python, Spark, Databricks, Teradata SQL Assistant, SQL Server Management Studio, and SAS.
  • Strong knowledge and use of development methodologies, standards, and procedures.
  • Proficient in writing Packages, Stored Procedures, Functions, Views, Materialized Views, and Database Triggers using SQL and PL/SQL in Oracle
  • Experience in developing web-based applications using Python and Django.
  • Experience with Web Development, Amazon Web Services, Python, and the Django framework.
  • Good experience in developing web applications implementing MVT/MVC architecture using Django.
  • Experience in working on various applications using Python IDEs such as Sublime Text, PyCharm, NetBeans, PyDev, and Spyder.
  • Experience analyzing very large, complex, multi-dimensional data sets and developing analytic solutions, including predictive analytics using Python.
  • Strong knowledge and experience in Amazon Web Services (AWS) Cloud services like EC2 and S3.
  • Working experience on AJAX framework to transform Datasets and Data tables into HTTP-serializable JSON strings.
  • Experience in writing Sub Queries, Stored Procedures, Triggers, Cursors, and Functions on MySQL and PostgreSQL databases.
  • Excellent knowledge in creating Databases, Tables, Stored Procedures, DDL/DML Triggers, Views, User-defined data types, functions, Cursors, and Indexes using T-SQL.
  • Performed admin activities on Tableau Server; involved in writing MDX scripts for OLAP cubes.
  • Interacted with business users to convert business functional requirements into development, testing, and production deployment.
  • Experienced in T-SQL programming, including creating Stored Procedures, Functions, Constraints, Queries, Joins, Keys, Indexes, Data Import/Export, Triggers, and Cursors.
  • Experienced in implementing CTEs, anchor/recursive CTEs, temp tables, and effective DDL/DML Triggers to facilitate efficient data manipulation and data consistency and to support existing applications.
  • Possess strong interpersonal and communication skills; contributed technical expertise as needed.

TECHNICAL SKILLS

Languages: Python, SQL, Shell Scripting, Java, PL/SQL, T-SQL

Frameworks: Django

IDEs: Eclipse, PyCharm, PyDev, Microsoft Visio

Web Technologies: HTML5, CSS3, JavaScript, AngularJS, Bootstrap, AJAX, jQuery, JSON

Cloud: AWS, Kubernetes, OpenShift

Big Data Technologies: Spark, Hive, Anaconda, HBase, MapReduce, ZooKeeper, HDFS, Kafka

BI Tools: Tableau, Teradata

Web Services: REST, SOAP, HTTP, Apache, Tomcat Server, JSON, XML.

Databases: SQL, MySQL, Oracle, Snowflake, JDBC

Methodologies: Agile, Waterfall

Analytics: Power BI, ETL.

PROFESSIONAL EXPERIENCE:

Confidential, Titusville, NJ

Data Engineer

Responsibilities:

  • Develop a real-time peak/valley alert monitoring system using Teradata, SQL, and Python; design the overall logic of the statistical model, metric settings, threshold algorithm, and automated email alerting system.
  • Built a series of functions to clean, repopulate, and pivot the original database in Teradata using SQL queries.
  • Tuned hyperparameters in the algorithm functions to optimize the dataflow job and make the product meet requirements from internal customers.
  • Design the interface and content in HTML for the sample alert email, including a table of key information, a dashboard chart generated with Python as an attachment, and a link to the terminal dashboard on an internal platform.
  • Built a compliance and data quality checks pipeline using Airflow, SQL, Teradata, and cloud functions.
  • Contribute to the technical architecture design, documentation, and implementation.
  • Followed the Agile development with Jira as a development management and issue tracking tool. Created a confluence page for developing best practices and project documentation.
  • Create programs using Python to read real-time data from SDP (Streaming Data Platform), perform analysis, and load the data to the analytical cloud data warehouse.
  • Migrate scripts and programs to AWS Cloud environment.
  • Migrating campaigns from Unica Affinium campaign marketing tool to Quantum.
  • Automate the process of sending data quality alerts to a Slack channel and email using Databricks, Python, and HTML so that users are notified of any data issues (see the sketch after this list).
  • Perform data comparison between SDP (Streaming Data Platform) real-time data and AWS S3 and Snowflake data using Databricks, Spark SQL, and Python.
  • Create and monitor production batch jobs that load analytical data to the data source tables on a daily basis, fixing and re-executing jobs when failures occur.
  • Alert data consumers about the delays in data loads to the data sources/tables using Slack bot API integration with Python code.
  • Create batch programs using UNIX shell script and Teradata BTEQ.
  • Create and manage versions of the scripts using GitHub.
  • Segment customers using the Unica Affinium campaign marketing tool.
  • Create segmentation reports (Test level Audit Reports) for each campaign.
  • Create fulfillment reports such as data and business-intent validation reports; perform ad-hoc queries and extract data from existing data stores.
  • Manage the entirety of the campaign’s logic, including audience segmentation, exclusions, and assignment of offers and channels.
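
The data quality alerting above can be illustrated with a minimal Python sketch, assuming the slack_sdk client and an active Spark/Databricks session; the table name, check query, channel, and token variable are hypothetical placeholders rather than the production implementation.

    # Minimal sketch of a data quality alert (hypothetical table, query, and channel).
    import os
    from slack_sdk import WebClient

    def alert_on_empty_load(spark, table="analytics.daily_loads"):
        # Hypothetical check: flag the table if today's partition arrived empty.
        row_count = spark.sql(
            f"SELECT COUNT(*) AS c FROM {table} WHERE load_date = current_date()"
        ).collect()[0]["c"]
        if row_count == 0:
            client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
            client.chat_postMessage(
                channel="#data-quality-alerts",  # placeholder channel
                text=f":warning: No rows loaded into {table} for today's partition.",
            )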

Environment: Teradata, Unica Affinium Campaign, Teradata SQL Assistant, SQL, Python, Databricks, Snowflake, Redshift, Splunk, AWS, Spark, Quantum, HTML.

Confidential, Boston, MA

Data Engineer/Data Analyst

Responsibilities:

  • Perform Data Analysis on the analytical data present in AWS S3, AWS Redshift, Snowflake, and Teradata using SQL, Python, Spark, and Databricks.
  • Design, develop, implement, and execute marketing campaigns for US card customers using Unica Affinium Campaign, Snowflake, AWS S3, PySpark, and Databricks.
  • Create scripts and programs to gain an understanding of data sets, discover data quality and data integrity issues associated with the analytical data and perform root cause analysis for those issues.
  • Write complex SQL scripts to analyze data present in different Databases/Data warehouses like Snowflake, Teradata, and Redshift.
  • Perform segmentation analytics for each campaign using database technologies present both on-premise (such as SQL, Teradata, UNIX) and on the Cloud platform using AWS technologies and Big Data technologies such as Spark, Python, and Databricks.
  • Create custom reports and dashboards using business intelligence software like Tableau and QuickSight to present data analysis and conclusions.
  • Create automated solutions using Databricks, Spark, Python, Snowflake, and HTML.
  • Involved in migration of datasets and ETL workloads with Python from On-prem to AWS Cloud services.
  • Created monitors, alarms, notifications, and logs for Lambda functions, Glue Jobs, EC2 hosts using CloudWatch, and used AWS Glue for the data transformation, validation, and data cleansing.
  • Hands-on work experience in writing applications on NoSQL databases like HBase, Cassandra, and MongoDB.
  • Developed Python code to gather data from HBase and designed the solution for implementation using PySpark.
  • Wrote Hive queries for data analysis to meet business requirements and designed and developed a User Defined Function (UDF) for Hive.
  • Implemented data ingestion strategies and scalable pipelines, data warehouse, and data mart structures in the Snowflake data platform.
  • Proficient in writing complex SQL queries and PL/SQL Stored Procedures, Functions, and Triggers.
  • Designed and implemented data pipelines that handle high-volume streaming data.
  • Documented database designs that include data models, metadata, ETL specifications, and process flows for business data project integrations.
  • Developed code for data ingestion and curation using Informatica IICS, Spark, and Kafka.
  • Automated complex processes using Databricks, SSIS and SQL.
  • Created and maintained Technical documentation for launching the Hadoop cluster and for executing Hive queries.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (see the sketch after this list).
  • Used and configured multiple AWS services like RedShift, EMR, EC2, and S3 to maintain compliance with organization standards.
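
As a rough illustration of the Hive-to-Spark conversion noted above, the sketch below rewrites a simple Hive aggregation as RDD transformations in PySpark; the table and column names are hypothetical.

    # Hypothetical example: the Hive query
    #   SELECT customer_id, SUM(amount) FROM txns GROUP BY customer_id
    # expressed as Spark RDD transformations.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive_to_spark").enableHiveSupport().getOrCreate()

    totals = (
        spark.table("txns").rdd
        .map(lambda row: (row["customer_id"], row["amount"]))  # project key/value pairs
        .reduceByKey(lambda a, b: a + b)                       # aggregate per customer
    )
    totals.toDF(["customer_id", "total_amount"]).show()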

Environment: Python, PySpark, ETL, AWS, Glue, Lambda, EC2, CloudWatch, MySQL, SQL, NoSQL, PL/SQL, Teradata, Snowflake, Hive, Agile, and Windows.

Confidential, New York, NY

Data Engineer/ Data Analyst

Responsibilities:

  • Analyzing the Business requirements and System specifications to understand the Application.
  • Importing data from source files such as flat files using Teradata load utilities like FastLoad, MultiLoad, and TPump.
  • Designed Informatica mappings to propagate data from various legacy source systems to Oracle. The interfaces were staged in Oracle before loading to the Data warehouse.
  • Performed Data transformations using various Informatica Transformations like Union, Joiner, Expression, Lookup, Aggregate, Filter, Router, Normalizer, Update Strategy, etc.
  • Responsible for tuning report queries and ad-hoc queries.
  • Wrote transformations for data conversions into required form based on the client requirement using Teradata ETL processes.
  • Extracting data from Teradata tables and placing it in the Hadoop Distributed File System (HDFS) using Java and Sqoop.
  • Experienced in Tuning SQL Statements and Procedures for enhancing the load performance in various schemas across databases. Tuning the queries to improve the performance of the report refresh time.
  • Created customized Web Intelligence reports from various sources of data.
  • Programmer analyst with expertise in Tableau Server, ETL, Teradata, and other EDW data integrations and development.
  • Involved in performance tuning on the source and target database and digital media data for querying and data loading.
  • Developed SQL scripts and shell scripts to move data from source systems to staging and from staging to the data warehouse in batch-processing mode, with an understanding of Google Cloud Platform alongside AWS for data retrieval.
  • Involved in building and deploying cloud-based data pipelines and BI applications using AWS and GCP services.
  • Build Data Pipelines to ingest the structured data.
  • Exploring and performing POCs on Google Cloud Platform (including Cloud Machine Learning, Cloud Datastore, Bigtable, BigQuery, Datalab, and Data Studio).
  • Creating, loading, and materializing views to extend the usability of data.
  • Automated Unix shell scripts to verify the count of records added every day by the incremental data load for a few of the base tables to check for consistency (see the sketch after this list).
  • Making modifications as required for the reporting process by understanding the existing data model and being involved in retrieving data from relational databases.
  • Managing queries by creating, deleting, modifying, viewing, enabling, and disabling rules.
  • Loading the data into the warehouse from different flat files.
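
A minimal sketch of the daily incremental row-count check mentioned above, assuming the teradatasql DB-API driver; the host, credentials, table, and load-date column are placeholders.

    # Hypothetical daily consistency check for an incremental load.
    import teradatasql

    def check_incremental_load(table="edw.base_customer"):
        with teradatasql.connect(host="tdprod", user="etl_user", password="***") as con:
            with con.cursor() as cur:
                cur.execute(f"SELECT COUNT(*) FROM {table} WHERE load_dt = CURRENT_DATE")
                added_today = cur.fetchone()[0]
        if added_today == 0:
            raise RuntimeError(f"No new records loaded into {table} today")
        return added_today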

Environment: Teradata, MySQL, Hadoop, Java, Python, Mainframes, Oracle, DB2, Teradata SQL Assistant, SQL Server, Flat files, SQL, AWS, Erwin, Linux, Shell Scripting

Confidential

Data Engineer

Responsibilities:

  • Currently serve as a cloud data engineer, managing the organization's AWS environments and Cloud Foundry open-source PaaS.
  • Design and develop moderately to highly complex data pipelines using AWS Glue.
  • Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, and Excel.
  • Set up repeatable patterns for migration of data to the AWS Data Lake from on-premises systems and other cloud providers.
  • Work with AWS tools like Athena, QuickSight, and SageMaker to provide platforms for the Cyber Fraud and Analytics team to perform necessary analytics and machine learning.
  • Perform the technical analysis on existing code in Unix/DataStage for migration to AWS.
  • Use Standard CI/CD pipeline to roll and deploy code.
  • Prepare root cause corrective action documents for the ETL issues.
  • Monitor environment health and services to meet compliance, and review daily reports and dashboards.
  • Stream applications and services logs to Splunk and Kibana for log analysis and archival of historical log data.
  • Set up repeatable patterns for migration of data.
  • Automated backups of S3 buckets and AWS RDS databases and performed restorations as needed (see the sketch after this list).
  • Architect and design static and dynamic websites using content management systems and build packs.
  • Worked on Amazon EC2, S3, IAM, VPC, RDS, SQS, Route53, and CloudFront, as well as automation and orchestration services (Elastic Beanstalk, CloudFormation).
  • Worked with the AWS Cloud platform, configured AWS EC2 Cloud instances using AMIs, and launched instances for specific applications.
  • Created highly available and scalable infrastructure in AWS cloud by using various AWS services like EC2, VPC, Autoscaling, ELB, RDS, and Route53.
  • Deployed Docker engines in virtualized platforms for containerization of multiple apps.
  • Configured and deployed applications in AWS Linux environment in a virtual private cloud (VPC) and database subnet group for DB isolation within the Amazon RDS MySQL DB cluster, with strong experience with monitoring tools such as CloudWatch.
  • Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins along with Shell scripts to automate routine jobs.
  • Expert-level knowledge of Splunk and Kibana for data analysis, monitoring, security, and business intelligence.
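
A minimal boto3 sketch of the backup automation described above; the bucket names, RDS instance identifier, and snapshot naming scheme are placeholders.

    # Hypothetical backup automation for an S3 bucket and an RDS instance.
    from datetime import date
    import boto3

    def backup(source_bucket="app-data", backup_bucket="app-data-backup",
               db_instance="prod-mysql"):
        s3 = boto3.client("s3")
        # Copy every object from the source bucket into the backup bucket.
        for page in s3.get_paginator("list_objects_v2").paginate(Bucket=source_bucket):
            for obj in page.get("Contents", []):
                s3.copy_object(
                    Bucket=backup_bucket,
                    Key=obj["Key"],
                    CopySource={"Bucket": source_bucket, "Key": obj["Key"]},
                )
        # Take a manual RDS snapshot tagged with today's date.
        rds = boto3.client("rds")
        rds.create_db_snapshot(
            DBSnapshotIdentifier=f"{db_instance}-{date.today():%Y-%m-%d}",
            DBInstanceIdentifier=db_instance,
        )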

Environment: Python, Django, MVC, HTML5, XHTML, CSS3, JavaScript, Angular.JS, AWS, Bootstrap, AJAX, jQuery, JSON, REST, MySQL, SQL, Agile and Windows.

Confidential

Data Analyst

Responsibilities:

  • Involved in designing the ETL process to extract, transform, and load data from OLAP sources to the Teradata data warehouse.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Experienced in defining job flows and importing & exporting data into HDFS and Hive using Sqoop.
  • Experienced in managing and reviewing Hadoop log files and running Hadoop streaming jobs to process terabytes of XML-format data (see the mapper sketch after this list).
  • Load and transform large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources and supporting MapReduce programs running on the cluster.
  • Involved in loading data from the UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in ETL performance tuning and code review, analyzing the target-based commit interval for optimum session performance.
  • Involved in developing the complex strategies of ETL jobs based on dependencies.
  • Designed process-oriented UNIX scripts and ETL processes for loading data into the data warehouse.
  • Used Stored Procedures to create Database Automation scripts to create databases in different environments.
  • Generated different space reports in Teradata Manager to analyze various kinds of issues.
  • Provided ongoing support by developing processes and executing object migrations, security and access privilege setup, and active performance monitoring.
  • Expertise in using Visual Explain, Index Wizard, and Statistics Wizard to tune poorly performing queries, analyze plans, and implement the recommendations to improve performance.
  • Wrote SQL queries and matched the data with databases and reports.
  • Tuned and Enhanced Universes with SQL Queries for the Report Performance.
  • Created complex reports, including sub-reports, graphical reports, formula-based reports, and well-formatted reports according to user requirements, and used data visualization to communicate data by encoding it as visual objects.
  • Developed several Informatica Mappings, Mapplets, and Transformations to load data from relational and flat file sources into the data mart.
  • Was responsible for troubleshooting, identifying, and resolving data problems; worked with analysts to determine data requirements and identify data sources; and provided estimates for task duration.
  • Involved in unit testing, systems testing, integrated testing, Data validation, and user acceptance testing.
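
A minimal Hadoop Streaming mapper sketch in Python for the XML processing mentioned above; the one-record-per-line layout and the eventType field are assumptions. It would be paired with a reducer that sums the emitted counts and submitted via the hadoop-streaming jar.

    #!/usr/bin/env python
    # Hypothetical streaming mapper: reads one XML record per input line and
    # emits (event_type, 1) pairs for a companion reducer to sum per key.
    import sys
    import xml.etree.ElementTree as ET

    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            record = ET.fromstring(line)  # one XML record per line (assumed layout)
        except ET.ParseError:
            continue                      # skip malformed records
        event_type = record.findtext("eventType", default="unknown")
        sys.stdout.write(event_type + "\t1\n")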

Environment: Hadoop MapReduce, HDFS, Teradata, SSIS, SSRS, Oracle, MS SQL Profiler, JavaScript, XML, Erwin, MS Office, Tableau Desktop, Tableau Server
