Big Data Engineer Resume

TX

SUMMARY

  • Around 10 years of extensive experience in Analysis, Design, Development and Implementation as a Big Data Engineer.
  • Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
  • Hands-on experience in Azure development, including Azure Storage, Azure SQL Database, Azure AD, Azure Data Lake, Azure Search, and Notification Hubs.
  • Extensive experience in analyzing data using Big Data ecosystems including HDFS, MapReduce, Hive, and Pig.
  • Experience in identifying and resolving Snowflake database and data issues.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie (for Hive and Pig jobs).
  • Extensive Python scripting experience for Scheduling and Process Automation.
  • Experience in development and design of various scalable systems using Hadoop technologies in various environments.
  • Good Knowledge in SQL, PL/SQL and Python coding.
  • Good working knowledge of data modeling tools such as Erwin, Power Designer, and ER/Studio.
  • Experience in understanding the security requirements for Hadoop.
  • Excellent knowledge in migrating servers, databases, and applications from on-premises to AWS and Google Cloud Platform.
  • Experience in data transformations using MapReduce and Hive for different file formats.
  • Created, documented, and maintained logical and physical database models in compliance with enterprise standards, and maintained corporate metadata definitions for enterprise data stores within a metadata repository.
  • Experience in writing SQL queries and optimizing queries in Oracle, DB2, Teradata, and Big Data platforms.
  • Good working knowledge of NoSQL databases such as HBase.
  • Extensive knowledge of developing Spark SQL jobs using DataFrames.
  • Experience developing on-premises and real-time processes.
  • Good experience in designing and implementing OLTP and OLAP data models.
  • Good working experience on using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Sound knowledge in Data Quality & Data Governance practices & processes.
  • Excellent understanding of best practices of Enterprise Data Warehouse and involved in Full life cycle development of Data Warehousing.
  • Expertise in database programming (SQL, PL/SQL) on DB2, database tuning, and query optimization.
  • Excellent knowledge in data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
  • Extensive experience in development of T-SQL, DML, DDL, DTS, Stored Procedures, Triggers, Sequences, Functions and Packages.
  • Good working knowledge on Snowflake and Teradata databases.
  • Experience in designing, developing, and scheduling reports/dashboards using Tableau.
  • Experience in data transformation and data mapping from source to target database schemas, as well as data cleansing.
  • Experience with Integration Services (SSIS), Reporting Services (SSRS), and Analysis Services (SSAS).
  • Strong experience in using MS Excel and MS Access to load and analyze data based on business needs.
  • Excellent Interpersonal and communication skills, efficient time management and organization skills, ability to handle multiple tasks and work well in a team environment.

TECHNICAL SKILLS

Big Data tools: Hadoop 3.0, HDFS, Hive 2.3, Kafka 1.1, Scala, Oozie 4.3, Pig 0.17, HBase 1.2, Sqoop 1.4, AWS

Data Modeling Tools: Erwin 9.8/9.7, ER/Studio V17, Power Designer

Cloud Services: AWS, Amazon Redshift, Azure SQL, Azure Synapse, Azure Data Lake, and GCP.

Programming Languages: SQL, T-SQL, UNIX shell scripting, PL/SQL.

Database Tools: Oracle 12c/11g, Teradata 15/14, MDM.

ETL Tools: SSIS, Informatica v10, and Talend.

Project Execution Methodologies: JAD, Agile, SDLC, Waterfall, and RAD

Reporting tools: SQL Server Reporting Services (SSRS), Tableau, Crystal Reports, Strategy, Business Objects

Operating Systems: Confidential Windows 10/8/7, UNIX

PROFESSIONAL EXPERIENCE

Confidential, TX

Big Data Engineer

Responsibilities:

  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce, HBase, and Hive.
  • Worked on Software Development Life Cycle (SDLC), testing methodologies, resource management and scheduling of tasks.
  • Developed data pipeline using Pig, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Worked on Spark using Python and Spark SQL for faster testing and processing of data.
  • Worked with Azure Blob and Data Lake Storage and loaded data into Azure Synapse Analytics (SQL DW).
  • Involved in running Hadoop streaming jobs to process terabytes of data.
  • Developed stored procedures and views in Snowflake and used them in Talend for loading dimensions and facts.
  • Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
  • Developed custom MapReduce programs for data analysis and data cleaning using Pig Latin scripts.
  • Extracted data from multiple databases using Sqoop import queries and ingested it into Hive tables.
  • Evaluated and improved application performance with Spark.
  • Performed land process to load data into landing tables of MDM Hub using external batch processing for initial data load in hub store.
  • Designed a technical solution for real-time analytics using Kafka and HBase.
  • Created post-commit and pre-push hooks for Git repositories using Python (a sketch of such a hook follows this list).
  • Extensively used SQL*Loader to load data from flat files into database tables in Oracle.
  • Implemented Spark SQL to update queries based on the business requirements.
  • Acquired good understanding and experience of NoSQL databases such as HBase.
  • Used Git for version control, JIRA for project tracking and Jenkins for continuous integration.
  • Wrote Scala and Python scripts as required for the Spark engine.
  • Created and configured an Azure Cosmos DB trigger in Azure Functions, which invokes the function whenever changes are made to the Azure Cosmos DB container (a sketch follows this list).
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
  • Used Erwin's Model Mart for effective model management, sharing, dividing, and reusing model information and designs for productivity improvement.
  • Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance (see the PySpark SQL sketch after this list).
  • Implemented Oozie workflow engine to run multiple Hive and Python jobs.
  • Developed JSON scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the Cosmos activity.
  • Involved in the process of data acquisition, data pre-processing and data exploration of telecommunication project in Scala.
  • Created data access modules in python.
  • Provisioned Azure Data Lake Store and Azure Data Lake Analytics, and leveraged U-SQL to write federated queries across data stored in multiple Azure services.
  • Migrated the Snowflake database to Microsoft Azure and updated the connection strings as required.
  • Extracted, transformed, and loaded data from source systems into Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL DW) and processed the data in Azure Databricks.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
  • Architected and implemented medium- to large-scale BI solutions on Azure using Azure data platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Wrote MapReduce jobs using Pig Latin and optimized the existing Hive and Pig scripts.
  • Designed and developed standalone data migration applications to retrieve and populate data from Azure Table/Blob storage to on-premises SQL Server instances.
  • Created custom scripts for adding indexes to SQL Azure Tables.
  • Imported the complete data from RDBMS to HDFS cluster using Sqoop.
  • Created pipelines and scheduled activities using Azure Data Factory.
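
A pre-push hook of the kind mentioned above could be sketched roughly as below, saved as .git/hooks/pre-push and made executable; the branch-protection rule it enforces is a hypothetical example, not a rule from this engagement.

```python
#!/usr/bin/env python3
# Minimal sketch of a Git pre-push hook in Python (illustrative policy only).
import sys

PROTECTED = {"refs/heads/master", "refs/heads/release"}

def main() -> int:
    # Git feeds one line per ref being pushed:
    #   <local ref> <local sha> <remote ref> <remote sha>
    for line in sys.stdin:
        parts = line.split()
        if len(parts) == 4 and parts[2] in PROTECTED:
            print(f"Direct pushes to {parts[2]} are blocked; open a pull request instead.")
            return 1  # a non-zero exit aborts the push
    return 0

if __name__ == "__main__":
    sys.exit(main())
```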
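
As a rough illustration of the Cosmos DB trigger mentioned above, the following is a minimal sketch of an Azure Functions handler in the Python v1 programming model; the database, container, and the paired function.json binding are assumptions for illustration, not details from this project.

```python
# __init__.py -- Azure Function invoked by the Cosmos DB change feed.
import logging

import azure.functions as func


def main(documents: func.DocumentList) -> None:
    """Runs whenever documents are inserted or updated in the monitored
    Cosmos DB container (wired up via a cosmosDBTrigger binding)."""
    if documents:
        logging.info("Change feed delivered %d document(s)", len(documents))
        # Each item is a dict-like Document of the changed record.
        logging.info("First changed document id: %s", documents[0]["id"])
```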
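
The PySpark SQL work referenced above might look roughly like the sketch below, which reads a Hive table through Spark SQL and writes an aggregated result back; the database, table, and column names are purely illustrative.

```python
# Minimal PySpark SQL sketch: run a business query against Hive and persist it.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("orders-aggregation")
    .enableHiveSupport()          # lets Spark SQL read/write Hive tables
    .getOrCreate()
)

# Equivalent of the original SQL script, executed by the Spark SQL engine.
daily_totals = spark.sql("""
    SELECT order_date, order_status, COUNT(*) AS order_count
    FROM sales_db.orders
    WHERE order_status <> 'CANCELED'
    GROUP BY order_date, order_status
""")

# Persist the result as a managed Hive table for downstream reporting.
daily_totals.write.mode("overwrite").saveAsTable("sales_db.daily_order_totals")
```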

Environment: Big Data 3.0, Hadoop 3.0, MapReduce, Azure Databricks, HBase 1.2, Hive 2.3, Pig 0.17, Sqoop 1.4, Spark SQL, Python, Azure Blob Storage, Azure Data Lake Storage, PySpark, MDM, Kafka 1.1, Git, SQL, Oracle 12c, JIRA, NoSQL, Cosmos DB, Erwin 9.8, Oozie 4.3, Scala.

Confidential, St. Louis, MO

Data Engineer

Responsibilities:

  • Worked on big data technologies: Hive SQL, Sqoop, Hadoop, and MapReduce.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Used Stackdriver Monitoring in GCP to check alerts for the applications running on the Google Cloud Platform.
  • Created Hive tables on top of HBase using Storage Handler for effective OLAP analysis.
  • Built a real-time pipeline for streaming data using Kafka and Spark Streaming (see the streaming sketch after this list).
  • Worked on reading multiple data formats on HDFS using Python.
  • Involved in converting Hive queries into Spark transformations using APIs such as Spark SQL and DataFrames in Python.
  • Experience working with Azure Blob Storage, Azure Data Lake, Azure Data Factory, Azure SQL, Azure SQL Data Warehouse, Azure Analytics, PolyBase, Azure HDInsight, and Azure Databricks.
  • Transformed data by running a Python activity in Azure Databricks.
  • Extracted, transformed, and loaded data from source systems into Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Responsible for estimating the cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Developed Sqoop scripts to import data from the Oracle database and handled incremental loading of the point-of-sale tables (a sketch follows this list).
  • Implemented logical and physical relational databases and maintained database objects in the data model using Erwin.
  • Worked on partitioning Hive tables and running the scripts in parallel to reduce run-time of the scripts.
  • Worked on creating end-to-end data pipeline orchestration using Oozie.
  • Used UDFs to implement business logic in Hadoop, using Hive to read, write, and query the Hadoop data in HBase.
  • Used Google Cloud Platform (GCP) services like Compute Engine, Cloud Functions, and Cloud Storage.
  • Provided guidance to the development team working on PySpark as the ETL platform.
  • Worked on Sqoop scripts to transfer data from RDBMS to the Hadoop environment.
  • Focused on performance tuning of Hive/Spark SQL for faster results.
  • Extensively integrated the Hive warehouse with HBase.
  • Developed a source-to-target matrix with ETL transformation logic for the ETL team.
  • Worked on Google Cloud Platform Services like Vision API, Instances.
  • Wrote MapReduce code in Python to eliminate certain security issues in the data.
  • Developed workflows in Oozie for business requirements to extract the data using Sqoop.
  • Designed & developed various Ad hoc reports for the finance (Oracle SQL, PL/SQL, SAS).
  • Involved in the process of data acquisition, data pre-processing and data exploration of telecommunication project in Scala.
  • Built various graphs for business decision making using Python matplotlib library.
  • Wrote Pig Scripts for sorting, joining, filtering and grouping data.
  • Converted data into different formats per user/business requirements by streaming data pipelines from various sources, including Snowflake and unstructured data.
  • Developed tools to automate some base tasks using Python.
  • Created permanent and volatile tables in Teradata and used SQL scripts to generate reports.
  • Configured, monitored, and automated Google Cloud services, and was involved in deploying the content cloud platform using Google Compute Engine and Google Storage buckets (see the Cloud Storage sketch after this list).
  • Analyzed all existing SSIS packages, SQL Server objects & new functional specs.
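
The Kafka/Spark Streaming pipeline noted above might be sketched roughly as below with PySpark Structured Streaming; broker addresses, the topic name, and output/checkpoint paths are placeholders, and the job assumes the spark-sql-kafka connector package is available.

```python
# Minimal sketch of a streaming ingest: Kafka topic -> Parquet files on HDFS.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

# Read the raw event stream from Kafka.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "pos-events")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to string for parsing.
parsed = events.select(col("key").cast("string"), col("value").cast("string"))

# Land the stream on HDFS in Parquet; the checkpoint directory tracks progress.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/streaming/pos_events")
    .option("checkpointLocation", "hdfs:///checkpoints/pos_events")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```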
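
The incremental Sqoop loads mentioned above could be wrapped in a small Python automation script along these lines; the JDBC connection string, table, check column, and last-value bookkeeping are placeholders, and the script simply shells out to the standard sqoop CLI.

```python
# Sketch of automating an incremental Sqoop import from Oracle (placeholder
# connection details and table names).
import subprocess

def incremental_import(last_value: str) -> None:
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB",
        "--username", "etl_user",
        "--password-file", "/user/etl/.oracle_pwd",
        "--table", "POS_SALES",
        "--target-dir", "/data/raw/pos_sales",
        "--incremental", "append",
        "--check-column", "SALE_ID",
        "--last-value", last_value,
        "--num-mappers", "4",
    ]
    # Raises CalledProcessError if the Sqoop job fails, so a scheduler can retry.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # In practice the last value would be read from a metadata table or file.
    incremental_import("1048576")
```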
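
Automation of Google Storage buckets like that described above could look roughly like this google-cloud-storage sketch; the bucket and object names are placeholders.

```python
# Minimal sketch: upload a local file to a Google Cloud Storage bucket.
from google.cloud import storage

def upload_to_gcs(bucket_name: str, local_path: str, blob_name: str) -> None:
    """Credentials come from GOOGLE_APPLICATION_CREDENTIALS or the runtime."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    bucket.blob(blob_name).upload_from_filename(local_path)

if __name__ == "__main__":
    upload_to_gcs("content-platform-landing", "daily_extract.csv",
                  "landing/2020-01-01/daily_extract.csv")
```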

Environment: Big Data 3.0, Hive 2.3, SQL, Sqoop 1.4, PySpark, Hadoop 3.0, Azure Databricks, MapReduce, Agile, GCP, OLAP, HBase 1.2, Kafka 1.1, Spark, Python, Erwin 9.8, Oozie 4.3, Oracle 12c, ETL, PL/SQL, SAS, Teradata, SSIS.

Confidential, Plymouth, PA

Data Engineer / Data Modeler

Responsibilities:

  • Participated in JAD sessions for design optimizations related to data structures as well as ETL processes.
  • Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
  • Established uniform Master Data Dictionary and Mapping rules for metadata, data mapping and lineage.
  • Extensively worked with the Embarcadero ER/Studio Repository to check out and check in models and submodels.
  • Automated AWS S3 data uploads and downloads using Python scripts (a sketch follows this list).
  • Analyzed and designed best fit logical and physical data models and relational database definitions using DB2.
  • Handled importing of data from various sources, performed transformations using Pig, and loaded data into HDFS.
  • Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
  • Designed and implemented Oracle PL/SQL store procedures, functions and packages for data manipulation and validation.
  • Implemented business logic using Python.
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables.
  • Used a forward engineering approach for designing and creating databases for the OLAP model.
  • Developed data marts for the base data in Star and Snowflake schemas.
  • Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
  • Leveraged AWS S3 as storage layer for HDFS.
  • Translated logical data models into physical database models and generated DDLs for DBAs.
  • Created complex SQL queries using views, indexes, triggers, roles, stored procedures, and user-defined functions; worked with different methods of logging in SSIS.
  • Created or modified the T-SQL queries as per the business requirements.
  • Designed and implemented a test environment on AWS.
  • Created naming standards for data attributes and metadata to track data source, load frequency, generated key values, and data dictionary.
  • Created 3NF business area data models with a de-normalized physical implementation of the data.
  • Performed extensive Data Analysis and Data Validation on different systems like DB2.
  • Wrote and executed unit, system, integration, and UAT scripts for data warehouse projects.
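
The S3 upload/download automation mentioned above might be sketched with boto3 roughly as follows; bucket names and object keys are placeholders.

```python
# Minimal sketch of S3 file transfer automation with boto3.
import boto3

s3 = boto3.client("s3")

def upload(local_path: str, bucket: str, key: str) -> None:
    """Push a local file to S3 (credentials resolved by the standard AWS chain)."""
    s3.upload_file(local_path, bucket, key)

def download(bucket: str, key: str, local_path: str) -> None:
    """Pull an object from S3 to the local file system."""
    s3.download_file(bucket, key, local_path)

if __name__ == "__main__":
    upload("extract.csv", "dw-staging-bucket", "inbound/extract.csv")
    download("dw-staging-bucket", "inbound/extract.csv", "extract_copy.csv")
```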

Environment: ER/Studio, Python, AWS S3, DB2, HDFS, SQL, Oracle, PL/SQL, ETL, Sqoop, Hive, OLAP, OLTP, T-SQL, Pig, Amazon Redshift, Metadata, UAT, DBA, SSIS.

Confidential - Trenton, NJ

Data Analyst/Data Engineer

Responsibilities:

  • Performed data analysis and data profiling and worked on data transformations and data quality rules.
  • Modeled new tables and added them to the existing data model using Power Designer as part of data modeling.
  • Wrote Python scripts to parse JSON documents and load the data into the database (a sketch follows this list).
  • Created customized reports using OLAP tools such as Crystal Reports for business use.
  • Worked on Informatica development to load the source DB2 database to Target Cloudant database as per the requirements gathered from multiple business departments.
  • Created the data models for OLTP and Analytical systems.
  • Created SQL tables with referential integrity and constraints, and developed queries using SQL, SQL*Plus, and PL/SQL.
  • Connected directly to Oracle and created the dimensions, hierarchies, and levels in Tableau Desktop.
  • Created Logical and Physical EDW models and data marts.
  • Designed and maintained Metadata documents on team site for unison in design and implementation of Data Model.
  • Gathered and translated business requirements into detailed, production-level technical specifications detailing new features and enhancements to existing business functionality.
  • Involved in writing T-SQL and working on SSIS, SSRS, SSAS, data cleansing, data scrubbing, and data migration.
  • Involved in data profiling to integrate the data from different sources.
  • Developed Star and Snowflake schema-based dimensional models to build the data warehouse.
  • Performed data mapping and data design (data modeling) to integrate the data across multiple databases into the EDW.
  • Used advanced Microsoft Excel to create pivot tables and pivot reporting, as well as the VLOOKUP function.
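
The JSON-parsing load scripts mentioned above might look roughly like the sketch below; SQLite stands in here for the project database (DB2/Cloudant), and the file, table, and field names are placeholders.

```python
# Minimal sketch: parse a JSON file and load its records into a database table.
import json
import sqlite3

def load_json(json_path: str, db_path: str) -> None:
    with open(json_path) as fh:
        records = json.load(fh)  # expects a JSON array of objects

    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS customers
                    (customer_id TEXT PRIMARY KEY, name TEXT, city TEXT)""")
    conn.executemany(
        "INSERT OR REPLACE INTO customers VALUES (:customer_id, :name, :city)",
        records,
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load_json("customers.json", "staging.db")
```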

Environment: Power Designer, Python, OLAP, OLTP, DB2, SQL, Informatica, PL/SQL, T-SQL, SSIS, SSRS, SSAS, Tableau, MS Excel.

Confidential, Houston, TX

Data Analyst

Responsibilities:

  • Collected, analyzed, and interpreted complex data for reporting and performance trend analysis.
  • Involved in extensive data validation by writing several complex SQL queries.
  • Involved in back-end testing and worked with data quality issues.
  • Handled performance requirements for databases in OLTP and OLAP models.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.
  • Used Informatica features to implement Type I & II changes in slowly changing dimension tables.
  • Developed various UNIX Shell Scripts to generate ad hoc reports.
  • Scheduled Jobs for executing the stored SSIS packages which were developed to update the database on daily basis.
  • Developed Stored Procedures and complex Packages using PL/SQL.
  • Generated Reports using Global Variables, Expressions and Functions for the SSRS reports.
  • Managed all indexing, debugging and query optimization techniques for performance tuning using T-SQL.
  • Defined best practices for Tableau report development and effectively used the data blending feature in Tableau.
  • Extensively performed gap analysis and impact analysis.
  • Developed Triggers, Functions, Cursors, Materialized Views and Procedures.
  • Performed ad-hoc requests for clients using Excel and customized SQL queries to extract information.

Environment: SQL, OLAP, OLTP, Informatica, SSIS, UNIX, PL/SQL, SSRS, T-SQL, Tableau, MS Excel.
