Data Engineer Resume

Malvern, PA

SUMMARY

  • 8+ years of professional experience in the IT industry, including Big Data and the Hadoop ecosystem along with data analysis, data mining, data acquisition and data validation.
  • Expert knowledge of the SDLC (Software Development Life Cycle), with involvement in all project phases.
  • Expertise in using cloud-based managed services for data warehousing in Azure (Azure Data Lake Storage, Azure Data Factory).
  • Hands-on experience with Azure services: Log Analytics, OMS, ACR, AKS, Azure VMs, App Services, Azure Storage Accounts, Data Factory, Azure APIM, Key Vaults, ARM Templates, Azure DNS, Traffic Manager, ASE, SendGrid, Azure Push Notifications.
  • Experience in dimensional modeling using Star and Snowflake schema methodologies for Data Warehouse and integration projects.
  • Excellent proficiency in Agile/Scrum and waterfall methodologies.
  • Experience in designing Star and Snowflake schemas and in database modeling using the Erwin tool.
  • Experience with the cloud-based Snowflake Data Warehouse and data stewardship on statistics.
  • Extensive experience in using ER modeling tools such as Toad, Erwin and ER/Studio.
  • Experience in integrating various data sources with multiple relational databases such as SQL Server, Teradata, MySQL, PostgreSQL and Oracle.
  • Experience in data ingestion projects to ingest data into a Data Lake from multiple source systems using Talend and Big Data tools.
  • Proficient in data governance, data quality, metadata management and master data management.
  • Experience in creating ETL specification documents, flowcharts, process workflows and data flow diagrams.
  • Experience in executing batch jobs through data streams to Spark Streaming.
  • Good knowledge in streaming applications using Apache Kafka.
  • Hands on experience in working with Tableau Desktop, Tableau Server and Tableau Reader in various versions.
  • Designed and developed horizontally scalable APIs using Python Flask.
  • Extended Hive and Pig core functionality using custom UDFs.
  • Experience in designing both time-driven and data-driven automated workflows using Oozie.
  • Expertise in SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS) and SQL Server Integration Services (SSIS).
  • Expertise in OLTP/OLAP system study, analysis and E-R modeling, and in developing database schemas such as Star schema and Snowflake schema used in relational, dimensional and multidimensional modeling.
  • In-depth knowledge of T-SQL, SSAS, SSRS, SSIS, OLAP, OLTP, BI suite, Reporting and Analytics.
  • Developed Python and PySpark programs for data analysis on MapR, Cloudera and Hortonworks Hadoop clusters.
  • Strong experience in using MS Excel and MS Access to dump data and analyze it based on business needs.
  • Extensive experience in data visualization, including producing tables, graphs and listings using various procedures and tools such as Tableau and Power BI.
  • Experienced in working with both technical and non-technical team members.
  • Effective team player with strong communication and interpersonal skills, a strong work ethic, and the ability to adapt quickly to new technologies and new business lines.

PROFESSIONAL EXPERIENCE

Confidential, Malvern, PA

Data Engineer

Responsibilities:

  • Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
  • Provided application support during the build and test phases of the SDLC for the product.
  • Used Oozie to automate end-to-end data pipelines and Oozie coordinators to schedule the workflows.
  • Performed data profiling and transformation on raw data using Pig, Python and Oracle.
  • Developed predictive analytics using Apache Spark.
  • Worked with NoSQL and large cloud-based data stores, Hadoop (HDFS) and Snowflake, to store data from heterogeneous sources in support of Customer Experience.
  • Created dimensional model for the reporting system by identifying required dimensions and facts using Erwin.
  • Worked in an environment of Azure Data Lake, Azure Data Factory, Azure Databricks, Azure SQL Database and Azure SQL Data Warehouse.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
  • Enabled detailed monitoring and configured alerts for Azure services using Application Insights and Azure Monitor.
  • Organized all resources using Azure resource groups and deployed Azure VMs and App Services using secure VNets and subnets.
  • Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services.
  • Used PySpark to access the Spark API from Python.
  • Updated existing Snowflake queries based on changing needs.
  • Analyzed and validated Snowflake data against expectations.
  • Created workflows by writing Airflow DAGs for uploading data into Snowflake from AWS S3 buckets.
  • Developed and implemented a data pipeline using Kafka and Storm to store data in HDFS.
  • Created automated Python scripts to convert data from different sources and to generate the ETL pipelines.
  • Worked with Snowflake SaaS for cost-effective data warehouse implementation on the cloud.
  • Designed 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
  • Worked in the Snowflake environment to remove redundancy and loaded real-time data from various data sources into HDFS using Kafka.
  • Created an ETL pipeline for user steps analysis and scheduled the workflow on a daily basis using Apache Airflow.
  • Generated workflows through Apache Airflow, then Apache Oozie, for scheduling the Hadoop jobs that control large data transformations.
  • Worked with Apache Spark, Kafka, Hadoop and Cassandra in an Apache Mesos environment. Also used Apache Oozie and Airflow.
  • Involved in modeling (Star Schema methodologies), building and designing the logical data model into dimensional models.
  • Designed and implemented data loading and aggregation frameworks and jobs able to handle hundreds of GBs of JSON files, using Spark, Airflow and Snowflake.
  • Created shared dimension tables, measures, hierarchies, levels, cubes and aggregations on MS OLAP/OLTP/Analysis Server (SSAS).
  • Created both clustered and non-clustered indexes to maximize T-SQL query performance.
  • Created Hive external tables, loaded data into the tables and queried the data using HQL.
  • Generated multiple enterprise reports using SSRS and Crystal Reports, and worked on Tableau.
  • Wrote MapReduce jobs to generate reports on the number of activities created per day from data dumped from multiple sources, and wrote the output back to HDFS.
  • Used Sqoop to efficiently transfer data between databases and HDFS.
  • Performed the day-to-day activities of projects, including meeting business stakeholders and understanding business requirements.
  • Extracted data from data sources and analyzed it to identify emerging trends and patterns through highly scalable and efficient analytical approaches.
  • Performed data analysis and data profiling and worked on data transformations and data quality rules.
  • Participated in the end-to-end data mining life cycle and used advanced data mining techniques to extract data from different sources, conducted studies and generated rapid plots with different visualization tools.
  • Performed feature engineering and statistical modeling using machine learning and deep learning techniques, optimized the model performance and deployed the model.
  • Developed a data warehouse model in Snowflake for over 100 datasets.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring.
  • Developed a fully automated continuous integration system using Git, MySQL and custom tools developed in Python and Bash.
  • Migrated on-premises data to Azure Storage accounts and also worked on migrating a database from Oracle to Azure SQL Server.
  • Migrated on-premises applications to Azure VMs, App Services and the AKS container platform, managing deployments of infrastructure through Azure CLI using ARM templates.
  • Ingested the data from HDFS into Azure SQL Data Warehouse by building ETL pipelines.
  • Used Erwin Data Modeler for effective model management, including sharing, dividing and reusing model information and designs to improve productivity.
  • Extensively used Python libraries for data analysis, and used efficient methods for handling null values, missing values and outliers.
  • Applied different regression techniques to predict the sales of different stores and e-commerce for various customer groups.
  • Evaluated the model using adjusted R² and RMSE, and boosted model performance through hyperparameter tuning and cross-validation to select the best parameters.
  • Developed and managed code in a GitLab repository for version control.
  • Designed and developed user interfaces and customized reports using Tableau, and designed cubes for data visualization and mobile/web presentation with parameterization and cascading.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
  • Created mappings and sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.

Environment: MS SQL, Hadoop, HDFS, Pig, Hive, MapReduce, Python libraries (NumPy, Pandas, scikit-learn, SciPy, Matplotlib), PL/SQL, SQL, Snowflake, Azure Data Factory, Azure Data Lake, Azure SQL, Azure Blob, Azure Databricks, Git, Azure Storage

Confidential, Ridgeland, MS

Software/ETL Engineer

Responsibilities:

  • Designed, developed and supported ETL processes using Informatica PowerCenter.
  • Worked on developing complex mappings based on the business requirements.
  • Coordinated with source system owners and handled day-to-day ETL progress monitoring, Data Warehouse target schema design (Star Schema) and maintenance.
  • Performance-tuned Informatica sessions for large data files by increasing block size, data cache size, sequence buffer length and target-based commit interval.
  • Performance-tuned Informatica code to eliminate bottlenecks.
  • Used Debugger to troubleshoot cause of failures, invalid results, if any, after running a process.
  • Developed mappings for other Informatica ETL based projects.
  • Used Teradata load utilities (FastLoad, MultiLoad and TPump) to load huge volumes of data into the Teradata RDBMS. Used BTEQ scripts to automate the process.
  • Participated in daily team meetings and technical code review meetings, interacted with business people to arrive at better technical solutions, and proposed an ETL strategy based on Agile methodologies.
  • Designed and developed IDQ solutions for data profiling and cleansing.
  • Used Address Doctor to validate addresses and performed exception handling, reporting and monitoring of the data.
  • Good knowledge in creating File to DB and DB to DB related interfaces.
  • Created and scheduled jobs using Control-M.
  • Created different parameter files and changed session parameters, mapping parameters and variables at run time.
  • Modified and simplified programs that used UNIX and Aginity Workbench for Netezza so that they run using only Teradata SQL, decreasing run time and reducing database space usage.
  • Analyzed performance issues within applications and databases and made recommendations for improvement.

Environment: Informatica PowerCenter 9.1, Informatica IDQ 9.5.1, Aginity Workbench, Teradata V2R12, SQL Server 2008, Netezza, Windows XP, MS Excel, Control-M, Informatica Cloud.
