Sr Data Engineer Resume
Milwaukee, WI
SUMMARY
- Dedicated Data Engineer with 7+ years of client/server experience and distinctive competency in Business Analytics, Business Intelligence, Data Warehouses/Data Marts, Big Data, Data Science, Customer Relationship Management (CRM), Marketing Relationship Management (MRM), Sales & Marketing, and Supply Chain Management.
- Around 5 years of professional IT experience as a Data Engineer/Data Analyst building data pipelines using the Big Data Hadoop ecosystem, Spark, Hive, Sqoop, Google Cloud Storage, Python, SQL, Tableau, GitHub, and ETL tools.
- Experience migrating an on-premises Cloudera Hadoop environment to AWS EMR and Cloudera CDP Cloud.
- Experienced in implementing end-to-end Business Intelligence enterprise solutions, including BI architecture, software development, deployment, and infrastructure maintenance.
- Expert in ETL transformation and data visualization, presenting data from diverse business areas in novel and insightful ways; able to lead full-lifecycle solutions and interface with clients.
- Extensive database programming experience writing T-SQL, user-defined functions, triggers, views, temporary tables, constraints, and indexes using various DDL and DML commands.
- Experienced Data Engineer with deep knowledge of Python, SQL, and visualization tools such as Tableau and Power BI.
- Experience with ETL (Extract, Transform, Load) using Pig.
- Generated visualization dashboards and reports using Power BI and integrated various reports into single dashboards.
- Experience optimizing ETL workflows using AWS Redshift.
- Expertise in Java, Scala, and scripting languages such as Python.
- Experienced using advanced features such as Power Pivot and Power View in Power BI.
- Experienced in requirement analysis, application development, application migration, and maintenance across the Software Development Life Cycle (SDLC) using Python/Java technologies.
- Experience working with Amazon Web Services (AWS), including EMR, S3, Athena, and Redshift.
- Technical expertise in BI, data warehousing, data profiling, data cleansing, data integrity, and data security.
- Experience working with healthcare industry standards HIPAA and HL7.
- Experience writing SQL queries and stored procedures, creating databases, and using DDL and DML statements.
- Proven expertise deploying major software solutions for high-end clients to meet business requirements such as big data processing, ingestion, analytics, and cloud migration from on-premises to the AWS Cloud using AWS EMR, S3, and DynamoDB.
- Excellent knowledge of Java and SQL for application development and deployment.
- Established and maintained a number of existing BI dashboards, reports, and content packs; customized Power BI visualizations and dashboards in line with client needs.
- In-depth understanding of Hadoop architecture and its components, including HDFS, Job Tracker, Task Tracker, NameNode, DataNode, MRv1, and MRv2 (YARN).
PROFESSIONAL EXPERIENCE
Sr Data Engineer
Confidential, Milwaukee, WI
Responsibilities:
- Designed, developed, and deployed data pipelines for moving data across various systems.
- Performed pre-processing and cleansing of data using Apache Spark and loaded data to Athena/Hive tables to populate visualization dashboards such as Tableau.
- Customized the BI portal and reports using the Cognos SDK, Power BI API, and Tableau API.
- Developed ETL pipelines over S3 Parquet files on the data lake using AWS Glue.
- As part of day-to-day responsibilities, used Hadoop technologies such as Hive, Spark, Kafka, Python, shell scripting, and SQL queries.
- Developed Power Automate flows to automate tasks involving various data sources.
- Created infrastructure using Airflow and Dataflow to connect different cloud sources.
- Created DAX queries to generate computed columns in Power BI.
- Migrated reports and data models from Cognos to the Power BI platform and Tableau Server.
- Developed dashboards in Power BI per business requirements.
- Developed ETL solutions using AWS Redshift and created/loaded data warehouse tables and views.
- Used Microsoft Power BI Power Query to extract data from external sources and shape it with calculated columns as required in Excel, and created SSIS packages to load the Excel sheets.
- Developed pipelines as Airflow Directed Acyclic Graphs (DAGs) using Python (a minimal DAG sketch follows this role's Environment line).
- Used Power Automate to deliver Power BI reports and task workflows to large-scale systems securely.
- Tools utilized: MySQL, Hive, Java for application layers and HDFS, Sqoop, AWS for ETL processes.
- Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet using PySpark.
- Built ad-hoc Power BI reports based on business requirements as one of the major responsibilities.
- Managed the Power BI portal, including data refreshes and user management, and set up gateway data refresh.
- Created analytical dashboards in Tableau to assist business users with critical KPIs and facilitate strategic planning.
- Designed data marts and data warehouses using star and snowflake schemas.
- Led a team of analysts on their Power BI reporting and supervised the development.
- Performed T-SQL tuning and optimization of long-running report queries using MS SQL Profiler, Index Tuning Wizard, and SQL Query Analyzer.
- Created calculated columns and measures in Power BI and Excel using DAX queries as required.
- Involved in designing and developing dashboards in Power BI to present project KPIs.
- Achieved concurrency on a large number of tables/views using AWS Redshift concurrency scaling techniques together with Spark Streaming and Java.
- Developed tabular model infrastructure in SSAS for various lines of business and created interactive dashboards using various visualizations in Power BI Desktop.
- Generated weekly and monthly Power BI reports and dashboards on claims data per client requirements.
- Designed, developed, and tested various Power BI visualizations for dashboard and ad-hoc reporting solutions, connecting to different data sources and databases.
- Developed integration checks around the PySpark framework for processing large datasets.
- Worked on migrating the PySpark framework into AWS Glue for enhanced processing.
- Used AWS cloud technologies such as S3 for data storage, EMR for data processing, EC2 for virtual Linux instances, CloudWatch for log analysis, and Lambda functions for triggering various jobs.
- Responsible for building ETL (Extract, Transform, Load) pipelines from the data lake to different databases based on requirements.
- Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
- Built the infrastructure required for optimal extraction, transformation, and loading (ETL) of data from a wide variety of sources such as Teradata and Oracle using Spark, Python, Hive, Kafka, and other big data technologies.
- Exported data into Snowflake by creating staging tables to load data from different files in Amazon S3.
- Worked on the code transfer of a quality monitoring program from AWS EC2 to AWS.
- Analyzed service-level trends and forecasts and identified high-value customers by designing an interactive Power BI report to analyze and visualize the data.
- As part of data migration, wrote SQL scripts to reconcile data mismatches and worked on loading history data from Teradata SQL to Snowflake.
- Tools: Microsoft Power BI, SQL Server, SSIS, SSAS, SSRS, T-SQL
Environment: Hadoop, Apache Spark, Redshift, Scala, Power BI, PySpark, Python, Teradata, ETL, AWS, Snowflake, T-SQL, Hive, Aorta, Sqoop, GCP, Google Cloud Storage, BigQuery, Dataproc, Dataflow, SQL, DB2, UDP, GitHub, Tableau, Looker, etc.
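The Airflow bullet above refers to pipelines like the following. This is a minimal, illustrative sketch only; the DAG id, schedule, and script paths are hypothetical placeholders, not the actual project code.

```python
# Minimal Airflow DAG sketch (Airflow 2.x); dag_id, schedule, and script paths
# are hypothetical placeholders for the kind of pipelines described above.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="claims_ingest_daily",            # hypothetical pipeline name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract_source_data",
        bash_command="python /opt/jobs/extract_source.py",        # placeholder script
    )
    transform_load = BashOperator(
        task_id="spark_transform_and_load",
        bash_command="spark-submit /opt/jobs/transform_load.py",  # placeholder script
    )
    extract >> transform_load   # run the Spark step after extraction completes
```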
Data Engineer
Confidential, Tampa, FL
Responsibilities:
- Developed complex transformations in Spark and Hive to calculate insurance claims and policy returns for forecasting customer demand.
- Managed scheduled data transfers using Airflow.
- Designed and created ETL packages using SSIS (SQL Server Integration Services) to automate ETL processes.
- Developed SQL scripts to upload, retrieve, manipulate, and handle sensitive data (National Provider Identifier data, i.e., name, address, SSN, phone number) in Teradata, SQL Server Management Studio, and Snowflake databases for the project.
- Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
- Developed Python scripts, making extensive use of data structures (lists, dictionaries) and libraries, to load ML model output into Redis on AWS.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Implemented role-based access security using Microsoft Power BI.
- Developed Tableau and Microsoft Power BI dashboards to showcase the results of data analysis from the database.
- Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target destinations (see the PySpark sketch after this role's Environment line).
- Utilized Tableau to design dynamic dashboards and stories using data from SQL Server, Excel, and CSV data sources; deployed Power BI dashboards to Power BI mobile apps and the web.
- Worked on ETL processing consisting of data transformation, data sourcing and mapping, conversion, and loading.
- Served as an architect on a new cloud implementation, responsible for building solutions using Azure DevOps to migrate disparate RDBMS data sources into Snowflake using AWS DMS.
- Created Hive tables and loaded data into them, dynamically adding data to EDW tables and historical metrics using partitioning and bucketing.
- Developed ETL pipelines over S3 Parquet files on the data lake using AWS Glue.
- Performed T-SQL tuning and optimization of long-running report queries using MS SQL Profiler, Index Tuning Wizard, and SQL Query Analyzer.
- Developed T-SQL stored procedures, functions, and views to implement business logic for front-end applications, reports, and ETL processes.
- Designed, implemented, and coded Python user-defined functions in PySpark.
- Worked on the design and development of Unix shell scripts as part of the ETL process to automate data loading.
- Developed a RESTful API to provide access to data in HBase; involved in managing and reviewing Hadoop log files.
Environment: MapReduce, PySpark, Python, HDFS, Hive, Pig, Impala, ETL, T-SQL, Hue, Sqoop, Kafka, Oozie, YARN, Spark, AWS, Spark SQL (DataFrames and Datasets), Spark Streaming.
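The PySpark processing bullet above (read, merge, enrich, load) could look roughly like the sketch below. It is a hedged illustration only: the S3 paths, column names, and join key are hypothetical placeholders.

```python
# Minimal PySpark sketch of a read/merge/enrich/load task; paths, columns,
# and the join key are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_enrichment").getOrCreate()

claims = spark.read.parquet("s3://example-bucket/claims/")      # external source (placeholder)
policies = spark.read.parquet("s3://example-bucket/policies/")  # external source (placeholder)

enriched = (
    claims.join(policies, on="policy_id", how="left")        # merge the data sets
          .withColumn("claim_year", F.year("claim_date"))    # simple enrichment column
)

(enriched.write
         .mode("overwrite")
         .partitionBy("claim_year")
         .parquet("s3://example-bucket/enriched_claims/"))    # target destination (placeholder)
```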
Data Engineer
Confidential, Phoenix, AZ
Responsibilities:
- Responsible for laying out architectural specifications from business requirements in the global-level technical design document.
- Developed HL7 requirements and specifications.
- Explored the PySpark framework on Azure Databricks to improve the performance and optimization of existing Hadoop algorithms using PySpark Core, Spark SQL, and Spark Streaming APIs.
- Developed ETL using the Microsoft toolset (SSIS, T-SQL, MS SQL Server) to implement a Type 2 change data capture process for various dimensions.
- Created and maintained HL7 and flat file specification documents.
- Involved in developing PySpark applications to process and analyze text data from emails, complaints, forums, and clickstreams to achieve comprehensive customer care.
- Designed AWS CloudFormation templates to create VPCs, subnets, and NAT to ensure successful deployment of web applications and database templates.
- Implemented CARS (Customer Anti-Money Laundering Risk Scoring) and Transaction Monitoring (TM) model requirements and played a key role in data source requirement analysis and ETL DataStage code development and deployment.
- Extracted structured data from multiple relational data sources as DataFrames in Spark SQL on Databricks.
- Involved in data modeling and in ingesting data into Cassandra using CQL, Java APIs, and other drivers.
- Implemented CRUD operations using CQL on top of the Cassandra file system (a minimal CQL sketch follows this role's Environment line).
- Worked on ETL tasks such as pulling and pushing data from and to various servers.
- Transformed the DataFrames per the requirements of the data science team; involved in accessing Hive tables using HiveContext, transforming the data, and storing it in HBase.
- Implemented an HR data pipeline from PostgreSQL to Hadoop utilizing CDC functionality.
- Worked on a POC comparing the processing time of Impala with Spark SQL for efficient batch processing and a history/Delta Lake architecture.
- Developed Spark applications implementing various business logic using Python.
Environment: Cloudera (CDH5), Python, HDFS, PySpark, T-SQL, Impala, ETL, Kudu, AWS, Parquet, Hive 2.2.0, Kafka 1.1.0, Sqoop, Shell Scripting, Spark 2.0, Glue, Redshift, PostgreSQL, Cassandra, Linux (CentOS), MapReduce, Scala 2.10.4, Eclipse.
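The Cassandra CRUD bullet above could be exercised from Python roughly as sketched below, using the DataStax driver; the contact point, keyspace, table, and columns are hypothetical placeholders.

```python
# Minimal CQL CRUD sketch via the DataStax Python driver (cassandra-driver);
# the host, keyspace, table, and columns are hypothetical placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])             # hypothetical contact point
session = cluster.connect("hr_keyspace")     # hypothetical keyspace

# Create
session.execute(
    "INSERT INTO employees (emp_id, name) VALUES (%s, %s)", (101, "Jane Doe")
)

# Read
row = session.execute(
    "SELECT name FROM employees WHERE emp_id = %s", (101,)
).one()

# Update
session.execute(
    "UPDATE employees SET name = %s WHERE emp_id = %s", ("J. Doe", 101)
)

# Delete
session.execute("DELETE FROM employees WHERE emp_id = %s", (101,))

cluster.shutdown()
```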
Software Engineer
Confidential, Pittsburgh, PA
Responsibilities:
- Involved in requirement gathering with different business teams to understand their analytical/reporting needs.
- Redesigned and improved existing procedures, triggers, UDFs, and views using execution plans, SQL Profiler, and DTA.
- Involved in writing and debugging SQL scripts, stored procedures, and functions, and scheduling SQL workflows to automate update processes.
- Wrote T-SQL stored procedures and complex SQL queries to implement the business logic.
- Created SSIS packages to extract data from OLTP databases and to consolidate and populate data into the OLAP database.
- Utilized various transformations such as Lookup, Derived Column, Conditional Split, and Fuzzy Lookup for data standardization.
- Worked with Oracle technologies and ETL tools such as Ab Initio.
- Debugged SSIS packages with error log details, checkpoints, breakpoints, and data viewers.
- Scheduled and maintained packages daily, weekly, and monthly using SQL Server Agent in SSMS.
- Created SSRS reports to produce in-depth business analyses.
- Created slicers in Power BI for interactive reports and designs, provided filter options, and implemented business logic at the chart level using advanced set analysis and aggregation functions.
Environment: Microsoft SQL Server 2012, SSDT, SSMS, T-SQL, SSIS, SSRS, SSAS, ERwin, SharePoint, Power BI, TFS, DTA, Tableau, C#.NET, Visual Studio, CSS, HTML.