Senior Data Engineer Resume
Indianapolis, IN
SUMMARY
- Good understanding of and working experience with the Apigee Edge platform, RESTful web services over HTTP, and SOAP web services.
- 5+ years of experience in SQL/T-SQL.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Possess strong knowledge of database management, including writing complex SQL queries and stored procedures, database tuning, query optimization, and resolving key performance issues.
- Created complex custom views using Toad and SQL Developer to build data sets for Power BI dashboards.
- Creating powerful calculations, key metrics, and key performance indicators (KPIs).
- Set up a Jenkins master and multiple slaves for the entire team as a CI tool as part of the continuous development and deployment process.
- Good understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Driver Node, Worker Node, Stages, Executors and Tasks.
- Good understanding of Big Data Hadoop and YARN architecture, along with various Hadoop daemons such as Job Tracker, Task Tracker, Name Node, Data Node, and Resource/Cluster Manager, as well as Kafka (distributed stream processing).
- Experience in Database Design and development with Business Intelligence using SQL Server 2014/2016, Integration Services (SSIS), DTS Packages, SQL Server Analysis Services (SSAS), DAX, OLAP Cubes, Star Schema and Snowflake Schema.
- Excellent communication skills and a strong work ethic; a proactive team player with a positive attitude.
- Domain knowledge of finance, logistics, and health insurance.
- Strong skills in visualization tools: Power BI and Microsoft Excel (formulas, pivot tables, charts, and DAX commands).
- Expertise in various phases of the project life cycle (design, analysis, implementation, and testing).
- Led database administration and database performance tuning efforts to provide scalability and accessibility in a timely fashion, provide 24/7 availability of data, and solve end-user reporting and accessibility problems.
TECHNICAL SKILLS
Big Data Space: Hadoop, Hive, HBase, YARN, Flume, Impala, Oozie, Pig, Zookeeper, Spark, Elasticsearch, MongoDB, Snappy, AWS, MapReduce, Sqoop, Avro, Kafka, Azure Data Lake, Redshift
Databases: Oracle, DB2, MS Access, MySQL, T-SQL, Spark SQL, U-SQL (Azure Data Lake Analytics), MS SQL Server, Teradata, NoSQL
IDE: Eclipse, IntelliJ IDEA, NetBeans, JDeveloper
Operating Systems: UNIX, Linux, macOS, Windows variants
Web Technologies: HTML, CSS, JavaScript, AJAX, JSP, XML, DOM, XSLT
Programming Languages: Python, Scala, SQL, Shell Scripting, Pig Latin, C/C++
Hadoop Distributions: Hortonworks, MapR, Cloudera (CDH3, CDH4, and CDH5), Amazon EMR
Analytics Tools: Tableau, Microsoft SSIS, SSAS, SSRS
Version controls: GIT, CVS, SVN
RDBMS: Teradata, MS SQL Server, MySQL, DB2, SAP BW, Oracle, SQL Azure
Reporting: Tableau, Power BI, PowerApps, Microsoft Excel (formulas, pivot tables, charts, and DAX commands), Toad
ETL Tools: Talend, Informatica, AB Initio
PROFESSIONAL EXPERIENCE
Senior Data Engineer
Confidential, Indianapolis, IN
Responsibilities:
- Analyze, design, and build modern data solutions using Azure PaaS services to support visualization of data. Understand the current production state of the application and determine the impact of new implementations on existing business processes.
- Migrated data from various sources to Azure.
- Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Created data pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, and to write data back to the sources.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the PySpark sketch after this list).
- Set up continuous integration with Jenkins, making use of its wide range of plugins to build smooth, developer-friendly workflows.
- Worked on Visual Studio, SQL Server, DAX queries, PowerApps Studio, Power BI Desktop, and Azure Analysis Services.
- Gather business requirements and translate them to technical requirements.
- Created Cosmos scripts that pull data from upstream structured streams and include the business logic and transformations needed to meet requirements.
- Modified existing Cosmos scripts to incorporate new business changes and validated them.
- Responsible for estimating cluster size, monitoring, and troubleshooting of the Spark Databricks cluster.
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Wrote UDFs in Scala and PySpark to meet specific business requirements (see the UDF sketch after this list).
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using a SQL activity.
- Hands-on experience developing SQL scripts for automation purposes.
- Designed and generated OLAP cubes. Built reports, dashboards, and scorecards on OLAP cubes.
- Modified and maintained cube dimensions and hierarchies and added aggregations to the cube.
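A minimal PySpark sketch of the multi-format extraction, transformation, and aggregation pattern referenced above; the mount paths, column names, and the usage-events feed are hypothetical stand-ins, not details from the actual engagement.

```python
# Hypothetical Databricks-style PySpark job: read multiple file formats,
# normalize them, and aggregate customer usage. Paths and columns are
# illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Extract: ingest the same logical feed from different file formats.
csv_df = spark.read.option("header", "true").csv("/mnt/raw/events_csv/")
json_df = spark.read.json("/mnt/raw/events_json/")
parquet_df = spark.read.parquet("/mnt/raw/events_parquet/")

# Transform: align schemas so the sources can be unioned.
cols = ["customer_id", "event_type", "event_ts"]
events = (
    csv_df.select(*cols)
    .unionByName(json_df.select(*cols))
    .unionByName(parquet_df.select(*cols))
    .withColumn("event_date", F.to_date("event_ts"))
)

# Aggregate: daily usage counts per customer, the raw material for
# usage-pattern analysis.
daily_usage = events.groupBy("customer_id", "event_date").agg(
    F.count("*").alias("event_count"),
    F.countDistinct("event_type").alias("distinct_event_types"),
)
daily_usage.write.mode("overwrite").parquet("/mnt/curated/daily_usage/")
```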
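A short sketch of a PySpark UDF of the kind mentioned above; the tiering rule and column names are invented for illustration.

```python
# Hypothetical PySpark UDF: bucket usage counts into tiers. The business
# rule shown is an invented example, not an actual project requirement.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

def usage_tier(event_count):
    # Plain Python logic wrapped as a UDF; built-in functions are preferred
    # where possible, since Python UDFs bypass Catalyst optimizations.
    if event_count is None:
        return "unknown"
    if event_count >= 100:
        return "heavy"
    return "light" if event_count < 10 else "regular"

usage_tier_udf = udf(usage_tier, StringType())

df = spark.createDataFrame(
    [(1, 5), (2, 42), (3, 250)], ["customer_id", "event_count"]
)
df.withColumn("tier", usage_tier_udf("event_count")).show()
```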
Environment: Azure Data Lake, Data Factory, Azure Databricks, Azure SQL Database, Azure SQL Data Warehouse, Tableau, Power BI, Spark SQL
Data Engineer
Confidential, Detroit, Michigan
Responsibilities:
- Monitored full/incremental/daily loads and supported all scheduled ETL jobs for batch processing.
- Designed and architected scalable data processing and analytics solutions, covering technical feasibility, integration, and development for big data storage, processing, and consumption on Azure: analytics, big data (Hadoop, Spark), business intelligence (Reporting Services, Power BI), NoSQL, HDInsight, Stream Analytics, Data Factory, Event Hubs, and Notification Hubs.
- Owned the solution design, technical architecture, and pricing for one of the largest government analytics initiatives, with an approved budget of $363 million. Worked with teams from Confidential Product Engineering, SAS Institute, Hortonworks, and system integrators to assemble and manage a consortium of vendors that pitched the offering to the government. The consortium was declared both T1 (technically most competitive) and L1 (price-wise most competitive), and the tender was awarded to it in January 2017.
- Owned the Azure technical customer engagement, including architectural design sessions and implementation of projects using big data use cases, Hadoop-based design patterns, and real-time/stream analytics.
- Designed an end-to-end analytical landscape of Power BI dashboards connected to a backend SQL Server 2016 system on Azure, enabling a government infrastructure agency to analyze and detect fraud in over Rs 4.6 billion worth of annual tenders. This included conceptualizing the architecture from the government RFP, sizing the Azure cloud infrastructure, designing and setting up Power BI connectivity, and building the actual tender dashboards for the paid PoC.
- Designed and built a data discovery platform for a large system integrator using Azure HDInsight components. Used Azure Data Factory and Data Catalog to ingest and maintain data sources. Security on HDInsight was enabled using Azure Active Directory.
- Designed an end-to-end Azure cloud-based analytics dashboard for a state government, showing real-time updates for its 2016 state assembly elections. The solution used Power BI, Enterprise Gateway, and Azure SQL Server.
- Designed machine learning models (classifiers) for the state of Andhra Pradesh to predict dropouts in its primary and secondary schooling system. The models ingested population demographics, students' historical performance/dropout rates, and family economics to produce a dropout prediction and probability (see the sketch after this list).
- Utilized U-SQL for data analytics and ingestion of raw data in Azure Data Lake and Blob storage.
- Performed thorough data analysis for the purpose of overhauling the database using ANSI-SQL.
- Responsible for estimating cluster size, monitoring, and troubleshooting of the Spark Databricks cluster.
- Hands-on experience developing SQL scripts for automation purposes.
- Involved in SQL Server configuration, administration, implementation, and troubleshooting for business workloads. Migrated existing self-service and ad hoc reports to Power BI.
- Developed custom calculated measures using DAX in Power BI to satisfy business requirements.
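A hedged sketch of the dropout-prediction modeling described above, using Spark MLlib's logistic regression; the feature columns, label name, and input path are assumptions for illustration only.

```python
# Hypothetical dropout classifier: demographic, performance, and
# household-economics features with a 0/1 historical dropout label.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("dropout-model").getOrCreate()

# Assumed input: one row per student with engineered features and label.
students = spark.read.parquet("/mnt/curated/student_features/")

assembler = VectorAssembler(
    inputCols=["attendance_rate", "avg_grade", "household_income", "siblings"],
    outputCol="features",
)
train, test = assembler.transform(students).randomSplit([0.8, 0.2], seed=42)

# Logistic regression yields both a predicted class and a probability,
# matching the "prediction and probability" framing in the bullet above.
model = LogisticRegression(labelCol="dropout", featuresCol="features").fit(train)
model.transform(test).select("prediction", "probability").show(5)
```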
Environment: ADS, Azure Blob Storage, Azure Data Lake, Data Factory, Azure Databricks, Azure SQL Database, Azure SQL Data Warehouse, Netezza, Python, Spark SQL, Data Migration, SQL Server programming, Power BI, T-SQL, Talend
Data Engineer
Confidential, Chicago, IL
Responsibilities:
- Developed new Spark SQL ETL logic in the big data environment for the migration and availability of the facts and dimensions used for analytics.
- Developed Spark SQL applications for the big data migration from Teradata to Hadoop, reducing memory utilization in Teradata analytics.
- Gathered requirements and led the team in developing the big data environment and migrating the Spark ETL logic.
- Developed Spark SQL logic that mimics the Teradata ETL logic, pointing the output delta back to newly created Hive tables as well as the existing Teradata dimension, fact, and aggregate tables (see the sketch after this list).
- Ensured data matched between the Teradata and Spark SQL logic.
- Created views on top of the Hive tables and provided them to customers for analytics.
- Analyzed the Hadoop cluster and various big data analytics tools, including Pig, HBase, and Sqoop.
- Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile devices, and network devices using Apache Flume, and stored the data in HDFS for analysis.
- Strong knowledge of creating and monitoring clusters on the Hortonworks Data Platform.
- Developed Unix shell scripts to load large numbers of files into HDFS from the Linux file system.
- Involved in creating Hive tables, loading data, writing Hive queries, and generating partitions and buckets for optimization.
- Designed, developed, tested, implemented, and supported data warehousing ETL using Ab Initio and Hadoop technologies.
- Involved in performance tuning of the ETL process by addressing various performance issues at the extraction and transformation stages.
- Used Kubernetes to deploy, load-balance, scale, and manage Docker containers with multiple namespaced versions.
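A minimal sketch of the Teradata-to-Hive migration pattern described above: re-expressing a Teradata-style aggregation in Spark SQL and writing the delta back to a Hive table. The database, table, and column names are hypothetical.

```python
# Hypothetical Spark SQL port of a Teradata ETL aggregate: daily sales
# rolled up per store, appended into an existing Hive fact table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("teradata-to-hive-etl")
    .enableHiveSupport()
    .getOrCreate()
)

# Spark SQL logic mirroring the original Teradata aggregation, restricted
# to the current load date.
delta = spark.sql("""
    SELECT store_id,
           txn_date,
           SUM(amount) AS total_sales,
           COUNT(*)    AS txn_count
    FROM   staging.sales_txn
    WHERE  txn_date = DATE '2020-01-01'
    GROUP  BY store_id, txn_date
""")

# Append the delta into the Hive fact table consumed by downstream views.
delta.write.mode("append").insertInto("warehouse.fact_daily_sales")
```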
Environment: Snowflake, Hadoop, HDFS, Python, Unix, Shell Scripting, Teradata, Spark SQL, AWS, Talend.
ETL Developer
Confidential
Responsibilities:
- Responsible for creating ETL mapping documents to define data flow from source system to target database.
- Worked with source system Subject Matter Experts (SMEs) to ensure that the extracts were properly mapped.
- Worked on backend SSAS cube development, which served as the data source for Tableau.
- Identified Facts, Dimensions, Levels / Hierarchies of Dimensional Modelling.
- Generated multiple enterprise reports using SSRS from the SQL Server database (OLTP) and SQL Server Analysis Services database (OLAP), including reporting features such as group by, drill-downs, drill-throughs, sub-reports, and navigation (hyperlink) reports.
- Created parameterized reports (SSRS 2008/2012) with report criteria to minimize report execution time and limit the number of records returned.
- Involved in development and deployment of SSAS cubes; monitored full and incremental loads and supported any issues.
- Implemented dashboards and scorecards using Microsoft PerformancePoint Server, integrated with SharePoint.
- Provided operational support to modify existing tabular SSAS models to satisfy new business requirements.
- Advanced knowledge of Excel, including pivot tables, Power Pivot, and complex formulas.
- Designed SSIS packages to transfer data from flat files and Excel to SQL Server using Business Intelligence Development Studio.
- Extensively used SSIS transformations such as Lookup, Derived Column, Data Conversion, Aggregate, Conditional Split, SQL Task, Script Task, and Send Mail Task.
- Performed data cleansing, enrichment, mapping tasks and automated data validation processes to ensure meaningful and accurate data was reported efficiently.
- Created Actuals vs. Goals dashboards for Fundraising and Biomed, and an Organization 360 dashboard, with Power BI.
- Developed complex calculated measures using Data Analysis Expression language (DAX).
- Used SQL queries at the custom SQL level to pull data into Tableau Desktop, and validated the results by running the same queries in SQL Developer and Google BigQuery (see the sketch after this list).
- Attended daily stand-up meetings to update the scrum master on what was achieved, what was in progress, and any impediments.
- Participated in the retrospective meetings and next sprint planning meetings.
- Involved in onsite-offshore coordination to ensure the completeness of deliverables.
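A small sketch of the BigQuery validation step mentioned above, using the google-cloud-bigquery client; the dataset and table names are hypothetical, and authenticated application-default credentials are assumed.

```python
# Run the same query used in Tableau's custom SQL against BigQuery and
# inspect the aggregates for comparison. Requires google-cloud-bigquery.
from google.cloud import bigquery

client = bigquery.Client()

# Mirrors what the Tableau custom SQL connection would issue
# (hypothetical dataset and table).
sql = """
    SELECT donor_segment, COUNT(*) AS donor_count
    FROM `analytics.fundraising_donors`
    GROUP BY donor_segment
"""

for row in client.query(sql).result():
    print(row.donor_segment, row.donor_count)
```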
Environment: Tableau Desktop 8.1/8.2, Tableau Server 8.1/8.2, ETL, Talend, Microsoft SQL Server, Google BigQuery, SSIS, SSAS, Salesforce, DAX, Analysis Services, T-SQL
Data Analyst
Confidential
Responsibilities:
- Involved in all phases of the Software Development Life Cycle (SDLC) for the application.
- Involved in defining the source to target data mappings and data definitions.
- Created Report Models for ad-hoc reporting and analysis using SSAS.
- Performed small enhancements (data cleansing/data quality) in SSIS.
- Worked with business users to gather in-depth business requirements and keep track of changes in their requirements.
- Extensively used SQL for accessing and manipulating database systems, including writing complex queries, joins, stored procedures, user-defined functions, views, and indexes using SQL and PSQL programming.
- Worked with all types of transformations available in the Power BI Query Editor.
- Worked on all kinds of reports, including yearly, quarterly, monthly, and daily.
- Developed Excel Power View, Power Query, and PowerPivot dashboards for data analysis.
- Pulled data into Power BI from various sources such as SQL Server, SAP BW, Oracle, and SQL Azure.
- Created stored procedures and SQL queries to pull data into the PowerPivot model (see the sketch after this list).
- Scheduled automatic refreshes in the Power BI service.
- Analyzed test results and reported necessary corrective actions to the deployment team. Aided in regression analysis.
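A brief sketch of pulling SQL Server data the way the PowerPivot model consumed it, via a stored procedure call through pyodbc; the server, database, and procedure names are hypothetical.

```python
# Hypothetical pyodbc pull of a reporting result set from SQL Server,
# the same data a PowerPivot model would otherwise import directly.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver01;DATABASE=Reporting;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Execute an illustrative reporting stored procedure and print its rows.
cursor.execute("EXEC dbo.usp_GetMonthlySales @Year = ?", 2013)
for row in cursor.fetchall():
    print(row)

conn.close()
```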
Environment: MS Office (MS Word, MS Excel, MS Power Point), MS SQL Server 2008, TSQL, SSAS, SSIS, SSRS, ETL, OLAP Systems.