
Sr. Software Engineer - Data Engineer Resume


SUMMARY

  • Overall 12+ years of IT experience, including 5 years in the Hadoop ecosystem (Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Flume, Kafka, Cassandra, Zookeeper, Oozie, Airflow, Spark, PySpark, PyHive) along with Python, GCP, and AWS, and 9+ years with Business Intelligence tools (SSRS, SSIS, SSAS), Power BI, and Tableau across development, data warehouse, test, and production environments in various business domains.
  • Experience with Hadoop distributions and related technologies for ingesting, indexing, and analyzing large amounts of data.
  • Excellent understanding of Hadoop architecture and various components such as HDFS, YARN, High Availability, Name Node, Data Node and MapReduce programming paradigm.
  • Experience in handling structured, semi-structured, and unstructured data, Big Data, and data warehousing on distributed systems.
  • Expertise in using various Hadoop infrastructure components such as MapReduce, Pig, Hive, Zookeeper, HBase, Sqoop, Oozie, Flume, and Spark for data storage and analysis.
  • Wrote ad-hoc queries for analyzing data using HiveQL.
  • Extensively worked on writing Hive queries for joining multiple tables based on business requirements.
  • Created multiple Hive tables, implemented partitioning, dynamic partitioning, and buckets in Hive for efficient data access.
  • Implemented Sqoop jobs to migrate large sets of structured and semi-structured data between HDFS and other data stores such as Hive or RDBMS.
  • Extracted data from log files and pushed it into HDFS using Flume.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, RDDs, DataFrames, Spark Streaming for real-time streaming, and Spark MLlib.
  • Transformed and retrieved data using Spark, Impala, Pig, Hive, SSIS, and MapReduce.
  • Extensively used SQL, NumPy, Pandas, Spark, and Hive for data analysis and model building.
  • Involved in configuring Spark to optimize data processing.
  • Developed Spark code using Spark SQL and DataFrames for aggregations (a minimal PySpark sketch follows this list).
  • Used NoSQL databases including HBase, MongoDB, and Cassandra.
  • Scheduled workflows using the Oozie workflow engine.
  • Experience setting up Amazon S3 buckets, access control policies, and S3 & Glacier lifecycle rules.
  • Experience deploying Hadoop applications on a persistent Elastic MapReduce (EMR) cluster through S3.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
  • Installed and configured Apache Airflow and developed Python code creating different DAGs (tasks and dependencies) for workflow management.
  • Worked with Google Cloud Platform services (Cloud Storage, BigQuery, Bigtable, Cloud SQL, Pub/Sub) in lead SQL data integration and Hadoop developer roles.
  • Expert knowledge in data modeling, both normalized for OLTP and de-normalized for OLAP.
  • Expert in generating on-demand and scheduled reports for business analysis and management decisions using SQL Server Reporting Services (SSRS), Tableau, and Power BI, with periodic reporting on daily, weekly, monthly, and quarterly schedules.
  • Created various types of charts such as heat maps, geocoded maps, symbol maps, pie charts, bar charts, tree maps, Gantt charts, circle views, line charts, area charts, scatter plots, bullet graphs, and histograms in Tableau Desktop, Power BI, and Excel to provide better data visualization.
  • Solid knowledge of Power BI and Tableau Desktop report performance optimization.
  • Created Excel reports and dashboards and performed data validation using VLOOKUP, HLOOKUP, macros, formulas, INDEX/MATCH, slicers (with Pivot Tables, GETPIVOTDATA, and dashboards), Power View/Power Map, and heat maps.
  • T-SQL development skills in creating objects such as tables, views, user-defined functions, indexes, stored procedures, CTEs, cursors, and triggers using SQL Server 2008 R2/2012/2014.
  • Strong experience in database installation, backup, restoration, linked servers, and maintenance planning.
  • Proficient in creating, configuring, and fine-tuning SSIS packages, SSRS reports, and T-SQL queries using tools such as SQL Profiler, Performance Monitor, and Database Engine Tuning Advisor.
  • Configured SSIS packages using the Package Configuration Wizard to allow packages to run in different environments.
  • Proven proficiency in data transformations such as Derived Column, Conditional Split, Aggregate, Union All, Merge Join, Lookup, Sort, and Execute SQL Task to load data into the data warehouse.
  • Experience in project management methodologies such as Waterfall, Iterative, and Agile - Scrum.
  • Excellent communication, presentation, interpersonal skills, strong troubleshooting, and organizational skills.
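
As an illustration of the Hive partitioning and Spark SQL/DataFrame aggregation work above, the following is a minimal PySpark sketch; the table and column names (sales_raw, sales, region, sale_date, amount) are hypothetical placeholders, not drawn from any actual project.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive support lets Spark create and query Hive-managed tables.
    spark = (SparkSession.builder
             .appName("sales-aggregation")
             .enableHiveSupport()
             .getOrCreate())

    # Create a Hive table partitioned by region; dynamic partitioning
    # routes each distinct region value into its own partition.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales (
            sale_date DATE,
            amount    DOUBLE
        )
        PARTITIONED BY (region STRING)
        STORED AS ORC
    """)
    spark.sql("""
        INSERT OVERWRITE TABLE sales PARTITION (region)
        SELECT sale_date, amount, region FROM sales_raw
    """)

    # DataFrame aggregation: total and average sale amount per region.
    totals = (spark.table("sales")
              .groupBy("region")
              .agg(F.sum("amount").alias("total_amount"),
                   F.avg("amount").alias("avg_amount")))
    totals.show()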

TECHNICAL SKILLS

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Flume, Kafka, Cassandra, Zookeeper, Oozie, Automic, Airflow, Spark, PySpark, PyHive.

Cloud: AWS (S3, ELB, EMR, EC2, RDS, Redshift), GCP (Dataproc, Cloud Storage, Pub/Sub, Logging)

Databases: MS SQL Server 2017/2016/2012/2008/2005, MS Access, DB2

Data Warehousing & BI: SQL Server, Business Intelligence Development Studio, SSIS, SSRS, SSAS

Reporting Tools: SQL Server Reporting Services (SSRS), Power BI, Tableau Desktop, Tableau Server, Amazon QuickSight

ETL Tools: SQL Server Integration Services (SSIS), SQL Server DTS, BIDS Helper, AWS Glue

SQL Server Tools: SQL Server Management Studio, SQL Server Query Analyzer, SQL Server Profiler

.NET technologies: VB.NET, C#.NET, ASP.NET, ADO.NET, Visual Studio .NET 2012/2010/2008/2005

Languages: C, C++, UNIX Shell Scripting, HTML, DHTML, XHTML, CSS, JavaScript, VBScript, AJAX, jQuery, AngularJS, Angular, Node.js, Python.

Version Tools: GitHub, SourceTree, Visual SourceSafe, Team Foundation Server, SVN, SmartCVS 7.1

Other Tools: MS Office Suite, Soap UI 5.0.0, Altova XMLSpy 2015, Beyond Compare 3

Operating Systems: Windows NT/98/2000/XP/Vista/8.1/10, Linux (CentOS, Ubuntu), Mac OS

PROFESSIONAL EXPERIENCE

Confidential

Sr. Software Engineer - Data Engineer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Hive (HiveQL).
  • Developed data pipelines using Python (PyHive, os module) and HiveQL.
  • Extensively worked on writing Hive queries for joining multiple tables based on business requirements.
  • Created multiple Hive tables, implemented partitioning, dynamic partitioning, and buckets in Hive for efficient data access.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Extracted files from Cassandra and MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Streamlined and expanded the Hive pipeline to accommodate additional client requirements while also optimizing pre-existing Hive queries.
  • Converted and tuned pipeline tasks from Hive to Spark SQL and DataFrames for significant performance and efficiency improvements.
  • Loaded files to Hive and HDFS from MongoDB.
  • Involved in configuring Spark to optimize data processing.
  • Responsible for using Oozie to control workflows.
  • Configured Zookeeper and worked on Hadoop High Availability with the ZooKeeper failover controller, adding support for a scalable, fault-tolerant data solution.
  • Wrote UNIX bash shell scripts as applicable.
  • Developed Spark and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Extracted, transformed, and loaded data sources to generate CSV data files using Python programming and SQL queries.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data. Created various types of data visualizations using Python and Tableau.
  • Hands-on experience on the AWS platform with EC2, S3, and EMR.
  • Experience setting up Amazon S3 buckets, access control policies, and S3 & Glacier lifecycle rules (a boto3 sketch follows this list).
  • Experience deploying Hadoop applications on a persistent Elastic MapReduce (EMR) cluster through S3.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
  • Worked with Google Cloud Platform services (Cloud Storage, BigQuery, Bigtable, Cloud SQL, Pub/Sub) in lead SQL data integration and Hadoop developer roles.
  • Installed and configured Apache Airflow and developed Python code creating different DAGs (tasks and dependencies) for workflow management (a minimal DAG sketch also follows this list).
  • Built dashboards and visualizations to convey stories using guided analytics, interactive dashboard design, and visual best practices.
  • Developed Tableau workbooks to perform year-over-year, quarter-over-quarter, YTD, QTD, and MTD types of analysis.
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
  • Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
  • Generated Tableau Dashboard with Groups, Hierarchies, quick/context/global filters, calculated fields, table calculations, parameters, bins, Aliases, Background images, Trend Lines, Reference lines, Dual axis, Combined Axis and Forecast on Tableau reports.
  • Restricted data for specific users using Row level security and User filters.
  • Created different KPIs using calculated key figures and parameters.
  • Developed Tableau data visualization using Cross tabs (Highlight tables), Heat maps, Box and Whisker charts, Scatter Plots, Geographic Map, Pie Charts, Tree Maps, Histograms, Combo charts, Bar Charts and Density Chart.
  • Developed Tableau workbooks from multiple data sources using Data Blending.
  • Cleaning and Blending multiple data sources to allow for different views on data in a single dashboard.
  • Involved in troubleshooting performance issues associated with Tableau reports.
  • Designed and developed Power BI graphical and visualization solutions with business requirement documents and plans for creating interactive dashboards.
  • Utilized Power BI (Power View) to create various analytical dashboards depicting critical KPIs such as legal case matter, billing hours, and case proceedings, with slicers and dicers enabling end users to filter the data.
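
As an illustration of the S3 & Glacier lifecycle work above, here is a minimal boto3 sketch; the bucket name, prefix, and the 90-day/365-day thresholds are hypothetical placeholder values.

    import boto3

    s3 = boto3.client("s3")

    # Transition objects under logs/ to Glacier after 90 days,
    # then expire (delete) them after one year.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-app-logs",  # hypothetical bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-then-expire",
                    "Filter": {"Prefix": "logs/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                    "Expiration": {"Days": 365},
                }
            ]
        },
    )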
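
Likewise, a minimal Apache Airflow sketch of the DAG work described above, using Airflow 2.x-style imports; the DAG id, task names, schedule, and file paths are hypothetical.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-engineering",
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="daily_hive_pipeline",
        default_args=default_args,
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Land the day's raw files in HDFS, then run the Hive transform;
        # the >> operator expresses the task dependency.
        ingest = BashOperator(
            task_id="ingest_to_hdfs",
            bash_command="hdfs dfs -put -f /data/incoming/*.csv /raw/sales/",
        )
        transform = BashOperator(
            task_id="run_hive_transform",
            bash_command="hive -f /opt/pipelines/transform_sales.hql",
        )
        ingest >> transform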

Environment: Big Data, Hadoop 2.9.2, Hive 2.3.6, Pig, Sqoop, Flume, MapR, HDFS, HBase, Cassandra, Spark, PySpark, Unix Shell, Zookeeper, Oozie, Automic, Apache Airflow, Cloudera, Python, AWS, GCP, Microsoft SQL Server, Oracle DB, Teradata, SQL, T-SQL, Tableau Desktop, Tableau Server (10.x, 9.x), Power BI Desktop, Power BI Service, SSRS.

Confidential

.Net/SQL BI/ETL Developer

Responsibilities:

  • Resolved production issues raised by the business and ensured a high level of client satisfaction during delivery.
  • Converted data from legacy systems to PolicyCenter in the form of payloads/XMLs created using different mappers developed in C#.NET.
  • Developed different tools like Supplemental Data GUI, Error Report web application and Address Standardization (location lookup service) using C#.NET, ASP.NET.
  • Developed different reports like SPORT (Supplemental Data for Production Optimized Readiness Tool), Daily Trigger report, 100-day report, PAS error report, RM error report using SSRS and Power BI.
  • Created SQL Server Integration Services (SSIS) packages to import data from heterogeneous data sources such as SQL Server and text files (CSV, XLS, and XML).
  • Extract, Transform, Load (ETL) development using SQL Server 2008/2012 and SQL Server 2008 Integration Services (SSIS).
  • Designed and developed ETL jobs using SQL Server Integration Services (SSIS) for data migration between different maintenance systems.
  • Enhanced and deployed SSIS packages from the development server to the production server.
  • Scheduled SSIS packages and jobs and checked the status of important jobs that run on a daily basis; when a job failed, investigated the issue and resolved the code causing it.
  • Configured SSIS catalogs for the Development and UAT environments, using parameters to drive all configuration changes and deployments.
  • Designed and developed databases from scratch along with the team lead, ensuring proper normalization rules were followed with correct use of primary and foreign keys in each table (a sketch follows this list).
  • Created XML using Script Task/Script Component or a .NET process and loaded the generated XML into Guidewire PolicyCenter.
  • Created and updated cubes (measures, dimensions, hierarchies, and relationships), defined calculations (calculated members, named sets, and scripts), and built perspectives using SSAS and MDX.
  • Prepared high-level architecture designs for the SSAS/SSRS reporting solutions.
  • Created many drill-through and drill-down reports using SQL Server Reporting Services (SSRS) and Power BI.
  • Developed complex SSRS reports involving sub-reports, matrix/tabular reports, charts and graphs, and custom and parameterized reports.
  • Scheduled reports and exported them in PDF format.
  • Deployed reports in Report Manager, created new data sources, tested report values, and documented test scenarios, test cases, and test results.
  • Designed SSRS reports with dynamic sorting, defining data sources and subtotals for the reports.
  • Generated reports using global variables, expressions, and functions in SSRS 2008, and modified existing reports as needed.
  • Designed, developed, and tested various Power BI and Tableau visualizations for dashboard and ad-hoc reporting solutions by connecting from different data sources and databases.
  • Fixed issues/defects reported by users by making code changes as required.
  • Identified performance bottlenecks in packages/reports and resolved them.
  • Attended daily stand-up calls and other project meetings.
  • Involved in the complete Software Development Life Cycle (SDLC) process, analyzing business requirements and understanding the functional workflow of information from source systems to destination systems.
  • Maintained documentation of data transformations for all ETL processes and managed weekly code review practices.
  • Experienced in data marts, data warehousing, Operational Data Stores (ODS), OLAP, star schema modeling, snowflake modeling, and fact and dimension tables using MS Analysis Services.
  • Involved in analyzing, designing, building, and testing OLAP cubes with SSAS 2005 and adding calculations using MDX.
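
To illustrate the normalized table design with primary and foreign keys mentioned above, here is a minimal sketch, written in Python with pyodbc for consistency with the other examples; the server, database, and table definitions (PolicyDW, dbo.Policy, dbo.Claim) are hypothetical, and the same DDL could equally be run from SSMS.

    import pyodbc

    # Hypothetical normalized pair of tables: each Claim row references
    # exactly one Policy row through a foreign key.
    DDL = """
    CREATE TABLE dbo.Policy (
        PolicyID      INT IDENTITY(1,1) PRIMARY KEY,
        PolicyNo      NVARCHAR(20) NOT NULL UNIQUE,
        EffectiveDate DATE NOT NULL
    );
    CREATE TABLE dbo.Claim (
        ClaimID   INT IDENTITY(1,1) PRIMARY KEY,
        PolicyID  INT NOT NULL REFERENCES dbo.Policy (PolicyID),
        ClaimDate DATE NOT NULL,
        Amount    DECIMAL(12, 2) NOT NULL
    );
    """

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=PolicyDW;Trusted_Connection=yes;"
    )
    with conn:              # commits on clean exit from the block
        conn.execute(DDL)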

Environment: MSBI Suite (SSIS/SSRS/SSAS) 2012/2008/2005, Microsoft SQL Server 2017/2016/2012/2008/2005, T-SQL, C#.NET, ASP.NET, VB.NET, jQuery, AngularJS, Business Intelligence Development Studio 2008/2005, Visual Studio .NET 2012/2010/2008/2005, IIS, Microsoft Visio, Power BI, Tableau, Azure, BIDS Helper 2008, Git, SourceTree, Team Foundation Server (TFS), SmartCVS 7.1, XML, SoapUI 5.0.0, Altova XMLSpy 2015, Beyond Compare 3, MS Office 2010/2007/2003, SharePoint, Query Analyzer, Query Optimizer, Performance Tuning, Agile Scrum, JIRA, Guidewire PolicyCenter functional knowledge.
