Data Engineer Resume
Chicago, IL
SUMMARY:
- Over 8 years of experience in BI, Data Analytics, Data Mining, Data Migration and Data Warehousing technologies.
- Gather and analyze business requirements and translate business needs into long-term business intelligence solutions leveraging R, Python, the Microsoft SQL Server platform (SSIS, SSAS, SSRS, and Power BI), Tableau and Hadoop frameworks.
- Full-stack BI developer with the ability to multi-task and manage multiple projects in cross-functional environments.
- Knowledge of Agile project management methodologies (Scrum and Kanban).
- Strong understanding of RDBMS. Proficient in writing SQL Queries, Stored Procedures, Triggers and Views.
- Extensively worked on Normalization and De-Normalization techniques for OLTP and OLAP systems.
- Strong knowledge of data warehousing concepts such as Star and Snowflake schema modeling, Fact and Dimension tables, and Data Mart modeling.
- Excellent data wrangling skills: extract, transform and analyze data using a variety of analytical tools such as R, Python, Java, ETL tools and statistical techniques (a short pandas sketch follows this summary).
- Ability to quickly learn new programming languages.
- Knowledge of machine learning and data mining algorithms, leveraging R and Python.
- Utilized data science toolkits such as NumPy, Pandas and Matplotlib in Python; dplyr, readr, tidyr, ggplot2 and googleVis in R; and the Stream API and MapReduce in Java.
- Design, build and support APIs and services that are exposed to other internal teams using Java Spring Boot.
- Knowledge of Linux, Git commands and various agile tools such as Rally, Jira and Aha.
- Expert in creating, configuring and fine-tuning ETL workflows designed in SQL Server Integration Services (SSIS).
- Tailor solutions for Management Dashboards, Financial Reporting and Operational Business Intelligence.
- Experience in implementing advanced visualizations such as Pareto Charts, Bollinger Bands, Bump Charts and Funnel Charts using Tableau and Power BI Desktop.
- Extensively worked on various relational databases and gained experience processing large data files using NoSQL databases such as Couchbase and MongoDB.
- Experienced in developing data pipelines and analyzing raw data in distributed systems using Pig Latin scripts.
- Developed scripts in Hive using User Defined Functions for data analysis and reporting.
- Authored over seven peer reviewed scholarly publications on developing complex MIMO control algorithms.
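For illustration, a minimal pandas sketch of the extract-transform-analyze wrangling workflow described above; the input file and column names are hypothetical placeholders:

    # Minimal pandas wrangling sketch; the CSV and column names are placeholders.
    import pandas as pd

    raw = pd.read_csv("transactions.csv", parse_dates=["txn_date"])

    # Deduplicate, drop incomplete rows, and derive a reporting period.
    cleaned = (raw.drop_duplicates()
                  .dropna(subset=["amount"])
                  .assign(month=lambda d: d["txn_date"].dt.to_period("M")))

    # Aggregate for downstream reporting/visualization.
    monthly = cleaned.groupby("month", as_index=False)["amount"].sum()
    print(monthly.head())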
TECHNICAL SKILLS:
Databases: MS SQL Server 2005/08/12, PostgreSQL 8/9, Oracle 11g/10g/9i/8i, MS Access, MySQL
Database Technologies/Tools: SQL Server Management Studio, SQL Server Integration Services, SQL Server Analysis Services, Erwin, Spring Batch
Reporting Tools: Microsoft SQL Server Reporting Services 2008/12, Tableau 6/7/8/9, Power BI (2015), Kibana 4, R Shiny, R Markdown & Flexdashboard
Operating Systems: Mac, Linux, Windows Server, UNIX, DOS
Programming Languages: Java 8, Python, R, C#, T-SQL
NoSQL Databases: CouchBase, MongoDB, Cassandra
Hadoop/Distributed Frameworks: Hive, Pig, Impala, Sqoop, Flume, Greenplum, Elasticsearch, Logstash
Cloud Computing Platforms: Amazon Web Services (EC2), Microsoft Azure, Cloudera, GE-Predix
Control Systems Tools: MATLAB, Simulink, LabVIEW
WORK EXPERIENCE:
Confidential
Data Engineer
Responsibilities:
- Design and develop new systems and tools to enable users to consume and understand data faster.
- Work across multiple cross-functional teams in high visibility roles and own the data solution end-to-end.
- Provide technology guidance, evaluate tools, perform POC’s and design solutions.
- Engineer structured and unstructured data from source systems to fit business needs.
- Perform analysis on a variety of data sets to discover new insights, help clients understand trends in key metrics, and answer questions with data.
- Created proof-of-concept prototype analytical solutions using R and Python for demonstration and evaluation purposes on the Predix platform.
- Implemented machine learning and data mining algorithms such as Decision Trees, Association Rules, K-Means Clustering and ANOVA tests using libraries in R and Python.
- Updated Python code to help improve data pipeline infrastructure and make data manipulation fast and reliable.
- Work closely with data scientists and collaborate with back-end and front-end engineering teams along with our customer solutions group.
- Used the Predix Time Series module to efficiently manage, distribute, ingest and store time series data.
- Performed data visualization with Matplotlib, R Shiny, ggplot2 and Tableau.
- Connected Hive tables with Tableau and performed data visualization for reporting.
- Created multiple Hive tables with partitioning and bucketing for more efficient data access (see the Hive sketch after this section).
- Used HiveQL for data transformation, cleansing and filtering.
- Designed, developed, tested, and maintained Tableau functional reports based on requirements.
- Worked on importing and exporting data between Oracle and HDFS/Hive using Sqoop.
- Performed source data ingestion, cleansing and transformation in Hadoop using Pig.
- Designed and extended server-side Java code to meet evolving data needs and increase analytics capabilities.
- Documented conceptual/logical models and implemented physical models using Erwin and DB Schema.
- Developed prototypes and production systems using Java Enterprise technologies.
- Designed systems using the Spring Boot MVC framework with an analytical mindset.
- Developed, prototyped and delivered high-performing REST API microservices.
- Worked on data migrations using SSIS/Spring Batch to read data from multiple data sources and persist it to a Postgres database on the cloud for use by various microservices.
- Queried asset data (e.g., Gas/Wind Turbine and Locomotive sensor data) using Graph Expression Language (GEL).
- Wrote Couchbase N1QL (NoSQL) queries to perform ad-hoc analysis of business logic (see the N1QL sketch after this section).
- Implemented Logstash to connect to a variety of sources and stream data to a central analytics system.
- Developed real-time summaries and charts of streaming data in Kibana.
Environment: Java Spring Boot, R, Python 3.x, Tableau 9, PostgreSQL 8/9, CouchBase Server 4.5, Predix, Logstash 2.3, Kibana 4, Erwin 9.1, DB Schema, Hive, Pig, Sqoop
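For illustration, a minimal sketch of the partitioned/bucketed Hive pattern above, issued from Python with PyHive; the host, table and column names are hypothetical:

    # Minimal sketch: partitioned, bucketed Hive table plus a cleansing query.
    # Assumes a reachable HiveServer2 endpoint; table/column names are hypothetical.
    from pyhive import hive

    conn = hive.connect(host="hive-host", port=10000, database="default")
    cur = conn.cursor()

    # Partition by event date, bucket by sensor id for faster scans and joins.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sensor_readings (
            sensor_id STRING,
            reading   DOUBLE,
            ts        TIMESTAMP
        )
        PARTITIONED BY (event_date STRING)
        CLUSTERED BY (sensor_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # HiveQL cleansing/filtering ahead of downstream reporting.
    cur.execute("""
        SELECT sensor_id, AVG(reading) AS avg_reading
        FROM sensor_readings
        WHERE event_date = '2016-01-01' AND reading IS NOT NULL
        GROUP BY sensor_id
    """)
    print(cur.fetchall())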
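Likewise, a hedged sketch of the kind of ad-hoc N1QL query mentioned above, posted to Couchbase's query REST service; the bucket, fields and credentials are placeholders:

    # Minimal N1QL sketch via the Couchbase query REST endpoint (port 8093).
    # Bucket and field names are illustrative, not from a real system.
    import requests

    statement = """
        SELECT type, COUNT(*) AS cnt
        FROM `assets`
        WHERE status = 'active'
        GROUP BY type
    """
    resp = requests.post(
        "http://couchbase-host:8093/query/service",
        data={"statement": statement},
        auth=("user", "password"),
    )
    print(resp.json()["results"])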
Confidential, Chicago, IL
Data Engineer
Responsibilities:
- Designed and developed various SSIS packages to extract data from flat files, Excel files and legacy systems; transformed data using various transformations provided by SSIS and loaded it into destinations such as SQL Server.
- Communicated with clients to gather business requirements and held meetings to better understand them.
- Developed and optimized SQL objects such as Stored Procedures, Triggers, CTEs and Views.
- Involved in the complete life cycle of creating SSIS packages: building, deploying and executing them in both environments (Development and Production).
- Analyzed the historical data and extracted required data using SSIS packages.
- Used various tasks and transformations such as File System Tasks, FTP Tasks, data cleansing tasks and Fuzzy Lookups.
- Implemented various control flow tasks using Foreach Loop and Sequence Containers and Bulk Insert tasks within SSIS whenever necessary.
- Responsible for ensuring proper implementation of ETL troubleshooting and error-debugging processes such as Event Handlers, Logging, Checkpoints, Transactions and Package Configurations.
- Deployed packages using Project Deployment Model and responsible for scheduling SSIS jobs and automating the process using SQL Server Agent.
- Created reports to visualize key financial figures using LOD expressions and deployed them on Tableau Server.
- Developed calculated fields using various Aggregate, Logical, Table and Date functions.
- Embedded live Tableau dashboards directly into Salesforce Canvas.
- Expanded Tableau mapping capabilities using custom Geocoding and Polygon maps.
- Worked with background images to create image based custom visualizations.
- Imported trading and derivatives data and web log data into the Hadoop Distributed File System using Flume and Sqoop (see the Sqoop sketch after this section).
- Responsible for developing Pig scripts and Hive queries for data processing, cleansing and reporting.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with tables and historical metrics.
- Used Pig as an ETL tool to perform transformations and event joins, filter bot traffic, and run pre-aggregations before storing the data in HDFS and NoSQL databases.
- Worked with various input and storage file formats such as SequenceFile, RCFile, Parquet, Avro and JSON for effective data importing and exporting between NoSQL databases such as MongoDB and Cassandra and traditional RDBMS using Sqoop.
- Involved in extraction of data from various databases (SQL, NoSQL) and the HDFS file system; generated interactive visualizations and reports using Tableau 8/9 and published them securely on Tableau Server.
Environment: Microsoft SQL Server 2012, Visual Basic 2010/12, .NET Framework, SSIS, Power BI, Tableau 8/9, Oracle 11g/10g, MongoDB, Cassandra, Pig, Hive, Sqoop, MapReduce.
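For illustration, a minimal sketch of a Sqoop import like the one above, launched from Python; the JDBC URL, credentials, table and paths are placeholders:

    # Minimal sketch of a Sqoop import from Oracle into HDFS, invoked from Python.
    # Connection string, credentials and paths are placeholders.
    import subprocess

    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//oracle-host:1521/ORCL",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pwd",  # keep secrets off the command line
        "--table", "TRADES",
        "--target-dir", "/data/raw/trades",
        "--num-mappers", "4",
    ], check=True)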
Confidential, Charlotte, NC
BI Developer
Responsibilities:
- Involved in data preparation and selection of data analysis strategy as per the requirement.
- Developed, monitored and deployed SSIS packages.
- Used CDC and FTP tasks and Slowly Changing Dimension and Fuzzy Lookup transformations extensively.
- Analyzed historical data and extracted required data using various Data Flow and Control Flow tasks.
- Implemented data pipelines in Python leveraging packages such as NumPy and Pandas.
- Worked on developing batch jobs to consume data from RESTful web services to enable Tableau visuals (see the sketch after this section).
- Performed data wrangling and manipulation using the dplyr package in R to aid analysis.
- Developed visualizations using Cross Tabs, Heat Maps, Box-and-Whisker Charts, Scatter Plots, Geographic Maps, Pie Charts and Density Charts.
- Worked extensively with Tableau analysis features such as Actions, Calculations, Parameters, Background Images, Maps, Trend Lines, Statistics and Table Calculations.
- Implemented advanced visualizations such as Pareto Charts, Bollinger Bands, Bump Charts and Funnel Charts.
- Leveraged various data mining algorithms such as clustering, correlation and outlier detection in R.
- Created visualizations using Matplotlib, R Shiny and ggplot2.
- Develop interfaces for clients that allow them to manipulate, visualize, and analyze data in the R environment (e.g., interactive web applications built with Shiny).
- Held sessions on Tableau architecture and administration for interns.
- Provided expertise and guidance to other developers for optimal use of Filters, Sets and Groups.
- Involved in the scheduling and deployment of the reports.
- Involved in testing of the reports and preparing the test cases.
Environment: SQL Server 2008 R2, SSIS, BIDS 2008, Tableau 6/7, Oracle 10g, DB2, Excel, Microsoft Access, Windows 7, Erwin, Flat Files, .NET Framework, TFS, R 2.10, Python 2.0
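For illustration, a minimal sketch of the REST-to-Tableau batch pattern above: pull JSON from a RESTful service, shape it with Pandas, and write a CSV extract for Tableau. The endpoint and field names are hypothetical:

    # Hypothetical endpoint; the JSON payload is assumed to be a list of records.
    import pandas as pd
    import requests

    resp = requests.get("https://api.example.com/v1/metrics", timeout=30)
    resp.raise_for_status()

    df = pd.DataFrame(resp.json())

    # Light wrangling before handing the extract to Tableau.
    df["as_of"] = pd.to_datetime(df["as_of"])
    summary = (df.dropna(subset=["value"])
                 .groupby("region", as_index=False)["value"]
                 .mean())

    summary.to_csv("metrics_for_tableau.csv", index=False)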
Confidential, Los Angeles, CA
BI Developer (SSIS/ SSRS/SSAS)
Responsibilities:
- Worked with business users to understand requirements and translated them into technical specifications for different groups such as Commercial Business Intelligence and ACE, covering Policy, Premiums, Claims, Commissions, Deductions, Net Pay, Contract Value, etc.
- Created complex ETL packages using SSIS to extract data from staging tables to data mart tables with incremental loads, and maintained the history of records using Slowly Changing Dimension Type 2 (see the sketch after this section).
- Implemented logging at the package and task levels using the SQL Server Profiler and XML File log providers.
- Transferred data from multiple data sources and file formats to SQL Server using various features like Data conversion, Derived columns, Lookup transformations, etc. in SSIS.
- Generated multiple enterprise reports (SSRS) from SQL Server Database (OLTP) and SSAS.
- Included various reporting features in SSRS such as group-by drilldowns, drill-through sub-reports and parameterized reports.
- Involved in dimensional modeling (Star & Snowflake) and creating data source views in SSAS.
- Involved in development and deployment of SSAS cubes and monitored full and incremental loads.
- Created aggregations, partitions, KPIs and perspectives for a cube as per business requirements.
- Involved in creating, structuring and formatting using Report Designer and Report Builder in SSRS.
Environment: VS 2008, SQL Server 2008, Oracle 9i/10g, .NET Framework 3.5, SQL Server BIDS, SSIS, SSAS, SSRS, DTS, Report Builder, SharePoint, PerformancePoint Services, Power Pivot.
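For illustration, a compact sketch of the Slowly Changing Dimension Type 2 load above, issuing T-SQL from Python via pyodbc; the DSN, tables and columns are hypothetical placeholders, not the actual schema:

    # SCD Type 2: close out changed rows, then insert the new current versions.
    # Table and column names are illustrative placeholders.
    import pyodbc

    conn = pyodbc.connect("DSN=DataMart;Trusted_Connection=yes")
    cur = conn.cursor()

    # Expire current dimension rows whose attributes changed in staging.
    cur.execute("""
        UPDATE d
        SET d.EndDate = GETDATE(), d.IsCurrent = 0
        FROM dbo.DimPolicy d
        JOIN stg.Policy s ON s.PolicyId = d.PolicyId
        WHERE d.IsCurrent = 1 AND s.PremiumAmount <> d.PremiumAmount
    """)

    # Insert a fresh current row for each new or just-expired policy.
    cur.execute("""
        INSERT INTO dbo.DimPolicy (PolicyId, PremiumAmount, StartDate, EndDate, IsCurrent)
        SELECT s.PolicyId, s.PremiumAmount, GETDATE(), NULL, 1
        FROM stg.Policy s
        LEFT JOIN dbo.DimPolicy d ON d.PolicyId = s.PolicyId AND d.IsCurrent = 1
        WHERE d.PolicyId IS NULL
    """)
    conn.commit()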
Confidential
Data Analyst
Responsibilities:
- Involved in upgrading and migration from SQL Server 2005 to SQL Server 2008 databases.
- Created tabular drill down, parameterized and cascaded sub-reports using SQL Server Reporting Services (SSRS).
- Responsible for creating dashboard reports using SSRS, and ad-hoc reporting using Report Builder.
- Created database objects using T-SQL scripts, stored procedures & views.
- Developed complex statistical and engineering solutions using the Analysis ToolPak and StatPlus:mac LE in Excel.
- Analyzed data using various methods such as Analysis of Variance (ANOVA), Regression Analysis, Statistical Charts and Time Series Analysis (see the ANOVA sketch after this section).
- Created various periodic reports using SSRS and implemented various visualizations such as Indicators, Embedded Charts, Sparklines, Data Bars and Chart Overlays.
- Worked with event handlers to send custom e-mail notifications to business users as well as technical teams.
- Designed and deployed Tabular, Matrix, Chart, Linked, Drill-down, and Drill-through Reports with drop down list of parameterized options in SSRS.
- Responsible for indexing, debugging, optimization and performance tuning.
- Actively participated in troubleshooting, monitoring and maintenance of SQL Server data.
Environment: MS SQL Server 2000/05/08, Windows Server 2003, SSIS, SSAS, SSRS, DTS, SQL profiler, T-SQL, XML, Excel, Microsoft Visio 2003, MS Visual Studio 2008.
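For illustration, a minimal one-way ANOVA sketch in Python using SciPy; the samples are synthetic stand-ins, not real report data:

    # One-way ANOVA: do three synthetic groups share the same mean?
    from scipy import stats

    group_a = [23.1, 22.8, 24.0, 23.5, 22.9]
    group_b = [25.2, 24.8, 25.9, 25.1, 24.7]
    group_c = [23.0, 23.4, 22.7, 23.8, 23.1]

    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # a small p suggests differing means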