---
layout: article
title: Excel
description: Data Controller can extract all manner of data from within an Excel file (including formulae) ready for ingestion into SAS. All versions of Excel are supported.
og_image: https://docs.datacontroller.io/img/excel_results.png
---
# Excel Uploads
Data Controller supports two approaches for importing Excel data into SAS:

- Simple - source range in tabular format, with column names/values that match the target table. No configuration necessary.
- Complex - data is scattered across multiple ranges in a dynamic (non-fixed) arrangement. Pre-configuration necessary.

Thanks to our pro license of [SheetJS](https://sheetjs.com/), we can support all versions of Excel, large workbooks, and fast extracts. We also support the ingest of [password-protected workbooks](/videos#uploading-a-password-protected-excel-file).
Note that data is extracted from Excel _within the browser_, meaning there is no need for any special SAS modules or products.
A copy of the original Excel file is also uploaded to the staging area. This means that a complete audit trail can be captured, right back to the original source data.
## Simple Excel Uploads
To make a _simple_ extract, select LOAD / Tables / (library/table) and click "UPLOAD" (or drag the file onto the page). No configuration necessary.
![](img/xltables.png)
The rules for data extraction are:

* Scan each sheet until a row is found with all target columns
* Extract rows until the first *blank primary key value*

This is incredibly flexible, and means:

* data can be anywhere, on any worksheet
* data can start on any row, and any column
* data can be completely surrounded by other data
* columns can be in any order
* additional columns are simply ignored
!!! note
    If the Excel file contains more than one range with the target columns (eg, on different sheets), only the FIRST will be extracted.
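
The extraction rules above can be sketched in a few lines of Python. This is an illustration only, not the Data Controller implementation: the function name `extract_table`, the representation of a workbook as lists of rows, the single primary key column, and the case-insensitive header matching are all assumptions made for the example.

```python
def extract_table(sheets, target_cols, pk_col):
    """Return records from the first range whose header row holds every target column.

    sheets: list of sheets; each sheet is a list of rows; each row is a list of cells.
    pk_col: name of the (single, assumed) primary key column; must be in target_cols.
    """
    wanted = {c.upper() for c in target_cols}
    for grid in sheets:
        for r, row in enumerate(grid):
            # Map each non-empty cell in this row to its column position.
            header = {str(v).strip().upper(): i for i, v in enumerate(row) if v is not None}
            if not wanted.issubset(header):
                continue  # not a header row, keep scanning
            pk_idx = header[pk_col.upper()]
            records = []
            for data_row in grid[r + 1:]:
                pk = data_row[pk_idx] if pk_idx < len(data_row) else None
                if pk is None or str(pk).strip() == "":
                    break  # the first blank primary key value ends the extract
                records.append({
                    c: data_row[header[c.upper()]] if header[c.upper()] < len(data_row) else None
                    for c in target_cols  # columns outside target_cols are ignored
                })
            return records  # only the FIRST matching range is extracted
    return []


# Hypothetical workbook: the data sits on the second sheet, offset by one
# row and one column, with an extra column and columns in a different order.
sheets = [
    [["Notes", None, None], ["x", "y", "z"]],
    [
        ["junk", None, None, None],
        [None, "AMOUNT", "extra", "ID"],
        [None, 10, "a", 1],
        [None, 20, "b", 2],
        [None, 99, "c", None],  # blank primary key: extraction stops here
        [None, 30, "d", 3],
    ],
]
print(extract_table(sheets, ["ID", "AMOUNT"], "ID"))
```

Note how the row with `ID` 3 is not extracted, because it follows a blank primary key value.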
Uploaded data may *optionally* contain a column named `_____DELETE__THIS__RECORD_____` - if this contains the value "Yes", the row is marked for deletion.
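
As a rough illustration of the delete flag, the sketch below separates extracted records into rows to load and rows marked for deletion. The helper name `split_deletes` is hypothetical, and the exact, case-sensitive comparison with "Yes" is an assumption.

```python
DELETE_COL = "_____DELETE__THIS__RECORD_____"


def split_deletes(records):
    """Split records into (rows to load, rows marked for deletion)."""
    keep, delete = [], []
    for rec in records:
        rec = dict(rec)  # copy, so the caller's record is not modified
        flag = rec.pop(DELETE_COL, None)  # the flag column is optional
        (delete if flag == "Yes" else keep).append(rec)
    return keep, delete


records = [
    {"ID": 1, DELETE_COL: "Yes"},
    {"ID": 2, DELETE_COL: "No"},
    {"ID": 3},
]
keep, delete = split_deletes(records)
print(keep, delete)
```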
When loading very large files (eg over 10 MB) it is more efficient to use CSV format, as this bypasses the local rendering engine. It also bypasses the local data quality (DQ) checks, so be careful! Checks that are applied locally (Excel) but not remotely (CSV) include:

* Length of character variables - CSV files are truncated at the max target column length
* Length of numeric variables - if the target numeric variable is below 8 bytes then the staged CSV value may be rounded if it is too large to fit
* NOTNULL - this rule is only applied at backend when the constraint is physical (rather than a DC setting)
* MINVAL
* MAXVAL
* CASE
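
If you do load via CSV, it can be worth running equivalent checks yourself before uploading. The sketch below shows one way to do this for a subset of the checks listed above (character length, NOTNULL, MINVAL and MAXVAL). The rule dictionary format and the function `check_row` are invented for the example; they do not reflect how Data Controller stores or applies its rules.

```python
def check_row(row, rules):
    """Return a list of human-readable problems found in one row."""
    problems = []
    for col, rule in rules.items():
        value = row.get(col)
        if value is None or value == "":
            if rule.get("notnull"):
                problems.append(f"{col}: value is required")
            continue  # the remaining checks need a value
        if "maxlen" in rule and len(str(value)) > rule["maxlen"]:
            problems.append(f"{col}: longer than {rule['maxlen']} characters")
        if "minval" in rule and value < rule["minval"]:
            problems.append(f"{col}: below MINVAL {rule['minval']}")
        if "maxval" in rule and value > rule["maxval"]:
            problems.append(f"{col}: above MAXVAL {rule['maxval']}")
    return problems


# Hypothetical rules for a two-column target table.
rules = {
    "NAME": {"maxlen": 5, "notnull": True},
    "AMT": {"minval": 0, "maxval": 100},
}
print(check_row({"NAME": "Roberta", "AMT": 250}, rules))
```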
Note that the `HARDSELECT_***` hooks are not applied to the rendered Excel values (they are only applied when actively editing a cell).
![image](https://user-images.githubusercontent.com/4420615/233036372-87b8dd02-a4cd-4f19-ac1b-bb9fdc850607.png)
### Formulas
It is possible to configure certain columns to be extracted as formulae, rather than raw values. The target column must be character, and it should be wide enough to support the longest formula in the source data. If the order of values is important, you should include a row number in your primary key.
Configuration is as follows:
![](img/excel_config_setup.png)
Once this is done, you are ready to upload:
<iframe width="560" height="315" src="https://www.youtube.com/embed/Reg803vI2Ak" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
The final table will look like this:
![](img/excel_results.png)
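
When sizing the target character column, a quick way to find the minimum width is to measure the longest formula in the source data. The sketch below assumes the formulas are already available as strings and that the target column is measured in UTF-8 bytes; both the helper name `required_width` and the encoding are assumptions.

```python
def required_width(formulas, encoding="utf-8"):
    """Smallest column width (in bytes) that holds every formula untruncated."""
    return max((len(f.encode(encoding)) for f in formulas), default=0)


formulas = ["=SUM(A1:A10)", "=IF(B2>0,B2*C2,0)", "=A1"]
print(required_width(formulas))  # 17, the length of the IF formula
```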
## Complex Excel Uploads
Through the use of "Excel Maps" you can dynamically extract individual cells or entire ranges from anywhere within a workbook - either through absolute / relative positioning, or by reference to a "matched" (search) string.
Configuration is made in the following tables:
1. [MPE_XLMAP_RULES](/tables/mpe_xlmap_rules) - detailed extraction rules for a particular map
2. [MPE_XLMAP_INFO](/tables/mpe_xlmap_info) - optional map-level attributes
Each [rule](/tables/mpe_xlmap_rules) will extract either a single cell or a rectangular range from the source workbook. The target will be [MPE_XLMAP_DATA](/tables/mpe_xlmap_data), or whichever table is configured in [MPE_XLMAP_INFO](/tables/mpe_xlmap_info).
To illustrate with an example, consider the following Excel workbook, in which the yellow cells need to be imported.
![](img/xlmap_example.png)
The [MPE_XLMAP_RULES](/tables/mpe_xlmap_rules) configuration entries _might_ be as follows (there are multiple ways to achieve the same result):

|XLMAP_ID|XLMAP_RANGE_ID|XLMAP_SHEET|XLMAP_START|XLMAP_FINISH|
|---|---|---|---|---|
|MAP01|MI_ITEM|Current Month|`MATCH B R[1]C[0]: ITEM`|`LASTDOWN`|
|MAP01|MI_AMT|Current Month|`MATCH C R[1]C[0]: AMOUNT`|`LASTDOWN`|
|MAP01|TMI|Current Month|`ABSOLUTE F6`||
|MAP01|CB|Current Month|`MATCH F R[2]C[0]: CASH BALANCE`||
|MAP01|RENT|/1|`MATCH E R[0]C[2]: Rent/mortgage`||
|MAP01|CELL|/1|`MATCH E R[0]C[2]: Cell phone`||
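
To make the rule syntax more concrete, here is a minimal Python sketch of how `XLMAP_START` and `XLMAP_FINISH` values like those above could be resolved against a sheet. It is an interpretation of the examples, not the Data Controller implementation. It assumes that `MATCH <column> R[r]C[c]: <text>` searches the given column for a cell equal to the text and then moves `r` rows down and `c` columns right, that `ABSOLUTE` takes an A1-style address, and that `LASTDOWN` extends the range downwards until the first blank cell. The function names and the sheet layout are hypothetical (the layout is not the one in the screenshot).

```python
import re


def col_to_idx(letters):
    """Convert a column letter such as 'F' or 'AA' to a zero-based index."""
    idx = 0
    for ch in letters.upper():
        idx = idx * 26 + (ord(ch) - ord("A") + 1)
    return idx - 1


def resolve_start(grid, rule):
    """Return the zero-based (row, col) for a start rule, or None if no match."""
    if rule.startswith("ABSOLUTE "):
        m = re.fullmatch(r"([A-Za-z]+)(\d+)", rule[len("ABSOLUTE "):].strip())
        if m is None:
            raise ValueError(f"unsupported rule: {rule}")
        return int(m.group(2)) - 1, col_to_idx(m.group(1))
    m = re.fullmatch(r"MATCH ([A-Za-z]+) R\[(-?\d+)\]C\[(-?\d+)\]:\s*(.*)", rule)
    if m is None:
        raise ValueError(f"unsupported rule: {rule}")
    col, row_off, col_off, text = col_to_idx(m.group(1)), int(m.group(2)), int(m.group(3)), m.group(4)
    for r, row in enumerate(grid):
        if col < len(row) and row[col] == text:  # exact match is an assumption
            return r + row_off, col + col_off
    return None


def extract(grid, start, finish=""):
    """Return the values selected by a start rule and an optional finish rule."""
    pos = resolve_start(grid, start)
    if pos is None:
        return []
    r, c = pos

    def cell(rr):
        # Cells outside the grid are treated as blank.
        if 0 <= rr < len(grid) and 0 <= c < len(grid[rr]):
            return grid[rr][c]
        return None

    if finish != "LASTDOWN":
        return [cell(r)]  # single cell
    values = []
    while cell(r) not in (None, ""):  # walk down to the first blank cell
        values.append(cell(r))
        r += 1
    return values


# Hypothetical sheet, columns A to F, rows 1 to 9.
N = None
grid = [
    [N, N, N, N, N, N],
    [N, "ITEM", "AMOUNT", N, N, N],
    [N, "Income Source 1", 2500, N, N, N],
    [N, "Income Source 2", 1000, N, N, N],
    [N, "Other", 250, N, N, N],
    [N, N, N, N, N, 3750],
    [N, N, N, N, N, "CASH BALANCE"],
    [N, N, N, N, N, N],
    [N, N, N, N, N, 864],
]
print(extract(grid, "MATCH B R[1]C[0]: ITEM", "LASTDOWN"))
print(extract(grid, "ABSOLUTE F6"))
print(extract(grid, "MATCH F R[2]C[0]: CASH BALANCE"))
```

Relative (`MATCH`) rules keep working when rows are inserted above the matched label, whereas `ABSOLUTE` rules depend on the cell staying in place.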
To import the Excel file, the end user simply needs to navigate to the LOAD tab, choose "Files", select the appropriate map (eg MAP01), and upload. This will stage the new records in [MPE_XLMAP_DATA](/tables/mpe_xlmap_data), which will go through the usual approval process and quality checks. A copy of the source Excel file will be attached to each upload.
The corresponding [MPE_XLMAP_DATA](/tables/mpe_xlmap_data) table will appear as follows:
| LOAD_REF | XLMAP_ID | XLMAP_RANGE_ID | ROW_NO | COL_NO | VALUE_TXT |
|---------------|----------|----------------|--------|--------|-----------------|
| DC20231212T154611798_648613_3895 | MAP01 | MI_ITEM | 1 | 1 | Income Source 1 |
| DC20231212T154611798_648613_3895 | MAP01 | MI_ITEM | 2 | 1 | Income Source 2 |
| DC20231212T154611798_648613_3895 | MAP01 | MI_ITEM | 3 | 1 | Other |
| DC20231212T154611798_648613_3895 | MAP01 | MI_AMT | 1 | 1 | 2500 |
| DC20231212T154611798_648613_3895 | MAP01 | MI_AMT | 2 | 1 | 1000 |
| DC20231212T154611798_648613_3895 | MAP01 | MI_AMT | 3 | 1 | 250 |
| DC20231212T154611798_648613_3895 | MAP01 | TMI | 1 | 1 | 3750 |
| DC20231212T154611798_648613_3895 | MAP01 | CB | 1 | 1 | 864 |
| DC20231212T154611798_648613_3895 | MAP01 | RENT | 1 | 1 | 800 |
| DC20231212T154611798_648613_3895 | MAP01 | CELL | 1 | 1 | 45 |
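
The table above is in "long" format: one record per extracted cell, with `ROW_NO` and `COL_NO` counted from the top left of each range rather than from the sheet. A minimal sketch of that flattening step is shown below. The function name `to_long_rows` is hypothetical, and converting every value to text with `str` is an assumption about how `VALUE_TXT` is populated.

```python
def to_long_rows(load_ref, map_id, range_id, cells):
    """Flatten a 2D range (list of rows) into one record per cell."""
    out = []
    for row_no, row in enumerate(cells, start=1):  # 1-based, relative to the range
        for col_no, value in enumerate(row, start=1):
            out.append({
                "LOAD_REF": load_ref,
                "XLMAP_ID": map_id,
                "XLMAP_RANGE_ID": range_id,
                "ROW_NO": row_no,
                "COL_NO": col_no,
                "VALUE_TXT": str(value),
            })
    return out


rows = to_long_rows(
    "DC20231212T154611798_648613_3895", "MAP01", "MI_AMT",
    [[2500], [1000], [250]],  # a range three rows deep and one column wide
)
for rec in rows:
    print(rec["XLMAP_RANGE_ID"], rec["ROW_NO"], rec["COL_NO"], rec["VALUE_TXT"])
```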
### Video
<iframe title="Complex Excel Uploads" width="560" height="315" src="https://vid.4gl.io/videos/embed/3338f448-e92d-4822-b3ec-7f6d7530dfc8?peertubeLink=0" frameborder="0" allowfullscreen="" sandbox="allow-same-origin allow-scripts allow-popups"></iframe>