complex excel uploads

This commit is contained in:
zmaj 2024-01-23 17:34:36 +00:00
parent 81ce833ebe
commit 109c9735a5
14 changed files with 177 additions and 77 deletions


@ -40,7 +40,7 @@ The Editor screen lets users who have been pre-authorised (via the `DATACTRL.MPE
1 - *Filter*. The user can filter before proceeding to perform edits.
2 - *Upload*. If you have a lot of data, you can [upload it directly](dcu-fileupload). The changes are then approved in the usual way.
2 - *Upload*. If you have a lot of data, you can [upload it directly](files). The changes are then approved in the usual way.
3 - *Edit*. This is the main interface, data is displayed in tabular format. The first column is always "Delete?", as this allows you to mark rows for deletion. Note that removing a row from display does not mark it for deletion! It simply means that this row is not part of the changeset being submitted.
The next set of columns are the Primary Key, and are shaded grey. If the table has a surrogate / retained key, then it is the Business Key that is shown here (the RK field is calculated / updated at the backend). For SCD2 type tables, the 'validity' fields are not shown. It is assumed that the user is always working with the current version of the data, and the view is filtered as such.


@ -34,7 +34,7 @@ proc metalib;
run;
```
If you have other dates / datetimes / times you would like us to support, do [get in touch](http://datacontroller.io/contact)!
If you have other dates / datetimes / times you would like us to support, do [get in touch](https://datacontroller.io/contact)!


@ -1,68 +0,0 @@
# Data Controller for SAS: File Uploads
Files can be uploaded via the Editor interface - first choose the library and table, then click "Upload". All versions of excel are supported.
Uploaded data may *optionally* contain a column named `_____DELETE__THIS__RECORD_____` - where this contains the value "Yes" the row is marked for deletion.
If loading very large files (eg over 10mb) it is more efficient to use CSV format, as this bypasses the local rendering engine, but also the local DQ checks - so be careful! Examples of local (excel) but not remote (CSV) file checks include:
* Length of character variables - CSV files are truncated at the max target column length
* Length of numeric variables - if the target numeric variable is below 8 bytes then the staged CSV value may be rounded if it is too large to fit
* NOTNULL - this rule is only applied at backend when the constraint is physical (rather than a DC setting)
* MINVAL
* MAXVAL
* CASE
Note that the HARDSELECT_*** hooks are not applied to the rendered Excel values (they are currently only applied when editing a cell).
![image](https://user-images.githubusercontent.com/4420615/233036372-87b8dd02-a4cd-4f19-ac1b-bb9fdc850607.png)
## Excel Uploads
Thanks to our pro license of [sheetJS](https://sheetjs.com/), we can support all versions of excel, large workbooks, and extract data extremely fast. We also support the ingest of [password-protected workbooks](/videos#uploading-a-password-protected-excel-file).
The rules for data extraction are:
* Scan the spreadsheet until a row is found with all the target columns (not case sensitive)
* Extract data below until the *first row containing a blank primary key value*
This is incredibly flexible, and means:
* data can be anywhere, on any worksheet
* data can contain additional columns (they are just ignored)
* data can be completely surrounded by other data
A copy of the original Excel file is also uploaded to the staging area. This means that a complete audit trail can be captured, right back to the original source data.
!!! note
If the excel contains more than one range with the target columns (eg, on different sheets), only the FIRST will be extracted.
## CSV Uploads
The following should be considered when uploading data in this way:
- A header row (with variable names) is required
- Variable names must match those in the target table (not case sensitive). An easy way to ensure this is to download the data from Viewer and use this as a template.
- Duplicate variable names are not permitted
- Missing columns are not permitted
- Additional columns are ignored
- The order of variables does not matter EXCEPT for the (optional) `_____DELETE__THIS__RECORD_____` variable. When using this variable, it must be the **first**.
- The delimiter is extracted from the header row - so for `var1;var2;var3` the delimiter would be assumed to be a semicolon
- The above assumes the delimiter is the first special character! So `var,1;var2;var3` would fail
- The following characters should **not** be used as delimiters
- doublequote
- quote
- space
- underscore
When loading dates, be aware that Data Controller makes use of the `ANYDTDTE` and `ANYDTDTTME` informats (width 19).
This means that uploaded date / datetime values should be unambiguous (eg `01FEB1942` vs `01/02/42`), to avoid confusion - as the latter could be interpreted as `02JAN2042` depending on your locale and options `YEARCUTOFF` settings. Note that UTC dates with offset values (eg `2018-12-26T09:19:25.123+0100`) are not currently supported. If this is a feature you would like to see, contact us.
!!! tip
To get a copy of a file in the right format for upload, use the [file download](/dc-userguide/#usage) feature in the Viewer tab
!!! warning
Lengths are taken from the target table. If a CSV contains long strings (eg `"ABCDE"` for a $3 variable) then the rest will be silently truncated (only `"ABC"` staged and loaded). If the target variable is a short numeric (eg 4., or 4 bytes) then floats or large integers may be rounded. This issue does not apply to excel uploads, which are first validated in the browser.


@ -1,14 +1,62 @@
---
layout: article
title: Excel
description: Load Excel to SAS whilst retaining your favourite formulas! Data can be on any sheet, on any cell, even surrounded by other data. All versions of Excel supported.
description: Data Controller can extract all manner of data from within an Excel file (including formulae) ready for ingestion into SAS. All versions of excel are supported.
og_image: https://docs.datacontroller.io/img/excel_results.png
---
# Excel Uploads
Data Controller for SAS® supports all versions of excel. Data is extracted from excel from within the browser - there is no need for additional SAS components. So long as the column names match those in the target table, the data can be on any worksheet, start from any row, and any column.
The data can be completely surrounded by irrelevant data - the extraction will stop as soon as it hits one empty cell in a primary key column. The columns can be in any order, and are not case sensitive. More details [here](/dcu-fileupload/#excel-uploads).
Data Controller supports two approaches for importing Excel data into SAS:
- Simple - source range in tabular format, with column names/values that match the target Table. No configuration necessary.
- Complex - data is scattered across multiple ranges in a dynamic (non-fixed) arrangement. Pre-configuration necessary.
Thanks to our pro license of [sheetJS](https://sheetjs.com/), we can support all versions of excel, large workbooks, and fast extracts. We also support the ingest of [password-protected workbooks](/videos#uploading-a-password-protected-excel-file).
Note that data is extracted from excel from _within the browser_ - meaning there is no need for any special SAS modules / products.
A copy of the original Excel file is also uploaded to the staging area. This means that a complete audit trail can be captured, right back to the original source data.
## Simple Excel Uploads
To make a _simple_ extract, select LOAD / Tables / (library/table) and click "UPLOAD" (or drag the file onto the page). No configuration necessary.
![](img/xltables.png)
The rules for data extraction are:
* Scan each sheet until a row is found with all target columns (not case sensitive)
* Extract rows until the first *blank primary key value*
This is incredibly flexible, and means:
* data can be anywhere, on any worksheet
* data can start on any row, and any column
* data can be completely surrounded by other data
* columns can be in any order
* additional columns are simply ignored
!!! note
    If the excel contains more than one range with the target columns (eg, on different sheets), only the FIRST will be extracted.
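
The extraction rules above can be sketched in a few lines. This is a minimal illustration only: the real extraction runs in the browser, and the function name `extract_table`, the list-of-lists worksheet representation, and the treatment of `None` / empty string as "blank" are all assumptions made for this example.

```python
def extract_table(sheets, target_cols, pk_cols):
    """Return rows (as dicts) from the first range holding all target columns.

    sheets: list of worksheets; each worksheet is a list of rows and each
    row is a list of cell values (hypothetical in-memory representation).
    """
    wanted = [c.upper() for c in target_cols]
    for grid in sheets:
        for r, row in enumerate(grid):
            # Map upper-cased header text to its column position.
            header = {}
            for c, cell in enumerate(row):
                if isinstance(cell, str):
                    header.setdefault(cell.strip().upper(), c)
            if not all(name in header for name in wanted):
                continue  # not a header row, keep scanning
            rows = []
            for data_row in grid[r + 1:]:
                record = {}
                for name in wanted:
                    idx = header[name]
                    record[name] = data_row[idx] if idx < len(data_row) else None
                # Stop at the first blank primary key value.
                if any(record[pk.upper()] in (None, "") for pk in pk_cols):
                    break
                rows.append(record)
            return rows  # only the FIRST matching range is extracted
    return []


# Data sits on the second sheet, offset by one column, with an extra column.
sheet1 = [["Report", None, None], [None, "junk", None]]
sheet2 = [
    ["title", None, None, None],
    [None, "Price", "extra", "id"],
    [None, 1.5, "x", "A"],
    [None, 2.5, "y", "B"],
    [None, 9, "z", None],   # blank primary key: extraction stops here
    [None, 3.5, "w", "C"],
]
print(extract_table([sheet1, sheet2], ["ID", "PRICE"], ["ID"]))
```

Extra columns are ignored because only the target columns are read from each row, and column order is irrelevant because positions are looked up from the header row.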
Uploaded data may *optionally* contain a column named `_____DELETE__THIS__RECORD_____` - if this contains the value "Yes", the row is marked for deletion.
If loading very large files (eg over 10mb) it is more efficient to use CSV format, as this bypasses the local rendering engine, but also the local DQ checks - so be careful! Examples of local (excel) but not remote (CSV) file checks include:
* Length of character variables - CSV files are truncated at the max target column length
* Length of numeric variables - if the target numeric variable is below 8 bytes then the staged CSV value may be rounded if it is too large to fit
* NOTNULL - this rule is only applied at backend when the constraint is physical (rather than a DC setting)
* MINVAL
* MAXVAL
* CASE
Note that the HARDSELECT_*** hooks are not applied to the rendered Excel values (they are currently only applied when editing a cell).
![image](https://user-images.githubusercontent.com/4420615/233036372-87b8dd02-a4cd-4f19-ac1b-bb9fdc850607.png)
## Formulas
@ -26,5 +74,20 @@ The final table will look like this:
![](img/excel_results.png)
If you would like further integrations / support with excel uploads, we are happy to discuss new features. Just [get in touch](https://datacontroller.io/contact).
# Complex Excel Uploads
Through the use of "Excel Maps" this feature enables a series of cells / ranges to be dynamically extracted from anywhere within a workbook - either through absolute / relative positioning, or by reference to a "matched" (search) string.
Configuration is made in the following tables:
1. [MPE_XLMAP_RULES](tables/mpe_xlmap_rules.md) - detailed extraction rules for a particular map
2. [MPE_XLMAP_INFO](tables/mpe_xlmap_info.md) - optional map-level attributes
Each [rule](tables/mpe_xlmap_rules.md) will extract either a single cell or a rectangular range from the source workbook.

42
docs/files.md Normal file

@ -0,0 +1,42 @@
# Data Controller for SAS: File Uploads
Data Controller supports the ingestion of two file formats - Excel (any version) and CSV.
If you would like us to support other file types, do [get in touch](https://datacontroller.io/contact)!
## Excel Uploads
Data can be uploaded in regular (tabular) or dynamic (complex) format. For details, see the [Excel](/excel) page.
## CSV Uploads
The following should be considered when uploading data in this way:
- A header row (with variable names) is required
- Variable names must match those in the target table (not case sensitive). An easy way to ensure this is to download the data from Viewer and use this as a template.
- Duplicate variable names are not permitted
- Missing columns are not permitted
- Additional columns are ignored
- The order of variables does not matter EXCEPT for the (optional) `_____DELETE__THIS__RECORD_____` variable. When using this variable, it must be the **first**.
- The delimiter is extracted from the header row - so for `var1;var2;var3` the delimiter would be assumed to be a semicolon
- The above assumes the delimiter is the first special character! So `var,1;var2;var3` would fail
- The following characters should **not** be used as delimiters
    - doublequote
    - quote
    - space
    - underscore
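
The delimiter rule above can be illustrated with a short sketch. The function name `detect_delimiter` is hypothetical, and skipping the four discouraged characters during the scan is an assumption about how the backend behaves, not a description of its actual code.

```python
# Characters the docs say should not be used as delimiters (assumed to be
# skipped when scanning the header row).
NOT_DELIMITERS = {'"', "'", " ", "_"}


def detect_delimiter(header_line):
    """Return the first 'special' character found in the header row."""
    for ch in header_line:
        if ch.isalnum() or ch in NOT_DELIMITERS:
            continue
        return ch
    raise ValueError("no delimiter found in header row")


print(detect_delimiter("var1;var2;var3"))   # semicolon
print(detect_delimiter("var,1;var2;var3"))  # comma, so the split is wrong
```

In the second call the comma inside `var,1` is picked up first, which is why such a header would fail to match the target table.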
When loading dates, be aware that Data Controller makes use of the `ANYDTDTE` and `ANYDTDTM` informats (width 19).
This means that uploaded date / datetime values should be unambiguous (eg `01FEB1942` rather than `01/02/42`), as the latter could be interpreted as `02JAN2042` depending on your locale and `YEARCUTOFF` option settings. Note that UTC dates with offset values (eg `2018-12-26T09:19:25.123+0100`) are not currently supported. If this is a feature you would like to see, contact us.
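
To see why `01/02/42` is ambiguous, the sketch below parses it both ways. It uses Python's `datetime` rather than SAS, so the two-digit-year pivot shown is Python's fixed rule (00-68 map to 2000-2068), standing in for whatever your `YEARCUTOFF` setting would do. The month name parsing assumes an English locale.

```python
from datetime import datetime

value = "01/02/42"

# The same text read as day/month/year and as month/day/year.
day_first = datetime.strptime(value, "%d/%m/%y").date()
month_first = datetime.strptime(value, "%m/%d/%y").date()

# A month name and four-digit year leave no room for interpretation.
unambiguous = datetime.strptime("01FEB1942", "%d%b%Y").date()

print(day_first, month_first, unambiguous)
```

The two readings differ by a month and both differ from the intended 1942 date by a century, which is the kind of silent error an explicit format avoids.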
!!! tip
    To get a copy of a file in the right format for upload, use the [file download](/dc-userguide/#usage) feature in the Viewer tab

!!! warning
    Lengths are taken from the target table. If a CSV contains long strings (eg `"ABCDE"` for a $3 variable) then the rest will be silently truncated (only `"ABC"` staged and loaded). If the target variable is a short numeric (eg 4., or 4 bytes) then floats or large integers may be rounded. This issue does not apply to excel uploads, which are first validated in the browser.
When loading CSVs, the entire file is passed to backend for ingestion. This makes it more efficient for large files, but does mean that frontend validations are bypassed.
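
The effect described in the warning above can be approximated as follows. This is a sketch under a stated assumption: that a short numeric keeps only the high-order bytes of an 8-byte IEEE double. The helper names `stage_char` and `stage_short_numeric` are hypothetical and are not part of Data Controller.

```python
import struct


def stage_char(value, length):
    """Truncate a string to the target column length (eg $3)."""
    return value[:length]


def stage_short_numeric(value, nbytes):
    """Keep only the nbytes high-order bytes of an 8-byte IEEE double."""
    if not 3 <= nbytes <= 8:
        raise ValueError("numeric length must be between 3 and 8 bytes")
    raw = struct.pack(">d", float(value))
    return struct.unpack(">d", raw[:nbytes] + b"\x00" * (8 - nbytes))[0]


print(stage_char("ABCDE", 3))
print(stage_short_numeric(2097152, 4))  # still exact
print(stage_short_numeric(2097153, 4))  # low-order bits are lost
print(stage_short_numeric(0.1, 4))      # fractional precision is lost
```

Under this assumption a 4-byte numeric holds integers exactly only up to 2,097,152, so larger integers and most fractions change value on load.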

Binary file not shown. (Image: 169 KiB before, 56 KiB after)

Binary file not shown. (Image: 331 KiB before, 93 KiB after)

BIN
docs/img/mpe_xlmap_info.png Normal file

Binary file not shown. (New image: 51 KiB)

Binary file not shown. (New image: 184 KiB)

BIN
docs/img/xltables.png Normal file

Binary file not shown. (New image: 101 KiB)


@ -29,7 +29,7 @@ The following resources contain additional information on the Data Controller:
Data Controller is regularly updated with new features. If you see something that is not listed, and we agree it would be useful, you can engage us with Developer Days to build the feature in.
* [Excel uploads](/dcu-fileupload/#excel-uploads) - drag & drop directly into SAS. All versions of excel supported.
* [Excel uploads](/excel) - drag & drop directly into SAS. All versions of excel supported.
* Data Lineage - at both table and column level, export as image or CSV
* Data Validation Rules - both automatic and user defined
* Data Dictionary - map data definitions and ownership


@ -0,0 +1,29 @@
---
layout: article
title: MPE_XLMAP_INFO
description: The MPE_XLMAP_INFO table provides information about a particular XLMAP_ID
og_title: MPE_XLMAP_INFO Table Documentation
og_image: ../img/mpe_xlmap_info.png
---
# MPE_XLMAP_INFO
The MPE_XLMAP_INFO table provides information about a particular XLMAP_ID.
The information is optional (unless you wish to configure a non-default target table).
![Screenshot](../img/mpe_xlmap_info.png)
See also:
* [MPE_XLMAP_RULES](tables/mpe_xlmap_rules.md)
## Columns
- `TX_FROM num`: SCD2 open datetime
- 🔑 `TX_TO num`: SCD2 close datetime
- 🔑 `XLMAP_ID char(32)`: A unique, UPPERCASE reference for the excel map.
- `XLMAP_DESCRIPTION char(1000)`: Map Description
- `XLMAP_TARGETLIBDS char(41)`: An alternative target table to which to upload the data. This MUST have the same structure as the [MPE_XLMAP_DATA](tables/mpe_xlmap_data.md) table.


@ -0,0 +1,33 @@
---
layout: article
title: MPE_XLMAP_RULES
description: The MPE_XLMAP_RULES table contains the rules for mapping excel cells and ranges to XLMAP_IDs for upload into SAS
og_title: MPE_XLMAP_RULES Table Documentation
og_image: ../img/mpe_xlmap_rules.png
---
# MPE_XLMAP_RULES
The MPE_XLMAP_RULES table contains the rules for mapping excel cells and ranges to XLMAP_IDs for [upload into SAS](/excel).
![Screenshot](../img/mpe_xlmap_rules.png)
## Columns
- `TX_FROM num`: SCD2 open datetime
- 🔑 `TX_TO num`: SCD2 close datetime
- 🔑 `XLMAP_ID char(32)`: A unique, UPPERCASE reference for the excel map.
- 🔑 `XLMAP_RANGE_ID char(32)`: A unique reference for the specific range being loaded
- `XLMAP_SHEET char(32)`: The sheet name in which to capture the data. Examples:
    - `Sheet2` - an absolute reference
    - `/1` - the first tab in the workbook
- `XLMAP_START char(1000)`: The rule used to find the top left of the range. Use "R1C1" notation to move the target. Examples:
    - `ABSOLUTE F4` - an absolute reference
    - `RELATIVE R[2]C[2]` - In the XLMAP_START case, this is the same as `ABSOLUTE B2`
    - `MATCH P R[0]C[2]: My Test` - search column P for the string "My Test" then move 2 columns right
    - `MATCH 7 R[-2]C[-1]: Top Banana` - search row 7 for the string "Top Banana" then move 2 rows up and 1 column left
- `XLMAP_FINISH char(1000)`: The rule used to find the end of the range. Leave blank for individual cells. Example values include those listed under XLMAP_START, plus:
    - `BLANKROW` - search down (from XLMAP_START) until an entirely blank row is found, then choose the row above it
    - `LASTDOWN` - The last non blank cell below the XLMAP_START cell. In the `RELATIVE R[x]C[x]` case, this is offset from XLMAP_START rather than A1
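
As an illustration of how the `ABSOLUTE` and `MATCH` start rules above might be resolved to a cell, here is a minimal sketch. It is not the Data Controller implementation: the function names, the list-of-lists grid, and exact (case-sensitive) string matching are all assumptions. `RELATIVE`, `BLANKROW` and `LASTDOWN` are deliberately left out.

```python
import re


def col_to_index(letters):
    """Convert a column label such as 'F' or 'AA' to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_offset(r1c1):
    """Parse an offset such as 'R[-2]C[1]' into (row_offset, col_offset)."""
    m = re.fullmatch(r"R\[(-?\d+)\]C\[(-?\d+)\]", r1c1)
    if not m:
        raise ValueError(f"bad R1C1 offset: {r1c1}")
    return int(m.group(1)), int(m.group(2))


def resolve_start(rule, grid):
    """Return the 1-based (row, col) that a start rule points at.

    grid: list of rows, each a list of cell values (row 1 is grid[0]).
    """
    kind, _, rest = rule.partition(" ")
    if kind == "ABSOLUTE":
        m = re.fullmatch(r"([A-Za-z]+)(\d+)", rest.strip())
        if not m:
            raise ValueError(f"bad cell reference: {rest}")
        return int(m.group(2)), col_to_index(m.group(1))
    if kind == "MATCH":
        spec, _, needle = rest.partition(":")
        where, offset = spec.split()
        needle = needle.strip()
        d_row, d_col = parse_offset(offset)
        if where.isdigit():
            # Search along a row, eg 'MATCH 7 ...'
            row = int(where)
            cells = grid[row - 1] if row <= len(grid) else []
            for col, value in enumerate(cells, start=1):
                if value == needle:
                    return row + d_row, col + d_col
        else:
            # Search down a column, eg 'MATCH P ...'
            col = col_to_index(where)
            for row, cells in enumerate(grid, start=1):
                if col <= len(cells) and cells[col - 1] == needle:
                    return row + d_row, col + d_col
        raise LookupError(f"string not found: {needle}")
    raise ValueError(f"unsupported rule type: {kind}")


# A 9 x 18 grid with two marker strings placed for the examples above.
grid = [[None] * 18 for _ in range(9)]
grid[2][15] = "My Test"     # cell P3
grid[6][4] = "Top Banana"   # cell E7
print(resolve_start("ABSOLUTE F4", grid))
print(resolve_start("MATCH P R[0]C[2]: My Test", grid))
print(resolve_start("MATCH 7 R[-2]C[-1]: Top Banana", grid))
```

Positions are returned as 1-based (row, column) pairs so they line up with the Excel references used in the rules.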


@ -7,7 +7,8 @@ nav:
- DC User Guide: dc-userguide.md
- Data Catalog: dcu-datacatalog.md
- Data Lineage: dcu-lineage.md
- File Uploads: dcu-fileupload.md
- Excel: excel.md
- File Uploads: files.md
- Filter Mechanism: filter.md
- Locking Mechanism: locking-mechanism.md
- Table Viewer: dcu-tableviewer.md
@ -32,7 +33,6 @@ nav:
- Dates / Datetimes: dcc-dates.md
- Dynamic Cell Dropdown: dynamic-cell-dropdown.md
- Emails: emails.md
- Excel Formulas: excel.md
- Formats: formats.md
- Groups: dcc-groups.md
- Libraries: libraries.md
@ -77,6 +77,7 @@ plugins:
- redirects:
redirect_maps:
'dci-evaluation.md': 'dci-deploysas9.md'
'dcu-fileupload.md': 'files.md'
# Repository
repo_name: 'dc/docs.datacontroller.io'