# Data Controller for SAS: File Uploads

Files can be uploaded via the Editor interface: first choose the library and table, then click "Upload". All versions of Excel are supported.

Uploaded data may *optionally* contain a column named `_____DELETE__THIS__RECORD_____`. Where this column contains the value "Yes", the row is marked for deletion.

If loading very large files (eg over 10 MB) it is more efficient to use CSV format, as this bypasses the local rendering engine. However, it also bypasses the local data quality (DQ) checks, so be careful! Examples of checks that are applied locally (Excel) but not remotely (CSV) include:

* Length of character variables - CSV values are truncated at the max target column length
* Length of numeric variables - if the target numeric variable is below 8 bytes then the staged CSV value may be rounded if it is too large to fit
* NOTNULL - this rule is only applied at the backend when the constraint is physical (rather than a DC setting)
* MINVAL
* MAXVAL
* CASE

Note that the `HARDSELECT_***` hooks are not applied to the rendered Excel values (they are currently only applied when editing a cell).

![image](https://user-images.githubusercontent.com/4420615/233036372-87b8dd02-a4cd-4f19-ac1b-bb9fdc850607.png)
## Excel Uploads

Thanks to our pro license of [SheetJS](https://sheetjs.com/), we can support all versions of Excel, handle large workbooks, and extract data extremely fast. We also support the ingestion of [password-protected workbooks](/videos#uploading-a-password-protected-excel-file).

The rules for data extraction are:

* Scan the spreadsheet until a row is found with all the target columns (not case sensitive)
* Extract data below until the *first row containing a blank primary key value*

This is incredibly flexible, and means:

* data can be anywhere, on any worksheet
* data can contain additional columns (they are just ignored)
* data can be completely surrounded by other data

A copy of the original Excel file is also uploaded to the staging area. This means that a complete audit trail can be captured, right back to the original source data.
!!! note
    If the Excel workbook contains more than one range with the target columns (eg, on different sheets), only the FIRST will be extracted.
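
The extraction rules above can be sketched in Python as follows. This is only an illustration of the described behaviour, not Data Controller's actual (browser-side) implementation. The function name `extract_rows` and the input shape (a dict of sheet name to list of rows, as a spreadsheet reader might produce) are assumptions made for the example.

```python
def extract_rows(sheets, target_cols, pk_cols):
    """Return records from the FIRST range that contains all target columns.

    sheets      -- hypothetical input: {sheet_name: [row, row, ...]},
                   where each row is a list of cell values
    target_cols -- column names of the target table
    pk_cols     -- primary key column names (a subset of target_cols)
    """
    wanted = [c.upper() for c in target_cols]
    keys = [c.upper() for c in pk_cols]

    def cell(row, idx):
        # Short rows are treated as having blank trailing cells.
        return row[idx] if idx < len(row) else None

    for rows in sheets.values():
        for i, row in enumerate(rows):
            # Map each upper-cased header value to its column position.
            header = {
                str(c).strip().upper(): j
                for j, c in enumerate(row)
                if c is not None
            }
            if not all(c in header for c in wanted):
                continue  # not a header row, keep scanning
            records = []
            for data in rows[i + 1:]:
                pk_values = [cell(data, header[k]) for k in keys]
                # Stop at the first row with a blank primary key value.
                if any(v is None or str(v).strip() == "" for v in pk_values):
                    break
                # Extra columns are ignored: only target columns are kept.
                records.append({c: cell(data, header[c]) for c in wanted})
            return records  # later matching ranges are never inspected
    return []
```

In this sketch, a header row found on the second worksheet at row 5, column C is handled exactly like one in cell A1, and any rows after the first blank primary key are not extracted.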
## CSV Uploads

The following should be considered when uploading data in this way:

- A header row (with variable names) is required
- Variable names must match those in the target table (not case sensitive). An easy way to ensure this is to download the data from Viewer and use it as a template.
- Duplicate variable names are not permitted
- Missing columns are not permitted
- Additional columns are ignored
- The order of variables does not matter EXCEPT for the (optional) `_____DELETE__THIS__RECORD_____` variable. When using this variable, it must be the **first**.
- The delimiter is extracted from the header row, so for `var1;var2;var3` the delimiter would be assumed to be a semicolon
- The above assumes the delimiter is the first special character! So `var,1;var2;var3` would fail
- The following characters should **not** be used as delimiters:
    - doublequote
    - quote
    - space
    - underscore
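
As a rough pre-upload check, the header rules above could be approximated as below. This is a sketch under assumptions: it treats "special character" as anything other than a letter, digit or underscore, and the helper names `guess_delimiter` and `check_header` are invented for the example. It does not reproduce Data Controller's own parsing.

```python
DELETE_COL = "_____DELETE__THIS__RECORD_____"
UNSUITABLE = {'"', "'", " "}  # underscore is handled as a name character


def guess_delimiter(header_line):
    """Return the first character that cannot be part of a variable name."""
    for ch in header_line:
        if ch.isalnum() or ch == "_":
            continue
        if ch in UNSUITABLE:
            raise ValueError(f"unsuitable delimiter: {ch!r}")
        return ch
    return None  # no special character found: a single-column header


def check_header(header_line, target_cols):
    """Return a list of problems found in a CSV header row (empty if none)."""
    delim = guess_delimiter(header_line)
    names = [header_line] if delim is None else header_line.split(delim)
    names = [n.strip().upper() for n in names]
    problems = []
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"duplicate variable: {name}")
    for col in target_cols:
        if col.upper() not in names:
            problems.append(f"missing column: {col.upper()}")
    if DELETE_COL in names and names[0] != DELETE_COL:
        problems.append(f"{DELETE_COL} must be the first variable")
    return problems
```

For the header `var,1;var2;var3` the guessed delimiter is a comma, so none of `var1`, `var2` or `var3` is found and all three are reported as missing.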

When loading dates, be aware that Data Controller makes use of the `ANYDTDTE` and `ANYDTDTM` informats (width 19).

This means that uploaded date / datetime values should be unambiguous (eg `01FEB1942` rather than `01/02/42`), as the latter could be interpreted as `02JAN2042` depending on your locale and `YEARCUTOFF` option settings. Note that UTC dates with offset values (eg `2018-12-26T09:19:25.123+0100`) are not currently supported. If this is a feature you would like to see, contact us.
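
The ambiguity can be demonstrated outside SAS. The Python snippet below is only an analogy: it uses `datetime.strptime` rather than the SAS informats, and Python's fixed two-digit-year pivot stands in for `YEARCUTOFF`. The helper `to_date9` is a hypothetical name for writing dates in the unambiguous `01FEB1942` style before upload, and it assumes an English locale for month abbreviations.

```python
from datetime import date, datetime

raw = "01/02/42"

# The same text yields two different dates depending on the assumed order.
day_first = datetime.strptime(raw, "%d/%m/%y").date()    # 2042-02-01
month_first = datetime.strptime(raw, "%m/%d/%y").date()  # 2042-01-02

# A four-digit year and a month name leave nothing to interpret.
unambiguous = datetime.strptime("01FEB1942", "%d%b%Y").date()  # 1942-02-01


def to_date9(d):
    """Format a date as DDMONYYYY, eg 01FEB1942 (English locale assumed)."""
    return d.strftime("%d%b%Y").upper()
```

Writing date columns with something like `to_date9` before export avoids both the day/month and the century ambiguity.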
!!! tip
    To get a copy of a file in the right format for upload, use the [file download](/dc-userguide/#usage) feature in the Viewer tab.

!!! warning
    Lengths are taken from the target table. If a CSV contains long strings (eg `"ABCDE"` for a $3 variable) then the excess characters are silently truncated (only `"ABC"` is staged and loaded). If the target variable is a short numeric (eg 4., or 4 bytes) then floats or large integers may be rounded. This issue does not apply to Excel uploads, which are first validated in the browser.
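
Since CSV uploads skip the browser-side length check, one option is to scan the file before uploading. The sketch below is an assumption-laden example rather than a Data Controller feature: the function name `find_truncations` is invented, the target lengths must be supplied by the caller (keyed by upper-case variable name), and the byte length is measured in an assumed UTF-8 session encoding.

```python
import csv
import io


def find_truncations(csv_text, char_lengths, delimiter=",", encoding="utf-8"):
    """List (line_number, column, value) for values that exceed the target length.

    char_lengths -- hypothetical input: {UPPERCASE_NAME: length_in_bytes}
    """
    issues = []
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # Line 1 is the header row, so data starts at line 2.
    for line_num, row in enumerate(reader, start=2):
        for col, value in row.items():
            if col is None or value is None:
                continue  # surplus or missing cells are not length-checked
            limit = char_lengths.get(col.strip().upper())
            if limit is not None and len(value.encode(encoding)) > limit:
                issues.append((line_num, col, value))
    return issues
```

For a target variable `CODE` of length 3, a row containing `ABCDE` would be reported here instead of being staged as `ABC`.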