diff --git a/docs/dc-userguide.md b/docs/dc-userguide.md
index bdf78b2..86b6c96 100644
--- a/docs/dc-userguide.md
+++ b/docs/dc-userguide.md
@@ -40,7 +40,7 @@ The Editor screen lets users who have been pre-authorised (via the `DATACTRL.MPE
 1 - *Filter*. The user can filter before proceeding to perform edits.
-2 - *Upload*. If you have a lot of data, you can [upload it directly](dcu-fileupload). The changes are then approved in the usual way.
+2 - *Upload*. If you have a lot of data, you can [upload it directly](files). The changes are then approved in the usual way.
 3 - *Edit*. This is the main interface, data is displayed in tabular format. The first column is always "Delete?", as this allows you to mark rows for deletion. Note that removing a row from display does not mark it for deletion! It simply means that this row is not part of the changeset being submitted. The next set of columns are the Primary Key, and are shaded grey. If the table has a surrogate / retained key, then it is the Business Key that is shown here (the RK field is calculated / updated at the backend). For SCD2 type tables, the 'validity' fields are not shown. It is assumed that the user is always working with the current version of the data, and the view is filtered as such.
diff --git a/docs/dcc-dates.md b/docs/dcc-dates.md
index e4e38ff..4f0b063 100644
--- a/docs/dcc-dates.md
+++ b/docs/dcc-dates.md
@@ -34,7 +34,7 @@ proc metalib;
 run;
 ```
-If you have other dates / datetimes / times you would like us to support, do [get in touch](http://datacontroller.io/contact)!
+If you have other dates / datetimes / times you would like us to support, do [get in touch](https://datacontroller.io/contact)!
diff --git a/docs/dcu-fileupload.md b/docs/dcu-fileupload.md
deleted file mode 100644
index c34cdd4..0000000
--- a/docs/dcu-fileupload.md
+++ /dev/null
@@ -1,68 +0,0 @@
-# Data Controller for SAS: File Uploads
-
-Files can be uploaded via the Editor interface - first choose the library and table, then click "Upload". All versions of excel are supported.
-
-Uploaded data may *optionally* contain a column named `_____DELETE__THIS__RECORD_____` - where this contains the value "Yes" the row is marked for deletion.
-
-If loading very large files (eg over 10mb) it is more efficient to use CSV format, as this bypasses the local rendering engine, but also the local DQ checks - so be careful! Examples of local (excel) but not remote (CSV) file checks include:
-
-* Length of character variables - CSV files are truncated at the max target column length
-* Length of numeric variables - if the target numeric variable is below 8 bytes then the staged CSV value may be rounded if it is too large to fit
-* NOTNULL - this rule is only applied at backend when the constraint is physical (rather than a DC setting)
-* MINVAL
-* MAXVAL
-* CASE
-
-Note that the HARDSELECT_*** hooks are not applied to the rendered Excel values (they are currently only applied when editing a cell).
-
-![image](https://user-images.githubusercontent.com/4420615/233036372-87b8dd02-a4cd-4f19-ac1b-bb9fdc850607.png)
-
-
-## Excel Uploads
-
-Thanks to our pro license of [sheetJS](https://sheetjs.com/), we can support all versions of excel, large workbooks, and extract data extremely fast. We also support the ingest of [password-protected workbooks](/videos#uploading-a-password-protected-excel-file).
-
-The rules for data extraction are:
-
-* Scan the spreadsheet until a row is found with all the target columns (not case sensitive)
-* Extract data below until the *first row containing a blank primary key value*
-
-This is incredibly flexible, and means:
-
-* data can be anywhere, on any worksheet
-* data can contain additional columns (they are just ignored)
-* data can be completely surrounded by other data
-
-A copy of the original Excel file is also uploaded to the staging area. This means that a complete audit trail can be captured, right back to the original source data.
-
-!!! note
-    If the excel contains more than one range with the target columns (eg, on different sheets), only the FIRST will be extracted.
-
-## CSV Uploads
-
-The following should be considered when uploading data in this way:
- - A header row (with variable names) is required
- - Variable names must match those in the target table (not case sensitive). An easy way to ensure this is to download the data from Viewer and use this as a template.
- - Duplicate variable names are not permitted
- - Missing columns are not permitted
- - Additional columns are ignored
- - The order of variables does not matter EXCEPT for the (optional) `_____DELETE__THIS__RECORD_____` variable. When using this variable, it must be the **first**.
- - The delimiter is extracted from the header row - so for `var1;var2;var3` the delimeter would be assumed to be a semicolon
- - The above assumes the delimiter is the first special character! So `var,1;var2;var3` would fail
- - The following characters should **not** be used as delimiters
-     - doublequote
-     - quote
-     - space
-     - underscore
-
-When loading dates, be aware that Data Controller makes use of the `ANYDTDTE` and `ANYDTDTTME` informats (width 19).
-This means that uploaded date / datetime values should be unambiguous (eg `01FEB1942` vs `01/02/42`), to avoid confusion - as the latter could be interpreted as `02JAN2042` depending on your locale and options `YEARCUTOFF` settings. Note that UTC dates with offset values (eg `2018-12-26T09:19:25.123+0100`) are not currently supported. If this is a feature you would like to see, contact us.
-
-!!! tip
-    To get a copy of a file in the right format for upload, use the [file download](/dc-userguide/#usage) feature in the Viewer tab
-
-!!! warning
-    Lengths are taken from the target table. If a CSV contains long strings (eg `"ABCDE"` for a $3 variable) then the rest will be silently truncated (only `"ABC"` staged and loaded). If the target variable is a short numeric (eg 4., or 4 bytes) then floats or large integers may be rounded. This issue does not apply to excel uploads, which are first validated in the browser.
-
-
diff --git a/docs/excel.md b/docs/excel.md
index 5f2d9d8..ca47825 100644
--- a/docs/excel.md
+++ b/docs/excel.md
@@ -1,14 +1,62 @@
 ---
 layout: article
 title: Excel
-description: Load Excel to SAS whilst retaining your favourite formulas! Data can be on any sheet, on any cell, even surrounded by other data. All versions of Excel supported.
+description: Data Controller can extract all manner of data from within an Excel file (including formulae) ready for ingestion into SAS. All versions of Excel are supported.
 og_image: https://docs.datacontroller.io/img/excel_results.png
 ---
 # Excel Uploads
-Data Controller for SAS® supports all versions of excel. Data is extracted from excel from within the browser - there is no need for additional SAS components. So long as the column names match those in the target table, the data can be on any worksheet, start from any row, and any column.
-The data can be completely surrounded by irrelevant data - the extraction will stop as soon as it hits one empty cell in a primary key column. The columns can be in any order, and are not case sensitive. More details [here](/dcu-fileupload/#excel-uploads).
+Data Controller supports two approaches for importing Excel data into SAS:
+
+ - Simple - source range in tabular format, with column names/values that match the target table. No configuration necessary.
+ - Complex - data is scattered across multiple ranges in a dynamic (non-fixed) arrangement. Pre-configuration necessary.
+
+
+Thanks to our pro license of [sheetJS](https://sheetjs.com/), we can support all versions of Excel, large workbooks, and fast extracts. We also support the ingest of [password-protected workbooks](/videos#uploading-a-password-protected-excel-file).
+
+
+Note that data is extracted from Excel _within the browser_ - meaning there is no need for any special SAS modules / products.
+
+A copy of the original Excel file is also uploaded to the staging area. This means that a complete audit trail can be captured, right back to the original source data.
+
+## Simple Excel Uploads
+
+To make a _simple_ extract, select LOAD / Tables / (library/table) and click "UPLOAD" (or drag the file onto the page). No configuration necessary.
+
+![](img/xltables.png)
+
+The rules for data extraction are:
+
+* Scan each sheet until a row is found with all target columns (not case sensitive)
+* Extract rows until the first *blank primary key value*
+
+This is incredibly flexible, and means:
+
+* data can be anywhere, on any worksheet
+* data can start on any row, and any column
+* data can be completely surrounded by other data
+* columns can be in any order
+* additional columns are simply ignored
+
+
+!!! note
+    If the Excel file contains more than one range with the target columns (eg, on different sheets), only the FIRST will be extracted.
+
+Uploaded data may *optionally* contain a column named `_____DELETE__THIS__RECORD_____` - if this contains the value "Yes", the row is marked for deletion.
+
+If loading very large files (eg over 10MB) it is more efficient to use CSV format, as this bypasses the local rendering engine, but also the local DQ checks - so be careful! Examples of local (Excel) but not remote (CSV) file checks include:
+
+* Length of character variables - CSV files are truncated at the max target column length
+* Length of numeric variables - if the target numeric variable is below 8 bytes then the staged CSV value may be rounded if it is too large to fit
+* NOTNULL - this rule is only applied at the backend when the constraint is physical (rather than a DC setting)
+* MINVAL
+* MAXVAL
+* CASE
+
+Note that the HARDSELECT_*** hooks are not applied to the rendered Excel values (they are currently only applied when editing a cell).
+
+![image](https://user-images.githubusercontent.com/4420615/233036372-87b8dd02-a4cd-4f19-ac1b-bb9fdc850607.png)
 ## Formulas
@@ -26,5 +74,20 @@ The final table will look like this:
 ![](img/excel_results.png)
-If you would like further integrations / support with excel uploads, we are happy to discuss new features. Just [get in touch](https://datacontroller.io/contact).
+
+
+## Complex Excel Uploads
+
+Through the use of "Excel Maps", this feature enables a series of cells / ranges to be dynamically extracted from anywhere within a workbook - either through absolute / relative positioning, or by reference to a "matched" (search) string.
+
+Configuration is made in the following tables:
+
+1. [MPE_XLMAP_RULES](tables/mpe_xlmap_rules.md) - detailed extraction rules for a particular map
+2. [MPE_XLMAP_INFO](tables/mpe_xlmap_info.md) - optional map-level attributes
+
+Each [rule](tables/mpe_xlmap_rules.md) will extract either a single cell or a rectangular range from the source workbook.
+
+
+
+
diff --git a/docs/files.md b/docs/files.md
new file mode 100644
index 0000000..1197dfc
--- /dev/null
+++ b/docs/files.md
@@ -0,0 +1,42 @@
+# Data Controller for SAS: File Uploads
+
+Data Controller supports the ingestion of two file formats - Excel (any version) and CSV.
+
+If you would like us to support other file types, do [get in touch](https://datacontroller.io/contact)!
+
+
+
+## Excel Uploads
+
+Data can be uploaded in regular (tabular) or dynamic (complex) format. For details, see the [Excel](/excel) page.
+
+
+## CSV Uploads
+
+The following should be considered when uploading data in this way:
+
+ - A header row (with variable names) is required
+ - Variable names must match those in the target table (not case sensitive). An easy way to ensure this is to download the data from Viewer and use this as a template.
+ - Duplicate variable names are not permitted
+ - Missing columns are not permitted
+ - Additional columns are ignored
+ - The order of variables does not matter EXCEPT for the (optional) `_____DELETE__THIS__RECORD_____` variable. When using this variable, it must be the **first**.
+ - The delimiter is extracted from the header row - so for `var1;var2;var3` the delimiter would be assumed to be a semicolon
+ - The above assumes the delimiter is the first special character! So `var,1;var2;var3` would fail, as the comma would be taken as the delimiter
+ - The following characters should **not** be used as delimiters:
+     - doublequote
+     - quote
+     - space
+     - underscore
+
+When loading dates, be aware that Data Controller makes use of the `ANYDTDTE` and `ANYDTDTM` informats (width 19).
+This means that uploaded date / datetime values should be unambiguous (eg `01FEB1942` vs `01/02/42`), to avoid confusion - as the latter could be interpreted as `02JAN2042` depending on your locale and `YEARCUTOFF` system option. Note that UTC dates with offset values (eg `2018-12-26T09:19:25.123+0100`) are not currently supported. If this is a feature you would like to see, contact us.
+
+!!! tip
+    To get a copy of a file in the right format for upload, use the [file download](/dc-userguide/#usage) feature in the Viewer tab.
+
+!!! warning
+    Lengths are taken from the target table. If a CSV contains long strings (eg `"ABCDE"` for a $3 variable) then the rest will be silently truncated (only `"ABC"` staged and loaded). If the target variable is a short numeric (eg 4., or 4 bytes) then floats or large integers may be rounded. This issue does not apply to Excel uploads, which are first validated in the browser.
+
+
+When loading CSVs, the entire file is passed to the backend for ingestion. This makes it more efficient for large files, but does mean that frontend validations are bypassed.
\ No newline at end of file
diff --git a/docs/img/excel_config_setup.png b/docs/img/excel_config_setup.png
index 95b7bbe..682539e 100644
Binary files a/docs/img/excel_config_setup.png and b/docs/img/excel_config_setup.png differ
diff --git a/docs/img/excel_results.png b/docs/img/excel_results.png
index ee5664e..46b1cd7 100644
Binary files a/docs/img/excel_results.png and b/docs/img/excel_results.png differ
diff --git a/docs/img/mpe_xlmap_info.png b/docs/img/mpe_xlmap_info.png
new file mode 100644
index 0000000..a664e5d
Binary files /dev/null and b/docs/img/mpe_xlmap_info.png differ
diff --git a/docs/img/mpe_xlmap_rules.png b/docs/img/mpe_xlmap_rules.png
new file mode 100644
index 0000000..f5c5ef6
Binary files /dev/null and b/docs/img/mpe_xlmap_rules.png differ
diff --git a/docs/img/xltables.png b/docs/img/xltables.png
new file mode 100644
index 0000000..383c7b3
Binary files /dev/null and b/docs/img/xltables.png differ
diff --git a/docs/index.md b/docs/index.md
index 98a8f92..bc0ce7d 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -29,7 +29,7 @@ The following resources contain additional information on the Data Controller:
 Data Controller is regularly updated with new features. If you see something that is not listed, and we agree it would be useful, you can engage us with Developer Days to build the feature in.
-* [Excel uploads](/dcu-fileupload/#excel-uploads) - drag & drop directly into SAS. All versions of excel supported.
+* [Excel uploads](/excel) - drag & drop directly into SAS. All versions of Excel supported.
 * Data Lineage - at both table and column level, export as image or CSV
 * Data Validation Rules - both automatic and user defined
 * Data Dictionary - map data definitions and ownership
diff --git a/docs/tables/mpe_xlmap_info.md b/docs/tables/mpe_xlmap_info.md
new file mode 100644
index 0000000..a6ba887
--- /dev/null
+++ b/docs/tables/mpe_xlmap_info.md
@@ -0,0 +1,29 @@
+---
+layout: article
+title: MPE_XLMAP_INFO
+description: The MPE_XLMAP_INFO table provides information about a particular XLMAP_ID
+og_title: MPE_XLMAP_INFO Table Documentation
+og_image: ../img/mpe_xlmap_info.png
+---
+
+# MPE_XLMAP_INFO
+
+The MPE_XLMAP_INFO table provides information about a particular XLMAP_ID.
+
+The information is optional (unless you wish to configure a non-default target table).
+
+![Screenshot](../img/mpe_xlmap_info.png)
+
+See also:
+
+* [MPE_XLMAP_RULES](mpe_xlmap_rules.md)
+
+
+
+## Columns
+
+ - `TX_FROM num`: SCD2 open datetime
+ - 🔑 `TX_TO num`: SCD2 close datetime
+ - 🔑 `XLMAP_ID char(32)`: A unique, UPPERCASE reference for the Excel map.
+ - `XLMAP_DESCRIPTION char(1000)`: Map Description
+ - `XLMAP_TARGETLIBDS char(41)`: An alternative target table to which to upload the data. This MUST have the same structure as the [MPE_XLMAP_DATA](mpe_xlmap_data.md) table.
\ No newline at end of file
diff --git a/docs/tables/mpe_xlmap_rules.md b/docs/tables/mpe_xlmap_rules.md
new file mode 100644
index 0000000..e519cab
--- /dev/null
+++ b/docs/tables/mpe_xlmap_rules.md
@@ -0,0 +1,33 @@
+---
+layout: article
+title: MPE_XLMAP_RULES
+description: The MPE_XLMAP_RULES table contains the rules for mapping Excel cells and ranges to XLMAP_IDs for upload into SAS
+og_title: MPE_XLMAP_RULES Table Documentation
+og_image: ../img/mpe_xlmap_rules.png
+---
+
+# MPE_XLMAP_RULES
+
+The MPE_XLMAP_RULES table contains the rules for mapping Excel cells and ranges to XLMAP_IDs for [upload into SAS](/excel).
+
+![Screenshot](../img/mpe_xlmap_rules.png)
+
+
+
+## Columns
+
+ - `TX_FROM num`: SCD2 open datetime
+ - 🔑 `TX_TO num`: SCD2 close datetime
+ - 🔑 `XLMAP_ID char(32)`: A unique, UPPERCASE reference for the Excel map.
+ - 🔑 `XLMAP_RANGE_ID char(32)`: A unique reference for the specific range being loaded
+ - `XLMAP_SHEET char(32)`: The sheet name in which to capture the data. Examples:
+     - `Sheet2` - an absolute reference
+     - `/1` - the first tab in the workbook
+ - `XLMAP_START char(1000)`: The rule used to find the top left of the range. Use "R1C1" notation to move the target. Examples:
+     - `ABSOLUTE F4` - an absolute reference
+     - `RELATIVE R[2]C[2]` - In the XLMAP_START case, this is the same as `ABSOLUTE B2`
+     - `MATCH P R[0]C[2]: My Test` - search column P for the string "My Test" then move 2 columns right
+     - `MATCH 7 R[-2]C[-1]: Top Banana` - search row 7 for the string "Top Banana" then move 2 rows up and 1 column left
+ - `XLMAP_FINISH char(1000)`: The rule used to find the end of the range. Leave blank for individual cells. Example values include those listed under XLMAP_START (in the `RELATIVE R[x]C[x]` case, the offset is from XLMAP_START rather than A1), plus:
+     - `BLANKROW` - search down (from XLMAP_START) until an entirely blank row is found, then choose the row above it
+     - `LASTDOWN` - the last non-blank cell below the XLMAP_START cell
diff --git a/mkdocs.yml b/mkdocs.yml
index d23a125..dc51ed0 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -7,7 +7,8 @@ nav:
     - DC User Guide: dc-userguide.md
     - Data Catalog: dcu-datacatalog.md
     - Data Lineage: dcu-lineage.md
-    - File Uploads: dcu-fileupload.md
+    - Excel: excel.md
+    - File Uploads: files.md
     - Filter Mechanism: filter.md
     - Locking Mechanism: locking-mechanism.md
     - Table Viewer: dcu-tableviewer.md
@@ -32,7 +33,6 @@ nav:
     - Dates / Datetimes: dcc-dates.md
     - Dynamic Cell Dropdown: dynamic-cell-dropdown.md
     - Emails: emails.md
-    - Excel Formulas: excel.md
     - Formats: formats.md
     - Groups: dcc-groups.md
     - Libraries: libraries.md
@@ -77,6 +77,7 @@ plugins:
   - redirects:
       redirect_maps:
         'dci-evaluation.md': 'dci-deploysas9.md'
+        'dcu-fileupload.md': 'files.md'
 # Repository
 repo_name: 'dc/docs.datacontroller.io'
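The simple-upload rules described in the new `docs/excel.md` can be sketched in Python as shown below. This is a minimal, hypothetical illustration. Data Controller performs the real extraction in the browser using SheetJS, not with this code. The `extract_table` name and the dict-of-grids workbook representation are assumptions made only for this example.

The sketch encodes the documented rules:

* Header matching is case-insensitive.
* Only the first range that contains every target column is used.
* Extra columns are ignored.
* Extraction stops at the first blank primary key value.

```python
from typing import Any, Optional

# Hypothetical workbook representation: {sheet name: rows}, each row a list of cell values.
Grid = list[list[Any]]


def _blank(value: Any) -> bool:
    """Treat None and whitespace-only strings as empty cells."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def _cell(row: list[Any], idx: int) -> Any:
    """Read a cell, treating positions past the end of a short row as empty."""
    return row[idx] if idx < len(row) else None


def extract_table(sheets: dict[str, Grid], columns: list[str],
                  pk: list[str]) -> Optional[list[dict[str, Any]]]:
    """Return the rows of the FIRST range containing every target column, or None."""
    for grid in sheets.values():  # assumes the dict was built in workbook sheet order
        for r, row in enumerate(grid):
            # Header match is case-insensitive; the first occurrence of a name wins.
            positions: dict[str, int] = {}
            for c, value in enumerate(row):
                if isinstance(value, str):
                    positions.setdefault(value.strip().upper(), c)
            if not all(name.upper() in positions for name in columns):
                continue  # keep scanning for a row holding all target columns
            records = []
            for data_row in grid[r + 1:]:
                # Stop at the first row with a blank primary key value.
                if any(_blank(_cell(data_row, positions[k.upper()])) for k in pk):
                    break
                # Additional (non-target) columns are simply ignored.
                records.append({name: _cell(data_row, positions[name.upper()])
                                for name in columns})
            return records  # only the first matching range is extracted
    return None
```

For example, consider a sheet with a title row, then a header row `id | Value | comment`, then data rows with `id` values 1, 2, blank, 3. That sheet yields only the first two data rows, each containing just `ID` and `VALUE`. Extraction stops at the blank `id` cell, and the `comment` column is ignored.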