---
layout: article
title: Excel
description: Data Controller can extract all manner of data from within an Excel file (including formulae) ready for ingestion into SAS. All versions of Excel are supported.
og_image: https://docs.datacontroller.io/img/excel_results.png
---
# Excel Uploads
Data Controller supports two approaches for importing Excel data into SAS:
- Simple - the source range is in tabular format, with column names and values that match the target table. No configuration is necessary.
- Complex - the data is scattered across multiple ranges in a dynamic (non-fixed) arrangement. Pre-configuration is necessary.
Thanks to our pro license of SheetJS, we support all versions of Excel, large workbooks, and fast extraction. We also support the ingestion of password-protected workbooks.
Data is extracted from Excel within the browser, so no additional SAS modules or products are required.
A copy of the original Excel file is also uploaded to the staging area. This means that a complete audit trail can be captured, right back to the original source data.
## Simple Excel Uploads
To perform a simple extract, navigate to LOAD / Tables / (library/table) and click "UPLOAD" (or drag the file onto the page). No configuration is necessary.
The rules for data extraction are:
- Scan each sheet until a row is found that contains all of the target columns
- Extract rows until the first blank primary key value
This is incredibly flexible, and means:
- data can be anywhere, on any worksheet
- data can start on any row, and any column
- data can be completely surrounded by other data
- columns can be in any order
- additional columns are simply ignored
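The two scanning rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Data Controller's actual (browser-side) implementation: each worksheet is modelled as a list of rows of plain cell values, header names are matched case-insensitively, and the range ends as soon as any primary key column is blank. The names `simple_extract`, `cell_at` and `is_blank` are hypothetical.

```python
from typing import Any, Dict, List

# Assumption: a worksheet is a list of rows, each row a list of cell values
# (None for an empty cell). Sheets are visited in workbook (dict) order.
Sheet = List[List[Any]]


def is_blank(value: Any) -> bool:
    """None and whitespace-only strings count as blank."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def cell_at(row: List[Any], idx: int) -> Any:
    """Return the cell at idx, treating cells beyond the row's end as empty."""
    return row[idx] if idx < len(row) else None


def simple_extract(workbook: Dict[str, Sheet], target_cols: List[str],
                   pk_cols: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical sketch; pk_cols must be a subset of target_cols."""
    wanted = [c.upper() for c in target_cols]
    for sheet in workbook.values():
        for r, row in enumerate(sheet):
            # Map each target column name to its position in this row.
            positions: Dict[str, int] = {}
            for c, value in enumerate(row):
                name = value.strip().upper() if isinstance(value, str) else None
                if name in wanted and name not in positions:
                    positions[name] = c
            if len(positions) < len(wanted):
                continue  # not a header row; keep scanning
            records = []
            for data_row in sheet[r + 1:]:
                # The first blank primary key value ends the range.
                if any(is_blank(cell_at(data_row, positions[pk.upper()]))
                       for pk in pk_cols):
                    break
                # Extra columns are ignored; only target columns are kept.
                records.append({col: cell_at(data_row, positions[col])
                                for col in wanted})
            return records  # only the FIRST matching range is extracted
    return []
```

For example, with a target table of ID and AMOUNT keyed on ID, a header row containing AMOUNT, an unrelated column and ID (in that order) is found on whichever sheet it sits, and rows are read beneath it until the first empty ID.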
!!! note
    If the Excel file contains more than one range with the target columns (eg, on different sheets), only the FIRST will be extracted.
Uploaded data may optionally contain a column named `_____DELETE__THIS__RECORD_____`. If this column contains the value "Yes", the row is marked for deletion.
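A minimal sketch of how the delete marker could be interpreted, assuming each uploaded row is a Python dict keyed by column name and that only the exact value "Yes" (as documented) marks a row. The function name `split_deletions` is hypothetical, and dropping the marker column from the output is an illustrative choice.

```python
from typing import Any, Dict, List, Tuple

DELETE_COL = "_____DELETE__THIS__RECORD_____"
Row = Dict[str, Any]


def split_deletions(records: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split uploaded rows into (rows to keep/upsert, rows marked for deletion)."""
    keep: List[Row] = []
    delete: List[Row] = []
    for rec in records:
        # Assumption: only the exact documented value "Yes" marks a deletion.
        target = delete if rec.get(DELETE_COL) == "Yes" else keep
        # The marker column itself is not part of the target table.
        target.append({k: v for k, v in rec.items() if k != DELETE_COL})
    return keep, delete
```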
When loading very large files (eg over 10 MB), it is more efficient to use CSV format, as this bypasses the local rendering engine. However, it also bypasses the local data quality (DQ) checks, so be careful! Examples of checks that are applied to local (Excel) files but not to remote (CSV) files include:
- Length of character variables - CSV values are truncated at the maximum target column length
- Length of numeric variables - if the target numeric variable is shorter than 8 bytes, the staged CSV value may be rounded if it is too large to fit
- NOTNULL - this rule is only applied at the backend when the constraint is physical (rather than a DC setting)
- MINVAL
- MAXVAL
- CASE
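To make the list above more concrete, here is a hypothetical Python sketch of checks of this kind. The rule dictionary layout (`type`, `length`, `notnull`, `minval`, `maxval`, `case`) is an assumption made for illustration and is not Data Controller's configuration format. The `sas_truncate` helper models a short numeric by keeping only the first `length` bytes of the 8-byte IEEE 754 double and zeroing the rest; this truncates rather than rounds, so treat it as an illustration of the precision loss, not an exact reproduction of SAS. Under this model a length of 3 holds integers exactly only up to 8,192.

```python
import struct
from typing import Any, Dict, List


def sas_truncate(value: float, length: int) -> float:
    """Keep the first `length` bytes of the big-endian 8-byte double; zero the rest."""
    if not 3 <= length <= 8:
        raise ValueError("numeric length must be between 3 and 8 bytes")
    raw = struct.pack(">d", value)
    return struct.unpack(">d", raw[:length] + b"\x00" * (8 - length))[0]


def check_value(col: str, value: Any, rule: Dict[str, Any]) -> List[str]:
    """Return the problems found with one value; an empty list means it passed."""
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return [f"{col}: NOTNULL violated"] if rule.get("notnull") else []
    errors = []
    if rule["type"] == "char":
        text = str(value)
        if len(text) > rule["length"]:
            errors.append(f"{col}: exceeds length {rule['length']}")
        # Only UPCASE/LOWCASE are modelled; a mismatch is reported, not fixed.
        case = rule.get("case")
        if ((case == "UPCASE" and text != text.upper())
                or (case == "LOWCASE" and text != text.lower())):
            errors.append(f"{col}: expected {case}")
    else:
        try:
            num = float(value)
        except (TypeError, ValueError):
            return [f"{col}: not numeric"]
        if "minval" in rule and num < rule["minval"]:
            errors.append(f"{col}: below MINVAL {rule['minval']}")
        if "maxval" in rule and num > rule["maxval"]:
            errors.append(f"{col}: above MAXVAL {rule['maxval']}")
        length = rule.get("length", 8)
        if length < 8 and sas_truncate(num, length) != num:
            errors.append(f"{col}: not exactly representable in {length} bytes")
    return errors


def check_row(row: Dict[str, Any], rules: Dict[str, Dict[str, Any]]) -> List[str]:
    """Apply every column rule to one row and collect all problems."""
    errors: List[str] = []
    for col, rule in rules.items():
        errors.extend(check_value(col, row.get(col), rule))
    return errors
```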
Note that the `HARDSELECT_***` hooks are not applied to the rendered Excel values (they are only applied when actively editing a cell).
### Formulas
It is possible to configure certain columns to be extracted as formulae, rather than raw values. The target column must be character, and it should be wide enough to support the longest formula in the source data. If the order of values is important, you should include a row number in your primary key.
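The formula option can be sketched as follows, assuming each cell is represented like a SheetJS cell object: a dict with the cached value under `v` and, for formula cells only, the formula text under `f`. The function name, the added `ROW_NO` column, and the fall-back of storing a constant's value as text are illustrative assumptions; Data Controller's own handling may differ.

```python
from typing import Any, Dict, List, Optional, Set

# Assumption: each cell mirrors a SheetJS cell object, i.e. a dict with the
# cached value under "v" and, for formula cells only, the formula under "f".
Cell = Optional[Dict[str, Any]]


def extract_rows(rows: List[Dict[str, Cell]], formula_cols: Set[str],
                 widths: Dict[str, int]) -> List[Dict[str, Any]]:
    """Extract values, or formula text for the configured (character) columns."""
    out = []
    for row_no, row in enumerate(rows, start=1):
        # A row number preserves source order when it is part of the primary key.
        record: Dict[str, Any] = {"ROW_NO": row_no}
        for col, cell in row.items():
            if cell is None:
                record[col] = None
            elif col in formula_cols:
                # Constants have no "f", so their value is stored as text instead.
                text = str(cell["f"]) if "f" in cell else str(cell.get("v", ""))
                if len(text) > widths[col]:
                    raise ValueError(
                        f"{col}: needs {len(text)} characters, "
                        f"target holds {widths[col]}")
                record[col] = text
            else:
                record[col] = cell.get("v")
        out.append(record)
    return out
```

Raising an error on an over-long formula, rather than silently truncating it, reflects the requirement that the target column be wide enough for the longest formula.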
Configuration is as follows:
Once this is done, you are ready to upload:
The final table will look like this:
## Complex Excel Uploads
Through the use of "Excel Maps" you can dynamically extract individual cells or entire ranges from anywhere within a workbook - either through absolute / relative positioning, or by reference to a "matched" (search) string.
Configuration is made in the following tables:
- MPE_XLMAP_RULES - detailed extraction rules for a particular map
- MPE_XLMAP_INFO - optional map-level attributes
Each rule will extract either a single cell or a rectangular range from the source workbook. The target will be MPE_XLMAP_DATA, or whichever table is configured in MPE_XLMAP_INFO.
To illustrate, consider the following Excel workbook, in which the yellow cells need to be imported.
There are multiple ways to configure this; one possible set of MPE_XLMAP_RULES entries is as follows:
XLMAP_ID | XLMAP_RANGE_ID | XLMAP_SHEET | XLMAP_START | XLMAP_FINISH |
---|---|---|---|---|
MAP01 | MI_ITEM | Current Month | MATCH B R[1]C[0]: ITEM | LASTDOWN |
MAP01 | MI_AMT | Current Month | MATCH C R[1]C[0]: AMOUNT | LASTDOWN |
MAP01 | TMI | Current Month | ABSOLUTE F6 | |
MAP01 | CB | Current Month | MATCH F R[2]C[0]: CASH BALANCE | |
MAP01 | RENT | /1 | MATCH E R[0]C[2]: Rent/mortgage | |
MAP01 | CELL | /1 | MATCH E R[0]C[2]: Cell phone | |
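The rule syntax in the table can be read as follows, although this interpretation is inferred from the example rather than taken from a specification: `ABSOLUTE F6` points at a fixed cell, `MATCH B R[1]C[0]: ITEM` searches column B for the text ITEM and then moves 1 row down and 0 columns across, a `LASTDOWN` finish extends the range downwards to the last non-blank cell, a blank finish means a single cell, and `/1` refers to the first sheet by position. The Python sketch below resolves rules under exactly those assumptions; the function names are hypothetical and only single-column `LASTDOWN` ranges are handled.

```python
import re
from typing import Any, Dict, List, Tuple

Grid = List[List[Any]]  # assumption: a sheet as rows of plain cell values

ABSOLUTE = re.compile(r"ABSOLUTE\s+([A-Z]+)(\d+)")
MATCH = re.compile(r"MATCH\s+([A-Z]+)\s+R\[(-?\d+)\]C\[(-?\d+)\]\s*:\s*(.+)")


def col_index(letters: str) -> int:
    """Convert a column label such as "F" or "AA" into a 0-based index."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


def cell(grid: Grid, r: int, c: int) -> Any:
    """Return a cell value, or None when (r, c) lies outside the used area."""
    if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
        return grid[r][c]
    return None


def is_blank(value: Any) -> bool:
    return value is None or (isinstance(value, str) and value.strip() == "")


def resolve_start(grid: Grid, start: str) -> Tuple[int, int]:
    """Return the 0-based (row, col) that an XLMAP_START expression points at."""
    m = ABSOLUTE.fullmatch(start.strip())
    if m:
        return int(m.group(2)) - 1, col_index(m.group(1))
    m = MATCH.fullmatch(start.strip())
    if m:
        col = col_index(m.group(1))
        needle = m.group(4).strip()
        for r in range(len(grid)):
            value = cell(grid, r, col)
            if isinstance(value, str) and value.strip() == needle:
                # Apply the R[x]C[y] offset from the matched cell.
                return r + int(m.group(2)), col + int(m.group(3))
        raise LookupError(f"{needle!r} not found in column {m.group(1)}")
    raise ValueError(f"unsupported XLMAP_START: {start!r}")


def extract_range(workbook: Dict[str, Grid], sheet: str, start: str,
                  finish: str = "") -> Grid:
    """Apply one hypothetical map rule and return the extracted rows."""
    # Assumption: "/n" selects the n-th sheet (1-based) in workbook order.
    if sheet.startswith("/"):
        grid = list(workbook.values())[int(sheet[1:]) - 1]
    else:
        grid = workbook[sheet]
    r, c = resolve_start(grid, start)
    if not finish.strip():
        return [[cell(grid, r, c)]]  # blank XLMAP_FINISH: a single cell
    if finish.strip() == "LASTDOWN":
        values = []
        while not is_blank(cell(grid, r, c)):
            values.append([cell(grid, r, c)])
            r += 1
        return values
    raise ValueError(f"unsupported XLMAP_FINISH: {finish!r}")
```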
To import the Excel file, the end user simply needs to navigate to the LOAD tab, choose "Files", select the appropriate map (eg MAP01), and upload. This will stage the new records in MPE_XLMAP_DATA, which will go through the usual approval process and quality checks. A copy of the source Excel file will be attached to each upload.
The corresponding MPE_XLMAP_DATA table will appear as follows:
LOAD_REF | XLMAP_ID | XLMAP_RANGE_ID | ROW_NO | COL_NO | VALUE_TXT |
---|---|---|---|---|---|
DC20231212T154611798_648613_3895 | MAP01 | MI_ITEM | 1 | 1 | Income Source 1 |
DC20231212T154611798_648613_3895 | MAP01 | MI_ITEM | 2 | 1 | Income Source 2 |
DC20231212T154611798_648613_3895 | MAP01 | MI_ITEM | 3 | 1 | Other |
DC20231212T154611798_648613_3895 | MAP01 | MI_AMT | 1 | 1 | 2500 |
DC20231212T154611798_648613_3895 | MAP01 | MI_AMT | 2 | 1 | 1000 |
DC20231212T154611798_648613_3895 | MAP01 | MI_AMT | 3 | 1 | 250 |
DC20231212T154611798_648613_3895 | MAP01 | TMI | 1 | 1 | 3750 |
DC20231212T154611798_648613_3895 | MAP01 | CB | 1 | 1 | 864 |
DC20231212T154611798_648613_3895 | MAP01 | RENT | 1 | 1 | 800 |
DC20231212T154611798_648613_3895 | MAP01 | CELL | 1 | 1 | 45 |
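The long, one-row-per-cell layout above can be produced from extracted ranges with a sketch like the following. It assumes each range is a list of rows, numbers `ROW_NO` and `COL_NO` from 1 relative to the range's top-left cell (matching the table), and converts values with `str()`; emitting blank cells as empty strings is an assumption. `to_xlmap_data` is a hypothetical name.

```python
from typing import Any, Dict, List


def to_xlmap_data(load_ref: str, xlmap_id: str,
                  ranges: Dict[str, List[List[Any]]]) -> List[Dict[str, Any]]:
    """Flatten extracted ranges into one record per cell."""
    records = []
    for range_id, grid in ranges.items():
        # ROW_NO and COL_NO are 1-based and relative to the range, not the sheet.
        for row_no, row in enumerate(grid, start=1):
            for col_no, value in enumerate(row, start=1):
                records.append({
                    "LOAD_REF": load_ref,
                    "XLMAP_ID": xlmap_id,
                    "XLMAP_RANGE_ID": range_id,
                    "ROW_NO": row_no,
                    "COL_NO": col_no,
                    # Assumption: every value is stored as text; blanks become "".
                    "VALUE_TXT": "" if value is None else str(value),
                })
    return records
```

Storing every value as text in a single column is what lets one generic table hold cells from any map, whatever their original type.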