docs.datacontroller.io/docs/excel.md
zmaj f5eac67aff
vid
2024-01-24 20:51:46 +00:00

layout: article
title: Excel
description: Data Controller can extract all manner of data from within an Excel file (including formulae) ready for ingestion into SAS. All versions of excel are supported.
og_image: https://docs.datacontroller.io/img/excel_results.png

Excel Uploads

Data Controller supports two approaches for importing Excel data into SAS:

  • Simple - source range in tabular format, with column names/values that match the target Table. No configuration necessary.
  • Complex - data is scattered across multiple ranges in a dynamic (non-fixed) arrangement. Pre-configuration necessary.

Thanks to our pro license of SheetJS, we can support all versions of Excel, large workbooks, and fast extracts. We also support the ingestion of password-protected workbooks.

Note that data is extracted from the Excel file within the browser, so there is no need for any special SAS modules or products.

A copy of the original Excel file is also uploaded to the staging area. This means that a complete audit trail can be captured, right back to the original source data.

Simple Excel Uploads

To make a simple extract, select LOAD / Tables / (library/table) and click "UPLOAD" (or drag the file onto the page). No configuration necessary.

The rules for data extraction are:

  • Scan each sheet until a row is found with all target columns
  • Extract rows until the first blank primary key value

This is incredibly flexible, and means:

  • data can be anywhere, on any worksheet
  • data can start on any row, and any column
  • data can be completely surrounded by other data
  • columns can be in any order
  • additional columns are simply ignored

!!! note If the Excel file contains more than one range with the target columns (eg, on different sheets), only the FIRST will be extracted.
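
The extraction rules above can be sketched in a few lines of Python. This is only an illustration, not the Data Controller implementation (which runs in the browser): the workbook is modelled as a dict of sheet name to rows of cell values, and the function names, the case-insensitive header match, and the choice to stop when any primary key column is blank are all assumptions.

```python
from typing import Any, Optional

Sheet = list[list[Any]]  # one worksheet as rows of cell values (assumed model)


def is_blank(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as blank."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def cell_at(row: list[Any], idx: int) -> Any:
    """Return the cell at idx, or None when the row is shorter than idx."""
    return row[idx] if idx < len(row) else None


def find_header(sheet: Sheet, target_cols: list[str]) -> Optional[tuple[int, dict[str, int]]]:
    """Return (row index, {COLUMN NAME: column index}) for the first row
    containing every target column, or None if no such row exists."""
    wanted = {c.upper() for c in target_cols}
    for r, row in enumerate(sheet):
        positions: dict[str, int] = {}
        for c, cell in enumerate(row):
            name = "" if is_blank(cell) else str(cell).strip().upper()
            if name in wanted and name not in positions:
                positions[name] = c
        if len(positions) == len(wanted):
            return r, positions
    return None


def extract(workbook: dict[str, Sheet], target_cols: list[str], pk_cols: list[str]) -> list[dict[str, Any]]:
    """Extract records from the FIRST matching range, stopping at a blank primary key."""
    for sheet in workbook.values():  # dicts preserve sheet order
        found = find_header(sheet, target_cols)
        if found is None:
            continue
        header_row, pos = found
        records = []
        for row in sheet[header_row + 1:]:
            # Assumption: a blank value in ANY primary key column ends the range.
            if any(is_blank(cell_at(row, pos[pk.upper()])) for pk in pk_cols):
                break
            records.append({col: cell_at(row, pos[col.upper()]) for col in target_cols})
        return records  # later ranges are ignored
    return []
```

Because only the named target columns are looked up by position, surrounding data, extra columns, and column order have no effect on the result.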

Uploaded data may optionally contain a column named _____DELETE__THIS__RECORD_____ - if this contains the value "Yes", the row is marked for deletion.
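
As a rough illustration of the delete flag, the sketch below splits extracted records into rows to load and rows marked for deletion. The function name is hypothetical, and the exact, case-sensitive comparison with "Yes" is an assumption rather than documented behaviour.

```python
DELETE_COL = "_____DELETE__THIS__RECORD_____"


def split_deletes(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition records into (rows to load, rows marked for deletion)."""
    keep, delete = [], []
    for rec in records:
        rec = dict(rec)                    # copy so the input is not modified
        flag = rec.pop(DELETE_COL, None)   # the flag column is optional
        if flag == "Yes":                  # assumption: exact match on "Yes"
            delete.append(rec)
        else:
            keep.append(rec)
    return keep, delete
```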

When loading very large files (eg over 10 MB), CSV format is more efficient because it bypasses the local rendering engine. It also bypasses the local DQ checks, so be careful! Checks that are applied locally to Excel uploads, but not to CSV uploads, include:

  • Length of character variables - CSV files are truncated at the max target column length
  • Length of numeric variables - if the target numeric variable is below 8 bytes then the staged CSV value may be rounded if it is too large to fit
  • NOTNULL - this rule is only applied at the backend when the constraint is physical (rather than a DC setting)
  • MINVAL
  • MAXVAL
  • CASE
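
To make the kind of local checks listed above more concrete, here is a minimal sketch of per-value validation. The `ColumnRules` class, its field names, and the `UPCASE` / `LOWCASE` values are hypothetical; the real checks are configured in Data Controller and may behave differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ColumnRules:
    """Hypothetical rule set for one target column (names are illustrative)."""
    max_length: Optional[int] = None  # width of a character column
    notnull: bool = False
    minval: Optional[float] = None
    maxval: Optional[float] = None
    case: Optional[str] = None        # assumed values: "UPCASE" or "LOWCASE"


def check_value(value: Any, rules: ColumnRules) -> list[str]:
    """Return the names of the rules that a single cell value violates."""
    if value is None or value == "":
        return ["NOTNULL"] if rules.notnull else []
    problems: list[str] = []
    if isinstance(value, str):
        if rules.max_length is not None and len(value) > rules.max_length:
            problems.append("LENGTH")
        if rules.case == "UPCASE" and value != value.upper():
            problems.append("CASE")
        if rules.case == "LOWCASE" and value != value.lower():
            problems.append("CASE")
    else:
        if rules.minval is not None and value < rules.minval:
            problems.append("MINVAL")
        if rules.maxval is not None and value > rules.maxval:
            problems.append("MAXVAL")
    return problems
```

With a CSV upload none of these checks would run in the browser, so an over-long character value would be truncated at staging rather than reported.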

Note that the HARDSELECT_*** hooks are not applied to the rendered Excel values (they are only applied when actively editing a cell).

Formulas

It is possible to configure certain columns to be extracted as formulae, rather than raw values. The target column must be character, and it should be wide enough to support the longest formula in the source data. If the order of values is important, you should include a row number in your primary key.
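
The two requirements above (a wide enough character column, and a row number to preserve order) can be illustrated with a small sketch. The function and the `ROW_NO` / `FORMULA` / `FITS` keys are hypothetical, and the length check counts characters, whereas a SAS character column is sized in bytes.

```python
def stage_formulas(formulas: list[str], target_width: int) -> list[dict]:
    """Number each formula (1-based) and flag any that exceed the column width."""
    staged = []
    for row_no, formula in enumerate(formulas, start=1):
        staged.append({
            "ROW_NO": row_no,                      # hypothetical key column
            "FORMULA": formula,                    # hypothetical target column
            "FITS": len(formula) <= target_width,  # characters, not bytes
        })
    return staged
```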

Configuration is as follows:

Once this is done, you are ready to upload:

The final table will look like this:

Complex Excel Uploads

Through the use of "Excel Maps" you can dynamically extract individual cells or entire ranges from anywhere within a workbook - either through absolute / relative positioning, or by reference to a "matched" (search) string.

Configuration is made in the following tables:

  1. MPE_XLMAP_RULES - detailed extraction rules for a particular map
  2. MPE_XLMAP_INFO - optional map-level attributes

Each rule will extract either a single cell or a rectangular range from the source workbook. The target will be MPE_XLMAP_DATA, or whichever table is configured in MPE_XLMAP_INFO.

To illustrate with an example, consider the following Excel workbook. The yellow cells need to be imported.

The MPE_XLMAP_RULES configuration entries might (as there are multiple ways) be as follows:

| XLMAP_ID | XLMAP_RANGE_ID | XLMAP_SHEET | XLMAP_START | XLMAP_FINISH |
|---|---|---|---|---|
| MAP01 | MI_ITEM | Current Month | MATCH B R[1]C[0]: ITEM | LASTDOWN |
| MAP01 | MI_AMT | Current Month | MATCH C R[1]C[0]: AMOUNT | LASTDOWN |
| MAP01 | TMI | Current Month | ABSOLUTE F6 | |
| MAP01 | CB | Current Month | MATCH F R[2]C[0]: CASH BALANCE | |
| MAP01 | RENT | /1 | MATCH E R[0]C[2]: Rent/mortgage | |
| MAP01 | CELL | /1 | MATCH E R[0]C[2]: Cell phone | |
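
One possible reading of these rules is sketched below. It is not the Data Controller implementation, and every interpretation in it is an assumption: `ABSOLUTE F6` is taken as a fixed cell, `MATCH B R[1]C[0]: ITEM` as "find the text ITEM in column B, then move 1 row down and 0 columns right", `LASTDOWN` as "extend down through the contiguous non-blank cells", and `/1` as "the first sheet". Only column-letter matches and single-column ranges are handled.

```python
import re
from typing import Any, Optional

Sheet = list[list[Any]]  # rows of cell values, 0-indexed internally

ABS_RE = re.compile(r"ABSOLUTE ([A-Za-z]+)(\d+)")
MATCH_RE = re.compile(r"MATCH ([A-Za-z]+) R\[(-?\d+)\]C\[(-?\d+)\]: (.*)")


def col_index(letters: str) -> int:
    """Convert a column letter to a 0-based index: A -> 0, B -> 1, AA -> 26."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


def get(sheet: Sheet, r: int, c: int) -> Any:
    """Return the cell value, or None when (r, c) is outside the stored data."""
    if 0 <= r < len(sheet) and 0 <= c < len(sheet[r]):
        return sheet[r][c]
    return None


def pick_sheet(workbook: dict[str, Sheet], ref: str) -> Sheet:
    """Assumption: '/1' means the first sheet by position, otherwise a sheet name."""
    if ref.startswith("/") and ref[1:].isdigit():
        return list(workbook.values())[int(ref[1:]) - 1]
    return workbook[ref]


def resolve_start(sheet: Sheet, rule: str) -> Optional[tuple[int, int]]:
    """Turn an XLMAP_START value into a 0-based (row, col), or None if unresolved."""
    m = ABS_RE.fullmatch(rule)
    if m:
        return int(m.group(2)) - 1, col_index(m.group(1))
    m = MATCH_RE.fullmatch(rule)
    if m:
        col, dr, dc, text = col_index(m.group(1)), int(m.group(2)), int(m.group(3)), m.group(4)
        for r in range(len(sheet)):
            if get(sheet, r, col) == text:  # assumption: exact text match
                return r + dr, col + dc
    return None


def extract_rule(workbook: dict[str, Sheet], rule: dict) -> list[tuple]:
    """Return (XLMAP_RANGE_ID, ROW_NO, COL_NO, VALUE_TXT) tuples for one rule."""
    sheet = pick_sheet(workbook, rule["XLMAP_SHEET"])
    start = resolve_start(sheet, rule["XLMAP_START"])
    if start is None:
        return []
    r0, c = start
    r1 = r0
    if rule.get("XLMAP_FINISH") == "LASTDOWN":
        while get(sheet, r1 + 1, c) not in (None, ""):
            r1 += 1
    return [
        (rule["XLMAP_RANGE_ID"], i + 1, 1, str(get(sheet, r0 + i, c)))
        for i in range(r1 - r0 + 1)
    ]
```

Values are converted to strings here only to mirror a text column such as VALUE_TXT; how the real extract formats numbers and dates is not covered by this sketch.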

To import the Excel file, the end user simply navigates to the LOAD tab, chooses "Files", selects the appropriate map (eg MAP01), and uploads. This stages the new records in MPE_XLMAP_DATA, where they go through the usual approval process and quality checks. A copy of the source Excel file is attached to each upload.

The corresponding MPE_XLMAP_DATA table will appear as follows:

| LOAD_REF | XLMAP_ID | XLMAP_RANGE_ID | ROW_NO | COL_NO | VALUE_TXT |
|---|---|---|---|---|---|
| DC20231212T154611798_648613_3895 | MAP01 | MI_ITEM | 1 | 1 | Income Source 1 |
| DC20231212T154611798_648613_3895 | MAP01 | MI_ITEM | 2 | 1 | Income Source 2 |
| DC20231212T154611798_648613_3895 | MAP01 | MI_ITEM | 3 | 1 | Other |
| DC20231212T154611798_648613_3895 | MAP01 | MI_AMT | 1 | 1 | 2500 |
| DC20231212T154611798_648613_3895 | MAP01 | MI_AMT | 2 | 1 | 1000 |
| DC20231212T154611798_648613_3895 | MAP01 | MI_AMT | 3 | 1 | 250 |
| DC20231212T154611798_648613_3895 | MAP01 | TMI | 1 | 1 | 3750 |
| DC20231212T154611798_648613_3895 | MAP01 | CB | 1 | 1 | 864 |
| DC20231212T154611798_648613_3895 | MAP01 | RENT | 1 | 1 | 800 |
| DC20231212T154611798_648613_3895 | MAP01 | CELL | 1 | 1 | 45 |
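
Since MPE_XLMAP_DATA holds one row per extracted cell, downstream code usually needs to regroup it by range. A minimal sketch, assuming the rows have already been read into dicts keyed by the column names above (the function name is hypothetical):

```python
from collections import defaultdict


def values_by_range(rows: list[dict]) -> dict[str, list[str]]:
    """Group VALUE_TXT by XLMAP_RANGE_ID, ordered by ROW_NO then COL_NO."""
    grouped: dict[str, list[str]] = defaultdict(list)
    ordered = sorted(rows, key=lambda r: (r["XLMAP_RANGE_ID"], r["ROW_NO"], r["COL_NO"]))
    for row in ordered:
        grouped[row["XLMAP_RANGE_ID"]].append(row["VALUE_TXT"])
    return dict(grouped)
```

Two ranges of equal length, such as MI_ITEM and MI_AMT, can then be paired with `zip(ranges["MI_ITEM"], ranges["MI_AMT"])`.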

Video