docs.datacontroller.io/docs/excel.md
zmaj 109c9735a5
2024-01-23 17:34:36 +00:00


layout: article
title: Excel
description: Data Controller can extract all manner of data from within an Excel file (including formulae) ready for ingestion into SAS. All versions of excel are supported.
og_image: https://docs.datacontroller.io/img/excel_results.png

Excel Uploads

Data Controller supports two approaches for importing Excel data into SAS:

  • Simple - source range in tabular format, with column names/values that match the target Table. No configuration necessary.
  • Complex - data is scattered across multiple ranges in a dynamic (non-fixed) arrangement. Pre-configuration necessary.

Thanks to our pro license of SheetJS, we can support all versions of Excel, large workbooks, and fast extracts. We also support the ingest of password-protected workbooks.

Note that data is extracted from the Excel file within the browser, so there is no need for any special SAS modules / products.

A copy of the original Excel file is also uploaded to the staging area. This means that a complete audit trail can be captured, right back to the original source data.

Simple Excel Uploads

To make a simple extract, select LOAD / Tables / (library/table) and click "UPLOAD" (or drag the file onto the page). No configuration necessary.

The rules for data extraction are:

  • Scan each sheet until a row is found with all target columns
  • Extract rows until the first blank primary key value

This is incredibly flexible, and means:

  • data can be anywhere, on any worksheet
  • data can start on any row, and any column
  • data can be completely surrounded by other data
  • columns can be in any order
  • additional columns are simply ignored

!!! note If the Excel file contains more than one range with the target columns (eg, on different sheets), only the FIRST will be extracted.
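The extraction rules above can be sketched roughly as follows. This is an illustrative model only, not Data Controller's actual implementation: the workbook is assumed to be a plain dict of sheet name to a grid of cell values, the function names `find_header` and `extract` are hypothetical, and case-insensitive header matching is an assumption.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Sheet = List[List[Any]]  # a worksheet as a grid of cell values (None = empty)


def is_blank(value: Any) -> bool:
    """Treat None and whitespace-only strings as blank."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def find_header(sheet: Sheet, target_cols: Sequence[str]) -> Optional[Tuple[int, Dict[str, int]]]:
    """Return (row_index, {COLUMN_NAME: column_index}) for the first row
    containing every target column, or None if no row qualifies."""
    wanted = {c.upper() for c in target_cols}  # assumption: case-insensitive match
    for r, row in enumerate(sheet):
        positions: Dict[str, int] = {}
        for c, cell in enumerate(row):
            if isinstance(cell, str) and cell.strip().upper() in wanted:
                positions.setdefault(cell.strip().upper(), c)
        if len(positions) == len(wanted):
            return r, positions
    return None


def get_cell(row: List[Any], idx: int) -> Any:
    """Read a cell, treating positions past the end of a short row as empty."""
    return row[idx] if idx < len(row) else None


def extract(workbook: Dict[str, Sheet], target_cols: Sequence[str],
            pk_cols: Sequence[str]) -> List[dict]:
    """Extract rows from the FIRST range that has all target columns."""
    for sheet in workbook.values():  # dicts preserve sheet order
        found = find_header(sheet, target_cols)
        if found is None:
            continue
        header_row, pos = found
        rows = []
        for row in sheet[header_row + 1:]:
            # Stop at the first blank primary key value.
            if any(is_blank(get_cell(row, pos[pk.upper()])) for pk in pk_cols):
                break
            # Only target columns are kept; additional columns are ignored.
            rows.append({n: get_cell(row, pos[n.upper()]) for n in target_cols})
        return rows  # later matching ranges are not read
    return []
```

Because the header is located by name rather than by position, the sketch does not care which sheet, row, or column the data starts on, nor what order the columns appear in.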

Uploaded data may optionally contain a column named _____DELETE__THIS__RECORD_____ - if this contains the value "Yes", the row is marked for deletion.
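As a minimal sketch of how the delete marker might be interpreted, the snippet below splits extracted rows into those to load and those marked for deletion. The function name `split_deletes` and the exact, case-sensitive comparison with "Yes" are assumptions made for illustration.

```python
DELETE_COL = "_____DELETE__THIS__RECORD_____"


def split_deletes(rows):
    """Separate rows flagged for deletion from rows to be loaded.

    The marker column itself is dropped from both outputs.
    """
    to_load, to_delete = [], []
    for row in rows:
        data = {k: v for k, v in row.items() if k != DELETE_COL}
        # Assumption: only the exact value "Yes" marks a row for deletion.
        if row.get(DELETE_COL) == "Yes":
            to_delete.append(data)
        else:
            to_load.append(data)
    return to_load, to_delete
```

Rows without the marker column are treated as ordinary rows to load, consistent with the column being optional.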

If loading very large files (eg over 10 MB) it is more efficient to use CSV format, as this bypasses the local rendering engine. It also bypasses the local DQ checks, so be careful! Examples of local (Excel) but not remote (CSV) file checks include:

  • Length of character variables - CSV files are truncated at the max target column length
  • Length of numeric variables - if the target numeric variable is below 8 bytes then the staged CSV value may be rounded if it is too large to fit
  • NOTNULL - this rule is only applied at the backend when the constraint is physical (rather than a DC setting)
  • MINVAL
  • MAXVAL
  • CASE
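To illustrate why a numeric value can lose precision when the target variable is shorter than 8 bytes, here is a rough model of storing a double in fewer bytes. It assumes the backend keeps only the leading bytes of the 8-byte IEEE 754 value, which is how SAS documents short numerics on Windows and UNIX hosts; the function name `store_truncated` is hypothetical, and the real staging behaviour may differ.

```python
import struct


def store_truncated(value: float, length: int) -> float:
    """Model a numeric stored in `length` bytes (3 to 8).

    Keeps the first `length` bytes of the big-endian IEEE 754 double
    and zero-fills the rest, discarding low-order mantissa bits.
    """
    if not 3 <= length <= 8:
        raise ValueError("length must be between 3 and 8 bytes")
    raw = struct.pack(">d", value)  # big-endian: sign and exponent come first
    kept = raw[:length] + b"\x00" * (8 - length)
    return struct.unpack(">d", kept)[0]


# Under this model a 3-byte numeric holds 8192 exactly, but not 8193.
small = store_truncated(8192, 3)
too_big = store_truncated(8193, 3)
```

Under these assumptions `small` and `too_big` both come back as 8192.0, which is the kind of silent change that the local Excel checks would otherwise flag before upload.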

Note that the HARDSELECT_*** hooks are not applied to the rendered Excel values (they are currently only applied when editing a cell).


Formulas

It is possible to configure certain columns to be extracted as formulae, rather than raw values. The target column must be character, and it should be wide enough to support the longest formula in the source data. If the order of values is important, you should include a row number in your primary key.
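The idea can be sketched as below. Cells are modelled as dicts with a `v` (value) key and an optional `f` (formula text) key, loosely following the shape of SheetJS cell objects. The function names, the `ROW_NO` key column, and the leading "=" on the stored formula are all assumptions for illustration, not Data Controller's actual behaviour.

```python
def extract_row(cells, formula_cols, row_number):
    """Build one staged record from a row of cells.

    `cells` maps column name -> {"v": value, "f": formula text (optional)}.
    Columns listed in `formula_cols` are stored as formula text when the
    cell has a formula; all other columns are stored as raw values.
    """
    # Hypothetical row-number column, so source order survives in the key.
    record = {"ROW_NO": row_number}
    for name, cell in cells.items():
        if name in formula_cols and cell.get("f"):
            record[name] = "=" + cell["f"]  # assumption: keep a leading "="
        else:
            record[name] = cell.get("v")
    return record


def longest(records, column):
    """Length of the longest stored value, to help size the target column."""
    return max((len(str(r[column])) for r in records), default=0)
```

Running `longest` over a sample extract is one way to check that the character target column is wide enough for the longest formula before loading.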

Configuration is as follows:

Once this is done, you are ready to upload:

The final table will look like this:

Complex Excel Uploads

Through the use of "Excel Maps" this feature enables a series of cells / ranges to be dynamically extracted from anywhere within a workbook - either through absolute / relative positioning, or by reference to a "matched" (search) string.

Configuration is made in the following tables:

  1. MPE_XLMAP_RULES - detailed extraction rules for a particular map
  2. MPE_XLMAP_INFO - optional map-level attributes

Each rule will extract either a single cell or a rectangular range from the source workbook.