docs.datacontroller.io/docs/dcu-fileupload.md
2020-04-16 22:06:52 +02:00

Data Controller for SAS: File Uploads

Files can be uploaded via the Editor interface: first choose the library and table, then click "Upload". All versions of Excel are supported. When loading very large files (e.g. over 10 MB) it is more efficient to use CSV format, as this bypasses the local rendering engine. It also bypasses the local DQ checks, so be careful! For CSV, alternative delimiters (e.g. semicolons) can be used.

CSV Uploads

The following should be considered when uploading data in this way:

  • A header row (with variable names) is required
  • Variable names must match the target (not case sensitive). An easy way to ensure this is to download the data from Viewer and use this as a template.
  • Duplicate variable names are not permitted
  • Missing columns are not permitted
  • Additional columns are ignored
  • The order of variables does not matter
  • The delimiter is extracted from the header row, so for var1;var2;var3 the delimiter would be assumed to be a semicolon
  • The above assumes the delimiter is the first special character in the header row! So var,1;var2;var3 would fail, as the comma would be taken as the delimiter
  • The following characters should not be used as delimiters
    • doublequote
    • quote
    • space
    • underscore
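
The header rules above can be illustrated with a short Python sketch. This is not Data Controller's own code: the function names `infer_delimiter` and `check_header` are hypothetical, and "special character" is assumed here to mean any character that is neither alphanumeric nor one of the four excluded characters listed above.

```python
import csv
import io

# Characters the list above says should not be used as delimiters.
NOT_DELIMITERS = {'"', "'", " ", "_"}


def infer_delimiter(header_line):
    """Return the first special character in the header row.

    Assumption: "special" means not alphanumeric and not an excluded character.
    """
    for ch in header_line.strip():
        if not ch.isalnum() and ch not in NOT_DELIMITERS:
            return ch
    raise ValueError("no delimiter found in header row")


def check_header(header_line, target_columns):
    """Map each target column to its position in the uploaded header."""
    delimiter = infer_delimiter(header_line)
    names = next(csv.reader(io.StringIO(header_line.strip()), delimiter=delimiter))
    upper = [name.strip().upper() for name in names]  # match is not case sensitive

    duplicates = sorted({name for name in upper if upper.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate variable names: {duplicates}")

    missing = [col for col in target_columns if col.upper() not in upper]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # Additional columns are ignored, and column order does not matter.
    return {col: upper.index(col.upper()) for col in target_columns}
```

With this sketch, `check_header("B;a;extra", ["A", "B"])` maps `A` to position 1 and `B` to position 0, while a header of `var,1;var2;var3` is split on the comma and so fails the missing-column check.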

When loading dates, be aware that Data Controller makes use of the ANYDTDTE and ANYDTDTM informats (width 19). This means that uploaded date / datetime values should be unambiguous (e.g. 01FEB1942 rather than 01/02/42). The latter could be interpreted as 02JAN2042, depending on your locale and YEARCUTOFF option settings. Note that UTC dates with offset values (e.g. 2018-12-26T09:19:25.123+0100) are not currently supported. If this is a feature you would like to see, contact us.
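
To see why a value such as 01/02/42 is ambiguous, the following Python sketch parses it under two different conventions. It is an analogy only: Python's `strptime` is not the SAS ANYDT* informats, and its fixed two-digit-year pivot (00 to 68 map to 2000 to 2068) merely stands in for a particular YEARCUTOFF setting. The `%b` month abbreviation is locale dependent, so an English locale is assumed.

```python
from datetime import datetime

ambiguous = "01/02/42"

# Same text, two readings: day first vs month first.
day_first = datetime.strptime(ambiguous, "%d/%m/%y").date()
month_first = datetime.strptime(ambiguous, "%m/%d/%y").date()

# An explicit month name and four-digit year leave only one reading.
explicit = datetime.strptime("01FEB1942", "%d%b%Y").date()

print(day_first)    # 2042-02-01
print(month_first)  # 2042-01-02
print(explicit)     # 1942-02-01
```

The two readings of the slash-delimited value differ from each other, and both land a century away from the date that the explicit form pins down.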

!!! tip
    To get a copy of a file in the right format for upload, use the file download feature in the Viewer tab.

Excel Uploads

Thanks to our pro license of SheetJS, we can support all versions of Excel, and extract the data super quickly to boot.

The rules for data extraction are:

  • Scan the spreadsheet until a row is found with all the target columns (with no blank cells between columns)
  • Extract data below that row up until the first blank primary key value

This is incredibly flexible, and means:

  • data can be anywhere, on any worksheet
  • data can contain additional columns (they are just ignored)
  • data can be completely surrounded by other data
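
The extraction rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual implementation: the worksheet is represented as a plain list of rows, the function name `extract_rows` is hypothetical, a single primary key column is assumed, header matching is assumed to be case insensitive, and "no blank cells between columns" is read as no blank cell between the first and last target column.

```python
def is_blank(cell):
    """Treat None and empty / whitespace-only strings as blank cells."""
    return cell is None or (isinstance(cell, str) and cell.strip() == "")


def extract_rows(grid, target_columns, primary_key):
    """Find the header row in one worksheet grid and return the rows below it.

    grid: list of rows, each a list of cell values (None for a blank cell).
    """
    wanted = {col.upper() for col in target_columns}
    for row_index, row in enumerate(grid):
        labels = [None if is_blank(c) else str(c).strip().upper() for c in row]
        positions = {lab: i for i, lab in enumerate(labels) if lab in wanted}
        if set(positions) != wanted:
            continue  # not every target column is on this row
        first, last = min(positions.values()), max(positions.values())
        if any(labels[i] is None for i in range(first, last + 1)):
            continue  # blank cell between the target columns

        key_pos = positions[primary_key.upper()]
        records = []
        for data_row in grid[row_index + 1:]:
            key = data_row[key_pos] if key_pos < len(data_row) else None
            if is_blank(key):
                break  # stop at the first blank primary key value
            records.append({
                col: (data_row[positions[col.upper()]]
                      if positions[col.upper()] < len(data_row) else None)
                for col in target_columns
            })
        return records
    return []  # no header row found


# Hypothetical worksheet: the table is surrounded by unrelated cells.
grid = [
    ["Report", None, None, None, None],
    [None, "notes", "ID", "Name", "extra"],
    ["x", "y", 1, "a", "z"],
    ["x", "y", 2, "b", "z"],
    [None, None, None, "c", None],
    [None, None, 3, "d", None],
]
rows = extract_rows(grid, ["ID", "NAME"], "ID")
```

Here `rows` holds only the two records with IDs 1 and 2: the surrounding cells and the extra column are ignored, and extraction stops at the row whose ID cell is blank.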

A copy of the original Excel file is also uploaded to the staging area. This means that a complete audit trail can be captured, right back to the original source data.