Fix the most common character encoding errors in CSV files
In large datasets, data integrity or "cleanliness" is vital.
Availability: all customers
LearnUpon recommends UTF-8 character encoding for batch uploads. Your uploaded file is more likely to import cleanly when encoded in UTF-8, rather than ISO or equivalent formats.
Character encoding errors crop up when
- your data comes from multiple sources
- your data uses multiple languages
Use the methods described to check how "clean" or consistent your data is, and remove characters which cause errors.
Before sharing data, confirm that any downloads use UTF-8 character encoding.
Background
Save as UTF-8 format
Whatever the source of your CSV file, your first step should be to save it in UTF-8 format. This format resolves most errors, and makes any remaining issues easier to spot.
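If you prefer to convert a file with a script rather than an editor, the idea can be sketched in Python as follows. This is a minimal sketch, not a LearnUpon tool: the function name `to_utf8` and the fallback encoding `cp1252` (a common Windows default) are assumptions, so replace the fallback with whatever encoding your file was really saved in.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return the file content as UTF-8 bytes.

    If the bytes already decode as UTF-8 they are returned unchanged.
    Otherwise they are decoded with source_encoding (an assumption you
    must supply) and re-encoded as UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")
```

For a file on disk, you could read the content with `pathlib.Path("users.csv").read_bytes()` and write the result back with `write_bytes()`, where `users.csv` is a placeholder file name. Keep a copy of the original file, because a wrong guess at the source encoding produces garbled text rather than an error.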
Check the encoding in a text editor or source code editor
A file's character encoding can change without its creator's knowledge, for example when the file is edited in a word processor. The change can go unnoticed until you try to upload your file to LearnUpon.
When you work in multiple languages, some apps can render non-English characters as garbled text. The following screenshot shows an example of a CSV file with rendering errors.
To spot these changes, you need to view your data in a text editor or source code editor, like
- Notepad++: a free source code editor for Windows
- TextPad: a general-purpose editor for plain text files on Windows
- Brackets: an open source code editor which works on Mac, Windows and Linux
To check and set the encoding:
- Open your .CSV file with a text editor and check the bottom of the window for the encoding.
- If the encoding is not set as UTF-8, find the encoding controls in the text editor.
- Set the encoding to UTF-8.
- Confirm the delimiter (the character that separates values) is comma, rather than pipe or other characters.
- Save to finish.
Note: Byte Order Mark (BOM) can have an adverse effect on character encoding for batch uploads. LearnUpon recommends that you encode in UTF-8 without BOM.
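The BOM and delimiter checks can also be scripted. The following Python sketch is illustrative only: `strip_bom` and `guess_delimiter` are hypothetical helper names, and the delimiter guess is a crude heuristic that counts candidate characters in the header line, so it can be wrong if your header values contain those characters.

```python
import codecs


def strip_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 Byte Order Mark, if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def guess_delimiter(header_line: str, candidates: str = ",;|\t") -> str:
    """Return the candidate character that occurs most often in the header.

    Heuristic only: it does not understand quoted fields.
    """
    return max(candidates, key=header_line.count)
```

If `guess_delimiter` returns anything other than a comma for your header line, re-export the file with comma as the delimiter before uploading.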
The following screenshot shows the edge of a Brackets window, and its encoding info.
Tip: read how to save .CSV files with UTF-8 encoding in different spreadsheet applications.
LearnUpon is not responsible for content outside this website.
Extra characters and "invisibles"
Applications like Microsoft Word/Excel can add extra formatting to data that you may not be aware of. Use a text editor to open your .CSV file, to look for
- extra characters added to the data
- unexpected formatting, such as extra comma characters ","
- invisible characters: use your editor's "show invisibles" option to reveal characters that are not normally displayed, like binary data
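If your editor has no "show invisibles" option, a short script can list suspicious characters for you. The sketch below is one possible approach, not a LearnUpon feature: the function name `find_invisibles` is hypothetical, and the choice to flag Unicode control characters, format characters and non-standard spaces (categories Cc, Cf and Zs, excluding the ordinary space and tab) is an assumption about what counts as "invisible".

```python
import unicodedata


def find_invisibles(text: str) -> list:
    """Return (line, column, code point) for each suspicious character.

    Flags control characters (Cc), format characters (Cf) and space
    separators (Zs), except the ordinary space and the tab.
    Line and column numbers start at 1.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in (" ", "\t"):
                continue
            if unicodedata.category(ch) in ("Cc", "Cf", "Zs"):
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits
```

Each result tells you where to look in your text editor. For example, `U+00A0` is a non-breaking space and `U+200B` is a zero-width space.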
Check for whitespace
When creating your lists of users, be aware that whitespace is considered a character during the upload process.
Whitespace is the blank space between words and individual characters. Most words have a single space between them.
Human readers see whitespace as blank, but in text processing whitespace is a character in its own right. Remove any whitespace that appears in unexpected places in your lists, such as before or after a value.
johnsmith@acmetraining.com,
johnsmith@acmetraining.com ,
For batch upload processing purposes, these two addresses are different, because one has whitespace between it and the comma.
If the space shown was not typed with the spacebar and is a "binary" whitespace character instead (for example, a non-breaking space), it can cause the batch user upload to fail, because the whitespace is treated as an individual character.
LearnUpon can "catch" and remove some whitespace characters, but sometimes you need to remove these manually using a text editor.
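For long lists, removing whitespace by hand is slow, so you may prefer to script it. The following Python sketch trims leading and trailing whitespace from every value in a CSV. The function name `strip_fields` is hypothetical, and the sketch assumes a comma-delimited file that has already been decoded as UTF-8 text. Python's `str.strip()` also removes non-breaking spaces, but not zero-width characters.

```python
import csv
import io


def strip_fields(csv_text: str) -> str:
    """Return the CSV text with whitespace trimmed from both ends of each value."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # strip() removes spaces, tabs and non-breaking spaces at either end
        writer.writerow([field.strip() for field in row])
    return out.getvalue()
```

With this sketch, the two example addresses above produce the same output line, because the space before the comma is removed.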
Check for "empty" lines
Ensure that your spreadsheet has no "empty" lines: open your spreadsheet in a text editor and delete any rows with "empty" data.
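Empty rows can also be removed with a script. The sketch below drops any row whose values are all blank, which covers both completely empty lines and lines that contain only commas or spaces. The function name `drop_empty_rows` is hypothetical, and the sketch assumes comma-delimited text that is already decoded.

```python
import csv
import io


def drop_empty_rows(csv_text: str) -> str:
    """Return the CSV text without rows whose values are all empty or whitespace."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Keep the row only if at least one value has visible content
        if any(field.strip() for field in row):
            writer.writerow(row)
    return out.getvalue()
```

Rows that contain at least one value are written back unchanged, so this step does not alter your data apart from removing the blank rows.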