The $escape parameter is completely unintuitive, but it is not broken. Here is a breakdown of fgetcsv()'s behaviour. In the examples I've used underscores (_) to show spaces and brackets ([]) to show individual fields:
- Leading whitespace in each field will be stripped if it comes immediately before an enclosure: ___"foo" -> [foo]
- There can only be one enclosure per field, although it will be concatenated with any data that appears between the end enclosure and the next delimiter/new line, including any trailing whitespace: ___"foo"_"bar"__ -> [foo_"bar"__]
- If the field does not start with (leading whitespace +) an enclosure, the whole field is interpreted as raw data, even if enclosure characters appear elsewhere within the field: _foo"bar"_ -> [_foo"bar"_]
- Delimiters cannot be escaped outside enclosures; they have to be enclosed instead. Delimiters don't need to be escaped inside enclosures: "foo,bar","baz,qux" -> [foo,bar][baz,qux]; foo\,bar -> [foo\][bar]; "foo\,bar" -> [foo\,bar]
- Double enclosures inside single enclosures are converted to single enclosures: "foobar" -> [foobar]; "foo""bar" -> [foo"bar]; """foo""" -> ["foo"]; ""foo"" -> [foo""] (empty enclosure followed by raw data)
- The $escape parameter works as expected in that an escaped enclosure does not end the field, but unlike doubled enclosures it DOES NOT get unescaped: the escape character stays in the returned data. It is necessary to unescape the data elsewhere in the code: "\"foo\"" -> [\"foo\"]; "foo\"bar" -> [foo\"bar]
Note: the following data (which is a very common problem) is invalid: "\". The backslash escapes the second enclosure, so its structure is equivalent to "@; in other words, an open enclosure, some data and no closing enclosure.
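A few of the rules above can be checked quickly without a file. The following is only a minimal sketch: it uses str_getcsv() on the assumption that it applies the same parsing rules as fgetcsv(), and parse_line() and show_fields() are hypothetical helper names made up for this example. All parameters are passed explicitly so that the result does not depend on version-specific defaults.

```php
<?php
// Hypothetical helper: parse one CSV line with delimiter, enclosure and
// escape all spelled out.
function parse_line(string $line): array {
    return str_getcsv($line, ',', '"', '\\');
}

// Hypothetical helper: render fields in the [a][b] notation used above.
function show_fields(array $fields): string {
    return '[' . implode('][', $fields) . ']';
}

// Single-quoted strings keep the backslashes exactly as written.
$samples = [
    '   "foo"',            // leading whitespace before an enclosure
    '"foo,bar","baz,qux"', // delimiters inside enclosures
    'foo\,bar',            // "escaped" delimiter outside an enclosure
    '"foo""bar"',          // doubled enclosure
    '"foo\"bar"',          // escaped enclosure
];

foreach ($samples as $line) {
    echo $line, ' -> ', show_fields(parse_line($line)), PHP_EOL;
}
```

The printed fields can be compared against the examples in the list; the escaped-enclosure case is the one where the backslash is still present in the output.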
The following functions can be used to get the expected behaviour:
<?php
// Removes the escape character only where it precedes an enclosure or another escape character.
function fgetcsv_unescape_enclosures_and_escapes($fh, $length = 0, $delimiter = ',', $enclosure = '"', $escape = '\\') {
    $fields = fgetcsv($fh, $length, $delimiter, $enclosure, $escape);
    if ($fields) {
        // Pass the regex delimiter to preg_quote() so a '/' enclosure or escape is quoted too.
        $regex_enclosure = preg_quote($enclosure, '/');
        $regex_escape = preg_quote($escape, '/');
        $fields = preg_replace("/{$regex_escape}({$regex_enclosure}|{$regex_escape})/", '$1', $fields);
    }
    return $fields;
}

// Removes the escape character before any character; a trailing lone escape character is kept.
function fgetcsv_unescape_all($fh, $length = 0, $delimiter = ',', $enclosure = '"', $escape = '\\') {
    $fields = fgetcsv($fh, $length, $delimiter, $enclosure, $escape);
    if ($fields) {
        $regex_escape = preg_quote($escape, '/');
        $fields = preg_replace("/{$regex_escape}(.)/s", '$1', $fields);
    }
    return $fields;
}

// Same as fgetcsv_unescape_all(), but also removes a trailing lone escape character.
function fgetcsv_unescape_all_strip_last($fh, $length = 0, $delimiter = ',', $enclosure = '"', $escape = '\\') {
    $fields = fgetcsv($fh, $length, $delimiter, $enclosure, $escape);
    if ($fields) {
        $regex_escape = preg_quote($escape, '/');
        $fields = preg_replace("/{$regex_escape}(.?)/s", '$1', $fields);
    }
    return $fields;
}
?>
Caution: ideally, there shouldn't be any unescaped escape characters outside enclosures; the field should be enclosed and escaped instead. If there are any, they could end up being removed as well, depending on the function used.
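To see how the three variants differ, the sketch below applies their patterns directly to a single hand-written field value instead of reading a file. unescape_patterns() is a hypothetical helper invented for this illustration, and the sample value is made up; it contains escaped enclosures, an escaped ordinary character and a trailing lone backslash.

```php
<?php
// Hypothetical helper: builds the three patterns used by the functions above
// for a given enclosure/escape pair.
function unescape_patterns(string $enclosure = '"', string $escape = '\\'): array {
    $q = preg_quote($enclosure, '/');
    $e = preg_quote($escape, '/');
    return [
        'enclosures_and_escapes' => "/{$e}({$q}|{$e})/",
        'all'                    => "/{$e}(.)/s",
        'all_strip_last'         => "/{$e}(.?)/s",
    ];
}

// Made-up field value: say \"hi\" \n end\
// (single quotes, so \" and \n stay literal; the final \\ is one backslash)
$field = 'say \"hi\" \n end\\';

$results = [];
foreach (unescape_patterns() as $name => $pattern) {
    $results[$name] = preg_replace($pattern, '$1', $field);
    echo str_pad($name, 24), $results[$name], PHP_EOL;
}
```

With this input, the first pattern only touches the escaped enclosures, the second also turns \n into n, and the third additionally drops the trailing backslash. That last difference is the case the caution above is about.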