fseek/ftell for files bigger than 2^32
jacob navia
2015-06-10 20:27:57 UTC
The lcc-win compiler uses long long for fseek/ftell and thus supports
files bigger than 2Gig... commonplace nowadays.

Could it be possible to change the prototype of those functions in the
standard?

This would be a compatible change since any int can be extended to long
long and working code would not change...

The situation now is impossible with POSIX adding fseeko and ftello and
other vendors using various other names (fseek64 and ftell64 being also
used)

Updating those functions would be the easiest way out.

Files bigger than 2 GB are really nothing special, with disk drives
holding several terabytes!

Thanks for your attention

jacob
Kaz Kylheku
2015-06-10 22:44:13 UTC
Post by jacob navia
The lcc-win compiler uses long long for fseek/ftell and thus supports
files bigger than 2Gig... commonplace nowadays.
These functions can support files bigger than 2 GB; it just requires
multiple calls.
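One way to read "multiple calls": rewind to the start of a binary stream, then advance with repeated relative seeks, each no larger than LONG_MAX. A minimal sketch, assuming the implementation can actually position the stream that far; seek_big is a hypothetical helper, not a standard function.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper (not part of ISO C): move to an absolute offset that
   may not fit in a long by chaining relative seeks of at most LONG_MAX. */
int seek_big(FILE *fp, unsigned long long offset)
{
    assert(fp != NULL);

    /* Start from a known position. */
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;

    /* Advance in steps that each fit in a long. */
    while (offset > 0) {
        long step = offset > (unsigned long long)LONG_MAX
                        ? LONG_MAX
                        : (long)offset;
        if (fseek(fp, step, SEEK_CUR) != 0)
            return -1;
        offset -= (unsigned long long)step;
    }
    return 0;
}
```

Note that this only covers seeking; ftell still has to report the position in a long, so it cannot describe where the stream ended up.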
Post by jacob navia
Could it be possible to change the prototype of those functions in the
standard?
This would be a compatible change since any int can be extended to long
long and working code would not change...
This breaks:

int (*seek_op)(FILE *, long, int) = fseek; // error

If there is a stinky cast, it breaks silently:

int (*seek_op)(void *stream, long, int) = (int (*)(void *, long, int)) fseek;
Kaz Kylheku
2015-06-10 23:20:57 UTC
Post by Kaz Kylheku
Post by jacob navia
The lcc-win compiler uses long long for fseek/ftell and thus supports
files bigger than 2Gig... commonplace nowadays.
These functions can support files bigger than 2 GB; it just requires
multiple calls.
Post by jacob navia
Could it be possible to change the prototype of those functions in the
standard?
This would be a compatible change since any int can be extended to long
long and working code would not change...
int (*seek_op)(FILE *, long, int) = fseek; // error
int (*seek_op)(void *stream, long, int) = (int (*)(void *, long, int)) fseek;
Which is not to say I want to be a naysayer. In POSIX, transitions from raw
types like int have been managed. For instance, the accept function
used to have an "int *" parameter, and it was transitioned to "socklen_t *".

Transitioning the long parameter of fseek and ftell to a typedef should have
been done in C99, if not earlier.

The typedef lets users and implementors stage their own obsolescence schedule:
choose the point in time at which they break with backward compatibility and
target the typedef away from long to some other type.
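The staging idea can be sketched like this. Everything here is hypothetical: seek_off_t, WIDE_SEEK_OFFSETS and seek_wrapper are made-up names for illustration, not anything in ISO C or POSIX. Code that declares its function pointers in terms of the typedef keeps compiling when the typedef is retargeted; code that hard-codes long is the part that breaks.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical typedef: the vendor decides when to retarget it. */
#ifdef WIDE_SEEK_OFFSETS
typedef long long seek_off_t;
#else
typedef long seek_off_t;
#endif

/* Hypothetical seek function declared in terms of the typedef. This sketch
   simply forwards to fseek and rejects offsets a long cannot hold. */
int seek_wrapper(FILE *fp, seek_off_t off, int whence)
{
    assert(fp != NULL);
    if (off > LONG_MAX || off < LONG_MIN)
        return -1;
    return fseek(fp, (long)off, whence);
}

/* Written against the typedef, so it compiles under either definition. */
int (*seek_op)(FILE *, seek_off_t, int) = seek_wrapper;
```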
Casper H.S. Dik
2015-06-11 07:34:01 UTC
Post by Kaz Kylheku
Which is not to say I want to be a naysayer. In POSIX, transitions from raw
types like int have been managed. For instance, the accept function
used to have an "int *" parameter, and it was transitioned to "socklen_t *".
But socklen_t generally has the same size as int (it is generally defined
as int).
Post by Kaz Kylheku
Transitioning the long parameter of fseek and ftell to a typedef should have
been done in C99, if not earlier.
It is not a compatible change; it does not allow for binary compatibility.

Casper
Casper H.S. Dik
2015-06-11 07:31:49 UTC
Post by jacob navia
The lcc-win compiler uses long long for fseek/ftell and thus supports
files bigger than 2Gig... commonplace nowadays.
Clearly not standard-compliant, as the standard defines it as:

int fseek(FILE *stream, long int offset, int whence);

The POSIX standard adds the following function (and similar):

int fseeko(FILE *stream, off_t offset, int whence);

off_t is the type which is big enough for all file positions;
on 32-bit systems it can be 32 but also 64 bits (this might depend on
the compilation environment). Many 32-bit POSIX systems added
a "large file compilation environment" in the '90s.
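As a rough illustration of the large file compilation environment: on many systems, defining _FILE_OFFSET_BITS as 64 before any system header selects a 64-bit off_t in 32-bit builds. That macro is a widespread convention (glibc and Solaris honour it) rather than something POSIX itself mandates, so treat it as an assumption about your platform. file_size is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L   /* expose fseeko/ftello */
#define _FILE_OFFSET_BITS 64      /* assumption: the platform honours this */

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return the size of an open binary stream, or -1 on
   error, restoring the original position afterwards. */
off_t file_size(FILE *fp)
{
    assert(fp != NULL);

    off_t saved = ftello(fp);
    if (saved == (off_t)-1)
        return -1;
    if (fseeko(fp, 0, SEEK_END) != 0)
        return -1;

    off_t size = ftello(fp);

    if (fseeko(fp, saved, SEEK_SET) != 0)
        return -1;
    return size;
}
```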
Post by jacob navia
Could it be possible to change the prototype of those functions in the
standard?
No, generally it is not possible to change the prototype to use
incompatible types.

As this problem already existed 20 years ago and was fixed in that time
without modifying the prototype, I guess I am not alone in that opinion.
Post by jacob navia
This would be a compatible change since any int can be extended to long
long and working code would not change...
Actually, it wouldn't. You are forgetting binaries compiled with
the old headers.
Post by jacob navia
The situation now is impossible with POSIX adding fseeko and ftello and
other vendors using various other names (fseek64 and ftell64 being also
used)
Which particular systems do you have where this is an issue?

Most 64-bit systems use a 64 bit long and they do not have this
issue; even phones now have > 4GB of memory and disks and now
are getting 64 bit CPUs.
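Whether plain fseek/ftell are enough on a given platform comes down to the width of long, which can be checked from <limits.h>. A small sketch; long_covers_large_files is a hypothetical name. It would be expected to return 1 on systems with a 64-bit long and 0 where long is 32 bits.

```c
#include <assert.h>
#include <limits.h>

/* ISO C only guarantees that long is at least 32 bits wide. */
static_assert(LONG_MAX >= 2147483647L, "long must be at least 32 bits");

/* Hypothetical helper: 1 if long can represent offsets beyond 2^31-1,
   i.e. plain fseek/ftell can address files larger than 2 GB. */
int long_covers_large_files(void)
{
    return LONG_MAX > 2147483647L;
}
```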
Post by jacob navia
Updating those functions would be the easiest way out.
Clearly not, because it breaks compatibility. And what type would
you use? What type does the standard have which is at least 64 bits?

Casper
Richard Kettlewell
2015-06-11 08:47:24 UTC
Post by Casper H.S. Dik
Post by jacob navia
The situation now is impossible with POSIX adding fseeko and ftello
and other vendors using various other names (fseek64 and ftell64
being also used)
Which particular systems do you have where this is an issue?
Windows uses _fseeki64/_ftelli64, with an __int64 offset
parameter/return.

The obvious course of action would be for the next C standard to adopt
off_t and fseeko/ftello from POSIX.
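Until something like that is standardised, code that must build on both families tends to hide the difference behind a thin wrapper. A sketch, assuming _fseeki64/_ftelli64 on Windows and fseeko/ftello elsewhere; portable_seek, portable_tell and file_off_t are hypothetical names.

```c
#if !defined(_WIN32)
#define _POSIX_C_SOURCE 200809L   /* expose fseeko/ftello */
#define _FILE_OFFSET_BITS 64      /* assumption: selects a 64-bit off_t */
#endif

#include <assert.h>
#include <stdio.h>

#if defined(_WIN32)
typedef __int64 file_off_t;       /* hypothetical wrapper type */
#else
#include <sys/types.h>
typedef off_t file_off_t;         /* hypothetical wrapper type */
#endif

/* Seek using whichever wide-offset function the platform provides. */
int portable_seek(FILE *fp, file_off_t off, int whence)
{
    assert(fp != NULL);
#if defined(_WIN32)
    return _fseeki64(fp, off, whence);
#else
    return fseeko(fp, off, whence);
#endif
}

/* Report the position using the matching wide-offset function. */
file_off_t portable_tell(FILE *fp)
{
    assert(fp != NULL);
#if defined(_WIN32)
    return _ftelli64(fp);
#else
    return ftello(fp);
#endif
}
```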
--
http://www.greenend.org.uk/rjk/
Casper H.S. Dik
2015-06-11 09:11:06 UTC
Post by Richard Kettlewell
Post by Casper H.S. Dik
Post by jacob navia
The situation now is impossible with POSIX adding fseeko and ftello
and other vendors using various other names (fseek64 and ftell64
being also used)
Which particular systems do you have where this is an issue?
Windows uses _fseeki64/_ftelli64, with an __int64 offset
parameter/return.
I think at Microsoft they made a stupid decision to make long 32
bit in 64 bit Windows. Originally at Sun we nearly made the
same mistake (we're talking about 20 years ago) as a large
number of people felt that having int and long be the same
size would make porting easier. But as Sun wasn't the first, the
Unix industry voted for long being 64 bit and Sun followed that lead
when the first 64 bit Solaris came out. Of course, HAL built the
first 64 bit Solaris version a number of years earlier and they
used Sun/SPARC's first draft V9 ABI (sizeof (int) == sizeof (long))
around 1995. Sun released its first 64 bit CPU around the same time
but did not ship a 64 bit OS until Solaris 7 (end of 1998).


For Windows, fseek/ftell's definition is an issue, but that is a problem
created at Microsoft, at least for 64 bit Windows.
Post by Richard Kettlewell
The obvious course of action would be for the next C standard to adopt
off_t and fseeko/ftello from POSIX.
Right.

Casper
Richard Kettlewell
2015-06-11 09:32:28 UTC
Post by Casper H.S. Dik
Post by Richard Kettlewell
Post by Casper H.S. Dik
Post by jacob navia
The situation now is impossible with POSIX adding fseeko and ftello
and other vendors using various other names (fseek64 and ftell64
being also used)
Which particular systems do you have where this is an issue?
Windows uses _fseeki64/_ftelli64, with an __int64 offset
parameter/return.
I think at Microsoft they made a stupid decision to make long 32
bit in 64 bit Windows. Originally at Sun we nearly made the
same mistake (we're talking about 20 years ago) as a large
number of people felt that having int and long be the same
size would make porting easier. But as Sun wasn't the first, the
Unix industry voted for long being 64 bit and Sun followed that lead
when the first 64 bit Solaris came out. Of course, HAL built the
first 64 bit Solaris version a number of years earlier and they
used Sun/SPARC's first draft V9 ABI (sizeof (int) == sizeof (long))
around 1995. Sun released its first 64 bit CPU around the same time
but did not ship a 64 bit OS until Solaris 7 (end of 1998).
For Windows, fseek/ftell's definition is an issue, but that is a problem
created at Microsoft, at least for 64 bit Windows.
I don’t think the size of ‘long’ is the key factor here; even if they’d
made long be 64-bits in their 64-bit ABI, there’d still be their 32-bit
ABI to consider.

The POSIX world has mostly addressed this by having (at least) two
32-bit ABIs, distinguished by the size of off_t. If (hypothetically)
Microsoft adopted off_t now, either unilaterally or with ISO’s blessing,
they could make it 64 bits in both their ABIs, avoiding the trouble that
the dual ABIs cause in the POSIX world. Occasionally it’s best to be
late to the party.
--
http://www.greenend.org.uk/rjk/
Casper H.S. Dik
2015-06-11 09:53:22 UTC
I don’t think the size of ‘long’ is the key factor here; even if they’d
made long be 64-bits in their 64-bit ABI, there’d still be their 32-bit
ABI to consider.
I'm not sure how much of the Windows market is 32 bit only; this
is likely now much smaller than the 64 bit capable market.

On the latter market there is no direct need for a 32-bit application
which can handle files over 2^31-1 bytes in size (I seem to remember
that FAT32 allowed for files up to 2^32-1 bytes in size, more than
fseek() or ftell() would support).
The POSIX world has mostly addressed this by having (at least) two
32-bit ABIs, distinguished by the size of off_t. If (hypothetically)
Microsoft adopted off_t now, either unilaterally or with ISO’s blessing,
they could make it 64 bits in both their ABIs, avoiding the trouble that
the dual ABIs cause in the POSIX world. Occasionally it’s best to be
late to the party.
In POSIX, generally both APIs can be used simultaneously in the
same application.

The question is mostly how much future there is in 32-bit Windows
and whether a new ABI is useful. Or do they have _ftelli64 in 32-bit
Windows?


Casper
Richard Kettlewell
2015-06-11 10:21:20 UTC
Post by Casper H.S. Dik
Post by Richard Kettlewell
I don’t think the size of ‘long’ is the key factor here; even if
they’d made long be 64-bits in their 64-bit ABI, there’d still be
their 32-bit ABI to consider.
I'm not sure how much of the Windows market is 32 bit only; this
is likely now much smaller than the 64 bit capable market.
On the latter market there is no direct need for a 32-bit application
which can handle files over 2^31-1 bytes in size (I seem to remember
that FAT32 allowed for files up to 2^32-1 bytes in size, more than
fseek() or ftell() would support).
The market for 32-bit general-purpose computers, as a whole, may well
be tiny by now. But there’s more to life than PCs and even on 64-bit
platforms sometimes there are still practical reasons for deploying
32-bit object code.
Post by Casper H.S. Dik
Post by Richard Kettlewell
The POSIX world has mostly addressed this by having (at least) two
32-bit ABIs, distinguished by the size of off_t. If (hypothetically)
Microsoft adopted off_t now, either unilaterally or with ISO’s blessing,
they could make it 64 bits in both their ABIs, avoiding the trouble that
the dual ABIs cause in the POSIX world. Occasionally it’s best to be
late to the party.
In POSIX, generally both APIs can be used simultaneously in the
same application.
The problem is the two ABIs, with a B, not APIs.
Post by Casper H.S. Dik
The question is mostly how much future there is in 32-bit Windows
and whether a new ABI is useful. Or do they have _ftelli64 in 32-bit
Windows?
Yes.
--
http://www.greenend.org.uk/rjk/
Casper H.S. Dik
2015-06-11 11:26:12 UTC
Post by Richard Kettlewell
Post by Casper H.S. Dik
In POSIX, generally both APIs can be used simultaneously in the
same application.
The problem is the two ABIs, with a B, not APIs.
It actually also works for the ABI in POSIX. You should not mix
them for the same file descriptors.

Casper
Richard Kettlewell
2015-06-11 12:10:32 UTC
Post by Casper H.S. Dik
Post by Richard Kettlewell
Post by Casper H.S. Dik
In POSIX, generally both APIs can be used simultaneously in the
same application.
The problem is the two ABIs, with a B, not APIs.
It actually also works for the ABI in POSIX. You should not mix
them for the same file descriptors.
If it worked Debian wouldn’t need separate inn2 and inn2-lfs packages
(to pick an example relevant to Usenet that came up recently elsewhere).
--
http://www.greenend.org.uk/rjk/
Casper H.S. Dik
2015-06-11 12:44:45 UTC
Post by Casper H.S. Dik
Post by Richard Kettlewell
Post by Casper H.S. Dik
In POSIX, generally both APIs can be used simultaneously in the
same application.
The problem is the two ABIs, with a B, not APIs.
It actually also works for the ABI in POSIX. You should not mix
them for the same file descriptors.
If it worked Debian wouldn’t need separate inn2 and inn2-lfs packages
(to pick an example relevant to Usenet that came up recently elsewhere).
It certainly works on Solaris as of Solaris 2.6 (18 years ago) and
certainly can work but what happens in this case is that the offsets
are encoded in the files (or so it seems) but they also say:

"The old inn2-lfs package does not exist anymore and must be replaced by
the new functionally equivalent inn2 package, which supports large files."

The history database needs to be rebuilt when changing from
a non-LFS inn2 to an LFS-compiled inn2 version.

Casper
Richard Kettlewell
2015-06-11 13:09:24 UTC
Post by Casper H.S. Dik
Post by Casper H.S. Dik
Post by Richard Kettlewell
Post by Casper H.S. Dik
In POSIX, generally both APIs can be used simultaneously in the
same application.
The problem is the two ABIs, with a B, not APIs.
It actually also works for the ABI in POSIX. You should not mix
them for the same file descriptors.
If it worked Debian wouldn’t need separate inn2 and inn2-lfs packages
(to pick an example relevant to Usenet that came up recently elsewhere).
It certainly works on Solaris as of Solaris 2.6 (18 years ago) and
certainly can work but what happens in this case is that the offsets
How does Solaris arrange that an application built with 32-bit off_t
works with a library with 64-bit off_t (and that uses off_t in its API),
or vice versa, then?
Post by Casper H.S. Dik
"The old inn2-lfs package does not exist anymore and must be replaced by
the new functionally equivalent inn2 package, which supports large files."
The history database needs to be rebuilt when changing from
a non-LFS inn2 to an LFS-compiled inn2 version.
That doesn’t mean the general problem’s gone away.
--
http://www.greenend.org.uk/rjk/
Casper H.S. Dik
2015-06-11 15:13:12 UTC
Post by Richard Kettlewell
How does Solaris arrange that an application built with 32-bit off_t
works with a library with 64-bit off_t (and that uses off_t in its API),
or vice versa, then?
You will need to provide two interfaces and do some magic depending
on whether you are using an LFS compile environment or not.

The C library is one such example; and if you have an interface which
makes off_t visible, you will need to do the same.
Post by Richard Kettlewell
Post by Casper H.S. Dik
"The old inn2-lfs package does not exist anymore and must be replaced by
the new functionally equivalent inn2 package, which supports large files."
The history database needs to be rebuilt when changing from
a non-LFS inn2 to an LFS-compiled inn2 version.
That doesn’t mean the general problem’s gone away.
The problem was solved in the middle of the '90s; it needs to
be handled in all new ABIs/APIs. In the case of inn2 it seems to
encode the off_t inside the file so clearly it either needs to
abstract it or its database will only work with binaries compiled
in the same way.

Casper
Richard Kettlewell
2015-06-11 15:52:22 UTC
Post by Casper H.S. Dik
Post by Richard Kettlewell
How does Solaris arrange that an application built with 32-bit off_t
works with a library with 64-bit off_t (and that uses off_t in its API),
or vice versa, then?
You will need to provide two interfaces and do some magic depending
on whether you are using an LFS compile environment or not.
The C library is one such example; and if you have an interface which
makes off_t visible, you will need to do the same.
That’s not solving the problem, that’s leaving it to
developers/integrators to work around.
Post by Casper H.S. Dik
Post by Richard Kettlewell
Post by Casper H.S. Dik
"The old inn2-lfs package does not exist anymore and must be replaced by
the new functionally equivalent inn2 package, which supports large files."
The history database needs to be rebuilt when changing from
a non-LFS inn2 to an LFS-compiled inn2 version.
That doesn’t mean the general problem’s gone away.
The problem was solved in the middle of the '90s; it needs to
be handled in all new ABIs/APIs. In the case of inn2 it seems to
encode the off_t inside the file so clearly it either needs to
abstract it or its database will only work with binaries compiled
in the same way.
The problem I’m talking about is the use of APIs that contain off_t, not
how it encodes its database.
--
http://www.greenend.org.uk/rjk/
Casper H.S. Dik
2015-06-11 15:58:03 UTC
That’s not solving the problem, that’s leaving it to
developers/integrators to work around.
That is not correct; whenever you have an ABI/API which uses
off_t you make two available. Under POSIX, you'd then compile
either with $(getconf LFS_CFLAGS) or not and depending on that
you either get one or the other.

It is a bit more work for API writers but generally the same
source code can be used to provide both interfaces. I.e., it
is hardly any additional work.
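A rough sketch of what "make two available" can look like for a library whose API exposes an offset. All names here (mylib_set_pos, mylib_tell32, mylib_tell64, mylib_tell) are hypothetical, and a plain variable stands in for real file state. The library exports both entry points; the header maps the generic name onto one of them depending on the compilation environment, assuming the build flags define _FILE_OFFSET_BITS as they do on many systems.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-in for the library's internal state; always 64-bit. */
static int64_t current_pos;

void mylib_set_pos(int64_t pos)
{
    assert(pos >= 0);
    current_pos = pos;
}

/* Large-file entry point: can report any position. */
int64_t mylib_tell64(void)
{
    return current_pos;
}

/* Legacy entry point for callers built with a 32-bit offset type:
   fails with EOVERFLOW instead of silently truncating. */
int32_t mylib_tell32(void)
{
    if (current_pos > INT32_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    return (int32_t)current_pos;
}

/* The header picks the entry point from the compilation environment. */
#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64
#define mylib_tell mylib_tell64
#else
#define mylib_tell mylib_tell32
#endif
```

Old binaries keep calling mylib_tell32 and new ones get mylib_tell64, so both can link against the same library.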
The problem I’m talking about is the use of APIs that contain off_t, not
how it encodes its database.
That was not the reason there were two inn2 versions.

Casper
Richard Kettlewell
2015-06-11 16:14:12 UTC
Post by Casper H.S. Dik
Post by Richard Kettlewell
That’s not solving the problem, that’s leaving it to
developers/integrators to work around.
That is not correct; whenever you have an ABI/API which uses
off_t you make two available. Under POSIX, you'd then compile
either with $(getconf LFS_CFLAGS) or not and depending on that
you either get one or the other.
It is a bit more work for API writers but generally the same
source code can be used to provide both interfaces. I.e., it
is hardly any additional work.
If “you” is not a developer or integrator, who is it?
Post by Casper H.S. Dik
Post by Richard Kettlewell
The problem I’m talking about is the use of APIs that contain off_t, not
how it encodes its database.
That was not the reason there were two inn2 versions.
The problem ***that I’m actually talking about*** is to do with the ABI
incompatibility.
https://sourceware.org/ml/libc-alpha/2014-03/msg00409.html mentions it.
--
http://www.greenend.org.uk/rjk/
Casper H.S. Dik
2015-06-11 16:56:07 UTC
Post by Casper H.S. Dik
It is a bit more work for API writers but generally the same
source code can be used to provide both interfaces. I.e., it
is hardly any additional work.
If “you” is not a developer or integrator, who is it?
I suppose I'm one but I was not involved at that time.
Post by Casper H.S. Dik
The problem I’m talking about is the use of APIs that contain off_t, not
how it encodes its database.
That was not the reason there were two inn2 versions.
The problem ***that I’m actually talking about*** is to do with the ABI
incompatibility.
https://sourceware.org/ml/libc-alpha/2014-03/msg00409.html mentions it.
But that is *not* about APIs, it seems, but rather about
both file formats and hard coded assumptions on the size of off_t.

Casper
Richard Kettlewell
2015-06-11 17:01:05 UTC
Post by Casper H.S. Dik
But that is *not* about APIs, it seems, but rather about
both file formats and hard coded assumptions on the size of off_t.
This is too much like pulling teeth, I’m giving up.
--
http://www.greenend.org.uk/rjk/
Keith Thompson
2015-06-11 15:05:41 UTC
[...]
Post by Casper H.S. Dik
Post by jacob navia
Updating those functions would be the easiest way out.
Clearly not, because it breaks compatibility. And what type would
you use? What type does the standard have which is at least 64 bits?
long long.

But the type used should be a typedef, similar (or identical) to POSIX's
off_t.

If backward compatibility were not an issue, changing ftell and fseek to
use off_t rather than long would be the obvious solution. Instead, I'd
advocate having ISO C adopt POSIX's fseeko(), ftello(), and off_t.
--
Keith Thompson (The_Other_Keith) kst-***@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"