Post by s***@casperkitty.com
Post by BGB
yep. doing C in a VM with full access checks is a little harder, as it
is typically necessary to validate pointers much more frequently.
It is difficult to sandbox C; it would be much easier to sandbox a language
that was very close to C, but added some restrictions and possibly some new
types as well. A secure language should not allow the construction of
possibly-invalid pointers. While there are some applications where it is
necessary to construct pointers from numbers in ways that cannot possibly
be validated, many other applications would have no need for such a thing
given a type-agnostic means of copying objects that may include pointers.
The relationship between C's numeric sizes and common machine word sizes
makes things a bit awkward, but if one had a system with 31-bit "char",
"short", and "int" types, 62-bit "long", and 93-bit "long long" and added
a restriction that storing into the storage occupied by a pointer the
result of any integer operator would invalidate the pointer, one could then
represent each pointer as a 31-bit object ID whose upper bit was set and
a displacement whose upper bit was clear. Loading the first word of a
pointer into a "char" would yield one whose upper bit was invisibly set,
but the only way code could ever get its hands on such a value would be
to load it from a valid pointer. Dangling pointers could be avoided by
ensuring that before any object-id gets reused, memory would be searched
for no-longer-valid object ids, and any such ids found would be invalidated.
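A minimal sketch in C of the hidden-bit idea described above, modelling each 31-bit value as a 32-bit word whose top bit is invisible to integer code. All names here (word_t, SafePtr, ptr_make, store_int, ...) are hypothetical, and the sketch simplifies the scheme: it only detects an integer store over the object-ID word, and does not model object-id reuse or the memory scan for stale ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_BIT    0x80000000u  /* hidden upper bit, invisible to arithmetic */
#define VALUE_MASK 0x7FFFFFFFu  /* the 31 bits that program code can see */

typedef uint32_t word_t;

/* Assumed pointer layout: object ID word (tag set), displacement word (tag clear). */
typedef struct { word_t id; word_t disp; } SafePtr;

/* What integer code sees when it loads a word: the hidden bit is masked off. */
uint32_t visible(word_t w) { return w & VALUE_MASK; }

/* Every integer operator result is stored with the hidden bit clear. */
void store_int(word_t *dst, uint32_t value) { *dst = value & VALUE_MASK; }

/* Type-agnostic copy: moves the word, hidden bit included. */
void word_copy(word_t *dst, const word_t *src) { *dst = *src; }

/* Only the runtime (never user code) would be able to call this. */
SafePtr ptr_make(uint32_t object_id, uint32_t disp)
{
    SafePtr p = { (object_id & VALUE_MASK) | TAG_BIT, disp & VALUE_MASK };
    return p;
}

/* A pointer is usable only if its ID word still carries the hidden bit. */
bool ptr_is_valid(const SafePtr *p)
{
    return (p->id & TAG_BIT) != 0 && (p->disp & TAG_BIT) == 0;
}
```

Copying a pointer word by word with word_copy() keeps it valid, while writing the same visible number back through store_int() does not, which is the property the scheme relies on.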
the language I am working on is partly intended as a compromise between
some of the languages I have used.
I am primarily a C programmer, so C still weighs fairly heavily in the
language design (though, it isn't really meant to try to "replace" C, as
it is intended for different use cases).
I also use Java and C# to some extent, but find them at times to be a
bit dissatisfying (stuff that is fairly straightforward in C may be
rather annoying in C#, and a horrible mess in Java).
similarly, C++ also weighs in, though the language doesn't really aim to
recreate C++ (it is a somewhat different beast).
general:
  syntax:
    mixture of Java, C, and ActionScript;
    tries to be fairly conservative here
      I see little point in making the syntax needlessly weird
      granted, some things depend on one's perspective / ...
    parser works more like C# than in C / C++ or Java
      parsing depends solely on syntax, not on context
  type-system:
    statically-typed, mostly Java-like;
    variant types will exist as a special case (*1).
    a lot of C-like features (arrays may be offset and type-cast);
    will support structs, which are mostly C#-like;
      struct pointers may point into arrays;
      like C#, they will not support inheritance or virtual methods.
  object system:
    mostly Java-like, single inheritance with interfaces
    methods implicitly virtual (unless final or not overloaded)
    will probably support default methods.
    near-term, will probably not support generics
  scope:
    semantically, scoping model is C#-like.
    packages and images work similarly to namespaces and assemblies
    compilation process is also similar to C#
      modules link horizontally and declaration order is irrelevant
    like C and C++, functions and variables may be declared globally
    argument variables may be passed by reference (like C# and C++)
  memory management:
    explicit, manual, and region-based (*2)
    in many cases, lifetime is made explicit in declarations and usage.
      essentially 'automatic duration' and use of RAII-like patterns
    other cases are handled by regions or by manual new/delete.
*1: as noted, this is where the 62-bit fixlong and flonum types come
into play. the normal long and double types are 64-bit. smaller integer
types are represented at full precision as variant.
these exist mostly because sometimes one really does need it.
'variant' will in many ways function similarly to 'object' in C#, and is
used in basically a similar way.
core types are otherwise fixed size, with an attempt to nail down
numerical semantics (arithmetic expressions should ideally have well
defined results). some semantics here differ a bit from C (different
type promotion rules, minor differences in operator precedence rules, ...).
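As an illustration of how a 62-bit fixlong could fit in a 64-bit variant word, here is a rough sketch in C. The post does not say where the tag bits live or what their values are, so the 2-bit low tag, the tag value, and every name here (variant_t, TAG_FIXLONG, fixlong_box, ...) are assumptions made for the example; flonum is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t variant_t;

/* Assumed encoding: low 2 bits are a type tag, upper 62 bits are payload. */
#define TAG_MASK    3u
#define TAG_FIXLONG 1u   /* hypothetical tag value */

/* True if v survives the trip through 62 bits without loss. */
bool fixlong_fits(int64_t v)
{
    return v >= -(INT64_C(1) << 61) && v < (INT64_C(1) << 61);
}

variant_t fixlong_box(int64_t v)
{
    assert(fixlong_fits(v));
    return ((uint64_t)v << 2) | TAG_FIXLONG;
}

bool variant_is_fixlong(variant_t w) { return (w & TAG_MASK) == TAG_FIXLONG; }

int64_t fixlong_unbox(variant_t w)
{
    const uint64_t sign = UINT64_C(1) << 61;   /* sign bit of the 62-bit payload */
    uint64_t u = w >> 2;                       /* payload, zero-extended */
    /* Portable sign extension: flip the sign bit, then subtract it again. */
    return (int64_t)(u ^ sign) - (int64_t)sign;
}
```

A value that does not pass fixlong_fits() would need some other representation (for instance a boxed 64-bit long), which is not shown here.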
*2: traditional garbage collection can have rather undesirable effects
on performance and overall responsiveness, particularly in real-time
use-cases (this is one of the major intended application areas).
implicitly, the language design will assume that the VM provides for
memory and object management, but mostly leaves it to user code to
ensure that memory gets freed.
new/delete will be used, but objects will generally exist within
regions. specialized regions may be created, and when destroyed any
objects remaining within the region are also destroyed (as an
alternative to tracking them down and deleting them individually).
I had considered also supporting reference-counting, but ATM I have
reservations on this.
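The region behaviour described in *2 can be sketched in C roughly as follows. This is not the VM's implementation, just a minimal illustration built on malloc(); the names (Region, region_new, region_delete, region_destroy) are hypothetical and destructors are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every object so its region can find it later. */
typedef union ObjHeader {
    struct { union ObjHeader *prev, *next; } link;
    max_align_t align;            /* keeps the payload suitably aligned */
} ObjHeader;

typedef struct Region {
    ObjHeader head;               /* sentinel of a circular doubly-linked list */
    size_t live;                  /* number of objects still in the region */
} Region;

Region *region_create(void)
{
    Region *r = malloc(sizeof *r);
    if (r == NULL) return NULL;
    r->head.link.prev = r->head.link.next = &r->head;
    r->live = 0;
    return r;
}

/* "new": allocate an object and link it into the region. */
void *region_new(Region *r, size_t size)
{
    ObjHeader *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->link.next = r->head.link.next;
    h->link.prev = &r->head;
    h->link.next->link.prev = h;
    r->head.link.next = h;
    r->live++;
    return h + 1;                 /* payload starts right after the header */
}

/* "delete": unlink and free a single object. */
void region_delete(Region *r, void *obj)
{
    ObjHeader *h = (ObjHeader *)obj - 1;
    h->link.prev->link.next = h->link.next;
    h->link.next->link.prev = h->link.prev;
    r->live--;
    free(h);
}

/* Destroy the region together with whatever objects remain in it.
 * Returns how many leftover objects were freed. */
size_t region_destroy(Region *r)
{
    size_t freed = 0;
    ObjHeader *h = r->head.link.next;
    while (h != &r->head) {
        ObjHeader *next = h->link.next;
        free(h);
        freed++;
        h = next;
    }
    free(r);
    return freed;
}
```

The point of the design is the last function: code that allocates many short-lived objects can drop the whole region in one call instead of tracking each object down.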
Post by s***@casperkitty.com
Post by BGB
the heap indices essentially were scaled.
Bit 0: 0
Bit 1-20: Cell Index (multiple of 16 bytes)
Bit 21-31: Identifies the heap region (multiple of 16MB)
unpacking the reference basically consists of fetching the region base
and adding in the cell index.
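A minimal sketch in C of unpacking a reference with this layout. It assumes the reference is a 32-bit word and that region bases are kept in a simple table; the names (region_base, heapref_pack, heapref_unpack, ...) are hypothetical. Note that 20 bits of cell index times 16 bytes per cell gives exactly the 16MB region size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as listed above:
 *   bit 0      : 0
 *   bits 1-20  : cell index (cells are 16 bytes)
 *   bits 21-31 : heap region number (regions are 16MB)
 */
#define CELL_BITS    20
#define CELL_SIZE    16u
#define REGION_SHIFT 21
#define REGION_COUNT (1u << 11)

/* Hypothetical table of region base addresses, filled in by the allocator. */
unsigned char *region_base[REGION_COUNT];

uint32_t heapref_pack(uint32_t region, uint32_t cell)
{
    assert(region < REGION_COUNT && cell < (1u << CELL_BITS));
    return (region << REGION_SHIFT) | (cell << 1);
}

uint32_t heapref_region(uint32_t ref) { return ref >> REGION_SHIFT; }

uint32_t heapref_cell(uint32_t ref)
{
    return (ref >> 1) & ((1u << CELL_BITS) - 1u);
}

/* Fetch the region base and add in the scaled cell index. */
void *heapref_unpack(uint32_t ref)
{
    unsigned char *base = region_base[heapref_region(ref)];
    if (base == NULL)
        return NULL;              /* region not mapped */
    return base + (size_t)heapref_cell(ref) * CELL_SIZE;
}
```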
I've long thought the 8086 segmentation design was under-appreciated (though
I admit I was among those under-appreciating it in its day, I have yet to
see any better approach for accessing 1MB of addressing space on a 16-bit
machine). The 80286 design was a step back in terms of usability, and
except for emulation purposes the 80386 machine followed the 80286 approach
while ignoring what was good about the 8086 approach. What you describe is
like what I wish the 80386 could have supported at a hardware level, though
I would probably have had a smaller number of heap regions so as to allow
descriptors for all of them to be kept on-chip.
the size of the heap depends a fair bit on how much memory is in use.
this part is meant to be fairly transparent to client code, and ideally
it will not be munging around too much with references.
how much the VM allows code to get away with on this front is likely to
depend on the code's level of "trust".
Post by s***@casperkitty.com
Post by BGB
there is no need to "lock" memory in any sense though, as the memory
does not move around, nor is there any form of exclusivity towards its
access.
Allowing objects to move around can really help ease problems with memory
fragmentation, especially in resource-constrained systems. The cost of
supporting relocatable memory on the Macintosh was pretty reasonable given
the extent to which it could ease fragmentation.
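For illustration, relocatable memory in the Macintosh style can be sketched in C with handles, i.e. client code holds a pointer to a master pointer rather than to the block itself, so the manager is free to move the block. This is a toy under stated assumptions: a fixed 256-byte arena, bump allocation, no alignment handling, and hypothetical names throughout (Master, Handle, handle_new, arena_compact).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE  256
#define MAX_HANDLES 8

typedef struct {
    unsigned char *ptr;   /* master pointer; NULL when the slot is free */
    size_t size;
} Master;

typedef Master *Handle;   /* clients hold this and re-read h->ptr on each use */

unsigned char arena[ARENA_SIZE];
size_t arena_top;         /* bump-allocation mark */
Master masters[MAX_HANDLES];

Handle handle_new(size_t size)
{
    if (size == 0 || size > ARENA_SIZE - arena_top)
        return NULL;
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (masters[i].ptr == NULL) {
            masters[i].ptr = arena + arena_top;
            masters[i].size = size;
            arena_top += size;
            return &masters[i];
        }
    }
    return NULL;
}

void handle_free(Handle h) { h->ptr = NULL; h->size = 0; }

/* Slide live blocks toward the bottom of the arena, lowest address first,
 * updating each master pointer so existing handles stay usable. */
void arena_compact(void)
{
    size_t top = 0;
    for (;;) {
        Master *lowest = NULL;
        for (int i = 0; i < MAX_HANDLES; i++) {
            Master *m = &masters[i];
            if (m->ptr != NULL && m->ptr >= arena + top &&
                (lowest == NULL || m->ptr < lowest->ptr))
                lowest = m;
        }
        if (lowest == NULL)
            break;
        memmove(arena + top, lowest->ptr, lowest->size);  /* the copying cost */
        lowest->ptr = arena + top;
        top += lowest->size;
    }
    arena_top = top;
}
```

The memmove() in arena_compact() is where the copying cost shows up, and raw pointers obtained from h->ptr must not be kept across a compaction.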
ok, makes sense. could be worth considering, though I am uncertain ATM.
it would have to be counter-balanced some with the cost of memory copying.
if working with a lot of memory, memory access patterns can easily
become a big factor for performance, and the delays caused by "memcpy()"
style operations can potentially become fairly significant (particularly
in real-time).
though, this is likely to depend a lot on the hardware and use-case.
main targets presently are x86, x86-64, and ARM.