Discussion:
Open source C compiler using Regular Expressions
sasho648
2021-09-01 16:28:49 UTC
It uses PCRE2 to parse the C file by matching one huge regex, composed of several .regex files stitched together by a single Perl script (main.pl). There are currently about 94 callouts placed inside it; these invoke C++ code that reads named capture groups and calls the LLVM APIs as appropriate to construct a program.

https://github.com/6a4h8/cparser2/tree/wip

This is an open-source compiler built on regular expressions that mainly targets C89 (as specified in the FIPS PUB 160 PDF document).



The backend was originally a huge C switch, which I recently converted into C++ virtual functions. They come in two kinds: one for parsing (these can alter the match) and one for producing output.
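
A rough picture of what replacing the switch with virtual functions can look like is sketched below. This is only an illustration under assumptions, not the project's code: `Captures`, `CalloutHandler`, `on_parse`, `on_produce`, `IdentifierHandler`, `Dispatcher`, and the capture name `ident` are all hypothetical, the "reject the match by returning false" behaviour is assumed, and the LLVM calls are replaced by appending text to a string.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the named capture groups visible at a callout.
using Captures = std::map<std::string, std::string>;

// One handler object per callout number replaces one `case` of the old switch.
struct CalloutHandler {
    virtual ~CalloutHandler() = default;
    // Parsing hook: returning false rejects the match at this point
    // (assumed behaviour, chosen for the sketch).
    virtual bool on_parse(const Captures&) { return true; }
    // Producing hook: emits output for an accepted match.
    virtual void on_produce(const Captures&, std::string&) {}
};

// Example handler: accepts non-empty identifiers and emits one line for each.
struct IdentifierHandler : CalloutHandler {
    bool on_parse(const Captures& c) override {
        auto it = c.find("ident");
        return it != c.end() && !it->second.empty();
    }
    void on_produce(const Captures& c, std::string& out) override {
        out += "ident " + c.at("ident") + "\n";
    }
};

// A table lookup by callout number takes the place of the switch statement.
class Dispatcher {
    std::map<int, std::unique_ptr<CalloutHandler>> handlers_;
public:
    void add(int number, std::unique_ptr<CalloutHandler> handler) {
        handlers_[number] = std::move(handler);
    }
    // Returns false if no handler is registered or the parse hook vetoes.
    bool dispatch(int number, const Captures& c, std::string& out) {
        auto it = handlers_.find(number);
        if (it == handlers_.end() || !it->second->on_parse(c)) return false;
        it->second->on_produce(c, out);
        return true;
    }
};
```

In this sketch, adding a new callout means registering one more handler with `Dispatcher::add` instead of adding a `case` to a single large function.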

The parsing ones are mainly used for typedefs, since these require context-sensitive parsing inside functions.
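
To illustrate why typedefs need this: in C, `T * x;` declares a pointer if `T` names a type in the current scope and is a multiplication otherwise, so something has to track typedef names scope by scope. A minimal sketch of such a table follows; `TypedefScopes` and its member functions are hypothetical names, not taken from the project.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Tracks, per block scope, whether an identifier currently names a type.
class TypedefScopes {
    // Each scope maps identifier -> true (typedef name) or false (ordinary
    // identifier, which hides an outer typedef of the same name).
    std::vector<std::unordered_map<std::string, bool>> scopes_;
public:
    TypedefScopes() : scopes_(1) {}  // start with the file scope

    void enter_block() { scopes_.emplace_back(); }
    void leave_block() { if (scopes_.size() > 1) scopes_.pop_back(); }

    void declare_typedef(const std::string& name) { scopes_.back()[name] = true; }
    void declare_object(const std::string& name) { scopes_.back()[name] = false; }

    // The innermost declaration wins, following C scoping rules.
    bool is_type_name(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return false;
    }
};
```

A parsing hook could consult `is_type_name` when it meets an identifier and accept or reject the "this is a declaration" branch accordingly.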

Currently it doesn't implement: initialization, conditional evaluation with the logical ops (WIP), incomplete types, and un-prototyped functions.

Most importantly it doesn't support attributes and preprocessor directives.

It does implement: everything else hopefully.

Check out the WIP branch (last worked on under Windows). Invocation:

cparser main.pl in_src.c

Expected output (LLVM bitcode and textual IR):

in_src.c.bc
in_src.c.ll

To debug it, uncomment the end of line 6 in main.h. This produces 2 output.txt files and significantly slows down compilation.
Benjamin Williams (Hodgez)
2023-06-12 05:55:33 UTC
Absolute mad lad. I love it. I will have to give it a try later to see how it all works.
sasho648
2023-06-12 07:46:14 UTC
Post by Benjamin Williams (Hodgez)
Absolute mad lad. I love it. I will have to give it a try later to see how it all works.
Just FYI - it's now at https://github.com/AnFunctionArray/cllvmbackend (the actual Perl/regex part is a git submodule). As for the "mad lad" part, you'll be happy to hear that this version is also multithreaded, because it turned out that this way was actually faster, with the bottleneck otherwise being the regex engine (that was the last time I checked - I haven't tried the latest Perl updates). You need these env vars:

MAXTHREADS=8
MINLEN=50000
SILENT=1

Otherwise the syntax is the same:

regularc ./parse.pl ./bulk/tests/test.c

In general, though, it had some issues the last time I tried it, since I was using it for different purposes (for which there is the non-standard INTPROM env var). However, at a certain point in the past I also had success compiling the C donut program with slight modifications (mainly removing the preprocessor parts - line concatenation and comments).
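
For anyone who wants to prepare a test file the same way, the two modifications mentioned (line concatenation and comments) can be automated. The sketch below is an assumption-laden helper, not part of the project: `splice_lines` and `strip_comments` are hypothetical names, only `/* ... */` comments are handled (C89 has no `//` comments), and preprocessor directives themselves are left untouched.

```cpp
#include <cstddef>
#include <string>

// Line splicing: delete every backslash that is immediately followed by a newline.
std::string splice_lines(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // skip the newline as well
            continue;
        }
        out += src[i];
    }
    return out;
}

// Replace each /* ... */ comment with a single space, leaving string and
// character literals untouched.
std::string strip_comments(const std::string& src) {
    std::string out;
    char quote = 0;  // active literal delimiter (' or "), or 0 outside literals
    for (std::size_t i = 0; i < src.size(); ++i) {
        char c = src[i];
        if (quote) {
            out += c;
            if (c == '\\' && i + 1 < src.size()) out += src[++i];  // keep escaped char
            else if (c == quote) quote = 0;
        } else if (c == '"' || c == '\'') {
            quote = c;
            out += c;
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '*') {
            std::size_t end = src.find("*/", i + 2);
            out += ' ';
            if (end == std::string::npos) break;  // unterminated comment: drop the rest
            i = end + 1;  // the loop increment then steps past the closing '/'
        } else {
            out += c;
        }
    }
    return out;
}
```

Splicing is applied first and comment removal second, e.g. `strip_comments(splice_lines(text))`, which matches the order of the C translation phases.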
sasho648
2023-06-12 07:49:45 UTC
To clarify "faster": that's only for **very large** files - otherwise the speed is the same.