
Simple Corpus Obfuscator

The main goal of this obfuscator is to format any C program into a given block of English text, using #define macros and other tricks to get there.

Furthermore, code generated by this obfuscator should compile successfully with -Wall -Werror.

First, we need to figure out which tokens the text gives us to work with. Extracting every word match from the corpus with a regular expression gives the following word counts:

Code:

{"A":1,"cantilever":2,"is":2,"a":5,"rigid":1,"structural":2,"element":1,"that":1,"extends":2,"horizontally":1,"and":1,"unsupported":1,"at":1,"one":1,"end":1,"Typically":1,"it":2,"from":1,"flat":1,"vertical":1,"surface":1,"such":1,"as":2,"wall":1,"to":1,"which":1,"must":1,"be":2,"firmly":1,"attached":1,"Like":1,"other":1,"elements":1,"can":1,"formed":1,"beam":2,"plate":1,"truss":1,"or":1,"slab":1,"Cantilever":5,"construction":1,"allows":1,"overhanging":1,"structures":1,"without":1,"additional":1,"support":1,"See":1,"also":1,"Applied":1,"mechanics":1,"bicycle":2,"brakes":1,"frame":1,"chair":1,"method":1,"Cantilevered":1,"stairs":1,"Corbel":1,"arch":1,"Euler":1,"Bernoulli":1,"theory":1,"Grand":1,"Canyon":1,"Skywalk":1,"Knudsen":1,"force":1,"in":1,"the":1,"context":1,"of":1,"microcantilevers":1,"Orthodontics":1,"Statics":1}

To keep the generation process from being saturated by overly convoluted computations, we first drop every word that appears more than once in the text; each of these is later #defined to nothing. This means the script is sometimes unable to find an optimal solution, but it also makes the script a lot more feasible to write.

Then, the parsed string is tokenized into the following tokens:

Code:

["A","rigid","element","that","horizontally","and","unsupported","at","one","end",".","Typically","from","flat","vertical","surface","such","wall",",","to","which","must","firmly","attached",".","Like","other","elements",",","can","formed",",","plate",",","truss",",","or","slab",".","construction","allows","overhanging","structures","without","additional","support",".","See","also","-","Applied","mechanics","-","brakes","-","frame","-","chair","-","method","-","Cantilevered","stairs","-","Corbel","arch","-","Euler","-","Bernoulli","theory","-","Grand","Canyon","Skywalk","-","Knudsen","force","in","the","context","of","microcantilevers","-","Orthodontics","-","Statics"]
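The two steps above (count the words, then keep only the unique ones plus every symbol) can be sketched in C as follows. This is a minimal illustration, not the script's actual code: it swaps the regular expression for a hand-written scan, it assumes a "word" is a run of letters, digits and underscores, and the names is_word_char, count_word, tokenize_unique and MAX_TOKEN_LEN are made up for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 64 /* assumed upper bound on token length */

/* Assumed word pattern: letters, digits and underscores. */
static int is_word_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Count case-sensitive whole-word occurrences of word[0..len) in text. */
static size_t count_word(const char *text, const char *word, size_t len) {
    size_t count = 0;
    const char *p = text;
    while (*p) {
        if (is_word_char(*p)) {
            const char *start = p;
            while (is_word_char(*p)) p++;
            if ((size_t)(p - start) == len && memcmp(start, word, len) == 0)
                count++;
        } else {
            p++;
        }
    }
    return count;
}

/*
 * Split text into tokens: every word that occurs exactly once, plus every
 * non-space symbol as a one-character token. Words of MAX_TOKEN_LEN
 * characters or more are skipped. Returns the number of tokens stored.
 */
size_t tokenize_unique(const char *text, char tokens[][MAX_TOKEN_LEN],
                       size_t max_tokens) {
    size_t n = 0;
    const char *p = text;
    while (*p && n < max_tokens) {
        if (is_word_char(*p)) {
            const char *start = p;
            while (is_word_char(*p)) p++;
            size_t len = (size_t)(p - start);
            if (len < MAX_TOKEN_LEN && count_word(text, start, len) == 1) {
                memcpy(tokens[n], start, len);
                tokens[n][len] = '\0';
                n++;
            }
        } else {
            if (!isspace((unsigned char)*p)) {
                tokens[n][0] = *p;
                tokens[n][1] = '\0';
                n++;
            }
            p++;
        }
    }
    return n;
}
```

Called on the text "a beam, a plate." this keeps beam, the comma, plate and the full stop, and drops both occurrences of a. Recounting each word on the fly is quadratic in the length of the text, which is acceptable for a corpus of this size.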

If every token were a valid identifier, we'd be done. However, symbols like . or , cannot be redefined by a macro, so the script attempts to "escape" them using the surrounding controllable tokens instead.

The basic strategy for doing so is as follows:

  • Attempt to escape integer literals by wrapping them in
    (void) [token];
  • Attempt to escape any valid unary operator (e.g. ~ or !) by wrapping it in
    (void) [operator] 0;
  • Attempt to escape any valid binary operator (e.g. - + * / % & | ^ > <) by wrapping it in
    (void)(0 [operator] 1);
    (1 is used on the right to prevent division-by-zero errors with / and %.)
  • Attempt to escape , by wrapping it in
    (void)((void)0,0);
    (a discarded comma operator expression.)
  • Attempt to escape . by wrapping it in
    (void)obj.prop;
    (a discarded access to some property on some object. This is used instead of something like 0.0 because member access is valid across newlines, while a float literal is not.)
  • Attempt to escape ? and : by wrapping them in
    (void)(0?0:0);
    (a discarded ternary expression.)
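To see the wrappers from the list above side by side, here is a small C sketch that uses each of them once. The names holder, obj, prop and run_escapes are placeholders chosen for this example. The statements are written so that they should compile cleanly with -Wall -Werror on common compilers, though that has not been checked against every compiler version.

```c
/* Placeholder object for the "." escape. */
struct holder { int prop; };
static struct holder obj;

int run_escapes(void) {
    (void) 42;              /* integer literal */
    (void) ~ 0;             /* unary operators */
    (void) ! 0;
    (void)(0 - 1);          /* binary operators */
    (void)(0 / 1);          /* a right operand of 1 avoids division by zero */
    (void)(0 % 1);
    (void)((void)0, 0);     /* comma operator, left operand discarded explicitly */
    (void)obj.
        prop;               /* member access still parses across a line break */
    (void)(0 ? 0 : 0);      /* ? and : */
    return 0;
}
```

Every statement is cast to void so that the compiler does not report an unused value. In the comma case the left operand is also cast to void, for the same reason.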

Unfortunately, -Werror causes compilation to fail on unclosed quotes in #define macros, and macros can't insert block comments into code, so certain text sequences are simply impossible to escape. These include:

  • Any corpus that starts with an uncontrollable token.
  • Any corpus that ends with an uncontrollable token.
  • Any invalid token (i.e. a token that starts with a digit but isn't a valid numeric literal).
  • Two ,s in a row without a controllable token in between.
  • Any special character preceded by an incompatible, uncontrollable token.
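Some of these conditions are easy to test before generation starts. The sketch below checks only three of them (an uncontrollable first token, an uncontrollable last token, and two adjacent commas) on an already tokenized corpus. It assumes that "controllable" means "a valid C identifier", and is_controllable and find_unescapable are hypothetical names, not functions from the script.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a token is controllable if it is a valid C identifier,
 * because only identifiers can be given a #define. */
static int is_controllable(const char *tok) {
    if (!(isalpha((unsigned char)tok[0]) || tok[0] == '_')) return 0;
    for (size_t i = 1; tok[i]; i++)
        if (!(isalnum((unsigned char)tok[i]) || tok[i] == '_')) return 0;
    return 1;
}

/* Returns NULL if none of the three checked rules is violated, otherwise a
 * short description of the first problem found. */
const char *find_unescapable(const char *const tokens[], size_t n) {
    if (n == 0) return NULL;
    if (!is_controllable(tokens[0]))
        return "starts with an uncontrollable token";
    if (!is_controllable(tokens[n - 1]))
        return "ends with an uncontrollable token";
    for (size_t i = 0; i + 1 < n; i++)
        if (strcmp(tokens[i], ",") == 0 && strcmp(tokens[i + 1], ",") == 0)
            return "two commas in a row";
    return NULL;
}
```

Because duplicated words have already been removed from the token list, two commas that are adjacent in the list have no controllable token between them, even if the original text had a repeated word there.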

Furthermore, rules surrounding brackets are tricky and this generator is not smart enough to escape them automatically; corpora containing brackets or parentheses must be edited manually.

Applying these rules (and assuming no errors), we can generate the following parsed corpus:

Code:

struct tbnlw{int srlpn;}tbnlw_o;int main(void){    /* rigid */  /* element */ /* that */  /* horizontally */ /* and */ 
/* unsupported */ /* at */ /* one */ (void)tbnlw_o. srlpn;   /* from */  /* flat */ /* vertical */ /* surface */ /* such */
  (void)((void)0, 0); /* which */  /* must */  /* firmly */ (void)tbnlw_o. srlpn; /* other */  (void)((void)0,
  0);  (void)((void)0   , 0);(void)((void)0, 0);(void)((void)0, 0); (void)tbnlw_o.

 srlpn; /* allows */ /* overhanging */ /* structures */ /* without */ /* additional */ (void)tbnlw_o.

srlpn; (void)(0
- 1); (void)(0
-   1);(void)(0
-   1);(void)(0
-  1);(void)(0
-  1);(void)(0
- 1); (void)(0
- 1); (void)(0
- 1);(void)(0-1);  (void)(0
- 1); /* Canyon */ (void)(0
- 1); /* force */ /* in */ /* the */ /* context */ /* of */ (void)(0
- 1);(void)(0
- 1);}

where the commented tokens are free slots to put any C code you want to run.

Then, we can map these tokens to #define macros to generate the final C program (replace the defines marked // free with your own C code):

Code:

#define A                struct tbnlw{int srlpn;}tbnlw_o;int main(void){
#define cantilever
#define is
#define a
#define rigid            // free
#define structural
#define element          // free
#define that             // free
#define extends
#define horizontally     // free
#define and              // free
#define unsupported      // free
#define at               // free
#define one              // free
#define end              (void)tbnlw_o
#define Typically        srlpn;
#define it
#define from             // free
#define flat             // free
#define vertical         // free
#define surface          // free
#define such             // free
#define as
#define wall             (void)((void)0
#define to               0);
#define which            // free
#define must             // free
#define be
#define firmly           // free
#define attached         (void)tbnlw_o
#define Like             srlpn;
#define other            // free
#define elements         (void)((void)0
#define can              0);
#define formed           (void)((void)0
#define beam
#define plate            0);(void)((void)0
#define truss            0);(void)((void)0
#define or               0);
#define slab             (void)tbnlw_o
#define Cantilever
#define construction     srlpn;
#define allows           // free
#define overhanging      // free
#define structures       // free
#define without          // free
#define additional       // free
#define support          (void)tbnlw_o
#define See              srlpn;
#define also             (void)(0
#define Applied          1);
#define mechanics        (void)(0
#define bicycle
#define brakes           1);(void)(0
#define frame            1);(void)(0
#define chair            1);(void)(0
#define method           1);(void)(0
#define Cantilevered     1);
#define stairs           (void)(0
#define Corbel           1);
#define arch             (void)(0
#define Euler            1);(void)(0
#define Bernoulli        1);
#define theory           (void)(0
#define Grand            1);
#define Canyon           // free
#define Skywalk          (void)(0
#define Knudsen          1);
#define force            // free
#define in               // free
#define the              // free
#define context          // free
#define of               // free
#define microcantilevers (void)(0
#define Orthodontics     1);(void)(0
#define Statics          1);}

A cantilever is a rigid structural element that extends horizontally and is
unsupported at one end. Typically it extends from a flat vertical surface such
as a wall, to which it must be firmly attached. Like other structural elements,
a cantilever can be formed as a beam, plate, truss, or slab.

Cantilever construction allows overhanging structures without additional support.

See also
- Applied mechanics
- Cantilever bicycle brakes
- Cantilever bicycle frame
- Cantilever chair
- Cantilever method
- Cantilevered stairs
- Corbel arch
- Euler-Bernoulli beam theory
- Grand Canyon Skywalk
- Knudsen force in the context of microcantilevers
- Orthodontics
- Statics
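
To show what filling in a free slot looks like, here is a much smaller hand-made example in the same style. The five-word corpus, the macro bodies and the function name run_corpus are all invented for this sketch. It defines run_corpus rather than main only so that it can be called from other code.

```c
#include <stdio.h>

/* Hypothetical corpus: "Please greet wall, then stop" */
#define Please int run_corpus(void){   /* first word opens the function */
#define greet  puts("hello");          /* free slot, filled with real code */
#define wall   (void)((void)0          /* opens the comma escape */
#define then   0);                     /* closes the comma escape */
#define stop   return 0;}              /* last word closes the function */

Please greet wall, then stop
```

After preprocessing, the last line becomes int run_corpus(void){ puts("hello"); (void)((void)0, 0); return 0;}, so calling run_corpus() prints hello and returns 0. Changing run_corpus to main in the first define turns the sketch into a standalone program.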