The answer is simple: you don’t even try. The complexity of it is beyond the reach of current computational processes. A generative process is a dumb-assed process, but an immensely powerful one. It can create great results, as nature does, provided the initial models are built sensibly (with some design intent) and the selection is made intelligently. So, finally, it’s about choice. It’s designing by selection.
Now for the contest. The purpose of this test is not to imitate, compete against or test against the steps involved in the medieval processes. The end results need to be compared, not by designers, but by those for whom they are intended: consumers. This way we can avoid the prejudices of designers (including the search for pencil marks) and assess whether the generative process is capable of producing similar end results.
I suggest a “Turing test” (http://en.wikipedia.org/wiki/Turing_test/). This will involve comparing machine-generated and designer-generated end results using a blind test method, where 100 or so designs are shown to unbiased parties who are asked to identify the ones that they like. If they choose the ones generated by medieval processes, then we know that it is the superior design method. If the two score equally, then both processes are equally good or bad.
I believe that this would be the recommended scientific test for such a comparison. This will help us judge end results. We can compare time taken separately.
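As a rough sketch of how the blind test could be scored: the snippet below tallies which source each liked design came from and checks whether the split is far enough from 50/50 to be more than chance. The function names and the "designer" / "machine" labels are my own, purely for illustration. It assumes that equal numbers of designs from each source are shown and that each "like" is an independent pick; a real study would need to check both assumptions.

```python
from collections import Counter
from math import comb


def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k successes out of n fair 50/50 trials."""
    # Null hypothesis: a liked design is equally likely to come from either source.
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, p)


def compare_sources(liked_labels):
    """Tally likes per source (hypothetical labels) and test the split against 50/50."""
    counts = Counter(liked_labels)
    designer = counts["designer"]
    machine = counts["machine"]
    return designer, machine, two_sided_binomial_p(designer, designer + machine)


# Made-up example: 7 of the liked designs came from designers, 3 from the machine.
designer, machine, p = compare_sources(["designer"] * 7 + ["machine"] * 3)
print(designer, machine, p)
```

A small p-value would suggest a real preference for one source; a large one is what you would expect if both processes are, as far as consumers can tell, equally good or bad.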