v.1 |
Website related to the paper
This website describes the experiment in full and gives access to the complete results.
Complete source code in Mathematica notebooks (zipped file, Mathematica version 6.0 required) contents:
The compressed ZIP file contains 27 files totalling 3.4 MB after decompression. It contains the full source code and the files needed to perform the experiment. The only separate packages are those containing the 100 images that were taken from the web for the image analysis. These 2 additional packages are Images.zip and ImagesBN.zip.
Images.zip contains 100 images in full-color JPG format. By contrast, ImagesBN.zip contains the same images converted into black-and-white bitmap (BMP) images. The image names make it possible to trace each image back to its original source on the website from which it was taken: http://www.flickr.com
Most of them were either public images or images under the Creative Commons License.
- All files are either in their source format, a Mathematica notebook with a ".nb" extension, or in PDF format. The notebooks contain all the code necessary to run the complete experiment from scratch; however, they need to be opened with Mathematica version 6.0 or later, since they use several functions available only from that version onward. The whole experiment takes many hours: on a MacBook with a 1.8 GHz Intel Core Duo processor, 1 GB of RAM and at least 5 GB of free hard disk space for memory paging, it took approximately 30 hours. Bear in mind that the experiment consists of massively, exhaustively and systematically generating, analyzing and classifying a large amount of data partitioned into strings, coming from the output of almost 20 thousand Turing machines (100 steps each), one thousand cellular automata (100 steps each), 100 images and a complete human DNA chromosome, together with all possible comparisons among them. The included code is self-contained, fully automated and highly optimized, and can be used for other kinds of experiments regardless of the source, size or type of the data. The user can even perform experiments involving other systems and other string lengths using the same code.
- Because all files in a class (those beginning with the same name) are essentially the same, varying only in the string length n to be analyzed, only the first file of each class is fully documented. For instance, only ComparisonAnalysisn4.[nb|pdf] is fully documented, since each of the others follows exactly the same order and uses exactly the same functions and arguments. The user should quickly become familiar with the function and variable names, since they reflect their practical function, content or meaning and are used systematically in the same order across all files.
- The file Main.[nb|pdf] contains the main functions written to perform the whole experiment. These include the distance definitions, the Burnside's lemma algorithm for calculating the total number of strings, the strings themselves, and the automatic rules for reducing any set of strings of any length to the reduced set given by Burnside's lemma, taking complexity symmetries into consideration.
- The files DNAAnalysis.[nb|pdf] and DNAAnalysis2.[nb|pdf] contain the results for the Human Genome sample analysis by complexity, and its comparisons with the abstract automata and image results. Those files contain the analysis for all string lengths, n = 4, 5, 6 and 10, and for the 2 possible binary transformations that a DNA sequence allows.
- Files beginning with "ComparisonAnalysis" are files containing the results of the analysis of abstract systems, namely Turing machines and Elementary Cellular Automata, both with regular and random inputs. At the end of the name each file contains the string length for which the analysis was performed.
Example: the file named ComparisonAnalysisn4.[nb|pdf] contains the results for abstract machines (Turing machines and Elementary Cellular Automata) for strings of length 4.
There are 4 such files, containing the results for strings of length n = 4, 5, 6 and 10. Specifically, these files contain the classification by complexity for each of the studied abstract automata and the distances between all of them.
As described in the published paper, two distances were implemented: one analyzing the raw classification (i.e. only the order of appearance matters), and another measuring the frequency or probability of appearance of each string class in two classifications from two different sources (automata, images or DNA).
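To make the idea of a frequency classification concrete, here is a minimal sketch in Python (it is not the Mathematica code of the notebooks). It assumes the standard Wolfram rule numbering, cyclic boundary conditions, and a partition of each row into non-overlapping strings of length n. These choices, and all function names, are illustrative assumptions rather than the procedure actually used in the ComparisonAnalysis files.

```python
from collections import Counter

def eca_step(cells, rule):
    """Apply one step of an elementary cellular automaton (cyclic boundary)."""
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = 4 * left + 2 * center + right  # neighbourhood read as a 3-bit number
        out.append((rule >> index) & 1)        # Wolfram rule numbering
    return out

def evolve(cells, rule, steps):
    """Return the initial row followed by `steps` successive rows."""
    rows = [list(cells)]
    for _ in range(steps):
        rows.append(eca_step(rows[-1], rule))
    return rows

def substrings(row, n):
    """Cut a row into non-overlapping strings of length n (assumed partition)."""
    return ["".join(map(str, row[i:i + n])) for i in range(0, len(row) - n + 1, n)]

def classification(rows, n):
    """Rank the observed strings from most to least frequent."""
    counts = Counter(s for row in rows for s in substrings(row, n))
    # Ties are broken lexicographically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical usage: rule 30 from a single black cell, strings of length 4.
start = [0] * 16
start[8] = 1
for string, count in classification(evolve(start, 30, 100), 4):
    print(string, count)
```

The ranking produced this way is the kind of object that is then compared between sources; the reduction to representative strings by symmetry is left out of this sketch.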
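The exact definitions of the two distances are given in the paper and in Main.nb, and are not reproduced on this page. The Python sketch below only illustrates the distinction between an order-based and a frequency-based comparison. The formulas chosen here (sum of rank displacements, and sum of absolute differences of normalised frequencies) and the function names are assumptions made for the sake of the example.

```python
def order_distance(ranking_a, ranking_b):
    """Sum of rank displacements between two orderings of the same string classes.

    Assumed formula: only the position of each class in the ranking matters.
    """
    position_b = {s: i for i, s in enumerate(ranking_b)}
    return sum(abs(i - position_b[s]) for i, s in enumerate(ranking_a))

def frequency_distance(counts_a, counts_b):
    """Sum of absolute differences between normalised frequencies.

    Assumed formula: each class contributes the gap between its relative
    frequencies in the two sources.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    classes = set(counts_a) | set(counts_b)
    return sum(
        abs(counts_a.get(c, 0) / total_a - counts_b.get(c, 0) / total_b)
        for c in classes
    )

# Hypothetical data: two sources ranking the same three string classes.
print(order_distance(["00", "01", "11"], ["00", "11", "01"]))
print(frequency_distance({"00": 3, "01": 1}, {"00": 1, "01": 1}))
```

Two sources can agree perfectly in order while differing in frequency, which is why keeping both measures is informative.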
- Files beginning with the name "ImagesExperiment" are the files containing the description and results of the Images experiment. As for the files above, after their name they contain the suffix with the length of the strings for which the experiment was performed.
Example: ImagesExperimentn4.[nb|pdf] contains the Image experiments for strings of length 4.
- Throughout the whole experiment and across all files, some naming conventions were adopted:
* MT means Turing machine (as does its letter permutation "TM") with a regular tape initialized with n zeros, where n is the length of the strings to be analyzed.
* ECA means Elementary Cellular Automata (of which there are 256 2-symbol, nearest-neighbor (range 1) rules).
* MTR means Turing machine with a randomly initialized tape as input.
* ECAR means Elementary Cellular Automata initialized with a randomly generated string of black-and-white cells of length n, with n the length of the strings to be analyzed.
* DNA means, as usual, Deoxyribonucleic acid.
* DNAH means Homo Sapiens DNA (a sample).
* DNAHb is used temporarily in the file DNAAnalysis2.[nb|pdf] to denote the second, alternative binary transformation of a DNA sequence. After the first part it is renamed DNAH so that the analysis can be performed just as in the file DNAAnalysis.[nb|pdf] for the first possible binary transformation.
* IMG means Image(s), and it is the name chosen for the variable that, like those above, contains a specific classification, in this case for images.
In all cases, the names are followed by a number indicating the string length of each classification.
Example: ECAR5 contains the classification for Elementary Cellular Automata with random input for strings of length 5. (For strings of length n = 4, for example, there would be 6 different representative strings after applying Burnside's lemma.)
- The file AllResults.[nb|pdf] contains tables summarizing all the relevant results of the whole experiment concerning automata, images and DNA, in arrays grouped by string length n = 4, 5, 6 and 10.
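The count of 6 representative strings for n = 4 can be checked with a short Python sketch (not the Mathematica code in Main.nb). It assumes that the complexity symmetries are reversal, complementation and their composition; that assumption, and the function names, are not taken from this page, although the assumption does reproduce the figure of 6 quoted above.

```python
from itertools import product

def reverse(s):
    return s[::-1]

def complement(s):
    return "".join("1" if c == "0" else "0" for c in s)

# Assumed symmetry group: identity, reversal, complementation, and both combined.
SYMMETRIES = [
    lambda s: s,
    reverse,
    complement,
    lambda s: reverse(complement(s)),
]

def all_strings(n):
    """All 2**n binary strings of length n."""
    return ["".join(bits) for bits in product("01", repeat=n)]

def burnside_count(n):
    """Burnside's lemma: the number of classes equals the average number
    of strings left fixed by each symmetry."""
    strings = all_strings(n)
    fixed = sum(1 for g in SYMMETRIES for s in strings if g(s) == s)
    return fixed // len(SYMMETRIES)

def representatives(n):
    """Reduce all strings of length n to one canonical string per class
    (here, the lexicographically smallest member)."""
    return sorted({min(g(s) for g in SYMMETRIES) for s in all_strings(n)})

print(burnside_count(4))    # 6
print(representatives(4))   # ['0000', '0001', '0010', '0011', '0101', '0110']
```

Counting with Burnside's lemma and enumerating representatives directly are independent computations, so their agreement is a useful sanity check for any string length.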
- The file HomoSapiensChromosome1.txt contains the complete sequence of Homo sapiens chromosome 1, which was used for the DNA distribution analysis.
- The ZIP file Images.zip contains the 100 images used for the complexity distribution analysis, before their conversion to B&W (binary) files.
- The ZIP file ImagesBN.zip contains the 100 B&W images used for the complexity distribution analysis.
- The file imageslist.txt, located inside the package ImagesBN.zip, contains the list of names of all the images to be imported. Both this file and the directory of image files need to be on Mathematica's search path, as set by $Path.
- Because there are strong dependencies between several stages of the experiment, for instance for the comparisons, the files need to be run in the following order:
Main.nb
ComparisonAnalysisn4.nb
ComparisonAnalysisn5.nb
ComparisonAnalysisn6.nb
ComparisonAnalysisn10.nb
ImagesExperimentn4.nb
ImagesExperimentn5.nb
ImagesExperimentn6.nb
ImagesExperimentn10.nb
DNAAnalysis.nb
DNAAnalysis2.nb
FinalArrays.nb or AllResults.nb
Files are grouped by name, and each group depends on the previous groups; however, any file within a single group can be run once the previous group has finished.
These files represent the core of the experiment. Additional, explanatory files such as Burnside.nb can be run after Main.nb.
- All files contain, as their first line of code, the command Date[], whose output gives the date on which the file was last modified or used (if Date[] was evaluated). In any case, this date is the key for dating the results and the code.
- The compressed ZIP file available by FTP is named according to the date of the last experiment performed (which does not necessarily coincide with the output of the Date[] command at the beginning of each individual file). The ZIP file name contains that date in the format "ExperimentDDMMYY.zip" and identifies the current or last version of the experiment, which may change owing to code changes or repeated runs.
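The grouping rule can be sketched in Python as follows. The sketch assumes that a group is identified by the file name with its trailing length suffix (such as "n4") or trailing number removed, and it uses AllResults.nb for the last step. Both the rule and the function names are illustrative assumptions.

```python
import re
from itertools import groupby

RUN_ORDER = [
    "Main.nb",
    "ComparisonAnalysisn4.nb", "ComparisonAnalysisn5.nb",
    "ComparisonAnalysisn6.nb", "ComparisonAnalysisn10.nb",
    "ImagesExperimentn4.nb", "ImagesExperimentn5.nb",
    "ImagesExperimentn6.nb", "ImagesExperimentn10.nb",
    "DNAAnalysis.nb", "DNAAnalysis2.nb",
    "AllResults.nb",
]

def group_name(filename):
    """Strip the extension and any trailing 'n<digits>' or '<digits>' suffix."""
    stem = filename.rsplit(".", 1)[0]
    return re.sub(r"n?\d+$", "", stem)

def stages(files):
    """Split the ordered file list into consecutive groups sharing a name."""
    return [(name, list(group)) for name, group in groupby(files, key=group_name)]

for number, (name, members) in enumerate(stages(RUN_ORDER), start=1):
    # Files inside one stage may run in any order once the previous stage is done.
    print(number, name, members)
```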
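As an illustration of the naming format, the date can be recovered from an archive name with a few lines of Python. The file name used below is hypothetical, and the sketch relies on Python's convention for two-digit years (00 to 68 are read as 2000 to 2068).

```python
import re
from datetime import datetime

def experiment_date(zip_name):
    """Extract the date from a name of the form ExperimentDDMMYY.zip."""
    match = re.fullmatch(r"Experiment(\d{6})\.zip", zip_name)
    if match is None:
        raise ValueError(f"not an ExperimentDDMMYY.zip name: {zip_name!r}")
    # %d%m%y reads day, month and two-digit year, in that order.
    return datetime.strptime(match.group(1), "%d%m%y").date()

# Hypothetical archive name, used only to show the format.
print(experiment_date("Experiment010203.zip"))  # 2003-02-01
```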
Further information, comments and suggestions:
Jean-Paul Delahaye: delahaye [at] lifl.fr
Héctor Zenil: hector.zenil-chavez [at] malix.univ-paris1.fr