The paper introduces an algorithmic complexity program for linear sequential strings in number systems beyond the traditional binary (radix 2) system. The reported compression levels all exceed radix 2 levels of compression for both random and non-random strings. Examples are taken from the chemical, biological, and nanotechnology fields.

Excerpt

A Compression Program for Chemical, Biological, and Nanotechnologies

By Bradley S. Tice

Advanced Human Design, P.O. Box 3868, Turlock, California 95381 U.S.A.

Abstract

The paper will introduce a compression algorithm that uses number systems beyond the traditional binary, or radix 2, system in use today. A greater level of compression is noted in these higher radix number systems than in the radix 2 base when applied to sequential strings of various kinds of information. The application of this compression algorithm to both random and non-random sequences will be reviewed. Applications in the natural sciences and engineering will also be covered.

Keywords: Compression Algorithm, Chemistry, Biology, and Nanotechnology

1. Introduction

A binary, or radix 2, system is defined as two separate characters, or symbols, that have no semantic meaning apart from not representing the other character. This is the same notion Shannon gave to the binary system upon its publication in 1948 [1]. This paper will present research showing that various radix-based number systems achieve greater compression than the traditional radix 2 system in use today [2]. The compression algorithm will be used to compress various random and non-random sequences. The work has applications in theoretical and applied natural sciences and engineering.

2. Randomness

The earliest definition of randomness in a string of 1’s and 0’s was given by von Mises, but it was Martin-Löf’s paper of 1966 that gave a measure of randomness: the patternlessness of a sequence of 1’s and 0’s, which could be used to define a random binary sequence in a string [3 and 4]. A non-random string can be compressed, whereas a random string of characters cannot. This is the classical measure, in Kolmogorov complexity (also known as Algorithmic Information Theory), of the randomness of a sequence found in a binary string.
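The contrast between compressible and incompressible strings can be illustrated with a rough sketch. It uses the general-purpose `zlib` compressor from the Python standard library as a practical stand-in, which is an assumption of this example: `zlib` is not the program described in this paper, and a compressed size is only an upper-bound proxy for Kolmogorov complexity, which cannot be computed in general. The helper name `compressed_size` and both test strings are hypothetical.

```python
import random
import zlib

def compressed_size(bits):
    """Return the zlib-compressed size, in bytes, of a string of '0'/'1' characters."""
    return len(zlib.compress(bits.encode("ascii"), 9))

rng = random.Random(0)  # fixed seed so the run is repeatable

# A strongly patterned (non-random) string: "01" repeated 500 times.
patterned = "01" * 500

# A pseudo-random string of the same length, used here as a patternless stand-in.
patternless = "".join(rng.choice("01") for _ in range(1000))

print("patterned:  ", compressed_size(patterned), "bytes")
print("patternless:", compressed_size(patternless), "bytes")
```

Under these assumptions the patterned string shrinks to a small fraction of the size of the pseudo-random one, which is the behaviour the classical definition leads one to expect.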

3. Compression Program

The compression program to be used has been termed the Modified Symbolic Space Multiplier Program. It notes the first character in a line of characters in a binary sequence of a string and subgroups the characters into common, or like, groups of similar characters, all 1’s grouped with 1’s and all 0’s grouped with 0’s. Each sub-group is assigned a single character notation that represents the number of characters found in that sub-group, so that the string can be reduced (compressed) and then decompressed (expanded) back to its original length and form [5]. In previous applications of this program, an underlined 1 or 0 has usually been used as the notation symbol marking placement and character type. The underlined initial character of each compressed sub-group will be used in this paper.

4. Application of Theory

The compression algorithm will be used for the following radix-based number systems: radix 6, radix 8, radix 10, radix 12 and radix 16. These are traditional radix base numbers from the field of computer science and have strong applications to other fields of science and engineering due to the parsimonious nature of these low-digit radix base number systems [6]. The compression algorithm in this paper can be either a ‘universal’ compression engine, in which all members of a sequence, random or non-random, can be compressed, or a ‘specific’ compression engine that compresses only specific types of sub-groups within a random or non-random string.
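The difference between the ‘universal’ and ‘specific’ modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names `compress_runs` and `expand`, the `targets` parameter, the radix 16 sample string, and the use of (symbol, count) pairs in place of the underlined marker are all hypothetical choices made for this example.

```python
from itertools import groupby

def compress_runs(sequence, targets=None):
    """Group adjacent identical symbols into (symbol, count) pairs.

    targets=None  -> 'universal' mode: every sub-group is compressed.
    targets={...} -> 'specific' mode: only sub-groups of the listed symbols
                     are compressed; all other symbols pass through one by one.
    """
    pairs = []
    for symbol, run in groupby(sequence):
        length = len(list(run))
        if targets is None or symbol in targets:
            pairs.append((symbol, length))
        else:
            pairs.extend((symbol, 1) for _ in range(length))
    return pairs

def expand(pairs):
    """Rebuild the original string from (symbol, count) pairs."""
    return "".join(symbol * count for symbol, count in pairs)

hex_string = "FFFF00A77"  # hypothetical radix 16 string

universal = compress_runs(hex_string)
specific = compress_runs(hex_string, targets={"F"})

print(universal)  # [('F', 4), ('0', 2), ('A', 1), ('7', 2)]
print(specific)   # [('F', 4), ('0', 1), ('0', 1), ('A', 1), ('7', 1), ('7', 1)]
```

Both modes expand back to the same original string; they differ only in which sub-groups are collapsed.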

The compression algorithm will be defined by the following properties:

1.) Start at the far left of the string, the beginning, and move to the right, towards the end of the string.
2.) Characters are grouped into sub-groups of common characters, including singular characters, and each sub-group is marked accordingly.
3.) The notation for marking each sub-group is underlining the initial character of that sub-group. The remaining common characters in that marked sub-group are removed. This results in a compressed sequential string.
4.) De-compression of the compressed string is the reverse process, restoring the complete position and character count of the original pre-compressed sequential string.
5.) The same process is used for both random and non-random sequential strings.
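The properties above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the author's program: plain text cannot show the underlined marker, so each marked sub-group is represented here as a (character, count) pair, and the function names `compress` and `decompress` and the sample string are hypothetical.

```python
from itertools import groupby

def compress(sequence):
    """Scan left to right and replace each sub-group of identical adjacent
    characters with one (character, count) pair. The pair stands in for the
    underlined initial character described in property 3."""
    return [(symbol, len(list(run))) for symbol, run in groupby(sequence)]

def decompress(pairs):
    """Reverse the process: repeat each marked character by its stored count,
    restoring the original positions and character count."""
    return "".join(symbol * count for symbol, count in pairs)

original = "AAABBBBAB"
packed = compress(original)

print(packed)  # [('A', 3), ('B', 4), ('A', 1), ('B', 1)]
print(decompress(packed) == original)  # True
```

Singular characters form sub-groups of length one, so a string with no repeated adjacent characters yields as many sub-groups as it has characters.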

5. Chemistry

Chemistry is the science of the structure, properties, and composition of matter and its changes [7].

5.1 Polymer

A polymer is a macromolecule (a large molecule) made up of repeating structural segments, usually connected by covalent chemical bonds [8].

5.2 Copolymer

A copolymer, also known as a heteropolymer, is a polymer derived from two or more monomers [9].

Types of Copolymers:

1.) Alternating Copolymers: Regular alternating A and B units.
2.) Periodic Copolymers: A and B units arranged in a repeating sequence.
3.) Statistical Copolymers: Random sequences.
4.) Block Copolymers: Made up of two or more homopolymer subunits joined by covalent bonds.
5.) Stereoblock Copolymer: A structure formed from a monomer.
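To show how such arrangements look as linear strings, the sketch below builds one illustrative chain of A and B units for four of the types listed and counts the sub-groups of adjacent identical units in each. The chains, their lengths, the seeded random generator, and the helper name `run_pairs` are all assumptions made for illustration; they are not taken from the paper's own example.

```python
import random
from itertools import groupby

def run_pairs(chain):
    """Group adjacent identical monomer units into (unit, count) pairs."""
    return [(unit, len(list(run))) for unit, run in groupby(chain)]

rng = random.Random(0)  # fixed seed so the statistical chain is repeatable

# Hypothetical 12-unit chains, one per copolymer type.
chains = {
    "alternating": "AB" * 6,                                      # ABABAB...
    "periodic": "AAB" * 4,                                        # AABAAB...
    "statistical": "".join(rng.choice("AB") for _ in range(12)),  # random order
    "block": "A" * 6 + "B" * 6,                                   # AAAAAABBBBBB
}

for kind, chain in chains.items():
    pairs = run_pairs(chain)
    print(f"{kind:12s} {chain}  {len(chain)} units -> {len(pairs)} sub-groups")
```

With these particular chains, the block copolymer reduces to two sub-groups, while the alternating copolymer has no adjacent repeats and keeps one sub-group per unit.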

An example of the use of a compression algorithm on copolymers is as follows:

[...]

Frequently Asked Questions

What is the "Modified Symbolic Space Multiplier Program"?

It is a compression algorithm that subgroups common characters in a string and assigns a single character notation to represent that subgroup, allowing for efficient reduction and decompression without data loss.

Why does the program use number systems beyond Radix 2?

The research shows that higher radix systems (like Radix 6, 8, 10, 12, or 16) can achieve greater levels of compression compared to the traditional binary (Radix 2) system used in modern computing.

Can the algorithm compress random sequences?

According to the paper, the algorithm is designed to work with both random and non-random sequential strings, providing a "universal" compression engine for various data types.

How is a "random string" defined in this context?

Following the definitions of von Mises and Martin-Löf, a random string is characterized by patternlessness. Classically, truly random strings are considered incompressible under Kolmogorov complexity theory.

What are the practical applications of this compression method?

The method is applicable to the chemical, biological, and nanotechnology fields, for example in representing complex polymer or copolymer structures, such as DNA or synthetic chains, as linear strings.

How does the decompression process work?

Decompression is the exact reverse of the compression process. It uses the marked notations to restore the original character count and position, ensuring the string returns to its pre-compressed form.

End of excerpt (8 pages)

Details

Title: A Compression Program for Chemical, Biological, and Nanotechnologies
Course: Statistical Physics
Grade: A [4.00]
Author: Professor Bradley Tice (Author)
Year of publication: 2008
Pages: 8
Catalog number: V198602
ISBN (Ebook): 9783656333999
ISBN (Book): 9783656645221
Language: English
Tags: compression program chemical biological nanotechnologies
Product safety: GRIN Publishing Ltd.

Cite this work: Professor Bradley Tice (Author), 2008, A Compression Program for Chemical, Biological, and Nanotechnologies, Munich, GRIN Verlag, https://www.grin.com/document/198602