One of my favorite topics of study is databases. A database is simply data that's been organized. There are five classes of data organization:
- Text data
  - This includes text files, source code and document data, and is not commonly called a database.
  - Examples: written language, text documents, source code
- Tabular data
  - This is commonly called a relational database or structured database.
  - Examples: spreadsheet, SQL table
- Tree data
  - This is commonly called a hierarchical or semi-structured database.
  - Examples: XML, JSON, directories
- Binary data
  - This includes images, streams and other record-pointer files, and is not commonly called a database.
  - Examples: image files, audio files, video files
- Garbage data
  - This is commonly identified as ill-formed, invalid, incorrect, non-functional, unparseable or noise. It can look like any of the classes above, but is distinguished by the lack of any correct and purposeful (or functional) interpretation.
In practice, my data could belong to any class above, or interleave several classes, either in sequence or sandwiched one inside another. I also have an essential relationship to the data: it's all garbage until I find a purpose for it.
I've ranked and sequenced the kinds of data above approximately in order of direct applicability, comprehensibility and usability.
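To make the classes concrete, here's a minimal Python sketch that holds the same facts as text, tabular, tree and binary data, plus one garbage string. The sample person, the field names and the choice of CSV and JSON as stand-ins for tabular and tree data are assumptions made up for illustration.

```python
import csv
import io
import json

# Text data: readable by a person, but no structure is enforced.
text = "Ada is 36 years old and lives in London."

# Tabular data: the same facts as a header row plus one data row.
tabular = "name,age,city\nAda,36,London\n"
rows = list(csv.DictReader(io.StringIO(tabular)))

# Tree data: the same facts as a nested JSON document.
tree = '{"person": {"name": "Ada", "age": 36, "city": "London"}}'
node = json.loads(tree)

# Binary data: the age alone, packed into two bytes.
binary = (36).to_bytes(2, "big")
age_from_binary = int.from_bytes(binary, "big")

# Garbage data: looks like tree data, but no JSON interpretation exists.
garbage = '{"person": {"name": "Ada", "age": }'
try:
    json.loads(garbage)
    garbage_parsed = True
except json.JSONDecodeError:
    garbage_parsed = False
```

Note that the tabular reader returns the age as the string `"36"`, while the tree and binary forms give back the number 36. The class of organization changes what I get for free.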
- Characters (Letter, Character set, Encoding, Bit sequence)
  - Atomically, what will the data be composed of?
- Values (Word, Cell, Attribute Value, Character sequence)
  - Fundamentally, what answers will be stored? This is the data and the bulk of the content; it is what I typically call a database. I form values from a sequence of characters.
- Properties (Phrase, Column Name, Attribute Name, Property)
  - Logically, what questions will categorize the answers (values) stored in my database? I name each property and define its question. I form properties to store similar values and reference them by property name.
- Types (Part of speech, Column Type, Attribute Restriction, Primitive Type)
  - Legally, what domain and format will I accept for stored values (answers)? I name each type and define its domain. I give each property a type to restrict the characters composing the values it references.
- Objects (Sentence, Row, Node, Object)
  - Concretely, which objects will I track and store answers for? I like to think about how my reality maps to my data. I form objects from a list of related values referenced as properties.
- Keys (Proper noun, Key, ID, Hash)
  - Uniquely, how will each object in the database be identified? I form an object key from a subset of an object's values that identifies the object.
- References (Pronoun, Foreign Key, Reference, Pointer)
  - Indirectly, how will an object be referenced from other objects? I determine which properties will contain references to other objects. I form each reference from a subset of an object's values that matches another object in the database.
- Prototypes (Definition, Table, Element, Class)
  - Functionally, what will group objects with similarities? My main purpose for a database is to regularly, effectively and efficiently query, store and operate on data. I name each prototype, define its properties with their types, and identify its key properties and reference properties.
- Schema (Grammar, Table definition, Schema, File type)
  - Summarily, what collection of definitions and rules will govern my data so that it doesn't all become garbage data as the database grows or ages? At its heart, the schema is tied to the tools that read from and write to the database: it is the set of restrictions we must place on the operation of those tools to avoid corruption. My database schema must evolve to accommodate changes to all of these aspects of my database.
In parentheses, I listed my prototypical name for each concept in each class of database (text, tabular, tree and binary, respectively). While the terminology and the implied means of storing, querying or updating databases differ wildly, the concepts above are essentially immutable.
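The concepts above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `author` and `book` prototypes, the two type names, the helper names `key_of` and `problems`, and the simplification that a reference is a single property pointing at a single-property key are all hypothetical.

```python
# Types: each named type defines the domain of values it accepts.
TYPES = {
    "integer": lambda value: isinstance(value, int),
    "text": lambda value: isinstance(value, str) and value != "",
}

# Schema: one prototype per kind of object, listing its properties with
# their types, its key properties and its reference properties.
SCHEMA = {
    "author": {
        "properties": {"id": "integer", "name": "text"},
        "key": ["id"],
        "references": {},
    },
    "book": {
        "properties": {"id": "integer", "title": "text", "author_id": "integer"},
        "key": ["id"],
        "references": {"author_id": "author"},  # property -> target prototype
    },
}

# Objects: each dict is one object; each entry is a property and its value.
DATABASE = {
    "author": [{"id": 1, "name": "Ada"}],
    "book": [{"id": 10, "title": "Notes", "author_id": 1}],
}


def key_of(obj, rule):
    """Build an object's key from the subset of its values named by the rule."""
    return tuple(obj.get(prop) for prop in rule["key"])


def problems(schema, database):
    """Return a list of schema violations; an empty list means none were found."""
    found = []
    for prototype, rule in schema.items():
        seen_keys = set()
        for obj in database.get(prototype, []):
            # Properties: every object must have exactly the declared properties.
            if set(obj) != set(rule["properties"]):
                found.append(f"{prototype}: wrong set of properties in {obj}")
                continue
            # Types: every value must fall inside its property's domain.
            for prop, type_name in rule["properties"].items():
                if not TYPES[type_name](obj[prop]):
                    found.append(f"{prototype}.{prop}: {obj[prop]!r} is not {type_name}")
            # Keys: no two objects of one prototype may share a key.
            key = key_of(obj, rule)
            if key in seen_keys:
                found.append(f"{prototype}: duplicate key {key}")
            seen_keys.add(key)
            # References: the referenced object must exist (single-property keys only).
            for prop, target in rule["references"].items():
                targets = {key_of(t, schema[target]) for t in database.get(target, [])}
                if (obj[prop],) not in targets:
                    found.append(f"{prototype}.{prop}: no {target} with key {obj[prop]!r}")
    return found
```

With the sample data, `problems(SCHEMA, DATABASE)` returns an empty list. Changing the book's `author_id` to 2 yields one violation, a dangling reference, which is the kind of corruption a schema exists to catch.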
I've ranked and sequenced the important database concepts in order of learning and organic database creation. By organic, I mean this is the most common snowball sequence I follow to capture and organize data. By snowball sequence, I mean I start with 1, then 1&2, 1&2&3, 1&2&3&4, and so on. Until I reach a stable notion of a schema, and an associated tool set that can support my content, it doesn't really feel like a database.
All it ever takes is one new anything (character, value, property, type, object, key, reference, prototype, schema) to disrupt the whole database. My utilitarian goal is a database that is conceptually prepared for most new data in my domain of interest, so that it achieves minimal disruption and maximum capability.
Copy the tabular data into the left textarea, one row per line. Copy the cell separator into the input box above. Input property names into the middle textarea, one per line, first to last. Click the button to output tree data (JSON) on the right.
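The same conversion can be sketched in Python. This is a minimal sketch rather than the code behind the button; the function name `table_to_tree` and its parameters are hypothetical, and it assumes a non-empty separator and treats every cell as a string.

```python
import json


def table_to_tree(table_text: str, separator: str, property_names: list[str]) -> str:
    """Turn separated rows into a JSON array with one object per row."""
    objects = []
    for line in table_text.splitlines():
        if not line.strip():
            continue  # skip blank rows
        cells = line.split(separator)
        # zip pairs names with cells first to last and stops at the shorter
        # list, so extra cells or extra names are silently dropped.
        objects.append(dict(zip(property_names, cells)))
    return json.dumps(objects, indent=2)


print(table_to_tree("Ada|36\nBob|41", "|", ["name", "age"]))
```

Each row becomes an object, each property name becomes an attribute name, and each cell becomes an attribute value, which mirrors the tabular and tree names in the concept list.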