Welcome to the Realm: The World of 'Didgets'

Didgets (short for Data Widgets) are 'smart data objects' that can do some amazing things. There are lots of different kinds of Didgets, with each Didget performing a very specific task. The system is built to efficiently handle hundreds of millions of these Didgets within a single container (called a Pod).

Like an erector set made up of many special-purpose pieces (wheels, pulleys, connectors, etc.) that can be put together in various ways to create all kinds of things (cars, bridges, cranes, etc.), the Didget system is made up of special-purpose data objects (files, lists, schemas, etc.) that can be easily connected together to form more complex structures like relational database tables, hierarchical file systems, or key-value stores.


Overview:

The Didget Management System is a platform upon which many different applications and services can be built. The system consists of four components: Pod, Manager, API, and Browser.

Individual Didgets are stored within a logical container called a 'Pod'. Each pod is capable of storing large numbers of all kinds of Didgets.

The Didget Manager is the control software that manages all the Didgets within a pod. It is responsible for ensuring security, integrity, and organization of all data inside a pod.

The manager has an API that applications can call to create new Didgets; modify or delete existing Didgets; or find any or all Didgets that match specific criteria.

The Didget Browser is an admin tool that calls the manager API to demonstrate the power and speed of the system. It can perform useful tasks such as creating and querying relational database tables; organizing file data like photos or documents; or indexing text files to find certain words or phrases. This GUI application is still under construction, but several useful functions are complete. The Didget Browser uses a tabbed interface to let users view and manage Didgets within one or more pods. While this first version was written for the Windows platform, all the code will be ported to Linux and macOS in coming versions.

The whole system is currently in beta test and is available for download by anyone who wants to try it. Feedback on bugs, problems, or feature requests is greatly appreciated.


Big Picture

Didgets is much more than just another file system, database, or NoSql solution. Its internal algorithms enable it to manage all kinds of data with speed and efficiency. Traditional unstructured data (i.e. file data) can be tagged and searched thousands of times faster with Didgets than with other file systems. Database queries execute on average about 4x faster. Searches against semi-structured data are also very fast. The system is a better file system than file systems; a better database than other databases; and a better NoSql solution than other systems. This allows it to handle all kinds of data within a single coherent system. Security measures, replication, recovery, and search can be done against all data in a unified manner.

The system is designed to provide the building blocks of a global, distributed data management system. If successful, there might one day be millions or even billions of Didget pods around the world, all connected to each other via the Internet to form the 'Didget Realm'. Like the 'World Wide Web', which consists of billions of web sites that can link to one another using hyperlinks, the Realm would consist of individual nodes (pod, manager, and API) that can each provide real value on their own, but could share information between them to add even more value.

Each node in the Didget Realm is a member of a specific 'Domain'. Nodes in the same domain can share more information with each other than can nodes in different domains. Things like security, replication, and synchronization are managed at the domain level. An individual or a company might have many nodes within their domain.


Browser

The browser is a Windows GUI application built using the Qt framework. It has tabs and uses a mouse driven interface. There are a limited number of tabs that have been enabled by default. Use the Settings->Choose Profile menu option to enable more tabs. Some of the tabs are still under construction and have limited functionality.

Pods are listed in the window on the left. If there is more than one pod, select a pod by clicking on it with the mouse to have all the tabs reflect Didgets within that pod. Any changes made on any of the other tabs will be applied to the currently selected pod.

Each tab deals with a subset of Didgets within a pod. The 'Databases tab', for example, only shows Didgets that are used to form things like databases, relational tables, or database connectors. The 'Tags tab' likewise only shows the Didgets that represent the tagging system.

If you hover the mouse over a window or item on a tab, it often pops up a 'tool tip' that explains what the item is or what you can do in the given window. Most windows within each tab have their own context-sensitive menu that is activated when you right-click the mouse. There are often menu items to create new Didgets, edit existing ones, or display even more properties.


Pods

When the browser starts, it loads any pods listed in the Pods.json file. If there is more than one domain, it will show each domain within its own tab. All pods within the current domain are listed in the window on the left. Hold the mouse over a pod to see some of its properties or right-click on it to get more options.

Each pod has a distinct type (FOLDER, FILE, FILE_SET, PARTITION, PARTITION_SET, or MEMORY). With the FOLDER type, each Didget within the pod is stored in a separate file under the folder path specified. For the FILE and PARTITION types, the container is a virtual disk made up of 4K blocks. For a 100GB pod, the FILE type is a single 100GB file. With the FILE_SET type, the same pod is split into 25 separate 4GB files. This allows a pod to be backed up to removable media that only handles smaller files.
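
The arithmetic behind the FILE and FILE_SET example can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above, not part of the Didget Manager. The function name pod_layout is hypothetical, and the sketch assumes that "GB" means 2**30 bytes and "4K" means 4096 bytes.

```python
BLOCK_SIZE = 4 * 1024   # 4K blocks, as described for FILE and PARTITION pods
GIB = 1024 ** 3         # assumption: "GB" in the text means 2**30 bytes


def pod_layout(pod_bytes, max_file_bytes):
    """Return (total_blocks, file_count) for a block-based pod.

    Hypothetical helper: it only does the division, rounding up so that a
    partially filled block or file still counts as one.
    """
    total_blocks = -(-pod_bytes // BLOCK_SIZE)      # ceiling division
    file_count = -(-pod_bytes // max_file_bytes)    # ceiling division
    return total_blocks, file_count


blocks, files = pod_layout(100 * GIB, 4 * GIB)
print(blocks, files)   # 26214400 25
```

Under these assumptions a 100GB pod holds 26,214,400 blocks, and a FILE_SET limited to 4GB per file needs the 25 files mentioned above.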


Didgets

Each Didget in the system has a distinct type, but all Didgets share the same basic structure - metadata record, optional tags, and an optional data stream.

Metadata Record: Every Didget has a small, fixed-size record that is stored in a system table. This record holds the Didget's unique ID (a 64-bit number), its type (e.g. SCHEMA) and subtype (e.g. DB Table Definition), and other attributes. Internal information about the size and location of its data stream is also stored here.

Tags: Every Didget may have one or more optional tags attached. Each tag is a simple key-value pair where the key is the ID of the Didget it is attached to and the value is some number, string, or other data type. For example, a tag might be attached to a Didget (ID = 123) that is a photo. This tag might be .device.camera = 'Canon EOS'. Another tag attached to that same photo might be .person.firstName = 'Bill' to indicate there is a person named Bill in the photograph. Tags are defined using a two-level classification system. For example, 'person' is one level and 'firstName' is the second. Tags are like columns in a relational database table: they must be named and assigned a data type before they can be used. The browser application has a set of pre-defined tags that it creates for any new pod, but it lets users define additional ones too.
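
To make the tag model concrete, here is a minimal Python sketch of a tag store. It is a toy model under stated assumptions, not the Didget Manager API: the class TagStore and its methods define, attach, and find are hypothetical names, and each defined tag is modeled as a plain dictionary keyed by Didget ID.

```python
from collections import defaultdict


class TagStore:
    """Toy model: one key-value store per defined tag, keyed by Didget ID."""

    def __init__(self):
        self._types = {}                   # (category, name) -> Python type
        self._values = defaultdict(dict)   # (category, name) -> {didget_id: value}

    def define(self, category, name, data_type):
        """A tag must be named and given a data type before it is used."""
        self._types[(category, name)] = data_type

    def attach(self, didget_id, category, name, value):
        key = (category, name)
        if key not in self._types:
            raise KeyError(f"tag .{category}.{name} is not defined")
        if not isinstance(value, self._types[key]):
            raise TypeError(f"tag .{category}.{name} expects {self._types[key].__name__}")
        self._values[key][didget_id] = value

    def find(self, category, name, value):
        """Return the IDs of all Didgets carrying this tag value."""
        store = self._values.get((category, name), {})
        return sorted(d for d, v in store.items() if v == value)


tags = TagStore()
tags.define("device", "camera", str)
tags.define("person", "firstName", str)
tags.attach(123, "device", "camera", "Canon EOS")
tags.attach(123, "person", "firstName", "Bill")
tags.attach(456, "person", "firstName", "Bill")
print(tags.find("person", "firstName", "Bill"))   # [123, 456]
```

The two-level name (.person.firstName) is represented here as a (category, name) pair. How the real system stores tag definitions is not described in this document.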

Data Stream: Every Didget may have an optional data stream that contains anywhere from a single byte to trillions of bytes. For example, a JPEG photo might have several MB of photo data in this data stream. There are two general types of data streams, 'Managed' and 'Unmanaged'. File Didgets have unmanaged streams, so applications can directly write any data they like to the stream. All the other Didget types have managed streams, so applications must use the manager API to manipulate items in this stream indirectly.

The type of Didget determines if its data stream is managed or unmanaged by the control software called the Didget Manager. A 'File Didget' is unmanaged, so like a file system, the Didget Manager treats its data stream like a 'black box'. The browser application might be able to display its contents if it is a photo or text document, but it does not control what information can be stored in it. The managed Didgets (e.g. Schema, Tags, Tables, etc.) all have their data streams strictly controlled by the Didget Manager.

Since all the managed Didgets have a distinct type and function, they have additional API calls that differ from the general Didget API calls. For example, a 'Schema Didget' will have calls to get or set a schema record from its data stream. All Didgets have calls to add or modify tags, get their ID, or determine the size of their data stream.

There are over a dozen different kinds of managed Didgets. Some of them have yet to be fully implemented, but here is a list of the most developed so far.
1) Set Didgets - These contain the IDs of other Didgets. They are used to create things like databases, folders, play lists, and photo albums.
2) Tag Didgets - These are key-value stores used to attach tags to other Didgets or form columns in relational database tables.
3) Schema Didgets - These are used to define the data stored within the Tag Didgets. Data types can be strings, numbers, booleans, dates, etc. Each database table has its own Schema Didget.
4) Marker Didgets - These are used to record timed events within a pod. They can be used to monitor performance of lengthy operations.
5) Trigger Didgets - These perform a specific task. They are used for drop zones on the create tab and queries on the database tab. 
6) Workflow Didgets - Triggers can be chained together in one of these workflows to perform a set of tasks in an assembly line fashion.


Databases

The relational database tables with accompanying analytic functions are the most developed feature of the Didgets platform so far. The databases tab in this browser encapsulates all the database functions. 

On the databases tab, the smaller windows on the left show the Didgets containing specialized database information. There is a window for table definitions; another for connectors to external databases; another for database tables; and yet another for queries against the currently selected table. The bottom window shows result sets that are currently in memory. The larger window on the right side shows the results (columns and rows) of the currently selected result set.

Each pod can have one or more databases defined. The system creates a "Default Database" when the browser is run for the first time. A 'database' is just a Set Didget that contains the IDs of all the Didgets that are members of that database. To create a new database, just click the + button and give it a name. You can switch back and forth between databases (if there is more than one) by selecting one from the drop-down list.

To create a table, a definition is needed. A definition is a description of all the columns and their data types (i.e. a schema). The default database has a couple of definitions already defined - a 'Friends' definition with 6 columns and a 'Customers' definition with 10 columns. The 'Sample Data' folder has a couple of files, Friends_250K.csv (it has 250,000 rows) and Customers_50K.json (it has 50,000 rows), that can be 'dragged and dropped' directly on their respective definitions to create tables. The user can also create other definitions by right-clicking in the window and creating a definition from scratch or by dropping any .csv or .json file in an open space in that definition window.

There are also three 'connectors' that are created in the default database that can be used to import data into a table directly from an external database. The user will need to edit a connector (by double clicking on the connector) to update the host name, the database name, and any user credentials. Once the connector can successfully connect to the external database, the user can create a 'View' into that database by right-clicking on the connector and choosing the 'Create View...' option. Select the table or view in the external database and press the 'Set up View' button to specify query options. Once the view is set up, it will appear below its connector. To create a table from one of these persistent views, just double click it. This will pull all rows from the external database that match the query into a table within Didgets.

All tables within the database are listed in the third window on the left side of the databases tab. To perform a 'Select All' query on a table just double click it. Each table can also have a set of persistent queries that will display in the fourth window when the table is selected. To create a new query, just right click on the table and select the 'Create Query...' option. To execute a single query, just double click on it.

A 'Result Set' is an in-memory structure created when a table is queried (either by double-clicking on the table or by double clicking on one of its persistent queries). Deleting a result set does not change anything on disk. It just frees up any memory in use for it. It is also possible to display up to four result sets simultaneously by choosing a different layout in the bottom right corner of the databases tab.

The filter at the bottom of the databases tab allows the user to further filter the rows in the currently displayed result set. Just set the filter options and enable it by checking the filter checkbox. You can also filter rows by double clicking on any value in any column displayed. This will pop up a window with only rows containing that value.

The result set display window also has menu options that can be activated by right-clicking in the window or on one of the column headers. Many options exist for selecting values, updating existing values, and exporting items to the clipboard or a file on disk. Simply clicking on a column header will also sort the table based on values in that column.

The way data is stored within Didgets allows very fast analytics against any table or query. If you right-click on any column header, there is a menu option to view all the unique values in that column and the number of rows for each value. There is also a 'View Formats' option to check values for anomalies in formatting. For example, a 'Social Security' column may need all numbers to conform to a 999-99-9999 format. This option converts every digit (0-9) in a string value to a 9, every uppercase letter to an X, and every lowercase letter to an x. This will let you quickly find errors where someone entered an uppercase letter O instead of a zero, or a lowercase letter l instead of the number one.
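
The 'View Formats' conversion described above can be sketched in Python as follows. This is an illustration of the stated rule (digit to 9, uppercase to X, lowercase to x), not the browser's actual code. The function name format_mask and the sample values are hypothetical, and the sketch assumes that only ASCII letters and digits are converted and every other character is left unchanged.

```python
from collections import Counter


def format_mask(value):
    """Replace digits with 9, uppercase letters with X, lowercase with x."""
    out = []
    for ch in value:
        if "0" <= ch <= "9":
            out.append("9")
        elif "A" <= ch <= "Z":
            out.append("X")
        elif "a" <= ch <= "z":
            out.append("x")
        else:
            out.append(ch)   # assumption: punctuation and spaces pass through
    return "".join(out)


# Made-up sample column: two valid values, one with a letter O, one with a letter l.
ssns = ["123-45-6789", "987-65-4321", "12O-45-6789", "123-45-678l"]
masks = Counter(format_mask(s) for s in ssns)
for mask, count in masks.most_common():
    print(mask, count)
```

Counting the masks groups the column by format, so the two malformed values stand out as 99X-99-9999 and 999-99-999x next to the expected 999-99-9999.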

Another important feature of Didgets is its ability to create three-dimensional (3D) relational database tables. Because every column in a table is an independent key-value store, it can have multiple values mapped to each row key. For example, with tables in conventional database systems, if a customer may have more than one phone number, either multiple phone number columns must be specified in the table schema (e.g. home phone, mobile phone, office phone) or a separate table must be designed to hold only phone numbers. Each query in such a system that wants to show all phone numbers must join the customer table with the phone number table. This is not necessary when using Didgets.

With relational tables created with Didgets, every column (except a primary key column) has the potential for many different values for each row. Therefore, a table has not only width and length but also depth. A query to find rows where the phone number starts with area code 801 will return each row where any of its phone numbers meets that requirement. This flexible structure makes Didgets capable of handling semi-structured data instead of just highly structured data and allows it to offer features normally found only in NoSql solutions.
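
The idea of a column with depth can be sketched in Python as a key-value store that maps each row key to a list of values. This is a toy model of the concept only. The class Column and its methods are hypothetical names, the phone numbers are made up, and nothing here reflects how Didgets stores columns internally.

```python
from collections import defaultdict


class Column:
    """Toy multi-valued column: each row key maps to a list of values."""

    def __init__(self):
        self._values = defaultdict(list)   # row key -> list of values

    def add(self, row_key, value):
        self._values[row_key].append(value)

    def rows_where(self, predicate):
        """Return row keys where ANY value in the row satisfies the predicate."""
        return sorted(
            key for key, vals in self._values.items()
            if any(predicate(v) for v in vals)
        )


phone = Column()
phone.add(1, "801-555-0100")   # customer 1 has two numbers
phone.add(1, "212-555-0101")
phone.add(2, "415-555-0102")
phone.add(3, "385-555-0103")   # customer 3 has two numbers
phone.add(3, "801-555-0104")

matches = phone.rows_where(lambda v: v.startswith("801"))
print(matches)   # [1, 3]
```

Rows 1 and 3 match because at least one of their phone numbers starts with 801, and no join against a separate phone table is needed.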

The database tab also has a number of other data functions. It allows for the quick creation of pivot tables against the largest relational tables; charting (pie charts, bar graphs, etc.) for data sets; and many data transformation functions to clean table data. These options are usually found in contextual pop-up menus.


Files

The original intent of Didgets was to replace conventional file systems. Traditional volumes could safely store millions of files of all types and sizes, but made finding particular sets of those files very cumbersome. For example, a 10 TB drive might have 100 million files on it of which 20% might be photos. Finding all the JPEG photos out of those 20 million photo files could take a very long time. Tagging those files with meta-data so that the user could find all their Hawaiian vacation photos for example, was likewise very hard to do.

Didgets was designed to store hundreds of millions of photos, documents, videos, and other kinds of file data and easily tag each file for very fast queries. Each 'File Didget' is like every other Didget in the system in that it can have many meta-data tags attached to it. The system allows the user to find all files with particular tags in just a second or two, even if there are hundreds of millions of Didgets in the pod.

The 'Create Tab' in the browser is the import method for converting traditional files into 'File Didgets'. On this tab, there is a list of 'Drop Zones' that define what kind of files to import. There is a small set of predefined drop zones that are installed the first time the browser runs. Just drop a file, a set of files, or a folder from the file manager on one of these drop zones to convert the files. All files that match the drop zone definition are imported and file system meta-data is automatically created as tags by default. One of these tags is a checksum or 'SHA-1' value of the data found in the file's data stream. This tag will help the user find all duplicates of files in the system. 
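
Duplicate detection with a SHA-1 tag can be sketched with Python's standard hashlib module. This is a minimal illustration of the technique, not the import code used by the browser. The function names and the sample data are hypothetical, and data streams are modeled as in-memory byte strings keyed by Didget ID.

```python
import hashlib
from collections import defaultdict


def sha1_of(data):
    """Return the SHA-1 digest of a byte string as a hex string."""
    return hashlib.sha1(data).hexdigest()


def find_duplicates(streams):
    """Group Didget IDs whose data streams have the same SHA-1 value.

    streams: dict mapping a Didget ID to the bytes of its data stream.
    Returns a list of ID groups, one per digest shared by two or more IDs.
    """
    by_digest = defaultdict(list)
    for didget_id, data in streams.items():
        by_digest[sha1_of(data)].append(didget_id)
    return [sorted(ids) for ids in by_digest.values() if len(ids) > 1]


# Made-up sample: Didgets 101 and 103 hold identical bytes.
streams = {101: b"photo-bytes-A", 102: b"photo-bytes-B", 103: b"photo-bytes-A"}
dups = find_duplicates(streams)
print(dups)   # [[101, 103]]
```

Because the digest is stored as an ordinary tag, finding duplicates becomes a tag query rather than a byte-by-byte comparison of every pair of files.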

The 'Query Tab' in the browser is how subsets of all Didgets in the pod are found. Like the drop zones on the create tab, there are a number of queries that have been created by default. Just select one of the queries to execute it and find all Didgets in the pod that match. A new query can be created by right-clicking in the query list window and creating one.

When a query is executed, all the Didgets that match are listed in the larger window to the right. Double-click on any item to see its properties (attributes, tags, and in some cases a view of its contents). In the properties dialog, if one of its tags is double-clicked it will find all the other Didgets in the same query that also share that same tag. There is also an option to find all Didgets in the pod with that same tag, even if they do not match the query.

The user can also attach additional tags to any Didget on the query tab. Just check the 'Show Attach Tags' checkbox in the lower right corner of the tab.

Each 'File Didget' is assigned a specific data type based on its extension. For example, files that end in .jpg, .bmp, or .gif are assigned the 'Image' type. The Formats tab in the browser lists all the predefined file extensions. Many more need to be added, but the user can specify their own extensions on this tab.
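
Assigning a type from a file extension amounts to a table lookup, sketched here in Python. Only the three 'Image' extensions come from the text above. The function name data_type_for, the 'Unknown' fallback, the case-insensitive matching, and the user-added '.heic' entry are assumptions made for illustration.

```python
import os

# The three extensions named in the text; the real Formats tab lists many more.
EXTENSION_TYPES = {".jpg": "Image", ".bmp": "Image", ".gif": "Image"}


def data_type_for(filename, table=EXTENSION_TYPES, default="Unknown"):
    """Look up a data type by file extension (assumed case-insensitive)."""
    ext = os.path.splitext(filename)[1].lower()
    return table.get(ext, default)


# A user-defined extension is just another table entry (hypothetical example).
custom = dict(EXTENSION_TYPES)
custom[".heic"] = "Image"

print(data_type_for("beach.JPG"))            # Image
print(data_type_for("beach.heic"))           # Unknown
print(data_type_for("beach.heic", custom))   # Image
```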


Indexes

Didgets has the ability to create indexes based on the contents of files. Select a set of files on the query tab and choose the 'Create Index' menu item. This will parse through each file and create an index of all the words found. There is an indexes tab that will let the user find all occurrences of each word within an index.
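
A word index of this kind is commonly built as an inverted index, which can be sketched in Python as follows. This is a generic illustration of the technique, not the Didgets implementation. The function name build_index is hypothetical, and the choices to lowercase words and to record character offsets are assumptions.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Build a simple inverted index.

    documents: dict mapping a Didget ID to the text of its data stream.
    Returns a dict mapping each lowercased word to a list of
    (didget_id, character_offset) occurrences.
    """
    index = defaultdict(list)
    for didget_id, text in documents.items():
        for match in re.finditer(r"[A-Za-z0-9']+", text):
            index[match.group().lower()].append((didget_id, match.start()))
    return index


# Made-up sample documents.
docs = {1: "Aloha from Hawaii", 2: "Hawaii photos, more Hawaii"}
index = build_index(docs)
print(index["hawaii"])   # [(1, 11), (2, 0), (2, 20)]
```

Looking up a word then returns every occurrence across all indexed files without rescanning their contents.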

The functionality implemented so far is fairly limited, but there are lots of plans to enhance this feature.


Sets

There is a whole category of Didgets that contain the IDs of other Didgets. We call these Didgets 'Set Didgets'. They are used to create lists, folders, databases, photo albums, play lists, etc. There is a Sets tab in the browser which will display each of these Set Didgets and list their members.
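
A Set Didget can be pictured as a named collection of Didget IDs, as in this minimal Python sketch. It is a toy model of the concept, not the manager API. The class SetDidget, its methods, and the sample IDs are hypothetical, and the sketch assumes that one Didget may belong to several sets at once.

```python
class SetDidget:
    """Toy model: a Didget whose contents are the IDs of other Didgets."""

    def __init__(self, didget_id, name):
        self.didget_id = didget_id
        self.name = name
        self._members = set()

    def add(self, member_id):
        self._members.add(member_id)

    def remove(self, member_id):
        self._members.discard(member_id)

    def members(self):
        return sorted(self._members)


# Made-up IDs: a photo album and a play list, both holding only IDs.
album = SetDidget(9001, "Hawaii 2019")
for photo_id in (123, 124, 125):
    album.add(photo_id)

playlist = SetDidget(9002, "Road Trip")
playlist.add(300)
playlist.add(125)    # assumption: a Didget may be a member of several sets

album.remove(124)
print(album.members())      # [123, 125]
print(playlist.members())   # [125, 300]
```

Because a set stores only IDs, removing a member from a set in this model does not touch the member Didget itself.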