Part I - Data - Timo Heiten

Abstractions and Data

The road to becoming a full-blown programmer is not easy, but it is not impossible either. If I could do it, probably everyone can.
In this series of posts I will get the most common terms out of the way. I am going to cover a lot of ground, though not every inch of it. By the end of these posts you will know the most important concepts, so that you can focus on learning the tools one after another to reach your goal.

Each part focuses on a single aspect of programming. That is also how you should structure a program: each post, like each part of a program, is concerned with one specific topic.

First things first

Solving a real-world problem with software means using abstraction to model the problem. You then apply this model in the form of data and information, structured and grouped in a way that solves the given problem.

So what exactly is abstraction? Abstraction means finding commonalities and rules when you look at a given problem space. One of the most cited examples is a car. You know how to drive a car, what it generally does and so on. Yet you probably do not know all the nitty-gritty details of its implementation. You try to model your problem space in much the same way.

Other examples from the world of computing: creating software to change the output of the machine is an abstraction over changing the hardware directly (as with the first computers). Writing in higher-level programming languages (like C#, Python, Java etc.) is an abstraction over writing in C, assembly and other languages that deal with machine-level features such as allocating memory directly. Each abstraction layer tends to slow execution of the software down, yet it makes the software much easier to handle and increases development speed.

The same is true for your application. With each layer of abstraction you add to model your problem space, execution tends to get a bit slower, yet your development speed goes up. It also lets you concern yourself only with the logic you need to solve the problem at hand.

So abstraction is the way to go when developing a program and a solution. You do not want all the different things your application does scattered all around. This is because code is usually read many times more often than it is written. And believe me, you want to understand what the code does two weeks from now. Abstraction helps you with that.

Also, most problems you’ll need to solve are complicated, and breaking them down into working software can become quite complex. All this complexity introduces dependencies between the parts that make up your solution.

So at its core, programming is solving a complicated problem by applying abstraction to manage the inherent complexity of the program, where complexity means the dependencies between the parts you created for your application.

It is all about Data

What do I mean by that? All the objects you create (in an OO language) have a single purpose: they represent information by storing data. This data is used to solve your task at hand. Objects also have behavior, yet that is there to act on the data they represent.

To achieve this you have several data types that can be used to store different kinds of information. There is also the concept of data structures, which tell you how to arrange data in a given construct that helps to organize the information.

Data Types

Virtually every programming language has a similar set of basic or primitive types. In OO languages you can also create your own types, which we will look into in the OO section.

The first primitive types are strings and characters, which represent human-readable text. The string is probably the single most important data type in any language because it is so versatile and flexible. Also, when you store your data to disk, a database or any other storage facility, the data is often stored as a string so that it is accessible to humans.

The next basic data types are numbers. There are integers and floating point numbers. Integers are whole numbers, and floating point numbers come in different flavors (float, double, decimal etc.), which differ mostly in their precision and byte size.

The boolean data type is either true or false and is used in conditions to control the flow of the program.
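To make the primitive types a bit more tangible, here is a small sketch in Python (my choice for the examples, the post itself does not prescribe a language). All variable names are made up for illustration. It also shows why precision matters for floating point numbers: a binary float cannot store 0.1 exactly, while Python's Decimal type can.

```python
from decimal import Decimal

# Text: a string is a sequence of characters.
name = "Timo"
first_char = name[0]

# Whole numbers.
bottles = 24

# Binary floating point cannot represent 0.1 exactly, so small errors appear.
float_sum = 0.1 + 0.2

# Decimal keeps base-10 digits exactly, at the cost of speed and memory.
decimal_sum = Decimal("0.1") + Decimal("0.2")

# A boolean is either True or False and steers the control flow.
crate_is_full = bottles == 24
if crate_is_full:
    status = "full"
else:
    status = "not full"

print(first_char, float_sum == 0.3, decimal_sum == Decimal("0.3"), status)
# prints: T False True full
```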

Data Structures

To organize data, there are some common structures that come up very often. I will give you an overview here, and I am going to go into more detail in future posts, where I explain how they work within the .NET Framework with examples in code.

The first is an Array. An Array is used to store items of another data type in a predefined size. Think of a beer crate: that is an Array of size 24 (at least in Germany) with the data type beer bottle. This example is not perfect, because you could put other things in that crate, yet you get the point.
The data in the array can be accessed with an index. The index is a number between 0 and the length of the array minus one. So for the beer crate it is 0 to 23. If you access the first beer, it has the index 0. The last one in the crate has the index 23.
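The beer crate can be sketched like this. One assumption to be aware of: Python has no fixed-size array built in, so I am using a plain list with 24 slots and simply never resizing it. The names are made up for illustration.

```python
CRATE_SIZE = 24

# A fixed-size "array" of 24 slots, all holding a bottle.
crate = ["beer bottle"] * CRATE_SIZE

first = crate[0]                 # first bottle: index 0
last = crate[CRATE_SIZE - 1]     # last bottle: index 23 (length minus one)

# Index 24 is outside the crate, so Python raises an IndexError.
try:
    crate[CRATE_SIZE]
    out_of_range = False
except IndexError:
    out_of_range = True
```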

The next structure is a List. A List is similar to an array, yet you do not need to predefine the size: it grows as you add items and shrinks as you remove them. Most List implementations in programming languages use an array internally and grow automatically by allocating a larger array and copying the existing values into it.
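The growing-by-copying idea could look roughly like the sketch below. GrowingList is a hypothetical class written only for this illustration, and doubling the capacity is my assumption; real list implementations choose their own growth strategy. Removal and shrinking are left out to keep it short.

```python
class GrowingList:
    """Hypothetical list that stores its items in a fixed-size backing array."""

    def __init__(self, capacity=4):
        self._items = [None] * max(1, capacity)   # the backing array
        self._count = 0                           # how many slots are in use

    def add(self, item):
        if self._count == len(self._items):
            # Backing array is full: allocate a bigger one and copy everything.
            bigger = [None] * (len(self._items) * 2)
            for i in range(self._count):
                bigger[i] = self._items[i]
            self._items = bigger
        self._items[self._count] = item
        self._count += 1

    def capacity(self):
        return len(self._items)

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if not 0 <= index < self._count:
            raise IndexError("index out of range")
        return self._items[index]


numbers = GrowingList(capacity=2)
for n in (10, 20, 30):
    numbers.add(n)   # the third add triggers the copy into a larger array
```

After the three adds, the list holds three items while the backing array has grown from two slots to four.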

There is also another kind of List, where each item holds a link to the element before and after it. That is why those are called linked lists. You can imagine a linked list as a polonaise: each person has their hands on the shoulders of the person in front of them, and has the hands of the person behind them on their own shoulders.
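Here is a minimal sketch of the polonaise as a linked list with links in both directions. Dancer, join_behind and names_from are names I made up for this example; they are not part of any library.

```python
class Dancer:
    """One person in the polonaise: a node in a doubly linked list."""

    def __init__(self, name):
        self.name = name
        self.front = None   # the person in front of this one
        self.behind = None  # the person behind this one


def join_behind(front_dancer, new_dancer):
    """Link new_dancer directly behind front_dancer."""
    old_behind = front_dancer.behind
    front_dancer.behind = new_dancer
    new_dancer.front = front_dancer
    new_dancer.behind = old_behind
    if old_behind is not None:
        old_behind.front = new_dancer


def names_from(leader):
    """Walk the line from the leader to the last person."""
    names = []
    current = leader
    while current is not None:
        names.append(current.name)
        current = current.behind
    return names


anna = Dancer("Anna")
ben = Dancer("Ben")
cara = Dancer("Cara")
join_behind(anna, cara)   # Anna -> Cara
join_behind(anna, ben)    # Ben steps in between: Anna -> Ben -> Cara
```

Note that letting Ben step into the middle only changes a few links. Nobody else in the line has to move, which is the main difference from an array.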

The next structures are queues and stacks. A stack is a LIFO structure, which stands for last in, first out. A queue is a FIFO structure: first in, first out. A stack can be imagined like the notes and forms that keep piling up on your desk. You would work through them from top to bottom, one after another (I know that you are not constrained by paper and could change that order, but you know, it is programming, and that is about abstraction 😉 ). A queue is exactly what you imagine. Like the queue at your favorite grocery store: the first person in the queue is served first, then the next one and so on.
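In Python, a plain list works as a stack and collections.deque works as a queue. The pile of forms and the checkout line below are just illustrative data.

```python
from collections import deque

# Stack (LIFO): the last sheet put on the pile is the first one taken off.
pile = []
pile.append("form A")
pile.append("form B")
pile.append("form C")
top_sheet = pile.pop()          # "form C"

# Queue (FIFO): the first customer in line is the first one served.
checkout = deque()
checkout.append("customer 1")
checkout.append("customer 2")
checkout.append("customer 3")
served = checkout.popleft()     # "customer 1"
```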

Another basic data structure is the hash map. Here you associate two pieces of data with one another. The first is called the key, the second the value. You use the key to look up the value. You can imagine this process like a dictionary, where you look up a word that you want the meaning of. And Dictionary is aptly what it is called in .NET.

A Dictionary uses the hash calculated from the keys that are stored inside of it. A hash is a number that is generated by a given algorithm, and in .NET you can write functions for your user-defined types that do exactly this.

This key lookup procedure is very fast, and therefore a dictionary performs extremely well for lookups. The downside is that it is unordered. When you walk through (loop, iterate, see below) a dictionary, you are not guaranteed to get the items in the order you inserted them. Dictionaries are very useful for lookup of data, cache structures, and so on.
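A small sketch of key lookup with a user-defined key type, using Python's dict as the hash map. Isbn is a hypothetical class and the book data is invented. One caveat: Python's built-in dict happens to keep insertion order (guaranteed since Python 3.7), so it does not demonstrate the ordering point; it is safer to treat the order of a hash map as unspecified in general.

```python
class Isbn:
    """Hypothetical user-defined key type with its own hash function."""

    def __init__(self, digits):
        self.digits = digits

    def __eq__(self, other):
        return isinstance(other, Isbn) and self.digits == other.digits

    def __hash__(self):
        # Equal keys must produce equal hashes.
        return hash(self.digits)


titles = {}
titles[Isbn("111")] = "Learning to Program"
titles[Isbn("222")] = "Data Structures"

# A new but equal key object finds the same entry via its hash.
found = titles[Isbn("222")]

# Looking up a missing key; get() returns a default instead of raising.
missing = titles.get(Isbn("999"), "unknown")
```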

Another data structure that uses hashes is a Set. As in mathematics, a Set is a collection of distinct objects. Sets also support set operations like union, intersection and complement. A Set is similar to a Dictionary in the sense that it uses hashes to store its values and that iteration is unordered.
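The set operations can be tried out directly with Python's built-in set type. The names in the sets are invented for the example.

```python
lager_fans = {"Anna", "Ben", "Cara"}
stout_fans = {"Cara", "Dave"}

# Duplicates are dropped: a set holds each value only once.
unique = set(["Anna", "Anna", "Ben"])

both = lager_fans | stout_fans            # union
overlap = lager_fans & stout_fans         # intersection
only_lager = lager_fans - stout_fans      # relative complement (difference)
```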

A Tree is a structure where each element is associated with the elements that are connected below it. Each element is called a node. Nodes are connected via edges. A tree has a root node.
There are different tree structures; for example, a binary tree has nodes that have at most two children. An example of a tree-like structure is HTML/XML. A Graph is a data structure that is based on the mathematical graph construct (see Wikipedia).
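A binary tree can be sketched with a small node class. Node, count_nodes and height are hypothetical names for this illustration, and the convention that an empty tree has height -1 is my choice.

```python
class Node:
    """A node with at most two children, as in a binary tree."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def count_nodes(node):
    """Count this node plus everything below it."""
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def height(node):
    """Number of edges on the longest path from node down to a leaf."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


# The root has two children; the left child has one child of its own.
root = Node("root", Node("left child", Node("leaf")), Node("right child"))
```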

Data structures can be used to associate, group and retrieve data in meaningful ways. They are mostly collections of one sort or another. To make the best use of collections you have several options to access data:

First, you can access the data directly when you know where it is located (for example, you know the index or the key). This is not possible with every collection (e.g. a Stack only lets you get the topmost item).
Another way is to check whether the item is contained in the given collection. This can be implemented in the data structure itself, or you can iterate through the collection. Iterating over a collection simply means you pick one item after the other and inspect the state of the given item or perform some logic on it.
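The three ways of getting at data look like this in Python. The crate contents and prices are invented sample data.

```python
crate = ["pils", "weizen", "helles"]
prices = {"pils": 1.10, "weizen": 1.30}

# 1. Direct access by index or by key.
second = crate[1]
pils_price = prices["pils"]

# 2. Membership check provided by the collection itself.
has_helles = "helles" in crate
has_stout = "stout" in prices

# 3. Iteration: visit one item after the other and apply some logic.
long_names = []
for name in crate:
    if len(name) > 4:
        long_names.append(name)
```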

How do I store my data?

When you write a program and execute it, your data and information are neatly grouped and aggregated, and the program does everything you intended it to do. But what happens when you shut down your program? All the data held in the form of objects in RAM is gone. So you need to store it somehow for future use.

Serialization

The concept of converting the data in your application into storable bits and bytes is called serialization. The reverse process is called deserialization.
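A round trip through serialization and deserialization, sketched with Python's json module. JSON is just one possible format, and the crate data is made up.

```python
import json

crate = {"brand": "Example Pils", "bottles": 24, "returnable": True}

# Serialize: turn the in-memory object into text that can be stored.
text = json.dumps(crate)

# Encode the text into raw bytes, as they would be written to disk.
raw = text.encode("utf-8")

# Deserialize: rebuild an equal object from the stored bytes.
restored = json.loads(raw.decode("utf-8"))
```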

The most common ways to store serialized data are the following:

Databases
Files (text or other format)

Databases are often the first bet, and there are lots of different flavors, of which relational databases are the most commonly used. Those are the ones meant when you hear people talking about SQL. The relations are the tables that are used to store the data in columns and rows. I will talk about relational databases and the mismatch that they pose to objects in general and to modern computing aspects in particular.

There are also other databases, like object databases, document databases, graph databases and many more. I will look into those in my upcoming section of this blog about data storage mechanisms.

You can also use files to store your data, and in a lot of simpler programs that makes a lot of sense. To save your data in a file, you serialize your objects into the desired format. Examples of commonly used text file formats are XML, JSON and CSV.
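Saving to a text file and loading it again could look like the following sketch, here with CSV and Python's csv module. The file name crates.csv and the rows are invented, and I write into a temporary folder so nothing is left behind. Note that everything comes back as strings, because a CSV file stores plain text.

```python
import csv
import os
import tempfile

rows = [
    {"name": "pils", "bottles": "24"},
    {"name": "weizen", "bottles": "20"},
]

# Write to a temporary location so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "crates.csv")

    # Serialize the rows into a human-readable text file.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "bottles"])
        writer.writeheader()
        writer.writerows(rows)

    # The raw content is what a text editor would show.
    with open(path, newline="", encoding="utf-8") as handle:
        raw_text = handle.read()

    # Deserialize the file back into a list of dictionaries.
    with open(path, newline="", encoding="utf-8") as handle:
        loaded = [dict(row) for row in csv.DictReader(handle)]
```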

All those formats are human readable with a simple text editor. But you might have opened a file with a text editor and found it full of weird symbols. That was a format that cannot be interpreted as human-readable text (or at least not with a standard text editor), for instance JPG and other image file formats, or some proprietary formats.

Similarly, you can store your data in a schema that you define yourself. This can make total sense for your application.

Storage of data is exactly like anything else in programming: use the tool that works best for the job.

Where do we go from here?

Part II: Basics of programming languages
Part III: How is a program created? Build and deploy
Part IV: OS and other helpling
Part V: How to improve as a Coder.


Published by theiten on 04/08/2018