Summary: “Entity” and “Object” are the keywords of this article. The differences between these concepts are well known; the catchword for them is “impedance mismatch”. This text, however, highlights their basic affinity and shows how this point of view may contribute to conceptual clarity and a reduction of modelling costs in IT projects.
“An entity is an object that doesn’t know how to behave.” This is just one answer you may get when you ask about the connection between objects and entities. You may get other answers, a smirk for instance. Between object and entity there seems to be some sort of divide: “Entity” smells stale whereas “object” indicates modernity. Most current ideas and discussions in the domain of information technology use an object-oriented approach and a vocabulary which differs in many respects from the mindset associated with the notion of “entity”:
[Table concerning “impedance mismatch”]
The table names some of the differences, while Charles Bachman’s one-liner at the beginning stresses what both concepts have in common. The latter is what this text tries to explore. (Bachman is known for one of the first modelling tools.)
A thought experiment: You are to build an application within a domain of your choice. Sketch a class diagram for that application, and then an entity-relationship-diagram. Or the other way round – start with your favourite approach. Then compare your classes with your entities.
My result: There is considerable agreement in an essential part of the diagrams. There are “boxes and lines”, and there are “numbers” along the lines. Some of the object-oriented boxes carry the same headlines as their entity colleagues. In those places the two networks of lines fit together very well. So does much of the content of the boxes: the data elements. The classes, however, contain more entries than the entities: The latter lack the “methods” or operations and so are unable to express behaviour. That is what Charles Bachman’s one-liner says.
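The comparison can be made concrete with a minimal sketch. The customer example and all names in it (`CustomerEntity`, `Customer`, `can_order`) are assumptions chosen for illustration, not taken from a real project. Both boxes carry the same data elements; only the class adds an operation.

```python
from dataclasses import asdict, dataclass


@dataclass
class CustomerEntity:
    """Entity view (hypothetical example): data elements only, no behaviour."""
    customer_no: int
    name: str
    credit_limit: float


@dataclass
class Customer:
    """Object view: the same data elements plus one operation."""
    customer_no: int
    name: str
    credit_limit: float

    def can_order(self, amount: float) -> bool:
        # Behaviour that the entity box has no place for.
        return amount <= self.credit_limit


entity = CustomerEntity(4711, "Ada", 500.0)
obj = Customer(4711, "Ada", 500.0)

# The data content of both boxes is identical.
same_data = asdict(entity) == asdict(obj)
```

Here `same_data` is true, while only `obj` can answer a question such as `obj.can_order(300.0)`.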
Now let us have a look at those parts of the diagrams where there is no agreement.
Object but no entity. Consider the objects without corresponding entities. These are meant for processing data or interaction with the user. In the object oriented approach the window on the computer screen is an object, too. Entities, however, usually end up in the database and that is why such objects do not show up in the entity diagram. So we can say that the class diagram covers more ground than the entity-relationship diagram does.
Entity but no object: Conversely, you will find entities, but no corresponding object – at least at first glance. Looking closer you will find the missing data as parts of other objects. The object oriented approach allows nested data, whereas data in entities are flat. Nesting of data, however, comes at a price. We will return to this topic.
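A small sketch may illustrate the difference between nested and flat data. The order example and the names `Order`, `OrderLine` and `flatten` are hypothetical. The nested object holds its lines inside itself; the flat form needs a second entity whose rows repeat the order number.

```python
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    """Nested form: the order lines live inside the order object."""
    order_no: int
    customer: str
    lines: list[OrderLine] = field(default_factory=list)


def flatten(order: Order) -> tuple[dict, list[dict]]:
    """Split one nested object into flat rows for two entities."""
    order_row = {"order_no": order.order_no, "customer": order.customer}
    line_rows = [
        {
            "order_no": order.order_no,  # reference back to the order entity
            "line_no": line_no,
            "product": line.product,
            "quantity": line.quantity,
        }
        for line_no, line in enumerate(order.lines, start=1)
    ]
    return order_row, line_rows


order = Order(1, "Ada", [OrderLine("pen", 2), OrderLine("ink", 1)])
order_row, line_rows = flatten(order)
```

In the class diagram of this sketch the order line may not even appear as a box of its own, whereas the entity-relationship diagram needs it.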
Objects are generalized entities: That is what we can conclude from what has been said so far. Small wonder: Entities were invented years earlier than objects. But the concept of “key” has been lost on the way: An object is an entity looking for its key …
Keys help clarify things: A key uses subject matter information to define what constitutes a duplicate and what does not. Whoever has engaged in defining keys with the future users of an application will be glad to have a coherent way of writing down the results. He will surely want to look them up afterwards. And he will be grateful for the effort to have taken place before much code has been written – especially if he is responsible for the budget.
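Objects, by contrast, are simply numbered serially: each one gets a system-generated identity, so nobody is forced to decide what constitutes a duplicate. The discussions described above will therefore probably pop up later, perhaps too late.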
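The effect of a key on duplicates can be sketched as follows. The person example is hypothetical, and the choice of name plus birth date as key is only an assumption for illustration; in a real project this is exactly the decision to be settled with the users.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class PersonObject:
    """No key: instances are compared by identity, so each one is distinct."""
    name: str
    birth_date: str


@dataclass(frozen=True)
class PersonEntity:
    """Assumed key = (name, birth_date); phone is excluded from comparison."""
    name: str
    birth_date: str
    phone: str = field(default="", compare=False)


objects = {
    PersonObject("Ada", "1815-12-10"),
    PersonObject("Ada", "1815-12-10"),
}
entities = {
    PersonEntity("Ada", "1815-12-10", phone="111"),
    PersonEntity("Ada", "1815-12-10", phone="222"),
}

object_count = len(objects)   # both instances survive
entity_count = len(entities)  # the key identifies them as one person
```

Without a key, the set keeps two persons; with the key, the second record is recognized as a duplicate of the first.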
Later in this text we will come back to this topic from a different angle. For the time being, we will look at practical reasons to wish for continuity between object and entity.
An extreme example of “object equals entity”: Some time ago I had the opportunity to attend a presentation of a tool that generates code. You specify a graphical model of your application and plug it into the generator, which presents you with a working Java application offering a restricted set of operations (create, retrieve, update, delete). The participants of the presentation were asked to vote for new features. The winner: a database reverser. Its task is to extract the data structure from a live database and to convert it into part of the input for the generator. Not much more is needed for the generator to produce its Java application. You can use the generated code as a basis and fill in the missing operations manually.
Whoever asks for such a database reverser must concede that for practical purposes it is quite irrelevant whether the box in the diagram is called entity or object.
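What such a reverser does can be hinted at with a minimal sketch. It is not the tool from the presentation; it assumes an SQLite database, uses Python's standard `sqlite3` module, and the function name `reverse_schema` and the output format are invented for illustration.

```python
import sqlite3


def reverse_schema(conn: sqlite3.Connection) -> dict[str, list[dict]]:
    """Read table and column definitions from a live SQLite database."""
    model = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        model[table] = [
            {"name": name, "type": col_type, "key": pk > 0}
            for _cid, name, col_type, _notnull, _default, pk in columns
        ]
    return model


# Usage with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT)")
model = reverse_schema(conn)
```

The resulting `model` lists boxes with their data elements and keys. Nothing in it says whether a box is to be called an entity or a class.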
Data modelling with object oriented tools: I work with programs that transfer models between modelling tools. Most of the class models on my desk are in fact data models conceived in an object oriented modelling tool. Why is that?
Object-oriented tools are routinely used during conceptual work because, apart from organizing your data, you want to think through your processes. That done, the database gets its due, a task for which object-oriented tools are poorly equipped. That is why, after a while, projects want to transfer their classes into a database design tool. The result of the transfer is useful as a basis for database design.
At this point the mental divide between object and entity gets in the way.
UML has no concept of “key” (among others). The standard for object-oriented modelling is called UML, the Unified Modeling Language. It was preceded by a host of object-oriented methods and wants to be their generalization. That accounts for the letter “U” in its name. Entities and related concepts have been treated somewhat harshly by UML: they simply don’t exist.
Additional expense because of constructs missing from UML. If a UML-tool is used for data modelling, existing UML-constructs have to be used in ways for which they were not intended. There is no “official” way to do this and so each class model that needs to be transferred into a database design tool looks different. This is obvious at the level of subject matter – but it is worse than that: The syntax is different. That is why more than just pushing the button is needed to transfer the model. The peculiarities of the model must be identified and a script must be written to neutralize them. Only then can the button be pushed.
Extending UML. There is some consolation: UML wisely refrains from completeness (see above) and opts for extensibility as the superior concept. If constructs for a special purpose are missing, e.g. a key, then UML can be extended. Essentially, the interested persons, projects or organizations themselves create the missing descriptive elements and bundle them to form a so-called “UML profile”. After loading it into a modern UML tool they can use the formerly missing constructs, e.g. the key of an entity. In organizations the profile is distributed together with the tool, which means that the user need not be aware of working with a profile instead of native UML.
Problem solved? Not quite. Each special purpose and each organization may create its own profile, and does so. A project has to deal with various purposes, and various organizations are engaged in it. Each occasion builds its own Tower of Babel.
Database invisible? I can hear your objection. Who needs a database design tool to design his database? No need to bother with that. We have the data structures within the classes and we have tools to build the database on that basis.
It is in fact quite feasible to build a database application without ever laying hands on the database. This is an efficient way to build a single application. But in big organizations, you will very rarely find isolated applications.
If you have more than one application running on the same database, things get a bit complicated. Nesting of data in objects is done to speed up operations, so different applications will need different ways of nesting. Having only one database, you will want a design that neutralizes the different data structures. That is what traditional database design does.
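The following sketch shows the idea under simple assumptions: a single flat set of rows stands in for the shared, normalized database, and two hypothetical applications nest the same rows in different ways. The data and the helper `nest` are invented for illustration.

```python
from collections import defaultdict

# Flat rows as a shared, normalized database might hold them (made-up data).
purchases = [
    {"customer": "Ada", "product": "pen"},
    {"customer": "Ada", "product": "ink"},
    {"customer": "Bob", "product": "pen"},
]


def nest(rows: list[dict], outer: str, inner: str) -> dict[str, list[str]]:
    """Group flat rows into a nested structure keyed by the outer column."""
    nested = defaultdict(list)
    for row in rows:
        nested[row[outer]].append(row[inner])
    return dict(nested)


# Application 1 (e.g. billing) wants products nested under each customer.
by_customer = nest(purchases, "customer", "product")

# Application 2 (e.g. stock planning) wants customers nested under each product.
by_product = nest(purchases, "product", "customer")
```

Neither nesting is stored; both are derived from the same flat rows, which is why the neutral design can serve both applications.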
To sum up: Although continuity between entity and object is something natural and to be wished for, in practical situations there are problems. The reason? No idea. I consider it just an error that has not been corrected in time – just like the positions of letters on your standard keyboard.
Practical suggestions:
- When modelling persistent objects, try to find a key and check whether your users agree with your decision.
- If more than one OO application is to run on the same database, make sure that the database is properly normalized.
- If you are modelling with a UML tool and wish to design your own database, for instance in a multi-project situation, try to find a UML profile for data modelling that is already established in your organization and use it.
- Nudge projects working on “your” database in the direction of using the same modelling method as you do and be prepared to make compromises.
JJG