I spent the better part of 40 years working with data. As a scientific researcher and then as a software developer, my work was all about acquiring, managing, analyzing, and reporting data. That experience taught me lots of lessons. I shared many of those lessons by teaching database design and development at the college level and by writing books on the topic.
Chances are that as a professional in the information age, you do a lot of work with data as well. Here’s the most important thing to know right off… this doesn’t come naturally. Collecting and storing data properly is deceptively difficult. I know because as a software consultant I was called in to fix innumerable systems that were failing catastrophically because they were grown organically by clever, smart people using spreadsheets with complete confidence they were doing a great job.
Even good data gets ruined if it is badly stored. This applies to small projects as much as large ones. Poorly stored data becomes the “garbage in” for your subsequent analyses and reports. Therefore I thought I would share a brief summary of some of the most important “do’s and don’ts” when I comes to collecting and storing your data. I won’t try to explain or justify all of these rules, but trust me, if you follow them your work will be far more efficient and effective:
- Always create a key field. This field should uniquely identify each record. I recommend you identify each record with a simple incrementing number, one that has no real-word meaning and that you never report to external users of the data.
- Don’t create derived fields. It is generally not good practice to store fields of data that are derived or calculated from other fields. It is better to compute these when needed to ensure the values are current.
- Make sure every column or field of data is atomic. That is, each field value should contain only one irreducible piece of information. You never want to put say, two phone numbers in one field. Store only one piece of information per field by creating separate columns if necessary.
- Think ahead carefully about how atomic a field needs to be. Will I need to break up phone numbers into separate parts? You have to consider how the data will likely be used and the best choice is still not always obvious. Do you ever need to know the street name in an address? If so, you might consider storing the house number and street in separate fields as this is extremely difficult to parse. Will you need the month from a date? You probably still don’t need to store that separately since it is extremely easy to extract months from dates when needed.
- Avoid series of columns. You don’t want Phone1, Phone2, and Phone3 fields. Better to have a separate “phone numbers” table with a record for each phone number along with the type of phone linked through a key field. A relational table like this is much more flexible, efficient, maintainable, and expandable.
- Don’t stick data where it doesn’t belong. When you don’t have a column, it is tempting to just jam some values into a column where they do not really belong. For example, you have a pregnancy field that does not apply to men. So why not stick prostate indicator into this column in records for males? This is a major no no. Every value in a column should describe only one attribute.
- Don’t append or prepend. Always avoid appending or prepending data values to add further information. For example, adding “fax” or “mobile” after phone numbers. Instead add a separate phone type column or better yet create a separate relational table to store these values.
- Don’t vary record types in a given table. You can have a column that indicates whether the record is a Parent or Child, but you should not then change the meaning of subsequent fields depending on whether the record belongs to a parent or a child. If parents and children require different information, create separate tables.
- Never use “special” values. Never put in a reserved value like “NONE” or “1/1/1999” “to indicate some special condition. This is a very bad practice that inevitably results in errors, hair-pulling, and tooth-grinding.
- Give your columns clear and meaningful names. Don’t use cryptic names. If the column contains first names, simply call the column “First Name.” This makes it completely unambiguous, reduces errors in analysis, and makes reporting clear, consistent, and easy
- Format your data consistently. For example, store SSN with hyphens or without as you prefer, but don’t mix these. With hyphens is preferable since it is best to store data so that it does not require formatting upon reporting.
- Use Null and Empty values correctly. -Assuming your data storage system supports these, use them correctly. Null means that value was never entered. Empty means it was entered, and it was explicitly entered as empty.
- Allow Null and Empty responses on your forms. Forcing users to put in an answer they don’t know yet just to save the form only opens the door to garbage data. If you force an answer too early, you may be forcing the user to make something up just to move on, and later there is no way to spot this as “fake” data. Better to wait until the last possible point prior to finalization before verifying that all data entry is complete.
- Don’t allow bad values to be entered. Provide dropdown or selection lists wherever possible rather than text data entry. Text (freeform) data values are usually garbage data values that are very problematic to search, analyze, and report.
- Don’t duplicate any information. No information should be repeated in rows. For example, you should have one parent table with customers and another child table with orders records for each customer. You should not have one “flat” table which repeats the same customer information on every order record for that customer. This is one key difference between simple “flat table” data storage and a normalized relational database.
- Break the rules when needed. The only thing worse than not following these rules is mindlessly following them. Break these rules only when it makes good sense to break them. You can’t make that assessment if you do not understand them, and the reasons for them, intimately.
- Think long-term. Don’t assume you’ll never need your data again or that you will be the only one to ever look at it. You may know how you violated these rules, but others might need to understand this data in the future and they will not. Following these rules will ensure that this valuable data you collected is not wasted just because you could not anticipate how they might be used in the future.
Some of these rules simply cannot be achieved using a single (denormalized) table like you typically create with Excel. They require multiple tables following a normalized design structure of parent and child tables.
If you don’t see any way you can avoid breaking these rules, then your data storage requirements probably exceed the limitations of a simple table. If that is the case, the fact that you cannot implement these rules should alert you that you need to consult a data management professional to produce an efficient relational database schema using SQL Server or some other professional database.
I’m flying to Vegas this weekend to catch “Pat Benatar and Neil Geraldo: A Very Intimate Acoustic Evening.” I have been a fan of theirs since the 70’s and am thrilled to get to see them once again. Unlike so many other older artists who simply run on autopilot, Pat and Neil have just gotten better and better with age, reinventing themselves in fresh ways while still remaining unapologetically true to their rock hard roots. No longer having megastar status has actually freed them to be truly great. Rather than being carefully manicured by the contrived glitz of big concert venues, they truly shine in intimate settings where they can stroll out on stage and warmly greet the audience wearing sweat pants and holding Styrofoam mugs of coffee.
It turns out that he or she is a Northern Flicker and they are well-known to north westerners because they are regionally infamous little drummers. Of course no one can say with certainty why they do this but we can speculate. They are peckers by nature. They peck out hollows for homes with their beaks, they peck to find food, and they peck produce a unique sound that attract mates or communicate with them. Hey, they have a very efficient and powerful little hammer, and when that is all you have…
But this should come as no surprise. After all, all of religion is nothing but selling lies. It
Of course job loss is not the only concern or even the major concern. The number of people forced to retrain and change jobs is not included in these projections. Certainly jobs always evolve, but periods of rapid and widespread job disruption are nevertheless painful and difficult. Even those who can transition through the major upheaval being caused by automation and globalization may find themselves forced into jobs they no longer love. More importantly those new jobs may pay far less than the old ones. This loss of standard of living must be mitigated through responsible corporate and governmental policies.
These articles invariably cite scientific studies and statistics to support their claims. But those claims frequently go far beyond study design or the conclusions made by the scientists involved.
In science fiction, this kind of magic is routinely accomplished by means of technology. Upgraded hair-dryers transfer the person’s consciousness into a computer or some external storage unit. There it is retained until it can be transferred back to the original host or into some new person or device. This science fiction mainstay goes back at least to the 1951 novel “Izzard and the Membrane” by Walter M. Miller Jr.