It’s Not Your Code, It’s Your Data…

Ricardo Griffith
6 min read · Dec 6, 2020
Image shows listing of comma-delimited values.
Image from Unsplash

Opinions expressed are solely my own and do not express the views or opinions of my employer.

Imagine you just launched a new website along with an equally awesome mobile app. You stand in awe of all its glory, showered in positive feedback and posts. The journey was difficult: all the work developers put into it, the meetings, testing, re-testing, documentation, design tweaks, and all the other work put in by your dedicated staff (or yourself). Undeniably, you and your team made a remarkable product and service. You think, life is amazing!

Then, without warning, your sea of calm and amazing is rocked by waves of disdain and complaints! Social media is a powerful force, giving anyone wide influence over an enormous population of connected users. In other words, ALL news travels faster these days than it did a few years ago! Unfortunately, bad news seems to travel even faster than good. Unluckily for you, society typically obsesses over the less-than-stellar and spreads it like nobody’s business!

You start asking yourself questions like, “How did I get here?” Perfect programming does not mean perfect data, and data is often overlooked as a source of potential disaster. Data plays a vital role in the success of any app or service. This article highlights some common data issues developers may encounter.

Missing Data

Image from Unsplash

My country missing from online forms inspired this article! You may be surprised to learn that even well-known social platforms, services, and mobile apps can fall victim to this issue. One of my biggest frustrations is attempting to sign up for a service, only to be stopped by missing data. I live on a small (but well-known) island (not going to say which; hint: think three-sided shapes). Sites touting that they are worldwide most often do not list my country in their country dropdown. I’m not talking about purposely excluding the country; my country is simply overlooked. How do I know? I contact the service to tell them about my missing country, and most add it to their list shortly after!

There is a very simple and effective fix: obtain your country list from a standards-based source! (Duh!) Where, you ask? Well, there is a little organization you might have heard of: the International Organization for Standardization (ISO), which has developed over 23,000 International Standards, all included in its ISO Standards catalogue. What do you know, a quick search on their site reveals a standard listing of all the countries in the world: https://www.iso.org/publication/PUB500001.html.
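One way to catch a missing country before your users do is to compare the dropdown against a reference list. The sketch below assumes the reference is keyed by ISO 3166-1 alpha-2 codes; the seven entries are only a tiny excerpt with short display names, and `find_missing_countries` and the sample dropdown are hypothetical names made up for illustration. A real check would load the full list from a maintained ISO 3166 source.

```python
# Tiny excerpt of ISO 3166-1 alpha-2 codes with short display names.
# Illustration only: a real check would load the complete, current list.
REFERENCE_COUNTRIES = {
    "BR": "Brazil",
    "CA": "Canada",
    "GB": "United Kingdom",
    "IN": "India",
    "JM": "Jamaica",
    "TT": "Trinidad and Tobago",
    "US": "United States",
}


def find_missing_countries(dropdown_codes, reference=REFERENCE_COUNTRIES):
    """Return names of reference countries absent from the dropdown, sorted."""
    # Normalize case and whitespace so "us" and " US " both count as "US".
    present = {code.strip().upper() for code in dropdown_codes}
    missing = [name for code, name in reference.items() if code not in present]
    return sorted(missing)


# Hypothetical dropdown contents pulled from a sign-up form.
dropdown = ["us", "CA", "GB", "IN", "BR"]
missing = find_missing_countries(dropdown)
print(missing)
```

Running a check like this in a test suite or a scheduled job turns "a user emailed us" into "the build told us".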

If you live in any country other than the United States of America, you can probably relate to the joys of filling out a simple address when registering for whatever. You are sometimes impressed with the site’s ability to recognize your country (too soon?) and even further amazed by the dynamic form controls when you select your country. At times, I’m amazed that they have auto-complete or a dropdown list (with the correct values). Then all that amazement is completely erased when you click that submit button and discover that you can’t continue because you need a state. But wait, your country doesn’t have states?!

You might at this point be thinking this is a programming issue, and you might be right. But here’s the debate: the underlying reason the user cannot submit the form is missing country data. Further, if the programmer had the data to switch to, the feature would have been included in the application (if a tree falls in the forest, does it make a sound?). Ultimately, many countries do not divide themselves into states: Canada has provinces and territories, and England has counties, for example. If users in those countries are unable to sign up for your service, the problem becomes a barrier to potential profit and exposes your product and/or service to a weakened reputation and maybe even a weakened brand.
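To make the "it's the data" point concrete, here is a minimal sketch of address validation driven by per-country rules rather than a hard-coded "state is required" check. The `ADDRESS_RULES` table, its field names, and `validate_address` are all hypothetical, and the three entries are illustrative rather than an authoritative address standard.

```python
# Hypothetical per-country rules: what the subdivision field is called
# and whether the form should insist on it. Illustrative entries only.
ADDRESS_RULES = {
    "US": {"label": "State", "required": True},
    "CA": {"label": "Province", "required": True},
    "SG": {"label": None, "required": False},  # city-state, no subdivision
}

# Fallback for countries we have no data for: never block the user.
DEFAULT_RULE = {"label": "State / Province / Region", "required": False}


def validate_address(country_code, subdivision):
    """Return a list of problems; an empty list means the address passes."""
    rule = ADDRESS_RULES.get(country_code.strip().upper(), DEFAULT_RULE)
    if rule["required"] and not (subdivision or "").strip():
        return [f"{rule['label']} is required"]
    return []


print(validate_address("US", ""))         # US without a state is rejected
print(validate_address("SG", None))       # no subdivision needed
print(validate_address("ca", "Ontario"))  # province supplied
```

The design choice worth noticing is the default: when the rules table has no entry for a country, the form stays permissive instead of demanding a state that may not exist.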

Configuration Data

Image from Unsplash

Data is the least thought-about and least tested part of an application. The next type of data is commonly overlooked when testing an application but is vital to its operation: configuration data. While this data is easier to correct if misconfigured, it is sometimes not obvious that the wrong data in the configuration is the reason for a site or app’s troubles.

Over the years, some fairly large organizations have suffered from misconfigurations. For instance, in November of 2017, the Australian Broadcasting Corporation had a security misconfiguration that resulted in the leakage of hashed passwords, keys, and internal resources (Sukianto 2020). The same year, Accenture experienced a similar misconfiguration that exposed authentication information, including certificates, keys, and plaintext passwords, as well as sensitive customer information (Sukianto 2020). What these incidents have in common is their reliance on Amazon’s S3 data storage technology. Keep in mind that this type of misconfiguration is the most dangerous, allowing attackers to obtain sensitive data or cause other havoc for site or app owners.

Given common software development patterns and the flexibility they offer, configurations are the best way to avoid an application reentering the development phase just to correct a value. Extra care is recommended for developers using configuration values. Using a reviewer or reviewers could reduce the likelihood of a mistake, but then we are only human…
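Since reviewers are only human, an automated check at startup can act as a second pair of eyes. The sketch below is one possible approach, not a prescribed one: the configuration keys (`api_base_url`, `request_timeout_seconds`, `bucket_public_read`) and the `check_config` function are hypothetical names invented for this example.

```python
# Hypothetical required settings and the type each one should have.
REQUIRED_SETTINGS = {
    "api_base_url": str,
    "request_timeout_seconds": int,
    "bucket_public_read": bool,
}


def check_config(config):
    """Return a list of problems found in a configuration dictionary."""
    problems = []
    for key, expected_type in REQUIRED_SETTINGS.items():
        if key not in config:
            problems.append(f"{key} is missing")
        elif not isinstance(config[key], expected_type):
            actual = type(config[key]).__name__
            problems.append(
                f"{key} should be {expected_type.__name__}, got {actual}"
            )
    # Flag a risky value even though its type is valid.
    if config.get("bucket_public_read") is True:
        problems.append("bucket_public_read is enabled; storage would be public")
    return problems


# A made-up configuration with two mistakes in it.
sample = {
    "api_base_url": "https://example.com",
    "request_timeout_seconds": "30",
    "bucket_public_read": True,
}
for problem in check_config(sample):
    print(problem)
```

Failing fast on a list like this at deploy time is far cheaper than discovering the wrong value through a production incident.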

Dirty Data

Image from Unsplash

If your application relies on data entered by humans, chances are you will encounter inaccurate, incomplete, or inconsistent data, a.k.a. dirty data. The problem snowballs as more data is entered and stored, making it more difficult for developers to troubleshoot production data issues. Dirty data could originate from a number of sources and cause all types of havoc for application owners and users.

Dirty data issues can be complex and difficult to solve. All offending data must be identified and its source determined in order to remediate the immediate issue. This could be particularly difficult when dealing with high-volume data applications. Even if all the data has been identified and the cause determined, the problem still might not be immediately fixable.
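As a rough illustration of "identify the offending data and its source", the sketch below scans records and counts the dirty ones per origin. Everything here is an assumption made for the example: the record fields (`email`, `country`, `source`), the two simplistic rules in `is_dirty`, and the sample records themselves.

```python
from collections import Counter


def is_dirty(record):
    """Flag records that look incomplete or inconsistent (illustrative rules)."""
    email = (record.get("email") or "").strip()
    country = (record.get("country") or "").strip()
    # Rule 1: an email needs an "@". Rule 2: country is a two-letter code.
    return "@" not in email or len(country) != 2


def dirty_counts_by_source(records):
    """Count dirty records for each originating source."""
    return Counter(r.get("source", "unknown") for r in records if is_dirty(r))


# Made-up records from three hypothetical entry points.
records = [
    {"email": "a@example.com", "country": "TT", "source": "web_form"},
    {"email": "no-at-sign", "country": "US", "source": "mobile_app"},
    {"email": "b@example.com", "country": "", "source": "mobile_app"},
    {"email": None, "country": "CA", "source": "csv_import"},
]
counts = dirty_counts_by_source(records)
print(counts.most_common())
```

Grouping by source does not fix anything by itself, but it tells you where to look first.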

One work-around can be deployed to help mitigate the issue. Consider the situation where end-users are entering the offending data; in this case, developers might take a crack at solving it. Validation may be deployed to prevent the dirty data from entering a system. Unfortunately, not all data can be validated.

Think about user comments: users enter some text, then submit the form. You might be able to validate the length and check for spelling and “bad” words. However, you might not catch an offending statement unless you train some sort of Artificial Intelligence (AI) to process comments, but that of course comes at a cost. Remediating dirty data could eat away at profit margins and increase expenses through the additional initiatives needed to handle data issues.
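The cheap checks mentioned above (length and a “bad” word list) might look something like the following sketch. The 500-character limit, the placeholder blocklist, and the `validate_comment` function are assumptions for illustration; as the paragraph notes, nothing here will catch an offensive statement made entirely of innocent words.

```python
import re

MAX_COMMENT_LENGTH = 500  # assumed limit, chosen for the example
BLOCKED_WORDS = {"badword", "spamword"}  # hypothetical placeholders


def validate_comment(text):
    """Return a list of problems; an empty list means the comment passes."""
    problems = []
    cleaned = text.strip()
    if not cleaned:
        problems.append("comment is empty")
    if len(cleaned) > MAX_COMMENT_LENGTH:
        problems.append(f"comment exceeds {MAX_COMMENT_LENGTH} characters")
    # Compare whole lowercase words against the blocklist.
    words = set(re.findall(r"[a-z']+", cleaned.lower()))
    blocked = sorted(words & BLOCKED_WORDS)
    if blocked:
        problems.append("comment contains blocked words: " + ", ".join(blocked))
    return problems


print(validate_comment("Great app!"))
print(validate_comment("This is a BadWord here"))
```

Matching whole words rather than substrings avoids rejecting innocent words that merely contain a blocked one, at the price of missing deliberate misspellings.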

Final Thoughts

Managing data is not a simple task. Small to medium businesses that don’t properly manage their data potentially impede their growth and profitability. Larger businesses, which usually have the manpower to manage their data, have similar difficulty for different reasons. Some of them operate in highly audited and regulated industries and are subject to fines for non-compliance with regulations. Developers in larger organizations often only wear their developer hat, meaning measures are often put in place to restrict access to production data. This restriction is intended to protect the business and its customers from the misuse of that data, but it also severely restricts checking or validating data. Managing large amounts of data can prove to be expensive, which could influence the cost of products and services.

References

Sukianto, Axel (June 19, 2020). Horangi Cyber Security. https://www.horangi.com/blog/real-life-examples-of-web-vulnerabilities


I am a passionate technology leader, entrepreneur, husband, and father who loves to help others through collaboration, writing, and mentoring.