1. When we say "clean data" what do we mean exactly?
I think we can equate clean data with "quality data": it is accurate, and it contains no irrelevant, incomplete, or redundant information. It is also written or keyed in using the same format throughout.
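To make that definition concrete, here is a minimal sketch of how those three checks (incomplete, redundant, inconsistently formatted) might look in Python. The field names, the email pattern, and the `audit_records` helper are all assumptions made up for illustration, not a description of any particular team's tooling.

```python
import re

# Hypothetical required fields and email pattern, chosen only for illustration.
REQUIRED_FIELDS = ("name", "email", "phone")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def audit_records(records):
    """Return the indexes of records that are incomplete, redundant, or badly formatted."""
    issues = {"incomplete": [], "redundant": [], "bad_format": []}
    seen_emails = set()
    for index, record in enumerate(records):
        # Incomplete: a required field is missing or blank.
        if any(not str(record.get(field) or "").strip() for field in REQUIRED_FIELDS):
            issues["incomplete"].append(index)
            continue
        # Normalize before comparing so "ANA@example.com " matches "ana@example.com".
        email = str(record["email"]).strip().lower()
        # Redundant: the same (normalized) email was already seen.
        if email in seen_emails:
            issues["redundant"].append(index)
            continue
        seen_emails.add(email)
        # Bad format: the email does not look like name@domain.tld.
        if not EMAIL_PATTERN.match(email):
            issues["bad_format"].append(index)
    return issues
```

A real audit would likely cover more fields and rules, but the shape stays the same: define what "clean" means for each field, then flag every record that falls short.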
2. What is data harmonization and why is it important?
I would say it is the process of bringing together data in different file formats, from various relevant sources, and transforming it into one cohesive data set. It is important because it improves the quality of the data and reduces the time spent on data analysis. It's easier to get insights when data is harmonized.
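As a rough illustration, harmonizing two sources could look something like the sketch below, which reads one CSV source and one JSON source and maps their differing column names onto a single schema. The alias table and function names are hypothetical, and a real pipeline would need far more mappings and validation.

```python
import csv
import io
import json

# Hypothetical mapping from source-specific column names to one shared schema.
FIELD_ALIASES = {
    "Company Name": "company",
    "org": "company",
    "company": "company",
    "E-mail": "email",
    "email": "email",
}


def harmonize_row(row):
    """Rename known fields to the shared schema and trim stray whitespace."""
    harmonized = {}
    for key, value in row.items():
        if key is None or value is None:
            continue  # skip overflow or missing cells from ragged CSV rows
        target = FIELD_ALIASES.get(key.strip())
        if target:
            harmonized[target] = str(value).strip()
    return harmonized


def harmonize(csv_text, json_text):
    """Combine a CSV source and a JSON source into one list of uniform records."""
    csv_rows = csv.DictReader(io.StringIO(csv_text))
    json_rows = json.loads(json_text)
    return [harmonize_row(row) for row in csv_rows] + [harmonize_row(row) for row in json_rows]
```

Once every source is expressed in the same fields, analysis can run on one data set instead of several.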
3. Why are data processing procedures essential?
For me, the most obvious reason is that information can be retrieved easily at any time once data has been processed. As data continues to grow at unprecedented rates, companies that do not have a proper procedure in place (covering how data is collected, analyzed, filtered, sorted, stored, and protected) will surely compromise data integrity, and companies that experience data loss or a breach often face legal consequences.
4. Why is it important to keep data up-to-date and clean?
It is important because quality data helps an organization make informed decisions. It also increases the company's negotiating power when securing a deal, because I think nothing is as convincing or persuasive as having reliable data to back you up. In our case, I think it's easier for us to position ourselves when looking for potential clients because we know we have data that we can use to make a campaign or project a success. Data can be considered one of the biggest assets any company could have. I think investing in someone or something (a tool or platform) to manage and maintain data accuracy and cleanliness is crucial because it helps companies create strategies that will drive their business forward.
5. How long does a data set last before it gets outdated?
I think it depends on what type of data is being collected and for what purpose. However, it is important to keep in mind that data is a living and breathing organism: it is changing all the time. People change roles within the companies they work for, or they leave one company to work for another. Some companies merge with other companies, or they get acquired and then the company name changes. Companies move offices, and their phone numbers and addresses change. You can't think "oh, now that my data is clean I'm done." No! It's like when you clean your house. You have to keep cleaning it week after week, month after month. You are never just done.
6. What are some ways that your team builds new data sets?
Well, I can't give away the "secret sauce" of how we build data, but let's say that we use a combination of robust algorithms and actual humans who use online tools to source data continually over time. Of course, that is then combined with our outbound initiatives, which confirm the validity of the data before it is used for a particular purpose.
7. For you, what are some best practices when it comes to database management?
I think being organized is of utmost importance. Data tends to be complex, so having a standard format helps a lot when it comes to data retrieval or extraction. Whether it's the filenames, the content, or the values, it's best if they are written uniformly. How you select and segment your data is another factor to consider. Having metadata makes the information you store easily discoverable for future use. Where you store your data and how you secure it are very important, and data backups should be routine as well. Smooth operations go hand in hand with accessibility to vital information at any given time.
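To give one small example of what "uniform" could mean in practice, here is a sketch that builds a standard filename and a matching metadata record for a stored data set. The naming pattern (date, source, description) and the metadata fields are assumptions chosen for illustration, not a prescribed standard.

```python
import re
from datetime import date


def slugify(text):
    """Lowercase the text and replace runs of non-alphanumeric characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def standard_filename(source, description, created, extension):
    """Build a filename shaped like YYYY-MM-DD_source_description.ext (hypothetical convention)."""
    clean_extension = extension.lower().lstrip(".")
    return f"{created.isoformat()}_{slugify(source)}_{slugify(description)}.{clean_extension}"


def build_metadata(filename, owner, row_count, tags):
    """Describe a stored file so it can be discovered later (illustrative fields only)."""
    return {
        "filename": filename,
        "owner": owner,
        "row_count": row_count,
        "tags": sorted(slugify(tag) for tag in tags),
    }
```

For instance, `standard_filename("CRM Export", "Q3 Contacts (EMEA)", date(2024, 7, 1), ".CSV")` yields `2024-07-01_crm-export_q3-contacts-emea.csv`, which sorts by date and can be searched by source or topic.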