It’s been one of those weeks when the same topics and questions find their way into my inbox from different sources. You know how it goes: you don’t hear about something for weeks or months and then suddenly you get a series of similar requests as if there was an epidemic. Must be the full moon, I guess.

Read more: DataMapper tidbits

The Post function

One such topic was the Post function in the Field definition section of the extraction step. Well… in all honesty, the Post function wasn’t the actual problem being asked about, but it just happened to be the solution to what was being requested. And looking at our blog history, I have to admit we didn’t shine a spotlight on that feature very often. More like.. never.

So before we look at a couple of use cases, let’s explain what the Post function actually is.

As you probably know by now, the DataMapper reads every single piece of data from any source data as a string. That’s because all data streams are ultimately composed of characters. Now some combination of characters like 12345 happen to contain only digits, and we humans understand that combo to mean that it probably represents a number. But to a computer, that combination of characters doesn’t represent anything, other than a string of characters that could just as well have been abced.

Yet even we humans sometime have trouble figuring out what a sequence of character means. For instance, what does the string 24/7 mean to you? Is it a mathematical operation (24 divided by 7) that ultimately resolves to the value 3.43? Or is it the opening hours for a well known coffee and donuts chain of stores (Open 24 hours, 7 days a week)? Or worse, is it a date… in which case does it mean July 2024… or the 24th of July of an unspecified year?

Note!: I asked chatGPT how it interprets 24/7 and by default, it considers it as “a means to express continuous availability“. When I pointed out it could also be a date, it replied “Yes, you’re correct. “24/7” could indeed represent the 24th of July in a date format, particularly in contexts where dates are abbreviated in this manner […] Thank you for pointing out this interpretation. It’s important to consider the context to understand the intended meaning accurately.

So it all depends on the context, right? Even chatGPT says so! Accordingly, the DataMapper allows you to provide context for some of those data items you want to extract. That context is provided through the Data format section of the Extract step:

This tells the DataMapper that the string of characters it’s about to extract should be treated as a currency. The sequence of operations, from the DM’s perspective, is as follows:

  1. Read the string of characters from the data source
  2. Apply the format from the Data format section to the string of characters
  3. Check if this results in output that matches the expected field type (currency, in the above example)
  4. Store the field in the data model

This is the sequence of operations when you do not specify a Post function (yes, I know, it took a while to get to the point. I often get comments about my tendency to get sidetracked or to tell long-winded stories before I get to the punch line. I guess this all dates back to my childhood when… oh but wait! I digress. again..)

But if you do use a Post function, it inserts itself between steps 1 and 2 above. Let’s number it 1.5 in the list:

  • 1.5. Transform the string of characters with the Post function

If you’re a JavaScript afficionado, you know that under the hood step 1 is just a data.extract() statement. When you specify a Post function, the DataMapper virtually adds a dot (.) after that statement, allowing you to daisy-chain string operations. So let’s imagine that for some – unfathomable – reason, you want to reverse the string of characters before storing it into the data model. You would use the following Post function:

split("").reverse().join("")

The DataMapper will internally convert this to :

data.extract(some_location).split("").reverse().join("")

So if your original data contained enoonmai, what will actually get extracted is iamnoone. (Extra points to those who recognize the literary reference!).

Now remember that in step 1, the DataMapper always reads the sequence of characters as a string, so whatever you use in your Post function must be a daisy-chainable string function. For instance, you couldn’t use Math.parseInt() to convert what it reads into an integer. But there’s almost always a way to achieve what you want. The reverse() method above is a good example of that: the original string is first converted into an array (with split()), which is reversed and then concatenated back into a string in the end (with join()).

And that’s also an important thing to keep in mind: your Post function must return a string, because step 2 in our list will be applying the Data format to a string, as if it had just been read directly from the data.

So now that you have mastered the entire process, what can you do with the Post function?

Here are some examples:

// replace parenthesis around negative numbers with - sign
// original data: (247.65)
// Post function: replace(/(?:\()([^\)\(]*)(?:\)$)/,"-$1")
// result: -247.65

// convert string to lowercase
// original data: PLEASE DON'T SHOUT
// Post function: toLowerCase()
// result: please don't shout

// properly format a Canadian zip code (A#A #A#)
// original data: g0u1b7
// Post function: replace(/([A-Z0-9]{3})([A-Z0-9]{3})/i,"$1 $2").toUpperCase()
// result: G0U 1B7

// extract first name from full name string
// original data: John Doe
// Post function: split(" ")[0]
// result: John

// extract email server name from address
// original data: John.Doe@emailserver.com
// Post function: split("@")[1]
// result: emailserver.com

// format North-American phone number
// original data: 15149876543
// Post function: (on next line)
// replace(/([\d]{1})([\d]{3})([\d]{3})([\d]{4})/,"$1($2)$3-$4")
// result: 1(514)987-6543

// format American-style date to ISO 8601
// original data: 03/01/24 (that's March 1st 2024)
// Post function: (on next line)
// replace(/([\d]{2})\/([\d]{2})\/([\d]{2})/,"20$3-$2-$3")
// result: 2024-03-01

Yes, for all of the above, you could have used a JavaScript extraction instead. But this is way more efficient and since the Data selection is made the “normal” way (i.e. when it is based on Location), it gets highlighted in the Data Viewer pane whenever you click on that field in the data model, something the DataMapper can’t do with a JS-extracted field.

So before switching to a JavaScript extraction, try and see if you can achieve the same thing through a Post function.

Data types in the Data Model

The other topic that seemed popular this week revolved around how the data is displayed in the Data Model pane. I had to explain, for the umpteenth time, that what is displayed in that pane is neither what is stored in the database nor what will ultimately be displayed on the template. But then I realized that this blanket statement should be tempered a bit because… well… it’s complicated.

There are, however, two statements that do apply to all data types:

  1. In the database, each value is stored as a specific data type that has no format. A format is a way to view a value, but it’s not that view that gets stored, it’s the actual value. This is especially important for dates, floats and currencies. Granted, as soon as you start talking about currencies and dates, things get confusing. That’s because we all know what a currency is (an amount of money) and what a date is (a precise moment in time). But globally, we all use various formats to display those values.
  2. You have full control over how the data is displayed on the template, regardless of the data type. So what you see in the Data Model pane is not necessarily what you will see on the template, because it all depends on the format you’ll be applying on the template.

With that out of the way, let’s explore each data type individually

String/HTML string

Yes, I know, strings and HTML strings are two separate types in the DataMapper, but for the purpose of this discussion, they are the same: what gets extracted from the data is what gets stored in the database. No transformation occurs.

The only difference between the two types is that HTML tags stored in an HTML string will be inserted and interpreted in your template, whereas a String would simply display the tags as standard text.

Boolean

Those fields are stored as boolean fields in the database. The actual format of the value in the database is irrelevant, it can only have one of two values: true or false. The data model always displays true or false, in English, regardless of your language preferences in the application or even in Windows for that matter.

Integer

An integer is any whole number between -(2^63) and (2^63). It gets displayed in the data model exactly as it is stored. I could have lumped it in with the String types above, but that might only have added to the confusion.

Float/Currency

Again, those are similar: for the purpose of the DataMapper, a currency is a float that’s rounded off after two decimal digits, and a float is a decimal value that’s rounded off to the 3rd decimal digit.

In both cases, the data model displays those values with minimal formatting: a dot for the decimal separator, and no thousands separator, regardless of your Windows preferences. But, once more, remember it’s the actual value that’s stored in the database, not that format.

Dates

Aahhh… dates. We all love them… NOT!

As mentioned earlier, a date is a value that represents a precise moment in time. To the millisecond, to be precise. In that respect, it should actually be called a timestamp, but let’s not quibble over words for the purpose of this conversation.

The date value is stored in the database as an insanely long number that represents a number of milliseconds before/after a certain arbitrary point in time (usually January 1st 1970, but not necessarily). But again, you don’t care how it’s stored in the database as long as its value is correct. The data model displays this value according to your Windows preferences. And once more in unison: this has no bearing on how the date is ultimately displayed on your template.

Now I’ve also been asked if the date is stored as UTC or local time zone in the database. After all, the time zone makes a big difference: Monday 9PM in New-York is actually Tuesday 2AM in London. So we need to know how it is stored in the database, don’t we?

NO WE DON’T!

Think about it. If you’re in New-York at 9PM and you give your brother a phone call in London, there are two certainties:

  1. Your brother will be mad at you for calling him in the middle of the night.
  2. Even though it’s 2AM for him, he’s not picking up your call 5 hours later, is he? He’s picking it up now. At a precise moment in time. A moment in time that you and him describe in a different way (9PM vs. 2AM) but that is nevertheless the same moment in time. Therefore, time zones are just another type of representation of the same exact moment in time. It’s a format that has no impact on the actual value. Accordingly, no time zone information is stored in the database.

I hope this one is cleared up for good, now!

JSON

JSON is a slightly different beast. It gets stored as a string in the database (because JSON is a string format to begin with). That’s what you see in the data model right next to the field name. But to improve the UX, the data model also displays each property of that JSON object on its own line, with its own value. It must be pointed out that JSON dates are always displayed as strings that use the ISO 8601 format because JSON does not have a native date format.

To wrap it up

Hopefully, these couple of tidbits will have helped clear up a few misunderstandings about data types and the Post function. Let me know in the comments if that was helpful and feel free to suggest other topics you’d like us to address!

Leave a Reply