Parsing Products and Events in ClickStream Data Feeds

A lot of companies that I’ve worked with are initially confused when processing Adobe Analytics Data Feeds. The data comes out of Adobe Analytics in TSV format and you’d naturally expect that the data is ‘flat’ (meaning just rows and columns).

Unfortunately, this isn’t the case. Columns like ‘post_product_list’ and ‘event_list’ are lists of data that are stored in a single column. Parsing these columns into something like Hive or Tableau can be a major pain in the neck, especially if you don’t understand some of the nuances involved. I hope to address those in this post.

Product List

The product list contains a list of products, categories, quantities, costs, and other interesting information. It is usually set with JavaScript when a customer interacts with the shopping cart in some way. For example, when adding products to the cart, viewing the cart, moving through the order process, cancelling orders, and completing orders.

Please consider the following product list as an example:

Example Product List
Example Product List

Let’s pick this apart. The product items (green boxes) are separated by a comma (red box). Let’s take a closer look at the first product item in the list.

Example Product Item
Example Product Item

Each product item has its own fields that are separated by semi-colons.

  1. The product category (grey box) — this is deprecated. I recommend using SAINT classifications to organize your product taxonomy instead.
  2. The product SKU (blue box)
  3. The quantity (orange box)
  4. The product item total (green box) — note: This is the total amount for this product item. In this example, you can find the cost of each product item by dividing the number in the green box by the number in the orange box. In this example, these shoes cost $29.95 / pair. If you calculate revenue and it seems unusually high, it’s usually due to misunderstanding how this field works.
  5. Serialized events (red box) — this is a list of events that are set along with this product. They are separated by ‘|’ characters and are considered events that occurred on this hit, but are specifically associated with this product item and not other product items in the product string.
  6. There are no merchandising eVars here, but if there were, they’d be at the end of this list. They are also separated by ‘|’ characters.  If you don’t know what merchandising eVars are, I’d recommend reading the documentation here, and reading about how they appear in the product string here.

With this in mind, we could calculate number of items purchased, cost, and some custom metrics from this event list. It would look like this:

  1. 2 x Jupiter Cross Trainers that cost $29.95 each, totaling $59.90. Custom event 2 is incremented by ‘1’, and custom event 4 is incremented by ‘2’ and are associated with this product SKU. There are no merchandising eVars.
  2. 6 x Honeycress Running Socks that cost $5.95 each, totaling $35.7. There are no custom events here or merchandising eVars.

Event List

The event list contains a list of events that are set on the page. An event is used to track when the end user does something interesting on the page or app for example:

  • Adding a product to the shopping cart
  • Completing a purchase
  • Performing a search of the site
  • Submitting contact information

The event list is a bit easier to process than the product list. What makes it a bit tricky is that in your JavaScript implementation (hopefully you’re using Adobe’s tag manager Activation – aka “DTM” – by now and not s.code), the events are identified with strings (i.e., ‘event1’, ‘event2’, etc.) rather than the numeric strings you’ll find in the Data Feed. Adobe Analytics converts these friendly strings to numeric identifiers and provides the raw numbers in the event list. For example, ‘event1’ would show up as ‘200’. A lookup table of event IDs and event names is available in the “event_lookup.tsv” file found in the Data Feed payload.

Below is an example of the event list structure as it might appear in a Data Feed:

1,201,210=17.37

Its easy to see that this list is comma separated, but that some events can have a certain value (in contrast to just incrementing an event by ‘1’). When loading this data into a table of some sort, it’s sometimes helpful to append a comma to the beginning and end of the string, creating something like this:

,1,201,210=17.37,

This allows a regular ‘contains’ search to work on the event list. For example, lets say I want to filter records based on the event list containing the event ID ‘1’ (which happens to be the purchase event). Without the leading and trailing commas, searching for ‘1’ or ‘1,’ would incorrectly match that entire string because of event ID “201” and event ID “210”.

Appending commas to both ends allows you to search for ‘,1,’ and get it right every time.

Merchandising eVars

You’re probably thinking, ‘Wait! You can’t be done! We haven’t talked about merchandising eVars yet!’. Well, I’m saving that for its own post. Merchandising eVars have a very specific use case and aren’t used very often. They also aren’t initially intuitive and I want to keep this post simple and focused. Finally, some changes to the way merchandising eVars are presented in data feeds have been rolling out to customers over the last few quarters that make them even more complicated. I just want to make sure I give them the attention they deserve in a post of their own.

Common Problems

Here are some common problems that I’ve seen and their usual cause.

Revenue is really low compared to Analytics

Make sure that you’re parsing all of the product items in the product list and not just the first item.

Revenue is really high compared to Analytics

Make sure that you’re not multiplying the product item total by the quantity.

When searching for and totaling a certain event, the result is really high

Make sure you’re appending commas to the beginning and end of the event list and using a ‘contains’ match. If you’re not doing this, you’re matching other custom events that shouldn’t be matched.

My custom event counts are really low

Make sure you’re handling the ‘=’ characters correctly. They allow certain events to be incremented more than once. If running a custom metric report against a product SKU, make sure you’re processing the serialized events in the product list.  See Trevor’s post “Visitor Level Aggregations Using R and Adobe Analytics Data Feeds” post for an example of using a regular expression to handle this use case.

Jared Stevens

Jared Stevens is a software engineer at Adobe and works on the Analytics reporting APIs. He has also worked as a Software Engineering consultant at Adobe for 7 years and has assisted many of Adobe's top tier customers with custom integrations and data processing requests. When he's not knee deep in data, he enjoys backpacking, video games, and learning about new things.