A lot of companies that I’ve worked with are initially confused when processing Adobe Analytics Data Feeds. The data comes out of Adobe Analytics in TSV format and you’d naturally expect that the data is ‘flat’ (meaning just rows and columns).
Unfortunately, this isn’t the case. Columns like ‘post_product_list’ and ‘event_list’ are lists of data that are stored in a single column. Parsing these columns into something like Hive or Tableau can be a major pain in the neck, especially if you don’t understand some of the nuances involved. I hope to address those in this post.
Please consider the following product list as an example:
Let’s pick this apart. The product items (green boxes) are separated by a comma (red box). Let’s take a closer look at the first product item in the list.
Each product item has its own fields that are separated by semi-colons.
- The product category (grey box) — this is deprecated. I recommend using SAINT classifications to organize your product taxonomy instead.
- The product SKU (blue box)
- The quantity (orange box)
- The product item total (green box) — note: This is the total amount for this product item. In this example, you can find the cost of each product item by dividing the number in the green box by the number in the orange box. In this example, these shoes cost $29.95 / pair. If you calculate revenue and it seems unusually high, it’s usually due to misunderstanding how this field works.
- Serialized events (red box) — this is a list of events that are set along with this product. They are separated by ‘|’ characters and are considered events that occurred on this hit, but are specifically associated with this product item and not other product items in the product string.
- There are no merchandising eVars here, but if there were, they’d be at the end of this list. They are also separated by ‘|’ characters. If you don’t know what merchandising eVars are, I’d recommend reading the documentation here, and reading about how they appear in the product string here.
With this in mind, we could calculate number of items purchased, cost, and some custom metrics from this event list. It would look like this:
- 2 x Jupiter Cross Trainers that cost $29.95 each, totaling $59.90. Custom event 2 is incremented by ‘1’, and custom event 4 is incremented by ‘2’ and are associated with this product SKU. There are no merchandising eVars.
- 6 x Honeycress Running Socks that cost $5.95 each, totaling $35.7. There are no custom events here or merchandising eVars.
The event list contains a list of events that are set on the page. An event is used to track when the end user does something interesting on the page or app for example:
- Adding a product to the shopping cart
- Completing a purchase
- Performing a search of the site
- Submitting contact information
Below is an example of the event list structure as it might appear in a Data Feed:
Its easy to see that this list is comma separated, but that some events can have a certain value (in contrast to just incrementing an event by ‘1’). When loading this data into a table of some sort, it’s sometimes helpful to append a comma to the beginning and end of the string, creating something like this:
This allows a regular ‘contains’ search to work on the event list. For example, lets say I want to filter records based on the event list containing the event ID ‘1’ (which happens to be the purchase event). Without the leading and trailing commas, searching for ‘1’ or ‘1,’ would incorrectly match that entire string because of event ID “201” and event ID “210”.
Appending commas to both ends allows you to search for ‘,1,’ and get it right every time.
You’re probably thinking, ‘Wait! You can’t be done! We haven’t talked about merchandising eVars yet!’. Well, I’m saving that for its own post. Merchandising eVars have a very specific use case and aren’t used very often. They also aren’t initially intuitive and I want to keep this post simple and focused. Finally, some changes to the way merchandising eVars are presented in data feeds have been rolling out to customers over the last few quarters that make them even more complicated. I just want to make sure I give them the attention they deserve in a post of their own.
Here are some common problems that I’ve seen and their usual cause.
Revenue is really low compared to Analytics
Make sure that you’re parsing all of the product items in the product list and not just the first item.
Revenue is really high compared to Analytics
Make sure that you’re not multiplying the product item total by the quantity.
When searching for and totaling a certain event, the result is really high
Make sure you’re appending commas to the beginning and end of the event list and using a ‘contains’ match. If you’re not doing this, you’re matching other custom events that shouldn’t be matched.
My custom event counts are really low
Make sure you’re handling the ‘=’ characters correctly. They allow certain events to be incremented more than once. If running a custom metric report against a product SKU, make sure you’re processing the serialized events in the product list. See Trevor’s post “Visitor Level Aggregations Using R and Adobe Analytics Data Feeds” post for an example of using a regular expression to handle this use case.