The Crowdskout Tech Stack and Data Design

Dan Fey

Recently, I led two sessions at php[world], a conference that highlights trends and best practices in the PHP programming language and aims to build and strengthen its community. One session covered how Crowdskout’s data model allows multiple customers to share large subsets of data instantly. The other covered how Crowdskout uses Elasticsearch to provide efficient audience segmentation and charting across very large datasets that traditional databases struggle to support. Here is an inside look at how our database technologies, data model, and matching algorithm provide the flexibility and performance our customers need to run robust, data-driven marketing campaigns.

The Crowdskout platform enables on-demand, real-time segmenting and charting for any combination of over 80 profile categories across billions of data points. This data comes from many different sources, including page views, web forms, voter data, imports, and an ever-growing list of synchronized integrations. In addition, with our app Quartermaster, we support complex, fine-grained data-sharing relationships between partner accounts.
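To make "segmenting and charting" concrete, here is a rough sketch of how a segment plus a chart could be expressed as a single Elasticsearch request body. It uses standard Query DSL pieces (a `bool` query with `filter` clauses, a `terms` aggregation, and `size: 0` to skip returning individual documents). The helper name `build_segment_query` and the field names in the usage example are assumptions made for illustration, not Crowdskout's actual schema or queries.

```python
from typing import Any, Dict


def build_segment_query(criteria: Dict[str, str], chart_field: str) -> Dict[str, Any]:
    """Build a request body that filters profiles and buckets them for a chart.

    `criteria` maps hypothetical field names to the exact values a profile
    must have; `chart_field` is the field whose values become chart buckets.
    """
    return {
        # size 0: we only want aggregate counts, not the matching documents.
        "size": 0,
        "query": {
            "bool": {
                # Each term clause must match; filter clauses skip scoring.
                "filter": [{"term": {name: value}} for name, value in criteria.items()]
            }
        },
        # One bucket per distinct value of chart_field within the segment.
        "aggs": {"chart": {"terms": {"field": chart_field}}},
    }
```

For example, `build_segment_query({"state": "VA"}, "source_type")` would describe a chart of Virginia profiles broken down by a hypothetical `source_type` field.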

Regardless of the source, all data enters through an API into a code project we call Cerebro. Cerebro contains the logic to validate, clean, and save a customer’s data into our three database technologies: MySQL, MongoDB, and Elasticsearch. MySQL and MongoDB are our databases of record, meaning they act as the authoritative and consistent source of truth for a customer’s data. Once data passes through Cerebro and is recorded in one of these two databases, it is part of the customer’s account. While MySQL and MongoDB are proven technologies that excel at keeping data consistent, they are not well suited to fast search and aggregation. For that, we replicate data into Elasticsearch, a search and analytics engine that powers fast search, segmenting, and charting at a scale that MySQL and MongoDB cannot support. Together, the three ensure data stays consistent while also providing performant access to analytics and aggregation tools.
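The write path described above can be sketched roughly as follows. This is a minimal illustration rather than Cerebro's actual code: the function names (`validate`, `clean`, `ingest`), the required fields, and the two in-memory dictionaries standing in for the database of record and the Elasticsearch replica are all assumptions made for the example.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-ins: record_store plays the role of the
# database of record (MySQL/MongoDB), search_index the Elasticsearch replica.
record_store: Dict[str, Dict[str, Any]] = {}
search_index: Dict[str, Dict[str, Any]] = {}

# Fields this sketch assumes every incoming record must carry.
REQUIRED_FIELDS = ("id", "account_id", "source")


def validate(raw: Dict[str, Any]) -> None:
    """Reject records that are missing any assumed required field."""
    for name in REQUIRED_FIELDS:
        if not raw.get(name):
            raise ValueError(f"missing required field: {name}")


def clean(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Trim whitespace from string values and lowercase the email, if any."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
    if isinstance(cleaned.get("email"), str):
        cleaned["email"] = cleaned["email"].lower()
    return cleaned


def ingest(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Validate, clean, save to the store of record, then replicate."""
    validate(raw)
    doc = clean(raw)
    record_store[doc["id"]] = doc        # authoritative copy
    search_index[doc["id"]] = dict(doc)  # replica used for search and charts
    return doc
```

The ordering is the point of the sketch: a record only reaches the search replica after it has been validated, cleaned, and written to the store of record.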

As data funnels into the Crowdskout platform, each unit of data is tagged with the source it came from. For example, a page view is tagged with a ‘digital’ source, and every item in a spreadsheet import is tagged with that specific ‘import’ source. These sources are what make sharing relationships with partner accounts possible through Quartermaster: customers select which sources they wish to share with another account, whether that is a single spreadsheet, all digital data, or even their entire account. Through data-sharing with Quartermaster, our customers can build their networks, develop mutually beneficial relationships with partner organizations, and magnify their outreach.
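One way to picture source-based sharing is as a set of rules evaluated against each record's source tag. The sketch below is an assumption-laden illustration, not Quartermaster's implementation: the `Record` and `ShareRule` types, their fields, and the `visible_to` helper are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    account_id: str    # account that owns the data
    source_type: str   # e.g. "digital", "import", "integration"
    source_id: str     # e.g. one specific spreadsheet import
    payload: str


@dataclass(frozen=True)
class ShareRule:
    owner: str
    partner: str
    source_type: Optional[str] = None  # None: any source type
    source_id: Optional[str] = None    # None: any source of that type

    def covers(self, record: Record) -> bool:
        """True if this rule shares the given record with the partner."""
        if record.account_id != self.owner:
            return False
        if self.source_type is not None and record.source_type != self.source_type:
            return False
        if self.source_id is not None and record.source_id != self.source_id:
            return False
        return True


def visible_to(account: str, records: List[Record], rules: List[ShareRule]) -> List[Record]:
    """Records an account owns, plus records shared with it by a rule."""
    return [
        r for r in records
        if r.account_id == account
        or any(rule.partner == account and rule.covers(r) for rule in rules)
    ]
```

Under these assumptions, `ShareRule("acme", "partner", "import", "sheet-1")` shares a single spreadsheet, `ShareRule("acme", "partner", "digital")` shares all digital data, and `ShareRule("acme", "partner")` shares the entire account.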

Crowdskout also matches profile data between different data sources. For example, a person may visit a few pages on a website and then fill out a form with their name and email address. That person may also register for and attend an Eventbrite event. In this example, the website data arrives through a digital source, and the Eventbrite data arrives through an integration source. Our matching algorithm matches the two sources together so the customer sees a single view of the profile and how the person has interacted with the customer’s organization. The customer can then create a segment of people who have both visited the organization’s website and attended a specific event, and continue the engagement with a targeted email. In this way, the matching algorithm lets our customers build audience segments based on actions and interactions from many different sources of data. This enables deep analysis of the customer’s audience and empowers organizations to create messaging that truly resonates.
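The website-plus-Eventbrite example can be sketched in a few lines. The details of Crowdskout's matching algorithm are not described here, so this sketch assumes the simplest possible rule, matching on a normalized email address. The function names and the `email`, `source`, and `action` fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

Event = Dict[str, str]


def match_profiles(events: List[Event]) -> Dict[str, List[Event]]:
    """Group events from all sources into one profile per normalized email."""
    profiles: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        # Assumed matching key: the email, trimmed and lowercased.
        key = event["email"].strip().lower()
        profiles[key].append(event)
    return dict(profiles)


def segment(profiles: Dict[str, List[Event]], required_actions: Set[str]) -> List[str]:
    """Return the profiles that have performed every required action."""
    return sorted(
        email
        for email, events in profiles.items()
        if required_actions <= {e["action"] for e in events}
    )
```

With one `page_view` event from a digital source and one `attended_event` event from an integration source that share an email address, `match_profiles` yields a single profile, and `segment(profiles, {"page_view", "attended_event"})` selects it while leaving out people who only did one of the two.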

Our database technologies, source-tagging data model, and matching algorithm work together to provide the data backbone of the Crowdskout platform. The source data model implemented in MySQL and MongoDB provides a consistent view of a customer’s audience and lets the customer configure information-sharing relationships with partners to magnify outreach. The matching algorithm allows organizations to aggregate their data across different sources to understand the complete picture and create targeted messaging. All of this data is replicated into Elasticsearch so that organizations can search, segment, and chart their entire data set in real time and on demand. These insights allow organizations to dig deeper into their data, monitor the progress of their campaigns over time, and ultimately amplify their impact.