Technology

Predictvia applies two core technologies, Seenatra (machine learning-based classification) and DAS (Dynamic Adaptive Sampling), to the analysis of data obtained from Internet users, either through social media or directly from the users themselves.

  • Social Media Data: We do not focus only on what people say in tweets. Although we do analyze messages, we also analyze structured data, semi-structured data, metadata and previously modeled data.
  • Direct Data: We are able to obtain direct data from users on a large scale through our partnership with Google Consumer Surveys and our own data collection procedures, which include extensively using Facebook and Twitter for such purposes.

Seenatra

Seenatra is a machine learning-based predictive analysis method fed with large datasets gathered from social media and directly from users. It produces two major results:

  1. It trains a model that classifies any given user as positive or negative for having the Unifying Property.
  2. As a result of the modeling process, it identifies certain variables, mostly “human interests”, that are predictive of the Unifying Property. Essentially, a few “human interests” are “machine-picked” from thousands of options.

Step-by-step process:

  1. Input: defined Unifying Property and region.
  2. Training Set Selection: select a set of users who have the Unifying Property and a set of users who do not. This is accomplished by executing two procedures:
    1. Selecting users through a heuristic. For example: users who follow certain Twitter accounts related to the Unifying Property.
    2. Running a survey among a set of previously classified users, thereby obtaining two deterministic subsets: those who say “yes” and those who say “no”.
  3. Data Gathering: data from users of the subset “yes” and from users of the subset “no” is downloaded and stored. Often, these are social media users although they could also be found through other platforms.
  4. Modeling: a number of algorithmic procedures are applied to train a single mathematical model that can be used both to classify any user as positive (or negative) for a Unifying Property and to identify the parameters that are effective for that classification.
  5. Validation: the process is completed by validating the output of the model. We do this by directly asking a sample of the users who were classified, either directly or indirectly, whether they have the Unifying Property. This step yields an indicator of how good the model is at finding people who have the Unifying Property.
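
The Seenatra steps above can be illustrated with a minimal sketch. It is not Predictvia's implementation: the source does not name the algorithm, so plain logistic regression stands in for the modeling step, the interest names and toy users are invented, and the validation indicator is assumed to be precision (the share of users classified as positive who confirm the property when asked). All function names are hypothetical.

```python
import math

def predict_proba(weights, bias, row):
    """Probability that a user (dict of interest -> 1) has the property."""
    z = bias + sum(weights.get(f, 0.0) * v for f, v in row.items())
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    weights = {f: 0.0 for row in rows for f in row}
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = dict.fromkeys(weights, 0.0)
        grad_b = 0.0
        for row, y in zip(rows, labels):
            err = predict_proba(weights, bias, row) - y
            grad_b += err
            for f, v in row.items():
                grad_w[f] += err * v
        bias -= lr * grad_b / n
        for f in weights:
            weights[f] -= lr * grad_w[f] / n
    return weights, bias

def precision(confirmations):
    """Share of model-positive users who confirmed the property (assumed metric)."""
    return sum(confirmations) / len(confirmations)

# Training set (invented): interests of users who answered "yes" and "no".
yes_users = [{"cycling", "coffee"}, {"cycling", "running"},
             {"cycling", "coffee", "running"}, {"running", "coffee"}]
no_users = [{"gaming", "coffee"}, {"gaming"},
            {"gaming", "coffee", "movies"}, {"movies", "coffee"}]

rows = [dict.fromkeys(user, 1) for user in yes_users + no_users]
labels = [1] * len(yes_users) + [0] * len(no_users)

# Modeling: one model gives both a classifier and a ranking of interests.
weights, bias = train_logistic(rows, labels)
ranked = sorted(weights, key=lambda f: abs(weights[f]), reverse=True)
print("most predictive interests:", ranked[:3])
print("P(property | cycling, coffee):",
      predict_proba(weights, bias, {"cycling": 1, "coffee": 1}))

# Validation: survey answers from users the model classified as positive.
print("precision:", precision([True, True, False, True]))
```

In this sketch the interests with the largest absolute weights play the role of the “machine-picked” variables: an interest shared equally by both subsets contributes little, while one concentrated in a single subset dominates the ranking.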

Dynamic Adaptive Sampling

Dynamic Adaptive Sampling (DAS) is an intelligent sampling technique that optimizes a sample design while the data is being collected. The method analyzes the data from every stratum of a sample design and then decides whether that stratum needs a larger sample, according to how “uniform” the stratum is. Because this occurs simultaneously with the surveying process, DAS is able to “ask” for more completed questionnaires on an as-needed basis. This technique therefore improves the sampling process by reducing collection time, sample size and cost. More importantly, it guarantees a sample that efficiently represents the universe.

Step-by-step process:

  1. Input: budget, maximum sampling error and base count (% of the universe with the Unifying Property)
  2. Preliminary Sample Design: estimation of required sample per stratum (using the auxiliary information of the specified universe of people).
  3. Sampling:
    1. Collect sample and evaluate it as it is collected.
    2. Optimization cycle (a heuristic approach):
      1. Calculate the new sample required using the variance of each stratum.
      2. If the sample required is 0, end; otherwise, go to step 3.1.
  4. Result: a sample that is representative of the universe.
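
The optimization cycle can be sketched as a loop that re-estimates the required sample after every batch of answers. This is an illustration under stated assumptions, not the DAS implementation: the source says only that the required sample is calculated from each stratum's variance, so the standard margin-of-error formula n = z² · s² / e² is assumed here, and the per-stratum minimum (`min_n`), the batch size and the simulated respondents are all hypothetical. The base count input is not used in this simplified version.

```python
import math
import random
import statistics

def required_sample(values, max_error, z=1.96, min_n=30):
    """Answers needed so the stratum mean stays within max_error (assumed formula)."""
    if len(values) < 2:
        return min_n  # the variance cannot be estimated yet
    variance = statistics.variance(values)
    return max(min_n, math.ceil(z * z * variance / max_error ** 2))

def adaptive_sample(strata, max_error, budget, batch=20):
    """Collect answers per stratum until no stratum needs more or the budget is spent.

    strata maps a stratum name to a callable that returns one answer (0 or 1).
    """
    collected = {name: [] for name in strata}
    spent = 0
    while spent < budget:
        # Step 3.2.1: recompute the shortfall from the variance observed so far.
        shortfall = {
            name: max(0, required_sample(answers, max_error) - len(answers))
            for name, answers in collected.items()
        }
        # Step 3.2.2: stop when no stratum needs more sample.
        if sum(shortfall.values()) == 0:
            break
        # Step 3.1: request another batch only where it is needed.
        for name, need in shortfall.items():
            take = min(need, batch, budget - spent)
            collected[name].extend(strata[name]() for _ in range(take))
            spent += take
    return collected

# Simulated respondents: one nearly uniform stratum and one evenly split stratum.
rng = random.Random(0)
strata = {
    "uniform": lambda: int(rng.random() < 0.02),
    "mixed": lambda: int(rng.random() < 0.5),
}
sample = adaptive_sample(strata, max_error=0.1, budget=500)
for name, answers in sample.items():
    print(name, len(answers))
```

The floor `min_n` is a design choice of this sketch: without it, a stratum whose first few answers happen to be identical would show zero variance and stop collecting too early.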
