Semex Download

Semex, short for SEMantics EXplorer, is a system that helps you organize your personal information in a semantically meaningful way. Rather than considering your personal information as isolated files spread over the disk, Semex views it as a network consisting of instances, such as persons, articles, emails and pictures, and the associations between these instances, such as authors, coauthors, and email-senders. With this logical view of personal information, Semex provides an enriched desktop search experience.

Features

Semex has the following features:

  1. Automatic instance-and-association extraction
    Semex scans your hard disk or the specified directories and considers the following file formats:

    From them Semex extracts the following types of instances:

  2. Reference reconciliation
    Semex automatically reconciles the extracted instances that refer to the same real-world instance. For example, it can recognize the person names and email addresses that refer to the same real-world person, and treat them as attributes of a single person instance.
  3. Association navigation
    Semex lists all classes of instances extracted from the personal data. When the user poses a query (or selects one class from the list), Semex classifies the returned objects into their classes. The user can select a particular object instance to see detailed information, including its attribute values and its associated instances, which have been grouped by associations. The user can then browse the data by following association links, much like web browsing.
  4. Keyword search
    Semex supports a search mechanism that is more intelligent than simple keyword search. For example, when searching for the keyword “schema matching”, Semex returns not only the instances representing papers and presentations that contain “schema matching” in the text, but also instances representing people working in this area, conferences and journals that have published papers on this topic, etc. In addition, Semex goes to the Web and returns webpages that are relevant to the keyword search.
  5. Advanced search
    For more complex searches, the user can compose predicate queries and triple queries using the Semex interface.

    Predicate query: A predicate query describes a desired instance by a set of predicates, each of which describes either an attribute value or an associated instance. Semex returns all instances in the association network that satisfy all or some of the predicates in the query. In addition, Semex searches unstructured data in the personal information space and searches the Web to return text documents and webpages that are relevant to the query.

    Triple query: A triple query is a conjunctive query over triples, each triple describing an association between a pair of objects or an attribute of an object. It is more powerful than a predicate query in that it can describe a chain of associations whose length is greater than one. Semex searches over personal data and the Web to answer triple queries.

  6. Ranking
    Semex provides three options for ranking: by relevance score, by importance score, and by timeline. By default, the returned instances are ranked by their relevance to the query, computed in a way close to the TF/IDF measure. A user can choose to rank the returned instances according to how important they are in the personal information space, computed in a way close to PageRank. In addition, a user can choose to rank certain instances such as Articles and Emails by their latest modification time.
  7. Lineage information
    For each instance in the personal information space, Semex finds out how it is associated with the data owner. For example, given a person instance, the user can quickly find out how she knows the person, when she first heard about this person, and when she last contacted this person.
  8. Tell me in my own words
    Semex offers an interface through which the user can browse a webpage and extract instances from it. In other words, given an external data source, Semex looks for the instances that already exist in the user's information space. For example, Semex looks for persons, papers, and cities that the user already knows about.
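
To make the triple query described under "Advanced search" more concrete, here is a minimal Python sketch of a conjunctive query over triples. It is not Semex's implementation: the toy data, the association names (`authoredBy`, `publishedIn`, `name`), the `?variable` convention, and the `match` and `query` helpers are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# (subject, association or attribute, object)
Triple = Tuple[str, str, str]

# Hypothetical toy data; Semex's real schema and storage are not shown here.
TRIPLES: List[Triple] = [
    ("paper1", "authoredBy", "alice"),
    ("paper1", "publishedIn", "conf1"),
    ("paper2", "authoredBy", "bob"),
    ("paper2", "publishedIn", "conf2"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables (an assumed convention)."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple,
          binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return an extended copy of `binding` if `pattern` fits `triple`, else None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or reject if it is already bound to another value.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that satisfies all patterns (a conjunction)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = match(first, triple, binding)
        if extended is not None:
            yield from query(rest, triples, extended)


# A chain of associations: names of people who authored a paper published in conf1.
chain = [
    ("?paper", "publishedIn", "conf1"),
    ("?paper", "authoredBy", "?person"),
    ("?person", "name", "?name"),
]
results = [b["?name"] for b in query(chain, TRIPLES)]
print(results)
```

Because the variable `?paper` is shared between the first two triples and `?person` between the last two, the query follows a chain of associations, which a single set of predicates on one instance cannot express.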

How to ...

  1. Download and execute Semex

    You will see the interface shown below (it differs slightly on Linux and Mac). Fill in your name and email.

  2. Extract instances and associations from your hard disk
  3. Reconcile references
  4. Navigate your personal data
  5. Search your personal data
  6. Rank the results: By default, search results are ranked by their relevance to the query. One can also choose to rank the results by timestamp by clicking the "Timestamp" button, or rank the results by importance by clicking the "Importance" button in the query bar.
  7. Find out lineage information
  8. Browse external websites
    Click "Web Browser" in the left pane. A web browser is launched in the upper-right pane, and instances associated with the webpage are shown in the lower-right pane. Click the "Extract" button, and Semex extracts the instances that already occur in the personal information space and refreshes the results in the lower-right pane.
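
The three ranking options in step 6 can be thought of as three sort keys over the same result list. The Python sketch below illustrates that idea only. The `Result` class, its field names, the `rank` function, and the sample scores are hypothetical, and it assumes that the relevance and importance scores have already been computed and that timestamp ranking puts the most recently modified instance first.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Result:
    label: str
    relevance: float    # assumed precomputed, TF/IDF-like score
    importance: float   # assumed precomputed, PageRank-like score
    modified: datetime  # latest modification time


# One sort key per ranking option (names are illustrative).
SORT_KEYS = {
    "relevance": lambda r: r.relevance,
    "importance": lambda r: r.importance,
    "timestamp": lambda r: r.modified,
}


def rank(results: List[Result], mode: str = "relevance") -> List[Result]:
    """Return results ordered from highest to lowest under the chosen option."""
    return sorted(results, key=SORT_KEYS[mode], reverse=True)


# Made-up sample results for illustration.
hits = [
    Result("email A", relevance=0.9, importance=0.1, modified=datetime(2005, 3, 1)),
    Result("article B", relevance=0.4, importance=0.8, modified=datetime(2005, 6, 1)),
    Result("article C", relevance=0.6, importance=0.5, modified=datetime(2004, 1, 1)),
]

for mode in ("relevance", "importance", "timestamp"):
    print(mode, [r.label for r in rank(hits, mode)])
```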

Note

The current version of Semex has the following problems:

  1. The system uses the Jena engine to store the data. Since Jena reads everything into memory, the system can run out of memory when a large number of instances and associations have been extracted.
  2. If the platform does not match, launching the web browser can cause an error.