Semex Download
Semex, short for SEMantics EXplorer, is a system that helps you organize your
personal information in a semantically meaningful way. Rather than considering
your personal information as isolated files spread over the disk, Semex views it
as a network consisting of instances, such as persons, articles, emails and
pictures, and the associations between these instances, such as authors,
coauthors, and email-senders. With this logical view of personal information,
Semex provides an enriched desktop search experience.
Features
Semex has the following features:
- Automatic instance-and-association extraction
Semex scans your hard disk or the specified directories and considers the
following file formats:
- Emails: Outlook emails, Pine mails
- LATEX files: .tex, .bib, .bbl
- MS office files: .doc, .ppt
- Text files: .txt, .pdf, .ps, .html/.htm
- Images: .gif, .jpg/.jpeg, .png, .tiff, .eps, .bmp
From them Semex extracts the following types of instances:
- Person
- Message
- Article
- Presentation
- Conference
- Journal
- Image
- Document
- Webpage
- Reference reconciliation
Semex automatically reconciles the extracted instances that refer to the
same real-world instance. For example, it can recognize the person names and
email addresses that refer to the same real-world person, and treat them as
attributes of a single person instance.
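
The idea of merging references into a single person instance can be illustrated with a minimal sketch. It is not Semex's actual reconciliation algorithm, which is not described here; it assumes, purely for illustration, that two references denote the same person whenever they share an exact (case-insensitive) name or email address. The function name `reconcile` and the record layout are hypothetical.

```python
from collections import defaultdict


def reconcile(references):
    """Group references that share a name or an email address.

    `references` is a list of dicts with optional "name" and "email" keys.
    Returns a list of merged person records, each holding sets of names
    and emails. Exact matching is an assumption made for this sketch.
    """
    parent = list(range(len(references)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Link every reference to the first reference seen with the same value.
    first_seen = {}
    for idx, ref in enumerate(references):
        for field in ("name", "email"):
            value = ref.get(field)
            if not value:
                continue
            key = (field, value.strip().lower())
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    # Collect the attributes of each connected group into one person record.
    groups = defaultdict(lambda: {"names": set(), "emails": set()})
    for idx, ref in enumerate(references):
        person = groups[find(idx)]
        if ref.get("name"):
            person["names"].add(ref["name"])
        if ref.get("email"):
            person["emails"].add(ref["email"])
    return list(groups.values())


refs = [
    {"name": "Alice Smith", "email": "alice@example.org"},
    {"name": "A. Smith", "email": "alice@example.org"},
    {"name": "Alice Smith"},
    {"name": "Bob Jones", "email": "bob@example.org"},
]
people = reconcile(refs)
```

In this example the first three references collapse into one person with two names and one email address, while the fourth stays separate. A real reconciler would also need approximate matching (abbreviations, typos) and evidence from associations.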
- Association navigation
Semex lists all classes of instances extracted from the personal data. When the
user poses a query (or selects one class from the list), Semex classifies the
returned objects into their classes. The user can select a particular object
instance to see
detailed information, including its attribute values and its associated
instances, which have been grouped by associations. The user can then browse
the data by following association links, much like web browsing.
- Keyword search
Semex supports a search mechanism that is more intelligent than simple keyword
search. For example, when searching for the keyword “schema matching”, Semex
returns not only the instances representing papers and presentations that
contain “schema matching” in the text, but also instances representing people
working in this area, conferences and journals that have published papers on
this topic, etc. In addition, Semex goes to the Web and returns webpages that
are relevant to the keyword search.
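
The way a keyword search can reach beyond direct text matches may be easier to see in a small sketch. It assumes a hypothetical in-memory layout (a dict of instances with a `text` field and a list of association pairs) and only follows associations one step; it does not cover the Web lookup, and it is not Semex's actual search implementation.

```python
def keyword_search(keyword, instances, associations):
    """Return (direct, related) sets of instance ids for a keyword.

    `direct` holds instances whose text contains the keyword; `related`
    holds instances one association away from a direct match.
    """
    needle = keyword.lower()
    direct = {
        inst_id
        for inst_id, inst in instances.items()
        if needle in inst["text"].lower()
    }
    related = set()
    for a, b in associations:
        if a in direct:
            related.add(b)
        if b in direct:
            related.add(a)
    return direct, related - direct


instances = {
    "paper1": {"class": "Article", "text": "A survey of schema matching"},
    "talk1": {"class": "Presentation", "text": "Schema Matching in practice"},
    "paper2": {"class": "Article", "text": "Query optimization"},
    "alice": {"class": "Person", "text": "Alice Smith"},
    "bob": {"class": "Person", "text": "Bob Jones"},
    "conf_a": {"class": "Conference", "text": "Conference A"},
}
associations = [
    ("paper1", "alice"),
    ("paper1", "conf_a"),
    ("paper2", "bob"),
]
direct, related = keyword_search("schema matching", instances, associations)
```

Here `direct` contains the paper and the presentation that mention the keyword, and `related` adds the author and the conference connected to the matching paper.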
- Advanced search
As a more complex search mechanism, the user can compose predicate
queries and triple queries using the Semex interface.
Predicate query:
A predicate query describes a desired instance
by a set of predicates, each of which describes either an attribute value or
an associated instance. Semex returns all instances in the association network
that satisfy all or some of the predicates in the query. In addition, Semex
searches unstructured data in the personal information space and searches
the Web to return text documents and webpages that are relevant to the
query.
Triple query:
A triple query is a conjunctive query over
triplets, each triplet describing an association between a pair of objects
or an attribute of an object. It is more powerful than a predicate query in
that it can describe a chain of associations, where the length of the chain
is larger than one. Semex searches over personal data and the Web to
answer triple queries.
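
To make the notion of a conjunctive query over triplets concrete, here is a minimal sketch of a matcher over an in-memory list of triples. The query syntax (terms starting with `?` are variables), the function name `match_triples`, and the sample data are assumptions for illustration; Semex's own query interface and evaluation strategy may differ.

```python
def match_triples(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns.

    Terms starting with "?" are variables; all other terms are constants.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must keep the value it was bound to earlier.
                if extended.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            # Continue with the remaining patterns under the new binding.
            yield from match_triples(rest, triples, extended)


triples = [
    ("paper1", "authoredBy", "alice"),
    ("paper1", "publishedIn", "conf_a"),
    ("paper2", "authoredBy", "bob"),
    ("paper2", "publishedIn", "conf_b"),
    ("paper1", "title", "Schema Matching Basics"),
]
# Chain of two associations: alice <- paper -> venue.
query = [
    ("?paper", "authoredBy", "alice"),
    ("?paper", "publishedIn", "?venue"),
]
results = list(match_triples(query, triples))
```

The query asks for the venues of papers authored by `alice`. Because the variable `?paper` is shared by both triplets, the matcher follows a chain of two associations, which a single predicate on one instance could not express.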
- Ranking
Semex provides three options for ranking: by relevance score, by importance
score, and by timeline.
By default, the returned instances are ranked by their relevance to the
query, computed in a way close to the TF/IDF measure. A user can choose to
rank the returned instances according to how important they are in the
personal information space, computed in a way close to PageRank. In
addition, a user can choose to rank certain instances, such as Articles and
Emails, by their latest modification time.
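
As a rough illustration of an importance score "close to PageRank", the sketch below runs a standard PageRank-style power iteration over an association graph. It assumes that associations are undirected, that the damping factor is 0.85, and that a fixed number of iterations is enough; none of these details come from Semex, and the function name `importance_scores` is hypothetical.

```python
def importance_scores(edges, damping=0.85, iterations=50):
    """Compute PageRank-style scores over an undirected association graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    n = len(neighbors)
    scores = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node in neighbors:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(
                scores[m] / len(neighbors[m]) for m in neighbors[node]
            )
            updated[node] = (1 - damping) / n + damping * incoming
        scores = updated
    return scores


edges = [
    ("alice", "paper1"),
    ("alice", "paper2"),
    ("alice", "email1"),
    ("bob", "paper1"),
]
scores = importance_scores(edges)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph `alice` has the most associations and ends up ranked first, followed by `paper1`. Ranking by timeline would instead sort the same instances by a modification timestamp.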
- Lineage information
For each instance in the personal information space, Semex finds out how
it is associated with the data owner. For example, given a person instance,
the user can quickly figure out how she knows the person, when she first
heard about this person, and when she last contacted this person.
- Tell me in my own words
Semex offers an interface through which the user can browse a webpage, and
extract instances from the webpage. In other words, given an external data
source, Semex will look for the instances that already exist in the user's
information space. For example, Semex will look for persons, papers, and cities
that the user already knows about.
How to ...
- Download and execute Semex
You will see the interface as below (slightly different for Linux and Mac).
Fill in your name and email.

- Extract instances and associations from your hard disk
- Reference reconciliation.
- One can choose to do reconciliation at extraction time. Reconciliation
can take a long time; however, it is a one-time process and can
significantly improve querying and browsing.

- If the user does not do reconciliation at extraction time, she or he can
run it later from the menu by selecting Collect Data -> Reference
Reconciliation.
- Navigate your personal data
- Click on a class type in the left pane and you will see a list of
instances in the middle pane.
- Click on an instance in the middle pane and you will see its attributes
in the right pane.
- Click on a plus mark in the middle pane and you will expand the tree and
see associated instances.
- Double-click on a file (PDI) instance in the middle pane and Semex
will launch the default application for that specific file type.
- Search your personal data
- Rank the results: By default, search results are ranked by
their relevance to the query. One can also choose to rank the results by
timestamp by clicking the "Timestamp" button, or rank the results by
importance by clicking the "Importance" button in the query bar.
- Find out lineage information
- Shortest lineage: In the middle pane, hover over an instance and Semex
will show the shortest association path between the instance and the
instance that represents the owner of the dataset.
- Earliest/Latest lineage: In the right pane, click the "Earliest
Lineage" or "Latest Lineage" button, and Semex will show a graph depicting the
association path with the earliest timestamp or the association path with
the latest timestamp.
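
The shortest lineage can be thought of as a shortest path in the association network. The sketch below uses breadth-first search over labeled associations, treating every association as traversable in both directions. The edge format, the function name `shortest_lineage`, and the sample data are assumptions for illustration and are not taken from Semex.

```python
from collections import deque


def shortest_lineage(edges, owner, target):
    """Return the shortest association path from owner to target, or None.

    `edges` is a list of (instance, association_label, instance) tuples.
    The result alternates instances and labels, starting at the owner.
    """
    neighbors = {}
    for a, label, b in edges:
        # Associations are treated as traversable in both directions.
        neighbors.setdefault(a, []).append((label, b))
        neighbors.setdefault(b, []).append((label, a))

    previous = {owner: None}
    queue = deque([owner])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for label, nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, label)
                queue.append(nxt)

    if target not in previous:
        return None

    # Walk back from the target to the owner to rebuild the path.
    path = [target]
    node = target
    while previous[node] is not None:
        node, label = previous[node]
        path.extend([label, node])
    path.reverse()
    return path


edges = [
    ("me", "sender", "email1"),
    ("email1", "recipient", "carol"),
    ("me", "author", "paper1"),
    ("paper1", "author", "dave"),
    ("dave", "coauthor", "carol"),
]
lineage = shortest_lineage(edges, "me", "carol")
```

Here the owner reaches `carol` through a single email rather than through the longer coauthor chain. The earliest or latest lineage would need timestamps on the associations and a different path criterion.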
- Browse external websites
Click "Web Browser" in the left pane, and a web browser will be launched
in the upper-right pane and instances associated with the webpage will be
shown in the lower-right pane. Click the "Extract" button and Semex will
extract instances that have occurred in the personal information space, and
refresh the results in the lower-right pane.

Note
The current version of Semex has the following problems:
- The system uses the Jena engine to store the data. Since Jena
reads everything into memory, the system can run out of memory if
there is a large number of extracted instances and associations.
- If the platform does not match, launching the web browser can cause
an error.