- All items in a set A have one or more properties
- The items in a subset B of this set also have these properties, but it is unclear if items in subset B can be distinguished via the values of these properties from the items not in B.
- So, first, identify which items in B have common values for one or more of these properties, the lower approximation to B.
- And, second, identify which items in A are somewhat alike those in B.
Name Education Age Chance for a job? -------- ----------- --------- ----------------- Joe High School Old No Mary High School Young Yes Peter Elementary Young No Paul University Young Yes Cathy Doctorate Old YesWhat can we deduce about the set of people with good job perspectives? We try with: education only, age only, and both.}
# # Set up the database # set props(Joe) {HighSchool Old No} set props(Mary) {HighSchool Young Yes} set props(Peter) {Elementary Young Yes} set props(Paul) {University Young Yes} set props(Cathy) {Doctorate Old Yes} # # Procedure to determine the equivalence classes # proc equivClasses {which_props} { global props global class array unset class * foreach person [array names props] { set cvalues {} foreach propno $which_props { lappend cvalues [lindex $props($person) $propno] } lappend class($cvalues) $person } } # # Print the results so far # puts "Equivalence classes:" # # Only education # puts "Education only - Mary, Paul, Cathy?" equivClasses 0 parray class # # Only age # puts "\nAge only - Mary, Paul, Cathy?" equivClasses 1 parray class # # Education and age # puts "\nEducation and age - Mary, Paul, Cathy?" equivClasses [list 0 1] parray classif { 0 } { Studying the output, we see, that no criterium has an equivalence class that is identical to the complete set of successful persons.So, now the next step: can we find a criterium that gives a "good" fit? For this, we look for equivalence classes that have only successful people. }
# # Determine the set of classes that contain only persons from the # group we consider # proc subsetOnlyClasses {subset} { global class set equiv_classes {} set found_persons {} foreach c [array names class] { set okay 1 foreach p $class($c) { if { [lsearch $subset $p] < 0 } { set okay 0 break } } if { $okay } { set found_persons [concat $found_persons $class($c)] lappend equiv_classes $c } } set found_persons [lsort -unique $found_persons] return [list $found_persons $equiv_classes] } puts "Classes with successful persons only:" # # Education? # puts "\nEducation only?" equivClasses 0 puts "[subsetOnlyClasses {Mary Paul Cathy}]" # # Age? # puts "\nAge only?" equivClasses 1 puts "[subsetOnlyClasses {Mary Paul Cathy}]" # # Education and age? # puts "\nEducation and age?" equivClasses {0 1} puts "[subsetOnlyClasses {Mary Paul Cathy}]"if { 0 } { From this output we see that education only can explain the job chances for Cathy and Paul, but not for Mary. Combining education with age solves this: these two "explain" the subset B of people with a good chance for a job exactly.What however about the classes that might play a role? That is the last part ... }
# # Classes that possibly play any role # proc possibleClasses {subset} { global class set equiv_classes {} set found_persons {} foreach c [array names class] { set okay 0 foreach p $class($c) { if { [lsearch $subset $p] >= 0 } { set okay 1 break } } if { $okay } { set found_persons [concat $found_persons $class($c)] lappend equiv_classes $c } } set found_persons [lsort -unique $found_persons] return [list $found_persons $equiv_classes] } puts "Classes with at least one successful person:" # # Education? # puts "\nEducation only?" equivClasses 0 puts "[possibleClasses {Mary Paul Cathy}]" # # Age? # puts "\nAge only?" equivClasses 1 puts "[possibleClasses {Mary Paul Cathy}]" # # Education and age? # puts "\nEducation and age?" equivClasses {0 1} puts "[possibleClasses {Mary Paul Cathy}]"if { 0 } { I probably should have used different names for the procedures - subsetOnlyClasses seems to deal with possible classes and possibleClasses with necessary classes. Oh well ...Conclusion:
- Age alone has no predictive capability (phew!)
- Education does, but not exactly: Cathy and Paul have a high enough education, it would seem, but Mary not. So the lower approximation is {Cathy Paul} and the upper approximation is {Cathy Paul, Mary, Joe}
- Education and age together give an exact prediction
AM (9 july 2009) I found this article on an extension to Tcl to deal with rough sets:
Rough set extension of Tcl for data mining Knowledge-Based Systems, Volume 11, Issues 3-4, 12 November 1998, Pages 249-253 G. Griffin, Z. Chen
GAM -- 12 July 2009See Rough sets a la Relations for one relational take on this very interesting idea.
[ Category Concept ]