« prev next »

I am not Computer Science Major

by Jian Shuo Wang (@jianshuo) on December 13, 2009 under Wendy

When I have to write algorithms related code, I feel the limitation of my knowledge. From time to time, I need to turn to Wendy for help. She does not like to code as I do, but she completed 6 years of under-graduate and graduate study in Computer Science from Jiaotong University. I didn't. My major was Automation, not pure computer science.

I have been puzzled by this for three days. Is there any chance that my readers can help me to solve this puzzle?

I have used query builder to create two structured query tree: q1, and q2.
I want to know whether the set represented by q1 is a subset of q2 or not.

Example in PHP:

$q1 = new AndQuery(new Query('id', 123), new Query('age', 13));
$q2 = new Query('age', 13);
echo $q1->isSubsetOf($q2);

I need the function of "isSubsetOf".

In the example, it should return true, but the real world situation is much complicated than that.

The need for such a function is, I have several search database optimized for each use, and each of them is represented by a query. For any user issued query, I can compare the query and the existing datastores. If I can confirm that the query result is subset of a datastore, I can safely use it to speed up the return of the results.

Related Entries: Wendy
  1. Wendy is Back via FM9116 May 22, 2010
  2. 7 Year Anniversary of Wedding March 16, 2010
  3. New Year Eve 2010 January 1, 2010
  4. I am not Computer Science Major December 13, 2009
  5. Milk Tea Business November 1, 2009
  6. Wendy Opened a Milk Tea Shop October 29, 2009
  7. Recording of Memories October 12, 2009
  8. End of a Holiday without Wendy May 30, 2009
  9. 6 Years of Marriage March 17, 2009
Comments

Hi Jianshuo,
I think this is typical data structure question. I recommend you to read something about it, it help for programming. Return to this question, I think it depend on how "query" storted in PHP. If you determine it, you can choice suitable method to search like bisection method, bubble method and so on. If I misunderstand your question, please skip my comment :-)

richard

Posted by: richard on December 15, 2009 8:56 AM

Jianshuo,

Is your Query some sort of results - i.e. a list of records you get from the DB as result of executing the query, or it is a structure describing the search condition?

If it's the first case, then the problem can be described as: Given two data sets S1 and S2, decide if S1 is the subset of S2. Richard's solution works, you will need to decide if every element in S1 is in S2. You can use different ways to optimize your solution, say if those elements are comparable, you can use sort; if those elements have a hash function, you can use hash, etc.

If it's the second case, then this is not as easy. Basically you will need to build a (query) condition tree, which describes your search condition. For q1 in your example, the tree is like: (id=123) AND (age=13), and for q2, the tree has one node: age=13. As you may already seen, each condition tree has its corresponding logic equation.

You will need to "compute" both equations, so as to get the factors, and use those factors to decide if one equation is the subset of the other. This may not be easy, but good luck. :)

Posted by: 美人她爹 on December 15, 2009 9:56 AM

richard, it IS a data structure question, and I have bothered many people to help on it and have dragged Wendy and other people into the problem challenge, but till now, I cannot find a good solution. For simple scenario, like either Q1, or Q2 is a plain query (without AND or OR), it is relatively simple, but if both Q1 and Q2 is very complicated query structure (several level of trees), and there are RangeQuery involved (like Q1 = age > 13 AND height > 170, Q2 = age > 11 AND height > 168), it can be hard.

Song, it is the second case - not a set, just a "query".

I am still working on it.

Posted by: Jian Shuo Wang on December 15, 2009 10:46 PM

Now I know the problem, there're systematic solutions.

Let's try to formally define your problem, let me know if I'm over-simplify it.

Define a query on a list of columns, where each column has a name, and each column's value range is from negative infinity to positive infinity. The query is an expression tree, where the non-leaf nodes are boolean expressions: NOT, AND and OR; and the leaf nodes are comparing expressions: (name comparator value), where name is the column name, comparator is >,=, there is no difference :)

Now let's deal with non-leaf nodes, first eliminate all NOT operators, then work with those ANDs and ORs until your express is a group of OR-ed factors, like this:

factor1 OR factor2 OR ...

and each factor is a list of AND-ed comparing expressions, like this:

(comp1.1 AND comp1.2 AND comp1.3 AND ...) OR (comp2.1 AND comp2.2 AND comp2.3 AND ...) OR ...

Those steps are standard boolean operations.

The 3rd step is to convert comparing expressions into value ranges, i.e.

column1 > 12 -> column1:(12, +infinity)
column2 column2:(-infinity, 24]

Now comes the tricky part: Make sure each column name appears in each factor once, and only once. Basically, you want each factor look like this:

(column1:(12,+infinity) AND column2:[53,66) AND column3:[42,+infinity))

If you found some column names are missing, it means their value rangs are (-infinity, +infinity), add their ranges to each factor that do not have their ranges.

Do the same thing for both your queries, now to decide if Q2 is a subset of Q1, for each factor in Q2, see if it is a subset of any factor in Q1, that's it.

All those steps can be done using a program.

If you are thinking visually (like myself), a table with N columns is actually a N-dimension space, and a query defines a list of portions in that space. And your problem is actually asking: can we fit the portion Q2 defines, into Q1's portion?

Posted by: 美人她爹 on December 16, 2009 1:52 AM

This problem is proved to be a NP-Hard problem, which is believed that there is no polynomial algorithm~

I wrote a post about this: http://zhiqiang.org/blog/science/computer-science/database-query-is-np-hard.html

Posted by: zhang on December 23, 2009 9:55 AM

jianshuo,

this is, in database term, the query containment problem (see http://en.wikipedia.org/wiki/Conjunctive_query#Formal_properties_of_conjunctive_queries). it is undecidable for relational algebra (and thus SQL) and np-complete for conjunctive queries. so you should try to look for heuristics or limit your query scenarios.

Posted by: congyu on December 27, 2009 9:29 PM

Hi Jianshuo,

I'm not sure if I have misunderstood the question, but to look at it simply, q1 has to be strictly logically stronger than q2. One would require to formally verify such property before making such claim. Are you after an actual implementation to test for such property?

@Song: converting an arbitrary boolean expression to DNF is NP-hard. Otherwise, a propositional SAT-solver can simply convert the problem to DNF and checking satisfiability would be trivial.

Cheers,
Jason

Posted by: dijkstrapuppy on January 14, 2010 6:52 PM

To tell the truth I'm glad to find out that I'm not the one. You see, I'm only a beginner in programming. As for me, I'm learning programming on my own. I found a lot of useful books at the rapidshare http://rapid4me.com , maybe they can help you too. Usually when I do not know anything I use books, then websites and only then I ask my friend for help. You know, inthe process of searching I may not find what I've been looking for but I find out a lot of new and it's great..

Posted by: Sam on January 16, 2010 2:36 AM

Not really familiar with PHP but more with ASP.

Posted by: Dreamer on March 16, 2010 12:15 PM

Check some specialized coders' social networks or facebook groups, for sure you'll finв some info there or search for some e-books on file search engines. good luck...

Posted by: rapidok on April 29, 2010 1:32 AM

dear sir
I'm not sure if I have misunderstood the question, but to look at it simply, q1 has to be strictly logically stronger than q2. One would require to formally verify such property before making such claim. Are you after an actual implementation to test for such property?

Posted by: Trailers for Sale on May 23, 2010 5:24 AM

dear sir
I'm not sure if I have misunderstood the question, but to look at it simply, q1 has to be strictly logically stronger than q2. One would require to formally verify such property before making such claim. Are you after an actual implementation to test for such property?

Posted by: Trailers for Sale on May 23, 2010 5:27 AM


I'm not sure if I have misunderstood the question, but to look at it simply, q1 has to be strictly logically stronger than q2. One would require to formally verify such property before making such claim. Are you after an actual implementation to test for such property?

Posted by: Trailers for Sale on May 23, 2010 5:29 AM

This action proof to be a win, win situation. This is a true art work, which will be a success story. The above statement is seen to be contradictory. The situation is very critical and need an experience complainer to resolve it. This conversation is going no where. It’s lacking the place of a good leader to head the things to come out on conclusion.
=================================================
New Technology

Posted by: tanygeo on June 7, 2010 6:44 PM
Post a comment
Name:

Email Address: (will not show)

URL: (optional)

Comments:


It may take up to 30 seconds before the server returns a result. IP address recorded.
Remember my information?
<-- Please click POST only once
© 2001 - 2009 Jian Shuo Wang. All right reserved.