When I have to write algorithms related code, I feel the limitation of my knowledge. From time to time, I need to turn to Wendy for help. She does not like to code as I do, but she completed 6 years of under-graduate and graduate study in Computer Science from Jiaotong University. I didn’t. My major was Automation, not pure computer science.

I have been puzzled by this for three days. Is there any chance that my readers can help me to solve this puzzle?

I have used query builder to create two structured query tree: q1, and q2.

I want to know whether the set represented by q1 is a subset of q2 or not.

Example in PHP:

$q1 = new AndQuery(new Query(‘id’, 123), new Query(‘age’, 13));

$q2 = new Query(‘age’, 13);

echo $q1->isSubsetOf($q2);

I need the function of “isSubsetOf”.

In the example, it should return true, but the real world situation is much complicated than that.

The need for such a function is, I have several search database optimized for each use, and each of them is represented by a query. For any user issued query, I can compare the query and the existing datastores. If I can confirm that the query result is subset of a datastore, I can safely use it to speed up the return of the results.

Hi Jianshuo,

I think this is typical data structure question. I recommend you to read something about it, it help for programming. Return to this question, I think it depend on how “query” storted in PHP. If you determine it, you can choice suitable method to search like bisection method, bubble method and so on. If I misunderstand your question, please skip my comment :-)

richard

Jianshuo,

Is your Query some sort of results – i.e. a list of records you get from the DB as result of executing the query, or it is a structure describing the search condition?

If it’s the first case, then the problem can be described as: Given two data sets S1 and S2, decide if S1 is the subset of S2. Richard’s solution works, you will need to decide if every element in S1 is in S2. You can use different ways to optimize your solution, say if those elements are comparable, you can use sort; if those elements have a hash function, you can use hash, etc.

If it’s the second case, then this is not as easy. Basically you will need to build a (query) condition tree, which describes your search condition. For q1 in your example, the tree is like: (id=123) AND (age=13), and for q2, the tree has one node: age=13. As you may already seen, each condition tree has its corresponding logic equation.

You will need to “compute” both equations, so as to get the factors, and use those factors to decide if one equation is the subset of the other. This may not be easy, but good luck. :)

richard, it IS a data structure question, and I have bothered many people to help on it and have dragged Wendy and other people into the problem challenge, but till now, I cannot find a good solution. For simple scenario, like either Q1, or Q2 is a plain query (without AND or OR), it is relatively simple, but if both Q1 and Q2 is very complicated query structure (several level of trees), and there are RangeQuery involved (like Q1 = age > 13 AND height > 170, Q2 = age > 11 AND height > 168), it can be hard.

Song, it is the second case – not a set, just a “query”.

I am still working on it.

Now I know the problem, there’re systematic solutions.

Let’s try to formally define your problem, let me know if I’m over-simplify it.

Define a query on a list of columns, where each column has a name, and each column’s value range is from negative infinity to positive infinity. The query is an expression tree, where the non-leaf nodes are boolean expressions: NOT, AND and OR; and the leaf nodes are comparing expressions: (name comparator value), where name is the column name, comparator is >,=, there is no difference :)

Now let’s deal with non-leaf nodes, first eliminate all NOT operators, then work with those ANDs and ORs until your express is a group of OR-ed factors, like this:

factor1 OR factor2 OR …

and each factor is a list of AND-ed comparing expressions, like this:

(comp1.1 AND comp1.2 AND comp1.3 AND …) OR (comp2.1 AND comp2.2 AND comp2.3 AND …) OR …

Those steps are standard boolean operations.

The 3rd step is to convert comparing expressions into value ranges, i.e.

column1 > 12 -> column1:(12, +infinity)

column2 column2:(-infinity, 24]

Now comes the tricky part: Make sure each column name appears in each factor once, and only once. Basically, you want each factor look like this:

(column1:(12,+infinity) AND column2:[53,66) AND column3:[42,+infinity))

If you found some column names are missing, it means their value rangs are (-infinity, +infinity), add their ranges to each factor that do not have their ranges.

Do the same thing for both your queries, now to decide if Q2 is a subset of Q1, for each factor in Q2, see if it is a subset of any factor in Q1, that’s it.

All those steps can be done using a program.

If you are thinking visually (like myself), a table with N columns is actually a N-dimension space, and a query defines a list of portions in that space. And your problem is actually asking: can we fit the portion Q2 defines, into Q1’s portion?

This problem is proved to be a NP-Hard problem, which is believed that there is no polynomial algorithm~

I wrote a post about this: http://zhiqiang.org/blog/science/computer-science/database-query-is-np-hard.html

jianshuo,

this is, in database term, the query containment problem (see http://en.wikipedia.org/wiki/Conjunctive_query#Formal_properties_of_conjunctive_queries). it is undecidable for relational algebra (and thus SQL) and np-complete for conjunctive queries. so you should try to look for heuristics or limit your query scenarios.

Hi Jianshuo,

I’m not sure if I have misunderstood the question, but to look at it simply, q1 has to be strictly logically stronger than q2. One would require to formally verify such property before making such claim. Are you after an actual implementation to test for such property?

@Song: converting an arbitrary boolean expression to DNF is NP-hard. Otherwise, a propositional SAT-solver can simply convert the problem to DNF and checking satisfiability would be trivial.

Cheers,

Jason

To tell the truth I’m glad to find out that I’m not the one. You see, I’m only a beginner in programming. As for me, I’m learning programming on my own. I found a lot of useful books at the rapidshare http://rapid4me.com , maybe they can help you too. Usually when I do not know anything I use books, then websites and only then I ask my friend for help. You know, inthe process of searching I may not find what I’ve been looking for but I find out a lot of new and it’s great..

Not really familiar with PHP but more with ASP.

Check some specialized coders’ social networks or facebook groups, for sure you’ll finв some info there or search for some e-books on file search engines. good luck…

dear sir

I’m not sure if I have misunderstood the question, but to look at it simply, q1 has to be strictly logically stronger than q2. One would require to formally verify such property before making such claim. Are you after an actual implementation to test for such property?

dear sir

I’m not sure if I have misunderstood the question, but to look at it simply, q1 has to be strictly logically stronger than q2. One would require to formally verify such property before making such claim. Are you after an actual implementation to test for such property?

This action proof to be a win, win situation. This is a true art work, which will be a success story. The above statement is seen to be contradictory. The situation is very critical and need an experience complainer to resolve it. This conversation is going no where. It’s lacking the place of a good leader to head the things to come out on conclusion.

=================================================

New Technology