This post was published on the rolia.net forum (枫下论坛).
since there are 1M records, my guess is the keyword set could have maybe 1000 words? at one bit per keyword, it will need about 125 bytes (1000 / 8) to save this big integer.
now on each record you will need to do an "and" on 125 bytes, which is probably about the same cost as comparing a blank string to a 125-byte string?
if the keyword set has 2000 words, then it's 250 bytes, 1M times.
so this solution only works for records with a small keyword set.
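A rough sketch of the big-integer idea, to make the cost concrete. The vocabulary and the helper names (`VOCAB`, `to_mask`, `matches`, `mask_bytes`) are made up for illustration, and it assumes a record matches when all of its keywords appear in the query, which is how step 5 of the tree approach reads. Python integers grow as needed, so one int can stand in for the 1000-bit mask.

```python
# Hypothetical vocabulary; a real one would hold ~1000 words.
VOCAB = ["air", "home", "based", "free", "job"]
BIT = {word: i for i, word in enumerate(VOCAB)}  # word -> bit position


def to_mask(words):
    """Pack the known keywords into one integer, one bit per vocabulary word."""
    mask = 0
    for w in words:
        if w in BIT:  # words outside the vocabulary are ignored
            mask |= 1 << BIT[w]
    return mask


def matches(record_mask, query_mask):
    """True when every keyword of the record also appears in the query."""
    return (record_mask & query_mask) == record_mask


def mask_bytes(vocab_size):
    """Bytes needed to store one mask for a vocabulary of this size."""
    return (vocab_size + 7) // 8
```

With this, `mask_bytes(1000)` is 125 and `mask_bytes(2000)` is 250, and the "and" has to touch that many bytes for each of the 1M records on every search.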
for a big keyword set, I think what you need is something like a small full-text index engine.
1. save all keywords to keywords array.
k[1]=air
k[2]=home ...
2. create a tree to save all records:
level 1: root
level 2: 2-keyword records, 3-keyword records, 4-keyword records, ...
level 3: first_keyword is 1, first_keyword is 2, ...
level 4: all the records, each as an array of keyword numbers, e.g.
2333927 [1 3 4 5 ]
2333433 [1 3 4 7]
3. when a search starts, convert the keywords to numbers: "home based free job whatever" becomes 2 4 7 9; "whatever" is not in the keywords list, so it is dropped automatically.
4. search only the level 2 branches for 2, 3 and 4 keywords (this might limit the records to 0.5M), then on level 3 only the branches where first_keyword is 2, 4, 7 or 9. if there are about 100 different first_keyword values, the result is about 0.5M * 4/100 = 20,000 records.
5. compare the 20,000 arrays with [2 4 7 9] to see whether each one is a subset. since it is all ordered number comparison, it should be faster than string comparison.
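
The five steps could be sketched roughly like this in Python. It is only an illustration under assumptions: nested dicts stand in for the tree, the names (`KEYWORD_ID`, `build_index`, `to_ids`, `is_sorted_subset`, `search`) are invented here, only the keyword numbers that appear in the post are filled in, and the records starting with 100000 are made-up test data. It also lets 1-keyword records through, although the tree in the post starts at 2 keywords.

```python
from collections import defaultdict

# Step 1: keyword -> number. Only the ids mentioned in the post are listed.
KEYWORD_ID = {"air": 1, "home": 2, "based": 4, "free": 7, "job": 9}


def build_index(records):
    """Step 2: index[keyword_count][first_keyword] -> [(record_id, sorted ids)]."""
    index = defaultdict(lambda: defaultdict(list))
    for rec_id, ids in records.items():
        ids = sorted(set(ids))
        if ids:
            index[len(ids)][ids[0]].append((rec_id, ids))
    return index


def to_ids(query):
    """Step 3: turn query words into sorted ids, dropping unknown words."""
    return sorted({KEYWORD_ID[w] for w in query.split() if w in KEYWORD_ID})


def is_sorted_subset(small, big):
    """Step 5: one merge-style pass; both lists sorted, no duplicates."""
    i = 0
    for x in big:
        if i < len(small) and small[i] == x:
            i += 1
    return i == len(small)


def search(index, query):
    """Step 4 then 5: visit only the plausible branches, then compare arrays."""
    q = to_ids(query)
    hits = []
    for count in range(1, len(q) + 1):  # records with at most len(q) keywords
        by_first = index.get(count)
        if not by_first:
            continue
        for first in q:  # first_keyword must be one of the query ids
            for rec_id, ids in by_first.get(first, ()):
                if is_sorted_subset(ids, q):
                    hits.append(rec_id)
    return sorted(hits)


RECORDS = {
    2333927: [1, 3, 4, 5],  # from the post
    2333433: [1, 3, 4, 7],  # from the post
    1000001: [2, 9],        # hypothetical
    1000002: [2, 4, 7, 9],  # hypothetical
    1000003: [4, 5],        # hypothetical
}
```

Calling `search(build_index(RECORDS), "home based free job whatever")` looks only at the branches whose first_keyword is 2, 4, 7 or 9, so the two records from the post (first_keyword 1) are never compared at all.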