 Methods and apparatus for estimating similarity

Details
Inventors: Charikar, Moses Samson
Assignee: Google, Inc. (Mountain View, CA)
Primary Examiner: Robinson; Greta
Assistant Examiner: Dodd, Jr.; Harold E.
Attorney, Agent or Firm: Harrity Snyder LLP

A similarity engine generates compact representations of objects called sketches. Sketches of different objects can be compared to determine the similarity between the two objects. The sketch for an object may be generated by creating a vector corresponding to the object, where each coordinate of the vector is associated with a corresponding weight. The weight associated with each coordinate in the vector is multiplied by a predetermined hashing vector to generate a product vector, and the product vectors are summed. The similarity engine may then generate a compact representation of the object based on the summed product vector.

DETAILED DESCRIPTION
Systems and methods consistent with the present invention address this and other needs by providing a similarity engine that generates compact representations of objects that can be compared to determine similarity between the objects.
In one aspect, the present invention is a method for generating a compact representation of an object.
The method includes generating a vector corresponding to the object, each coordinate of the vector being associated with a corresponding weight, and multiplying the weight associated with each coordinate in the vector by a corresponding hashing vector to generate a product vector.
The method further includes summing the product vectors and generating the compact representation of the object using the summed product vectors.
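The steps of this first method can be illustrated with a minimal sketch. It is not the claimed implementation. The function names, the use of SHA-256 to derive each hashing vector, the restriction of hashing-vector entries to -1 and +1, and the rule of keeping only the sign of each summed coordinate are all assumptions made for illustration; the text above states only that the compact representation is generated using the summed product vectors.

```python
import hashlib
from typing import Dict, List


def hashing_vector(feature: str, bits: int = 64) -> List[int]:
    """Hypothetical hashing vector for one coordinate, with entries in {-1, +1}.

    Assumption: the vector is derived from a SHA-256 digest of the feature name.
    """
    if not 0 < bits <= 256:
        raise ValueError("bits must be between 1 and 256 for a SHA-256 digest")
    value = int.from_bytes(hashlib.sha256(feature.encode("utf-8")).digest(), "big")
    return [1 if (value >> i) & 1 else -1 for i in range(bits)]


def sketch(weights: Dict[str, float], bits: int = 64) -> int:
    """Build a compact representation from a weighted vector of features."""
    summed = [0.0] * bits
    for feature, weight in weights.items():
        hv = hashing_vector(feature, bits)
        for i in range(bits):
            # Product vector (weight times hashing vector), accumulated into the sum.
            summed[i] += weight * hv[i]
    # Assumption: one bit per coordinate, set when the summed value is positive.
    result = 0
    for i, total in enumerate(summed):
        if total > 0:
            result |= 1 << i
    return result
```

Under these assumptions, a coordinate with a large weight dominates the sign of each summed entry, so heavily weighted features have the most influence on the resulting bits.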
A second method consistent with the present invention includes creating a similarity sketch for each of first and second objects based on the application of a hashing function to a vector representation of the first and second objects.
Additionally, the method compares, on a bit-by-bit basis, the similarity sketches for the first and second objects, and generates a value defining the similarity between the first and second objects based on a correspondence in the bit-by-bit comparison.
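The bit-by-bit comparison can be sketched as follows, assuming each similarity sketch is stored as an integer of a fixed bit length and that the similarity value is the fraction of bit positions in which the two sketches agree. The function name, the integer representation, and the choice of a fraction as the output are illustrative assumptions, not details taken from the text above.

```python
def sketch_similarity(sketch_a: int, sketch_b: int, bits: int = 64) -> float:
    """Return the fraction of the `bits` positions at which two sketches agree."""
    mask = (1 << bits) - 1
    # XOR leaves a 1 in every position where the two sketches differ.
    differing = bin((sketch_a ^ sketch_b) & mask).count("1")
    return (bits - differing) / bits
```

For example, two 4-bit sketches 0b1111 and 0b1110 agree in three of four positions, so the function returns 0.75.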
Another aspect of the present invention is directed to a server that includes at least one processor, a database containing a group of objects, and a memory operatively coupled to the processor.
The memory stores program instructions that, when executed by the processor, cause the processor to remove similar objects from the database by comparing similarity sketches of pairs of objects in the database and removing one of the objects of the pair when the comparison indicates that the pair of objects is more similar than a threshold level of similarity.
The processor generates the similarity sketches for each object of the pair based on an application of a hashing function to vector representations of the objects.
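The removal of similar objects can be sketched as below, assuming the sketches have already been computed and are held in an in-memory dictionary standing in for the database. The function names, the greedy policy of keeping the first object of each similar pair, and the bit-agreement similarity measure are assumptions for illustration only.

```python
from typing import Dict


def bit_agreement(sketch_a: int, sketch_b: int, bits: int) -> float:
    """Fraction of bit positions at which two sketches agree (assumed measure)."""
    mask = (1 << bits) - 1
    return (bits - bin((sketch_a ^ sketch_b) & mask).count("1")) / bits


def remove_similar(sketches: Dict[str, int], threshold: float, bits: int = 64) -> Dict[str, int]:
    """Keep an object only if it is not more similar than `threshold` to any kept object."""
    kept: Dict[str, int] = {}
    for name, current in sketches.items():
        # Compare against every object kept so far; drop the newcomer on a close match.
        if all(bit_agreement(current, other, bits) <= threshold for other in kept.values()):
            kept[name] = current
    return kept
```

This pairwise loop is quadratic in the number of objects, which is adequate for illustration but would likely need an index or bucketing scheme for a large database.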
Yet another aspect of the invention is directed to a method for generating a compact representation of a first object



Related patents
  Attribute inductive data analysis
The invention generally features a system and method for determining which populations in an attribute characterized data set best maximize the significance of ...
  Restaurant directory and marketing system
Accordingly, it is the general purpose and object of the present invention to provide a system imbedded on a computer network for convenient searching of a restaurant. O...
  Arrangement for varying the rate of recording of information
What is claimed as new and desired to be protected by Letters Patent is set forth in the appended claims: 1. A method for changing the recording speed of a signal from a ...
  Generalized system for generating computer programs
OF THE PREFERRED EMBODIMENT As a general overview, in the present invention, four simultaneous processes function in compliance with two constraints to produce the ...
  Extensible entity management system including a dispatching kernel and modules which independently interpret and execute commands
The invention provides a new and improved control arrangement for controlling and monitoring a complex system, such as a distributed digital data processing system in ...
  Method and system for minimizing the effects of disruptive hardware actions in a data processing system
What is claimed is: 1. A method for minimizing the effects of disruptive hardware actions in a data processing system having an operating system, said method comprising: ...
  Method and apparatus for handling multiple level-triggered and edge-triggered interrupts
In one aspect of the present invention, a method is provided for determining whether the highest priority pending interrupt needing service is an active level-triggered ...
  Method and system for global optimization of device allocation
What is claimed is: 1. An automated, computer implemented method for allocating devices in order to satisfy requests for said devices, said method comprising the steps ...
  Method for extending a fourth generation programming language
What is claimed is: 1. A method for extending the functionality of an existing programming language comprising the steps of: (a) defining an extension class to perform ...
  Inhibit circuit for a differential amplifier
What is claimed is: 1. An inhibit circuit for a differential amplifier, comprising: a pair of non-additive combiners, each having an output connected to a respective ...

Copyright (c)2006 Eipa-patents.org - All rights reserved