MOSTPHOTOS - CASE STUDY

Mostphotos (MP) is a fast-growing stock photography marketplace that receives thousands of new user-uploaded photos every day. To connect clients with the images they’re looking for, Mostphotos needed a way to tag all of these photos consistently and make them easily searchable in its constantly growing bank of 17 million images.

CHALLENGES AND SOLUTIONS

INCONSISTENT TAGGING

[Figure: example of user-supplied tags for a photo: “furry golden cute face mammal”]

Because most of MP’s photos come from individual photographers, the attached tags and metadata are often inconsistent. For example, an image of a dog might be tagged “dog”, “cute furry tailed barker” or even “cat”. MP needed a solution that could “see” each image and apply the appropriate tags accurately and consistently. Inconsistent tags not only reduced overall search accuracy but also made it difficult to filter and flag inappropriate content based on keywords.

Image Rank addresses this with a keyword relevancy scoring model built on a word vector (word embedding) model. Because each word is represented as a vector, the model can measure the distance between any two words in its vocabulary, which captures word similarity and contextual relationships.
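As a rough illustration of how a word vector model measures the distance between words, the sketch below computes cosine similarity between word vectors and turns it into a simple keyword relevancy score. It is not Image Rank’s implementation: the tiny three-dimensional vectors in `WORD_VECTORS` are made up (real embeddings are learned from text and have hundreds of dimensions), and the scoring rule in the hypothetical `keyword_relevancy` function, averaging each query term’s best match among an image’s tags, is an assumption.

```python
import math

# Hypothetical toy embeddings; real word vectors are learned and much larger.
WORD_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.2, 0.05],
    "cat": [0.6, 0.7, 0.0],
    "beach": [0.0, 0.1, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_relevancy(query_terms, image_tags, vectors=WORD_VECTORS):
    """Average, over query terms, of each term's best similarity to any image tag.

    Query terms and tags missing from the vocabulary are skipped (an assumption).
    """
    scores = []
    for term in query_terms:
        if term not in vectors:
            continue
        best = max(
            (cosine_similarity(vectors[term], vectors[tag])
             for tag in image_tags if tag in vectors),
            default=0.0,
        )
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

With these toy vectors, a search for “dog” scores an image tagged “puppy” above one tagged “cat”, and both above one tagged “beach”, even though none of the tags match the query exactly.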

Image Rank also offers Image Captioning, which can automatically tag an image or recommend tags for it. This reduces the errors caused by inconsistent tags and improves search accuracy.
In addition, Image Rank includes an Image Classifier that is used to flag inappropriate content and to recommend images based on user preferences.
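The sketch below shows one plausible way captioning and classifier outputs could be turned into tag recommendations and content flags. The `ImagePrediction` structure, the per-tag confidence scores and both thresholds are hypothetical, and model tags are assumed to be lowercase; the models that would produce these numbers are not shown.

```python
from dataclasses import dataclass


@dataclass
class ImagePrediction:
    """Hypothetical model outputs for one image."""
    tag_confidences: dict[str, float]  # tag -> confidence from a captioning/tagging model
    inappropriate_prob: float          # classifier probability that the image is inappropriate


def recommend_tags(user_tags, prediction, min_confidence=0.5):
    """Suggest model tags above a confidence threshold that the user has not already supplied.

    Suggestions are returned most confident first.
    """
    existing = {tag.strip().lower() for tag in user_tags}
    ranked = sorted(prediction.tag_confidences.items(), key=lambda kv: kv[1], reverse=True)
    return [tag for tag, conf in ranked if conf >= min_confidence and tag not in existing]


def should_flag(prediction, threshold=0.8):
    """Flag the image for moderation if the classifier is confident it is inappropriate."""
    return prediction.inappropriate_prob >= threshold
```

Keeping the thresholds as parameters lets moderators trade recall for precision: a lower flag threshold catches more inappropriate images at the cost of more false alarms.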

HIGHLY DYNAMIC

MP’s content is constantly expanding and highly dynamic, and its user base keeps growing. Thousands of new images from new photographers need to be scored quickly and ranked high enough for those photographers to make a sale. At the same time, highly ranked images must be rescored periodically so that the content stays fresh and relevant to current trends and seasons.

Image Rank uses several machine learning algorithms to learn image trends from features such as age, views and sales. It keeps track of each photographer’s past sales and uploads and uses them to score that photographer’s future content. New users get a boost for their images. In addition, the platform uses various forms of user feedback, such as search impressions, manual votes and reports of irrelevant images, to continually improve search.
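To make the idea of scoring new content from a photographer’s history more concrete, here is a minimal sketch that assumes two things the case study does not specify: a smoothed sales-per-upload rate (an additive-smoothing, or Bayesian-average, estimate) as the photographer prior, and a new-contributor boost that decays exponentially with the number of uploads. The function names, the site-wide rate and all constants are hypothetical.

```python
def contributor_prior(past_sales, past_uploads, site_rate=0.02, prior_weight=50):
    """Smoothed sales-per-upload rate for a photographer.

    Behaves as if every photographer starts with `prior_weight` pseudo-uploads that
    sell at the hypothetical site-wide rate `site_rate`, so a short history stays
    close to the site average and a long history is dominated by the photographer's
    own rate.
    """
    return (past_sales + site_rate * prior_weight) / (past_uploads + prior_weight)


def new_contributor_boost(past_uploads, max_boost=1.5, half_life_uploads=20):
    """Multiplier that starts at `max_boost`; its excess over 1.0 halves every `half_life_uploads`."""
    return 1.0 + (max_boost - 1.0) * 0.5 ** (past_uploads / half_life_uploads)


def initial_score(past_sales, past_uploads):
    """Starting score for a new image before it has any views or sales of its own."""
    return contributor_prior(past_sales, past_uploads) * new_contributor_boost(past_uploads)
```

The smoothing keeps a photographer with only a handful of uploads from being judged on too little data, while the decaying boost gives new users visibility that fades as their own track record builds up.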

VALUE EXTRACTION

MP is a marketplace, so it serves both photographers and buyers. This model depends on how effectively a buyer finds the image she is looking for through the site’s search. Image Rank uses data points such as views, sales and contributor statistics to estimate a quality score for each image. To capture how an image’s performance changes over time, the rate of growth in its views, sales and search impressions also plays an important role, acting as feedback when the image’s score is estimated.
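A minimal sketch of a quality score that mixes current levels (views, sales, contributor statistics) with growth rates might look like the following. The `ImageStats` fields, the two-window comparison, the log transforms and the hand-set `WEIGHTS` are all assumptions for illustration; a production system would learn its weights from data.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageStats:
    """Hypothetical per-image counts for the most recent window and the one before it."""
    views_recent: int
    views_previous: int
    sales_recent: int
    sales_previous: int
    impressions_recent: int
    impressions_previous: int
    contributor_score: float  # e.g. a contributor-level score scaled to [0, 1]


def growth_rate(recent, previous):
    """Log growth between windows; +1 smoothing avoids dividing by or taking the log of zero."""
    return math.log((recent + 1) / (previous + 1))


# Hypothetical hand-set weights; a real system would learn these.
WEIGHTS = {
    "views": 0.2,
    "sales": 0.5,
    "contributor": 0.3,
    "views_growth": 0.1,
    "sales_growth": 0.2,
    "impressions_growth": 0.1,
}


def quality_score(s):
    # Current level of activity, log-scaled so a few huge sellers don't dominate.
    level = (
        WEIGHTS["views"] * math.log1p(s.views_recent)
        + WEIGHTS["sales"] * math.log1p(s.sales_recent)
        + WEIGHTS["contributor"] * s.contributor_score
    )
    # Momentum: positive for images gaining traction, negative for fading ones.
    momentum = (
        WEIGHTS["views_growth"] * growth_rate(s.views_recent, s.views_previous)
        + WEIGHTS["sales_growth"] * growth_rate(s.sales_recent, s.sales_previous)
        + WEIGHTS["impressions_growth"]
        * growth_rate(s.impressions_recent, s.impressions_previous)
    )
    return level + momentum
```

Using the log of the ratio between windows keeps a jump from 0 to 1,000 views from dwarfing every other signal, and makes growth and decline by the same factor count equally in opposite directions.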

Image Rank’s word-embedding-based keyword relevancy scoring model ensures that search results feature the images most relevant to the user’s search terms.

A manual voting system lets admins and moderators up-vote or down-vote an image or a collection of images. Besides giving moderators direct control over image rankings, these votes also train the Image Rank system, which can find new patterns in the voted images and learn new trends from them.
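One way votes could both adjust rankings and feed learning is to expand each vote into per-image labels and fit a classifier on them. The sketch below uses plain logistic regression trained with stochastic gradient descent as a stand-in; the case study does not say which model Image Rank uses, and the vote format, the `collections` mapping and the feature vectors are hypothetical.

```python
import math


def expand_votes(votes, collections):
    """Turn moderator votes into per-image labels (1 = up-vote, 0 = down-vote).

    votes: list of (target_id, +1 or -1); a target is an image id or a collection id.
    collections: hypothetical mapping of collection id -> list of image ids.
    Later votes override earlier ones for the same image.
    """
    labels = {}
    for target, vote in votes:
        for image_id in collections.get(target, [target]):
            labels[image_id] = 1 if vote > 0 else 0
    return labels


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_on_votes(weights, features, labels, lr=0.1, epochs=100):
    """Logistic-regression SGD on voted images (a stand-in, not Image Rank's model).

    features: image id -> list of floats; include a constant 1.0 feature for a bias term.
    """
    w = list(weights)
    for _ in range(epochs):
        for image_id, y in labels.items():
            x = features[image_id]
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Gradient of the log-loss with respect to w is (p - y) * x.
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w
```

A vote on a collection labels every image in it, and a later vote on a single image overrides it, so moderators can curate broadly and then correct exceptions.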

HIGH VOLUME

The sheer size of MP’s highly dynamic image bank makes it difficult to process, score, rescore and rank images. Adding to this complexity, MP operates throughout Scandinavia, so the keyword scoring model has to process keywords in five languages.

Image Rank prioritizes which images to score based on their activity or search impressions. It also applies a trade-off between how often older images are rescored and how quickly new images are scored.
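The trade-off between scoring new images and rescoring older ones can be sketched as a batch planner: a fixed share of each scoring batch goes to new uploads in arrival order, and the rest goes to the older images with the most activity since they were last scored. The hypothetical `new_share` parameter and the use of a heap are assumptions for illustration, not a description of Image Rank’s scheduler.

```python
import heapq


def plan_scoring_batch(new_images, old_images, batch_size, new_share=0.6):
    """Pick which images to (re)score in the next batch.

    new_images: image ids awaiting their first score, oldest first.
    old_images: image id -> activity since last scoring (e.g. search impressions).
    new_share: hypothetical fraction of the batch reserved for new images.
    Slots one side cannot fill spill over to the other.
    """
    pending_new = list(new_images)
    # heapq is a min-heap, so negate activity to pop the most active image first;
    # the image id breaks ties deterministically.
    old_heap = [(-activity, image_id) for image_id, activity in old_images.items()]
    heapq.heapify(old_heap)

    new_quota = min(len(pending_new), round(batch_size * new_share))
    old_quota = min(len(old_heap), batch_size - new_quota)
    # Any slots the old images could not fill go back to new images.
    new_quota = min(len(pending_new), batch_size - old_quota)

    batch = pending_new[:new_quota]
    batch += [heapq.heappop(old_heap)[1] for _ in range(old_quota)]
    return batch
```

For example, `plan_scoring_batch(["n1", "n2", "n3"], {"o1": 5, "o2": 50, "o3": 20}, batch_size=4, new_share=0.5)` returns `["n1", "n2", "o2", "o3"]`. Raising `new_share` gets new photographers’ images ranked sooner; lowering it keeps the scores of popular older images fresher.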

OBSOLETE SYSTEM - LIMITED DOCUMENTATION

MP’s old scoring system was obsolete: it no longer updated the scores of existing images and was slow to score new ones. This was mainly because the system was not optimized for compute, did not scale, and was built when the image bank was fairly small. It also lacked documentation, and most of the scoring logic was a black box that was difficult to tweak, maintain or extend to meet the growing needs of the site.