{"id":7240,"date":"2024-07-18T14:58:15","date_gmt":"2024-07-18T06:58:15","guid":{"rendered":"https:\/\/aict.nkust.edu.tw\/digitrans\/?p=7240"},"modified":"2024-12-13T21:11:39","modified_gmt":"2024-12-13T13:11:39","slug":"%e5%a6%82%e4%bd%95%e5%b0%87-gpt4o-mini-%e8%88%87-rag-%e7%b5%90%e5%90%88%e5%89%b5%e5%bb%ba%e6%9c%8d%e8%a3%9d%e5%aa%92%e4%ba%ba%e6%87%89%e7%94%a8%e7%a8%8b%e5%bc%8f","status":"publish","type":"post","link":"https:\/\/aict.nkust.edu.tw\/digitrans\/?p=7240","title":{"rendered":"How to Combine GPT-4o mini with RAG to Build a Clothing Matchmaker App"},"content":{"rendered":"\n<p>2024-07-18 | Teodora Musatoiu<\/p>\n\n\n\n<p>Welcome to the Clothing Matchmaker App Jupyter Notebook! This project demonstrates how the GPT-4o mini model can analyze images of clothing and extract key features such as color, style, and type. The core of our app relies on this image analysis model developed by OpenAI, which lets us accurately identify the characteristics of the input clothing item.<\/p>\n\n\n\n<p>GPT-4o mini is a small model that combines natural language processing with image recognition, so it can understand and generate responses based on both text and visual inputs with low latency.<\/p>\n\n\n\n<p>Building on the capabilities of GPT-4o mini, we use a custom matching algorithm together with the RAG technique to search our knowledge base for items that match the identified features. The algorithm takes factors such as color compatibility and style coherence into account to provide suitable recommendations. Through this notebook, we aim to showcase the practical application of these technologies in building a clothing recommendation system.<\/p>\n\n\n\n<p>Combining GPT-4o mini with RAG (Retrieval-Augmented Generation) offers several advantages:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Contextual understanding<\/strong>: GPT-4o mini can analyze input images and understand their context, such as the objects, scenes, and activities depicted. This allows for more accurate and relevant suggestions or information across various domains, whether interior design, cooking, or education.<\/li>\n\n\n\n<li><strong>Rich knowledge base<\/strong>: RAG combines the generative capabilities of GPT-4o mini with a retrieval component that can access a large corpus of information across different fields. This means the system can provide suggestions or insights based on a wide range of knowledge, from historical facts to scientific concepts.<\/li>\n\n\n\n<li><strong>Customization<\/strong>: The approach is easy to customize to meet specific user needs or preferences in a variety of applications. Whether tailoring suggestions to a user's taste in art or providing educational content suited to a student's learning level, the system can adapt to deliver a personalized experience.<\/li>\n<\/ol>\n\n\n\n<p>Overall, the GPT-4o mini + RAG approach leverages the strengths of both generation-based and retrieval-based AI techniques to offer a fast, powerful, and flexible solution for a variety of fashion-related applications.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"environment-setup\">Environment Setup<\/h3>\n\n\n\n<p>First, we install the necessary packages, then import the relevant libraries and write some utility functions that we will use later.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>%pip install openai --quiet\n%pip install tenacity --quiet\n%pip install tqdm --quiet\n%pip install numpy --quiet\n%pip install pandas --quiet\n%pip install tiktoken --quiet<\/code><\/pre>\n\n\n\n<pre 
class=\"wp-block-code\"><code>import pandas as pd\nimport numpy as np\nimport json\nimport ast\nimport tiktoken\nimport concurrent.futures\nfrom openai import OpenAI\nfrom tqdm import tqdm\nfrom tenacity import retry, wait_random_exponential, stop_after_attempt\nfrom IPython.display import Image, display, HTML\nfrom typing import List\n\nclient = OpenAI()\n\nGPT_MODEL = \"gpt-4o-mini\"\nEMBEDDING_MODEL = \"text-embedding-3-large\"\nEMBEDDING_COST_PER_1K_TOKENS = 0.00013<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"creating-the-embeddings\">Creating the Embeddings<\/h3>\n\n\n\n<p>We will now set up the knowledge base by choosing a dataset and generating embeddings for it. Here we use the <code>sample_styles.csv<\/code> file in the data folder. It is a sample of a larger dataset that contains <code>~44K<\/code> items. This step can also be replaced by an off-the-shelf vector database; for example, you can follow one of the available cookbooks to set one up.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>styles_filepath = \"data\/sample_clothes\/sample_styles.csv\"\nstyles_df = pd.read_csv(styles_filepath, on_bad_lines='skip')\nprint(styles_df.head())\nprint(\"Opened dataset successfully. 
Dataset has {} items of clothing.\".format(len(styles_df)))<\/code><\/pre>\n\n\n\n<p>Now we will generate embeddings for the entire dataset. We can parallelize the embedding requests so that the script scales to larger datasets. With this logic, the time to create embeddings for the full <code>44K<\/code>-entry dataset drops from about 4 hours to about 2-3 minutes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>## Batch Embedding Logic\n\n# Simple function to take in a list of text objects and return them as a list of embeddings\n@retry(wait=wait_random_exponential(min=1, max=40), stop=stop_after_attempt(10))\ndef get_embeddings(input: List):\n    response = client.embeddings.create(\n        input=input,\n        model=EMBEDDING_MODEL\n    ).data\n    return &#91;data.embedding for data in response]\n\n\n# Splits an iterable into batches of size n.\ndef batchify(iterable, n=1):\n    l = len(iterable)\n    for ndx in range(0, l, n):\n        yield iterable&#91;ndx : min(ndx + n, l)]\n     \n\n# Function for batching and parallel processing the embeddings\ndef embed_corpus(\n    corpus: List&#91;str],\n    batch_size=64,\n    num_workers=8,\n    max_context_len=8191,\n):\n    # Encode the corpus, truncating to max_context_len\n    encoding = tiktoken.get_encoding(\"cl100k_base\")\n    encoded_corpus = &#91;\n        encoded_article&#91;:max_context_len] for encoded_article in encoding.encode_batch(corpus)\n    ]\n\n    # Calculate corpus statistics: the number of inputs, the total number of tokens, and the estimated cost to embed\n    num_tokens = sum(len(article) for article in encoded_corpus)\n    cost_to_embed_tokens = num_tokens \/ 1000 * EMBEDDING_COST_PER_1K_TOKENS\n    print(\n        f\"num_articles={len(encoded_corpus)}, 
num_tokens={num_tokens}, est_embedding_cost={cost_to_embed_tokens:.2f} USD\"\n    )\n\n    # Embed the corpus\n    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:\n        \n        futures = &#91;\n            executor.submit(get_embeddings, text_batch)\n            for text_batch in batchify(encoded_corpus, batch_size)\n        ]\n\n        with tqdm(total=len(encoded_corpus)) as pbar:\n            for _ in concurrent.futures.as_completed(futures):\n                pbar.update(batch_size)\n\n        embeddings = &#91;]\n        for future in futures:\n            data = future.result()\n            embeddings.extend(data)\n\n        return embeddings\n    \n\n# Function to generate embeddings for a given column in a DataFrame\ndef generate_embeddings(df, column_name):\n    # Collect the column values as a list of strings to embed\n    descriptions = df&#91;column_name].astype(str).tolist()\n    embeddings = embed_corpus(descriptions)\n\n    # Add the embeddings as a new column to the DataFrame\n    df&#91;'embeddings'] = embeddings\n    print(\"Embeddings created successfully.\")<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"two-options-for-creating-the-embeddings\">Two Options for Creating the Embeddings:<\/h4>\n\n\n\n<p>The following code creates the embeddings for the sample clothing dataset. It takes about 0.02 s of processing time and roughly another 30 s to write the results to a local .csv file. The process uses the <code>text-embedding-3-large<\/code> model, priced at $0.00013 per 1,000 tokens. Since the dataset has around 1,000 entries, this operation costs approximately $0.001. If you choose the full dataset of 44,000 entries, the operation takes 2-3 minutes of processing time and costs approximately $0.07.<\/p>\n\n\n\n<p>If you would rather not create the embeddings yourself, you can use a dataset of precomputed embeddings. Skip the cell below and uncomment the code in the cell after it to load the precomputed vectors. Loading all the data into memory takes about 1 minute.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>generate_embeddings(styles_df, 'productDisplayName')\nprint(\"Writing embeddings to file ...\")\nstyles_df.to_csv('data\/sample_clothes\/sample_styles_with_embeddings.csv', index=False)\nprint(\"Embeddings successfully stored in sample_styles_with_embeddings.csv\")<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># styles_df = pd.read_csv('data\/sample_clothes\/sample_styles_with_embeddings.csv', on_bad_lines='skip')\n\n# # Convert the 'embeddings' column from string representations of lists to actual lists of floats\n# styles_df&#91;'embeddings'] = styles_df&#91;'embeddings'].apply(lambda x: ast.literal_eval(x))\n\nprint(styles_df.head())\nprint(\"Opened dataset successfully. 
Dataset has {} items of clothing along with their embeddings.\".format(len(styles_df)))<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"building-the-matching-algorithm\">Building the Matching Algorithm<\/h3>\n\n\n\n<p>In this section, we develop a cosine similarity retrieval algorithm to find similar items in our DataFrame, using a custom cosine similarity function. Although the sklearn library offers a built-in cosine similarity function, recent updates to its SDK have led to compatibility issues, so we implement our own standard cosine similarity calculation.<\/p>\n\n\n\n<p>If you already have a vector database set up, you can skip this step. Most standard databases come with their own search functions, which simplify the later steps of this guide. However, we want to show that the matching algorithm can be tailored to specific requirements, such as a particular threshold or a specified number of matches returned.<\/p>\n\n\n\n<p>The <code>find_similar_items<\/code> function accepts four parameters:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><code>input_embedding<\/code>: The embedding for which we want to find matches.<\/li>\n\n\n\n<li><code>embeddings<\/code>: The list of embeddings to search through for the best matches.<\/li>\n\n\n\n<li><code>threshold<\/code> (optional): The minimum similarity score for a match to be considered valid. A higher threshold yields closer (better) matches, while a lower threshold allows more items to be returned, although they may match the <code>input_embedding<\/code> less closely.<\/li>\n\n\n\n<li><code>top_k<\/code> (optional): The maximum number of items to return among those that meet the given threshold. These are the top-scoring matches for the provided <code>input_embedding<\/code>.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>def cosine_similarity_manual(vec1, vec2):\n    \"\"\"Calculate the cosine similarity between two vectors.\"\"\"\n    vec1 = np.array(vec1, dtype=float)\n    vec2 = np.array(vec2, dtype=float)\n\n\n    dot_product = np.dot(vec1, vec2)\n    norm_vec1 = np.linalg.norm(vec1)\n    norm_vec2 = np.linalg.norm(vec2)\n    return dot_product \/ (norm_vec1 * norm_vec2)\n\n\ndef find_similar_items(input_embedding, embeddings, threshold=0.5, top_k=2):\n    \"\"\"Find the most similar items based on cosine similarity.\"\"\"\n    \n    # Calculate cosine similarity between the input embedding and all other embeddings\n    similarities = &#91;(index, cosine_similarity_manual(input_embedding, vec)) for index, vec in enumerate(embeddings)]\n    \n    # Filter out any similarities below the threshold\n    filtered_similarities = &#91;(index, sim) for index, 
sim in similarities if sim &gt;= threshold]\n    \n    # Sort the filtered similarities by similarity score\n    sorted_indices = sorted(filtered_similarities, key=lambda x: x&#91;1], reverse=True)&#91;:top_k]\n\n    # Return the top-k most similar items as (index, similarity) pairs\n    return sorted_indices<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>def find_matching_items_with_rag(df_items, item_descs):\n   \"\"\"Take the input item descriptions and find the most similar items based on cosine similarity for each description.\"\"\"\n   \n   # Select the embeddings from the DataFrame.\n   embeddings = df_items&#91;'embeddings'].tolist()\n\n   \n   similar_items = &#91;]\n   for desc in item_descs:\n      \n      # Generate the embedding for the input item (get_embeddings returns a list, so take its single vector)\n      input_embedding = get_embeddings(&#91;desc])&#91;0]\n    \n      # Find the most similar items based on cosine similarity\n      similar_indices = find_similar_items(input_embedding, embeddings, threshold=0.6)\n      # Each result is an (index, similarity) pair; use the positional index to look up the row\n      similar_items += &#91;df_items.iloc&#91;i] for i, _ in similar_indices]\n    \n   return similar_items<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"analysis-module\">Analysis Module<\/h3>\n\n\n\n<p>In this module, we use <code>gpt-4o-mini<\/code> to analyze input images and extract important features such as detailed descriptions, styles, and types. The analysis is performed with a straightforward API call, in which we provide the URL of the image to analyze (here a base64 data URL) and ask the model to identify relevant features.<\/p>\n\n\n\n<p>To help the model return accurate results, we use specific techniques in our prompt:<\/p>\n\n\n\n<ol 
class=\"wp-block-list\">\n<li><strong>Output format specification<\/strong>: We instruct the model to return a JSON block with a predefined structure consisting of:\n<ul class=\"wp-block-list\">\n<li><code>items<\/code> (str[]): A list of strings, each representing a concise title for an item of clothing, including style, color, and gender. These titles closely resemble the <code>productDisplayName<\/code> property in our original database.<\/li>\n\n\n\n<li><code>category<\/code> (str): The category that best represents the given item. The model selects from a list of all the unique <code>articleTypes<\/code> present in the original styles DataFrame.<\/li>\n\n\n\n<li><code>gender<\/code> (str): A label indicating the gender the item is intended for. The model chooses from the options <code>[Men, Women, Boys, Girls, Unisex]<\/code>.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Clear and concise instructions<\/strong>:\n<ul class=\"wp-block-list\">\n<li>We provide clear instructions on what the item titles should include and what the output format should be. The output should be in JSON format, but without the <code>json<\/code> tag that the model response normally contains.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>One-shot example<\/strong>:\n<ul 
class=\"wp-block-list\">\n<li>To further clarify the expected output, we give the model an example input description and a corresponding example output. Although this may increase the number of tokens used (and thus the cost of the call), it helps guide the model and leads to better overall performance.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p>By following this structured approach, we aim to obtain precise and useful information from the <code>gpt-4o-mini<\/code> model for further analysis and integration into our database.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def analyze_image(image_base64, subcategories):\n    response = client.chat.completions.create(\n        model=GPT_MODEL,\n        messages=&#91;\n            {\n            \"role\": \"user\",\n            \"content\": &#91;\n                {\n                \"type\": \"text\",\n                \"text\": \"\"\"Given an image of an item of clothing, analyze the item and generate a JSON output with the following fields: \"items\", \"category\", and \"gender\". \n                           Use your understanding of fashion trends, styles, and gender preferences to provide accurate and relevant suggestions for how to complete the outfit.\n                           The items field should be a list of items that would go well with the item in the picture. 
Each item should represent a title of an item of clothing that contains the style, color, and gender of the item.\n                           The category needs to be chosen between the types in this list: {subcategories}.\n                           You have to choose between the genders in this list: &#91;Men, Women, Boys, Girls, Unisex]\n                           Do not include the description of the item in the picture. Do not include the ```json ``` tag in the output.\n                           \n                           Example Input: An image representing a black leather jacket.\n\n                           Example Output: {\"items\": &#91;\"Fitted White Women's T-shirt\", \"White Canvas Sneakers\", \"Women's Black Skinny Jeans\"], \"category\": \"Jackets\", \"gender\": \"Women\"}\n                           \"\"\".replace(\"{subcategories}\", \", \".join(map(str, subcategories))),  # plain string, so the placeholder is filled in explicitly\n                },\n                {\n                \"type\": \"image_url\",\n                \"image_url\": {\n                    \"url\": f\"data:image\/jpeg;base64,{image_base64}\",\n                },\n                }\n            ],\n            }\n        ]\n    )\n    # Extract relevant features from the response\n    features = response.choices&#91;0].message.content\n    return features<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"testing-the-prompt-with-sample-images\">Testing the Prompt with Sample Images<\/h3>\n\n\n\n<p>To evaluate the effectiveness of our prompt, let's load and test it with a selection of images from our dataset. We use images from the <code>\"data\/sample_clothes\/sample_images\"<\/code> folder, covering a variety of styles, genders, and types. Here are the chosen samples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>2133.jpg<\/code>: 
Men's shirt<\/li>\n\n\n\n<li><code>7143.jpg<\/code>: Women's shirt<\/li>\n\n\n\n<li><code>4226.jpg<\/code>: Casual men's printed T-shirt<\/li>\n<\/ul>\n\n\n\n<p>By testing the prompt with these diverse images, we can assess its ability to accurately analyze and extract relevant features from different types of clothing items and accessories.<\/p>\n\n\n\n<p>We need a utility function to encode the .jpg images in base64.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import base64\n\ndef encode_image_to_base64(image_path):\n    with open(image_path, 'rb') as image_file:\n        encoded_image = base64.b64encode(image_file.read())\n        return encoded_image.decode('utf-8')<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># Set the path to the images and select a test image\nimage_path = \"data\/sample_clothes\/sample_images\/\"\ntest_images = &#91;\"2133.jpg\", \"7143.jpg\", \"4226.jpg\"]\n\n# Encode the test image to base64\nreference_image = image_path + test_images&#91;0]\nencoded_image = encode_image_to_base64(reference_image)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># Select the unique subcategories from the DataFrame\nunique_subcategories = styles_df&#91;'articleType'].unique()\n\n# Analyze the image and return the results\nanalysis = analyze_image(encoded_image, unique_subcategories)\nimage_analysis = json.loads(analysis)\n\n# Display the image and the analysis 
results\ndisplay(Image(filename=reference_image))\nprint(image_analysis)<\/code><\/pre>\n\n\n\n<p>Next, we process the output of the image analysis and use it to filter and display matching items from our dataset. Here is a breakdown of the code:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Extracting the image analysis results<\/strong>: We extract the item descriptions, category, and gender from the <code>image_analysis<\/code> dictionary.<\/li>\n\n\n\n<li><strong>Filtering the dataset<\/strong>: We filter the <code>styles_df<\/code> DataFrame to include only items that match the gender from the image analysis (or are unisex) and exclude items in the same category as the analyzed image.<\/li>\n\n\n\n<li><strong>Finding matching items<\/strong>: We use the <code>find_matching_items_with_rag<\/code> function to find items in the filtered dataset that match the descriptions extracted from the analyzed image.<\/li>\n\n\n\n<li><strong>Displaying the matching items<\/strong>: We create an HTML string to display images of the matching items. We construct the image paths from the item IDs and append each image to the HTML string. Finally, we use <code>display(HTML(html))<\/code> to render the images in the notebook.<\/li>\n<\/ol>\n\n\n\n<p>This cell demonstrates how to use the results of the image analysis to filter a dataset and visually display the items that match the characteristics of the analyzed image.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Extract the relevant features from the analysis\nitem_descs = image_analysis&#91;'items']\nitem_category = image_analysis&#91;'category']\nitem_gender = image_analysis&#91;'gender']\n\n\n# Filter data such that we only look through the items of the same gender (or unisex) and different category\nfiltered_items = styles_df.loc&#91;styles_df&#91;'gender'].isin(&#91;item_gender, 'Unisex'])]\nfiltered_items = filtered_items&#91;filtered_items&#91;'articleType'] != item_category]\nprint(str(len(filtered_items)) + \" Remaining Items\")\n\n# Find the most similar items based on the input item descriptions\nmatching_items = find_matching_items_with_rag(filtered_items, item_descs)\n\n# Display the matching items (this will display 2 items for each description in the image analysis)\nhtml = \"\"\npaths = &#91;]\nfor i, item in enumerate(matching_items):\n    item_id = item&#91;'id']\n        \n    # Path to the image file\n    image_path = f'data\/sample_clothes\/sample_images\/{item_id}.jpg'\n    paths.append(image_path)\n    html += f'&lt;img src=\"{image_path}\" style=\"display:inline;margin:1px\"\/&gt;'\n\n# Print the matching item description as a reminder of what we are looking for\nprint(item_descs)\n# Display the image\ndisplay(HTML(html))<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"guardrails\">Guardrails<\/h3>\n\n\n\n<p>In the context of using Large Language Models (LLMs) like GPT-4o mini, <strong>\"guardrails\"<\/strong> are mechanisms or checks put in place to ensure that the model's output stays within desired parameters or boundaries. They are crucial for maintaining the quality and relevance of the model's responses, especially when dealing with complex or nuanced tasks.<\/p>\n\n\n\n<p>Guardrails are useful for several reasons:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Accuracy<\/strong>: They help ensure that the model's output is accurate and relevant to the input provided.<\/li>\n\n\n\n<li><strong>Consistency<\/strong>: They keep the model's responses consistent, especially when dealing with similar or related inputs.<\/li>\n\n\n\n<li><strong>Safety<\/strong>: They prevent the model from generating harmful, offensive, or inappropriate content.<\/li>\n\n\n\n<li><strong>Contextual relevance<\/strong>: They ensure that the model's output is contextually relevant to the specific task or domain it is being used for.<\/li>\n<\/ol>\n\n\n\n<p>In our case, we use GPT-4o mini to analyze fashion images and suggest items that would complement an original outfit. To implement guardrails, we can refine the results: after obtaining initial suggestions from GPT-4o mini, we send the original image and the suggested items back to the model and ask it to evaluate whether each suggested item is indeed a good fit for the original outfit.<\/p>\n\n\n\n<p>This gives the model the ability to self-correct and adjust its own output based on feedback or additional information. By implementing these guardrails and enabling self-correction, we can improve the reliability and usefulness of the model's output for fashion analysis and recommendation.<\/p>\n\n\n\n<p>To facilitate this, we write a prompt that asks the LLM for a simple \"yes\" or \"no\" answer on whether the suggested item matches the original outfit. This binary response streamlines the refinement process and ensures clear, actionable feedback from the model.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def check_match(reference_image_base64, suggested_image_base64):\n    response = client.chat.completions.create(\n        model=GPT_MODEL,\n        messages=&#91;\n            {\n            \"role\": \"user\",\n            \"content\": &#91;\n 
               {\n                \"type\": \"text\",\n                \"text\": \"\"\" You will be given two images of two different items of clothing.\n                            Your goal is to decide if the items in the images would work in an outfit together.\n                            The first image is the reference item (the item that the user is trying to match with another item).\n                            You need to decide if the second item would work well with the reference item.\n                            Your response must be a JSON output with the following fields: \"answer\", \"reason\".\n                            The \"answer\" field must be either \"yes\" or \"no\", depending on whether you think the items would work well together.\n                            The \"reason\" field must be a short explanation of your reasoning for your decision. Do not include the descriptions of the 2 images.\n                            Do not include the ```json ``` tag in the output.\n                           \"\"\",\n                },\n                {\n                \"type\": \"image_url\",\n                \"image_url\": {\n                    \"url\": f\"data:image\/jpeg;base64,{reference_image_base64}\",\n                },\n                },\n                {\n                \"type\": \"image_url\",\n                \"image_url\": {\n                    \"url\": f\"data:image\/jpeg;base64,{suggested_image_base64}\",\n                },\n                }\n            ],\n            }\n        ],\n        max_tokens=300,\n    )\n    # Extract relevant features from the response\n    features = response.choices&#91;0].message.content\n    return features<\/code><\/pre>\n\n\n\n<p>Finally, let's determine which of the items identified above truly complement the outfit.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Select the unique paths for the 
generated images\npaths = list(dict.fromkeys(paths))  # de-duplicate while keeping the original order\n\nfor path in paths:\n    # Encode the suggested image to base64\n    suggested_image = encode_image_to_base64(path)\n    \n    # Check if the items match; skip replies that are not valid JSON\n    try:\n        match = json.loads(check_match(encoded_image, suggested_image))\n    except json.JSONDecodeError:\n        continue\n    \n    # Display the image and the model's reasoning for accepted items\n    if str(match.get(\"answer\", \"\")).strip().lower() == \"yes\":\n        display(Image(filename=path))\n        print(\"The items match!\")\n        print(match.get(\"reason\", \"\"))<\/code><\/pre>\n\n\n\n<p>\u6211\u5011\u53ef\u4ee5\u89c0\u5bdf\u5230\uff0c\u6700\u521d\u7684\u6f5b\u5728\u7269\u54c1\u6e05\u55ae\u5df2\u88ab\u9032\u4e00\u6b65\u7d30\u5316\uff0c\u5f9e\u800c\u7522\u751f\u4e86\u66f4\u7cbe\u5fc3\u7b56\u5283\u7684\u9078\u64c7\uff0c\u8207\u670d\u88dd\u66f4\u70ba\u5951\u5408\u3002\u6b64\u5916\uff0c\u8a72\u6a21\u578b\u9084\u89e3\u91cb\u4e86\u70ba\u4ec0\u9ebc\u6bcf\u500b\u9805\u76ee\u88ab\u8a8d\u70ba\u662f\u5f88\u597d\u7684\u5339\u914d\uff0c\u70ba\u6c7a\u7b56\u904e\u7a0b\u63d0\u4f9b\u4e86\u5bf6\u8cb4\u7684\u898b\u89e3\u3002<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"conclusion\">\u7d50\u8ad6<\/h3>\n\n\n\n<p>\u5728\u9019\u500b Jupyter Notebook \u4e2d\uff0c\u6211\u5011\u63a2\u8a0e\u4e86 GPT-4o mini 
\u53ca\u5176\u4ed6\u6a5f\u5668\u5b78\u7fd2\u6280\u8853\u5728\u6642\u5c1a\u9818\u57df\u7684\u61c9\u7528\u3002\u6211\u5011\u5c55\u793a\u4e86\u5982\u4f55\u5206\u6790\u670d\u88dd\u5716\u7247\u3001\u63d0\u53d6\u76f8\u95dc\u7279\u5fb5\uff0c\u4e26\u5229\u7528\u9019\u4e9b\u8cc7\u8a0a\u627e\u5230\u8207\u539f\u59cb\u670d\u88dd\u76f8\u8f14\u76f8\u6210\u7684\u642d\u914d\u55ae\u54c1\u3002\u900f\u904e\u5be6\u65bd\u4fdd\u8b77\u63aa\u65bd\u548c\u81ea\u6211\u4fee\u6b63\u6a5f\u5236\uff0c\u6211\u5011\u7cbe\u7149\u4e86\u6a21\u578b\u7684\u5efa\u8b70\uff0c\u4ee5\u78ba\u4fdd\u5b83\u5011\u7684\u6e96\u78ba\u6027\u548c\u60c5\u5883\u76f8\u95dc\u6027\u3002<\/p>\n\n\n\n<p>\u9019\u7a2e\u65b9\u6cd5\u5728\u73fe\u5be6\u4e16\u754c\u4e2d\u5177\u6709\u591a\u7a2e\u5be6\u7528\u7528\u9014\uff0c\u5305\u62ec\uff1a<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>\u500b\u4eba\u5316\u8cfc\u7269\u52a9\u7406<\/strong>\uff1a\u96f6\u552e\u5546\u53ef\u4ee5\u5229\u7528\u9019\u9805\u6280\u8853\u5411\u9867\u5ba2\u63d0\u4f9b\u500b\u4eba\u5316\u7684\u670d\u88dd\u63a8\u85a6\uff0c\u589e\u5f37\u8cfc\u7269\u9ad4\u9a57\u4e26\u63d0\u9ad8\u9867\u5ba2\u6eff\u610f\u5ea6\u3002<\/li>\n\n\n\n<li><strong>\u865b\u64ec\u8863\u6ac3\u61c9\u7528\u7a0b\u5f0f<\/strong>\uff1a\u4f7f\u7528\u8005\u53ef\u4ee5\u4e0a\u50b3\u81ea\u5df1\u670d\u88dd\u7684\u5716\u50cf\u4f86\u5efa\u7acb\u865b\u64ec\u8863\u6ac3\uff0c\u4e26\u63a5\u6536\u8207\u73fe\u6709\u8863\u670d\u76f8\u7b26\u7684\u65b0\u7269\u54c1\u7684\u5efa\u8b70\u3002<\/li>\n\n\n\n<li><strong>\u6642\u88dd\u8a2d\u8a08\u548c\u9020\u578b<\/strong>\uff1a\u6642\u88dd\u8a2d\u8a08\u5e2b\u548c\u9020\u578b\u5e2b\u53ef\u4ee5\u4f7f\u7528\u6b64\u5de5\u5177\u5617\u8a66\u4e0d\u540c\u7684\u7d44\u5408\u548c\u98a8\u683c\uff0c\u5f9e\u800c\u7c21\u5316\u5275\u4f5c\u904e\u7a0b\u3002<\/li>\n<\/ol>\n\n\n\n<p>\u7136\u800c\uff0c\u9700\u8003\u616e\u7684\u4e00\u500b\u56e0\u7d20\u662f\u6210\u672c\u3002\u4f7f\u7528\u5927\u578b\u8a9e\u8a00\u6a21\u578b\uff08LLMs\uff09\u548c\u5716\u50cf\u5206\u6790\u6a21\u578b\u53ef\u80fd
\u6703\u7522\u751f\u6210\u672c\uff0c\u5c24\u5176\u662f\u5728\u983b\u7e41\u4f7f\u7528\u7684\u60c5\u6cc1\u4e0b\u3002\u56e0\u6b64\uff0c\u8003\u616e\u9019\u4e9b\u6280\u8853\u7684\u6210\u672c\u6548\u76ca\u975e\u5e38\u91cd\u8981\u3002\u539f\u6587\u7d66\u51fa\u7684\u5b9a\u50f9\u70ba\u6bcf 1,000 \u500b tokens $0.01\uff0c\u4f9d\u6b64\u8a08\u7b97\uff0c\u8655\u7406\u4e00\u5f35 256px x 256px \u7684\u5716\u50cf\uff08\u7d04 255 \u500b tokens\uff09\u9700\u8981\u7d04 $0.00255\u3002\u5be6\u969b\u6210\u672c\u8acb\u4ee5 OpenAI \u6700\u65b0\u7684\u5b9a\u50f9\u70ba\u6e96\u3002<\/p>\n\n\n\n<p>\u7e3d\u7684\u4f86\u8aaa\uff0c\u9019\u500b Notebook \u70ba\u6642\u5c1a\u8207 AI \u4ea4\u96c6\u7684\u9032\u4e00\u6b65\u63a2\u7d22\u548c\u958b\u767c\u5960\u5b9a\u4e86\u57fa\u790e\uff0c\u70ba\u500b\u6027\u5316\u548c\u667a\u80fd\u5316\u7684\u670d\u88dd\u63a8\u85a6\u7cfb\u7d71\u6253\u958b\u4e86\u5927\u9580\u3002<\/p>\n\n\n\n<p>\u8cc7\u6599\u4f86\u6e90: <a href=\"https:\/\/cookbook.openai.com\/examples\/how_to_combine_gpt4o_with_rag_outfit_assistant\">https:\/\/cookbook.openai.com\/examples\/how_to_combine_gpt4o_with_rag_outfit_assistant<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>2024-07-18 | Teodora Musatoiu \u6b61\u8fce\u4f86\u5230\u670d\u88dd\u642d\u914d\u61c9\u7528 Jupyter Notebook\uff01\u672c\u5c08\u6848\u5c55\u793a\u4e86 GPT-4o mini 
\u6a21\u578b\u5728\u5206\u6790\u670d\u88dd\u5716\u7247\u4e26\u63d0\u53d6\u95dc&hellip;<\/p>\n","protected":false},"author":4,"featured_media":7241,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[579,4],"tags":[40],"class_list":["post-7240","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-579","category-industry-news","tag-40"],"gutentor_comment":0,"jetpack_featured_media_url":"https:\/\/i0.wp.com\/aict.nkust.edu.tw\/digitrans\/wp-content\/uploads\/2024\/10\/%E8%9E%A2%E5%B9%95%E6%93%B7%E5%8F%96%E7%95%AB%E9%9D%A2-2024-10-18-150059.png?fit=775%2C143&ssl=1","jetpack-related-posts":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=\/wp\/v2\/posts\/7240","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7240"}],"version-history":[{"count":4,"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=\/wp\/v2\/posts\/7240\/revisions"}],"predecessor-version":[{"id":7254,"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=\/wp\/v2\/posts\/7240\/revisions\/7254"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=\/w
p\/v2\/media\/7241"}],"wp:attachment":[{"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7240"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7240"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/aict.nkust.edu.tw\/digitrans\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7240"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}