Description
System Info
```
OS: Apple M1 Max
Name: langchain
Version: 0.0.349
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author:
Author-email:
License: MIT
Requires: aiohttp, async-timeout, dataclasses-json, jsonpatch, langchain-community, langchain-core, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by:
```
Who can help?
Information
- The official example notebooks/scripts
- My own modified scripts
Related Components
- LLMs/Chat Models
- Embedding Models
- Prompts / Prompt Templates / Prompt Selectors
- Output Parsers
- Document Loaders
- Vector Stores / Retrievers
- Memory
- Agents / Agent Executors
- Tools / Toolkits
- Chains
- Callbacks/Tracing
- Async
Reproduction
Steps to reproduce:
I have followed the instructions provided here: https://python.langchain.com/docs/integrations/llms/llamacpp. However, I am not able to run inference correctly.
Model: https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-GGUF
```python
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain, QAGenerationChain
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""

prompt = PromptTemplate(template=template, input_variables=["question"])

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

n_gpu_layers = 1  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

llm = LlamaCpp(
    model_path="../models/deepcoder-gguf/deepseek-coder-6.7b-instruct.Q2_K.gguf",
    n_gpu_layers=n_gpu_layers,
    max_tokens=2000,
    top_p=1,
    n_batch=n_batch,
    callback_manager=callback_manager,
    f16_kv=True,
    verbose=True,  # Verbose is required to pass to the callback manager
)

llm("Question: Write python program to add two numbers ? Answer:")
```
Result: `< """"""""""""""""""""""/"`
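As an aside, the `prompt` and `LLMChain` imports above end up unused; the docs page linked above runs the question through an `LLMChain` instead of calling `llm()` directly. For completeness, here is that variant as a minimal sketch (following the docs example as I understand it, reusing the `prompt` and `llm` objects defined above):

```python
# Docs-style invocation: route the question through the PromptTemplate
# via LLMChain rather than calling llm() with a raw string.
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "Write a python program to add two numbers."
print(llm_chain.run(question))
```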
Could you please look into this? Let me know if you need more information.
Thank you.
I have tried the same model (albeit the Q5_K_M quantization) with the llama-cpp-python package directly, and it works as expected.
Please find below the code that I have tried:
```python
import json
import time

from llama_cpp import Llama

n_gpu_layers = 1  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512

llm = Llama(
    model_path="../models/deepcoder-gguf/deepseek-coder-6.7b-instruct.Q5_K_M.gguf",
    chat_format="llama-2",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
)

start_time = time.time()
pp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a python language assistant."},
        {"role": "user", "content": "Write quick sort."},
    ]
)
end_time = time.time()

print(f"execution time: {end_time - start_time}")
print(pp["choices"][0]["message"]["content"])
```
Output:
## Quick Sort Algorithm in Python
Here is a simple implementation of the quicksort algorithm in Python:
```python
def partition(arr, low, high):
    i = (low-1)  # index of smaller element
    pivot = arr[high]  # pivot
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i+1], arr[high] = arr[high], arr[i+1]
    return (i+1)

def quickSort(arr, low, high):
    if low < high:
        pi = partition(arr, low, high)
        quickSort(arr, low, pi-1)
        quickSort(arr, pi+1, high)

# Test the code
n = int(input("Enter number of elements in array: "))
print("Enter elements: ")
arr = [int(input()) for _ in range(n)]
quickSort(arr, 0, n-1)
print("Sorted array is:")
for i in range(n):
    print("%d" % arr[i])
```
This code first defines a helper function `partition()` that takes an array and two indices. It then rearranges the elements of the array so that all numbers less than or equal to the pivot are on its left, while all numbers greater than the pivot are on its right. The `quickSort()` function is then defined which recursively applies this partitioning process until the entire array is sorted.
The user can input their own list of integers and the program will output a sorted version of that list.
Conclusion
In conclusion, Python provides several built-in functions for sorting lists such as `sort()` or `sorted()` but it's also possible to implement quick sort algorithm from scratch using custom function. This can be useful in situations where you need more control over the sorting process or when dealing with complex data structures.
Expected behavior
It should run inference on the model just like the native llama-cpp-python package does.
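One guess on my side (an assumption, not verified): the raw completion-style prompt may not match the instruction format this model was fine-tuned on. If I am reading the Hugging Face model card linked above correctly, deepseek-coder-instruct expects an `### Instruction:` / `### Response:` template. Below is a minimal sketch of wrapping the question in that template before handing it to the LangChain `LlamaCpp` wrapper; the system line is abbreviated here, and the exact wording should be taken from the model card:

```python
# Hypothetical workaround sketch: wrap the question in the instruction
# template from the model card (assumption: this is the format the GGUF
# was trained with). Reuses the `llm` instance from the reproduction above.
deepseek_template = """You are an AI programming assistant.
### Instruction:
{question}
### Response:
"""

deepseek_prompt = PromptTemplate(
    template=deepseek_template, input_variables=["question"]
)
llm_chain = LLMChain(prompt=deepseek_prompt, llm=llm)
print(llm_chain.run("Write a python program to add two numbers."))
```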