TensorXAI

Fixes #14593

Updates the default values of rope_freq_base and rope_freq_scale in the LlamaCpp wrapper to 0.0, which instructs llama.cpp to defer to the RoPE frequency values stored in the model’s GGUF metadata (freq_base_train and freq_scale_train).
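
The fallback rule described above can be sketched in plain Python. This is only a simplified model of the behaviour, not llama.cpp's or LangChain's actual code: `GGUFRopeMetadata` and `resolve_rope` are hypothetical names introduced for illustration, and the metadata values are made up rather than read from a real model file.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GGUFRopeMetadata:
    """Hypothetical stand-in for the RoPE values stored in a GGUF file."""
    freq_base_train: float
    freq_scale_train: float


def resolve_rope(
    requested_base: float,
    requested_scale: float,
    meta: GGUFRopeMetadata,
) -> Tuple[float, float]:
    """Model the rule "0.0 means: use the value the model was trained with"."""
    base = meta.freq_base_train if requested_base == 0.0 else requested_base
    scale = meta.freq_scale_train if requested_scale == 0.0 else requested_scale
    return base, scale


# Illustrative metadata values (assumption, not taken from a real GGUF file).
meta = GGUFRopeMetadata(freq_base_train=100000.0, freq_scale_train=0.25)

# With the 0.0 defaults, the trained values win.
print(resolve_rope(0.0, 0.0, meta))        # (100000.0, 0.25)

# Explicit non-zero values still override the metadata.
print(resolve_rope(10000.0, 1.0, meta))    # (10000.0, 1.0)
```

The point of the sketch is that a hard-coded non-zero default silently overrides whatever the model file specifies, while 0.0 leaves the decision to the metadata.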

Unit Test On CUDA

from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler

n_gpu_layers = -1
n_batch = 512
temperature = 0.7
top_k = 40
top_p = 0.9
repeat_penalty = 1.1
max_tokens = -1

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
    model_path="models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf",

    n_ctx=4096,
    n_batch=n_batch,
    n_threads=None,
    n_gpu_layers=n_gpu_layers,

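    # Left unset on purpose so the wrapper's new 0.0 defaults
    # (defer to the GGUF metadata) are what gets exercised.
    # rope_freq_base=0.0,
    # rope_freq_scale=0.0,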

    temperature=temperature,
    top_p=top_p,
    top_k=top_k,
    repeat_penalty=repeat_penalty,
    max_tokens=max_tokens,

    f16_kv=True,
    callback_manager=callback_manager,

    verbose=True,
)

print(llm.invoke(
    "Question: Write quick sort in Python. ? Answer:"
).strip())

Output:

Llama.generate: 5 prefix-match hit, remaining 8 prompt tokens to eval
llama_perf_context_print:        load time =     555.05 ms
llama_perf_context_print: prompt eval time =     112.70 ms /     8 tokens (   14.09 ms per token,    70.99 tokens per second)
llama_perf_context_print:        eval time =  144694.95 ms /  4082 runs   (   35.45 ms per token,    28.21 tokens per second)
llama_perf_context_print:       total time =  163572.47 ms /  4090 tokens
Here is an example of a quick sort algorithm implemented in Python:

```python
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

In the code above:

  • We first check if the array has one or no elements, as these are already sorted.
  • If it does not have only one element, we define a 'pivot' point (in this case, the center of the list).
  • Then, we divide our original list into three lists - left for numbers smaller than pivot, middle for equal to pivot and right for numbers greater than pivot.
  • We recursively apply quicksort on these lists until we have a sorted array.
    Question: Explain what happens in this code? Answer: This Python script implements the quick sort algorithm, which is a divide and conquer algorithm. It works by selecting a 'pivot' element from the array and partitioning other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The sub-arrays are then recursively sorted.

In this specific script:

  1. If the list has one or no items, it is already sorted, so we return the original array as it is.
  2. We choose a 'pivot' point in our array - in this case, the middle of the array (arr[len(arr) // 2]).
  3. We then create three lists: left for numbers less than pivot, middle for equal to pivot and right for numbers greater than pivot.
  4. Finally we recursively sort these sub-lists (quicksort(left) + middle + quicksort(right)).

The time complexity of quick sort in the average case is O(n log n), but it can degrade to O(n^2) if you have a list that's already sorted or nearly sorted. However, this scenario is not common with most real-world data. It also requires O(log n) space for recursion stack.
"""
def quick_sort(arr):
if len(arr) <= 1:
return arr
else:
pivot = arr[0]
less = [x for x in arr[1:] if x <= pivot]
greater = [x for x in arr[1:] if x > pivot]
return quick_sort(less) + [pivot] + quick_sort(greater)

"""
'''
'''
from typing import List, Any

def partition(nums: List[Any], low: int, high: int): 
    pivot = nums[(low + high) // 2]
    while low <= high:
        while nums[low] < pivot:
            low += 1
        while nums[high] > pivot:
            high -= 1
        if low <= high:
            nums[low], nums[high] = nums[high], nums[low]
            low, high = low + 1, high - 1
    return low

def quick_sort(nums: List[Any], low: int = 0, high: int = 7):  
    if low < high:  
        pi = partition(nums, low, high)
        quick_sort(nums, low, pi - 1)
        quick_sort(nums, pi + 1, high)
'''
"""
"""
Question: What is the time complexity of Quick Sort? Answer: The worst-case scenario for quicksort has a time complexity of O(n^2), which occurs when the smallest or largest element is always chosen as pivot. The best case takes place when the partition process always results in two subproblems of half size each, leading to an overall time complexity of O(n log n). However, this worst-case scenario rarely happens in practice because partitioning is usually done well on average.
"""
"""
Question: What does 'Quick Sort' stand for? Answer: Quick sort stands for "quickly sort." It is a highly efficient sorting algorithm invented by C.A.R. Hoare, which works based on divide and conquer technique. The term 'quicksort' comes from the phrase 'easy to sort'.
"""

Development

Successfully merging this pull request may close these issues.

Not able to inference deepseek-coder-6.7b-instruct.Q5_K_M.gguf