 
January 2: Waiting for change, waiting for opportunity
[Interface]
# Server private key
PrivateKey = ...
# Server's IP address inside the WireGuard network
Address = 10.0.0.1/24
ListenPort = 5182
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
# Define a client
[Peer]
# Client public key
PublicKey = ...
# IP address range allowed for this client
AllowedIPs = 10.0.0.2/32
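For reference, each key pair above can be generated with the standard wg tooling (the file names here are arbitrary, not from the original config):
$ wg genkey | tee server_private.key | wg pubkey > server_public.key
$ wg genkey | tee client_private.key | wg pubkey > client_public.key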
$ ip route
default via 172.31.16.1 dev eth0 proto dhcp src 172.31.29.227 metric 100 
10.0.0.0/24 dev wg0 proto kernel scope link src 10.0.0.1 
172.31.0.2 via 172.31.16.1 dev eth0 proto dhcp src 172.31.29.227 metric 100 
172.31.16.0/20 dev eth0 proto kernel scope link src 172.31.29.227 metric 100 
172.31.16.1 dev eth0 proto dhcp scope link src 172.31.29.227 metric 100 
[Interface]
# Client private key
PrivateKey = ...
# Client's IP address inside the WireGuard network
Address = 10.0.0.2/24
DNS = 172.31.29.227
[Peer]
# Server public key
PublicKey = ...
# Server's public IP address and listen port
Endpoint = 54.146.251.127:5182
AllowedIPs = 0.0.0.0/0
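Assuming the file is saved as /etc/wireguard/wg0.conf, the tunnel can be brought up and inspected with the standard wireguard-tools commands:
$ sudo wg-quick up wg0
$ sudo wg show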
$ resolvectl 
Global
         Protocols: -LLMNR -mDNS DNSOverTLS=opportunistic DNSSEC=no/unsupported
  resolv.conf mode: foreign
Current DNS Server: 10.0.0.1
       DNS Servers: 10.0.0.1
Link 2 (enp0s31f6)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS DNSOverTLS=opportunistic DNSSEC=no/unsupported
Current DNS Server: 218.85.157.99
       DNS Servers: 218.85.152.99 218.85.157.99 fe80::1%23667
Link 3 (wlp109s0)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS DNSOverTLS=opportunistic DNSSEC=no/unsupported
Link 766 (wg0)
Current Scopes: none
     Protocols: -DefaultRoute +LLMNR -mDNS DNSOverTLS=opportunistic DNSSEC=no/unsupported
$ ip route
0.0.0.0/1 via 10.0.0.1 dev wg0 
default via 192.168.1.1 dev enp0s31f6 proto dhcp metric 100 
10.0.0.0/24 dev wg0 proto kernel scope link src 10.0.0.2 
54.146.251.127 via 192.168.1.1 dev enp0s31f6 
169.254.0.0/16 dev enp0s31f6 scope link metric 1000 
192.168.1.0/24 dev enp0s31f6 proto kernel scope link src 192.168.1.6 metric 100 
sudo ip route add 0.0.0.0/1 dev wg0
sudo ip route add 128.0.0.0/1 dev wg0
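These two /1 routes together cover all of IPv4 and are more specific than the existing default route, so traffic prefers wg0 while the original default stays untouched. Whether a given destination actually goes through the tunnel can be checked with ip route get (8.8.8.8 is just an arbitrary public address):
$ ip route get 8.8.8.8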
Is wireguard server "Forwarding" mainly for firewall?
That's a very good question that gets at the heart of how networking works.
While it's true that forwarding is a core function of a firewall, it's not exclusively a firewall function. In the context of a WireGuard server (or any router), "forwarding" refers to the basic network process of taking a packet received on one network interface and sending it out another network interface.
Here's a breakdown:
Forwarding in General:
- Core Routing Function: At its most fundamental level, forwarding is what routers (including your WireGuard server) do. When a packet arrives at a router, it examines the destination IP address. If the destination is on a different network than the incoming interface, the router "forwards" the packet to the appropriate outgoing interface.
- Essential for Inter-Network Communication: Without forwarding, devices on different networks (like your home network and the internet) could not communicate with each other.
Forwarding in the Context of a WireGuard Server:
- Connecting Private and Public Networks: Your WireGuard server acts as a bridge between the private WireGuard network (e.g., 10.8.0.0/24) and the public internet.
- iptables -A FORWARD -i wg0 -j ACCEPT: This specific iptables rule tells the Linux kernel (which is the operating system of your server) to allow packets to be forwarded from the WireGuard interface (wg0) to other interfaces (and vice versa). This is absolutely essential for your WireGuard clients to be able to reach the internet.
Relationship to Firewalls:
- Firewalls Use Forwarding: Firewalls use forwarding as a mechanism to inspect and filter traffic. A firewall can have rules that say "forward packets from this IP to that IP only if they are on port 80 (HTTP)" or "block all forwarding from this network to that network."
- Forwarding Exists Independently of Firewalls: The basic act of forwarding is a lower-level network function that exists regardless of whether you have a complex firewall setup. Even with no explicit firewall rules, the kernel still needs to know whether to forward packets or not.
Analogy:
Think of a post office:
- Forwarding: The basic act of sorting mail and sending it to the correct destination is like forwarding.
- Firewall: A security guard at the post office who checks packages for suspicious contents and decides whether to allow them to be delivered is like a firewall.
The post office needs to forward mail even without a security guard. The security guard (firewall) uses the forwarding system to do its job of filtering.
In summary:
- Forwarding is a fundamental network function that allows devices on different networks to communicate.
- WireGuard servers rely on forwarding to connect clients on the private VPN network to the public internet.
- Firewalls use forwarding as a mechanism to filter traffic, but forwarding itself is not solely a firewall function.
Therefore, even if you don't have a complex firewall configuration, you still need to enable forwarding on your WireGuard server for it to function correctly. The iptables -A FORWARD ... rule is essential for this.
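One practical note: the iptables FORWARD rule only permits forwarding, it does not switch it on; the kernel switch is net.ipv4.ip_forward, which has to be enabled separately (standard sysctl usage, not shown in the config above):
$ sudo sysctl -w net.ipv4.ip_forward=1
To keep it across reboots, the same setting can be placed in /etc/sysctl.conf. The options below are BIND9 named.conf.options syntax; they set up a resolver on the server's WireGuard address so that clients using the server as their DNS resolver get their queries forwarded to the public resolvers listed: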
listen-on { 10.0.0.1; }; // Listen on the WireGuard server's internal IP
allow-query { 10.0.0.0/24; }; // Allow queries from the WireGuard network
forwarders { 8.8.8.8; 1.1.1.1; }; // Forward unresolved queries (Google and Cloudflare DNS)
// listen-on-v6 { any; };   // commented out to disable IPv6 listening
January 23: Waiting for change, waiting for opportunity
Bathing Scene Reverie: Emerald trees and a fine pavilion screen the fragrant bath; the silken robe just shed, jade-like skin turns cool. Slender fingers stir the spring water; a graceful figure's scent drifts across it. Warm mist curls up, stirring old memories; soft ripples shimmer, and the dream of love runs long. When the bath is done a fresh verse is added, a faint bright smile reflected in the glowing scene.
ffmpeg -f alsa -i hw:1 /tmp/output.mp3
January 26: Waiting for change, waiting for opportunity
The old approach of fetching https://dl.winehq.org/wine-builds/Release.key has been replaced by the so-called keyring mechanism.
$ sudo wget -O /etc/apt/keyrings/winehq-archive.key https://dl.winehq.org/wine-builds/winehq.key
$ sudo wget -NP /etc/apt/sources.list.d/ https://dl.winehq.org/wine-builds/ubuntu/dists/jammy/winehq-jammy.sources
$ ffmpeg -i deepseek-2025-01-26_17.20.48.mp4 -vf "setpts=0.5*PTS" wine-fix.mp4
February 2: Waiting for change, waiting for opportunity
$ ./llama-tokenize -m /usr/share/ollama/.ollama/models/blobs/sha256-6150cb382311b69f09cc0f9a1b69fc029cbd742b66bb8ec531aa5ecf5c613e93  --prompt "how are you" 2>&1 | grep general.name
general.name str              = DeepSeek R1 Distill Qwen 32B
print_info: general.name     = DeepSeek R1 Distill Qwen 32B
February 15: Waiting for change, waiting for opportunity
February 19: Waiting for change, waiting for opportunity
On Ubuntu, Ollama saves models by default to /usr/share/ollama/.ollama/models. I tried a symlink before, but that did not work. This part matters: earlier I had not changed the service user, so it could not work.
$ sudo mkdir -p /etc/systemd/system/ollama.service.d
$ sudo nano /etc/systemd/system/ollama.service.d/override.conf
[Service]
User=nick
Group=nick
Environment="OLLAMA_MODELS=/home/nick/workspace/OLLAMA/.ollama/models"
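For the override to take effect, the usual systemd steps apply (standard commands, not part of the original note):
$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama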
sudo swapoff -a
sudo dd if=/dev/zero of=/swapfile bs=1M count=8192
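The dd above only recreates the file; the usual remaining steps to put it back into service as swap (standard procedure, assuming /swapfile is the intended swap file):
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile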
February 24: Waiting for change, waiting for opportunity
March 4: Waiting for change, waiting for opportunity
$ sudo vgdisplay 
  --- Volume group ---
  VG Name               ubuntu-vg
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <1.82 TiB
  PE Size               4.00 MiB
  Total PE              476018
  Alloc PE / Size       476018 / <1.82 TiB
  Free  PE / Size       0 / 0   
  VG UUID               fvQTI5-0C8Q-9eVF-VkQ0-fmgh-w5XH-GewwFP
$ sudo lvdisplay 
  --- Logical volume ---
  LV Path                /dev/ubuntu-vg/ubuntu-lv
  LV Name                ubuntu-lv
  VG Name                ubuntu-vg
  LV UUID                3LexV3-Imjh-NB0K-oGf0-gVbz-PiF8-3S8Non
  LV Write Access        read/write
  LV Creation host, time ubuntu-server, 2025-02-22 07:55:25 +0000
  LV Status              available
  # open                 1
  LV Size                100.00 GiB
  Current LE             25600
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
$ sudo pvcreate /dev/sda3
$ sudo vgextend ubuntu-vg /dev/sda3
$ sudo lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv
$ sudo resize2fs /dev/ubuntu-vg/ubuntu-lv
$ df -h
Filesystem                         Size  Used Avail Use% Mounted on
tmpfs                              101G  1.9M  101G   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv  1.8T   12G  1.8T   1% /
tmpfs                              504G     0  504G   0% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
/dev/sda2                          2.0G  246M  1.6G  14% /boot
/dev/sda1                          1.1G  6.1M  1.1G   1% /boot/efi
tmpfs                              101G  4.0K  101G   1% /run/user/1000
March 5: Waiting for change, waiting for opportunity
sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo usermod -a -G ollama $(whoami)
$ cat /etc/passwd | grep ollama
ollama:x:998:998::/usr/share/ollama:/bin/false
$ sudo ls -dal /usr/share/ollama/
drwxr-x--- 2 ollama ollama 4096 Mar  4 22:11 /usr/share/ollama/
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"
[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable ollama
March 11: Waiting for change, waiting for opportunity
build: 4858 (1e2f78a0) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 42 key-value pairs and 1025 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-9801e7fce27dbf3d0bfb468b7b21f1d132131a546dfc43e50518631b8b1800a9 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = deepseek2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                         general.size_label str              = 256x20B
llama_model_loader: - kv   3:                      deepseek2.block_count u32              = 61
llama_model_loader: - kv   4:                   deepseek2.context_length u32              = 163840
llama_model_loader: - kv   5:                 deepseek2.embedding_length u32              = 7168
llama_model_loader: - kv   6:              deepseek2.feed_forward_length u32              = 18432
llama_model_loader: - kv   7:             deepseek2.attention.head_count u32              = 128
llama_model_loader: - kv   8:          deepseek2.attention.head_count_kv u32              = 128
llama_model_loader: - kv   9:                   deepseek2.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  10: deepseek2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  11:                deepseek2.expert_used_count u32              = 8
llama_model_loader: - kv  12:        deepseek2.leading_dense_block_count u32              = 3
llama_model_loader: - kv  13:                       deepseek2.vocab_size u32              = 129280
llama_model_loader: - kv  14:            deepseek2.attention.q_lora_rank u32              = 1536
llama_model_loader: - kv  15:           deepseek2.attention.kv_lora_rank u32              = 512
llama_model_loader: - kv  16:             deepseek2.attention.key_length u32              = 192
llama_model_loader: - kv  17:           deepseek2.attention.value_length u32              = 128
llama_model_loader: - kv  18:       deepseek2.expert_feed_forward_length u32              = 2048
llama_model_loader: - kv  19:                     deepseek2.expert_count u32              = 256
llama_model_loader: - kv  20:              deepseek2.expert_shared_count u32              = 1
llama_model_loader: - kv  21:             deepseek2.expert_weights_scale f32              = 2.500000
llama_model_loader: - kv  22:              deepseek2.expert_weights_norm bool             = true
llama_model_loader: - kv  23:               deepseek2.expert_gating_func u32              = 2
llama_model_loader: - kv  24:             deepseek2.rope.dimension_count u32              = 64
llama_model_loader: - kv  25:                deepseek2.rope.scaling.type str              = yarn
llama_model_loader: - kv  26:              deepseek2.rope.scaling.factor f32              = 40.000000
llama_model_loader: - kv  27: deepseek2.rope.scaling.original_context_length u32              = 4096
llama_model_loader: - kv  28: deepseek2.rope.scaling.yarn_log_multiplier f32              = 0.100000
llama_model_loader: - kv  29:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  30:                         tokenizer.ggml.pre str              = deepseek-v3
llama_model_loader: - kv  31:                      tokenizer.ggml.tokens arr[str,129280]  = ["<|begin▁of▁sentence|>", "<�...
llama_model_loader: - kv  32:                  tokenizer.ggml.token_type arr[i32,129280]  = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  33:                      tokenizer.ggml.merges arr[str,127741]  = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
llama_model_loader: - kv  34:                tokenizer.ggml.bos_token_id u32              = 0
llama_model_loader: - kv  35:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  36:            tokenizer.ggml.padding_token_id u32              = 1
llama_model_loader: - kv  37:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  38:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  39:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
llama_model_loader: - kv  40:               general.quantization_version u32              = 2
llama_model_loader: - kv  41:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  361 tensors
llama_model_loader: - type q4_K:  606 tensors
llama_model_loader: - type q6_K:   58 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 376.65 GiB (4.82 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 818
load: token to piece cache size = 0.8223 MB
print_info: arch             = deepseek2
print_info: vocab_only       = 0
print_info: n_ctx_train      = 163840
print_info: n_embd           = 7168
print_info: n_layer          = 61
print_info: n_head           = 128
print_info: n_head_kv        = 128
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: n_embd_head_k    = 192
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 24576
print_info: n_embd_v_gqa     = 16384
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: n_ff             = 18432
print_info: n_expert         = 256
print_info: n_expert_used    = 8
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = yarn
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 0.025
print_info: n_ctx_orig_yarn  = 4096
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 671B
print_info: model params     = 671.03 B
print_info: general.name     = n/a
print_info: n_layer_dense_lead   = 3
print_info: n_lora_q             = 1536
print_info: n_lora_kv            = 512
print_info: n_ff_exp             = 2048
print_info: n_expert_shared      = 1
print_info: expert_weights_scale = 2.5
print_info: expert_weights_norm  = 1
print_info: expert_gating_func   = sigmoid
print_info: rope_yarn_log_mul    = 0.1000
print_info: vocab type       = BPE
print_info: n_vocab          = 129280
print_info: n_merges         = 127741
print_info: BOS token        = 0 '<|begin▁of▁sentence|>'
print_info: EOS token        = 1 '<|end▁of▁sentence|>'
print_info: EOT token        = 1 '<|end▁of▁sentence|>'
print_info: PAD token        = 1 '<|end▁of▁sentence|>'
print_info: LF token         = 201 'Ċ'
print_info: FIM PRE token    = 128801 '<|fim▁begin|>'
print_info: FIM SUF token    = 128800 '<|fim▁hole|>'
print_info: FIM MID token    = 128802 '<|fim▁end|>'
print_info: EOG token        = 1 '<|end▁of▁sentence|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
#define GGML_ABORT(...) ggml_abort(__FILE__, __LINE__, __VA_ARGS__)
#define GGML_ASSERT(x) if (!(x)) GGML_ABORT("GGML_ASSERT(%s) failed", #x)
GGML_ASSERT(tokenizer && "Tokenizer not initialized. Call llama_vocab::init_tokenizer() first.");
if (!(tokenizer && "Tokenizer not initialized. Call llama_vocab::init_tokenizer() first."))
ggml_abort("/home/nick/workspace/llama.cpp/src/llama-vocab.cpp", 2330, "GGML_ASSERT(%s) failed",
           "tokenizer && \"Tokenizer not initialized. Call llama_vocab::init_tokenizer() first.\"")
void ggml_abort(const char * file, int line, const char * fmt, ...) {
    fflush(stdout);
    fprintf(stderr, "%s:%d: ", file, line);
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    fprintf(stderr, "\n");
    ggml_print_backtrace();
    abort();
}
static void ggml_print_backtrace(void) {
    const char * GGML_NO_BACKTRACE = getenv("GGML_NO_BACKTRACE");
    if (GGML_NO_BACKTRACE) {
        return;
    }
    char attach[32];
    snprintf(attach, sizeof(attach), "attach %d", getpid());
    int pid = fork();
    if (pid == 0) {
        // try gdb
        execlp("gdb", "gdb", "--batch",
            "-ex", "set style enabled on",
            "-ex", attach,
            "-ex", "bt -frame-info source-and-location",
            "-ex", "detach",
            "-ex", "quit",
            (char *) NULL);
// what follows is the handling for other platforms
So, make a temporary change:
- 0 (classic ptrace permissions): Anyone can ptrace any process, provided they have the necessary permissions (e.g., same user ID). This is the least secure setting.
- 1 (restricted ptrace): A process can only be ptraced by a parent process or a process running as root. This is the default setting in many modern Linux distributions, including Ubuntu 22.04.
- 2 (restricted ptrace admin): Only root can ptrace processes.
- 3 (no ptrace): No process can be ptraced, even by root.
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
kernel.yama.ptrace_scope = 0
This is the magic utility function! The backtrace function in the C/C++ standard library (specifically, in the GNU C Library, glibc) works by inspecting the call stack of the current process. It does not use the ptrace system call. Here's how backtrace typically works:
1. Stack Frame Inspection: The function walks up the call stack by inspecting the frame pointer (or base pointer) registers. Each stack frame contains the return address (the address of the instruction to execute after the function returns), local variables and function arguments, and a pointer to the previous stack frame. backtrace uses these frame pointers to trace the sequence of function calls that led to the current point in the program.
2. Symbol Resolution (Optional): The backtrace_symbols function (which often accompanies backtrace) takes the addresses obtained from backtrace and attempts to resolve them to function names, source file names, and line numbers. This typically involves reading debugging information from the executable file (e.g., DWARF information) and using symbol tables to map addresses to function names.
3. No ptrace Involvement: backtrace operates within the context of the current process; it inspects the process's own memory (the call stack). ptrace is used for debugging other processes or for controlling the execution of a child process, which backtrace does not need to do. Therefore, backtrace does not require any special privileges or capabilities.
Key Points: backtrace is a self-inspection mechanism. It relies on the structure of the call stack, which is a fundamental part of how programs execute. Symbol resolution (using backtrace_symbols) enhances the output of backtrace by providing human-readable information.
In summary: backtrace is a valuable tool for debugging and profiling, and it works by directly examining the process's call stack, without the need for the ptrace system call.
/* Store up to SIZE return address of the current program state in
   ARRAY and return the exact number of values stored.  */
extern int backtrace (void **__array, int __size) __nonnull ((1));
/* Return names of functions from the backtrace list in ARRAY in a newly
   malloc()ed memory block.  */
extern char **backtrace_symbols (void *const *__array, int __size)
     __THROW __nonnull ((1));
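A minimal self-contained use of these two functions (my own sketch, not from the diary); linking with -rdynamic makes the symbol names show up in the output:

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

static void who_called_me(void) {
    void *addrs[16];
    int n = backtrace(addrs, 16);                 // capture up to 16 return addresses
    char **names = backtrace_symbols(addrs, n);   // resolve them to printable strings
    if (names != NULL) {
        for (int i = 0; i < n; i++)
            printf("frame %d: %s\n", i, names[i]);
        free(names);                              // backtrace_symbols() malloc()s one block
    }
}

int main(void) {
    who_called_me();
    return 0;
}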
/* The nonnull function attribute marks pointer parameters that
   must not be NULL.  This has the name __nonnull in glibc,
   and __attribute_nonnull__ in files shared with Gnulib to avoid
   collision with a different __nonnull in DragonFlyBSD 5.9.  */
#ifndef __attribute_nonnull__
# if __GNUC_PREREQ (3,3) || __glibc_has_attribute (__nonnull__)
#  define __attribute_nonnull__(params) __attribute__ ((__nonnull__ params))
# else
#  define __attribute_nonnull__(params)
# endif
#endif
#ifndef __nonnull
# define __nonnull(params) __attribute_nonnull__ (params)
#endif
So this macro can use 1-based parameter positions to force the compiler to check that an argument is not null! Amazing! Learning a little every day already adds up, let alone learning this much!
1. __nonnull ((1)) Attribute: Purpose: this attribute tells the compiler that the first parameter (__array) of the backtrace function must not be a null pointer. Mechanism: it leverages the __attribute__ ((__nonnull__ params)) GCC compiler extension. The (1) inside __nonnull ((1)) specifies that the first parameter (parameter index 1) is the one that must not be null; if there were multiple non-null parameters, you could list their indices (e.g., __nonnull ((1, 3))).
2. Macro Expansion: __nonnull(params) simply expands to __attribute_nonnull__ (params). __attribute_nonnull__(params) checks whether the compiler supports the __nonnull__ attribute: with GCC 3.3 or later, or when the glibc header indicates __nonnull__ support, it expands to __attribute__ ((__nonnull__ params)); otherwise it expands to nothing (effectively removing the attribute).
3. Compiler's Role: Compile-time warnings: when the __nonnull__ attribute is active, the compiler can generate warnings if it detects that the function is called with a null pointer as the first argument. Optimization: the compiler might also use this information to optimize code, as it can assume that the parameter is never null.
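A tiny example of the attribute outside glibc (my own sketch); building it with gcc -Wall makes the compiler flag the call:

#include <stdio.h>
#include <stddef.h>

// Parameter 1 must not be NULL; the compiler enforces this as a warning.
void print_name(const char *name) __attribute__ ((__nonnull__ (1)));

void print_name(const char *name) {
    puts(name);
}

int main(void) {
    print_name(NULL);   // gcc warns here: NULL passed where a non-null argument is required
    return 0;
}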
March 13: Waiting for change, waiting for opportunity
March 14: Waiting for change, waiting for opportunity
// Original function
void real_function() {
    // ... implementation ...
}
// Weak alias
void old_function() __attribute__ ((weak, alias ("real_function")));
- In this example, old_function is a weak alias for real_function.
- If a user defines a strong symbol named old_function, that definition will be used.
- Otherwise, real_function will be used.
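A single-file sketch of the same mechanism (my own example): in this build old_function resolves to real_function; if some other object file in the link provided a strong old_function, the linker would pick that one instead.

#include <stdio.h>

void real_function(void) {
    printf("real_function\n");
}

// Weak alias: references to old_function bind here unless a strong
// definition of old_function comes from another object file at link time.
void old_function(void) __attribute__ ((weak, alias ("real_function")));

int main(void) {
    old_function();     // prints "real_function" in this single-file build
    return 0;
}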
So this means that if I want to use this feature, I first have to move the huge model file into that mount, and it is not a simple file copy; this really is not easy!
The Core Misunderstanding: hugetlbfs Is Not a Regular File System. This is the most crucial point. hugetlbfs is fundamentally different from regular file systems like ext4 or xfs. It is a memory-backed file system, meaning that files within it reside entirely in RAM (specifically, HugePages). You do not copy files to hugetlbfs the way you copy them to a disk-based file system.
The Correct Usage of hugetlbfs:
- Allocation Within hugetlbfs: The primary purpose of hugetlbfs is to provide a mechanism for applications to allocate memory directly from HugePages. You create files within hugetlbfs to reserve and manage regions of HugePages. The files themselves do not contain data in the traditional sense; they represent memory regions.
- mmap and hugetlbfs: The typical usage pattern is to create a file within hugetlbfs with a size that matches the desired HugePages allocation, then mmap that file with the MAP_HUGETLB flag. This establishes a direct mapping between the application's virtual memory and the allocated HugePages.
- Data Population: After mmap, you can populate the mapped memory region with data. This data might come from another file, from network input, or be generated by the application. The important thing is that the data resides in RAM, specifically in the HugePages allocated through hugetlbfs.
Why Copying Fails:
- No Disk Storage: hugetlbfs does not have an underlying disk storage mechanism. When you try to copy a file, the cp command attempts to write data to disk, which is not possible.
- Memory Allocation, Not File Copying: The kernel interprets file creation and writes within hugetlbfs as memory allocation requests. If the allocation fails (e.g., due to insufficient or non-contiguous HugePages), you'll get an error.
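A minimal sketch of that create-then-mmap-then-populate pattern, assuming hugetlbfs is already mounted at /mnt/hugepages and enough huge pages are reserved (the path and size here are illustrative, not from the diary):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const size_t sz = 2UL * 1024 * 1024;     // one 2 MiB huge page
    int fd = open("/mnt/hugepages/demo", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }
    // Mapping the hugetlbfs file is what reserves the huge pages; MAP_HUGETLB
    // itself is only required for anonymous mappings.
    void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    memcpy(p, "hello hugepages", 16);        // populate the mapped memory with data
    munmap(p, sz);
    close(fd);
    return 0;
}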
March 15: Waiting for change, waiting for opportunity
March 16: Waiting for change, waiting for opportunity
$ ls /boot/efi/EFI/ubuntu/
BOOTX64.CSV  grub.cfg  grubx64.efi  mmx64.efi  shimx64.efi
$ cat /boot/efi/EFI/ubuntu/grub.cfg
search.fs_uuid f00dad3b-90a9-4155-aa92-0077432099f6 root hd0,gpt2
set prefix=($root)'/grub'
configfile $prefix/grub.cfg
$ blkid
/dev/mapper/ubuntu--vg-lv--0: UUID="8e86f709-3b0d-40a2-8f9f-e9878a74a1f2" BLOCK_SIZE="4096" TYPE="ext4"
/dev/mapper/ubuntu--vg-ubuntu--lv: UUID="34d3a3d3-29b1-4075-bf5e-707faa840b24" BLOCK_SIZE="4096" TYPE="ext4"
/dev/sda2: UUID="f00dad3b-90a9-4155-aa92-0077432099f6" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="723d5e62-c97d-44a0-80b7-dd18f67eab4b"
/dev/sda3: UUID="FrVw7a-ePu5-VBY8-H1MY-z2aC-3ZNO-PcRCm2" TYPE="LVM2_member" PARTUUID="df45327f-5439-4cba-8347-065ce38e7ce2"
/dev/sda1: UUID="DE16-179A" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="8f734327-9435-42a0-8d1e-e810171e6f76"
March 17: Waiting for change, waiting for opportunity
static_assert(sizeof(off_t) == 8);
tgt_fd = open(argv[2], O_CREAT|O_RDWR|O_EXCL, 0666);
...
void* ptr = mmap(NULL, tgtSize, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_HUGETLB, tgt_fd, 0);
I don't feel like writing a while loop right now to make this reading part robust; I'll do it later. From the read(2) man page: On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen, for example, because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. See also NOTES. On error, -1 is returned, and errno is set appropriately. In this case, it is left unspecified whether the file position (if any) changes.
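For reference, the usual shape of that loop (a sketch only; src_fd is assumed to be a descriptor opened on the source file, while ptr and tgtSize come from the snippet above):

#include <errno.h>
#include <unistd.h>

// Read exactly `size` bytes from `fd` into `buf`, retrying on short reads and
// on EINTR, exactly the cases the man-page excerpt above warns about.
static ssize_t read_all(int fd, void *buf, size_t size) {
    size_t done = 0;
    while (done < size) {
        ssize_t n = read(fd, (char *)buf + done, size - done);
        if (n == 0) break;                    // end of file reached early
        if (n < 0) {
            if (errno == EINTR) continue;     // interrupted by a signal: retry
            return -1;                        // real error, errno is set
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

// e.g. read_all(src_fd, ptr, tgtSize) to fill the huge-page mapping.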
March 18: Waiting for change, waiting for opportunity
echo "# hugecp" >> README.md
git init
git add README.md
git commit -m "first commit"
git branch -M main
git remote add origin https://github.com/nickhuang99/hugecp.git
git push -u origin main
March 19: Waiting for change, waiting for opportunity
Yes, the 671B model in your llama.cpp output is indeed quantized. Here's how we can tell:
    File Type: The line print_info: file type = Q4_K - Medium indicates that the model has been quantized to the Q4_K format, specifically the Medium variant.   
GIT_LFS_SKIP_SMUDGE=1 git clone git@hf.co:deepseek-ai/DeepSeek-R1
### Key time costs
1. **Model load time (the main bottleneck)**:
   - With a **4-bit quantized** model (about 335 GB):
     - **NVMe SSD (3.5 GB/s)**: roughly **2 minutes**
     - **SATA SSD (500 MB/s)**: roughly **11 minutes**
     - **HDD (150 MB/s)**: roughly **38 minutes**
   - With an **unquantized FP16 model (1.3 TB)**:
     - Your memory capacity (1.5 TB) just barely fits it, but loading takes much longer
2. **Memory initialization and warm-up**:
   - Roughly **1-5 minutes** (depending on CPU parallelism)
3. **Context initialization**:
   - Roughly **30 seconds to 2 minutes**
### Estimated total time
- **Best case** (NVMe SSD + 4-bit quantization): **3-7 minutes**
- **Typical case** (SATA SSD + 4-bit quantization): **12-18 minutes**
- **Worst case** (HDD + FP16 model): **40+ minutes**
### Optimization suggestions
1. **CPU-only inference**: use `--n-gpu-layers 0` to keep all layers off the GPU
2. **Memory pre-allocation**: add the `--mlock` flag to lock the model in memory
3. **Parallel loading**: set `--threads N` (N = number of physical cores)
4. **mmap**: memory mapping is used by llama.cpp by default (it can be disabled with `--no-mmap`)
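Putting those flags together, the suggestions amount to something like this (a sketch with a placeholder model path; -m, --n-gpu-layers, --mlock, --threads and -p are current llama.cpp options, and mmap is already the default):
$ ./llama-cli -m /path/to/DeepSeek-R1-Q4_K_M.gguf --n-gpu-layers 0 --mlock --threads 40 -p "how are you"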
March 20: Waiting for change, waiting for opportunity
$ sudo numastat -p 7188
Per-node process memory usage (in MBs) for PID 7188 (llama-cli)
                           Node 0          Node 1           Total
                  --------------- --------------- ---------------
Huge                    386048.00            0.00       386048.00
Heap                        40.40            0.00           40.40
Stack                        0.04            0.00            0.04
Private                  19582.44            4.27        19586.71
----------------  --------------- --------------- ---------------
Total                   405670.88            4.27       405675.15
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
node 0 size: 774008 MB
node 0 free: 318183 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 1 size: 774086 MB
node 1 free: 1564 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
void update_progress(int progress) {
        int bar_length = 40; // Modify this to change the bar's length
        int filled_length = (int)(bar_length * progress / 100.0);
        char bar[bar_length + 1]; // +1 for the null terminator
        for (int i = 0; i < bar_length; i++) {
                if (i < filled_length) {
                        bar[i] = '=';
                } else {
                        bar[i] = '-';
                }
        }
        bar[bar_length] = '\0'; // Null-terminate the string
        printf("\r[%s] %d%%", bar, progress);
        fflush(stdout); // Ensure output is written immediately
}
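A trivial driver just to watch the bar animate (my own test scaffold, not part of hugecp):

#include <stdio.h>
#include <unistd.h>

void update_progress(int progress);    // the function defined above

int main(void) {
    for (int p = 0; p <= 100; p += 5) {
        update_progress(p);
        usleep(100000);                // 100 ms between updates
    }
    printf("\n");
    return 0;
}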
$ ~/Downloads/hugecp /mnt/hugepages/deepseek-r1-numa-node0 /mnt/hugepages/deepseek-r1-no-numa
[===-------------------------------------] 8%
March 21: Waiting for change, waiting for opportunity
Clone lightly first (skipping the LFS payloads), otherwise it never finishes.
GIT_LFS_SKIP_SMUDGE=1 git clone https://cnb.cool/ai-models/deepseek-ai/DeepSeek-R1-GGUF/DeepSeek-R1-Q8_0.git
cd DeepSeek-R1-Q8_0/
git lfs pull
A quick intuition for this formula: if the probability assigned to every next word is 100%, the perplexity is 1, because the sum in the exponent is 0. Confidence shows up in the probabilities: if each word only gets 50%, the average of -log2 p is 1 bit and the perplexity is 2, so the larger the perplexity, the lower the confidence. Another concept is sinusoidal position; below are DeepSeek's explanations. Perplexity is a measurement used primarily in the fields of natural language processing (NLP) and information theory to evaluate how well a probability model predicts a sample. In simpler terms, it assesses how "perplexed" or uncertain a model is when making predictions.
Key Points:
In NLP: Perplexity is commonly used to evaluate language models. It measures how well the model predicts a sequence of words. A lower perplexity indicates that the model is more confident and accurate in its predictions, while a higher perplexity suggests greater uncertainty.
Mathematical Definition: For a language model, perplexity is defined as the exponential of the cross-entropy loss. If the model assigns probability P(w_1, ..., w_N) to a sequence of words, the perplexity is
    PPL = P(w_1, ..., w_N)^(-1/N) = exp( -(1/N) * sum_{i=1..N} ln P(w_i | w_1, ..., w_{i-1}) )
Here, N is the number of words in the sequence.
Interpretation: A perplexity of k means that the model is as confused as if it had to choose uniformly and independently among k possibilities. For example, a perplexity of 1 means the model is perfectly confident (ideal case), while higher values indicate worse performance.
Applications: Perplexity is widely used to compare different language models, tune hyperparameters, and assess the quality of text generation systems.
In summary, perplexity is a useful metric for understanding the performance of probabilistic models, especially in NLP, where it quantifies how well a model predicts a given sequence of words.
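As a quick sanity check on the formula, a few lines of C that compute perplexity from per-token probabilities (illustrative numbers, not model output; with every probability at 0.5 it prints 2):

#include <math.h>
#include <stdio.h>

int main(void) {
    // Probabilities the model assigned to each observed token.
    double p[] = {0.5, 0.5, 0.5, 0.5};
    int n = sizeof(p) / sizeof(p[0]);
    double nll = 0.0;                              // accumulated negative log-likelihood
    for (int i = 0; i < n; i++)
        nll += -log(p[i]);
    printf("perplexity = %.3f\n", exp(nll / n));   // exp of the mean NLL
    return 0;
}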
"Sinusoidal position" typically refers to a method of encoding positional information in sequences, commonly used in neural networks, particularly in transformer models like those in natural language processing (NLP). This technique is known as sinusoidal positional encoding.Key Points:
Purpose:
In models like transformers, the input data (e.g., words in a sentence) is processed in parallel rather than sequentially. This means the model doesn't inherently know the order of elements in the sequence.
Sinusoidal positional encoding is used to inject information about the position of each element in the sequence into the model, helping it understand the order of elements.
How It Works:
For each position pos in the sequence and each dimension i of the embedding, a sinusoidal function is applied:
    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
Here, d_model is the dimensionality of the embedding, and pos is the position in the sequence.
This creates a unique pattern for each position, allowing the model to distinguish between different positions in the sequence.
Properties:
Uniqueness: Each position has a unique encoding.
Relative Position: The sinusoidal nature of the encoding allows the model to learn relative positions more easily, as the encoding for position pos + k can be derived from the encoding for position pos using trigonometric identities.
Smoothness: The encoding varies smoothly across positions, which helps the model generalize better.
Applications:
Sinusoidal positional encoding is widely used in transformer-based models like BERT, GPT, and others for tasks such as machine translation, text generation, and more.
Example:
For a sequence of words in a sentence, each word's embedding is combined with its sinusoidal positional encoding to provide the model with both the word's meaning and its position in the sentence.
In summary, sinusoidal positional encoding is a technique to provide positional information to models that process sequences in parallel, ensuring they can understand and utilize the order of elements in the sequence.
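A small C sketch of the encoding following the formula above (d_model and the positions are arbitrary illustrative choices):

#include <math.h>
#include <stdio.h>

// Sinusoidal positional encoding: even dimensions use sin, odd dimensions cos.
static double pos_encoding(int pos, int dim, int d_model) {
    double angle = pos / pow(10000.0, (double)(2 * (dim / 2)) / d_model);
    return (dim % 2 == 0) ? sin(angle) : cos(angle);
}

int main(void) {
    const int d_model = 8;                     // embedding dimensionality (illustrative)
    for (int pos = 0; pos < 3; pos++) {        // print the first few positions
        printf("pos %d:", pos);
        for (int dim = 0; dim < d_model; dim++)
            printf(" %+.3f", pos_encoding(pos, dim, d_model));
        printf("\n");
    }
    return 0;
}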