January 2: Waiting for change, waiting for opportunity
[Interface]
# Server private key
PrivateKey = ...
# Server's IP address inside the WireGuard network
Address = 10.0.0.1/24
ListenPort = 5182
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
# Define the client peer
[Peer]
# Client public key
PublicKey = ...
# The client's IP address inside the tunnel
AllowedIPs = 10.0.0.2/32
The routing configuration (this should be the default):
$ ip route
default via 172.31.16.1 dev eth0 proto dhcp src 172.31.29.227 metric 100
10.0.0.0/24 dev wg0 proto kernel scope link src 10.0.0.1
172.31.0.2 via 172.31.16.1 dev eth0 proto dhcp src 172.31.29.227 metric 100
172.31.16.0/20 dev eth0 proto kernel scope link src 172.31.29.227 metric 100
172.31.16.1 dev eth0 proto dhcp scope link src 172.31.29.227 metric 100
The DNS resolution configuration should be unchanged.
[Interface]
# Client private key
PrivateKey = ...
# Client's IP address inside the WireGuard network
Address = 10.0.0.2/24
DNS = 172.31.29.227
[Peer]
# Server public key
PublicKey = ...
# Server's public IP address and port
Endpoint = 54.146.251.127:5182
AllowedIPs = 0.0.0.0/0
Although I set the server's internal address DNS=172.31.29.227 as the DNS server here, I later set the DNS configuration to the WireGuard server's IP instead, so this setting is unnecessary. Gemini was so certain about this that at first I was only half convinced.
$ resolvectl
Global
Protocols: -LLMNR -mDNS DNSOverTLS=opportunistic DNSSEC=no/unsupported
resolv.conf mode: foreign
Current DNS Server: 10.0.0.1
DNS Servers: 10.0.0.1
Link 2 (enp0s31f6)
Current Scopes: DNS
Protocols: +DefaultRoute +LLMNR -mDNS DNSOverTLS=opportunistic DNSSEC=no/unsupported
Current DNS Server: 218.85.157.99
DNS Servers: 218.85.152.99 218.85.157.99 fe80::1%23667
Link 3 (wlp109s0)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS DNSOverTLS=opportunistic DNSSEC=no/unsupported
Link 766 (wg0)
Current Scopes: none
Protocols: -DefaultRoute +LLMNR -mDNS DNSOverTLS=opportunistic DNSSEC=no/unsupported
And the routing configuration looks like this:
$ ip route
0.0.0.0/1 via 10.0.0.1 dev wg0
default via 192.168.1.1 dev enp0s31f6 proto dhcp metric 100
10.0.0.0/24 dev wg0 proto kernel scope link src 10.0.0.2
54.146.251.127 via 192.168.1.1 dev enp0s31f6
169.254.0.0/16 dev enp0s31f6 scope link metric 1000
192.168.1.0/24 dev enp0s31f6 proto kernel scope link src 192.168.1.6 metric 100
Here the only extra thing I did, following the instructions on the WireGuard website, was to route all traffic through the server. The route 54.146.251.127 via 192.168.1.1 dev enp0s31f6 was not added by wg-quick; I added it myself so that I can still SSH directly to the AWS/EC2 server (the command is sketched right after the two ip route add lines below).
sudo ip route add 0.0.0.0/1 dev wg0
sudo ip route add 128.0.0.0/1 dev wg0
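For completeness, the extra host route mentioned above, which keeps the SSH session to the EC2 box outside the tunnel, was added by hand roughly like this (gateway and interface taken from the routing table above):
sudo ip route add 54.146.251.127 via 192.168.1.1 dev enp0s31f6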
Finally, the server side must be set up for NAT and forwarding; this is exactly what any ordinary host needs in order to reach the internet through a gateway. I used to think a VPN would do this automatically, but WireGuard, being an extremely minimal VPN, implements only the most essential part, the encrypted tunnel; everything else you have to do yourself. Here is Google Gemini's explanation, which is a classic:
Is wireguard server "Forwarding" mainly for firewall?
That's a very good question that gets at the heart of how networking works.
While it's true that forwarding is a core function of a firewall, it's not exclusively a firewall function. In the context of a WireGuard server (or any router), "forwarding" refers to the basic network process of taking a packet received on one network interface and sending it out another network interface.
Here's a breakdown:
Forwarding in General:
- Core Routing Function: At its most fundamental level, forwarding is what routers (including your WireGuard server) do. When a packet arrives at a router, it examines the destination IP address. If the destination is on a different network than the incoming interface, the router "forwards" the packet to the appropriate outgoing interface.
- Essential for Inter-Network Communication: Without forwarding, devices on different networks (like your home network and the internet) could not communicate with each other.
Forwarding in the Context of a WireGuard Server:
- Connecting Private and Public Networks: Your WireGuard server acts as a bridge between the private WireGuard network (e.g., 10.8.0.0/24) and the public internet.
- iptables -A FORWARD -i wg0 -j ACCEPT: This specific iptables rule tells the Linux kernel (which is the operating system of your server) to allow packets to be forwarded from the WireGuard interface (wg0) to other interfaces (and vice versa). This is absolutely essential for your WireGuard clients to be able to reach the internet.
Relationship to Firewalls:
- Firewalls Use Forwarding: Firewalls use forwarding as a mechanism to inspect and filter traffic. A firewall can have rules that say "forward packets from this IP to that IP only if they are on port 80 (HTTP)" or "block all forwarding from this network to that network."
- Forwarding Exists Independently of Firewalls: The basic act of forwarding is a lower-level network function that exists regardless of whether you have a complex firewall setup. Even with no explicit firewall rules, the kernel still needs to know whether to forward packets or not.
Analogy: Think of a post office:
- Forwarding: The basic act of sorting mail and sending it to the correct destination is like forwarding.
- Firewall: A security guard at the post office who checks packages for suspicious contents and decides whether to allow them to be delivered is like a firewall.
The post office needs to forward mail even without a security guard. The security guard (firewall) uses the forwarding system to do its job of filtering.
In summary:
- Forwarding is a fundamental network function that allows devices on different networks to communicate.
- WireGuard servers rely on forwarding to connect clients on the private VPN network to the public internet.
- Firewalls use forwarding as a mechanism to filter traffic, but forwarding itself is not solely a firewall function.
Therefore, even if you don't have a complex firewall configuration, you still need to enable forwarding on your WireGuard server for it to function correctly. The iptables -A FORWARD ... rule is essential for this.
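One detail the answer above takes for granted: besides the iptables FORWARD rule, the kernel's own IP forwarding switch must be on. A minimal sketch of the usual way to enable it (the standard recipe, not copied from my original notes):
# enable immediately
sudo sysctl -w net.ipv4.ip_forward=1
# persist across reboots
echo 'net.ipv4.ip_forward=1' | sudo tee /etc/sysctl.d/99-ip-forward.conf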
listen-on { 10.0.0.1; }; // Listen on the WireGuard server's internal IP
allow-query { 10.0.0.0/24; }; // Allow queries from the WireGuard network
forwarders { 8.8.8.8; 1.1.1.1; }; // Forward unresolved queries (Google and Cloudflare DNS)
//listen-on-v6 { any; }; #Comment this line to disable ipv6 listening
My feeling is that with listen-on { 10.0.0.1; }; this will not interfere with the systemd-resolved setup that the WireGuard host itself already relies on.
January 23: Waiting for change, waiting for opportunity
Reverie at the Bath (浴景遐思): Emerald trees and a painted pavilion screen the fragrant bath; the silk robe just shed, the jade-smooth skin feels cool. Slender fingers, like spring shoots, stir the spring water; a graceful figure sends its scent across the ripples. Warm mist drifts and wakes old memories; soft waves shimmer and the dream of love runs long. When the bath is done and a new verse is added, a faint, charming smile reflects the glowing scene.
ffmpeg -f alsa -i hw:1 /tmp/output.mp3
The result was poor because I had not found a way to turn the microphone off (one possible approach is sketched below).
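One thing that might have helped (untested here, and control names differ between sound cards): muting the capture control with amixer before recording, for example:
# hypothetical: mute the capture (microphone) switch; adjust the card index and control name
amixer -c 1 set Capture nocap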
一月二十六日 等待变化等待机会
The old approach of fetching https://dl.winehq.org/wine-builds/Release.key has been replaced by the so-called keyring:
$ sudo wget -O /etc/apt/keyrings/winehq-archive.key https://dl.winehq.org/wine-builds/winehq.key
$ sudo wget -NP /etc/apt/sources.list.d/ https://dl.winehq.org/wine-builds/ubuntu/dists/jammy/winehq-jammy.sources
$ ffmpeg -i deepseek-2025-01-26_17.20.48.mp4 -vf "setpts=0.5*PTS" wine-fix.mp4
In fact you cannot achieve this by changing the frame rate (fps), because the duration stays the same; only changing the timestamps works (see the sketch below).
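For reference, setpts only rescales the video timestamps; if the clip carried an audio track it would need its own tempo filter as well, roughly like this (file names are placeholders):
ffmpeg -i input.mp4 -vf "setpts=0.5*PTS" -af "atempo=2.0" output.mp4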
February 2: Waiting for change, waiting for opportunity
$ ./llama-tokenize -m /usr/share/ollama/.ollama/models/blobs/sha256-6150cb382311b69f09cc0f9a1b69fc029cbd742b66bb8ec531aa5ecf5c613e93 --prompt "how are you" 2>&1 | grep general.name
general.name str = DeepSeek R1 Distill Qwen 32B
print_info: general.name = DeepSeek R1 Distill Qwen 32B
February 15: Waiting for change, waiting for opportunity
February 19: Waiting for change, waiting for opportunity
On Ubuntu, Ollama saves models by default to /usr/share/ollama/.ollama/models. I had tried a symlink earlier, but that did not work. The important point is that it only worked once I also overrode the service user: sudo mkdir -p /etc/systemd/system/ollama.service.d; sudo nano /etc/systemd/system/ollama.service.d/override.conf
[Service]
User=nick
Group=nick
Environment="OLLAMA_MODELS=/home/nick/workspace/OLLAMA/.ollama/models"
sudo swapoff -a
sudo dd if=/dev/zero of=/swapfile bs=1M count=8192
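The remaining steps to actually use the new swap file would be the usual ones (a sketch; assumes the 8 GiB /swapfile created above):
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
swapon --show    # verify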
February 24: Waiting for change, waiting for opportunity
March 4: Waiting for change, waiting for opportunity
$ sudo vgdisplay
--- Volume group ---
VG Name ubuntu-vg
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 3
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 1
Max PV 0
Cur PV 1
Act PV 1
VG Size <1.82 TiB
PE Size 4.00 MiB
Total PE 476018
Alloc PE / Size 476018 / <1.82 TiB
Free PE / Size 0 / 0
VG UUID fvQTI5-0C8Q-9eVF-VkQ0-fmgh-w5XH-GewwFP
And my LV looks like this:
$ sudo lvdisplay
--- Logical volume ---
LV Path /dev/ubuntu-vg/ubuntu-lv
LV Name ubuntu-lv
VG Name ubuntu-vg
LV UUID 3LexV3-Imjh-NB0K-oGf0-gVbz-PiF8-3S8Non
LV Write Access read/write
LV Creation host, time ubuntu-server, 2025-02-22 07:55:25 +0000
LV Status available
# open 1
LV Size 100.00 GiB
Current LE 25600
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
After extending, LV Size became the maximum, 1.82 TiB. The concrete steps were:
$ sudo pvcreate /dev/sda3
$ sudo vgextend ubuntu-vg /dev/sda3
$ sudo lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv
$ sudo resize2fs /dev/ubuntu-vg/ubuntu-lv
$ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 101G 1.9M 101G 1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv 1.8T 12G 1.8T 1% /
tmpfs 504G 0 504G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda2 2.0G 246M 1.6G 14% /boot
/dev/sda1 1.1G 6.1M 1.1G 1% /boot/efi
tmpfs 101G 4.0K 101G 1% /run/user/1000
March 5: Waiting for change, waiting for opportunity
sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo usermod -a -G ollama $(whoami)
The benefit here is that the service does not run as the current user, which blocks some illegitimate access by the service. For example, let's look at the ollama user's permissions:
$ cat /etc/passwd | grep ollama
ollama:x:998:998::/usr/share/ollama:/bin/false
Its home directory is locked down and the account cannot log in; note that /bin/false here is equivalent to /sbin/nologin. As for the ollama user's home directory:
$ sudo ls -dal /usr/share/ollama/
drwxr-x--- 2 ollama ollama 4096 Mar 4 22:11 /usr/share/ollama/
It can only be accessed by users added to the group.
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"
[Install]
WantedBy=default.target
If you need to add new environment variables for startup, it is not recommended to edit the service file directly; instead use a so-called override: create /etc/systemd/system/ollama.service.d/override.conf and put them there (a sample is shown after the two commands below).
sudo systemctl daemon-reload
sudo systemctl enable ollama
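For example, the override used earlier to relocate the model directory would look roughly like this in /etc/systemd/system/ollama.service.d/override.conf (path reused from the February 19 entry; the service user must be able to read it):
[Service]
Environment="OLLAMA_MODELS=/home/nick/workspace/OLLAMA/.ollama/models"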
March 11: Waiting for change, waiting for opportunity
build: 4858 (1e2f78a0) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 42 key-value pairs and 1025 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-9801e7fce27dbf3d0bfb468b7b21f1d132131a546dfc43e50518631b8b1800a9 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = deepseek2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.size_label str = 256x20B
llama_model_loader: - kv 3: deepseek2.block_count u32 = 61
llama_model_loader: - kv 4: deepseek2.context_length u32 = 163840
llama_model_loader: - kv 5: deepseek2.embedding_length u32 = 7168
llama_model_loader: - kv 6: deepseek2.feed_forward_length u32 = 18432
llama_model_loader: - kv 7: deepseek2.attention.head_count u32 = 128
llama_model_loader: - kv 8: deepseek2.attention.head_count_kv u32 = 128
llama_model_loader: - kv 9: deepseek2.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 10: deepseek2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 11: deepseek2.expert_used_count u32 = 8
llama_model_loader: - kv 12: deepseek2.leading_dense_block_count u32 = 3
llama_model_loader: - kv 13: deepseek2.vocab_size u32 = 129280
llama_model_loader: - kv 14: deepseek2.attention.q_lora_rank u32 = 1536
llama_model_loader: - kv 15: deepseek2.attention.kv_lora_rank u32 = 512
llama_model_loader: - kv 16: deepseek2.attention.key_length u32 = 192
llama_model_loader: - kv 17: deepseek2.attention.value_length u32 = 128
llama_model_loader: - kv 18: deepseek2.expert_feed_forward_length u32 = 2048
llama_model_loader: - kv 19: deepseek2.expert_count u32 = 256
llama_model_loader: - kv 20: deepseek2.expert_shared_count u32 = 1
llama_model_loader: - kv 21: deepseek2.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 22: deepseek2.expert_weights_norm bool = true
llama_model_loader: - kv 23: deepseek2.expert_gating_func u32 = 2
llama_model_loader: - kv 24: deepseek2.rope.dimension_count u32 = 64
llama_model_loader: - kv 25: deepseek2.rope.scaling.type str = yarn
llama_model_loader: - kv 26: deepseek2.rope.scaling.factor f32 = 40.000000
llama_model_loader: - kv 27: deepseek2.rope.scaling.original_context_length u32 = 4096
llama_model_loader: - kv 28: deepseek2.rope.scaling.yarn_log_multiplier f32 = 0.100000
llama_model_loader: - kv 29: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 30: tokenizer.ggml.pre str = deepseek-v3
llama_model_loader: - kv 31: tokenizer.ggml.tokens arr[str,129280] = ["<|begin▁of▁sentence|>", "<�...
llama_model_loader: - kv 32: tokenizer.ggml.token_type arr[i32,129280] = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 33: tokenizer.ggml.merges arr[str,127741] = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
llama_model_loader: - kv 34: tokenizer.ggml.bos_token_id u32 = 0
llama_model_loader: - kv 35: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 36: tokenizer.ggml.padding_token_id u32 = 1
llama_model_loader: - kv 37: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 38: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 39: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 40: general.quantization_version u32 = 2
llama_model_loader: - kv 41: general.file_type u32 = 15
llama_model_loader: - type f32: 361 tensors
llama_model_loader: - type q4_K: 606 tensors
llama_model_loader: - type q6_K: 58 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 376.65 GiB (4.82 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 818
load: token to piece cache size = 0.8223 MB
print_info: arch = deepseek2
print_info: vocab_only = 0
print_info: n_ctx_train = 163840
print_info: n_embd = 7168
print_info: n_layer = 61
print_info: n_head = 128
print_info: n_head_kv = 128
print_info: n_rot = 64
print_info: n_swa = 0
print_info: n_embd_head_k = 192
print_info: n_embd_head_v = 128
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 24576
print_info: n_embd_v_gqa = 16384
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 18432
print_info: n_expert = 256
print_info: n_expert_used = 8
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = yarn
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 0.025
print_info: n_ctx_orig_yarn = 4096
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 671B
print_info: model params = 671.03 B
print_info: general.name = n/a
print_info: n_layer_dense_lead = 3
print_info: n_lora_q = 1536
print_info: n_lora_kv = 512
print_info: n_ff_exp = 2048
print_info: n_expert_shared = 1
print_info: expert_weights_scale = 2.5
print_info: expert_weights_norm = 1
print_info: expert_gating_func = sigmoid
print_info: rope_yarn_log_mul = 0.1000
print_info: vocab type = BPE
print_info: n_vocab = 129280
print_info: n_merges = 127741
print_info: BOS token = 0 '<|begin▁of▁sentence|>'
print_info: EOS token = 1 '<|end▁of▁sentence|>'
print_info: EOT token = 1 '<|end▁of▁sentence|>'
print_info: PAD token = 1 '<|end▁of▁sentence|>'
print_info: LF token = 201 'Ċ'
print_info: FIM PRE token = 128801 '<|fim▁begin|>'
print_info: FIM SUF token = 128800 '<|fim▁hole|>'
print_info: FIM MID token = 128802 '<|fim▁end|>'
print_info: EOG token = 1 '<|end▁of▁sentence|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
I plan to try again after adding 64 GB x 8 = 512 GB of RAM, since quantization loses precision on the one hand and still takes a lot of disk space. The log above also reveals many of DeepSeek's interesting configuration parameters.
#define GGML_ABORT(...) ggml_abort(__FILE__, __LINE__, __VA_ARGS__)
#define GGML_ASSERT(x) if (!(x)) GGML_ABORT("GGML_ASSERT(%s) failed", #x)
Don't underestimate this macro; everybody does it this way, because at the call site you can attach an extra message to the output:
GGML_ASSERT(tokenizer && "Tokenizer not initialized. Call llama_vocab::init_tokenizer() first.");
The && guarantees that the string becomes part of GGML_ABORT's output; expanded, it reads:
if (!(tokenizer && "Tokenizer not initialized. Call llama_vocab::init_tokenizer() first."))
ggml_abort("/home/nick/workspace/llama.cpp/src/llama-vocab.cpp", 2330, "GGML_ASSERT(%s) failed",
"tokenizer && \"Tokenizer not initialized. Call llama_vocab::init_tokenizer() first.\"")
Now let's look at ggml_abort, which is also the classic, standard C way of handling a variadic argument list. Note that it writes to stderr rather than stdout, which is the customary convention.
void ggml_abort(const char * file, int line, const char * fmt, ...) {
fflush(stdout);
fprintf(stderr, "%s:%d: ", file, line);
va_list args;
va_start(args, fmt);
vfprintf(stderr, fmt, args);
va_end(args);
fprintf(stderr, "\n");
ggml_print_backtrace();
abort();
}
And ggml_print_backtrace here is another classic example. I remember being very impressed more than ten years ago when I saw backtraces being printed at PayPal, though the output had to be sent to the server side, presumably because it was remote gdb debugging? In any case that was far more cumbersome.
static void ggml_print_backtrace(void) {
const char * GGML_NO_BACKTRACE = getenv("GGML_NO_BACKTRACE");
if (GGML_NO_BACKTRACE) {
return;
}
char attach[32];
snprintf(attach, sizeof(attach), "attach %d", getpid());
int pid = fork();
if (pid == 0) {
// try gdb
execlp("gdb", "gdb", "--batch",
"-ex", "set style enabled on",
"-ex", attach,
"-ex", "bt -frame-info source-and-location",
"-ex", "detach",
"-ex", "quit",
(char *) NULL);
// what follows are the fallbacks for other platforms
These are all classic techniques: first get the current process's pid with getpid(), then in the fork()ed child exec gdb and attach to that pid; here I also learned gdb's --batch mode. The one imperfection is that on Ubuntu this operation is restricted and needs sudo privileges, so I wondered how the process could obtain the needed permission. Below is the workaround Gemini explained.
The modes defined in /proc/sys/kernel/yama/ptrace_scope are:
- 0 (classic ptrace permissions): Anyone can ptrace any process, provided they have the necessary permissions (e.g., same user ID). This is the least secure setting.
- 1 (restricted ptrace): A process can only be ptraced by a parent process or a process running as root. This is the default setting in many modern Linux distributions, including Ubuntu 22.04.
- 2 (restricted ptrace admin): Only root can ptrace processes.
- 3 (no ptrace): No process can be ptraced, even by root.
So, to change it temporarily:
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
To make it permanent, add to /etc/sysctl.conf:
kernel.yama.ptrace_scope = 0
Then run sudo sysctl -p. After that the gdb-based backtrace works. An even better route is to call glibc's backtrace/backtrace_symbols functions directly: they do not involve the ptrace syscall at all, so there is no permission hassle (a small standalone example appears at the end of this entry).
What a magical pair of utility functions! The backtrace function in the C/C++ standard library (specifically, in the GNU C Library, glibc) works by inspecting the call stack of the current process. It does not use the ptrace system call. Here's how backtrace typically works:
1. Stack Frame Inspection: The function walks up the call stack by inspecting the frame pointer (or base pointer) registers. Each stack frame contains: the return address (the address of the instruction to execute after the function returns), local variables and function arguments, and a pointer to the previous stack frame. backtrace uses these frame pointers to trace the sequence of function calls that led to the current point in the program.
2. Symbol Resolution (Optional): The backtrace_symbols function (which often accompanies backtrace) takes the addresses obtained from backtrace and attempts to resolve them to function names, source file names, and line numbers. This symbol resolution typically involves reading debugging information from the executable file (e.g., DWARF information) and using symbol tables to map addresses to function names.
3. No ptrace Involvement: backtrace operates within the context of the current process. It inspects the process's own memory (the call stack). ptrace is used for debugging other processes or for controlling the execution of a child process; backtrace doesn't need to do that. Therefore, backtrace does not require any special privileges or capabilities.
Key Points: backtrace is a self-inspection mechanism. It relies on the structure of the call stack, which is a fundamental part of how programs execute. Symbol resolution (using backtrace_symbols) enhances the output of backtrace by providing human-readable information.
In summary: backtrace is a valuable tool for debugging and profiling, and it works by directly examining the process's call stack, without the need for the ptrace system call.
/* Store up to SIZE return address of the current program state in
ARRAY and return the exact number of values stored. */
extern int backtrace (void **__array, int __size) __nonnull ((1));
/* Return names of functions from the backtrace list in ARRAY in a newly
malloc()ed memory block. */
extern char **backtrace_symbols (void *const *__array, int __size)
__THROW __nonnull ((1));
Note the __nonnull macro here:
/* The nonnull function attribute marks pointer parameters that
must not be NULL. This has the name __nonnull in glibc,
and __attribute_nonnull__ in files shared with Gnulib to avoid
collision with a different __nonnull in DragonFlyBSD 5.9. */
#ifndef __attribute_nonnull__
# if __GNUC_PREREQ (3,3) || __glibc_has_attribute (__nonnull__)
# define __attribute_nonnull__(params) __attribute__ ((__nonnull__ params))
# else
# define __attribute_nonnull__(params)
# endif
#endif
#ifndef __nonnull
# define __nonnull(params) __attribute_nonnull__ (params)
#endif
The effect of this macro is a compile-time check; Gemini explains that it is a compiler extension.
So this macro lets you use 1-based parameter positions to force the compiler to check that a parameter is not null. Amazing! Learning a little each day already adds up, let alone learning this much! Gemini's breakdown:
1. __nonnull ((1)) Attribute: its purpose is to tell the compiler that the first parameter (__array) of the backtrace function must not be a null pointer. The mechanism is the __attribute__ ((__nonnull__ params)) GCC compiler extension; the (1) within __nonnull ((1)) specifies that the first parameter (parameter index 1) is the one that should not be null. If there were multiple non-null parameters, you could list their indices (e.g., __nonnull ((1, 3))).
2. Macro Expansion: __nonnull(params) simply expands to __attribute_nonnull__ (params). __attribute_nonnull__(params) checks whether the compiler supports the __nonnull__ attribute: with GCC 3.3 or later, or if the glibc header indicates __nonnull__ support, it expands to __attribute__ ((__nonnull__ params)); otherwise it expands to nothing (effectively removing the attribute).
3. Compiler's Role: when the __nonnull__ attribute is active, the compiler can generate warnings if it detects that the function is called with a null pointer as the first argument, and it might also use this information to optimize code, since it can assume that the parameter is never null.
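To make the glibc route concrete, here is a minimal, self-contained sketch of backtrace/backtrace_symbols (my own illustration, not code from llama.cpp; compile with gcc -rdynamic so the symbol names show up):
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

static void print_trace(void) {
    void *addrs[32];
    int n = backtrace(addrs, 32);                /* fill addrs with return addresses */
    char **names = backtrace_symbols(addrs, n);  /* resolve them to printable strings */
    if (names == NULL) { perror("backtrace_symbols"); return; }
    for (int i = 0; i < n; i++)
        fprintf(stderr, "%s\n", names[i]);
    free(names);                                 /* the array is one malloc'ed block */
}

static void inner(void) { print_trace(); }
static void outer(void) { inner(); }

int main(void) {
    outer();    /* expect main -> outer -> inner -> print_trace in the output */
    return 0;
}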
March 13: Waiting for change, waiting for opportunity
March 14: Waiting for change, waiting for opportunity
// Original function
void real_function() {
// ... implementation ...
}
// Weak alias
void old_function() __attribute__ ((weak, alias ("real_function")));
- In this example, old_function is a weak alias for real_function.
- If a user defines a strong symbol named old_function, that definition will be used.
- Otherwise, real_function will be used.
So this means that if I want to use this feature, I first have to move the huge model file into that mount, and it is not a simple file copy; this really is not easy! Gemini's explanation:
The Core Misunderstanding: hugetlbfs Is Not a Regular File System. This is the most crucial point. hugetlbfs is fundamentally different from regular file systems like ext4 or xfs. It's a memory-backed file system, meaning that files within it reside entirely in RAM (specifically, HugePages). You do not copy files to hugetlbfs in the same way you copy them to a disk-based file system.
The Correct Usage of hugetlbfs:
- Allocation Within hugetlbfs: the primary purpose of hugetlbfs is to provide a mechanism for applications to allocate memory directly from HugePages. You create files within hugetlbfs to reserve and manage regions of HugePages. The files themselves do not contain data in the traditional sense; they represent memory regions.
- mmap and hugetlbfs: the typical usage pattern is to create a file within hugetlbfs with a size that matches the desired HugePages allocation, then mmap that file with the MAP_HUGETLB flag. This establishes a direct mapping between the application's virtual memory and the allocated HugePages.
- Data Population: after mmap, you can then populate the mapped memory region with data. This data might come from another file, from network input, or be generated by the application. The important thing is that the data resides in RAM, specifically in the HugePages allocated through hugetlbfs.
Why Copying Fails:
- No Disk Storage: hugetlbfs does not have an underlying disk storage mechanism. When you try to copy a file, the cp command attempts to write data to disk, which is not possible.
- Memory Allocation, Not File Copying: the kernel interprets file creation and writes within hugetlbfs as memory allocation requests. If the allocation fails (e.g., due to insufficient or non-contiguous HugePages), you'll get an error.
March 15: Waiting for change, waiting for opportunity
March 16: Waiting for change, waiting for opportunity
$ ls /boot/efi/EFI/ubuntu/
BOOTX64.CSV  grub.cfg  grubx64.efi  mmx64.efi  shimx64.efi
$ cat /boot/efi/EFI/ubuntu/grub.cfg
search.fs_uuid f00dad3b-90a9-4155-aa92-0077432099f6 root hd0,gpt2
set prefix=($root)'/grub'
configfile $prefix/grub.cfg
$ blkid
/dev/mapper/ubuntu--vg-lv--0: UUID="8e86f709-3b0d-40a2-8f9f-e9878a74a1f2" BLOCK_SIZE="4096" TYPE="ext4"
/dev/mapper/ubuntu--vg-ubuntu--lv: UUID="34d3a3d3-29b1-4075-bf5e-707faa840b24" BLOCK_SIZE="4096" TYPE="ext4"
/dev/sda2: UUID="f00dad3b-90a9-4155-aa92-0077432099f6" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="723d5e62-c97d-44a0-80b7-dd18f67eab4b"
/dev/sda3: UUID="FrVw7a-ePu5-VBY8-H1MY-z2aC-3ZNO-PcRCm2" TYPE="LVM2_member" PARTUUID="df45327f-5439-4cba-8347-065ce38e7ce2"
/dev/sda1: UUID="DE16-179A" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="8f734327-9435-42a0-8d1e-e810171e6f76"
March 17: Waiting for change, waiting for opportunity
static_assert(sizeof(off_t) == 8);
Having this in the code puts my mind at ease!
tgt_fd = open(argv[2], O_CREAT|O_RDWR|O_EXCL, 0666);
...
void* ptr = mmap(NULL, tgtSize, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_HUGETLB, tgt_fd, 0);
For now I do not want to write the while loop that would make this read step robust; I will do it later (a sketch follows below). From the read(2) man page: On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. See also NOTES. On error, -1 is returned, and errno is set appropriately. In this case, it is left unspecified whether the file position (if any) changes.
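A sketch of the loop I am deferring, assuming a source descriptor src_fd (hypothetical name, opened O_RDONLY on the model file), the ptr/tgtSize from the mmap above, and that <unistd.h> and <errno.h> are included; it keeps reading until tgtSize bytes sit in the hugepage mapping, handling short reads and EINTR:
size_t done = 0;
while (done < tgtSize) {
    ssize_t n = read(src_fd, (char *)ptr + done, (size_t)(tgtSize - done));
    if (n == 0) break;                  /* unexpected end of file */
    if (n < 0) {
        if (errno == EINTR) continue;   /* interrupted by a signal: just retry */
        perror("read");
        break;
    }
    done += (size_t)n;
}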
March 18: Waiting for change, waiting for opportunity
echo "# hugecp" >> README.md
git init
git add README.md
git commit -m "first commit"
git branch -M main
git remote add origin https://github.com/nickhuang99/hugecp.git
git push -u origin main
March 19: Waiting for change, waiting for opportunity
Yes, the 671B model in your llama.cpp output is indeed quantized. Here's how we can tell:
File Type: The line print_info: file type = Q4_K - Medium indicates that the model has been quantized to the Q4_K format, specifically the Medium variant.
GIT_LFS_SKIP_SMUDGE=1 git clone git@hf.co:deepseek-ai/DeepSeek-R1
### Key time costs
1. **Model loading time (the main bottleneck)**:
  - With **4-bit quantization** (about 335 GB):
    - **NVMe SSD (3.5 GB/s)**: about **2 minutes**
    - **SATA SSD (500 MB/s)**: about **11 minutes**
    - **HDD (150 MB/s)**: about **38 minutes**
  - With the **unquantized FP16 model (1.3 TB)**:
    - Your RAM capacity (1.5 TB) is just enough, but loading takes longer
2. **Memory initialization and warm-up**:
  - About **1-5 minutes** (depending on CPU parallelism)
3. **Context initialization**:
  - About **30 seconds to 2 minutes**
### Total estimated time ranges
- **Best case** (NVMe SSD + 4-bit quantization): **3-7 minutes**
- **Typical case** (SATA SSD + 4-bit quantization): **12-18 minutes**
- **Worst case** (HDD + FP16 model): **40+ minutes**
### Optimization suggestions (a sample invocation follows this list)
1. **Force quantization**: be sure to use `--n-gpu-layers 0` to disable GPU acceleration
2. **Pre-allocate memory**: add the `--mlock` flag to lock memory
3. **Parallel loading**: set `--threads N` (N = number of physical cores)
4. **Use mmap**: enable memory mapping with `--mmap`
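A hypothetical invocation along those lines (flag names as in llama.cpp's llama-cli: --mlock locks the pages in RAM, --threads sets the CPU thread count, -ngl 0 keeps every layer on the CPU; the model path is illustrative and flag availability may vary by build):
./llama-cli -m /path/to/DeepSeek-R1-Q4_K_M.gguf --mlock --threads 40 -ngl 0 -p "how are you"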
March 20: Waiting for change, waiting for opportunity
$ sudo numastat -p 7188
Per-node process memory usage (in MBs) for PID 7188 (llama-cli)
Node 0 Node 1 Total
--------------- --------------- ---------------
Huge 386048.00 0.00 386048.00
Heap 40.40 0.00 40.40
Stack 0.04 0.00 0.04
Private 19582.44 4.27 19586.71
---------------- --------------- --------------- ---------------
Total 405670.88 4.27 405675.15
Memory is also bound per NUMA node:
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
node 0 size: 774008 MB
node 0 free: 318183 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 1 size: 774086 MB
node 1 free: 1564 MB
node distances:
node 0 1
0: 10 20
1: 20 10
But the effect was not obvious (a binding sketch follows below).
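For the record, the binding itself can be expressed with numactl, roughly like this (a sketch; the node number and model path are illustrative):
numactl --cpunodebind=0 --membind=0 ./llama-cli -m /path/to/model.gguf -p "how are you"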
void update_progress(int progress) {
int bar_length = 40; // Modify this to change the bar's length
int filled_length = (int)(bar_length * progress / 100.0);
char bar[bar_length + 1]; // +1 for the null terminator
for (int i = 0; i < bar_length; i++) {
if (i < filled_length) {
bar[i] = '=';
} else {
bar[i] = '-';
}
}
bar[bar_length] = '\0'; // Null-terminate the string
printf("\r[%s] %d%%", bar, progress);
fflush(stdout); // Ensure output is written immediately
}
Another benefit is that if I now want to bind to a NUMA node, copying directly from the existing hugetlbfs is blazingly fast.
$ ~/Downloads/hugecp /mnt/hugepages/deepseek-r1-numa-node0 /mnt/hugepages/deepseek-r1-no-numa
[===-------------------------------------] 8%
March 21: Waiting for change, waiting for opportunity
Do a "light" clone first (skipping the LFS payload), otherwise it never finishes:
GIT_LFS_SKIP_SMUDGE=1 git clone https://cnb.cool/ai-models/deepseek-ai/DeepSeek-R1-GGUF/DeepSeek-R1-Q8_0.git
cd DeepSeek-R1-Q8_0/
git lfs pull
Of course, the prerequisite is that git lfs is installed.
The original, unquantized model is at https://cnb.cool/ai-models/deepseek-ai/DeepSeek-R1.git; I want to experiment with that one as well.
A quick intuition for this formula: if the probability the model assigns to each next word is always 100%, the perplexity is 1, because the sum of the log terms in the exponent is 0. Confidence comes from the size of those probabilities; if each one is only 50%, the (base-2) sum is 1 and the perplexity is 2, so the larger the value, the less confident the model (a small worked example appears after DeepSeek's summary below). Another concept is sinusoidal position. Below is DeepSeek's explanation: Perplexity is a measurement used primarily in the fields of natural language processing (NLP) and information theory to evaluate how well a probability model predicts a sample. In simpler terms, it assesses how "perplexed" or uncertain a model is when making predictions.
Key Points:
In NLP: Perplexity is commonly used to evaluate language models. It measures how well the model predicts a sequence of words. A lower perplexity indicates that the model is more confident and accurate in its predictions, while a higher perplexity suggests greater uncertainty.
Mathematical Definition: For a language model, perplexity is defined as the exponential of the cross-entropy loss. If the model assigns probability $p(w_1, \ldots, w_N)$ to a sequence of words, the perplexity is
$\mathrm{PPL} = p(w_1, \ldots, w_N)^{-1/N} = \exp\big(-\tfrac{1}{N}\sum_{i=1}^{N} \log p(w_i \mid w_{<i})\big)$.
Here, $N$ is the number of words in the sequence.
Interpretation: A perplexity of $k$ means that the model is as confused as if it had to choose uniformly and independently among $k$ possibilities. For example, a perplexity of 1 means the model is perfectly confident (ideal case), while higher values indicate worse performance.
Applications: Perplexity is widely used to compare different language models, tune hyperparameters, and assess the quality of text generation systems.
In summary, perplexity is a useful metric for understanding the performance of probabilistic models, especially in NLP, where it quantifies how well a model predicts a given sequence of words.
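As a tiny worked example of the intuition above (using the natural-log form of the definition): if the model assigns probability 0.5 to every one of the $N$ words, then
$\mathrm{PPL} = \exp\big(-\tfrac{1}{N}\sum_{i=1}^{N}\log 0.5\big) = \exp(-\log 0.5) = 2$,
matching the claim that the model is then as confused as a uniform choice between 2 options.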
"Sinusoidal position" typically refers to a method of encoding positional information in sequences, commonly used in neural networks, particularly in transformer models like those in natural language processing (NLP). This technique is known as sinusoidal positional encoding.Key Points:
Purpose:
In models like transformers, the input data (e.g., words in a sentence) is processed in parallel rather than sequentially. This means the model doesn't inherently know the order of elements in the sequence.
Sinusoidal positional encoding is used to inject information about the position of each element in the sequence into the model, helping it understand the order of elements.
How It Works:
For each position $pos$ in the sequence and each dimension $i$ of the embedding, a unique sinusoidal function is applied:
$PE_{(pos,\,2i)} = \sin\big(pos / 10000^{2i/d_{\text{model}}}\big)$, $\quad PE_{(pos,\,2i+1)} = \cos\big(pos / 10000^{2i/d_{\text{model}}}\big)$.
Here, $d_{\text{model}}$ is the dimensionality of the embedding, and $pos$ is the position in the sequence.
This creates a unique pattern for each position, allowing the model to distinguish between different positions in the sequence.
Properties:
Uniqueness: Each position has a unique encoding.
Relative Position: The sinusoidal nature of the encoding allows the model to learn relative positions more easily, as the encoding for position $pos + k$ can be derived from the encoding for position $pos$ using trigonometric identities.
Smoothness: The encoding varies smoothly across positions, which helps the model generalize better.
Applications:
Sinusoidal positional encoding is widely used in transformer-based models like BERT, GPT, and others for tasks such as machine translation, text generation, and more.
Example:
For a sequence of words in a sentence, each word's embedding is combined with its sinusoidal positional encoding to provide the model with both the word's meaning and its position in the sentence.
In summary, sinusoidal positional encoding is a technique to provide positional information to models that process sequences in parallel, ensuring they can understand and utilize the order of elements in the sequence.