
Commit afed545

backfill oct
1 parent 3862912 commit afed545

File tree

13 files changed: +314 -0 lines changed


posts/011025.md

Lines changed: 7 additions & 0 deletions

---
title: 'b10 h100s'
tags: 'journal'
date: 'Oct 1, 2025'
---

had the opportunity to play around with b10 inference. you can tweak the min and max replicas and the concurrency targets, and h100s cost $0.133 per minute, which adds up to ~$191.5 per day per replica. if you spin up 20 of these it's ~$3,830 a day. is that expensive or not? i'm not sure. but that is vc money being put to good use for sure, especially given the use case.
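
a quick sanity check of that math (the only input is the per-minute rate; everything else is multiplication):

```py
rate_per_minute = 0.133              # $/min for one h100 replica
per_day = rate_per_minute * 60 * 24  # ≈ 191.52 $/day per replica
twenty_replicas = per_day * 20       # ≈ 3,830 $/day for the fleet
print(f"${per_day:.2f}/day per replica, ${twenty_replicas:,.0f}/day for 20")
```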

posts/021025.md

Lines changed: 41 additions & 0 deletions

---
title: 'guided decoding'
tags: 'journal, llm'
date: 'Oct 2, 2025'
---

got to finetune llama 8b and 70b and gemma models for the task.

i'm actually doing finetuning rather than just writing prompts now. it's really fun.

also looked into [guided decoding](https://guideddecoding.github.io/), which is how you can guarantee llms output valid structured data.

traditional decoding samples from the full vocab:

`p(token_t | context)` -> softmax over the entire vocab

this means the model can generate anything. sometimes you get valid json, sometimes not.

the solution: guided decoding masks invalid tokens at each step:

`p(token_t | context, constraints)` -> softmax over valid_tokens

how it works: a finite state automaton (FSA) - basically a lookup table that says, in state x, these are the valid tokens. (a toy sketch follows these steps.)

1. compile constraints -> FSA (one time, cached). your json schema becomes a state machine
2. during generation:
   - check the current FSA state
   - look up which tokens are valid
   - mask everything else
   - sample from the valid tokens only
3. after each token:
   - update the fsa state
   - repeat
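
a toy sketch of that loop, with a made-up 5-token vocab and a hand-written FSA table (real backends compile this table from your schema; this is not any particular library's implementation):

```py
import numpy as np

# toy setup: the fsa only accepts {"name":"bob"}
vocab = ['{', '"name"', ':', '"bob"', '}']
valid_tokens = {               # fsa state -> token ids allowed in that state
    0: {0},                    # must open with {
    1: {1},                    # then the key
    2: {2},                    # then :
    3: {3},                    # then the value
    4: {4},                    # then close with }
}
transitions = {(s, t): s + 1 for s, toks in valid_tokens.items() for t in toks}
DONE = 5

def constrained_step(logits, state):
    # mask everything the fsa doesn't allow in this state, then sample
    masked = np.full_like(logits, -np.inf)
    allowed = list(valid_tokens[state])
    masked[allowed] = logits[allowed]
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    token_id = int(np.random.choice(len(vocab), p=probs))
    return token_id, transitions[(state, token_id)]

state, out = 0, []
while state != DONE:
    logits = np.random.randn(len(vocab))   # stand-in for the model's logits
    token_id, state = constrained_step(logits, state)
    out.append(vocab[token_id])
print("".join(out))                        # always {"name":"bob"}
```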

vLLM supports three backends for this:

- outlines: good for regex
- lm-format-enforcer: character level
- xgrammar: optimized for nested structures

the con? the overhead: 5-15% slower generation.
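
for reference, this is roughly how you'd ask a vllm openai-compatible server for guided json. the `guided_json` extra-body field is vllm's structured-output extension; treat the exact field name and the model id here as assumptions to check against your version:

```py
from openai import OpenAI

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # local vllm server
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",    # whatever the server is serving
    messages=[{"role": "user", "content": "make up a user"}],
    extra_body={"guided_json": schema},          # constrain decoding to the schema
)
print(resp.choices[0].message.content)           # parses as the schema every time
```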

posts/031025.md

Lines changed: 9 additions & 0 deletions

---
title: 'model evals'
tags: 'journal'
date: 'Oct 3, 2025'
---

the models were finetuned today. with an h100, it only took a few hours for the 7b, and less for gemma 4b. the 70b on the other hand is a beefy one, and i ran it without fsdp or deepspeed, which would've sped up finetuning considerably but is tough to set up.
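
for context, a minimal sketch of the knob i skipped (hf trainer's fsdp flag; the arg names are from memory and the rest of the training setup is omitted, so treat this as an assumption rather than my actual config):

```py
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    fsdp="full_shard auto_wrap",   # shard params/grads/optimizer state across gpus
)
```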

with claude code, making plots and performance reports is so easy. adhoc scripts are all one-shotted.

posts/041025.md

Lines changed: 11 additions & 0 deletions

---
title: 'poetry, pyenv, direnv'
tags: 'journal'
date: 'Oct 4, 2025'
---

made prs for different parts of my code. it wasn't until i started at oe that i realized the importance of compartmentalizing parts of my code so it's easier to review.

also set up a poetry development environment with pyenv, and direnv for auto-activation. python envs are still a headache. i now just use uv. i believe uv solves everything.

went to pick up the pottery we made at mud studio. they actually turned out so well. i would love to do it again with T.

posts/051025.md

Lines changed: 9 additions & 0 deletions

---
title: 'aus -> miami'
tags: 'journal'
date: 'Oct 5, 2025'
---

flew southwest to miami. the flight was 3 hours. i sketched out all the components i needed to put everything together. i've been flying a lot lately. one of the biggest perks of working here is the amount of exposure to new things. just 2 months ago i had never been to texas or florida. now i live in austin, and i might even relocate to miami. things are moving fast, and i'm not sure i'm even keeping up with who i'm becoming.

R and i got to east miami and we went for sushi soon after. it was at a mall beside our hotel. the food was food truck quality, but it was a nice hangout with him. the hotel is nice. it has a balcony where you can look out over other fancy high rises with large balconies of their own. miami reminds me of malaysia a lot.

posts/061025.md

Lines changed: 13 additions & 0 deletions

---
title: 'miami office'
tags: 'journal'
date: 'Oct 6, 2025'
---

i got room service. an acai bowl and some sausages and a smoothie.

had a chat with e about the plan and updates. the goal was to speed things up and try to fit the 70b on one h100 instead of two. tried out online dynamic quantization with vllm (fp8). it was actually slower than before and also still required the same amount of memory. pointless.
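
for reference, turning that on is a one-liner in vllm (the model id is a placeholder; `quantization="fp8"` is the online dynamic quantization path as i understand it):

```py
from vllm import LLM, SamplingParams

# weights get quantized to fp8 at load time; no pre-quantized checkpoint needed
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",   # placeholder
    quantization="fp8",
    tensor_parallel_size=1,                      # the goal: a single h100
)
out = llm.generate(["hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```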

next explored qlora training with bitsandbytes. kept facing 0.0 grad_norm issues; tried to debug it the entire day and couldn't figure out why. a problem for another day to solve.
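
not the actual training script, but a generic sketch of the qlora setup i'd start from (the model id is a placeholder), plus the first sanity check for a 0.0 grad_norm:

```py
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",            # placeholder
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)   # re-enables input grads, upcasts norms
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# if this prints 0 trainable params, grad_norm will be 0.0 no matter what
model.print_trainable_parameters()
```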

at night all of us went to the rooftop and everyone shared their past failures and projects. i felt lucky to be here at a stage where things are more stable and growth is skyrocketing, but i also wished i had been part of the early experience, just building fun experimental features and making mistakes. i'm at a point where i feel like i'm the new guy that doesn't fit in anywhere, and it's hard to bond when i have little in common with everyone. but i'm sure i will find my place eventually. insecurity and overthinking stem from thinking too much about myself. i just need to relax and be myself.

posts/071025.md

Lines changed: 7 additions & 0 deletions

---
title: 'asking for help'
tags: 'journal'
date: 'Oct 7, 2025'
---

finding it hard to reach out to people and ask for help when everyone is busy with their own projects. also finding it hard to communicate and chat with people around me because i'm not that familiar with the language and culture yet. i can feel my brain constantly going: you should participate in this convo, but what should i say? should i say it now? if i say it, would it be weird? i'm already new, do i want this to be their impression of me? my overthinking muscle goes hyperactive and i just end up wanting to stick my head in the ground. i am an awkward boy. i'm still learning to accept that fact, and also to be more confident in taking little stabs at conversing and making jokes. it is an art. i'm practicing imitation learning 24/7.

posts/081025.md

Lines changed: 125 additions & 0 deletions

---
title: 'partial'
tags: 'python'
date: 'Oct 8, 2025'
---

understanding partial() with async/futures

## the basic idea

partial "freezes" args in a fn so you don't have to pass them every time

```py
from functools import partial

def add(a, b, c):
    return a + b + c

add_5_and_10 = partial(add, 5, 10)
add_5_and_10(3)  # returns 18 (same as add(5, 10, 3))
```

## the problem: fetching from multiple APIs

imagine you need to fetch user data from 3 different API endpoints at the same time

here's the messy way:

```py
import asyncio
from functools import partial
from concurrent.futures import ThreadPoolExecutor

def fetch_data(user_id, api_endpoint, timeout=30, retry=3, api_key="secret"):
    return f"Data from {api_endpoint} for user {user_id}"

async def get_user_data_messy(user_id):
    executor = ThreadPoolExecutor()
    loop = asyncio.get_running_loop()  # inside a coroutine, grab the running loop

    # repetition: every call spells out the same args
    future1 = loop.run_in_executor(
        executor,
        lambda: fetch_data(user_id, "profile", 30, 3, "secret")
    )
    future2 = loop.run_in_executor(
        executor,
        lambda: fetch_data(user_id, "orders", 30, 3, "secret")
    )
    future3 = loop.run_in_executor(
        executor,
        lambda: fetch_data(user_id, "reviews", 30, 3, "secret")
    )

    results = await asyncio.gather(future1, future2, future3)
    return results
```

the clean way with partial:

```py
async def get_user_data_clean(user_id):
    executor = ThreadPoolExecutor()
    loop = asyncio.get_running_loop()

    # lock in the common args once
    fetcher = partial(
        fetch_data,
        user_id=user_id,
        timeout=30,
        retry=3,
        api_key="secret"
    )

    endpoints = ["profile", "orders", "reviews"]

    futures = [
        loop.run_in_executor(executor, partial(fetcher, api_endpoint=ep))
        for ep in endpoints
    ]

    results = await asyncio.gather(*futures)
    return results
```

## why the double partial

```py
loop.run_in_executor(executor, partial(fetcher, api_endpoint=ep))
```

here's what's actually happening:

```py
# first partial: lock in the common stuff
fetcher = partial(fetch_data, user_id=user_id, timeout=30, retry=3, api_key="secret")

# second partial: add the specific endpoint
profile_fetcher = partial(fetcher, api_endpoint="profile")

# now profile_fetcher() is a zero-argument callable
# calling it is the same as: fetch_data(user_id, "profile", 30, 3, "secret")
```

## seeing it run

```py
import time

def fetch_data(user_id, api_endpoint, timeout=30, retry=3, api_key="secret"):
    time.sleep(1)  # pretend this is an API call
    return f"Data from {api_endpoint} for user {user_id}"

async def main():
    start = time.time()
    results = await get_user_data_clean(12345)
    print(f"completed in {time.time() - start:.2f}s")
    print(results)
    # completed in 1.01s (all 3 APIs ran at the same time)
    # ['Data from profile for user 12345',
    #  'Data from orders for user 12345',
    #  'Data from reviews for user 12345']

asyncio.run(main())
```

posts/091025.md

Lines changed: 7 additions & 0 deletions

---
title: 'mia -> sf'
tags: 'journal'
date: 'Oct 9, 2025'
---

flight in the early evening. i went walking around the hotel and got an acai bowl for breakfast. i've found i like eating these, though the sugar content is worrying. i checked out, left my bag, and went to work out of the capital one cafe. then went to a slop bowl restaurant for lunch. it started drizzling lightly, then it came all at once. just like malaysia. i decided to leave for the airport early, and the ride turned out to be an hour long instead of 20 minutes. i had a premonition perhaps. upon arrival and entering security, i fell victim to an incredibly annoying flaw of the miami airport – the checkpoints are segmented by concourse, which means if you went through security for concourse G, you can't enter concourse H. this was my first time having my bags and body checked twice before i finally got to the right gate. luckily i still had time to get food for my 6 hour and 30 min journey to sf. arriving at W's house after the grueling flight, words came pouring out of my mouth. i trauma dumped for an hour or two. all the emotions and feelings pent up inside while i worked and worked.

posts/101025.md

Lines changed: 48 additions & 0 deletions

---
title: 'performance client'
tags: 'rust, python, sf'
date: 'Oct 10, 2025'
---

learned about batch calls using the b10 [performance client](https://github.com/basetenlabs/truss/tree/main/baseten-performance-client) today

the problem it solves: even with async, you're bottlenecked by

- python's GIL (no true parallelism)
- no smart batching
- no request hedging (p99 latency kills you)

what is request hedging?

imagine you send requests, and 99% of them come back in 100ms, but 1% take 5s due to the network or a slow replica

request hedging is: after X ms, send a duplicate request; whichever finishes first wins, and the slow one gets cancelled

it's like calling an uber, a waymo, and a lyft, and whichever arrives first, you get on; the rest you cancel. (wouldn't that be a great app)

the catch: it costs extra requests. you can cap this at a budget with b10
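
a toy illustration of the idea in plain asyncio (not how the client implements it; `call_model` and `payload` in the usage line are stand-ins for whatever request you're hedging):

```py
import asyncio

async def hedged(make_request, hedge_delay=0.5):
    """fire a backup request after hedge_delay; first to finish wins, the loser is cancelled"""
    first = asyncio.create_task(make_request())
    try:
        # if the first attempt beats the hedge delay, we're done
        return await asyncio.wait_for(asyncio.shield(first), timeout=hedge_delay)
    except asyncio.TimeoutError:
        second = asyncio.create_task(make_request())
        done, pending = await asyncio.wait({first, second}, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        return done.pop().result()

# usage: result = await hedged(lambda: call_model(payload), hedge_delay=0.5)
```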

```py
from baseten_performance_client import PerformanceClient

client = PerformanceClient(
    base_url="https://api.openai.com",
    api_key="your-key"
)

texts = ["doc " + str(i) for i in range(100000)]

response = client.embed(
    input=texts,
    model="text-embedding-3-small",
    batch_size=128,                  # pack by count
    max_chars_per_request=50000,     # or by chars (whichever limit hits first)
    max_concurrent_requests=256,
    hedge_delay=0.5                  # send a duplicate after 0.5s
)
```

---

went to the ferry building and tried lunette, the cambodian restaurant. the pork noodle soup was decent, esp for the $28 price tag. i had high expectations after watching that yt video, but i cannot trust youtubers. went to the main library and picked up two books from the bookstore, then worked out of there for a few hours. walked to ikea to get some meatballs. then worked out of saluhall, a modern food hall with tasteful decor and lights. i sat there getting more work done before i rushed to pick up the chicken rice i ordered from Gai and Rice and to catch my waymo home.
