📅 Mon, 4 May, 2026
AiFeed24

AI & Tech News


Topic

#llm

134 articles found

· 3 days ago· Dev.to

Engineering LLMOps: Building Robust CI/CD Pipelines for LLM Applications on Google Cloud

The transition of Large Language Models (LLMs) from experimental notebooks to production-grade applications requires more than just a well-crafted prompt. As enterprises integrate Generative AI into their core workflows, the need for stability, scalability, and reproducibility becomes paramount. This…

#cloud#dev.to
· 3 days ago· Dev.to

ccgate: Delegate Claude/Codex permission prompts to an LLM (~97% automated for me)

TL;DR ccgate is an OSS CLI that delegates permission prompts in Claude Code and Codex CLI to a separate LLM (Haiku by default). Outcomes are allow / deny / fallthrough. deny returns a reason the agent can act on. Genuinely ambiguous calls bubble back to the user. In my own usage, ~97% of permission…

#cloud#dev.to
· 3 days ago· Dev.to

LLM on EKS: Serving with vLLM

Last year, I mentioned that I'm interested in learning how to serve LLMs in production. At first it was just curiosity, but over time I wanted to actually try building something, not just reading about it. This post is a small step in that direction: serving an LLM using vLLM, deployed on Amazon EKS…

#cloud#dev.to
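For readers curious what "serving with vLLM" boils down to before the EKS layer is added: the engine ships an OpenAI-compatible HTTP server. A minimal local sketch (the model name is an illustrative placeholder, not one from the post):

```shell
# Install the vLLM inference engine (most models need a CUDA-capable GPU)
pip install vllm

# Start an OpenAI-compatible server on http://localhost:8000/v1
vllm serve Qwen/Qwen2.5-0.5B-Instruct

# Query it with any OpenAI-style client
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "messages": [{"role": "user", "content": "hello"}]}'
```

On EKS, the same `vllm serve` process typically runs inside a GPU-backed pod behind a Kubernetes Service.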
· 3 days ago· XDA Developers

I replaced ChatGPT and Claude with this powerful local LLM and saved over $20 a month while gaining full control

Qwen3.6 runs on my old GPU and does what ChatGPT does for free

#mobile#xda-developers
· 3 days ago· MIT Technology Review

The Download: a new Christian phone network, and debugging LLMs

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. A new US phone network for Christians aims to block porn and gender-related content. A new US-wide cell phone network marketed to Christians is set to launch next…

#ai#mit-technology-review
· 3 days ago· MediaNama

LLM-based recommenders, ads, and AI agents: Inside Meta’s growth strategy in Q1 FY26 earnings call

Meta expands AI training with richer user data, boosting Reels engagement and video time. LLM-driven recommendations, Muse models, and AI agents power ads, content discovery, and future monetization.

#india#medianama
· 4 days ago· Dev.to

Building an AI Agent Harness from Scratch: The Architecture Between LLM and Agent

Everyone talks about the model. Nobody talks about the harness. Give Claude Sonnet or GPT-4o a chat interface and you get a conversational AI. Wrap it in a loop that can call external tools, maintain state across turns, enforce budget limits, and validate its own outputs — and you get an agent. The…

#cloud#dev.to
· 4 days ago· Dev.to

LLM needs to generate/search in order to compare. Silently answer my question is a lie.

An LLM works by generating words out of nothing. If the context window is empty, it can't guide you to something it doesn't have, so it needs something prior in the context window. That makes sense: you need something to compare against, and you can't compare nothing to something. So our goal is to use the LLM to learn…

#cloud#dev.to
· 4 days ago· Dev.to

AI Agent Orchestration & Applied LLMs: Code Search, Workflow Optimization, Document Processing

Today's top stories highlight practical advancements in AI agent orchestration and applied LLM capabilities for real-world workflows. We feature innovations in efficient code search for…

#cloud#dev.to
· 4 days ago· Dev.to

BitForge: Run LLMs on Microcontrollers

I got GPT-2 running on an Arduino! Here's the quantization pipeline. Process: Q4_K_M quantization via llama.cpp; memory-mapped flash for weight storage; optimized matvec for ARM Cortex-M; KV cache quantization. Results: Arduino Nano 33 BLE: 3 tokens/sec; ESP32-S3: 15 tokens/sec; Raspberry Pi Pico: 8 tokens/sec…

#cloud#dev.to
· 4 days ago· Dev.to

KVQuant: Run 70B LLMs on 8GB RAM with 4-bit KV Cache Quantization

I compressed GPT-2 to run on an Arduino! Here's how I did it with KVQuant. The Problem: LLMs need huge memory for key-value caches during inference. The Solution: 4-bit KV cache quantization that reduces memory 4x with <1% accuracy loss. Results: GPT-2: 512MB → 128MB (4x reduction); LLaMA-7B: 8GB → 2GB…

#cloud#dev.to
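The 4x figure in the teaser above follows directly from the standard KV-cache size formula: 2 tensors (keys and values) × layers × KV heads × head dim × sequence length × bytes per element, so dropping fp16 entries to 4 bits shrinks the cache by exactly 4x. A back-of-the-envelope sketch (GPT-2-small shapes are illustrative assumptions, not KVQuant's actual code or the post's exact numbers):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bits_per_elem: int) -> int:
    """Memory held by the K and V caches during autoregressive decoding."""
    # 2 tensors (keys and values), one head_dim vector per head per position
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits_per_elem // 8

# GPT-2 small: 12 layers, 12 attention heads, head_dim 64, 1024-token context
fp16 = kv_cache_bytes(12, 12, 64, 1024, 16)
int4 = kv_cache_bytes(12, 12, 64, 1024, 4)
print(f"fp16: {fp16 / 2**20:.0f} MiB, int4: {int4 / 2**20:.0f} MiB, ratio: {fp16 // int4}x")
# → fp16: 36 MiB, int4: 9 MiB, ratio: 4x
```

The same formula explains why long contexts, not weights, dominate: the cache grows linearly with `seq_len` while the weights stay fixed.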
· 4 days ago· Dev.to

React won't die because LLMs won't let it

Every framework war comes to an end. But what if the winner of that war is the one who determines how new code is created going forward? I can't stop pondering why React won't die. Not in the "it's still sufficient" kind of way, but in the "it might be structurally impossible for it to lose" kind of way…

#cloud#dev.to
· 4 days ago· MIT Technology Review

This startup’s new mechanistic interpretability tool lets you debug LLMs

The San Francisco–based startup Goodfire just released a new tool, called Silico, that lets researchers and engineers peer inside an AI model and adjust its parameters—the settings that determine a model’s behavior—during training. This could give model makers more fine-grained control over how this…

#ai#mit-technology-review
· 4 days ago· XDA Developers

I've been running some of the biggest open-weight LLMs for free on Nvidia's cloud

You can use big models for free, though there aren't any promises on speed.

#mobile#xda-developers
· 4 days ago· Dev.to

KVQuant: Run 70B LLMs on 8GB RAM with Real-Time KV Cache Compression

I built KVQuant because I wanted to run 70B-parameter models on my gaming laptop. The problem? Even with 4-bit quantization, a 128K context window needs 256GB RAM just for the KV cache. When you run an LLM, the memory bottleneck is not the model weights; it is the KV cache…

#cloud#dev.to
· 5 days ago· Dev.to

Lemonade v10.3: Run Local LLMs, Image Gen, and Speech on Your Own GPU for Free

If you are building AI-powered apps and feeling the cost of cloud API bills — or the anxiety of sending user data off-device — Lemonade is worth your time. Lemonade is an open-source local AI server (3.7k stars, sponsored by AMD) that runs LLMs, image generation, speech-to-text, and text-to-speech…

#cloud#dev.to
· 5 days ago· Dev.to

Retrieval Augmented Localization Cuts LLM Terminology Errors 17-45%

Production localization translates isolated paragraphs and strings. A CI/CD pipeline diffs against the previous version and retranslates what changed — a UI string, a tooltip, a modified paragraph. Each request arrives at the LLM in isolation — without the surrounding page, without the document's full…

#cloud#dev.to
· 5 days ago· SecurityWeek

Fresh LiteLLM Vulnerability Exploited Shortly After Disclosure

The vulnerability allows attackers to read data from a LiteLLM proxy’s database and potentially modify it.

#security#securityweek
· 6 days ago· Dev.to

Postmortem: How a Biased LLM Introduced Discriminatory Code in Our Hiring Platform

In Q3 2024, our hiring platform’s automated resume screener rejected 37% more female candidates for backend engineering roles than male candidates with identical qualifications. The root cause? A biased LLM-generated regex we shipped to production in a 10-minute rush deploy.

#cloud#dev.to
· 6 days ago· Dev.to

Claude Code with Local LLMs and ANTHROPIC_BASE_URL: Ollama, LM Studio, llama.cpp, vLLM

Native Anthropic endpoints, tool-call compatibility, and context-window sizing for local Claude Code. Last tested: April 2026; see Changelog at the bottom. Goal: use a MacBook Air (Gemma 4 26B-A4B Q4, 32K context, LM Studio or Ollama) or a MacBook Pro (Gemma 4 26B-A4B Q4 / UD-Q4, 64K context, llama.cpp or LM Studio)…

#cloud#dev.to
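For readers unfamiliar with the ANTHROPIC_BASE_URL approach the post describes: Claude Code reads its backend from environment variables, so pointing it at a local server comes down to a few exports. A sketch, with the port and model id as placeholders for your own setup (note that some local servers expose only OpenAI-style endpoints, in which case a translation proxy that speaks Anthropic's Messages API is needed in between):

```shell
# Point Claude Code at a local Anthropic-compatible endpoint (port is a placeholder)
export ANTHROPIC_BASE_URL="http://localhost:8080"

# Local servers generally ignore the key, but the client expects one to be set
export ANTHROPIC_AUTH_TOKEN="local-dummy-key"

# Placeholder model id: whatever your llama.cpp / LM Studio / Ollama server exposes
export ANTHROPIC_MODEL="gemma-4-26b-a4b-q4"

claude   # starts Claude Code against the local backend
```

Unsetting the three variables (or opening a fresh shell) returns Claude Code to Anthropic's hosted API.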
Page 2 of 7

AiFeed24

India's AI-powered technology news platform. Curated from 60+ trusted sources, updated every hour.


© 2026 AiFeed24. All rights reserved.
