Topic

#llm

136 articles found

We ran Qwen3.6-27B on $800 of consumer GPUs, day one: llama.cpp vs vLLM

Originally published at llmkube.com/blog/qwen3-6-27b-bakeoff. Cross-posted here for the dev.to audience. A Kubernetes-native bake-off on 2× RTX 5060 Ti, with reproducible manifests and a cost-per-token number neither cloud nor OSS FinOps tools will tell you. This is a runtime comparison, not a model

#cloud #dev.to

· 11 days ago· Dev.to

Stop Configuring the Same LLMs Over and Over: Introducing LLMC

As I dive deeper into the world of LLMs and AI Agents, I found myself trapped in a tedious loop: every time I tried a new tool, I spent an hour repeating the same setup process. I'd find the models that actually work for my workflow, only to manually copy those configurations into every new agent I

#cloud #dev.to

· 11 days ago· Dev.to

LLM OCR Benchmarks, Claude Code Context Issues, & Cloud GPU Pricing Tool

LLM OCR Benchmarks, Claude Code Context Issues, & Cloud GPU Pricing Tool Today's Highlights Today's highlights include an open-source framework benchmarking LLMs for OCR, revealing cost-saving potential with older models. Additionally, deep technical issues with Claude Code's context management have

#cloud #dev.to

· 11 days ago· Gizmodo

‘Fullmetal Alchemist’ Is the Greatest Anime of All Time

In an era overflowing with new anime, no series has managed to outrun the shadow of 'Fullmetal Alchemist: Brotherhood,' the show that defined what 'peak' actually means.

#technology #gizmodo

· 11 days ago· XDA Developers

Local LLMs changed how I use Home Assistant, and now my smart devices actually listen

Local LLMs made my Home Assistant setup far more responsive than any app or integration

#mobile #xda-developers

· 11 days ago· Decrypt

Tencent's New Hy3 AI Model Is the Most Efficient Chinese LLM No One's Talking About

Tencent just open-sourced Hy3 preview, a new model that punches above its weight on coding agents, reasoning, and search—built in under three months.

#crypto #decrypt

· 11 days ago· Dev.to

Building an LLM Tool Calling Workflow with DigitalOcean and Connected Databases

This article was originally written by Shamim Raashid (Senior Solutions Architect) and Anish Singh Walia (Senior Technical Content Strategist) Intent-driven data interfaces give users flexible access to data through natural language, while your application keeps strict control over queries. The guar

#cloud #dev.to

· 11 days ago· XDA Developers

I built a local LLM server I can access from anywhere, and it uses a Raspberry Pi

It may not replace ChatGPT, but it's good enough for edge projects

#mobile #xda-developers

· 11 days ago· Towards Data Science

Using a Local LLM as a Zero-Shot Classifier

A practical pipeline for classifying messy free-text data into meaningful categories using a locally hosted LLM, no labeled training data required. The post Using a Local LLM as a Zero-Shot Classifier appeared first on Towards Data Science.

#ai #towards-data-science

· 12 days ago· Dev.to

I Got Tired of Re-explaining My Codebase to Claude Code Every Session. So I Built llmwiki.

The context-switch tax If you juggle multiple projects across different clients and stacks, you know the tax. You switch from the billing API to the notification service and spend 20 minutes re-reading code just to remember how it's wired together. You onboard onto a new client codebase and the arch

#cloud #dev.to

· 12 days ago· Dev.to

CrabTrap: I Put an LLM-as-a-Judge Proxy in Front of My Production Agent and Here's What Happened

CrabTrap: I Put an LLM-as-a-Judge Proxy in Front of My Production Agent and Here's What Happened I was staring at my agent logs at 10pm when I saw a response that made my stomach drop: the model had returned a code block with rm -rf wrapped in markdown. It wasn't malicious — it was a directory clean

#cloud #dev.to

· 12 days ago· Dev.to

Beyond the "Brute Force Beauty": A Modular, Brain-Inspired LLM Architecture (Thoughts on grand models: Part 3)

Beyond “Violent Aesthetics”: A Self-Corrected Modular, Brain-Inspired LLM Architecture Preface Through repeated debates with peers and AI assistants, I gradually realized that my original idea confused hypotheses with established facts in neuroscience, and analogies with implementable solutions. How

#cloud #dev.to

· 12 days ago· DeepLearning.AI Updates

Seeking arXiv cs.CL endorsement, local LLM clinical NLP benchmark (Ollama, 5 models)

Hey, I am an independent researcher looking for a cs.CL endorsement for my first arXiv paper. What I did: Ran 5 open-weight models locally via Ollama (Q4_K_M) on an L40S — Phi-3.5-mini, Mistral-7B, BioMistral-7B, Llama-3.1-8B, and Llama-3.3-70B, across 4 different FHIR serialisation strategies for m

#ai #deeplearning.ai-updates

· 13 days ago· Dev.to

I Built a Swarm Agent RAG System Inspired by Karpathy's LLM Wiki

Most RAG systems use a single retriever to search a vector database. It works — until your knowledge base has code, images, tables, and text all mixed together. One retriever can't specialize in all of them. So I built rag-swarm — a multimodal RAG system where specialized swarm agents search in para

#cloud #dev.to

· 13 days ago· DeepLearning.AI Updates

What's the best LLM out there right now?

We have all moved on from the early magic of ChatGPT as the LLM space has many players like Claude, Grok and Gemini today. Which one is your favorite and why? 1 post - 1 participant Read full topic

#ai #deeplearning.ai-updates

· 13 days ago· XDA Developers

I’d do these 5 things differently if I started self-hosting LLMs today

From trial-and-error to a cleaner local AI workflow.

#mobile #xda-developers

· 13 days ago· XDA Developers

I turned my phone into a local LLM server, and it handles vision, voice, and tool calls

Local LLMs have come so far that you can now run one on your phone.

#mobile #xda-developers

· 13 days ago· Dev.to

Your LLM call isn't atomic, it's a conversation paused mid-sentence

Your LLM call isn't atomic, it's a conversation paused mid-sentence. https://lnkd.in/dbzAAjN2 ‎ ‎ If you're still grep-and-guessing through prompts, this can save you the month I lost.

#cloud #dev.to

· 13 days ago· Cloudflare Blog

Unweight: how we compressed an LLM 22% without sacrificing quality

Running LLMs across Cloudflare’s network requires us to be smarter and more efficient about GPU memory bandwidth. That’s why we developed Unweight, a lossless inference-time compression system that achieves up to a 22% model footprint reduction, so that we can deliver faster and cheaper inference th

#cloud #cloudflare-blog

· 13 days ago· InfoQ

Presentation: Dynamic Moments: Weaving LLMs into Deep Personalization at DoorDash

Sudeep Das and Pradeep Muthukrishnan explain the shift from static merchandising to dynamic, moment-aware personalization at DoorDash. They share how LLMs generate natural-language "consumer profiles" and content blueprints, while traditional deep learning handles last-mile ranking. This hybrid appr

#cloud #infoq

← PreviousPage 5 of 7Next →