Never Hit Your Claude Usage Limit Again (save money)
📄 Summary
The video "Never Hit Your Claude Usage Limit Again (save money)" has no description attached. Please open the video directly on YouTube for more information.
📝 Transcript
Let's be honest with each other. Claude usage limits absolutely suck. I'm sure you know the feeling. You're deep in a Claude prompting session and all of a sudden, bang, you've hit your usage limit. And it doesn't matter if you're on the $20 plan, the $100 plan, or even the $200 plan, which I'm on; it still seems to be an issue. Up until three weeks ago, I was still running into rate limits on the $200 plan, and this was happening just after an hour-long session of Claude Code. So I started questioning why this was actually happening, and I started digging deep into how Claude actually works. I worked out that I was using Claude wrong. Since then, for the last three weeks, I haven't hit a single Claude usage limit, and I literally use Claude six hours a day.

So in today's video, I'm going to give you my full framework to never hit your Claude usage limits again. After watching this video, you're going to be able to extract the maximum value out of your Claude subscription, and some of these tricks I haven't heard a single other creator talk about yet.

If you don't know who I am, I'm Mars Deutsche. I'm 25 years old and I've been using ChatGPT since it first came out three years ago. Since then, I've been going deep down the AI rabbit hole, and I've used AI to help power my agency business to over $20 million in revenue and my digital product business to over $5 million in profit. As an AI enthusiast, I created this channel to share my learnings with you, so hopefully you can also use AI to achieve your goals.

All right, the first step to never hitting your Claude limits again is all in your workflow. A really simple thing you can fix is being very intentional with how you prompt Claude. Most people figure out what they want while talking to Claude, and although it's a great brainstorming partner, don't use the most powerful model, like Opus 4.7, for brainstorming. Really think about your prompt before you use Claude.
Or if you have to brainstorm with Claude, simply use a cheaper model, which we'll go through later in this video. Model selection is a huge trick that most people are missing, and if you're just using the default model, that's probably one of your biggest issues.

The next thing you need to understand is how Claude actually works. Certain tasks take up more tokens. For example, if I just chat to Claude ("Hey Claude, how are you?") and go back and forth with questions, that actually isn't using up many tokens. Where you start running into token issues is whenever Claude has to build something. If you're creating a drawing, coding, or creating an artifact or a dashboard, that's what burns through tokens. So before building something, you're actually better off doing more planning: going back and forth to make sure the build is going to be right. Let's say you're building a finance dashboard. Go back and forth to make sure the written architecture is fully correct before you physically build it, because having to rebuild the dashboard will chew up far more tokens than simply talking about building it. That was a major learning for me. Now, whenever I go to build something, I always plan beforehand.

If you're a coder, you can go into Claude Code specifically and enable plan mode. By default, this writes no code and only focuses on planning. So if you're building things in Claude Code, going into plan mode first and doing all of your planning before you actually code is going to save you a lot of tokens. That simple mindset shift, just from understanding how tokens are actually burnt, makes a big difference.

All right, now let's get into one of the major needle movers: chat length. Long chats are a silent killer. If you use a single chat and keep talking to it, you have to understand how Claude works.
It keeps having to go back through thousands of prior messages to gather the necessary context for giving you a response. So what I prefer to do is set up projects. I have one for this YouTube channel, AI Edge, for example, and every time I start a new task, like a video analysis or a new video script, I open up a brand-new chat. This way, Claude uses fewer tokens trying to understand the context.

The other really powerful thing you can do here is put your memory and instructions into your project folders. Tell Claude exactly what the purpose of these chats is. For example, for AI Edge, it has a memory of exactly what AI Edge is, my goals with the business, and my goals with the channel. Then, in the instructions, I tell Claude exactly what I want from it and how I want it to format its answers.

Here's a trick if you're really trying to extract the maximum value from each plan: in your instructions, you can write in something like "Be cognizant of response length. Try to speak in a concise manner. Try to act in an efficient manner." If you just write that into your instructions, your responses by default, whenever you're in that project, will actually be shorter, thus saving you tokens.

Oh, and by the way, so you remember all of this going forward, there'll be a free link in the description below containing all of the tips from this video in a single PDF that you can then actually put into Claude to help it configure your chats. You simply claim it by going to the link in the description and signing up for the AI newsletter, and you'll unlock the Instagram where the assets can be found inside the Google Drive.

So, to summarize the chat setup, I'll say it like this: three highly focused chats beat one long chat every day of the week if you set up your memory correctly. I've spoken about the memory system for the project folders themselves if you're just using the chats.
Something you may want to consider, which I've covered in my second brain video, is setting up custom memory, because you have to understand that if you're just using the chats, you don't have full control over which memory Claude saves. It's essentially a black box. Yes, it is technically updating the memory, but you don't know exactly what it's updating or how.

The best workaround is to create a local folder on your computer, like an "AI brain" folder, which stores all of your local memory. You want to configure the folder something like this: have an instructions.md file that contains the instructions for Claude to follow. You'll notice there's a very important line here: "Whenever new information surfaces during a session, the relevant MD file must be updated. Don't let facts slip through." This is so important because the instructions are what get the memory to auto-update; without this, Claude is very random in terms of what it actually adds to the memory file. If you have a line like this in your instructions, it means memory is constantly updating. Don't worry, because in the free guide below, I'm going to give you the exact prompt to set up your memory and folder system in the correct way. And if you want to watch the dedicated video on that afterwards, I've got a video on my second brain guide, which also connects this to Obsidian. But for now, all you need to know is that you need an instructions file and a memory.md file.

Then, if you go back into Claude and utilize Co-work, the cool thing about Co-work is that you can work inside a specific folder. So I can choose my AI brain folder, and every single time I type a prompt connected to that folder in Co-work, it's going to tap into the instructions and all of the memory specific to that folder. And the great thing about this is that you actually own your own memory folder. It's not in a Claude black box.
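To make the folder setup described above concrete, here is a minimal sketch that scaffolds such an "AI brain" folder. The two file names (instructions.md and memory.md) come from the video; everything in the file bodies beyond the quoted auto-update line is an illustrative assumption, not the creator's exact prompt.

```python
from pathlib import Path

# Default location for the local memory folder (an assumption;
# put it wherever you keep your notes).
BRAIN = Path.home() / "ai-brain"

INSTRUCTIONS = """\
# Instructions
- Whenever new information surfaces during a session, the relevant
  MD file must be updated. Don't let facts slip through.
- Be cognizant of response length; answer concisely and efficiently.
"""

MEMORY = """\
# Memory
(Durable facts about my projects and goals get appended here.)
"""

def scaffold(root: Path = BRAIN) -> Path:
    """Create the folder with its instructions and memory files."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "instructions.md").write_text(INSTRUCTIONS)
    (root / "memory.md").write_text(MEMORY)
    return root
```

Pointing Co-work (or any tool that can read a local folder) at this directory then gives every session the same instructions and memory, and the folder stays yours if you ever switch tools.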
So if you ever want to switch models in the future, or if you want to plug it into another tool, you actually own your memory. I personally think memory sovereignty, owning your own memory, is the most important thing in AI, because memory is transportable. If you have your memory, you're going to be a very effective AI user. The people I find who aren't very effective with AI don't own their own memory and don't have systems in place to update it. The good thing is it's super simple, and I've actually developed a prompt for it to save you time.

All right, that was layer one, your workflow. Let's talk about layer two, which is just so important and I don't see enough people doing it: model stacking. If you are using Opus 4.7, the most expensive model, to ask questions, do basic scraping, and do basic research, you're just burning through tokens. There are other models that are 90% as capable on these basic tasks that you can default to, and then only use Opus 4.7 for the work where you really need a smart model to shine. I'll show you what I mean.

If you go to create a new chat in Claude, you can select your model. Opus 4.7 is clearly the smartest, so if I were making a big business decision, coding a financial dashboard, or analyzing really important data, then yes, of course I would still use it, with the framework I showed you in step one. However, the other models are still pretty good, specifically Sonnet. Sonnet is great for most real work. If you're analyzing a PDF, updating some text, or doing a bit of writing, Sonnet, in my opinion, is fantastic. And Haiku is still pretty capable for quick tasks like extracting data and scraping. There are probably people watching this video who use OpenClaw. I do too. For certain workflows, Haiku is a really quick and easy way to scrape data from the internet.
And the way I do it is that I use the cheaper models for the scraping and the repetitive tasks, and then I use Opus 4.7 as my curator: the model that curates all of the research from these other models. If you're not using OpenClaw, you can still implement this strategy in the Claude desktop application itself, simply by being cognizant of what model you're using: Sonnet 4.6 if you think the task is less important or slightly easier, then manually switching to Opus 4.7 when the task is more advanced.

The same goes for Claude Code. If you're a coder, this is something to be very cognizant of. There's actually an effort level you can select. If you're on extra high or max, you're going to chew through tokens. Yes, you'll get a better result, but let's say you're only using Claude Code to update local folders; you're not going to need max effort. You could get away with medium, or even low in some cases. I would only use extra-high or max effort for a task where you really need precision in the output. And Claude Code, just like the chats, allows you to select the model. Now, I'm not going to lie, Opus 4.7 is way better than the other models, but if you need something quickly vibe-coded, like a little dashboard, an interface, a web portal, or something relatively easy, you don't need to use it. I would prioritize it for the more advanced use cases.

And here's the reality as well: you don't just need to use Claude. I use Qwen and Kimi in my OpenClaw. I use Grok for real-time news, which is built into my X subscription. I use Gemini as well, which is built into my Google subscription. So just be cognizant of which model you're using. If you're really running into issues, and let's say you're on one of the lower plans, just make sure you use Claude for the work where you need it. Understand where Claude shines.
Claude really shines when it comes to interactive dashboards and coding. But if you just want a little bit of news, you can use Grok or another cheaper model. You can even use GPT for voice prompting and brainstorming; I do that all the time. Then I use Claude as my curator, because I know what it's good at. So just understand the strengths of each model and combine multiple models in your workflow. I actually did a tools video where I break down my use cases for my top seven AI models and the strengths of each one. I feel like that's also one of the reasons I was able to reduce my token expenditure: I've started leaning on other models for the things where they genuinely shine. Because although Claude is great, other models actually do some things better. So keep that in mind next time you're doing a task.

All right, now you understand the second step, which is selecting the right model for your use case. Let's talk about something underrated: tool splitting. The important thing to understand is that although Claude is one application, it's not just one product. You have the chat, you have Co-work, and you have Claude Code, and you need to treat each individually. You even have extra products now, like Claude design, with its own separate usage limit. Claude treats the token limits for Claude Code and the Claude chat as the same, with design being separate. But this is where you need to understand who you are as a user, because if you are a heavy Claude Code user and you're chewing through your overall plan with Claude Code, what you can actually do is hook up the API to Claude Code. It'll run separately through the API. You need to go to the API section of Anthropic's website to create a key, and then this will be separate from your chat. If you're a normal user who only uses Claude Code sometimes, you can get away with having it under your main plan.
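The model-stacking idea from the section above can be sketched as a tiny router: cheap tiers by default, the top tier only when the task warrants it. The tier split (Haiku / Sonnet / Opus) follows the video; the keyword heuristic and the `pick_model` helper are purely hypothetical and would need tuning to your own workload.

```python
# Toy "model stacking" router: default to cheap models, escalate
# only when keywords suggest precision work. Tier names are labels,
# not real API model identifiers.
CHEAP, MID, TOP = "haiku", "sonnet", "opus"

# Illustrative keyword sets (assumptions, not a vetted taxonomy).
ESCALATE = {"architecture", "finance", "refactor", "dashboard", "decision"}
QUICK = {"scrape", "extract", "summarize", "list"}

def pick_model(task: str) -> str:
    """Choose a model tier from the words in a task description."""
    words = set(task.lower().split())
    if words & ESCALATE:
        return TOP    # precision work: strongest model as "curator"
    if words & QUICK:
        return CHEAP  # repetitive scraping/extraction
    return MID        # everyday default for most real work
```

For example, `pick_model("scrape product prices")` returns the cheap tier, while `pick_model("design the finance dashboard architecture")` escalates to the top tier. The same triage works manually in the desktop app: ask the routing question yourself before each prompt.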
The only thing I'd urge is that you actually go into claude.ai/settings/usage and look at where your usage is, so you're aware of where your limits are. For example, right now I've used up 12% of all models, and it resets on Friday at 9:00 a.m., so I'm all good. But I've actually run out of Claude design, because I was going crazy on that yesterday, and that's a completely separate topic because Claude design is so usage-intensive. Now, if I want more tokens for Claude design, I need to go in and purchase what they call pay-as-you-go usage credits.

This is also a strategy in itself: instead of having to upgrade your entire plan, if you notice you're about to hit your limits in, say, 24 hours and you just need a few more prompts, you can just buy more usage credits rather than upgrading and spending the extra $100 jumping from the $100 to the $200 plan. You might only need $5 to $10 worth of extra credits to get you over that hump. claude.ai/upgrade will let you manage your plan.

Just being aware of this, I think, is a big thing: actually checking in on your limits so you're not blindsided when something happens. It can also change your behavior. If you notice you're at 75% and you need the model for work tomorrow, then maybe you won't go crazy vibe-coding an application the night before. I think that awareness is really key, and although it's annoying, it's something a lot of people are missing.

Now, unfortunately, Claude is getting stingier with their limits. This is a conversation for another day, but energy and compute is a real problem for them, and that's why we've seen degraded performance across Claude over the last month or so. You've probably noticed Claude has become dumber in some cases. And although the new models are great, like 4.7, they're also really usage-intensive. And they've actually taken it a step further.
Now, as you can see on the Pro plan, you no longer have access to Claude Code. You can only get Claude Code from the $100-a-month plan or the $200-a-month plan, so $17 subscription users no longer have access to it. I kind of get why they're doing this. They are obviously making a bet that serious users who want to use Claude Code are willing to pay $100 and more, and they need to conserve energy somewhere, so they would rather ice out the Pro cohort. But it still does suck, and this is just something we need to accept about the current state of AI. Prices are not going to get better; they are only going to get worse. So I do feel like a video like this might be one of the most important ones of the year for saving money longer term, because this problem is only going to get worse. You need to tweak your usage habits now, unless you've just got money to burn, because these features are only going to become more restrictive.

I mean, we're already seeing amazing features like Claude design that the average person can't afford. I blew through my entire plan in a 12-hour session, and if I want to keep using it seriously now, I'm probably going to spend $50 to $100 a day. So it is a problem. And if you're on a budget, you might want to reconsider, as I mentioned in the last step, which tools you're actually using. If you use an interface like Hermes Agent or OpenClaw, you might even want to consider running local open-source models. If you front-load an investment into hardware, like a solid Mac Studio that can actually run local models, then you're literally running an open-source model on your own computer and not paying a subscription, but you have to bear the upfront hardware cost. Your architecture will depend on what you want to get out of AI. But I think we now really need to be cognizant of where our subscription money is going.
I still think Claude is the best subscription to have overall, but it's no longer a viable option for absolutely everything, just due to the sheer cost issue we are experiencing. I think it's good to have other subscriptions, or even utilize other free tools, to spread the load out.

Now, I want to give you one of my biggest tips in the video. If you really want to save usage cost, and there are tasks you do on a repetitive basis in your day-to-day or weekly life, simply create Claude skills for them. It's a bit of work up front, but what it then gives you is the ability to hit forward slash and select a skill. For example, "creative session" loads up my morning brief skill that I follow every day. I've front-loaded the work. I've trained Claude to do something, to be my morning assistant, and now, whenever I want to do my morning session, I simply invoke the skill. You can do the same for script writing: train a Claude skill to write like you. You can do the same for portfolio analysis: train Claude to analyze your portfolio the way you want, with the exact parameters you want. The reason this actually saves tokens over time is that Claude knows exactly what to do. You're not constantly correcting it and re-prompting, which is what ends up burning tokens. I've done a full guide on Claude skills and how to develop skills from scratch; I'll leave that in the top corner if you want to check it out. That's another feature of Claude you can utilize to significantly reduce your token expenditure.

Now, let's say you're in a situation where you've done everything I've told you about in today's video and you are still running into usage limits. Let's just be frank with each other: maybe it means you need to upgrade your subscription.
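As a sketch of how a repeatable task like the morning brief above could be captured as a skill: the snippet below writes a SKILL.md file into a skill folder. The `create_skill` helper and the exact SKILL.md layout (YAML frontmatter plus a step checklist) are assumptions loosely modeled on Anthropic's Agent Skills format, not the creator's actual setup.

```python
from pathlib import Path

# Hypothetical SKILL.md body for a "morning brief" skill. The
# frontmatter fields and checklist steps are illustrative only.
SKILL_MD = """\
---
name: morning-brief
description: Run my daily morning briefing with a fixed checklist.
---

1. Summarize overnight news from my saved sources.
2. List today's calendar items and top three priorities.
3. Keep the whole brief under 300 words.
"""

def create_skill(root: Path, name: str = "morning-brief") -> Path:
    """Write a SKILL.md into a per-skill folder and return its path."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(SKILL_MD)
    return path
```

Front-loading the checklist into the skill file is exactly what saves tokens later: the steps travel with the skill instead of being re-typed and re-corrected in every chat.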
Now, I don't think you need to jump to the $200 plan if you haven't implemented anything I've spoken about today. But if you have and you're still running into issues, you might just be a power user of AI, and that is simply the cost of using AI right now. So if you really want to use Claude, if you love Claude Code, if you love Claude Co-work, and you're still running into issues on the $17 plan, you may have to upgrade to $100. And if you're still running into issues like I was on the $100 plan, you may have to upgrade to $200. That is just the name of the game right now, unfortunately. Of course, as I said, you can use other models to alleviate some of that burden, but if you're a Claude power user, this is just the cost of using the best frontier model in the world right now. What I will say, though, is don't just jump to the Max plan if you haven't optimized. Optimize first, then make subscription upgrades later. That's obviously the best way to go about it.

If you want all of my tips from today's video condensed into one framework, make sure to use the link in the description below to claim your free PDF. It contains all the tips and tricks, including the setup prompt for an efficient instructions and memory system. Thanks everyone for your time today. Hopefully I was able to help you out, save you some money, and help you extract more out of your Claude. I will see you in the next video. Have a lovely rest of your day. Peace out.
📺 Related Videos
🔮 MiroFish simulates the future with thousands of agents | Everything you need to know!
Christoph Magnussen · 2026-05-11 · 🇩🇪 DE
I discovered the most advanced AI tool of our time… This is Hermes Agent 🔥
Der KI-Doktor · 2026-05-09 · 🇩🇪 DE
This OpenClaw MasterClass Will Change the Way You Work Forever
Der KI-Doktor · 2026-05-08 · 🇩🇪 DE
Paperclip Is Insane | Full Tutorial
Ferdy․com | Ferdy Korpershoek · 2026-05-06 · 🇬🇧 EN