
Intel’s Latest GPU Drivers Are Half the Size They Used to Be

Drivers are an integral part of the best graphics cards — without them, you only get baseline functionality, with no fancy 3D graphics, video encoding or decoding, upscaling, or all the other things we’ve come to expect. We compared driver download sizes for the latest GPUs in late January and found that Intel was strangely bloated. We must have caught someone’s attention, as the latest 4255 drivers, which are also WHQL certified, are about half the size of January’s 4090 beta drivers.

We mused at the time that maybe Intel was including some unneeded cruft, or maybe it just wasn’t compressing things as much as it could. Whatever the case, there’s been remarkable progress in just two months. I’ve got a collection of just about every Intel Arc driver release since launch. Here are the exact sizes of the downloads (which don’t necessarily equate to uncompressed install size, but they’re far easier to check), release dates, and other details.

Intel Arc Drivers Since Public Availability

Version | Size | Release Date | Notes
3259 | 844 MB | 8/4/2022 | First widely available A380 drivers
3268 | 846 MB | 8/22/2022 | A380 Spider-Man beta drivers
3490 | 1,365 MB | 10/11/2022 | Arc A770/A750 launch drivers
3491 | 1,365 MB | 10/17/2022 | Beta Game On driver for four new games
3793 | 1,197 MB | 10/27/2022 | Beta Game On driver for three new games
3802 | 1,197 MB | 11/18/2022 | Game On driver for four new games, performance optimizations for eight other games
3959 | 1,210 MB | 12/8/2022 | Game On driver for five new games, massive DX9 overhaul
3975 | 1,211 MB | 12/13/2022 | Beta Game On driver for three new games, DirectStorage support
4032 | 1,214 MB | 1/3/2023 | Launch driver for Raptor Lake-S (UHD Graphics 730)
4090 | 1,237 MB | 1/24/2023 | Beta Game On driver for two new games
4091 | 1,175 MB | 2/1/2023 | Launch driver for Raptor Lake-P mobile CPUs, desktop Arc Control mode introduced
4123 | 1,175 MB | 2/7/2023 | Beta Game On driver for two new games
4125 | 1,175 MB | 2/16/2023 | Beta Game On driver for five new games
4146 | 1,074 MB | 3/15/2023 | Game On for two new games, Raptor Lake-U launch driver
4148 | 888 MB | 3/16/2023 | Beta Game On driver for two new games
4255 | 604 MB | 3/23/2023 | Game On for RE4 Remake, performance optimizations and major size reduction

There are a few things worth pointing out, like the jump in size when Intel went from just supporting the A380 (and various existing integrated graphics solutions) to the official Arc launch drivers. Why the extra 500+ MB? We’re not sure, but there were a lot of bug fixes and other factors that likely played a role. Later in October, the size dropped by about 170 MB.

From then until February, the size of Intel’s Arc drivers remained pretty consistent at around 1.2 GB. Note that late January was when we wrote the piece about how bloated Intel’s drivers seemed compared to AMD’s and Nvidia’s. By March, the first driver release that month had lopped off about 100 MB for the Raptor Lake-U laptop launch.

The next day, a different driver trimmed another 186 MB, but that was only the beginning. The current 4255 drivers that came out last night dropped a further 284 MB. The running total of weight loss since October stands at 761 MB, making Arc a serious contender for the Biggest Loser: the drivers are 44% of the size they once were! While we’re not entirely sure about all of the details, Intel’s driver blog has this to say: “Good things come in small packages — the Intel Arc graphics driver package, specifically.
This latest driver release punches above its weight, now down to 604 megabytes from nearly double that when the Intel Arc desktop GPUs launched in October. Our engineers put the old 1.3GB driver download on a diet with smarter compression algorithms. This means faster updates so you can Game On even sooner with less bandwidth consumed, all with zero compromises made in performance or features.”

I’m skeptical that the only real change was in compression algorithms. Were there a bunch of TIF or BMP files that got converted to PNG? Because you usually don’t get a 56% reduction in archive size from any moderately compressed starting point. Regardless, smaller downloads are a good thing for anyone with a data cap. Says the guy who downloaded over 300GB of Large Language Models while poking around at chatbots last week. […]
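For the curious, the shrinkage math checks out against the download sizes in the table above. A quick sketch in Python, using only the figures listed there:

launch, january_beta, latest = 1365, 1237, 604  # MB: 3490 launch, 4090 beta, 4255 WHQL

print(f"Trimmed since the October launch driver: {launch - latest} MB")   # 761 MB
print(f"Current size relative to launch: {latest / launch:.0%}")          # 44%
print(f"Relative to January's 4090 beta: {latest / january_beta:.0%}")    # about half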


Nvidia Is Bringing Back the Dual GPU… for Data Centers

Nvidia announced a new dual-GPU product, the H100 NVL, during its GTC Spring 2023 keynote. This won’t bring back SLI or multi-GPU gaming, and it won’t be one of the best graphics cards for gaming; instead, it targets the growing AI market. From the information and images Nvidia has released, the H100 NVL (H100 NVLink) will sport three NVLink connectors on the top, with the two adjacent cards slotting into separate PCIe slots.

It’s an interesting change of pace, apparently to accommodate servers that don’t support Nvidia’s SXM option, with a focus on inference performance rather than training. The NVLink connections should help provide the missing bandwidth that NVSwitch gives on the SXM solutions, and there are some other notable differences as well.

Take the specifications. Previous H100 solutions — both SXM and PCIe — have come with 80GB of HBM3 memory, but the actual package contains six stacks, each with 16GB of memory. It’s not clear if one stack is completely disabled, or if it’s for ECC or some other purpose. What we do know is that the H100 NVL will come with 94GB per GPU, and 188GB of HBM3 total. We assume the “missing” 2GB per GPU is now used for ECC.

Power is slightly higher than the H100 PCIe, at 350–400 watts per GPU (configurable), an increase of 50W. Total performance meanwhile ends up being effectively double that of the H100 SXM: 134 teraflops of FP64, 1,979 teraflops of TF32, and 7,916 teraflops of FP8 (as well as 7,916 teraops of INT8).

Basically, this looks like the same core design as the H100 PCIe, which also supports NVLink, but potentially now with more of the GPU cores enabled, and with 17.5% more memory. The memory bandwidth is also quite a bit higher than the H100 PCIe, at 3.9 TB/s per GPU and a combined 7.8 TB/s (versus 2 TB/s for the H100 PCIe, and 3.35 TB/s on the H100 SXM).

As this is a dual-card solution, with each card occupying a 2-slot space, Nvidia only supports 2 to 4 pairs of H100 NVL cards for partner and certified systems. How much would a single pair cost, and will they be available to purchase separately? That remains to be seen, though a single H100 PCIe can sometimes be found for around $28,000. So $80,000 for a pair of H100 NVL doesn’t seem out of the question. […]
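To put the dual-card math in one place, here's a quick sketch built only from the per-board figures quoted above; the price line is just a floor based on the roughly $28,000 street price of a single H100 PCIe, not an official MSRP:

per_gpu = {"memory_gb": 94, "bandwidth_tb_s": 3.9, "power_w": (350, 400)}

pair = {
    "memory_gb": 2 * per_gpu["memory_gb"],                # 188 GB of HBM3
    "bandwidth_tb_s": 2 * per_gpu["bandwidth_tb_s"],      # 7.8 TB/s combined
    "power_w": tuple(2 * w for w in per_gpu["power_w"]),  # 700 to 800 W, configurable
    "price_floor_usd": 2 * 28_000,                        # $56,000 before any NVL premium
}
print(pair)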


Asus’s RTX 4080 Noctua OC Edition Is Officially Available, Officially Huge

Asus teamed up with Noctua, yet again, to bring a special brown-and-tan-themed version of the RTX 4080 to market. The 4080 already competes with the best graphics cards, albeit at a high price. We have a sample of the Noctua card in hand and will be reviewing it shortly, though we don’t expect a massive difference from the vanilla RTX 4080 Founders Edition in terms of performance — the 4080 ranks fourth in our GPU benchmarks hierarchy in rasterization, and second in ray tracing. Instead, this card is all about aesthetics and, hopefully, relative silence.

While the underlying hardware has undoubtedly changed quite a bit, the new RTX 4080 Noctua OC Edition looks nearly the same as the Asus RTX 3070 Noctua Edition we reviewed last year. Except Asus apparently decided a quad-slot card wasn’t quite large enough, so the new RTX 4080 Noctua measures 310 x 145 x 87.5 mm, occupying 4.3 slots’ worth of case space. It’s a good thing most people don’t plug in expansion cards other than a GPU these days, as only the bottom slot or two on a typical ATX board would still be accessible with this card installed — though you could try a PCIe riser solution. […]
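That 4.3-slot figure follows directly from the card's thickness, since expansion slot brackets sit on a 20.32 mm (0.8 inch) pitch. A trivial sketch:

card_thickness_mm = 87.5
slot_pitch_mm = 20.32  # standard spacing between adjacent expansion slots

print(f"{card_thickness_mm / slot_pitch_mm:.1f} slots")  # ~4.3 slots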


How to Run a ChatGPT Alternative on Your Local PC

ChatGPT can give some impressive results, and also sometimes some very poor advice. But while it’s free to talk with ChatGPT in theory, often you end up with messages about the system being at capacity, or hitting your maximum number of chats for the day, with a prompt to subscribe to ChatGPT Plus. Also, all of your queries are taking place on ChatGPT’s server, which means that you need Internet access and that OpenAI can see what you’re doing.

Fortunately, there are ways to run a ChatGPT-like LLM (Large Language Model) on your local PC, using the power of your GPU. The oobabooga text generation webui might be just what you’re after, so we ran some tests to find out what it could — and couldn’t! — do, which means we also have some benchmarks.

Getting the webui running wasn’t quite as simple as we had hoped, in part due to how fast everything is moving within the LLM space. There are the basic instructions in the readme, the one-click installers, and then multiple guides for how to build and run the LLaMa 4-bit models. We encountered varying degrees of success/failure, but with some help from Nvidia we finally got things working. And then the repository was updated and our instructions broke, but a workaround/fix was posted today. Again, it’s moving fast!

It’s like running Linux and only Linux, and then wondering how to play the latest games. Sometimes you can get it working, other times you’re presented with error messages and compiler warnings that you have no idea how to solve. We’ll provide our version of the instructions below for those who want to give this a shot on their own PCs.

(Image credit: Tom’s Hardware)

It might seem obvious, but let’s also just get this out of the way: You’ll need a GPU with a lot of memory, and probably a lot of system memory as well, should you want to run a large language model on your own hardware — it’s right there in the name. A lot of the work to get things running on a single GPU (or a CPU) has focused on reducing the memory requirements.

Using the base models with 16-bit data, for example, the best you can do with an RTX 4090, RTX 3090 Ti, RTX 3090, or Titan RTX — cards that all have 24GB of VRAM — is to run the model with seven billion parameters (LLaMa-7b). That’s a start, but very few home users are likely to have such a graphics card, and it runs quite poorly. Thankfully, there are other options.

Loading the model with 8-bit precision cuts the RAM requirements in half, meaning you could run LLaMa-7b with many of the best graphics cards — anything with at least 10GB of VRAM could potentially suffice. Even better, loading the model with 4-bit precision halves the VRAM requirements yet again, allowing LLaMa-13b to work on 10GB of VRAM. (You’ll also need a decent amount of system memory, 32GB or more most likely — that’s what we used, at least.)

Getting the models isn’t too difficult, at least, but they can be very large. LLaMa-13b, for example, consists of a 36.3 GiB download for the main data, and then another 6.5 GiB for the pre-quantized 4-bit model. Do you have a graphics card with 24GB of VRAM and 64GB of system memory? Then the 30 billion parameter model is only a 75.7 GiB download, and another 15.7 GiB for the 4-bit stuff. There’s even a 65 billion parameter model, in case you have an Nvidia A100 40GB PCIe card handy, along with 128GB of system memory (well, 128GB of memory plus swap space).
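If you want a back-of-the-envelope feel for why those precision levels matter, the weights alone scale like this. This is a rough sketch that ignores activations, the context cache, and framework overhead, which add several more gigabytes on top:

def weights_gb(params_billion, bits_per_weight):
    # Memory needed just to hold the model weights, in decimal gigabytes.
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (7, 13, 30, 65):
    sizes = ", ".join(f"{bits}-bit ~ {weights_gb(params, bits):.1f} GB" for bits in (16, 8, 4))
    print(f"LLaMa-{params}b: {sizes}")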
Hopefully the people downloading these models don’t have a data cap on their internet connection.

Testing Text Generation Web UI Performance

In theory, you can get the text generation web UI running on Nvidia’s GPUs via CUDA, or AMD’s graphics cards via ROCm. The latter requires running Linux, and after fighting with that stuff to do Stable Diffusion benchmarks earlier this year, I just gave it a pass for now. If you have working instructions on how to get it running (under Windows 11, though using WSL2 is allowed) and you want me to try them, hit me up and I’ll give it a shot. But for now I’m sticking with Nvidia GPUs.

TOM’S HARDWARE EQUIPMENT

I also encountered some fun errors when trying to run the llama-13b-4bit models on older Turing architecture cards like the RTX 2080 Ti and Titan RTX. Everything seemed to load just fine, and it would even spit out responses and give a tokens-per-second stat, but the output was garbage. So the only testing right now is for Ampere and Ada Lovelace cards that have at least 10GB of VRAM. That’s still nine different GPUs, though the performance seems to depend on many other factors besides just raw GPU number-crunching prowess.

Update: Starting with a fresh environment while running a Turing GPU seems to have worked. We’ll have Turing results shortly, and will recheck a few of the other numbers to ensure things remained consistent.

For these tests, we used a Core i9-12900K running Windows 11. You can see the full specs in the boxout. We used reference Founders Edition models for most of the GPUs, though there’s no FE for the 4070 Ti, 3080 12GB, or 3060, and we only have the Asus 3090 Ti.

In theory, there should be a pretty massive difference between the fastest and slowest GPUs in that list. In practice, at least using the code that we got working, other bottlenecks are definitely a factor. It’s not clear whether we’re hitting VRAM latency limits, CPU limitations, or something else, but your CPU definitely plays a role. We tested an RTX 4090 on a Core i9-9900K and the 12900K, for example, and the latter was almost twice as fast.

Given the rate of change happening with the research, models, and interfaces, it’s a safe bet that we’ll see plenty of improvement in the coming days. So, don’t take these performance metrics as anything more than a snapshot in time. We may revisit the testing at a future date, hopefully with additional tests on non-Nvidia GPUs.

We ran oobabooga’s web UI with the following command, for reference (more on how to do this below):

python server.py --gptq-bits 4 --model llama-13b

Text Generation Web UI Benchmarks

Again, we want to preface the charts below with the following disclaimer: These results don’t necessarily make a ton of sense if we think about the traditional scaling of GPU workloads. Normally you end up either GPU compute constrained, or limited by GPU memory bandwidth, or some combination of the two. There are definitely other factors at play with this particular AI workload, and we have some additional charts to help explain things a bit.

We ran the test prompt 30 times on each GPU, with a maximum of 500 tokens. We discarded any results that had fewer than 400 tokens (because those do less work), and also discarded the first two runs (warming up the GPU and memory). Then we sorted the results by speed and took the average of the ten fastest remaining results.

Generally speaking, the speed of response on any given GPU was pretty consistent, within a 7% range at most on the tested GPUs, and often within a 3% range.
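For reference, distilling each GPU's 30 runs down to a single number looks something like this. It's a simplified sketch of the procedure just described, not the actual analysis script:

def summarize_runs(runs):
    # runs: list of (tokens_generated, tokens_per_second) tuples, in test order.
    runs = runs[2:]                              # discard the first two warm-up runs
    runs = [r for r in runs if r[0] >= 400]      # discard short results (under 400 tokens)
    speeds = sorted((tps for _, tps in runs), reverse=True)
    fastest = speeds[:10]                        # keep the ten fastest remaining runs
    return sum(fastest) / len(fastest)           # report their average tokens per second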
That consistency was measured on a single PC, however; on a different PC with a Core i9-9900K and an RTX 4090, our performance was around 40 percent slower than on the 12900K.

Our prompt for the following charts was: “How much computational power does it take to simulate the human brain?”

(Image credit: Tom’s Hardware)

Our fastest GPU was indeed the RTX 4090, but… it’s not really that much faster than other options. Considering it has roughly twice the compute, twice the memory, and twice the memory bandwidth of the RTX 4070 Ti, you’d expect more than a 9% improvement in performance. That didn’t happen, not even close.

The situation with RTX 30-series cards isn’t all that different. The RTX 3090 Ti comes out as the fastest Ampere GPU for these AI text generation tests, but it’s tied with the RTX 3090 and RTX 3080 12GB, while the RTX 3080 Ti and RTX 3080 are only slightly behind. Meanwhile, the RTX 3060 still delivers pretty reasonable performance. It has far less than half the theoretical compute of the 3090 Ti, with just over a third of the memory bandwidth, and yet in our tests it delivered 84% of the performance.

Which isn’t to say that everyone interested in getting involved in AI LLMs should run out and buy RTX 3060 or RTX 4070 Ti cards. We recommend the exact opposite, as the cards with 24GB of VRAM are able to handle more complex models, which can lead to better results. And even the most powerful consumer hardware still pales in comparison to data center hardware — Nvidia’s A100 can be had with 40GB or 80GB of HBM2e, while the newer H100 defaults to 80GB. I certainly won’t be shocked if eventually we see an H100 with 160GB of memory, though Nvidia hasn’t said it’s actually working on that.

As an example, the 4090 (and other 24GB cards) can all run the LLaMa-30b 4-bit model, whereas the 12GB cards are at their limit with the 13b model. 165b models also exist, which would require at least 80GB of VRAM and probably more, plus gobs of system memory. And that’s just for inference; training workloads require even more memory!

(Image credit: Tom’s Hardware)

Here’s a different look at the various GPUs, using only the theoretical FP16 compute performance. Now, we’re actually using 4-bit integer inference on the text generation workloads, but integer operation compute (teraops or TOPS) should scale similarly to the FP16 numbers. Also note that the Ada Lovelace cards have double the theoretical compute when using FP8 instead of FP16, but that isn’t a factor here.

If there are inefficiencies in the current text generation code, those will probably get worked out in the coming months, at which point we could see more like double the performance from the 4090 compared to the 4070 Ti, which in turn would be roughly triple the performance of the RTX 3060.

(Image credit: Tom’s Hardware)

(Image credit: Tom’s Hardware)

These final two charts are merely to illustrate that the current results may not be indicative of what we can expect in the future. Running Stable Diffusion, for example, the RTX 4070 Ti hits 99–100 percent GPU utilization and consumes around 240W, while the RTX 4090 nearly doubles that — with double the performance as well.

Long term, we expect the various chatbots — or whatever you want to call these “lite” ChatGPT experiences — to improve significantly. Speaking of which, let’s talk about what sort of information you can get from text-generation-webui.

Chatting With Text Generation Web UI

[…]


Tested: Default Windows VBS Setting Slows Games Up to 10%, Even on RTX 4090

Remember back when Windows 11 launched and there was concern about how the default of enabling Virtualization Based Security (VBS) and Hypervisor-Enforced Code Integrity (HVCI) might impact performance? There was a lot of noise made, benchmarks were run… and then we all moved on. Flash forward to 2023, and I recently discovered that sometime in the past few months, the PC I use for the GPU benchmarks hierarchy received an update that turned VBS back on. (We have an article on how to disable VBS should you want to.)

Windows 10 also has this setting, and it may also be enabled by default now. Tom’s Hardware Editor-in-Chief Avram Piltch uses Windows 10 Home on his main desktop and found that VBS appeared to be enabled even though he never touched the setting and had clean installed Windows over the summer.

This defaulting to VBS on, everywhere, worried me, because I’m already in the middle of retesting all the pertinent graphics cards for the 2023 version of the GPU hierarchy, on a new testbed that includes a Core i9-13900K CPU, 32GB of DDR5-6600 G.Skill memory, and a Sabrent Rocket 4 Plus-G 4TB M.2 SSD. Needless to say, you don’t put together best-in-class parts only to run extra features that can hurt performance.

So I set about testing, and retesting, performance of the fastest graphics card, the GeForce RTX 4090, with and without VBS enabled. After all, we’re now two new CPU generations beyond what we had at the Windows 11 launch, and with faster CPUs and new architectures, perhaps VBS has even less of an impact than before. At the same time, we’re also using new GPUs that deliver substantially more performance than the RTX 3090, which was the fastest GPU back in 2021, which could make CPU bottlenecks and extras like VBS more of a hindrance than before.

Windows 11 VBS Test Hardware

You can see our test PC hardware, using Nvidia’s 528.49 drivers (which have now been superseded, thrice). Let’s get straight to the results, with our updated test suite and settings that consist of a battery of 15 games, at four different settings/resolution combinations. We’re going to summarize things in a table, split into average FPS on the left and 1% low FPS (the average FPS of the bottom 1% of frametimes) on the right.

To be clear, all of the testing was done on the same PC, over a period of a few days. No game updates were applied, no new drivers were installed, etc., to keep things as apples-to-apples as possible. The one change was to disable VBS (because it was on initially, the Windows 11 default).

Each test was run multiple times to ensure consistency of results, which does bring up the one discrepancy: Total War: Warhammer 3 performance is all over the place right now. I don’t recall that being the case in the past, but sometime in February or perhaps early March, things seem to have changed for the worse. (I’m still investigating the cause.)

(Image credit: Tom’s Hardware)

Taking the high-level view of things, perhaps it doesn’t look too bad. Disabling VBS improved performance by up to 5% overall, and that dropped to just 2% at 4K ultra. And if you’re running this level of gaming hardware, we think you’re probably also hoping to run 4K ultra. But even at our highest possible settings, there are still some noteworthy exceptions.

The biggest improvement overall comes in Microsoft Flight Simulator, which makes sense as that game tends to be very CPU limited even with the fastest possible processors.
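Before getting to the game-by-game numbers, a quick note on how the 1% low figures in the table are derived from frametimes. A simplified sketch, not our exact analysis script:

def one_percent_low_fps(frametimes_ms):
    # Average FPS of the slowest 1% of frames, i.e. the largest frametimes.
    worst = sorted(frametimes_ms, reverse=True)
    count = max(1, len(worst) // 100)
    avg_ms = sum(worst[:count]) / count
    return 1000.0 / avg_ms  # convert milliseconds per frame to FPS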
In Microsoft Flight Simulator, turning off VBS consistently improved performance in our RTX 4090 testing by around 10%, and the 1% lows increased by as much as 15%. Not coincidentally, Flight Simulator is also one of the games that absolutely loves AMD’s large 3D V-Cache on the Ryzen 9 7950X3D. Our CPU tests use a different, less demanding test sequence, but even there the AMD chips with large caches are anywhere from about 20% (Ryzen 7 5800X3D) to 40% (7900X3D) faster than the Core i9-13900K. Perhaps VBS would have less of an impact on AMD’s X3D CPUs, but I didn’t have access to one of those for testing.

Another game that tends to bump into CPU bottlenecks at lower settings is Far Cry 6, and it also saw pretty consistent 5% or higher increases in performance — noticeable in benchmarks, but less so in actual gaming. Interestingly, Cyberpunk 2077 with ray tracing enabled also saw about 5% higher performance. That’s perhaps because the work of building the BVH structures for ray tracing calculations happens on the CPU; many of the other ray tracing games also showed 5% or higher increases.

What about games where VBS didn’t matter much, if at all? Bright Memory Infinite (the standalone benchmark, not the full game) showed almost no change, and Minecraft only showed a modest improvement at 1080p with our more taxing settings (24 RT render chunk distance). A Plague Tale: Requiem, Borderlands 3, Forza Horizon 5, and Red Dead Redemption 2 also showed less impact, though in some cases the minimum FPS may have changed more.

(And again, I’m not really saying anything about Total War: Warhammer 3, as performance fluctuated far too much. Even after more than 20 runs each, with and without VBS, there was no clear typical result. Instead of a bell curve, the results fell into three clumps at the low, mid, and high range, with the 1% lows showing even less consistency. Removing its results only changes the 1% low delta by less than two percent, though.)

The biggest deltas are generally at 1080p, and it didn’t seem to matter much whether we were running “medium” or “ultra” settings. That’s probably because ultra settings often hit the CPU harder for other calculations, so it’s not just a case of higher resolution textures or shadows.

Windows VBS: The Bottom Line

So, should you leave VBS on or turn it off? It’s not quite that clear-cut a question. The actual security benefits, particularly for a home desktop that doesn’t go anywhere, are probably minimal. And if you’re serious about squeezing every last bit of performance out of your hardware — via improved cooling, overclocking, and buying more expensive hardware — losing 5% just for some obscure “security benefits” probably isn’t worth it, so disable VBS.

Having VBS turned on is now the default for new Windows installations (and I’m pretty sure one of the various Windows Updates that came out in late 2022 may have also switched it back on if it was disabled). So you can argue that Microsoft at least thinks it’s important and that it should be left on. However, the fact that Microsoft also provides instructions on how to disable it indicates the performance impact can be very real.

It’s also worth noting that the 5–10 percent drop in performance remains consistent with what we measured way back in 2021 when Windows 11 first launched.
Nearly two years of upgraded hardware later, sporting some of the most potent components money can buy, we’re still looking at a 5% loss on average in gaming performance.

For a lot of people, particularly those with less extreme hardware, the performance penalty while gaming will more likely fall into the low single-digit percentage points. But if you’re trying to set a performance record, it could certainly hold you back. And now we’re left wondering what new vulnerabilities and security mitigations will come next, and how much those may hurt performance. […]


Backblaze Annual Failure Rates for SSDs in 2022: Less Than One Percent

Backblaze, purveyor of cloud storage, has published the statistics for the 2,906 SSDs used as boot drives on its storage servers. To be clear, these aren’t only boot drives, as they also read and write log files and temporary files, the former of which can sometimes generate quite a bit of wear and tear. Backblaze has been using SSDs as boot drives since 2018, and like its hard drive statistics, this is one of the few ways to get a lot of insight into how large quantities of mostly consumer drives hold up over time.

Before we get to the stats, there are some qualifications. First, most of the SSDs that Backblaze uses aren’t the latest M.2 NVMe models. They’re also generally quite small in capacity, with most drives only offering 250GB of storage, plus about a third that are 500GB, and only three that are larger 2TB drives. But using lots of the same model of hardware keeps things simple when it comes to managing the hardware. Anyway, if you’re hoping to see stats for popular drives that might make our list of the best SSDs, you’ll be disappointed.

Here are the stats for the past year.

(Image credit: Backblaze)

While seven of the drive models used have zero failures, only one of those has a significant number of installed drives, the Dell DELLBOSS VD (Boot Optimized Storage Solution). The other six have fewer than 40 SSDs in use, with four that are only installed in two or three servers. It’s more useful to pay attention to the drives that have a large amount of use, specifically the ones with over 100,000 drive days.

Those consist of the Crucial MX500 250GB, Seagate BarraCuda SSD and SSD 120, and the Dell BOSS. It’s interesting to note that the average age of the Crucial MX500 drives is only seven months, even though the MX500 first became available in early 2018. Clearly, Backblaze isn’t an early adopter of the latest SSDs. Still, overall the boot SSDs have an annualized failure rate below 1%.

(Image credit: Backblaze)

Stepping back to an even longer view of the past three years, that sub-1% annualized failure rate holds, with just 46 total drive failures over that time span. Backblaze also notes that after two relatively quick failures in 2021, the MX500 did much better in 2022. Seagate’s older ZA250CM10002 also slipped to a 2% failure rate last year, while the newer ZA250CM10003 had more days in service and fewer failures, so it will be interesting to see if those trends continue.

Another piece of data Backblaze looked at is SSD temperature, as reported by SMART (the Dell BOSS doesn’t appear to support this). The chart isn’t zero-based, so it might look like there’s a decent amount of fluctuation at first glance, but in reality the drives ranged from an average of 34.4C up to 35.4C — just a 1C span.

(Image credit: Backblaze)

Of course that’s just the average, and there are some outliers. There were four observations of a 20C drive, and one instance of a drive at 61C, with most falling in the 25–42 degrees Celsius range. It would be nice if the bell curve seen above also correlated with failed drives in some fashion, but with only 25 total failures during the year, that was not to be — Backblaze called its resulting plot “nonsense.”

Ultimately, the number of SSDs in use by Backblaze pales in comparison to the number of hard drives — check the latest Backblaze HDD report, for example, where over 290,000 drives were in use during the past year.
That’s because no customer data gets stored on the SSDs, so they’re only for the OS, temp files, and logs. Still, data from nearly 3,000 drives is a lot more than what any of us (outside of IT people) are likely to access over the course of a year. The HDDs, incidentally, had an AFR of 1.37%.

Does this prove SSDs are more reliable than HDDs? Not really, and having a good backup strategy is still critical. Hopefully, in the coming years we’ll see more recent M.2 SSDs make their way into Backblaze’s data — we’d love to see maybe 100 or so each of some PCIe 3.0 and PCIe 4.0 drives, for example, but that of course assumes that the storage servers even support those interfaces. Given time, they almost certainly will. […]
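As a footnote, the annualized failure rates quoted above follow Backblaze's usual drive-day math. A minimal sketch; the drive-day count below is illustrative rather than Backblaze's exact figure:

def annualized_failure_rate(failures, drive_days):
    # Failures per drive-day, scaled to a 365-day year, expressed as a percentage.
    return failures / drive_days * 365 * 100

# 46 failures over roughly 1.7 million drive-days (an assumed, illustrative figure)
# works out to just under 1%, in line with the three-year SSD numbers above.
print(f"{annualized_failure_rate(46, 1_700_000):.2f}%")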


Video Encoding Tested: AMD GPUs Still Lag Behind Nvidia, Intel

The best graphics cards aren’t just for playing games. Artificial intelligence training and inference, professional applications, and video encoding and decoding can all benefit from a better GPU. Yes, games still get the most attention, but we like to look at the other aspects as well. Here we’re going to focus specifically on the video encoding performance and quality that you can expect from various generations of GPUs.

Generally speaking, the video encoding/decoding blocks for each generation of GPU will all perform the same, with minor variance depending on clock speeds for the video block. We checked the RTX 3090 Ti and RTX 3050 as an example — the fastest and slowest GPUs from Nvidia’s Ampere RTX 30-series generation — and found effectively no difference. Thankfully, that leaves us with fewer GPUs to look at than would otherwise be required.

We’ll test Nvidia’s RTX 4090, RTX 3090, and GTX 1650 from team green, which covers the Ada Lovelace, Turing/Ampere (functionally identical), and Pascal-era video encoders. For Intel, we’re looking at desktop GPUs, with the Arc A770 as well as the integrated UHD 770. AMD ends up with the widest spread, at least in terms of speeds, so we ended up testing the RX 7900 XTX, RX 6900 XT, RX 5700 XT, RX Vega 56, and RX 590. We also wanted to check how the GPU encoders fare against CPU-based software encoding, and for this we used the Core i9-12900K and Core i9-13900K.

Video Encoding Test Setup

Most of our testing was done using the same hardware we use for our latest graphics card reviews, but we also ran the CPU test on the 12900K PC that powers our 2022 GPU benchmarks hierarchy. As a more strenuous CPU encoding test, we also ran the 13900K with a higher-quality encoding preset, but more on that in a moment.

TOM’S HARDWARE TEST EQUIPMENT

For our test software, we’ve found ffmpeg nightly to be the best current option. It supports all of the latest AMD, Intel, and Nvidia video encoders, can be relatively easily configured, and it also provides the VMAF (Video Multi-Method Assessment Fusion) functionality that we’re using to compare video encoding quality. We did, however, have to use the last official release, 5.1.2, for our Nvidia Pascal tests (the nightly build failed on HEVC encoding).

We’re doing single-pass encoding for all of these tests, as we’re using the hardware provided by the various GPUs and it’s not always capable of handling more complex encoding instructions. GPU video encoding is generally used for things like livestreaming of gameplay; if you want the best quality, you’d generally opt for CPU-based encoding with a high-quality CRF (Constant Rate Factor) setting of 17 or 18, though that of course results in much larger files and higher average bitrates. There are still plenty of options worth discussing, however.

AMD, Intel, and Nvidia all have different “presets” for quality, but what exactly these presets are or what they do isn’t always clear. Nvidia’s NVENC in ffmpeg uses “p4” as its default, and switching to “p7” (maximum quality) did little for the VMAF scores while dropping encoding performance by anywhere from 30 to 50 percent. AMD opts for a “-quality” setting of “speed” for its encoder, but we also tested with “balanced” — and like Nvidia, the maximum setting of “quality” reduced performance a lot but only improved VMAF scores by 1–2 percent.
Lastly, Intel seems to use a preset of “medium,” and we found that to be a good choice — “veryslow” took almost twice as long to encode with little improvement in quality, while “veryfast” was moderately faster but degraded quality quite a bit.

Ultimately, we opted for two sets of testing. First, we have the default encoder settings for each GPU, where the only thing we specified was the target bitrate. Even then, there are slight differences in encoded file sizes (about a +/-5% spread). Second, after consulting with the ffmpeg subreddit, we attempted to tune the GPUs for slightly more consistent encoding settings, specifying a GOP size equal to two seconds (“-g 120” for our 60 fps videos). AMD was the biggest beneficiary of our tuning, trading speed for roughly 5–10 percent higher VMAF scores. But as you’ll see, AMD still trailed the other GPUs.

There are many other potential tuning parameters, some of which can change things quite a bit, others of which seem to accomplish very little. We’re not targeting archival quality, so we’ve opted for faster presets that use the GPUs, but we may revisit things in the future. Sound off in our comments if you have alternative recommendations for the best settings to use on the various GPUs, with an explanation of what the settings do. It’s also unclear how the ffmpeg settings and quality compare to other potential encoding schemes, but that’s beyond the scope of this testing.

Here are the settings we used, both for the default encoding as well as for the “tuned” encoding.

AMD:
Default: ffmpeg -i [source] -c:v [h264/hevc/av1]_amf -b:v [bitrate] -y [output]
Tuned: ffmpeg -i [source] -c:v [h264/hevc/av1]_amf -b:v [bitrate] -g 120 -quality balanced -y [output]

Intel:
Default: ffmpeg -i [source] -c:v [h264/hevc/av1]_qsv -b:v [bitrate] -y [output]
Tuned: ffmpeg -i [source] -c:v [h264/hevc/av1]_qsv -b:v [bitrate] -g 120 -preset medium -y [output]

Nvidia:
Default: ffmpeg -i [source] -c:v [h264/hevc/av1]_nvenc -b:v [bitrate] -y [output]
Tuned: ffmpeg -i [source] -c:v [h264/hevc/av1]_nvenc -b:v [bitrate] -g 120 -no-scenecut 1 -y [output]

Most of our attempts at “tuned” settings didn’t actually improve quality or encoding speed, and some settings seemed to cause ffmpeg (or our test PC) to just break down completely. The main thing about the above settings is that they keep the key frame interval constant, and potentially provide for slightly higher image quality.

Again, if you have better settings you’d recommend, post them in the comments; we’re happy to give them a shot. The main thing is that we want bitrates to stay the same, and we want reasonable encoding speeds of at least real-time (meaning 60 fps or more) for the latest generation GPUs. That said, for now our “tuned” settings ended up being so close to the default settings, with the exception of the AMD GPUs, that we’re just going to show those charts.

With the preamble out of the way, here are the results. We’ve got four test videos, taken from captured gameplay of Borderlands 3, Far Cry 6, and Watch Dogs Legion. We ran tests at 1080p and 4K for Borderlands 3, and at 4K for the other two games. We also have three codecs: H.264/AVC, H.265/HEVC, and AV1.
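For anyone who wants to reproduce the quality scoring, VMAF can be run through the same ffmpeg binary. Here's a minimal Python wrapper around the libvmaf filter; it assumes an ffmpeg build with libvmaf enabled, and exact filter options can vary between versions:

import subprocess

def vmaf_score(encoded_path, reference_path):
    # Compare an encode against its source using ffmpeg's libvmaf filter.
    cmd = ["ffmpeg", "-i", encoded_path, "-i", reference_path,
           "-lavfi", "libvmaf", "-f", "null", "-"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # ffmpeg prints a "VMAF score: NN.NN" summary line to stderr when it finishes.
    for line in result.stderr.splitlines():
        if "VMAF score" in line:
            return float(line.rsplit(":", 1)[1])
    return None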
We’ll have two charts for each setting and codec, comparing quality using VMAF and showing the encoding performance.

For reference, VMAF scores follow a scale from 0 to 100, with 60 rated as “fair,” 80 as “good,” and 100 as “excellent.” (None of our results fall below 60, but 40 is “poor” and 20 is “bad.”) In general, scores of 90 or above are desirable, with 95 or higher being mostly indistinguishable from the original source material.

Video Encoding Quality and Performance at 1080p: Borderlands 3

[…]


First PCIe 5.0 M.2 SSDs Are Now Available, Predictably Expensive

We’ve been hearing about PCIe 5.0 for years now, and even though the first PCIe 5.0 capable PCs began shipping with Intel’s 12th Gen Alder Lake CPUs in late 2021, we still hadn’t seen any drives for sale… until now. For the past several years, the best SSDs (or at least the fastest) have typically used a PCIe 4.0 interface, and plenty of good drives are still available with ‘just’ a PCIe 3.0 connection. But the fastest SSDs have been hitting the throughput ceiling of a PCIe 4.0 x4 connection for over three years, with only incremental improvements, so it’s high time for something faster.

There are multiple M.2 PCIe 5.0 SSDs slated to ship this year, and the first model looks to be the Gigabyte Aorus Gen5 10000, which as the name inventively implies can deliver up to 10,000 MB/s. Earlier rumors suggested the drive would be able to hit 12,000 MB/s reads and 10,000 MB/s writes, so performance was apparently reined in while getting the product ready for retail.

The Gigabyte Aorus SSD uses the Phison E26 controller, which will be common on a lot of the upcoming models. Silicon Motion is working on its new SM2508 controller that may offer higher overall performance, but it’s a bit further out and may not ship this year. The other thing to note with the Aorus is the massive heatsink that comes with the drive, which seems to be the case with all the other Gen5 SSD prototypes we’ve seen as well. Clearly, these new drives are going to get just a little bit warm.

The Gigabyte drive is currently listed on Amazon and Newegg, though the latter is currently sold out while the former is only available via a third-party marketplace seller — at a whopping $679.89 for the 2TB model. That’s almost certainly not the MSRP, or a reflection of what the MSRP might end up being once the drive becomes more widely available, which should happen in the coming month or two.

The other PCIe 5.0 M.2 SSD that’s now available is the Inland TD510 2TB, available at Microcenter for just $349.99 — assuming you have a Microcenter within driving distance. Inland is Microcenter’s own brand of drives, and while the cooler that comes with the SSD isn’t quite as large as the Aorus heatsink, it does feature a small fan for active cooling. Word is that the fan can be quite loud for something this small, so not a great feature, in other words.

Like the Aorus 10000, the Inland TD510 uses the Phison E26 controller and has the same 10,000 MB/s read and 9,500 MB/s write specifications. Where Gigabyte doesn’t currently list random read/write speeds, the Microcenter page lists up to 1.5 million IOPS read and 1.25 million IOPS write for the Inland drive. Both drives also have an endurance rating of 1,400 TBW, with read/write power use of around 11W.

How will the drives perform in real-world use? That’s something we can’t assess yet, though we’re working to get these new and upcoming M.2 Gen5 drives in for review. Perhaps with DirectStorage also coming to more games later this year, there may actually be some benefit to the additional speed for more casual users. […]
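As a point of reference for why Gen5 drives have headroom to stretch their legs, the theoretical ceiling of an x4 link works out roughly as follows. A back-of-the-envelope sketch that only accounts for the 128b/130b line encoding, not packet overhead:

transfer_rates_gt_s = {"PCIe 3.0": 8, "PCIe 4.0": 16, "PCIe 5.0": 32}

def x4_ceiling_gb_s(gt_per_s):
    # 128b/130b encoding, 8 bits per byte, four lanes.
    return gt_per_s * (128 / 130) / 8 * 4

for gen, rate in transfer_rates_gt_s.items():
    print(f"{gen} x4: ~{x4_ceiling_gb_s(rate):.1f} GB/s")
# Roughly 3.9, 7.9, and 15.8 GB/s, which is why ~7,000 MB/s Gen4 drives sit at the
# interface limit while the first Gen5 drives advertise 10,000 MB/s with room to spare.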


Nvidia VSR Testing: AI Upscaling and Enhancement for Video

Nvidia Video Super Resolution — Nvidia VSR — officially becomes available to the public today. First previewed at CES 2023, and not to be confused with AMD’s VSR (Virtual Super Resolution), Nvidia VSR aims to do for video what its DLSS technology does for games. Well, sort of. You’ll need one of Nvidia’s best graphics cards for starters, meaning an RTX 30- or 40-series GPU. Of course, you’ll also want to set your expectations appropriately.

By now, everyone should be getting quite familiar with some of what deep learning and AI models can accomplish. Whether it’s text-to-image art generation with Stable Diffusion and the like, ChatGPT answering questions and writing articles, self-driving cars, or any number of other possibilities, AI is becoming part of our everyday lives.

The basic summary of the algorithm should sound familiar to anyone with a knowledge of DLSS. Take a bunch of paired images, with each pair containing a low-resolution and lower bitrate version of a higher resolution (and higher quality) video frame, and run that through a deep learning training algorithm to teach the network how to ideally upscale and enhance lower quality input frames into better looking outputs. There are plenty of differences between VSR and DLSS, of course.

For one, DLSS gets data directly from the game engine, including the current frame, motion vectors, and depth buffers. That’s combined with the previous frame(s) and the trained AI network to generate upscaled and anti-aliased frames. With VSR, there’s no pre-computed depth buffer or motion vectors to speak of, so everything needs to be done based purely on the video frames. While in theory VSR could use the current and previous frame data, it appears Nvidia has opted for a pure spatial upscaling approach. But whatever the exact details, let’s talk about how it looks.

(Image credit: Nvidia)

Nvidia provided a sample video showing the before and after output from VSR. If you want the originals, here’s the 1080p source upscaled via bilinear sampling and the 4K VSR upscaled version — hosted on a personal Drive account, so we’ll see how that goes. (Send me an email if you can’t download the videos due to exceeding the bandwidth cap.)

We’re going to skirt potential copyright issues and not include a bunch of our own videos, though we did grab some screenshots of the resulting output from a couple of sports broadcasts to show how it works on other content. What we can say is that slow-moving videos (like Nvidia’s samples) provide the best results, while faster-paced stuff like sports is more difficult, as the frame-to-frame changes can be quite significant. But in general, VSR works pretty well. Here’s a gallery of some comparison screen captures (captured via Nvidia ShadowPlay). […]
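To make the "paired images" idea concrete, here's a toy illustration of how such a training pair could be constructed: downscale and recompress a pristine frame to produce the degraded input. This is purely illustrative, assumes the Pillow library, and is not Nvidia's actual training pipeline:

from PIL import Image

def make_training_pair(frame_path, scale=2, jpeg_quality=35):
    # Return a (degraded input, full-quality target) pair from one video frame.
    target = Image.open(frame_path).convert("RGB")
    small = target.resize((target.width // scale, target.height // scale), Image.BILINEAR)
    small.save("degraded.jpg", quality=jpeg_quality)  # lower bitrate stand-in
    return Image.open("degraded.jpg"), target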


Minecraft RTX GPUs Benchmarked: Which Runs It Best in 2023?

Minecraft RTX officially released to the public in April 2020, after first being teased at Gamescom in August 2019. Since then, a lot of things have changed, with regular updates to the game and a whole slew of ray-tracing capable graphics cards that didn’t exist in early 2020. The best graphics cards now all have DXR (DirectX Raytracing) and VulkanRT support, and you can see the full view of how the various GPUs stack up in other games by checking out our GPU benchmarks hierarchy. But how does Minecraft RTX run these days?

To be clear, we’re not talking about regular Minecraft here. That can run on a potato, even at high resolutions. For example, the Intel Arc A380 — which as you’ll see below falls well below a playable 30 fps with ray tracing on — plugs along happily at 93 fps at 4K without DXR, using 4xMSAA and a 24 block rendering distance. Put another way, performance is about ten times higher than what you’ll get with ray tracing enabled.

So why would anyone want to turn on ray tracing in the first place, if the performance hit is so severe? First: Look at all the shiny surfaces! Minecraft RTX is a completely different looking game compared to vanilla Minecraft. Second, with cards like the Nvidia RTX 4090 now available, you can still get very good performance even at maxed out settings.

We’ve now tested Minecraft RTX on every DXR-capable graphics card, ranging from the lowly AMD RX 6400 and Nvidia RTX 3050 up to the chart-topping RTX 4090. We’re using the latest drivers as well: AMD 23.2.2, Intel 31.0.101.4125, and Nvidia 528.49. We’ll have more details on the ray tracing implementation of Minecraft below, but let’s get to the benchmarks first, since that’s likely why you’re here.

(Image credit: Nvidia)

Minecraft RTX Testing Details

We’ve been using Minecraft RTX — or just Minecraft, as it’s normally referred to these days — for a while now in our graphics card reviews. If you want to see just how punishing DXR calculations can be, it’s great for stressing lesser GPUs. We’re now running on a state-of-the-art test system, with a Core i9-13900K and 32GB of DDR5-6600 memory, connected to a 32-inch 4K 240Hz Samsung Neo G8 monitor. Full details on our test PC are in the boxout.

Minecraft Test Hardware

We previously tested Minecraft RTX using other CPUs and different memory configurations. While those do matter to some extent, especially if you’re using something like an RTX 4090, we feel that anyone trying to run Minecraft RTX will likely have at least 16GB of memory and a reasonably capable CPU. For these tests, we’re only looking at the impact of the GPU, and we’ve more or less maxed out all of the other hardware to eliminate bottlenecks as much as possible.

We’re testing at four settings: 1920×1080 with an 8 RT Chunks rendering distance, and 1920×1080, 2560×1440, and 3840×2160 with a 24 RT Chunks rendering distance — the maximum available. We’re focusing on native rendering for the most part, but we do have a couple of DLSS results in the charts to show how that can affect things.

Note that in vanilla Minecraft, the rendering distance can be set as high as 96 chunks (it used to be 160, or perhaps that varies by map). Higher rendering distances will put more of a strain on the CPU, but running with 64 chunks certainly isn’t uncommon.

Our test map is Portal Pioneers RTX, which includes a helpful benchmarking setup. You can find various generic RTX texture packs to use with your own maps, though you can’t simply enable ray tracing effects without such a pack.
That’s because the ray traced rendering requires additional information about the various blocks to tell the engine how they should be rendered — e.g., are they reflective, do they glow, etc.

One other important note is that DLSS upscaling can’t be tuned; it’s either on or off with an Nvidia GeForce RTX card. As far as we can tell, for 1080p and lower resolutions, DLSS uses 2x upscaling (Quality mode), 1440p uses 3x upscaling (Balanced mode), and 4K and above use 4x upscaling (Performance mode). The blocky nature of Minecraft does lend itself rather well to DLSS upscaling, however, so even 4x upscaling at 4K still looks very good.

New to Minecraft RTX are emissive properties for some blocks, like the lava shown here. (Image credit: Nvidia)

Minecraft RTX Graphics Card Performance

We’ve tested virtually every card at every setting that could possibly make sense, including plenty of settings that don’t make sense at all! Basically, we stopped testing higher resolutions and settings once a card dropped below 20 fps, though we did test all of the cards at both 1080p settings just for the sake of completeness. Let’s start with the “easiest” setting, 1920×1080 and an 8 RT chunk render distance.

(Image credit: Tom’s Hardware)

If you’ve seen results like this before, you won’t be surprised to see that Minecraft RTX runs far better on Nvidia RTX hardware than on anything else — and that’s not even factoring in DLSS upscaling. AMD does offer several GPUs that can now break 60 fps, but even the RX 7900 XTX falls behind the RTX 3080, and the fastest previous-gen RX 6950 XT trails the RTX 3060 Ti.

To get performance above a steady 60 fps (meaning the 1% minimum fps is also above 60), you’ll need at least an RTX 2080 Ti, or alternatively an RTX 2060 basically gets there if you turn on DLSS. If you’re only looking to clear 60 fps average performance, the RTX 2080 Super and above will suffice. For a bare minimum playable experience of over 30 fps (average), the RX 6650 XT and above will suffice.

Intel’s Arc Alchemist GPUs clearly have problems with Minecraft RTX. We’ve spoken with Intel about it, and they’re aware of the low performance and are working on a fix, but we don’t know when that might come. It’s been a known issue since at least December, and we suspect the cards are currently delivering about half of what they should — the Arc A750, for example, usually lands pretty close to the RTX 3060. Also notice that all three Arc A7 cards are effectively tied, which indicates there’s something else limiting performance, as normally the A750 runs about 10% slower than the A770.

Finally, let’s talk about DLSS upscaling, with our three sample cards: the RTX 2060, RTX 3080, and RTX 4090. The RTX 2060 gets a massive 72% boost, taking it from a borderline 41 fps to a very playable 70 fps. The RTX 3080 sees a decent 52% jump, going from 107 fps to 162 fps. There are limits to what DLSS can do, however, and while the RTX 4090 does improve its average fps by 24%, the 1% lows actually drop a few percent due to the DLSS overhead.

(Image credit: Tom’s Hardware)

Maxing out the RT Chunks render distance at 24 increases the demands on the GPU a decent amount, at least in areas where you have a relatively unobstructed view. Because the BVH tree — Bounding Volume Hierarchy, used to help optimize the ray tracing calculations — gets constructed on the CPU and passed over to the GPU, some of the hardest hit cards with the increased view distance are the fastest cards.
The RTX 4090’s performance drops by 31%, for example, while the RTX 3070 only loses 15% compared to 8 RT Chunks.

If you’re looking for at least 60 fps, the RTX 2080 Ti now marks the bare minimum GPU you’ll need, while a 60 fps minimum basically requires an RTX 3080 or faster. Only AMD’s RX 7900 XTX/XT cards can break 60 fps, and in both cases minimums fall quite a bit short of that mark. Dropping your target to 30 fps opens the door to more cards, but you’ll still need at least an RTX 2060 to get minimums above 30, or an RTX 3050 to get the average fps to 30.

This is another good example of how much faster Nvidia’s GPUs are at ray tracing calculations compared to AMD’s. In traditional games, the RTX 3050 delivers about 23% lower performance than an RX 6600, but in Minecraft RTX it basically ties the RX 6700 XT — a card that’s 94% faster in our rasterization GPU benchmarks!

Intel’s ray tracing hardware normally looks much better than this as well; for example, the Arc A750 lands between the RX 6700 XT and RX 6750 XT in Cyberpunk 2077, one of the most demanding DXR games around (outside of Minecraft). Right now, Intel’s Arc A7 cards fall below the RX 6600 and only slightly above the otherwise uninspiring RX 6500 XT. Again, we hope Intel can improve the situation with updated drivers, sooner rather than later.

Last, let’s talk DLSS again, which is still in Quality mode here. The gains on the RTX 2060 are 65% now, slightly lower than above, so perhaps other bottlenecks are becoming more of a factor. The RTX 4090 at the other extreme sees a 22% increase, again a touch less than before. Finally, our “middle of the road” RTX 3080 gets a 65% improvement. In other words, the DLSS performance increase appears to be mostly resolution dependent in Minecraft RTX, even though the ray tracing workload increased.

(Image credit: Tom’s Hardware)

Stepping up to 2560×1440 with maxed out settings at 24 RT Chunks, we’ve dropped the three slowest GPUs from our benchmarks (A380, RX 6400, and RX 6500 XT), but we certainly could have skipped testing a bunch of the other cards. Roughly half of the tested cards can’t keep their 1% lows above 30 fps, and only a handful of cards can even average 60 fps — six to be precise, at least when we’re not using DLSS. And we’re not even looking at a worst-case scenario, as there are certainly more demanding maps available in Minecraft RTX.

If you want to fully clear 60 fps, the only card that can do it at native 1440p is the RTX 4090. To average 60 fps or more, you’ll want at least an RTX 3080 Ti. Also note how close the RTX 3080 10GB and RTX 3080 12GB are in performance, which indicates Minecraft RTX hits the RT cores and GPU shaders far more than it depends on memory bandwidth. We’ll see that pattern continue even at 4K below.

At least there are still 22 cards that can break 30 fps on average, which is much better than when Minecraft RTX first became available. Back then, only the RTX 2070 Super, RTX 2080, RTX 2080 Super, and RTX 2080 Ti were able to reach “playable” performance — and the Titan RTX as well, if you want to include that status symbol of GPU-dom. But in our updated tests, the RTX 2070 Super now falls short and you’ll need an RX 6800 XT or faster.

DLSS now switches to the “Balanced” algorithm with 3x upscaling, which means it can provide an even more significant boost to performance. The RTX 2060 is basically unplayable at just 22 fps native, but with DLSS it more than doubles its performance to hit 46 fps.
Even the behemoth RTX 4090 gets a 52% increase in performance thanks to DLSS, going from 102 fps to 156 fps. The RTX 3080 sits in the middle again with a 75% increase, jumping from 57 fps to 100 fps and delivering a far better gaming experience.

(Image credit: Tom’s Hardware)

Wrapping things up with our 4K testing, we’ve naturally dropped most of the GPUs from testing. Where we had 38 GPUs in total at 1080p, we’re down to 19 now, and half of those are again struggling — unless you want to turn on DLSS, which you should if you have an Nvidia card. We just wish you could decide whether to use Performance, Balanced, or Quality mode manually rather than the game devs deciding what’s appropriate.

As expected, the RTX 4090 mostly breezes along at just over 60 fps, and it’s the only card to break that barrier. Even 30 fps is a difficult target without upscaling, with the RTX 3080 Ti and above proving sufficient. An interesting aside is that the two RTX 3080 cards, one with 10GB on a 320-bit interface and the other with 12GB on a 384-bit interface, end up with very similar performance. Minecraft RTX at least doesn’t seem to depend strongly on memory bandwidth, based on those results. Meanwhile, none of AMD’s GPUs can break 30 fps, and the fastest RX 6000-series cards fall in the high teens.

With DLSS now using Performance mode and 4x upscaling, you get a huge boost to framerates by enabling upscaling. The RTX 2060 nearly triples its performance and is mostly playable at just a hair under 30 fps. The RTX 3080 improves by 160%, with its upscaled performance passing the native RTX 4090. And the RTX 4090 effectively doubles its performance.

While you might think 4x upscaling would leave noticeable artifacts, at 4K and with a game like Minecraft — meaning it’s very blocky by default, and even the text is intentionally pixelated looking — the resulting output still looks almost as good as native rendering. Considering how demanding 4K would otherwise be, DLSS is basically a prerequisite for good performance at that resolution in Minecraft. And at least the frames rendered in Minecraft are true frames and not DLSS 3 Frame Generation frames.

God rays, real-time shadows, reflective water and more; Minecraft has never looked so good. (Image credit: Microsoft)

Ray Tracing vs. Path Tracing

That wraps up our performance testing, but to understand why Minecraft RTX is so demanding, we need to briefly describe how it differs from other RTX-enabled games. Nvidia says that Minecraft RTX uses ‘path tracing,’ similar to what it did with Quake II RTX, where most other games like Control and Cyberpunk 2077 only use ‘ray tracing.’ For anyone who fully understands the difference between ray tracing and path tracing, you probably just rolled your eyes hard. That’s because Nvidia has co-opted the terms to mean something new and different.

In short, Nvidia’s ‘path tracing’ in Minecraft RTX just means doing more ray tracing calculations—bouncing more rays—to provide more effects and a higher quality result. Actually, Nvidia also now lists Portal RTX as the only game with “full RT” if you check out Nvidia’s list of RTX enabled games (which can mean games with DXR, DLSS, DLAA, or some combination of those).

Path tracing as used in Hollywood films typically means casting a number of rays per pixel into a scene, finding an intersection with an object and determining the base color, then randomly bouncing more rays from that point in new directions.
Repeat the process for the new rays until each ray reaches a maximum depth (number of bounces) or fails to intersect with anything, then accumulate the resulting colors and average them out to get a final result.

That’s the simplified version, but the important bit is that it can take thousands of rays per pixel to get an accurate result. A random sampling of rays at each bounce quickly scales the total number of rays needed in an exponential fashion. You do however get a ‘fast’ result with only a few hundred rays per pixel—this early result is usually grainy and gets refined as additional rays are calculated.

Ray tracing is similar, except it doesn’t have the random sampling and integral sums of a bunch of rays. Where you might need tens of thousands of samples per pixel to get a good ‘final’ rendering result with path tracing, ray tracing focuses on calculating rays at each bounce toward other objects and light sources. It’s still complex, and often ray and path tracing are used as interchangeable terms for 3D rendering, but there are some technical differences and advantages to each approach.

Without ray tracing, the earlier image now looks pretty dull by comparison. (Image credit: Microsoft)

Doing full real-time path tracing or ray tracing in a game isn’t practical yet, especially not with more complex game environments. Instead, games with ray tracing currently use a hybrid rendering approach. Most of the rendering work is still done via traditional rasterization, which our modern GPUs are very good at, and only certain additional effects get ray traced.

Typical ray traced effects include reflections, shadows, ambient occlusion, global illumination, diffuse lighting, and caustics, and each effect requires additional rays. Most games with ray tracing support only use one or two effects, though a few might do three or four. Control uses ray tracing for reflections, contact shadows, and diffused lighting, while Hogwarts Legacy can use ray tracing for reflections, shadows, and ambient occlusion.

Compare that with Minecraft RTX, where you get ray tracing for reflections, shadows, global illumination, refraction, ambient occlusion, emissive lighting, atmospheric effects, and more. That’s a lot more rays, though it’s still not path tracing in the traditional sense. Unless you subscribe to the ‘more rays’ being synonymous with ‘path tracing’ mindset.

(Side note: the SEUS PTGI tool is a different take on path tracing. It depends on Minecraft’s use of voxels to help speed up what would otherwise be complex calculations, and it can run on GTX and AMD hardware. There’s also the RTGI ReShade tool that uses screen space calculations and other clever tricks to approximate path tracing, but it lacks access to much of the data that would be required to do ‘proper’ path tracing.)

(Image credit: Nvidia)

Closing Thoughts

There’s a lot to digest with Minecraft and ray tracing. The game in its ‘normal’ mode can run on everything from potatoes—along with smartphones and tablets—up through the beefiest of PCs. Cranking up the render distance in the past could cause a few oddities and put more of a load on the CPU, but extreme framerates aren’t really needed. Minecraft RTX significantly ups the ante in terms of GPU hardware requirements.

The lighting, reflections, and other graphical enhancements definitely make a big difference, both in performance as well as visuals. Minecraft has never looked so pretty! Nor has it looked so dark when you’re mining underground — bring lots of torches.
The core survival and exploration gameplay hasn’t changed, of course, but makers who spend their time building intricate Minecraft worlds now have a host of new tools available.

It’s now almost three years since Minecraft RTX first became available to the public, and a lot has changed in that time. Where formerly only RTX 20-series GPUs were available, we now have the RTX 30- and 40-series cards, AMD’s RX 6000- and 7000-series GPUs, and even a handful of Intel Arc offerings — which are still in dire need of a driver fix to make Minecraft RTX remotely playable. As you can see from our testing, Nvidia’s GPUs still generally provide the best experience, and DLSS delivers another level of performance that currently can’t be matched. It’s too bad the game hasn’t been updated with FSR 2 or XeSS support. […]