Something Big is Happening in AI [article]

7,232 Views | 109 Replies | Last: 1 day ago by jh0400
bagger05
AG
https://tech.yahoo.com/ai/articles/something-big-happening-ai-most-142200510.html

This article has been making the rounds.

Worth reading, but if you don't want to:

- Author is the CEO of an AI company (not a lab like OpenAI or Anthropic, but a company that builds on top of those platforms)

- In recent months he's seen the AI tools quickly go from "helps me work faster" to "can do my job better than me"

- Argues that people in jobs like his that are so close to the epicenter are the first to feel the earthquake

- The stuff everyone has been talking about that has seemed like hype for so long (if it's done at a computer, AI will be able to do it) is now real


My very limited experience with this stuff makes me believe this article is on point. It used to be that the tools were basically a brain and a mouth. You could talk to them. Now these tools have hands. They can actually DO things (virtually). What they can do is mostly limited by the tools you're willing to give them access to.

Because these tools are now very good at writing software and executing commands on your computer, anyone who has a "wouldn't it be cool if…" thought can simply tell the AI what they want and the AI has the ability to build, test, and iterate.

I think it's both exciting and a bit scary.
Diggity
AG
This guy is a known clown/fraudster hyping his own book.

I'm really not sure why everyone hopped on his nonsense post.

https://venturebeat.com/ai/reflection-70b-model-maker-breaks-silence-amid-fraud-accusations
bagger05
AG
I don't know anything about the author. Doesn't surprise me that he's promoting something. Also doesn't surprise me if he's a tool.

Doesn't mean what he's saying is wrong.

I don't think his comparison to COVID is right. I don't think it will proliferate anywhere close to that quickly.

But I do think the article is pretty on point in a lot of ways.


There's a personal productivity tool I've been wishing existed for the last 10 years. With about $40 worth of subscriptions I was able to build it myself in a weekend. I have some other ideas… "man it would be awesome if MyFitnessPal did ____…" and I'm confident that the tools available today can make it real.

At work, it's more like "man I wish my account managers would do _____…" and I can see how the tools can make that stuff happen, too.

Author might be a doofus but I think his points are very valid.
Diggity
AG
Sounds like we agree for the most part.

Just sick of hearing about this guy's dumb post like he just nailed his Ninety-five Theses to the digital wall or something.
AgCMT
AG
I work in the tech space and have been utilizing AI in various ways over the past year or so. At first it was using AI to summarize meetings and to help draft emails. Now we have integrated AI into support and various other customer facing departments.

The holdup to going fully integrated is security. We are unable to put our own AI engine on our laptops or any endpoint that connects to our network. The safeguards for protecting IP are just not there yet, but they are not too far away. Or at least the comfort level with integrating AI has started to ease up.

I keep my iPad with Grok running in the background. It works as an assistant for me all day. It analyzes my sales calls and manages my calendar and meetings, but it could do so much more if I were able to run it on my work laptop and tie it into our CRM and Office 365.

If you aren't educating yourself on AI use cases and functionality, you should be. In my opinion, the OP's article is not inaccurate. I feel that the impact of AI will be on par with the Industrial Revolution, especially if you take into account AI and robotics. I've seen people say "learn to weld," but do you think you can weld better than a robot?

If you are working in Graphic Design, Marketing, or Video Editing - you need to start learning a new career path. I can create a logo, marketing campaign, and video in under 5 minutes on just about any AI platform. My 11-year-old daughter just created a logo for her new bracelet-making business in about 30 seconds.

It's coming. The only limitations are going to be the resources to maintain it. I also would imagine that the lawyers will have a say in its ability to perform legal tasks...but you can drop any contract into ChatGPT now and it can provide an in-depth summary and answer any questions you may have.
Milwaukees Best Light
AG
Diggity said:

Sounds like we agree for the most part.

Just sick of hearing about this guy's dumb post like he just nailed his Ninety-five Theses to the digital wall or something.

NJ Santarcangelo is smiling in heaven at your reference.
PeekingDuck
AG
AgCMT said:

It's coming. The only limitations are going to be the resources to maintain it.

This is the most important part of the whole equation. There's already a load gap and I'm not exactly sure how we solve it in the near term.
bagger05
AG
Security is going to be one big hurdle. The other is data.

The AI tools can be really helpful if you can give them access to data. And the better the quality of the data, the better.

AI can help with the data piece of it. If you have all your information in a dozen different spreadsheets that aren't super clean, the AI can do a pretty decent job of reading and interpreting each of them and getting them into some form it can make useful. Reading a bunch of notes collected on napkins and scrap paper in a shoebox? Doable, but more complicated.
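
To make that concrete, below is a minimal sketch of the kind of throwaway script one of these tools might write for the "dozen messy spreadsheets" situation. I'm assuming pandas is installed, and the folder name and column aliases are made up for illustration, not from any real project.

```python
# Minimal sketch: consolidate a folder of inconsistently formatted spreadsheets.
# Assumes pandas + openpyxl are installed; paths and column aliases are invented.
import glob
import pandas as pd

# Map the various headers people actually typed to one canonical column name.
ALIASES = {
    "cust": "customer", "customer name": "customer",
    "amt": "amount", "total": "amount",
    "dt": "date", "invoice date": "date",
}

frames = []
for path in glob.glob("exports/*.xlsx"):
    df = pd.read_excel(path)
    df.columns = [ALIASES.get(str(c).strip().lower(), str(c).strip().lower()) for c in df.columns]
    df["source_file"] = path  # keep provenance so a human can spot-check anything odd
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
if "date" in combined.columns:
    # Unparseable dates become NaT instead of silently wrong values.
    combined["date"] = pd.to_datetime(combined["date"], errors="coerce")
combined.to_csv("combined_clean.csv", index=False)
```

The point isn't that this particular script is impressive; it's that you don't have to write it (or even read it) anymore.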

The days of needing all of your data to be in a sophisticated ERP or CRM are going to be over soon. You can let people work in the ways that are easiest for them and the computers can handle getting the inputs clarified and organized properly.

This is basically what ERPs and CRMs do now. They give you a more user friendly interface than entering things directly into a computer-friendly database. The level up with AI is that it's going to lower the learning curve from "you have to learn how to input into the CRM and be disciplined to do it consistently" to "just remember to talk to it after your calls" and that will quickly evolve into "just go about your business and it'll watch you and take care of capturing all the data and you don't have to do anything."
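
Here's roughly what the "just talk to it after your calls" step looks like under the hood, as a sketch. I'm assuming the standard OpenAI Python client; the model name and the CRM field list are placeholders, not anybody's actual setup.

```python
# Rough sketch: turn a rambling post-call note into structured CRM fields.
# Assumes the `openai` package and an API key; model and field names are placeholders.
import json
from openai import OpenAI

client = OpenAI()

def call_note_to_crm_fields(transcript: str) -> dict:
    prompt = (
        "Extract the following from this sales call note and reply with JSON only: "
        "account, contact, next_step, follow_up_date (YYYY-MM-DD or null), sentiment.\n\n"
        + transcript
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    # In practice you'd validate this (and retry) before it ever touched the CRM.
    return json.loads(resp.choices[0].message.content)

# e.g. call_note_to_crm_fields("Talked to Dana at Acme, she wants a revised quote by Friday...")
```

The "it watches you and captures everything" version is the same idea with a transcript feed instead of you typing a note.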

Like you said, the limit to doing this right now isn't the AI capability; it's more about data security. But honestly in pretty short order you could get all the functionality of Salesforce out of a good AI interface and the basic Microsoft suite.

Exciting times.
Diggity
AG
I was pretty proud of that.
bagger05
AG
PeekingDuck said:

AgCMT said:

It's coming. The only limitations are going to be the resources to maintain it.

This is the most important part of the whole equation. There's already a load gap and I'm not exactly sure how we solve it in the near term.

Agreed on this as well. At some point the computer science problems go away and we are left with mechanical engineering and thermodynamics problems. If everyone in the world was using these tools then where's the energy going to come from?

This might be the brake that slows all of this down.
YouBet
AG
bagger05 said:

PeekingDuck said:

AgCMT said:

It's coming. The only limitations are going to be the resources to maintain it.

This is the most important part of the whole equation. There's already a load gap and I'm not exactly sure how we solve it in the near term.

Agreed on this as well. At some point the computer science problems go away and we are left with mechanical engineering and thermodynamics problems. If everyone in the world was using these tools then where's the energy going to come from?

This might be the brake that slows all of this down.


Posted this on one of the other numerous running AI threads but this is definitely the constraint.

Elon said as much at Davos a few weeks ago: later this year we will have chips sitting on shelves because there is nothing to plug them into. The physical data center rollouts can't keep up with the advancement in AI software, and then the energy grid won't be able to power all of the data centers needed to run the software.

There are only a handful of states in the US that can bring these data centers online without collapsing the grid and people are now fighting against data centers on top of all of this.

Europe has mostly thrown its lot in with the green energy myth so they can't really handle these data centers either. France could likely do it considering the majority of their power generation is nuclear, but that may be about it.
LMCane
I still don't see how in the near term even a Claude or CHATGPT 6.0 will be able to:

find commercial invoices from the parent company in Israel. understand the transaction. understand whether to use a license exception or license exemption or DSP-5 or DSP-73.

go into the US department of state automated system (which requires an IdenTrust certificate) and create a license, then review the license, then submit the license (when all submitters must be a qualified Empowered Official under ITAR 120.67)

then arrange the shipments with the freight forwarder, then discuss with headquarters changes in the transmittal letter.

there are so many moving parts to every single day of imports, usage of DPAS authority for manufactured parts, decisions about where and how to ship-

how can AI do that now?
bagger05
AG
This is actually a pretty good use case to demonstrate what the tools are capable of right now (and highlight a few areas where maybe you would still want a person to do the work).

I just got back from the gym but I'll write up how I'd tackle this (as someone who is a rank amateur).
Diggity
AG
this is one example of why I remain skeptical of this latest groupthink exercise that AI is eating every industry (in 12-18 months no less).

Of course there's going to be disruption by these AI tools for a ton of industries and jobs. This game where analysts hop from sector to sector to decide which industry is now obsolete is just silly though. Let's see some of these things actually happen before we get too excited.

I need more than "i vibe coded a super sweet RPG this weekend"



Trucking industry stocks were shellacked on Thursday, with shares of freight companies like CH Robinson (down 15%) and Expeditors International (down 13%) getting smoked. The news? An interesting technology development out of a company better known for karaoke products.
  • A white paper published by Algorhythm Holdings (a company that previously produced consumer karaoke products and also owns 80% of AI logistics company SemiCab) said that its SemiCab AI platform lets customers scale freight volumes by 300% to 400%.
  • Fears that AI will disrupt the freight forwarding and brokerage industry appear to be driving the sell-off, while Algorhythm's stock shot up 30% on the news.
  • Other victims of the sell-off included JB Hunt (down 5%) and Old Dominion Freight (down 5%).
  • The market reaction mirrors last week's AI-led sell-off in software stocks prompted by some impressive performance out of Claude Cowork, and the similar recent sell-off seen in gaming companies following Google's launch of its Project Genie AI tool.
Charismatic Megafauna
AG
My experience so far is that these things tend to be wrong a lot, even with basic summarization of data they're fed, and they also make a lot of stuff up if they don't have the answer in 8 milliseconds. They are also training on the same Internet that they are feeding misinformation into at an alarming rate (unintentionally and intentionally), so it's going to be quite a while before LLMs can be trusted 100% with high-consequence decision making.
LMCane, the first part of your problem is actually a pretty good application of LLMs. Eight or so years ago companies like ThoughtTrace were marketing their "ai" solutions pretty heavily for finding specific clauses in stacks of contracts, and they were really bad at it. Now they are pretty darn good and can do in seconds what would take a team of 10 people a couple of weeks. If 100% accuracy is required, you get the model to pull the clause from each contract for you to verify, so you still need an experienced professional, but probably just one for a couple of days (instead of 10 people for 2 weeks). That's the terrifying part for me: the moderately skilled technical work that pays pretty well (i.e. a lot of middle-class jobs) being cut by 90-95% as entire teams are replaced by one person and an LLM, and one or two LLM administrators supporting the entire company (or lots of companies, if the a-holes have it their way and SaaS it all).
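
The verify step is the whole trick, and it's not much code. Here's a sketch of the pattern, assuming the standard OpenAI client; the clause name and model are placeholders, and the string check at the end is a crude guard, not a guarantee:

```python
# Sketch of the "extract the clause, human verifies" workflow.
# Assumes the `openai` package; clause name and model are placeholders.
from openai import OpenAI

client = OpenAI()

def find_clause(contract_text: str, clause: str = "limitation of liability") -> str | None:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": f"Quote verbatim the {clause} clause from this contract, "
                       f"or reply NONE if there isn't one:\n\n{contract_text}",
        }],
    )
    quoted = resp.choices[0].message.content.strip()
    if quoted == "NONE":
        return None
    # Crude anti-hallucination guard: the quote must actually appear in the source,
    # and a human still reads whatever comes back before it counts.
    return quoted if quoted in contract_text else None
```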
I Am A Critic
All that because of a "white paper"?
Username checks out.
rononeill
Milwaukees Best Light said:

Diggity said:

Sounds like we agree for the most part.

Just sick of hearing about this guy's dumb post like he just nailed his Ninety-five Theses to the digital wall or something.

NJ Santarcangelo is smiling in heaven at your reference

RIP AMDG
Diggity
AG
it doesn't take much right now. Investors are quite skittish.

LMCane
pretty sure this is why CRM has been getting destroyed in the market the last two weeks

seems they are worried that ERP/software companies will get demolished by Anthropic bots that can do everything much cheaper
LMCane
isn't that the secret to PALANTIR's success?

amalgamating vast amounts of unorganized data and pulling it all together faster than any team of humans possibly can?
Aggie71013
AG
Except the success of Palantir still relies on a large number of humans. Sales pitch is better than the product.
bagger05
AG
LMCane said:

I still don't see how in the near term even a Claude or CHATGPT 6.0 will be able to:

find commercial invoices from the parent company in Israel. understand the transaction. understand whether to use a license exception or license exemption or DSP-5 or DSP-73.

go into the US department of state automated system (which requires an IdenTrust certificate) and create a license, then review the license, then submit the license (when all submitters must be a qualified Empowered Official under ITAR 120.67)

then arrange the shipments with the freight forwarder, then discuss with headquarters changes in the transmittal letter.

there are so many moving parts to every single day of imports, usage of DPAS authority for manufactured parts, decisions about where and how to ship-

how can AI do that now?

Long post, but I want to be thorough. The reason I say this is a pretty good use case is that you have a decent process built already.

Something that has helped my thinking is to stop thinking of AI as "Claude or ChatGPT 6.0" and instead think of it as a team of specialists, each of which can be trained and equipped based on what you want it to do. This is a sketch of how I think this could be done today (obviously I don't know your business, so some of this is probably wrong, but I think it will give you the idea; there's a rough code skeleton after the bot descriptions below).

The first specialist we will call israel_bot.
- Its only job is to look at a group of commercial invoices, pick out the ones from Israel, and then hand the Israeli invoices to another specialist.
- In order to do this, we will equip israel_bot with a playbook. This playbook tells it how to distinguish an invoice from Israel from other invoices. So imagine it was a person and you were creating a binder with detailed instructions for how you do this.
- That binder you made for a person might include info like "if you have trouble with deciding whether it's from Israel or Jordan, reference this documentation in the appendix (or guide from the internet or whatever)." You can provide israel_bot with these types of instructions that it would only reference if it needs them just like a person.
- The last part of israel_bot's job is to hand the invoice to the next specialist, license_bot

license_bot
- This guy's only job is to determine whether this particular invoice requires an exception, exemption, DSP-5, or DSP-73. And then pass on the document to the next specialist once it's reached its conclusion.
- Just like with israel_bot, we create a playbook that instructs license_bot how to decide.
- And just like israel_bot, this playbook can include additional context it might need to access in some situations but not others.
- The last step is for license_bot to indicate which category the invoice belongs to and hand it to the next specialist.

prep_bot
- It sounds like this submission to the Department of State is something we wouldn't trust an AI to do because it's either unsafe or illegal or both. So instead of having an AI do that work, we will just have a bot that sets the table for the human to do it.
- prep_bot's job is to make it as easy as possible for the human to do the work in the DoS system. I don't know what that entails but I'll guess.
- prep_bot generates a document that has all of the information that is going to be needed to do the work in the DoS system.
- Based on the work that license_bot did, prep_bot knows what the license decision is, and I'm going to guess that means it knows what forms are going to need to be filled out in the DoS system, the data that needs to be put into those fields, the phone number of the DoS help desk, links to supporting documentation that might be needed, etc.
- Similar to the other bots, you just write prep_bot's playbook to get it to serve up all of the information that the human might need.

prep_bot_2
- Whatever the outputs are from that work in the DoS system, the human can give those to prep_bot_2.
- Sounds like the conversations with the freight forwarders and HQ also need to be handled by a human. So prep_bot_2 is going to do something similar as prep_bot.
- prep_bot_2 looks at the original invoice, the DoS paperwork, and any other helpful documentation that would equip it to create a document with all the info the human would need to have productive conversations.
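
And here's the rough code skeleton I mentioned. Every playbook string, bot name, and model below is invented just to show the shape of the hand-off; the real playbooks would be pages long, and the human still does the DoS submission and the phone calls.

```python
# Rough skeleton of the specialist hand-off chain described above.
# Assumes the `openai` package; playbooks, names, and the model are placeholders.
from openai import OpenAI

client = OpenAI()

def run_specialist(playbook: str, document: str) -> str:
    """One 'bot' = one playbook (a system prompt) applied to one document."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": playbook},
            {"role": "user", "content": document},
        ],
    )
    return resp.choices[0].message.content

ISRAEL_PLAYBOOK = "Decide whether this commercial invoice is from the Israeli parent company. Answer ISRAEL or OTHER and list the fields you relied on."
LICENSE_PLAYBOOK = "For this Israeli invoice, recommend license exception, exemption, DSP-5, or DSP-73, and explain the reasoning."
PREP_PLAYBOOK = "Draft a checklist of every field, document, and contact a human will need to file this in the DoS system."

def process_invoice(invoice_text: str) -> str | None:
    if "ISRAEL" not in run_specialist(ISRAEL_PLAYBOOK, invoice_text):
        return None                                                    # israel_bot: not ours, stop
    license_memo = run_specialist(LICENSE_PLAYBOOK, invoice_text)      # license_bot
    return run_specialist(PREP_PLAYBOOK, invoice_text + "\n\n" + license_memo)  # prep_bot
```

prep_bot_2 would be the same run_specialist call with yet another playbook, fed whatever comes back out of the DoS system.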


Nothing I described above requires "hard plumbing" of one data system to another. Off-the-shelf tools like Claude Code can already fetch data from any system that's connected to your computer. Whether you should trust them to do that is a different question. Security on this stuff is still very iffy.


FUTURE IMPROVEMENTS:
- You're always going to be limited by what the tools can do and what you TRUST them to do. Both of these things are going to be moving targets.
- Maybe right now you don't trust prep_bot_2 to do anything other than get a human ready to make phone calls. But someday in the future maybe you'd trust it to draft up RFQ emails the person could send out quickly. Or maybe at some point you trust it to connect to your vendor management system, analyze recent activities from the freight forwarders, and send out the RFQs automatically.
- Someday way in the future maybe the systems are trustworthy enough to actually execute orders automatically.
- Even something like "automatically execute orders" is a capability that exists today through traditional automation. This is happening in the background every time you order something on Amazon. The difference with these systems is that you don't necessarily have to directly connect one data system to the other. Just like a purchasing manager is filling this gap for my company today, an AI specialist with the right playbook could potentially do it very soon.
Diggity
AG
my gripe with the tools I'm allowed to use at my company is that the data is complete BS in a significant number of cases. I keep hearing that this is all solved with new versions, and maybe that's true...but I don't have access to those yet.

I can do a simple web scrape of target groups, and go as far as giving the actual landing page to the LLM. Most of the time, I get somewhat accurate results on 75-90% of the group, but occasionally it will just make up a list of targets and their affiliations (the dreaded hallucinations).

I have no way of knowing this has happened unless I manually go in and check each target. I have to ask the LLM to do the search again, and tell it that it ****ed everything up...and then it gives me correct results (mostly).

I would hope that at some point this will improve, but until it can handle a relatively easy exercise like this, how would I trust these LLMs to do any heavy lifting/analysis for me?

bagger05
AG
LMCane said:

isn't that the secret to PALANTIR's success?

amalgamating vast amounts of unorganized data and pulling it all together faster than any team of humans possibly can?

Fundamentally I think so, but my understanding is that this is pretty different than anything I'm talking about.

My wife worked for an AI data company similar to Palantir, and their focus was on HUGE companies that have enormous amounts of data across a ton of assets. Think of power and utility companies, chemical manufacturing, oil refineries, stuff like that.

She left 18 months ago so things could be very different now. But at the time, the big challenge for those types of companies was actually getting the data. There was a pretty big gap between what executives (who were really excited about AI data characterization) thought was available and what actually was in real life. Silly example, but they didn't realize that the information about what the pressure was in a storage tank got into the system by a guy taking a picture of it with his iphone then texting that to someone in the office who would write it on a piece of paper and fax it to headquarters.

Also at the time, the standard for the quality of the data was pretty high. Basically, they could only pull data out of relatively organized systems like ERPs, so before they could do any real work they had to get the data into some kind of system their AI data characterization software could work with. I expect this is probably pretty different now and not as hard as it used to be.

But even at a single site as big and complex as an oil refinery, "what data do we have and where is it?" isn't a simple question to answer. It requires a good deal of coordination among human beings, which is always difficult. Especially with executive sponsors who have wacky expectations.


Now at a company like mine with a couple dozen people, figuring out what data we have and where it lives is much more straightforward. And what I would even want to get out of that data is a much simpler use case.

I'm not looking for complex analysis. More like "look through my CRM and my emails from the last 30 days, examine what deals I have open, examine each account's activity this year compared to last year, make a recommendation about who I should call first and give me talking points for that conversation."

Six months ago, my options were to contact my CRM provider and get them to help me implement whatever features they have that will do that (thousands of dollars and months of hassle and ass-pain) or find someone good at Microsoft Power Automate or some other tool like it to build this out.

Today it's much closer to "tell the AI what you want." Again, limiting factor today is marrying these awesome capabilities with security concerns.


Sorry for the long posts, but this stuff is all really interesting to me.
bagger05
AG
Diggity said:

my gripe with the tools I'm allowed to use at my company is that the data is complete BS in a significant number of cases. I keep hearing that this is all solved with new versions, and maybe that's true...but I don't have access to those yet.

I can do a simple web scrape of target groups, and go as far as giving the actual landing page to the LLM. Most of the time, I get somewhat accurate results on 75-90% of the group, but occasionally it will just make up a list of targets and their affiliations (the dreaded hallucinations).

I have no way of knowing this has happened unless I manually go in and check each target. I have to ask the LLM to do the search again, and tell it that it ****ed everything up...and then it gives me correct results (mostly).

I would hope that at some point this will improve, but until it can handle a relatively easy exercise like this, how would I trust these LLMs to do any heavy lifting/analysis for me?



I think the future state is that you wouldn't be entrusting this type of work to an LLM. The LLM would basically build you a custom software application that would be designed to accomplish that specific task. And as part of this process, it would also test itself. So it would click on every single link and if one didn't work, it would go back on its own and fix the software to prevent it from giving you bad links.
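
For your scraping example, the end product would look less like a chat answer and more like a small script the LLM writes and then checks. Something like this sketch (the URL and the selector are placeholders; a real page needs its own selector, which is exactly the part the LLM would figure out and test):

```python
# Sketch of the "LLM writes you a checkable script" idea.
# Assumes requests and beautifulsoup4; the URL and selector are placeholders.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

LANDING_PAGE = "https://example.com/portfolio"  # placeholder target page

def scrape_targets(url: str) -> list[dict]:
    """Pull name + link for every target listed on the landing page."""
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    return [
        {"name": a.get_text(strip=True), "link": urljoin(url, a.get("href") or "")}
        for a in soup.select("a")  # a real page needs a page-specific selector
    ]

def self_check(rows: list[dict]) -> list[dict]:
    """Re-verify every extracted link instead of trusting the extraction step."""
    bad = []
    for row in rows:
        try:
            ok = requests.head(row["link"], timeout=10, allow_redirects=True).status_code < 400
        except requests.RequestException:
            ok = False
        if not ok:
            bad.append(row)
    return bad

targets = scrape_targets(LANDING_PAGE)
broken = self_check(targets)
print(f"{len(targets)} targets extracted, {len(broken)} with dead or missing links")
```

Software like that gives you the same answer every run; the chat session doesn't.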
Diggity
AG
it would seem reasonable that the LLMs in place now could/should do that kind of error checking. There is some analysis needed for the exercise, as every group formats the needed info in a different manner, so I don't think it's as simple as just telling Python to scrape the site.

My (perhaps naive) hope was that these modern LLMs were capable of this.

Again, I'm super skeptical when I watch these demos of LLM agents pulling all sorts of external data to validate Excel models (in an autonomous fashion) when I can't get the same breed of LLMs to do hyper-specific tasks.

I'm no programmer, so maybe I'm downplaying the complexity of this dumb project, but color me unimpressed so far.

Here's another example of the masterful work I get from our LLM.

I gave it a PE group that we're talking to and asked it to map their current portfolio. It ran the locations through GIS and decided to show me the locations on an XY graph.



Thanks!
bagger05
AG
Quote:

Again, I'm super skeptical when I watch these demos of LLM agents pulling all sorts of external data to validate Excel models (in an autonomous fashion) when I can't get the same breed of LLMs to do hyper-specific tasks.

I think the difference comes down to a couple things.

When you're talking to an LLM, it's going to look for answers in basically the entire internet's worth of information it was trained on. So even when you give it a hyper-specific task, it's got a ton of places to go look for the answer. This isn't always helpful.

Imagine you've got a person working for you that is extremely intelligent but completely inexperienced. You tell them "I need you to find me all of the companies in Dallas who have pulled more than one building permit in the last 12 months" and you drop them off at Dallas City Hall. Lots of ways that intelligent person with no experience could mess that up.

When you're talking about something that's an agent, it's like you take that same extremely intelligent person with no experience, but instead of dropping them off at City Hall you put them in the proper records office with a step-by-step set of instructions.

Not that long ago everyone was talking about prompt engineering all the time; that's what they were trying to accomplish with basic LLMs. A really good prompt in ChatGPT could make sure that your guy knows not to go looking for building permits at the DMV.

These agent demos you're watching are combining really good prompt engineering (via these "playbooks" I was talking about earlier) and the ability to actually DO stuff on your computer via the command terminal. So in that demo you're referring to, under the hood was a pretty specific set of instructions.
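
To make the difference concrete, here's the Dallas permit task phrased both ways. The wording is just illustrative, not from any real playbook:

```python
# The same task phrased two ways (illustrative wording only).

BARE_PROMPT = "Find all companies in Dallas that pulled more than one building permit in the last 12 months."

PLAYBOOK_PROMPT = """You are a research assistant working ONLY from the attached permit export.
Steps:
1. Filter rows to the City of Dallas and to issue dates in the last 12 months.
2. Group by applicant name; keep applicants with 2 or more permits.
3. Return a table: applicant, permit count, most recent permit number.
Rules: do not guess. If a field is missing, write UNKNOWN rather than inventing a value."""
```

The first one drops your smart-but-green guy at City Hall; the second one walks him to the right records office.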
TXTransplant
Your point about the refinery and where is the data is a very good one.

I work in the chemicals industry, and there is almost always a gap between what any model predicts and what actually occurs in real life.

A simple example would be a chemical reactor making a product. In order to determine if that product is "on-spec", you have to physically collect a sample and take that sample to a lab to analyze it on a piece of equipment (likely doing a fair amount of sample prep) that operates on a totally different computer system from the plant. That's not to say you can't have online analyzers, but they're really not a fit for a lot of applications.

Then someone/something has to take the results from that lab analysis, compare the results to what's expected, and if it's different, go back to the plant and figure out why (which is not always so obvious).

You can train a model to use any number of inputs to predict what the output of that reactor should be, and as long as that prediction matches the output, it's great. But you cannot rely on it always matching, no matter what the model says. In the end, what really matters are the physical attributes of the actual product, which AI (at least at this time) cannot determine or verify with certainty.
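
And to be clear, the comparison step in that loop is trivial to script; something like the sketch below, where the spec limits and numbers are made up. The hard part is everything upstream of the lab result actually existing in a computer system.

```python
# The easy part of the loop: compare the model's prediction to the lab result.
# Spec limits, tolerance, and numbers are invented for illustration.
SPEC_MIN, SPEC_MAX = 98.5, 100.0   # purity spec, %
TOLERANCE = 0.3                    # acceptable model-vs-lab gap, %

def review(predicted: float, lab_result: float) -> str:
    if not (SPEC_MIN <= lab_result <= SPEC_MAX):
        return "OFF-SPEC: hold the batch and go figure out why"
    if abs(predicted - lab_result) > TOLERANCE:
        return "ON-SPEC, but the model missed: check inputs and instruments"
    return "ON-SPEC and the model agrees"

print(review(predicted=99.4, lab_result=99.1))
```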

As a side note, this industry as a whole is NOT technologically advanced. Not saying this couldn't change, but the industry as a whole lags way behind when it comes to keeping up with technology.

And as you mentioned, there are IP issues. AI works best with more data, but no company wants to share their data, because then it will no longer be their data. So, you're working only with the data your entity generates.

Heck, I even work in a field (patents) where AI should be helpful. But because of the advanced technical nature of the subject matter, it's really not all that helpful. For example, I don't have access to an AI tool that can read and identify chemical structures. I do have access to a search engine that can do this, but it's not integrated with Copilot, ChatGPT, etc. (not to say it couldn't be, it's just not). A lot of the information I need access to is at various patent offices - as far as I know, there is no common AI or search tool that aggregates all of that information (in part because many offices won't let you view anything without logging in via an account). I do have access to a database (that we pay for) that does some of this patent office searching, but it also has its limitations. On any given day, I might search 4-5 unique/separate databases of information (internal and external to my company) to get what I need to do my job.

This just goes back to your point about "where is the data", and at least in my field, it's literally all over the place.
Diggity
AG
right, but even when I give it the exact landing page for the target group and tell it to scrape that page, I still get weird results. That just blows my mind.

bagger05
AG
Diggity said:

right, but even when I give it the exact landing page for the target group and tell it to scrape that page, I still get weird results. That just blows my mind.



We had a computer science AI guy that presented to my CEO group and he walked us through how LLMs are designed to work, and it lined up with a lot of my experience (including stuff like what you're describing).

Note: I'm regurgitating what I remember; I'm not a computer science guy so some of this might be a bit off. I pasted what's below into one of these LLMs I'm telling you not to trust and it said it's basically accurate, so take that for what it's worth lol.


Broadly speaking, when you talk to an LLM, it generates a list of things it could say back to you that seem reasonable based on its training, then it looks at the top X% of options, and then it picks a random answer from that top tier of possibilities.

So if you say "Hello", based on its training and the context it has, it thinks "what would make sense for me to say back?" and it generates a list. The top tier includes things like good afternoon, how's it going, nice to see you, etc. And then it picks randomly.

This kinda mimics the way humans talk to people. You're not programmed with if/then logic that says "if someone says hello, you say good morning." This is why LLMs can appear to be creative. If you ask it the same question multiple times, it will give you different answers. Most of the time it will give you good outputs because it's designed to generate outputs that sound right. But overall they're not really designed to give you a "right answer."
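
If you want to see the "pick randomly from the top tier" idea in code, here's a toy version. The greetings and probabilities are made up, and a real model does this over tens of thousands of possible tokens at every step of the response, not once per reply:

```python
# Toy illustration of sampling from the "top tier" of next-token candidates.
# The candidate replies and probabilities are made up; a real model does this
# over its whole vocabulary, one token at a time.
import random

candidates = {
    "Good afternoon": 0.30,
    "How's it going?": 0.25,
    "Nice to see you": 0.20,
    "Hello!": 0.15,
    "Greetings, traveler": 0.10,
}

def sample_top_p(probs: dict[str, float], top_p: float = 0.8) -> str:
    """Keep the most likely options until their probabilities reach top_p, then pick randomly among them."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for text, p in ranked:
        kept.append((text, p))
        total += p
        if total >= top_p:
            break
    texts, weights = zip(*kept)
    return random.choices(texts, weights=weights)[0]

print(sample_top_p(candidates))  # different runs can print different greetings
```

Turn top_p down far enough and it collapses to the single most likely option every time (real APIs expose this same idea as temperature and top_p settings), which is part of why the code these things write is more repeatable than their small talk.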

I think that nowadays there are certain things that will trigger an LLM to think more along the lines of "find the right answer" - this is what "thinking models" are designed to do via testing outputs before they serve them to you. We didn't get into that in this presentation.

For what you're describing, presumably if you asked an LLM to scrape the webpage 10 times you'd want the same output every time. Problem is that the LLM is basically designed to give you different answers the same way it's programmed to sometimes say "hello" and other times say "how's it going."

What IS very good at generating the same result every time is software. Code IS basically built on if/then logic that will give you the same output 10 times in a row.

And just like there are multiple ways to greet you in a conversation, there are multiple ways to write code that still generate the same output. So an LLM can be pretty good at writing code. If you asked it to write the code 10 times there would be variations between those 10 versions, but all 10 of them would produce the same result (just taking different paths to get there).

Another advantage when it comes to writing code is that coding languages are much smaller than the English language. Maybe there's 1000 ways to say "hi" in English, but only 10 ways to say "open this directory" in whatever coding language it's using. And in many cases, the training when it comes to code generates extremely high probabilities. So maybe "hi" is the highest probability greeting at 80%, but for a certain aspect of code there's a command that's a 99.999% probability. So that's another reason the LLMs can be better at generating consistent results with code than they are with just talking to you.
Charismatic Megafauna
AG
bagger05 said:

the same way it's programmed to sometimes say "hello" and other times say "how's it going."


This is another thing that annoys me about LLMs. A couple of months ago I read an article about how much bandwidth and energy were being consumed by people replying "thank you" to these dumb things, and now they're burning compute on choosing a greeting (or starting responses with "of course" or "yes, absolutely"). Of course, the answer is probably sales/to encourage adoption.
Diggity
AG
gotta cover your bases

Charismatic Megafauna
AG
I'm a goner. The best they get from me is "thanks for nothing, I'll figure it out"
fulshearAg96
AG
Large language models running inside corporations differ from general-population models. They have to prioritize data privacy, security, compliance, and data quality, whereas public models are trained on a much broader, less validated data set like the open internet. From a consumer perspective, that helps explain the lack of confidence in free tools like ChatGPT, even though they do work very well with repetition and good prompts.

The author is accurate that with data science and data engineering agents there is a lot less coding and manual plumbing required. More of the value is in reviewing, editing, and steering these systems using domain knowledge. That shift is evident.

Man I wish I owned a plumbing company.






northeastag
AG
Watching the golf course behind my house getting completely reconstructed. Nothing but big mountains of dirt being shoved around.

And getting ready to remodel the guest bathroom next week. Gutting and re-installation of everything.

I get it that AI can whip out a marketing presentation lickety-split, but it's just hard to imagine it pulling this other stuff off.