Towards Geo-Culturally Grounded LLM Generations
Piyawat Lertvittayakumjorn, David Kinney, Vinodkumar Prabhakaran, Donald Martin, Sunipa Dev
Google, Washington University in St. Louis
{piyawat,vinodkpg,dxm,sunipadev}@google.com, kinney@wustl.edu
Abstract
Generative large language models (LLMs) have been demonstrated to have gaps in diverse, cultural knowledge across the globe. We investigate the effect of retrieval augmented generation and search-grounding techniques on the ability of LLMs to display familiarity with a diverse range of national cultures. Specifically, we compare the performance of standard LLMs, LLMs augmented with retrievals from a bespoke knowledge base (i.e., KB grounding), and LLMs augmented with retrievals from a web search (i.e., search grounding) on a series of cultural familiarity benchmarks. We find that search grounding significantly improves the LLM performance on multiple-choice benchmarks that test propositional knowledge (e.g., the norms, artifacts, and institutions of national cultures), while KB grounding’s effectiveness is limited by inadequate knowledge base coverage and a suboptimal retriever. However, search grounding also increases the risk of stereotypical judgments by language models, while failing to improve evaluators’ judgments of cultural familiarity in a human evaluation with adequate statistical power. These results highlight the distinction between propositional knowledge about a culture and open-ended cultural fluency when it comes to evaluating the cultural familiarity of generative LLMs.
1 Introduction
Contemporary large language models (LLMs) are pre-trained on huge corpora of natural language text Radford et al

Two of Bollywood’s most beloved playback singers, Shaan and KK, are coming to the UAE tomorrow night for their first-ever joint performance. Let’s look at the stars and what the crowd can expect during their Salaam Dubai 2016 show at Dubai Tennis Stadium.

Krishnakumar Kunnath, also known as KK
KK’s start in the industry could be considered a bit of a slog: he sang more than 3,500 jingles before getting his first shot at playback singing in the 1996 romantic Tamil drama Kadhal Desam.

The breakthrough
KK made his first big impression the same year with Chhod Aaye Hum Woh Galiyan from the sleeper hit film Maachis. The track is still a radio mainstay in India. He released his debut solo album, Pal, in 1999. It did well, earning him a Star Screen Award for the best solo album, with three songs topping the Indian charts for months.

Career highlights
KK’s first smash-hit playback song was Tadap Tadap Ke Is Dil Se from the 1999 Salman Khan and Aishwarya Rai film Hum Dil De Chuke Sanam. It earned him his first Filmfare nomination for best playback singer. He has sung hundreds of songs in Hindi, Telugu, Bengali, Tamil, Kannada and Malayalam. He was most recently heard on screen with the song Tu Jo Mila in last year’s Salman Khan film, Bajrangi Bhaijaan.

The future
KK is part of the soundtrack of the upcoming Akshay Kumar drama Airlift, which is due for release next Thursday. Partly shot in Sharjah and Ras Al Khaimah, the film is based on the true story of the mass evacuation of Indians from Kuwait in 1990 during the Iraq-Kuwait war.

Shantanu Mukherjee, also known as Shaan
For Shaan, vocal talent runs in the family. He is the grandson of lyricist Jahar Mukherjee and the son of the late music director Manas Mukherjee. His mother and sister were also renowned singers.
He dabbled in singing from an early age, recording jingles and cover versions before, at the age of 17, he got his

DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models
Utkarsh Tiwari*, Aryan Seth*, Adi Mukherjee, Kaavya Mer, Kavish, Dhruv Kumar
Birla Institute of Technology and Science, Pilani
f20212221, f20220052@pilani.bits-pilani.ac.in
Abstract
We introduce DebateBench, a novel dataset consisting of an extensive collection of transcripts and metadata from some of the world’s most prestigious competitive debates. The dataset consists of British Parliamentary debates from prestigious debating tournaments on diverse topics, annotated with detailed speech-level scores and house rankings sourced from official adjudication data. We curate 256 speeches across 32 debates; each debate runs over one hour, and each input averages 32,000 tokens. Designed to capture long-context, large-scale reasoning tasks, DebateBench provides a benchmark for evaluating modern large language models (LLMs) on their ability to engage in argumentation, deliberation, and alignment with human experts. To do well on DebateBench, an LLM must perform in-context learning to understand the rules and evaluation criteria of the debates, then analyze eight seven-minute speeches and reason about the arguments presented by all speakers to give the final results. Our preliminary evaluation using GPT o1, GPT-4o, and Claude Haiku shows that LLMs struggle to perform well on DebateBench, highlighting the need to develop more sophisticated techniques for improving their performance.
1 Introduction
The reasoning capabilities of Large Language Models (LLMs) have been extensively evaluated across a variety of domains, including STEM problem-solving Cobbe et al. (2021); Clark et al. (2018); Arora et al. (2023)

A deeper look at Bollywood’s KK and Shaan before they team up in Dubai